Stylistic and lexical co-training for webpage block classification

Conclusion

• Co-training model for web block classification

• Achieves a 28.5% reduction in error on the main task

• However, it falls short in three areas:

  – Detecting fine-grained classes
    → Exploit templates, IE methods, path similarity and context

  – Likely needs more unlabeled data
    → Re-run with more experimental data

  – Performance depends on the choice of learning model
    → Plan to switch to a different learning package