Stylistic and lexical co-training for webpage block classification

13 Nov 2004

WIDM 04: Lee et al. Co-training Web Block Classification

Coarse-grained model

• Slightly different splitting model than earlier work

• Fewer training examples

• No significant gain from co-training, but results comparable to other work (19.5% error vs. 14-18% error)

• Use only three categories:

- Related

- Important

- Unimportant

• Supported by human agreement and well-founded (Song et al. 04)