Stylistic and lexical co-training for webpage block classification

Which approach to use


•	An obvious approach is to build a supervised classifier

		–	Train on labeled examples (f₁,f₂,…,f_i,…,f_n, C)


		–	Test by extracting features (f₁,f₂,…,f_i,…,f_n) and predicting the unknown class: C = ?


•	Labeled training data is costly, so we need to exploit unlabeled data

•	The two feature sets (stylistic and lexical) are largely orthogonal

⇒ Try co-training!
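
To make the idea concrete, here is a minimal sketch of a co-training loop over two feature views. It is an illustration under stated assumptions, not the method used in this work: the nearest-centroid classifier, the margin-based confidence, the names `co_train`, `centroids` and `predict`, the class names "nav" and "content", and the toy feature values are all hypothetical.

```python
def distance(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def centroids(rows, labels):
    """Fit a nearest-centroid model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(model, x):
    """Return (label, confidence). Confidence is the gap between the
    distances to the nearest and second-nearest class centroids."""
    ranked = sorted((distance(c, x), y) for y, c in model.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin

def co_train(view_a, view_b, labels, rounds=5, per_round=1):
    """view_a and view_b are parallel lists of feature vectors for the same
    blocks; labels[i] is a class name, or None for an unlabeled block.
    In each round, a classifier trained on one view labels the unlabeled
    blocks it is most confident about, and those labels then become
    training data for the classifier on the other view."""
    labels = list(labels)  # work on a copy
    for _ in range(rounds):
        for view in (view_a, view_b):
            known = [i for i, y in enumerate(labels) if y is not None]
            unknown = [i for i, y in enumerate(labels) if y is None]
            if not unknown:
                return labels
            model = centroids([view[i] for i in known],
                              [labels[i] for i in known])
            preds = {i: predict(model, view[i]) for i in unknown}
            best = sorted(unknown, key=lambda i: preds[i][1], reverse=True)
            for i in best[:per_round]:
                labels[i] = preds[i][0]
    return labels

# Hypothetical toy data: one stylistic and one lexical feature per block,
# with only the first two blocks labeled by hand.
style = [(0.9,), (0.1,), (0.8,), (0.2,), (0.85,), (0.15,)]
lexical = [(0.8,), (0.2,), (0.9,), (0.1,), (0.7,), (0.3,)]
seed_labels = ["nav", "content", None, None, None, None]

print(co_train(style, lexical, seed_labels))
```

The sketch relies on the two points from the slide: unlabeled blocks are cheap, and because the views are largely independent, a block that is easy for one classifier can be an informative new example for the other.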