Evaluations
• Adapted co-training:
– Sample balancing: preserve the ratio of noisily labeled
examples; performance is poor without it
– Replace unlabeled data at each round
• Use BoosTexter: handles word features easily
• Five-fold cross-validation
• General performance?
• Specific performance on:
– Fine-grained classification?
– XHTML / DIV pages?
– Others’ tasks?
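
The co-training adaptations and the five-fold setup listed above can be sketched roughly as follows. This is a minimal illustration and not the evaluated implementation: the function names (`balanced_selection`, `fresh_pool`, `k_fold_indices`), the per-class quota rule, and the reading of "replace unlabeled data" as drawing a new pool each round are all assumptions, and the classifier itself (BoosTexter in the slides) is left out.

```python
import random
from collections import defaultdict


def balanced_selection(predictions, class_ratio, n_add):
    """Pick confident noisy labels while keeping a fixed class ratio.

    predictions: list of (example_id, label, confidence) tuples
    class_ratio: dict mapping label -> target fraction (hypothetical input)
    n_add:       total number of examples to add this round
    """
    by_label = defaultdict(list)
    for ex_id, label, conf in predictions:
        by_label[label].append((conf, ex_id))

    chosen = []
    for label, frac in class_ratio.items():
        quota = round(frac * n_add)  # per-class share of this round's additions
        ranked = sorted(by_label[label], reverse=True)  # most confident first
        chosen.extend((ex_id, label) for _, ex_id in ranked[:quota])
    return chosen


def fresh_pool(unlabeled, pool_size, rng):
    """Draw a new pool of unlabeled examples for the current round (assumed reading)."""
    return rng.sample(unlabeled, min(pool_size, len(unlabeled)))


def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

In a full loop, each round would call `fresh_pool`, label the pool with the two view classifiers, and pass their predictions through `balanced_selection` before retraining; `k_fold_indices` would wrap the whole procedure for evaluation.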