Adapted co-training:
Sample balancing: preserve the class ratio among the noisily labeled examples; performance is poor without it
Replace unlabeled data at each round
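The two points above can be sketched as a co-training loop. This is a minimal sketch under stated assumptions, not the authors' code. It assumes the ratio to preserve is the class distribution of the seed labels, and that "replace" means drawing a fresh unlabeled pool each round, with unselected examples returned to a larger reservoir. The two-view example type, the `Trainer` interface, and all function names and defaults are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: each example has two views (e.g. page text and
# anchor text); a trainer maps (text, label) pairs to a predictor that
# returns (label, confidence).
Example = Tuple[str, str]
Predictor = Callable[[str], Tuple[str, float]]
Trainer = Callable[[List[Tuple[str, str]]], Predictor]


def quota_per_class(class_ratio: Dict[str, float], n_new: int) -> Dict[str, int]:
    """Split n_new new labels across classes in proportion to class_ratio."""
    raw = {c: n_new * r for c, r in class_ratio.items()}
    quota = {c: int(v) for c, v in raw.items()}
    # Give any leftover slots to the classes with the largest fractional parts.
    leftover = max(0, n_new - sum(quota.values()))
    for c in sorted(raw, key=lambda c: raw[c] - quota[c], reverse=True)[:leftover]:
        quota[c] += 1
    return quota


def select_balanced(preds, class_ratio, n_new):
    """preds holds (pool_index, label, confidence); keep the most confident
    predictions of each class, up to that class's quota."""
    chosen = []
    for c, k in quota_per_class(class_ratio, n_new).items():
        ranked = sorted((p for p in preds if p[1] == c), key=lambda p: p[2], reverse=True)
        chosen.extend((i, c) for i, _, _ in ranked[:k])
    return chosen


def co_train(labeled: List[Tuple[Example, str]], unlabeled: List[Example],
             train: Trainer, rounds: int = 10, pool_size: int = 100,
             n_new: int = 4, seed: int = 0) -> List[Tuple[Example, str]]:
    rng = random.Random(seed)
    labeled = list(labeled)
    # Class ratio of the seed labels; every round's noisy labels follow it.
    counts = Counter(y for _, y in labeled)
    class_ratio = {c: n / len(labeled) for c, n in counts.items()}
    reservoir = list(unlabeled)
    for _ in range(rounds):
        # Draw a fresh pool of unlabeled examples for this round.
        rng.shuffle(reservoir)
        pool, reservoir = reservoir[:pool_size], reservoir[pool_size:]
        # One classifier per view, both trained on the current labeled set.
        predictors = [train([(x[v], y) for x, y in labeled]) for v in (0, 1)]
        used = set()
        for v, predict in enumerate(predictors):
            preds = [(i, *predict(x[v])) for i, x in enumerate(pool) if i not in used]
            for i, y in select_balanced(preds, class_ratio, n_new):
                labeled.append((pool[i], y))
                used.add(i)
        # Unselected pool examples go back to the reservoir.
        reservoir.extend(x for i, x in enumerate(pool) if i not in used)
    return labeled
```

Without the per-class quota, a classifier that is most confident about the majority class would keep adding only that class, which is one plausible reason for the poor performance noted above. Discarding unselected pool examples instead of returning them is an equally valid reading of "replace".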
Use BoosTexter:
handles word features easily
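BoosTexter (Schapire and Singer) boosts weak rules that test whether a word is present in a document, which is why raw word features need no extra encoding. The sketch below is not BoosTexter: it is plain binary discrete AdaBoost over word-presence stumps, a simplified relative of BoosTexter's multi-class AdaBoost.MH, with hypothetical function names and labels assumed to be in {-1, +1}.

```python
import math
from typing import List, Set, Tuple

# (alpha, word, sign): the stump predicts sign if word is present, -sign otherwise.
Stump = Tuple[float, str, int]


def train_word_stumps(docs: List[Set[str]], labels: List[int], rounds: int = 10) -> List[Stump]:
    """Binary discrete AdaBoost whose weak learners are word-presence stumps."""
    n = len(docs)
    w = [1.0 / n] * n
    vocab = sorted(set().union(*docs))
    model: List[Stump] = []
    for _ in range(rounds):
        # Pick the word and polarity with the lowest weighted error.
        best = None
        for word in vocab:
            for sign in (1, -1):
                err = sum(wi for d, y, wi in zip(docs, labels, w)
                          if (sign if word in d else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, word, sign)
        err, word, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, word, sign))
        # Up-weight misclassified documents, then renormalize.
        w = [wi * math.exp(-alpha * y * (sign if word in d else -sign))
             for d, y, wi in zip(docs, labels, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def classify(model: List[Stump], doc: Set[str]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (sign if word in doc else -sign) for alpha, word, sign in model)
    return 1 if score >= 0 else -1
```

Each document is just a set of words, so adding a new word feature only grows the vocabulary the stump search scans.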
Five-fold cross-validation
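A minimal sketch of the five-fold protocol, assuming a random split by example index. `k_fold_splits`, `cross_validate`, and the `run_fold` callback (which would train, for instance by co-training, on the training indices and return a score on the test indices) are hypothetical names.

```python
import random
from typing import Callable, Iterator, List, Tuple


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # fold sizes differ by at most one
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def cross_validate(n: int, run_fold: Callable[[List[int], List[int]], float], k: int = 5) -> float:
    """Average the score returned by run_fold over the k folds."""
    scores = [run_fold(train, test) for train, test in k_fold_splits(n, k)]
    return sum(scores) / len(scores)
```

Every example is tested exactly once, and each fold trains on the other four fifths of the data.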
General performance?
Specific performance on:
Fine-grained classification?
XHTML / DIV pages?
Other tasks?