Lexical and Stylistic Co-training
1. Split the document into blocks using the DOM tree
Nontrivial (blocks can overlap; visual segments can differ from DOM subtrees)
2. Co-train
Learner 1 – Stylistic learner
Spatial and structural relationships
External: relationships to other blocks
Learner 2 – Lexical learner
Part-of-speech (POS) and link-related features
Internal classification irrespective of other blocks
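
A minimal sketch of the co-training loop in step 2, under stated assumptions. The slide does not name the classifiers or the features, so `CentroidLearner` (a toy nearest-centroid classifier), `co_train`, and the numeric feature vectors are all hypothetical stand-ins. Each block is assumed to have two feature vectors, one per view (stylistic and lexical). In each round, each learner is trained on the labeled pool and labels the unlabeled blocks it is most confident about, which then become training data for both learners.

```python
from math import dist


class CentroidLearner:
    """Toy nearest-centroid classifier (hypothetical stand-in for either learner)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_with_confidence(self, x):
        """Return (label, margin); margin is the distance gap to the runner-up class."""
        ranked = sorted((dist(x, c), label) for label, c in self.centroids.items())
        best_d, best_label = ranked[0]
        second_d = ranked[1][0] if len(ranked) > 1 else best_d
        return best_label, second_d - best_d


def co_train(stylistic, lexical, seed_labels, rounds=5, per_round=1):
    """Co-train two learners on two views of the same blocks.

    stylistic, lexical: one feature vector per block, indexed by block number.
    seed_labels: {block_index: label} for the initially labeled blocks.
    Returns the label dictionary extended with the learners' confident guesses.
    """
    labels = dict(seed_labels)
    views = (stylistic, lexical)
    learners = (CentroidLearner(), CentroidLearner())
    for _ in range(rounds):
        unlabeled = [i for i in range(len(stylistic)) if i not in labels]
        if not unlabeled:
            break
        labeled = sorted(labels)
        newly_labeled = {}
        for view, learner in zip(views, learners):
            # Train this learner on its own view of the shared labeled pool.
            learner.fit([view[i] for i in labeled], [labels[i] for i in labeled])
            scored = [(learner.predict_with_confidence(view[i]), i) for i in unlabeled]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            for (label, _margin), i in scored[:per_round]:
                # If both learners pick the same block, the first pick is kept.
                newly_labeled.setdefault(i, label)
        labels.update(newly_labeled)
    return labels
```

Calling `co_train(stylistic, lexical, {0: "content", 1: "noise"})` with one vector per block in each view returns the seed labels plus labels for the blocks the learners picked up along the way. The margin-based confidence and the one-block-per-round default are illustrative choices, not something the slide specifies.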