1. Split the document into blocks using the DOM tree
–Nontrivial (overlapping blocks, visual segments differ)
2. Co-train two learners
–Learner 1 – Stylistic learner
•Spatial and structural relationships
•External relationship to other blocks
–Learner 2 – Lexical learner
•Part-of-speech (POS) and link-related features
•Internal classification irrespective of other blocks
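
The co-training loop in step 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method's actual implementation: the nearest-centroid classifier, the feature vectors, the class labels "content" and "noise", and the names `CentroidLearner` and `co_train` are all hypothetical stand-ins. Each block is assumed to have already been extracted in step 1 and described by two feature views, one stylistic and one lexical.

```python
from math import dist


class CentroidLearner:
    """Nearest-centroid classifier over one feature view (stand-in for any learner)."""

    def fit(self, X, y):
        # One centroid per label: the column-wise mean of that label's vectors.
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        """Return (label, confidence); confidence is the distance margin
        between the nearest and second-nearest centroid."""
        ranked = sorted((dist(x, c), label) for label, c in self.centroids.items())
        margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
        return ranked[0][1], margin


def fit_both(stylistic, lexical, labeled):
    labels = [y for _, _, y in labeled]
    stylistic.fit([s for s, _, _ in labeled], labels)
    lexical.fit([l for _, l, _ in labeled], labels)


def co_train(labeled, unlabeled, rounds=3):
    """labeled: (stylistic_vec, lexical_vec, label) triples.
    unlabeled: (stylistic_vec, lexical_vec) pairs.
    Each round, each learner labels the pool block it is most confident
    about, and that block joins the training set shared by both learners."""
    labeled, pool = list(labeled), list(unlabeled)
    stylistic, lexical = CentroidLearner(), CentroidLearner()
    for _ in range(rounds):
        if not pool:
            break
        fit_both(stylistic, lexical, labeled)
        for learner, view in ((stylistic, 0), (lexical, 1)):
            if not pool:
                break
            # Score every pool block using this learner's own view only.
            scored = [(learner.predict(block[view]), i) for i, block in enumerate(pool)]
            (label, _), idx = max(scored, key=lambda item: item[0][1])
            s_vec, l_vec = pool.pop(idx)
            labeled.append((s_vec, l_vec, label))
    fit_both(stylistic, lexical, labeled)  # final fit on the enlarged set
    return stylistic, lexical, labeled


# Hypothetical feature values, for illustration only.
# Stylistic view: [relative x-position, relative width].
# Lexical view:   [link density, noun ratio].
seed = [
    ([0.5, 0.8], [0.1, 0.6], "content"),
    ([0.0, 0.2], [0.9, 0.1], "noise"),
]
pool = [
    ([0.45, 0.70], [0.20, 0.50]),
    ([0.05, 0.25], [0.85, 0.15]),
    ([0.50, 0.75], [0.60, 0.30]),
]
stylistic, lexical, grown = co_train(seed, pool)
```

The third pool block is the interesting case: its lexical features are ambiguous, but the stylistic learner is confident about it, so it is added to the shared training set and the lexical learner gains an example it could not have labeled reliably on its own.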