1. Split the document into blocks using DOM tree
   – Nontrivial (overlapping blocks, visual segments differ)
2. Co-train
   – Learner 1 – Stylistic learner
     • Spatial and structural relationship
     • External relationship to other blocks
   – Learner 2 – Lexical learner
     • POS and link related features
     • Internal classification irrespective of other blocks
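
The two steps above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not the method behind the outline. The `BLOCK_TAGS` set, the rule that text belongs to the innermost open block (one simple way to sidestep overlapping blocks), the nearest-centroid classifier, and the specific features in `stylistic_view` and `lexical_view` are all hypothetical choices. POS features are left out because they would need a tagger; word count and link density stand in for the lexical view.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "p", "td", "li"}  # assumption: tags treated as block boundaries


class BlockSplitter(HTMLParser):
    """Step 1: walk the DOM and emit one block per block-level element with text."""

    def __init__(self):
        super().__init__()
        self.open_blocks = []  # stack of blocks whose end tag has not been seen yet
        self.blocks = []       # finished blocks, in the order they were closed

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.open_blocks.append({"tag": tag, "depth": len(self.open_blocks),
                                     "words": [], "links": 0})
        elif tag == "a" and self.open_blocks:
            self.open_blocks[-1]["links"] += 1

    def handle_data(self, data):
        # Text is credited to the innermost open block only.
        if self.open_blocks:
            self.open_blocks[-1]["words"] += data.split()

    def handle_endtag(self, tag):
        if self.open_blocks and self.open_blocks[-1]["tag"] == tag:
            block = self.open_blocks.pop()
            if block["words"]:
                self.blocks.append(block)


def stylistic_view(block, index, total):
    """Learner 1 view: DOM depth and distance (in blocks) from the nearest page edge."""
    return [block["depth"], min(index, total - 1 - index)]


def lexical_view(block):
    """Learner 2 view: the block's own content, irrespective of other blocks."""
    n = len(block["words"])
    return [n, block["links"] / n]


def centroid_predict(train, x):
    """Nearest-centroid classifier: return (label, margin) for feature vector x."""
    sums = {}
    for feats, label in train:
        total, count = sums.get(label, ([0.0] * len(feats), 0))
        sums[label] = ([t + f for t, f in zip(total, feats)], count + 1)
    dists = sorted(
        (sum((t / c - xi) ** 2 for t, xi in zip(total, x)), label)
        for label, (total, c) in sums.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(view1, view2, seed_labels, rounds=10):
    """Step 2: in turn, each learner labels the unlabeled block it is surest about.

    view1 and view2 are parallel lists of feature vectors; seed_labels maps
    block index -> label. Returns an extended copy of seed_labels.
    """
    labels = dict(seed_labels)
    for _ in range(rounds):
        for view in (view1, view2):
            unlabeled = [i for i in range(len(view)) if i not in labels]
            if not unlabeled:
                return labels
            train = [(view[i], labels[i]) for i in labels]
            guesses = {i: centroid_predict(train, view[i]) for i in unlabeled}
            best = max(unlabeled, key=lambda i: guesses[i][1])
            labels[best] = guesses[best][0]  # shared with the other learner
    return labels


html = """
<div><p><a href="/">Home</a> <a href="/news">News</a></p></div>
<div><p>Co-training uses two views of the same blocks to label them.</p>
<p>Each learner teaches the other with its confident predictions.</p></div>
<div><p><a href="/about">About</a> <a href="/contact">Contact</a></p></div>
"""
splitter = BlockSplitter()
splitter.feed(html)
splitter.close()
blocks = splitter.blocks

view1 = [stylistic_view(b, i, len(blocks)) for i, b in enumerate(blocks)]
view2 = [lexical_view(b) for b in blocks]
result = co_train(view1, view2, {0: "nav", 1: "content"})
print(result)
```

Only two blocks are labeled by hand here; the rest are labeled by whichever learner is more confident, and every new label is visible to both learners in the next turn. A real system would likely replace the centroid classifier and these toy features with stronger ones, and would cap how many blocks are added per round.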