13 Nov 2004
WIDM 04: Lee et al. Co-training Web Block Classification
Stylistic Features
• Layout: inferred from first-level DOM nodes
  – Linear
  – <Table>: use reading order and cell-type propagation
  – XHTML / CSS (e.g., <DIV>): translate relative positioning to absolute positioning; model depth
• Font (including CSS font properties): relative features
• Image size
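The layout bullet says the layout type is guessed from the first-level DOM nodes. A minimal sketch of that idea, assuming a hypothetical rule (a TABLE directly under BODY means "table" layout, a DIV means "css", anything else "linear"); the names `FirstLevelCollector` and `guess_layout` are illustrative, not taken from the original system:

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FirstLevelCollector(HTMLParser):
    """Collects the tag names of elements that are direct children of <body>."""

    def __init__(self):
        super().__init__()
        self.depth = None      # None until <body> opens; 0 = directly inside <body>
        self.first_level = []  # tag names of first-level DOM nodes

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.depth = 0
        elif self.depth is not None:
            if self.depth == 0:
                self.first_level.append(tag)
            if tag not in VOID_TAGS:
                self.depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.depth = None
        elif self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def guess_layout(html):
    """Hypothetical rule: TABLE at first level -> 'table', DIV -> 'css', else 'linear'."""
    collector = FirstLevelCollector()
    collector.feed(html)
    collector.close()
    tags = set(collector.first_level)
    if "table" in tags:
        return "table"
    if "div" in tags:
        return "css"
    return "linear"
```

Depth tracking here relies on well-formed markup; pages with unclosed non-void tags (e.g. `<p>` without `</p>`) would need a real tree builder instead of this counter.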
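For the XHTML / CSS case, translating relative to absolute positioning can be pictured as accumulating offsets down the box tree while recording each box's depth. This sketch assumes, as a simplification, that every box's offset is given relative to its parent box; real CSS layout (normal flow, margins, `position` schemes) is considerably more involved. `Box` and `to_absolute` are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Box:
    """A rendered block whose offset (dx, dy) is relative to its parent box (assumption)."""
    name: str
    dx: float
    dy: float
    children: list[Box] = field(default_factory=list)


def to_absolute(box, parent_x=0.0, parent_y=0.0, depth=0, out=None):
    """Map each box name to (absolute_x, absolute_y, depth) by summing offsets root-to-node."""
    if out is None:
        out = {}
    x, y = parent_x + box.dx, parent_y + box.dy
    out[box.name] = (x, y, depth)
    for child in box.children:
        to_absolute(child, x, y, depth + 1, out)
    return out


page = Box("body", 0, 0, [
    Box("nav", 10, 20, [Box("link", 5, 5)]),
    Box("main", 200, 20),
])
positions = to_absolute(page)
```

Keeping depth alongside the absolute coordinates lets a classifier use both where a block sits on the page and how deeply it is nested.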
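"Relative" font features can be read as comparing a block's font against the rest of the page rather than using absolute sizes. One possible sketch, assuming the reference is the page's dominant font size (the size that covers the most characters); `relative_font_sizes` is a hypothetical helper, not the original feature definition:

```python
from collections import Counter


def relative_font_sizes(blocks):
    """blocks: list of (text, font_size) pairs, all sizes in one unit (e.g. px).

    Returns each block's font size divided by the page's dominant size,
    taken here to be the size covering the most characters (an assumption).
    """
    if not blocks:
        return []
    chars_per_size = Counter()
    for text, size in blocks:
        chars_per_size[size] += len(text)
    dominant, _ = chars_per_size.most_common(1)[0]
    return [size / dominant for _, size in blocks]


blocks = [("Headline", 24), ("Body text here that is long", 12),
          ("More body", 12), ("footer", 10)]
ratios = relative_font_sizes(blocks)
```

A ratio above 1 flags text larger than the page's body text (headings, titles), and a ratio below 1 flags smaller text such as footers, independent of the site's absolute font choices.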