
Min-Yen Kan and Danny C. C. Poo

Known Item Queries (JCDL 2005)

15/25

Bootstrapping

•Only 320 annotated instances is a small set for constructing a language model

–Typical language models are trained on millions of examples

•Try bootstrapping a model

–Apply each annotated sample’s label to all the instances that sample represents

–More data, but also more noise

[Diagram: Original annotated corpus → Bigram lang. model → Self-label (noisy process) → Self-labeled corpus (290K instances) → Bigram lang. model]
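The bootstrapping pipeline above can be sketched as self-training with one bigram language model per class. This is a minimal sketch, not the authors' code. It assumes several things the slide does not state: whitespace tokenization, add-one (Laplace) smoothing, no class priors, and the hypothetical labels `known_item` / `other`. The names `BigramLM`, `train_models`, `classify` and `bootstrap` are illustrative, and the demo queries are toy data.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Add-one smoothed bigram language model over lowercased whitespace tokens."""

    def __init__(self):
        self.histories = Counter()  # how often each token appears as a bigram history
        self.bigrams = Counter()    # counts of (previous, current) token pairs
        self.vocab = {EOS}          # tokens that can appear in the "current" slot

    def _tokens(self, text):
        return [BOS] + text.lower().split() + [EOS]

    def train(self, texts):
        for text in texts:
            tokens = self._tokens(text)
            self.vocab.update(tokens[1:])
            for prev, cur in zip(tokens, tokens[1:]):
                self.histories[prev] += 1
                self.bigrams[(prev, cur)] += 1
        return self

    def log_prob(self, text):
        tokens = self._tokens(text)
        v = len(self.vocab) + 1  # +1 leaves some probability mass for unseen tokens
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.histories[prev] + v))
            for prev, cur in zip(tokens, tokens[1:])
        )


def train_models(labeled):
    """labeled: iterable of (text, label) pairs; returns one BigramLM per label."""
    by_label = defaultdict(list)
    for text, label in labeled:
        by_label[label].append(text)
    return {label: BigramLM().train(texts) for label, texts in by_label.items()}


def classify(models, text):
    # Choose the label whose model assigns the text the highest log-probability.
    return max(models, key=lambda label: models[label].log_prob(text))


def bootstrap(seed, unlabeled):
    """Train on the small annotated seed, self-label the large corpus, retrain."""
    seed_models = train_models(seed)
    # Noisy step: seed-model errors become training labels for the second model.
    self_labeled = [(text, classify(seed_models, text)) for text in unlabeled]
    return train_models(self_labeled)


if __name__ == "__main__":
    # Toy data with hypothetical labels, for illustration only.
    seed = [("introduction to algorithms", "known_item"),
            ("books about machine learning", "other")]
    unlabeled = ["introduction to data mining", "books about singapore history"]
    models = bootstrap(seed, unlabeled)
    for query in unlabeled:
        print(query, "->", classify(models, query))
```

Mirroring the diagram, the second model is trained only on the self-labeled corpus. Any mistakes the seed models make are copied into that corpus, which is the "more data, but also more noise" trade-off.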