•Only 320 annotated instances is too little data to construct a language model
–Typical language models are trained on millions of examples
•Try bootstrapping a model
–Take each sample's annotation and apply it to all the instances that sample represents
–More data, but also more noise
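The self-labeling step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `annotated` and `groups` dictionaries, the ids, and the labels are all hypothetical, and the slide does not say how a sample is mapped to the instances it represents.

```python
def self_label(annotated, groups):
    """Copy each annotated sample's label to every instance it represents.

    annotated: dict mapping a sample id to its label (hypothetical layout)
    groups:    dict mapping a sample id to the instances it stands for
    """
    corpus = []
    for sample_id, label in annotated.items():
        for instance in groups.get(sample_id, []):
            # The label is inherited, not checked, so it may be wrong (noise).
            corpus.append((instance, label))
    return corpus


# Toy data: two annotated samples standing in for four instances.
annotated = {"s1": "A", "s2": "B"}
groups = {"s1": ["x1", "x2", "x3"], "s2": ["y1"]}
corpus = self_label(annotated, groups)
```

The corpus grows from the number of annotated samples to the total number of represented instances, but every inherited label is unverified, which is where the noise comes from.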
[Diagram: Self-label (noisy process) → Self-labeled corpus (290K instances) → Bigram language model]
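A bigram language model over the self-labeled corpus could be estimated roughly as below. This is a sketch under assumptions: the slide does not specify the token unit or any smoothing, so the boundary markers `<s>` and `</s>`, the add-one smoothing, and the toy sequences are all illustrative choices.

```python
from collections import Counter


def train_bigram(sequences):
    """Count contexts and bigrams over token sequences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        tokens = ["<s>"] + list(seq) + ["</s>"]  # assumed boundary markers
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def bigram_prob(prev, cur, contexts, bigrams, vocab):
    """P(cur | prev) with add-one smoothing (an assumption, not from the slide)."""
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))


# Toy sequences standing in for the self-labeled corpus.
contexts, bigrams, vocab = train_bigram([["a", "b"], ["a", "c"]])
p_b_given_a = bigram_prob("a", "b", contexts, bigrams, vocab)
```

Smoothing matters here because noisy labels and a modest corpus leave many bigrams unseen; without it, any unseen pair would get probability zero.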