• Manually checked F3 measure
  – Based on essential/acceptable answer nuggets
    • NR – proportion of returned essential answer nuggets
    • NP – penalty to longer answers
    • NR weighted 3 times as much as NP
  – Subject to inconsistent scoring among assessors
• Automatic ROUGE score
  – Gold standard: sentences containing answer nuggets
  – Counting the trigrams shared by the gold standard and system answers
  – ROUGE-3-ALL (R3A) and ROUGE-3-ESSENTIAL (R3E)
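The F3 measure can be sketched as follows. This is a minimal illustration that assumes the usual TREC-style nugget F-measure with beta = 3; the slide itself gives only the ingredients (NR, NP, and the 3:1 weighting). The function names and the allowance of 100 characters per returned nugget are assumptions made for the example, not details taken from the slide.

```python
def nugget_recall(essential_returned: int, essential_total: int) -> float:
    """NR: proportion of essential nuggets that the answer returned."""
    if essential_total == 0:
        return 0.0
    return essential_returned / essential_total


def nugget_precision(answer_length: int,
                     essential_returned: int,
                     acceptable_returned: int,
                     allowance_per_nugget: int = 100) -> float:
    """NP: length penalty. Each returned nugget earns a length allowance
    (assumed here to be 100 characters); text beyond it is penalised."""
    allowance = allowance_per_nugget * (essential_returned + acceptable_returned)
    if answer_length <= allowance:
        return 1.0
    return 1.0 - (answer_length - allowance) / answer_length


def f_beta(nr: float, np_: float, beta: float = 3.0) -> float:
    """F-measure in which recall (NR) counts beta times as much as precision (NP)."""
    b2 = beta * beta
    denominator = b2 * np_ + nr
    if denominator == 0.0:
        return 0.0
    return (b2 + 1.0) * np_ * nr / denominator


# Hypothetical answer: 500 characters, 2 of 4 essential nuggets, 1 acceptable nugget.
nr = nugget_recall(2, 4)
np_ = nugget_precision(500, 2, 1)
score = f_beta(nr, np_)
```

Because beta is 3, a drop in NR lowers the score far more than an equal drop in NP, which matches the slide's emphasis on recall of essential nuggets.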
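The trigram counting behind the automatic score can be sketched in the same spirit. This is a simplified recall-style ROUGE-3, not the official ROUGE toolkit: lowercasing, whitespace tokenisation, and the function names are assumptions made for illustration. R3A and R3E would presumably use the same computation with different gold-standard sentence sets, although the slide does not spell out how they differ.

```python
from collections import Counter


def ngrams(tokens, n=3):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(gold_sentences, system_answer, n=3):
    """Fraction of gold-standard n-grams that also occur in the system answer.

    Matches are clipped: a gold n-gram occurring k times is credited
    at most as many times as it occurs in the system answer.
    """
    gold = Counter()
    for sentence in gold_sentences:
        gold.update(ngrams(sentence.lower().split(), n))
    system = ngrams(system_answer.lower().split(), n)

    total = sum(gold.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, system[gram]) for gram, count in gold.items())
    return overlap / total


# Hypothetical gold sentence and system answer: 2 of 4 gold trigrams are shared.
score = rouge_n_recall(["the cat sat on the mat"], "a cat sat on the sofa")
```

Unlike the manual F3 measure, this score needs no assessor once the gold-standard sentences are fixed, so it is repeatable across runs.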