•Manually checked F3 measure
–Based on essential/acceptable answer nuggets
•NR (nugget recall) – proportion of the essential answer nuggets that the system returns
•NP (nugget precision) – length-based; penalizes longer answers
•Weights NR three times as heavily as NP (β = 3)
–Subject to inconsistent scoring among assessors
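A minimal sketch of the nugget-based F3 computation, assuming the TREC 2003 definition-question formulation: NR is the fraction of key essential nuggets returned, NP grants an allowance of 100 non-whitespace characters per returned nugget (essential or acceptable) and penalizes anything longer, and F combines them with β = 3. The function names and the 100-character allowance are assumptions for illustration, not taken from these slides.

```python
def nugget_recall(essential_returned: int, essential_total: int) -> float:
    # NR: fraction of the essential nuggets in the answer key that the system returned
    if essential_total == 0:
        return 0.0
    return essential_returned / essential_total


def nugget_precision(nuggets_returned: int, answer_length: int,
                     allowance_per_nugget: int = 100) -> float:
    # NP: length-based precision (assumed TREC-style allowance).
    # Each returned nugget (essential or acceptable) earns a character allowance;
    # answers longer than the total allowance are penalized proportionally.
    allowance = allowance_per_nugget * nuggets_returned
    if answer_length <= allowance:
        return 1.0
    return 1.0 - (answer_length - allowance) / answer_length


def f_beta(nr: float, np_: float, beta: float = 3.0) -> float:
    # F-measure with beta = 3, so recall (NR) weighs three times as much as precision (NP)
    if nr == 0.0 and np_ == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * np_ * nr / (b2 * np_ + nr)


# Hypothetical example: the key has 4 essential nuggets; the system returned
# 2 essential + 1 acceptable nugget in a 500-character answer.
nr = nugget_recall(2, 4)          # 0.5
np_ = nugget_precision(3, 500)    # allowance 300 chars -> 1 - 200/500 = 0.6
f3 = f_beta(nr, np_)              # 10 * 0.6 * 0.5 / (9 * 0.6 + 0.5)
print(round(f3, 4))
```

Because β = 3 puts most of the weight on recall, a verbose answer that finds more essential nuggets usually scores higher than a terse one that misses some.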
•Automatic ROUGE score
–Gold standard: sentences containing answer nuggets
–Counts the trigrams shared by the gold-standard sentences and the system answer
–ROUGE-3-ALL (R3A) and ROUGE-3-ESSENTIAL (R3E)
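A minimal sketch of a ROUGE-3 recall score, assuming lowercase whitespace tokenization with no stemming or stopword removal, and clipped trigram counts summed over all gold sentences. It assumes R3A uses every nugget-bearing sentence as the gold standard and R3E only the sentences holding essential nuggets; the function names and example sentences are hypothetical.

```python
from collections import Counter


def trigram_counts(text: str, n: int = 3) -> Counter:
    # Assumed tokenization: lowercase, split on whitespace
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(system_answer: str, gold_sentences: list[str], n: int = 3) -> float:
    # Fraction of gold-standard n-grams that also occur in the system answer
    sys_counts = trigram_counts(system_answer, n)
    matched = 0
    total = 0
    for sentence in gold_sentences:
        gold_counts = trigram_counts(sentence, n)
        total += sum(gold_counts.values())
        # Clipped match: a trigram counts at most as often as it occurs in the system answer
        matched += sum(min(c, sys_counts[g]) for g, c in gold_counts.items())
    return matched / total if total else 0.0


# Hypothetical gold sentences and system answer
essential = ["the eiffel tower is in paris"]
acceptable = ["it was completed in 1889"]
answer = "the eiffel tower is in paris and was completed in 1889"

r3e = rouge_n_recall(answer, essential)               # all 4 essential trigrams matched
r3a = rouge_n_recall(answer, essential + acceptable)  # 6 of 7 trigrams matched
print(r3e, round(r3a, 3))
```

Since it needs no assessor, this score sidesteps inconsistent manual judgments, though an exact-trigram match misses paraphrased nuggets.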