
Judgments

• Participants graded on a 9-point Likert scale

• We also simplified the scale to a binary class (1-2 → yes; 3-9 → no)
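The binary simplification above can be sketched as a small mapping function. This is only an illustration: the function name `to_binary` and the range check are assumptions, not part of the original study materials.

```python
def to_binary(score: int) -> str:
    """Collapse a 9-point Likert grade into the binary class from the slide.

    Hypothetical helper: grades 1-2 map to "yes", grades 3-9 map to "no".
    """
    if not 1 <= score <= 9:
        raise ValueError(f"grade must be between 1 and 9, got {score}")
    return "yes" if score <= 2 else "no"


# Example: convert one participant's grades for several items.
grades = [1, 2, 3, 5, 9]
labels = [to_binary(g) for g in grades]
print(labels)
```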

Let’s look at two examples:

• Practical digital libraries

• Practical digital archiving

Query judgments are subjective and may depend on subject familiarity. Thus, we calculate inter-judge agreement to:

– establish whether the tasks are well-defined

– establish performance upper bound
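The slides do not say which agreement statistic was used. As one possible illustration, the sketch below computes Cohen's kappa for two judges over the binary yes/no labels; the choice of kappa, the function name `cohens_kappa`, and the sample labels are all assumptions made for the example.

```python
from collections import Counter


def cohens_kappa(judge_a, judge_b):
    """Cohen's kappa for two judges labelling the same items.

    Assumed metric for illustration; the source does not name one.
    """
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must label the same non-empty item list")
    n = len(judge_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Chance agreement: product of each judge's label frequencies, summed.
    count_a, count_b = Counter(judge_a), Counter(judge_b)
    expected = sum(count_a[l] * count_b[l] for l in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        # Both judges used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for four items, purely to show the call.
judge_a = ["yes", "yes", "no", "no"]
judge_b = ["yes", "no", "no", "no"]
kappa = cohens_kappa(judge_a, judge_b)
print(round(kappa, 3))
```

A kappa near 1 would suggest the task is well-defined; agreement between human judges also gives a rough ceiling on what an automatic system can be expected to reach.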