Participants graded on a 9-point Likert scale.
We also simplified the scale to a binary class (1-2 → yes; 3-9 → no).
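A minimal sketch of the binary simplification, assuming grades arrive as integers from 1 to 9. The function name `to_binary` is hypothetical; only the 1-2 → yes / 3-9 → no mapping comes from the slide.

```python
def to_binary(grade: int) -> str:
    """Collapse a 9-point Likert grade into the slide's binary class.

    Grades 1-2 map to "yes"; grades 3-9 map to "no".
    (to_binary is a hypothetical helper name used for illustration.)
    """
    if not 1 <= grade <= 9:
        raise ValueError(f"grade must be in 1..9, got {grade}")
    return "yes" if grade <= 2 else "no"


grades = [1, 2, 3, 5, 9]
print([to_binary(g) for g in grades])  # ['yes', 'yes', 'no', 'no', 'no']
```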
Let's look at two examples:
Practical digital libraries
Practical digital archiving
Query judgments are subjective and may depend on the judge's familiarity with the subject.
Thus, we calculate inter-judge agreement to:
establish whether the tasks are well-defined
establish an upper bound on performance
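The slides do not say which agreement measure was used. As one common option, the sketch below assumes two judges who labeled the same items with the binary yes/no classes, and computes raw observed agreement plus Cohen's kappa, which corrects for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The function names and the sample labels are hypothetical illustrations, not data from the study.

```python
from collections import Counter
from typing import Sequence


def observed_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of items on which two judges gave the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("judges must label the same, non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two judges: (p_o - p_e) / (1 - p_e)."""
    p_o = observed_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    # Chance agreement: probability both judges pick the same label independently.
    p_e = sum((count_a[l] / n) * (count_b[l] / n) for l in labels)
    if p_e == 1.0:
        # Both judges used one identical label for every item; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical binary labels from two judges on six queries.
judge_a = ["yes", "yes", "no", "no", "no", "yes"]
judge_b = ["yes", "no", "no", "no", "yes", "yes"]
print(round(observed_agreement(judge_a, judge_b), 3))  # 0.667
print(round(cohens_kappa(judge_a, judge_b), 3))        # 0.333
```

Read as an upper bound, a system scored against either judge would not be expected to agree with that judge much more often than the judges agree with each other. With more than two judges, one option is to average the pairwise scores.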