Human Subject Evaluation - 2
Importance ratings
• Inter-judge agreement measured with Kappa is low (0.2 to 0.4); see the kappa sketch after this list
• Ratings for concrete nouns were the most stable, followed by backgrounds and video categories, with actions being the least stable
• Negative correlations are prominent in our dataset
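
The slides report Kappa without naming the variant, so the following is a minimal sketch assuming pairwise Cohen's kappa between two judges; the function name, the toy ratings, and the 1-3 importance scale are illustrative assumptions, not from the source.

    from collections import Counter

    def cohens_kappa(ratings_a, ratings_b):
        """Cohen's kappa for two judges: (p_o - p_e) / (1 - p_e)."""
        n = len(ratings_a)
        # Observed agreement: fraction of items both judges labeled identically.
        p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
        # Chance agreement from each judge's label marginals (independence assumption).
        freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
        p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical data: two judges rating five items on a 1-3 importance scale.
    print(cohens_kappa([1, 2, 3, 2, 1], [1, 2, 2, 2, 3]))  # 0.375, inside the low 0.2-0.4 band

On the widely used Landis and Koch scale, kappa values of 0.21 to 0.40 indicate only "fair" agreement, which matches the "low" reading of the 0.2 to 0.4 range above.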