A scenario

Looking for “Journal of housing for the elderly”
Tries using the default keyword search
But lots of results doesn’t necessarily equate with finding the item …

OPAC Query Types

Slone (2000) categorizes three types of queries:
Known Item: find a title that the patron knows exists
Area: identify the area of the library for certain resources
Unknown Item: identify resources to solve a problem or address an issue

Importance of Known Item Queries

Kilgour has noted the effectiveness of the author/title combination
Up to 50% of keyword searches are known item queries (Larson, 1991)
despite entry points being available for author, title and subject search
Partial answers normally don’t help in known item search: either you find the item or you don’t

Problem Statement

Two tasks:
Query classification: is this query searching for a known item?
Search result classification: which, if any, of the search results are the known item(s) sought?
Use supervised machine learning to solve the problem

Learning Architecture

Outline

Known Item Queries (KIQs)
Data Collection
Features of KIQs
Evaluation
Conclusions

Data Collection

Used queries drawn from a local OPAC query collection
Anonymized, sessioned queries
Over 290K queries, purposefully sampled for a wide array of query characteristics
320 resulting queries were judged by 9 participants; 1,500 item judgments
Most queries were annotated by two participants

Data Collection

Tasks:
Query Judgment
Query Judgment, with search results
Search Results Judgment

Judgments

Participants graded on a 9-point Likert scale
We also simplified the scale to a binary class (1-2 → yes; 3-9 → no)
Let’s look at two examples:
Practical digital libraries
Practical digital archiving
Query judgments are subjective and may depend on subject familiarity. Thus, we calculate inter-judge agreement to:
establish whether the tasks are well-defined
establish a performance upper bound

Agreement levels

Data analysis:
Relatively strong correlation (mostly above .6)
Stronger correlation with search results shown; an easier task with more information
Most search results are not known items, hence the high correlation for the final task

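To make the two steps above concrete, the sketch below collapses hypothetical 9-point judgments to the binary class and measures inter-judge agreement as a Pearson correlation between two annotators; the scores and the choice of Pearson correlation are assumptions for illustration, not the study's data or exact statistic.

```python
# Sketch of binarizing 9-point judgments and checking inter-judge agreement.
from statistics import correlation  # Pearson correlation (Python 3.10+)

def binarize(score: int) -> int:
    """Collapse a 9-point judgment to the binary class: 1-2 -> yes (1), 3-9 -> no (0)."""
    return 1 if score <= 2 else 0

# Hypothetical judgments from two annotators over the same eight queries.
judge_a = [1, 2, 7, 9, 1, 3, 2, 8]
judge_b = [2, 1, 6, 9, 2, 5, 1, 7]

# Agreement on the raw 9-point scale (cf. the "mostly above .6" correlations).
print("Pearson r (9-point):", round(correlation(judge_a, judge_b), 3))

# The simplified binary labels used for classification.
print("Binary:", [binarize(s) for s in judge_a], [binarize(s) for s in judge_b])
```
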
Outline

Known Item Queries (KIQs)
Data Collection
Features of KIQs
Query features
Search result features
Evaluation
Conclusion

Query Classification Features

Two examples:
Hill Raymond Coding Theory – A First Course
japan and cultural

Distinguishing characteristics:
Longer: cut-and-paste, copying from a reference
Mixed Case
Determiners: not present in unknown item or area searches
Proper Nouns: specific subjects or author names
Advanced Operators: title or author restrictions
Keywords: indicative of a type of publication, e.g., “journal”, “textbook”, “course”

Use POS tagging to create a total of 16 features that embody these characteristics

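A minimal sketch of such query-level features follows. The paper derives its 16 features with a POS tagger; here determiners and proper nouns are approximated with a word list and capitalization so the example is self-contained, and the feature names and operator syntax are illustrative assumptions rather than the authors' actual feature set.

```python
# Sketch of query-level features along the lines listed above.
import re

DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}
PUBLICATION_WORDS = {"journal", "textbook", "course", "proceedings", "handbook"}
ADVANCED_OPS = re.compile(r"\b(ti|au|su)\s*[:=]", re.IGNORECASE)  # assumed operator syntax

def query_features(query: str) -> dict:
    tokens = query.split()
    lower = [t.lower() for t in tokens]
    return {
        "length": len(tokens),                                     # longer queries: cut-and-paste
        "mixed_case": not query.islower() and not query.isupper(),
        "num_determiners": sum(t in DETERMINERS for t in lower),
        "num_capitalized": sum(t[0].isupper() for t in tokens),    # crude proxy for proper nouns
        "has_advanced_operator": bool(ADVANCED_OPS.search(query)),
        "has_publication_keyword": any(t.strip('",.') in PUBLICATION_WORDS for t in lower),
    }

print(query_features("Hill Raymond Coding Theory - A First Course"))
print(query_features("japan and cultural"))
```
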
Language Modeling

Idea: model KIQs as a separate “language” from non-KIQs
Model: a simple bigram language model
Create a language model for each point on the scale or for each class
Then test a new query’s goodness of fit to the LMs, e.g., assign the class whose LM fits the query best

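The sketch below illustrates the bigram idea with add-one smoothing, assuming the usual scoring scheme of computing the query's log-probability under each class's model and picking the best-fitting one; the training queries are made up and this is not the authors' implementation.

```python
# Sketch: one bigram language model per class, new query scored against each.
import math
from collections import Counter

def train_bigram_lm(queries):
    """Return (bigram_counts, unigram_counts, vocab) for add-one smoothing."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for q in queries:
        tokens = ["<s>"] + q.lower().split() + ["</s>"]
        vocab.update(tokens)
        for w1, w2 in zip(tokens, tokens[1:]):
            bigrams[(w1, w2)] += 1
            unigrams[w1] += 1
    return bigrams, unigrams, vocab

def log_prob(query, lm):
    bigrams, unigrams, vocab = lm
    tokens = ["<s>"] + query.lower().split() + ["</s>"]
    v = len(vocab) + 1
    return sum(
        math.log((bigrams[(w1, w2)] + 1) / (unigrams[w1] + v))  # add-one smoothing
        for w1, w2 in zip(tokens, tokens[1:])
    )

kiq_lm = train_bigram_lm(["practical digital libraries", "coding theory a first course"])
non_kiq_lm = train_bigram_lm(["japan and cultural", "housing for elderly people"])

query = "practical digital archiving"
scores = {"KIQ": log_prob(query, kiq_lm), "non-KIQ": log_prob(query, non_kiq_lm)}
print(max(scores, key=scores.get), scores)  # class whose LM fits the query best
```
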
Bootstrapping

320 annotated instances is a very small corpus for constructing a language model
Usual language models use millions of examples
Try bootstrapping a model:
Take a sample’s annotation and apply it to all the queries that the sample represents
More data, but also more noise

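A hedged sketch of this bootstrapping step, assuming that each annotated sample stands for a stratum of similar queries and that its label is propagated to every unlabeled query in that stratum; the stratum "signature" below is a made-up stand-in for the actual sampling criteria.

```python
# Sketch: propagate a seed query's label to the queries it was sampled to represent.
def signature(query: str) -> tuple:
    tokens = query.split()
    return (len(tokens) > 4, any(t[0].isupper() for t in tokens))  # coarse stratum key (assumed)

annotated = {"Practical digital libraries": 1, "japan and cultural": 0}  # seed labels
unlabeled = ["Coding Theory - A First Course", "history and culture", "malay dance forms"]

bootstrapped = {}
for query in unlabeled:
    for seed, label in annotated.items():
        if signature(query) == signature(seed):  # same stratum as a judged seed query
            bootstrapped[query] = label          # inherit the seed's annotation
            break

print(bootstrapped)  # larger, but noisier, training set for the language models
```
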
Features with search results

What about when we have search results?
We look at the pairing of search results with the queries that generated them.
Characteristics of query-search result pairs:
Word sequences overlap significantly
First and last positions are particularly important
Publication keyword match
Higher number of relevant search results

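A minimal sketch of query-result pair features along these lines; the feature set and the keyword list are illustrative, not the paper's exact features.

```python
# Sketch of pair features: word overlap, first/last token match, publication keyword match.
PUBLICATION_WORDS = {"journal", "textbook", "handbook", "proceedings"}

def pair_features(query: str, result_title: str) -> dict:
    q = [w.strip('.,:;"') for w in query.lower().split()]
    t = [w.strip('.,:;"') for w in result_title.lower().split()]
    q_set, t_set = set(q), set(t)
    return {
        "word_overlap": len(q_set & t_set) / max(len(q_set), 1),
        "first_token_match": bool(q and t and q[0] == t[0]),
        "last_token_match": bool(q and t and q[-1] == t[-1]),
        "publication_keyword_match": bool(PUBLICATION_WORDS & q_set & t_set),
    }

print(pair_features("journal of housing for the elderly",
                    "Journal of housing for the elderly."))
```
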
BLEU and NIST

Judge the fitness of a system translation against a reference translation
Examine multiple granularities of n-gram overlap
BLEU: normalized between 0 and 1
NIST: only uses trigrams
Use these features to model subtle overlap properties

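To make the n-gram overlap idea concrete, the sketch below computes a simplified BLEU-style score (geometric mean of clipped n-gram precisions, no brevity penalty) for a query-result pair; the paper uses the standard BLEU and NIST metrics rather than this simplification.

```python
# Simplified BLEU-style overlap feature between a query and a result title.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_like(query: str, result_title: str, max_n: int = 4) -> float:
    hyp, ref = query.lower().split(), result_title.lower().split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_ngrams.values())
        if total == 0:
            break
        clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        precisions.append((clipped + 1) / (total + 1))  # add-one to avoid log(0)
    if not precisions:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))

print(bleu_like("journal of housing for the elderly",
                "Journal of housing for the elderly"))
```
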
Outline

Known Item Queries (KIQs)
Data Collection
Features of KIQs
Evaluation
Conclusions

Machine Learning Paradigms

Use Waikato’s Weka machine learning toolkit:
Decision trees (J48 module): for the understandability of their hypotheses
Support vector machines (SVM, SMO module): for their robustness and general performance
Compare against a majority-class baseline (lower bound) and inter-judge agreement (upper bound)

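The comparison could be sketched as follows with scikit-learn analogues rather than the authors' actual Weka setup: a decision tree standing in for J48, a linear SVM for SMO, and a majority-class dummy classifier as the baseline, evaluated by cross-validation on placeholder data.

```python
# Sketch of the classifier comparison using scikit-learn stand-ins for Weka modules.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((320, 16))    # 320 judged queries x 16 query features (placeholder data)
y = rng.integers(0, 2, 320)  # binary KIQ / non-KIQ labels (placeholder data)

models = {
    "majority baseline": DummyClassifier(strategy="most_frequent"),
    "decision tree (cf. J48)": DecisionTreeClassifier(),
    "SVM (cf. SMO)": SVC(kernel="linear"),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f}")
```
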
Task 1: Query Classification

Task 2: Query Classification w/ search results

Task 3: Query Results Classification

Applications of Classifiers

Route patrons to intended search results faster
KIQs: turn off the fancy footwork: no query expansion or spelling correction
If we have the item, skip to circulation info
If we don’t, show alternatives:
Interlibrary Loan (ILL)
Suggest a purchase

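A hedged sketch of this routing behavior; the function and the returned fields are hypothetical and not part of any real OPAC API.

```python
# Sketch of routing a query based on the two classifiers' decisions.
def handle_query(query: str, is_kiq: bool, known_item_result=None) -> dict:
    if not is_kiq:
        # Not a known item query: keep the usual retrieval aids on.
        return {"action": "normal_search", "query_expansion": True, "spell_correct": True}
    if known_item_result is not None:
        # The result classifier found the known item: skip to circulation info.
        return {"action": "show_circulation", "item": known_item_result}
    # Known item query, but the item is not in the collection: offer alternatives.
    return {"action": "show_alternatives",
            "options": ["Interlibrary Loan (ILL)", "Suggest a purchase"]}

print(handle_query("journal of housing for the elderly", is_kiq=True))
```
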
Conclusions

First such work to demonstrate an automated system that performs query and search result classification
System performance is tied to human performance
Known item search is possibly more important than unknown item search
Often we want to recall where something is
Known items are subjective; one searcher’s known item is another’s unknown

Future Work and Acknowledgments

Extending query classification to other areas
E.g., the Web (Levinson and Rose, 2003)
Extending to user query patterns
Take advantage of sessioned query logs

Thanks to the NUS library staff for their cooperation with our research!!
Especially Ng Kok Koon and Yow Wei Chui

Thanks for sticking it out till the end …
Questions?

Consulting the librarian

Five minutes later … stumped!