Passage Retrieval in Question Answering
Density-Based Passage Retrieval Method
- However, density-based methods can err when …
Our Solution
- Examine the relationships between words
  - Dependency relations
- Exact match of relations for answer extraction
  - Has low recall because the same relations are often phrased differently
- Fuzzy match of dependency relations
  - Statistical similarity of relations
Measuring Sentence Similarity
Outline
- Extracting and Pairing Relation Paths
- Measuring Path Match Scores
- Learning Relation Mapping Scores
- Evaluations
- Conclusions
What Dependency Parsing is Like
- Minipar (Lin, 1998) for dependency parsing
- Dependency tree
  - Nodes: words/chunks in the sentence
  - Edges (ignoring direction): labeled by relation types
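
As a concrete picture of the parser output, a dependency tree can be held as a list of labeled edges. This is an illustrative sketch, not Minipar's actual data structure, and the relation labels below are assumed for the example rather than Minipar's exact inventory:

# A toy dependency tree for "Which city hosted the 1988 Winter Olympics?",
# stored as (head, relation, dependent) triples; labels are illustrative.
dependency_edges = [
    ("hosted",   "subj", "city"),
    ("city",     "det",  "Which"),
    ("hosted",   "obj",  "Olympics"),
    ("Olympics", "det",  "the"),
    ("Olympics", "nn",   "Winter"),
    ("Olympics", "num",  "1988"),
]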
Extracting Relation Paths
- Relation path
  - A vector of relations between two nodes in the tree
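
A minimal sketch of path extraction, assuming the edge-list representation above: walk the unique tree path between two nodes, ignoring edge direction, and collect the relation labels along it.

from collections import deque

def relation_path(edges, start, end):
    # Build an undirected adjacency map from (head, relation, dependent) triples.
    adj = {}
    for head, rel, dep in edges:
        adj.setdefault(head, []).append((dep, rel))
        adj.setdefault(dep, []).append((head, rel))
    # Breadth-first search; in a tree the first path found is the only one.
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == end:
            return path
        for nxt, rel in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel]))
    return None

# relation_path(dependency_edges, "city", "Olympics") -> ["subj", "obj"]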
Paired Paths from Question and Answer
Outline
- Extracting and Pairing Relation Paths
- Measuring Path Match Scores
- Learning Relation Mapping Scores
- Evaluations
- Conclusions
Measuring Path Match Degree
- Employ a variation of IBM Translation Model 1
  - Path match degree (similarity) as a translation probability
  - MatchScore(P_Q, P_S) → Prob(P_S | P_Q)
  - Relations as words
- Why IBM Model 1?
  - No "word order": a bag of undirected relations
  - No need to estimate "target sentence length"
    - Relation paths are determined by the parse tree
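
A sketch of the scoring step under these assumptions, with a hypothetical relation mapping table t keyed as (sentence relation, question relation); Model 1's null-alignment and constant length terms are dropped for brevity.

import math

def match_score(path_q, path_s, t, smooth=1e-4):
    # Prob(path_s | path_q): each sentence relation is "translated" from
    # some question relation; unseen relation pairs get a small constant.
    log_prob = 0.0
    for rs in path_s:
        total = sum(t.get((rs, rq), smooth) for rq in path_q)
        log_prob += math.log(total / len(path_q))
    return log_prob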
Calculating Translation Probability (Similarity) of Paths
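
The slide's equation is not recoverable from this extraction; in the standard IBM Model 1 form, with relations in place of words and the null-alignment and constant length terms dropped, it reads:

\mathrm{MatchScore}(P_Q, P_S) \;\approx\; \mathrm{Prob}(P_S \mid P_Q) \;=\; \prod_{j=1}^{|P_S|} \frac{1}{|P_Q|} \sum_{i=1}^{|P_Q|} t\!\left(r_j^{S} \mid r_i^{Q}\right),

where t(r^S | r^Q) is the relation mapping (translation) score learned in the next section.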
Outline
- Extracting and Pairing Relation Paths
- Measuring Path Match Scores
- Learning Relation Mapping Scores
- Evaluations
- Conclusions
Training and Testing
Approach 1: MI-Based
- Measures bipartite co-occurrences in training path pairs
- Accounts for path length (penalizes long paths)
- Uses frequencies to approximate mutual information
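
A rough sketch of this idea, using pointwise mutual information over length-weighted co-occurrence counts; the 1/(|P_Q|·|P_S|) penalty and exact weighting here are assumptions, not the original formulation.

import math
from collections import Counter

def mi_scores(path_pairs):
    # path_pairs: iterable of (question_path, sentence_path) relation lists.
    joint, q_marg, s_marg = Counter(), Counter(), Counter()
    for pq, ps in path_pairs:
        w = 1.0 / (len(pq) * len(ps))          # penalize long paths
        for rq in pq:
            for rs in ps:
                joint[(rs, rq)] += w
                q_marg[rq] += w
                s_marg[rs] += w
    n = sum(joint.values())
    # Pointwise mutual information from the weighted frequencies.
    return {(rs, rq): math.log(c * n / (q_marg[rq] * s_marg[rs]))
            for (rs, rq), c in joint.items()}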
Approach 2: EM-Based
- Employs the training method of IBM Model 1
  - Relation mapping scores = word translation probabilities
  - Uses GIZA to carry out the training
- Iteratively refines the relation translation probabilities
- Initialization: assign 1 to identical relations and a small constant otherwise
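
For reference, a compact sketch of the Model 1 EM loop that GIZA implements in full, using the initialization described above (identical relations start at 1, everything else at a small constant):

from collections import defaultdict

def train_relation_model1(path_pairs, iterations=10, small=0.01):
    # Learn t(rel_s | rel_q) from (question_path, sentence_path) pairs.
    rels_q = {rq for pq, _ in path_pairs for rq in pq}
    rels_s = {rs for _, ps in path_pairs for rs in ps}
    t = {(rs, rq): (1.0 if rs == rq else small)
         for rs in rels_s for rq in rels_q}
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        for pq, ps in path_pairs:
            for rs in ps:
                z = sum(t[(rs, rq)] for rq in pq)   # E-step: alignment mass
                for rq in pq:
                    c = t[(rs, rq)] / z
                    count[(rs, rq)] += c
                    total[rq] += c
        for (rs, rq) in t:                           # M-step: renormalize
            if total[rq] > 0:
                t[(rs, rq)] = count[(rs, rq)] / total[rq]
    return t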
Outline
- Extracting and Pairing Relation Paths
- Measuring Path Match Scores
- Learning Relation Mapping Scores
- Evaluations
  - Can relation matching help?
  - Can fuzzy matching perform better than exact matching?
  - Can long questions benefit more?
  - Can relation matching work on top of query expansion?
- Conclusions
Evaluation Setup
- Training data
  - 3K corresponding path pairs from 10K QA pairs (TREC-8, 9)
- Test data
  - 324 factoid questions from the TREC-12 QA task
  - Passage retrieval over the top 200 relevant documents provided by TREC
Comparison Systems
- MITRE: baseline
  - Stemmed word overlap
  - Baseline in previous work on passage retrieval evaluation
- SiteQ: top-performing density-based method
  - Uses a three-sentence window
- NUS
  - Similar to SiteQ, but uses single sentences as passages
- Strict matching of relations
  - Simulates the strict matching used in previous work for answer selection
  - Counts the number of exactly matched paths
- Relation matching is applied on top of MITRE and NUS
Evaluation Metrics
- Mean reciprocal rank (MRR)
  - The mean of the reciprocal rank of the first correct answer in the returned list
  - Computed over the top 20 returned passages
- Percentage of questions with incorrect answers
- Precision at the top one passage
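
A minimal sketch of the MRR computation over the top 20 passages, with hypothetical ranked_passages and is_correct inputs standing in for the actual run data:

def mean_reciprocal_rank(ranked_passages, is_correct, k=20):
    # ranked_passages: {question_id: [passage, ...]} in ranked order.
    # is_correct(question_id, passage) -> True if the passage answers it.
    total = 0.0
    for qid, passages in ranked_passages.items():
        for rank, passage in enumerate(passages[:k], start=1):
            if is_correct(qid, passage):
                total += 1.0 / rank
                break   # only the first correct passage counts
    return total / len(ranked_passages)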
Performance Evaluation
- All improvements are statistically significant (p < 0.001)
- MI and EM do not make much difference given our training data
  - EM needs more training data
  - MI is more susceptible to noise, so it may not scale well
Performance Variation with Question Length
- Long questions, which yield more paired paths, tend to improve more
- The number of non-trivial question terms is used to approximate question length
Error Analysis
- Mismatch of question terms
  - e.g., "In which city is the River Seine?"
  - Introduce question analysis
- Paraphrasing between the question and the answer sentence
  - e.g., "write the book" → "be the author of the book"
  - Most current techniques fail to handle this
  - Find paraphrases via dependency parsing (Lin and Pantel)
Performance on Top of Query Expansion
- On top of query expansion, fuzzy relation matching brings a further 50% improvement
- However, query expansion does not help much on top of a fuzzy relation matching system
  - Expansion terms do not help in pairing relation paths
Outline
- Extracting and Pairing Relation Paths
- Measuring Path Match Scores
- Learning Relation Mapping Scores
- Evaluations
- Conclusions
Conclusions
- Proposed a novel fuzzy relation matching method for factoid QA passage retrieval
  - Brings a dramatic 70%+ improvement over state-of-the-art systems
  - Brings a further 50% improvement over query expansion
- Future QA systems should incorporate relations between words for better performance
  - Query expansion should be integrated with relation matching seamlessly
Q & A