Passage Retrieval in Question Answering
Density-Based Passage Retrieval Method
- However, density-based methods can err when …
Our Solution
- Examine the relationships between words
  - Dependency relations
- Exact match of relations for answer extraction
  - Has low recall because the same relations are often phrased differently
- Fuzzy match of dependency relations
  - Statistical similarity of relations
Measuring Sentence Similarity
Outline
- Extracting and Pairing Relation Paths
- Measuring Path Match Scores
- Learning Relation Mapping Scores
- Evaluations
- Conclusions
What Dependency Parsing is Like
- Minipar (Lin, 1998) for dependency parsing
- Dependency tree
  - Nodes: words/chunks in the sentence
  - Edges (ignoring direction): labeled by relation types
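
As a concrete picture of the parser output, a dependency tree can be held as a list of labeled edges. This is an illustrative sketch, not Minipar's actual data structure, and the relation labels below are assumed for the example rather than Minipar's exact inventory:

# A toy dependency tree for "Which city hosted the 1988 Winter Olympics?",
# stored as (head, relation, dependent) triples; labels are illustrative.
dependency_edges = [
    ("hosted",   "subj", "city"),
    ("city",     "det",  "Which"),
    ("hosted",   "obj",  "Olympics"),
    ("Olympics", "det",  "the"),
    ("Olympics", "nn",   "Winter"),
    ("Olympics", "num",  "1988"),
]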
Extracting Relation Paths
- Relation path
  - A vector of relations between two nodes in the tree
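
A minimal sketch of path extraction, assuming the edge-list representation above: walk the unique tree path between two nodes, ignoring edge direction, and collect the relation labels along it.

from collections import deque

def relation_path(edges, start, end):
    # Build an undirected adjacency map from (head, relation, dependent) triples.
    adj = {}
    for head, rel, dep in edges:
        adj.setdefault(head, []).append((dep, rel))
        adj.setdefault(dep, []).append((head, rel))
    # Breadth-first search; in a tree the first path found is the only one.
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == end:
            return path
        for nxt, rel in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel]))
    return None

# relation_path(dependency_edges, "city", "Olympics") -> ["subj", "obj"]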
Paired Paths from Question and Answer
Outline
- Extracting and Pairing Relation Paths
- Measuring Path Match Scores
- Learning Relation Mapping Scores
- Evaluations
- Conclusions
Measuring Path Match Degree
- Employ a variation of IBM Translation Model 1
  - Path match degree (similarity) as a translation probability
  - MatchScore(P_Q, P_S) → Prob(P_S | P_Q)
  - Relations as words
- Why IBM Model 1?
  - No "word order": a bag of undirected relations
  - No need to estimate "target sentence length"
    - Relation paths are determined by the parse tree
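
A sketch of the scoring step under these assumptions, with a hypothetical relation mapping table t keyed as (sentence relation, question relation); Model 1's null-alignment and constant length terms are dropped for brevity.

import math

def match_score(path_q, path_s, t, smooth=1e-4):
    # Prob(path_s | path_q): each sentence relation is "translated" from
    # some question relation; unseen relation pairs get a small constant.
    log_prob = 0.0
    for rs in path_s:
        total = sum(t.get((rs, rq), smooth) for rq in path_q)
        log_prob += math.log(total / len(path_q))
    return log_prob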
Calculating Translation Probability (Similarity) of Paths
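
The slide's equation is not recoverable from this extraction; in the standard IBM Model 1 form, with relations in place of words and the null-alignment and constant length terms dropped, it reads:

\mathrm{MatchScore}(P_Q, P_S) \;\approx\; \mathrm{Prob}(P_S \mid P_Q) \;=\; \prod_{j=1}^{|P_S|} \frac{1}{|P_Q|} \sum_{i=1}^{|P_Q|} t\!\left(r_j^{S} \mid r_i^{Q}\right),

where t(r^S | r^Q) is the relation mapping (translation) score learned in the next section.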
Outline
- Extracting and Pairing Relation Paths
- Measuring Path Match Scores
- Learning Relation Mapping Scores
- Evaluations
- Conclusions
Training and Testing
Approach 1: MI-Based
- Measures bipartite co-occurrences in training path pairs
- Accounts for path length (penalizes long paths)
- Uses frequencies to approximate mutual information
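
A rough sketch of this idea, using pointwise mutual information over length-weighted co-occurrence counts; the 1/(|P_Q|·|P_S|) penalty and exact weighting here are assumptions, not the original formulation.

import math
from collections import Counter

def mi_scores(path_pairs):
    # path_pairs: iterable of (question_path, sentence_path) relation lists.
    joint, q_marg, s_marg = Counter(), Counter(), Counter()
    for pq, ps in path_pairs:
        w = 1.0 / (len(pq) * len(ps))          # penalize long paths
        for rq in pq:
            for rs in ps:
                joint[(rs, rq)] += w
                q_marg[rq] += w
                s_marg[rs] += w
    n = sum(joint.values())
    # Pointwise mutual information from the weighted frequencies.
    return {(rs, rq): math.log(c * n / (q_marg[rq] * s_marg[rs]))
            for (rs, rq), c in joint.items()}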
Approach 2: EM-Based
- Employs the training method of IBM Model 1
  - Relation mapping scores = word translation probabilities
  - Uses GIZA to carry out the training
- Iteratively refines the relation translation probabilities
- Initialization: assign 1 to identical relations and a small constant otherwise
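
For reference, a compact sketch of the Model 1 EM loop that GIZA implements in full, using the initialization described above (identical relations start at 1, everything else at a small constant):

from collections import defaultdict

def train_relation_model1(path_pairs, iterations=10, small=0.01):
    # Learn t(rel_s | rel_q) from (question_path, sentence_path) pairs.
    rels_q = {rq for pq, _ in path_pairs for rq in pq}
    rels_s = {rs for _, ps in path_pairs for rs in ps}
    t = {(rs, rq): (1.0 if rs == rq else small)
         for rs in rels_s for rq in rels_q}
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        for pq, ps in path_pairs:
            for rs in ps:
                z = sum(t[(rs, rq)] for rq in pq)   # E-step: alignment mass
                for rq in pq:
                    c = t[(rs, rq)] / z
                    count[(rs, rq)] += c
                    total[rq] += c
        for (rs, rq) in t:                           # M-step: renormalize
            if total[rq] > 0:
                t[(rs, rq)] = count[(rs, rq)] / total[rq]
    return t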
Outline
- Extracting and Pairing Relation Paths
- Measuring Path Match Scores
- Learning Relation Mapping Scores
- Evaluations
  - Can relation matching help?
  - Can fuzzy matching perform better than exact matching?
  - Can long questions benefit more?
  - Can relation matching work on top of query expansion?
- Conclusions
Evaluation Setup
- Training data
  - 3K corresponding path pairs from 10K QA pairs (TREC-8, 9)
- Test data
  - 324 factoid questions from the TREC-12 QA task
  - Passage retrieval over the top 200 relevant documents provided by TREC
Comparison Systems
- MITRE: baseline
  - Stemmed word overlap
  - Baseline in previous work on passage retrieval evaluation
- SiteQ: top-performing density-based method
  - Uses a three-sentence window
- NUS
  - Similar to SiteQ, but uses single sentences as passages
- Strict matching of relations
  - Simulates the strict matching used in previous work for answer selection
  - Counts the number of exactly matched paths
- Relation matching is applied on top of MITRE and NUS
Evaluation Metrics
- Mean reciprocal rank (MRR)
  - The mean of the reciprocal rank of the first correct answer in the returned list
  - Computed over the top 20 returned passages
- Percentage of questions with incorrect answers
- Precision at the top one passage
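
A minimal sketch of the MRR computation over the top 20 passages, with hypothetical ranked_passages and is_correct inputs standing in for the actual run data:

def mean_reciprocal_rank(ranked_passages, is_correct, k=20):
    # ranked_passages: {question_id: [passage, ...]} in ranked order.
    # is_correct(question_id, passage) -> True if the passage answers it.
    total = 0.0
    for qid, passages in ranked_passages.items():
        for rank, passage in enumerate(passages[:k], start=1):
            if is_correct(qid, passage):
                total += 1.0 / rank
                break   # only the first correct passage counts
    return total / len(ranked_passages)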
Performance Evaluation
- All improvements are statistically significant (p < 0.001)
- MI and EM do not make much difference given our training data
  - EM needs more training data
  - MI is more susceptible to noise, so it may not scale well
Performance Variation with Question Length
- Long questions, which yield more paired paths, tend to improve more
- The number of non-trivial question terms is used to approximate question length
Error Analysis
- Mismatch of question terms
  - e.g., "In which city is the River Seine?"
  - Introduce question analysis
- Paraphrasing between the question and the answer sentence
  - e.g., "write the book" → "be the author of the book"
  - Most current techniques fail to handle this
  - Find paraphrases via dependency parsing (Lin and Pantel)
Performance on Top of Query Expansion
- On top of query expansion, fuzzy relation matching brings a further 50% improvement
- However, query expansion does not help much on top of a fuzzy relation matching system
  - Expansion terms do not help in pairing relation paths
Outline
- Extracting and Pairing Relation Paths
- Measuring Path Match Scores
- Learning Relation Mapping Scores
- Evaluations
- Conclusions
Conclusions
- Proposed a novel fuzzy relation matching method for factoid QA passage retrieval
  - Brings a dramatic 70%+ improvement over state-of-the-art systems
  - Brings a further 50% improvement over query expansion
- Future QA systems should incorporate relations between words for better performance
  - Query expansion should be integrated with relation matching seamlessly
Q & A