Patterns Are Everywhere

Two Methods of Pattern Matching
Hard Matching
Rule induction
Generalizing training instances into rules represented as regular expressions
Performing slot-by-slot matching
Soft Matching
Hidden Markov Models (HMM)
Soft pattern matching for definitional QA (Cui et al., 2004)

Hard Matching
Lack of flexibility in matching
Cannot handle gaps between rules and test instances
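A minimal sketch (using a hypothetical induced rule, not one from the paper) of why hard, regular-expression matching breaks when a test instance contains a gap the rule never saw in training:

    import re

    # Hypothetical hard rule induced from training sentences of the form
    # "<TERM> is a <X> which ...": every slot must match exactly.
    rule = re.compile(r"(?P<term>\w+) is a (?P<x>\w+) which")

    print(bool(rule.search("A battery is a cell which can provide electricity.")))
    # True: the test instance matches the rule slot by slot.

    print(bool(rule.search("A battery is, essentially, a cell which can provide electricity.")))
    # False: the inserted gap ("essentially") breaks the rigid slot sequence,
    # even though the sentence is still a perfectly good definition.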

Soft Matching (Cui et al., 2004)
…… The channel Iqra is owned by the …
…… severance packages, known as golden parachutes, included ……
A battery is a cell which can provide electricity.

Weakness of Current Soft Matching Models
Ad hoc model parameter estimation
Cui et al., 2004: Lack of formalization
Not generic
Task-specific HMM topology
Difficult to port to other applications

In This Paper …
Propose two generic soft pattern models
Bigram model
Formalization of our previous model
Profile Hidden Markov Model (PHMM)
More complex model that handles gaps better
Parameter estimation by EM algorithm
Evaluations on definitional question answering
Can be applied to other pattern matching applications

Outline
Overview of Definitional QA
Bigram Soft Pattern Model
PHMM Soft Pattern Model
Evaluations
Conclusions

Definitional QA
To answer questions like “Who is Gunter Blobel” or “What is Wicca”.
Why evaluate on definition sentence retrieval?
Diverse patterns
Definitional QA is one of the least explored areas in QA

Pattern Matching for Definitional QA

Outline
Overview of Definitional QA
Bigram Soft Pattern Model
PHMM Soft Pattern Model
Evaluations
Conclusions

Bigram Soft Pattern Model
The interpolation mixture weight λ is estimated with the Expectation Maximization (EM) algorithm
Words and general tags are counted separately
Avoids the overwhelming frequency counts of general tags
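A minimal sketch of the scoring and λ-estimation ideas, assuming the score interpolates a position-specific (slot) token probability with a bigram probability through the mixture weight λ; the EM step shown is the standard update for a two-component interpolation weight, not necessarily the paper's exact formulation:

    import math

    def soft_pattern_score(tokens, slot_prob, bigram_prob, lam, smooth=1e-4):
        """Score a token window against the soft pattern.
        slot_prob[i]      : dict of P(token | slot i); assumes len(tokens) <= number of slots
        bigram_prob[(a,b)]: P(b | a)
        lam               : interpolation mixture weight (lambda)
        Unseen tokens fall back to a small smoothing probability."""
        log_score, prev = 0.0, "<s>"
        for i, tok in enumerate(tokens):
            p_slot = slot_prob[i].get(tok, smooth)
            p_bi = bigram_prob.get((prev, tok), smooth)
            log_score += math.log(lam * p_slot + (1.0 - lam) * p_bi)
            prev = tok
        return log_score

    def em_update_lambda(training_windows, slot_prob, bigram_prob, lam, smooth=1e-4):
        """One EM iteration for lambda: the E-step computes, per token, the posterior
        responsibility of the slot component; the M-step averages it."""
        resp, n = 0.0, 0
        for tokens in training_windows:
            prev = "<s>"
            for i, tok in enumerate(tokens):
                p_slot = lam * slot_prob[i].get(tok, smooth)
                p_bi = (1.0 - lam) * bigram_prob.get((prev, tok), smooth)
                resp += p_slot / (p_slot + p_bi)   # E-step
                n += 1
                prev = tok
        return resp / n                            # M-step: new lambda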

Bigram Model in Dealing with Gaps
The bigram model can handle gaps
Unseen tokens receive small smoothed probabilities at specific slot positions
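A short usage sketch of the scoring function above, with hypothetical distributions, showing that a window containing an unseen gap token still receives a (lower) nonzero score because the unseen token falls back to the smoothing probability:

    # Hypothetical slot and bigram distributions learned from training windows.
    slot_prob = [{"is": 0.6}, {"a": 0.7}, {"cell": 0.3}]
    bigram_prob = {("<s>", "is"): 0.5, ("is", "a"): 0.6, ("a", "cell"): 0.2}

    print(soft_pattern_score(["is", "a", "cell"], slot_prob, bigram_prob, lam=0.7))
    # The gap token "essentially" was never seen in training, so it gets the
    # small smoothed probability instead of zeroing out the whole match.
    print(soft_pattern_score(["is", "essentially", "a"], slot_prob, bigram_prob, lam=0.7))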

Outline
Overview of Definitional QA
Bigram Soft Pattern Model
PHMM Soft Pattern Model
Evaluations
Conclusions

PHMM Soft Pattern Model
A better solution for dealing with gaps
A left-to-right Hidden Markov Model with match, insertion, and deletion states

How PHMM Deals with Gaps
Calculate the generative probability of a test instance
Find the most probable state path with the Viterbi algorithm
Compute probabilities efficiently with the forward-backward algorithm
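A minimal, generic Viterbi sketch in log space (over abstract HMM states rather than the full profile topology with match, insertion, and deletion states), to illustrate how the most probable state path for a test instance is recovered:

    NEG_INF = float("-inf")

    def viterbi(obs, states, log_start, log_trans, log_emit, floor=-10.0):
        """Most probable state path for an observation (token) sequence.
        log_start[s]    : log P(first state is s)
        log_trans[s][t] : log P(s -> t)
        log_emit[s][o]  : log P(o | s); unseen tokens get a small log floor"""
        V = [{s: log_start[s] + log_emit[s].get(obs[0], floor) for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            V.append({})
            back.append({})
            for s in states:
                prev_best, best = None, NEG_INF
                for p in states:
                    score = V[t - 1][p] + log_trans[p][s]
                    if score > best:
                        prev_best, best = p, score
                V[t][s] = best + log_emit[s].get(obs[t], floor)
                back[t][s] = prev_best
        # Trace back the highest-scoring path.
        last = max(V[-1], key=V[-1].get)
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path)), V[-1][last]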

Estimation of PHMM
Parameters are estimated with the Baum-Welch algorithm, using the most probable path during training
Random or uniform initialization may lead to an unsatisfactory model
Due to the extreme diversity of definition patterns and insufficient training data
Assume the path should favor match states over other states
P(token | Match) > P(token | Insertion)
Initialize with smoothed maximum likelihood (ML) estimates
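A minimal sketch of this initialization bias, using hypothetical token counts: emission distributions are initialized from smoothed (add-α) ML estimates, so tokens typical of match positions start with P(token | Match) > P(token | Insertion):

    from collections import Counter

    def smoothed_ml_emissions(counts, vocab, alpha=0.1):
        """Add-alpha smoothed ML estimate of P(token | state) from raw counts."""
        total = sum(counts.values()) + alpha * len(vocab)
        return {tok: (counts.get(tok, 0) + alpha) / total for tok in vocab}

    # Hypothetical counts collected from the most probable paths on training data.
    vocab = {"is", "a", "known", "as", "the"}
    match_counts = Counter({"is": 40, "a": 35, "known": 10})
    insert_counts = Counter({"the": 20, "as": 15})

    p_match = smoothed_ml_emissions(match_counts, vocab)
    p_insert = smoothed_ml_emissions(insert_counts, vocab)

    # Initialization bias: P(token | Match) > P(token | Insertion) for match tokens.
    assert p_match["is"] > p_insert["is"]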

Outline
Overview of Definitional QA
Bigram Soft Pattern Model
PHMM Soft Pattern Model
Evaluations
Overall performance evaluation
Sensitivity to model length
Sensitivity to size of training data
Conclusions

Evaluation Setup
Data set
Test data: TREC-13 question answering task data
AQUAINT corpus and 64 definition questions with answers
Training data
761 manually labeled definition sentences from TREC-12 question answering task data
Comparison systems
State-of-the-art manually constructed patterns
The most comprehensive manually constructed patterns to our knowledge
The previously proposed soft pattern model (Cui et al., 2004)

Evaluation Metrics
Manually checked F3 measure
Based on essential/acceptable answer nuggets
NR – nugget recall: the proportion of essential answer nuggets returned
NP – nugget precision: a length-based penalty on longer answers
NR is weighted three times as heavily as NP (β = 3)
Subject to inconsistent scoring among assessors
Automatic ROUGE score
Gold standard: sentences containing answer nuggets
Counts the trigrams shared between the gold standard and system answers
ROUGE-3-ALL (R3A) and ROUGE-3-ESSENTIAL (R3E)
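For reference, the nugget F-measure with β = 3, as used in the TREC definition task, combines NP and NR as:

    F(\beta = 3) = \frac{(\beta^{2} + 1)\,\mathrm{NP} \cdot \mathrm{NR}}{\beta^{2}\,\mathrm{NP} + \mathrm{NR}} = \frac{10\,\mathrm{NP} \cdot \mathrm{NR}}{9\,\mathrm{NP} + \mathrm{NR}}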

Performance Evaluation
Soft pattern matching outperforms hard matching
Bigram and PHMM models perform better than the previously proposed soft pattern method
Previous soft pattern method is not optimized
Manual F3 scores correlate well with automatic R3 scores

Sensitivity to Model Length
PHMM is less sensitive to model length
PHMM may handle longer sequences

Sensitivity to the Amount of Training Data
PHMM requires more training data to improve

Discussions on Both Models
Both models capture the same information
The importance of a token’s position in the context of the search term
The sequential order of tokens
Different in complexity
Bigram model
Simplified Markov model with each token as a state
Captures token sequential information by bigram probabilities
PHMM model
More complex: aggregates token sequential information via hidden state transition probabilities
Experimental results show
PHMM is less sensitive to model length
PHMM may benefit more from additional training data

Outline
Overview of Definitional QA
Bigram Soft Pattern Model
PHMM Soft Pattern Model
Evaluations
Conclusions

Conclusions
Proposed Bigram model and PHMM model
Generic in form
Systematic parameter estimation by EM algorithm
These two models can be applied to other applications using surface text patterns
Soft patterns have been applied to information extraction (Xiao et al., 2004)
PHMM is more flexible in dealing with gaps, but requires more training data to converge

Q & A