Patterns Are Everywhere
Two Methods of Pattern Matching

- Hard matching
  - Rule induction
    - Generalizing training instances into rules represented as regular expressions
    - Performing slot-by-slot matching
- Soft matching
  - Hidden Markov Models (HMMs)
  - Soft pattern matching for definitional QA (Cui et al., 2004)
Hard Matching

- Lack of flexibility in matching
- Cannot deal with gaps between rules and test instances
Soft Matching (Cui et al., 2004)

Example definition sentences matched by soft patterns:
- "... The channel Iqra is owned by the ..."
- "... severance packages, known as golden parachutes, included ..."
- "A battery is a cell which can provide electricity."
Weaknesses of Current Soft Matching Models

- Ad hoc model parameter estimation
  - Cui et al., 2004: lacks formalization
- Not generic
  - Task-specific topology for the HMM
  - Difficult to port to other applications
In This Paper …

- Propose two generic soft pattern models
  - Bigram model
    - A formalization of our previous model
  - Profile Hidden Markov Model (PHMM)
    - A more complex model that handles gaps better
- Parameter estimation by the EM algorithm
- Evaluations on definitional question answering
  - The models can be applied to other pattern matching applications
Outline

- Overview of Definitional QA
- Bigram Soft Pattern Model
- PHMM Soft Pattern Model
- Evaluations
- Conclusions
Definitional QA

- Answers questions like "Who is Gunter Blobel?" or "What is Wicca?"
- Why evaluate on definition sentence retrieval?
  - Diverse patterns
  - Definitional QA is one of the least explored areas in QA
Pattern Matching for Definitional QA
Outline

- Overview of Definitional QA
- Bigram Soft Pattern Model
- PHMM Soft Pattern Model
- Evaluations
- Conclusions
Bigram Soft Pattern Model

- The interpolation mixture weight λ is estimated with the Expectation Maximization (EM) algorithm (sketched below)
- Words and general tags are counted separately
  - Avoids the overwhelming frequency counts of general tags
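A minimal sketch of what such an interpolated bigram scorer could look like. This is an illustration, not the authors' implementation: the counting scheme, the smoothing floor, and the fixed λ are assumptions (the paper re-estimates λ with EM), and positions are taken relative to the search term.

```python
# Illustrative sketch: count slot-specific unigrams and bigrams from
# labeled definition sentences, then score a token sequence by
# interpolating the two distributions with weight lam.
from collections import defaultdict

class BigramSoftPattern:
    def __init__(self, lam=0.5, smooth=1e-4):
        self.lam = lam        # interpolation weight (EM-estimated in the paper)
        self.smooth = smooth  # assumed floor probability for unseen tokens
        self.slot = defaultdict(lambda: defaultdict(int))    # position -> token -> count
        self.bigram = defaultdict(lambda: defaultdict(int))  # prev token -> token -> count

    def train(self, sequences):
        # Each sequence is the window of tokens around the search term.
        for seq in sequences:
            for i, tok in enumerate(seq):
                self.slot[i][tok] += 1
                if i > 0:
                    self.bigram[seq[i - 1]][tok] += 1

    def _p(self, table, key, tok):
        total = sum(table[key].values())
        return table[key][tok] / total if total else 0.0

    def score(self, seq):
        p = 1.0
        for i, tok in enumerate(seq):
            uni = self._p(self.slot, i, tok)                          # token in position
            bi = self._p(self.bigram, seq[i - 1], tok) if i else uni  # token sequence
            p *= max(self.lam * uni + (1 - self.lam) * bi, self.smooth)
        return p
```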
Bigram Model in Dealing with Gaps

- The bigram model can deal with gaps
  - Unseen tokens receive only small smoothing probabilities in specific positions, so a mismatch weakens the score rather than zeroing it (illustrated below)
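Continuing the sketch above with toy, hypothetical data: the unseen token falls back to the smoothing floor, so the instance with a gap still receives a small non-zero score.

```python
sp = BigramSoftPattern()
sp.train([["<TERM>", "is", "a", "NN"],
          ["<TERM>", "is", "the", "NN"]])
print(sp.score(["<TERM>", "is", "a", "NN"]))    # exact match: high score
print(sp.score(["<TERM>", "is", "very", "a"]))  # "very" is unseen: score shrinks, not zero
```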
Outline

- Overview of Definitional QA
- Bigram Soft Pattern Model
- PHMM Soft Pattern Model
- Evaluations
- Conclusions
PHMM Soft Pattern Model

- A better solution for dealing with gaps
- A left-to-right Hidden Markov Model with insertion and deletion states (topology illustrated below)
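A toy illustration of such a left-to-right topology. All numbers and tokens are hypothetical, not the paper's parameters: each pattern position has a match state that emits the expected token, an insert state that absorbs extra tokens, and a silent delete state that skips the position.

```python
# Hypothetical 3-position profile HMM for a pattern like "<TERM> is a NN".
# State types: M = match, I = insert, D = delete (silent, emits nothing).
trans = {   # P(next state type | current state type), assumed values
    ("M", "M"): 0.8, ("M", "I"): 0.1, ("M", "D"): 0.1,
    ("I", "M"): 0.6, ("I", "I"): 0.3, ("I", "D"): 0.1,
    ("D", "M"): 0.7, ("D", "I"): 0.1, ("D", "D"): 0.2,
}
e_match = {1: {"is": 0.9}, 2: {"a": 0.8, "the": 0.2}, 3: {"NN": 0.9}}
e_ins = {j: {} for j in range(0, 4)}  # insert states: unseen tokens get a floor
```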
How PHMM Deals with Gaps

- Calculates the generative probability of a test instance
  - The most probable path is found with the Viterbi algorithm (sketched below)
  - Efficient calculation with the forward-backward algorithm
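A log-space Viterbi sketch over the toy topology above, following the standard profile-HMM recurrences; this is an assumed formulation, not the paper's exact one. Replacing each max with a sum gives the forward pass used for the full generative probability.

```python
import math

NEG = float("-inf")

def viterbi_phmm(tokens, L, e_match, e_ins, trans, floor=1e-6):
    """Log-probability of the best M/I/D path through a length-L profile HMM.

    Backpointers (for recovering the path itself) are omitted for brevity;
    unseen emissions fall back to an assumed smoothing floor.
    """
    n = len(tokens)
    lg = lambda p: math.log(p) if p > 0 else NEG
    # V[s][i][j]: best log-prob after emitting tokens[:i], in state type s
    # at pattern position j. The virtual begin state is treated as M at (0, 0).
    V = {s: [[NEG] * (L + 1) for _ in range(n + 1)] for s in "MID"}
    V["M"][0][0] = 0.0
    for i in range(n + 1):
        for j in range(L + 1):
            for s in "MID":
                here = V[s][i][j]
                if here == NEG:
                    continue
                if i < n and j < L:  # match: emit a token, advance one position
                    c = here + lg(trans[(s, "M")]) + lg(e_match[j + 1].get(tokens[i], floor))
                    V["M"][i + 1][j + 1] = max(V["M"][i + 1][j + 1], c)
                if i < n:            # insert: emit a token, stay at this position
                    c = here + lg(trans[(s, "I")]) + lg(e_ins[j].get(tokens[i], floor))
                    V["I"][i + 1][j] = max(V["I"][i + 1][j], c)
                if j < L:            # delete: skip a position without emitting
                    c = here + lg(trans[(s, "D")])
                    V["D"][i][j + 1] = max(V["D"][i][j + 1], c)
    return max(V[s][n][L] for s in "MID")

# Reusing the toy parameters from the previous sketch: "very" is absorbed
# by an insert state, so the gapped instance still scores a valid path.
best = viterbi_phmm(["is", "very", "a", "NN"], L=3,
                    e_match=e_match, e_ins=e_ins, trans=trans)
```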
Estimation of PHMM

- Estimated by the Baum-Welch algorithm
  - Uses the most probable path during training
- Random or uniform initialization may lead to an unsatisfactory model
  - Due to the extreme diversity of definition patterns and insufficient training data
- Assumption: paths should favor match states over other states
  - P(token | Match) > P(token | Insertion)
  - Initialized with smoothed ML estimates (sketched below)
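One way to realize this informed initialization. The alignment of position j to the j-th training token and the add-α constant are assumptions for illustration; insert-state emissions would be seeded flatter (e.g., near uniform), so that P(token | Match) > P(token | Insertion) holds for pattern tokens before Baum-Welch refinement.

```python
def init_match_emissions(training_seqs, L, alpha=0.1):
    # Smoothed ML estimates for match-state emissions: seed position j from
    # the j-th token of each labeled definition sentence (assumed alignment),
    # with add-alpha smoothing over the training vocabulary.
    vocab = {t for seq in training_seqs for t in seq}
    e_match = {}
    for j in range(1, L + 1):
        counts = {t: alpha for t in vocab}
        for seq in training_seqs:
            if j <= len(seq):
                counts[seq[j - 1]] += 1
        total = sum(counts.values())
        e_match[j] = {t: c / total for t, c in counts.items()}
    return e_match
```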
Outline

- Overview of Definitional QA
- Bigram Soft Pattern Model
- PHMM Soft Pattern Model
- Evaluations
  - Overall performance evaluation
  - Sensitivity to model length
  - Sensitivity to the size of the training data
- Conclusions
Evaluation Setup

- Data set
  - Test data: TREC-13 question answering task data
    - AQUAINT corpus and 64 definition questions with answers
  - Training data: 761 manually labeled definition sentences from the TREC-12 question answering task data
- Comparison systems
  - State-of-the-art manually constructed patterns
    - The most comprehensive manually constructed patterns to our knowledge
  - The soft pattern method previously proposed in Cui et al., 2004
Evaluation Metrics

- Manually checked F3 measure
  - Based on essential/acceptable answer nuggets
    - NR: proportion of essential answer nuggets returned
    - NP: penalty on longer answers
  - Weights NR three times as heavily as NP (formula below)
  - Subject to inconsistent scoring among assessors
- Automatic ROUGE score
  - Gold standard: sentences containing answer nuggets
  - Counts the trigrams shared between the gold standard and system answers
  - ROUGE-3-ALL (R3A) and ROUGE-3-ESSENTIAL (R3E)
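For reference, the standard TREC definition F-measure with β = 3, which weights nugget recall (NR) three times as heavily as nugget precision (NP):

```latex
F_{\beta} = \frac{(\beta^{2} + 1)\,\mathrm{NP} \cdot \mathrm{NR}}
                 {\beta^{2}\,\mathrm{NP} + \mathrm{NR}},
\qquad \beta = 3
```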
Performance Evaluation

- Soft pattern matching outperforms hard matching
- The bigram and PHMM models perform better than the previously proposed soft pattern method
  - The previous soft pattern method is not optimized
- Manual F3 scores correlate well with the automatic R3 scores
Sensitivity to Model Length

- PHMM is less sensitive to model length
- PHMM may handle longer sequences
Sensitivity to the Amount of Training Data

- PHMM requires more training data to improve
Discussions on Both Models

- Both models capture the same information
  - The importance of a token's position in the context of the search term
  - The sequential order of tokens
- They differ in complexity
  - Bigram model
    - A simplified Markov model with each token as a state
    - Captures token sequence information through bigram probabilities
  - PHMM
    - More complex: aggregates token sequence information in hidden-state transition probabilities
- Experimental results show
  - PHMM is less sensitive to model length
  - PHMM may benefit more from additional training data
Outline

- Overview of Definitional QA
- Bigram Soft Pattern Model
- PHMM Soft Pattern Model
- Evaluations
- Conclusions
Conclusions

- Proposed the bigram model and the PHMM model
  - Generic in form
  - Systematic parameter estimation by the EM algorithm
- Both models can be applied to other applications that use surface text patterns
  - Soft patterns have been applied to information extraction (Xiao et al., 2004)
- PHMM is more flexible in dealing with gaps, but requires more training data to converge
Q & A