Understanding Documents via Concept Links

Outline
DUC 2005 System Task
Targeted Sentences
Our Approach
System Overview
Concept Link
Sentence Similarity
Sentence Ranker: A modified MMR
Evaluation
Conclusions

DUC 2005 System Task
Task Definition in [Amigo et al, 04]
… topic-oriented, informative multi-document summarization, … compressed version of a set of documents 
Topic Creation Instructions
to formulate a topic out of interesting aspects
“At least 25 documents must each contribute some material to the answer” to a question about the topic
Our view of the task
A general, topic-oriented summary.

Targeted Sentences
Good DUC 2005 summary: an extract consisting of sentences that are
highly representative
highly relevant to the topic
General
Specific: named entities are favored
minimally redundant

System Overview

Concept Detection

Concept Link
There exists a Concept Link between each pair of similar concepts
Concept Similarity: maximal sense overlap (Banerjee et al., 2003)
Consider all senses of each concept
Extended sense Sx:
     synset + gloss + hypernym + meronym sets (1 level); see the sketch below
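
A minimal Python sketch of this extended-sense-overlap idea, using NLTK's WordNet; the function names, the plain word-overlap scoring (rather than the phrase-based scoring of the original measure), and the link threshold are assumptions for illustration, not details taken from the slides.

# Concept similarity via extended-sense overlap (illustrative sketch).
from itertools import product
from nltk.corpus import wordnet as wn

def extended_sense(synset):
    """Bag of words for one sense: synset lemmas + gloss words
    + hypernym and meronym synsets (1 level), as on the slide."""
    words = {lemma.name().lower() for lemma in synset.lemmas()}
    words.update(synset.definition().lower().split())
    related = synset.hypernyms() + synset.part_meronyms() + synset.member_meronyms()
    for rel in related:
        words.update(lemma.name().lower() for lemma in rel.lemmas())
    return words

def concept_similarity(concept_a, concept_b):
    """Maximal overlap over all sense pairs of the two concepts."""
    senses_a = wn.synsets(concept_a.replace(" ", "_"))
    senses_b = wn.synsets(concept_b.replace(" ", "_"))
    if not senses_a or not senses_b:
        return 0
    return max(len(extended_sense(sa) & extended_sense(sb))
               for sa, sb in product(senses_a, senses_b))

THRESHOLD = 2  # assumed value; the slides do not give one
def concept_link(concept_a, concept_b):
    """A Concept Link holds when the similarity passes the threshold."""
    return concept_similarity(concept_a, concept_b) >= THRESHOLD

For example, concept_similarity("conflict", "war") scores the overlap between the extended senses of the two concepts linked in the example on the next slide.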

Concept Link Detection
1) A year ago Mr Douglas Hurd, foreign secretary, became the first UK cabinet minister to visit Argentina since the 1982 Falkland Islands conflict.
2) Today Argentina gets out the red carpet for the UK's Duke of York, the first official royal visitor since the end of the Anglo-Argentine Falklands war in 1982.

Concept Links between sentences

Sentence Similarity
Sum of the “strengths” of the concept links between the two sentences (sketched below)
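
A minimal sketch of this summation, assuming each sentence is represented by its detected concepts and taking a link's strength to be the concept-similarity score from the sketch above; the slides do not specify any normalization, so none is applied.

def sentence_similarity(concepts_a, concepts_b):
    """Sum the strengths of the concept links between two sentences."""
    total = 0
    for ca in concepts_a:
        for cb in concepts_b:
            strength = concept_similarity(ca, cb)
            if strength >= THRESHOLD:  # only linked concept pairs contribute
                total += strength
    return total

# e.g. with (hypothetical) concepts detected from the two example sentences
print(sentence_similarity(["minister", "Argentina", "conflict"],
                          ["visitor", "Argentina", "war"]))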

Sentence ranker
Original Weight: Representative Power

Sentence ranker
A modified MMR (sketched below)
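
The slides give only the original weight (a sentence's representative power) and state that the MMR criterion is modified; below is a sketch of a standard MMR-style greedy selection that uses that weight in place of query relevance. The trade-off parameter lam, the summary size k, and the exact nature of the modification are assumptions.

def mmr_rank(sentences, weights, similarity, lam=0.7, k=10):
    """MMR-style greedy selection.  weights[i] is sentence i's original
    weight (representative power); similarity is the sentence-level
    similarity sketched above.  lam and k are illustrative values,
    not taken from the slides."""
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((similarity(sentences[i], sentences[j])
                              for j in selected), default=0)
            return lam * weights[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected  # indices of selected sentences, in rank order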

Evaluation: ROUGE

Evaluation: Pyramid

Experiments:

Conclusions:
    A simple system featuring:
Concept Link: a new way to calculate sentence similarity;
no chunker/parser is involved
concepts differ from the NPs used in Lexical Chains
Considering sentence similarity/relatedness via Concept Links:
alleviates the influence of expression variations (but might involve inaccurate sense guesses)
outperforms the word co-occurrence approach
Minimizing redundancy via modified MMR;
no extra heuristics involved.

Future work
Error analysis;
How to automatically set parameters;
Comparison with alternative similarity measures;
Would more knowledge (syntactic, semantic parsers …) help?