Understanding Documents via Concept Links

Outline
DUC 2005 System Task
Targeted Sentences
Our Approach
System Overview
Concept Link
Sentence Similarity
Sentence Ranker: A modified MMR
Evaluation
Conclusions

DUC 2005 System Task
Task Definition in [Amigo et al, 04]
… topic-oriented, informative multi-document summarization, … compressed version of a set of documents 
Topic Creation Instructions
to formulate a topic out of interesting aspects
“At least 25 documents must each contribute some material to the answer” to a question about the topic
Our view of the task
A general, topic-oriented summary.

Targeted Sentences
Good DUC 2005 summary: an extract consisting of sentences that are
highly representative
highly relevant to the topic
General
Specific: named entities are favored
minimally redundant

System Overview

Concept Detection

Concept Link
There exists a Concept Link between each pair of similar concepts
Concept Similarity: maximal sense overlap (Banerjee et al., 2003)
Consider all senses of each concept
Extended sense Sx:
     synset + gloss + hypernym + meronym sets (1 level); see the sketch below
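
A minimal Python sketch of this extended-sense-overlap idea, using NLTK's WordNet; the function names, the plain word-overlap scoring (rather than the phrase-based scoring of the original measure), and the link threshold are assumptions for illustration, not details taken from the slides.

# Concept similarity via extended-sense overlap (illustrative sketch).
from itertools import product
from nltk.corpus import wordnet as wn

def extended_sense(synset):
    """Bag of words for one sense: synset lemmas + gloss words
    + hypernym and meronym synsets (1 level), as on the slide."""
    words = {lemma.name().lower() for lemma in synset.lemmas()}
    words.update(synset.definition().lower().split())
    related = synset.hypernyms() + synset.part_meronyms() + synset.member_meronyms()
    for rel in related:
        words.update(lemma.name().lower() for lemma in rel.lemmas())
    return words

def concept_similarity(concept_a, concept_b):
    """Maximal overlap over all sense pairs of the two concepts."""
    senses_a = wn.synsets(concept_a.replace(" ", "_"))
    senses_b = wn.synsets(concept_b.replace(" ", "_"))
    if not senses_a or not senses_b:
        return 0
    return max(len(extended_sense(sa) & extended_sense(sb))
               for sa, sb in product(senses_a, senses_b))

THRESHOLD = 2  # assumed value; the slides do not give one
def concept_link(concept_a, concept_b):
    """A Concept Link holds when the similarity passes the threshold."""
    return concept_similarity(concept_a, concept_b) >= THRESHOLD

For example, concept_similarity("conflict", "war") scores the overlap between the extended senses of the two concepts linked in the example on the next slide.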

Concept Link Detection
1) A year ago Mr Douglas Hurd, foreign secretary, became the first UK cabinet minister to visit Argentina since the 1982 Falkland Islands conflict.
2) Today Argentina gets out the red carpet for the UK's Duke of York, the first official royal visitor since the end of the Anglo-Argentine Falklands war in 1982.

Concept Links between sentences

Sentence Similarity
Sum of the “strengths” of the concept links between the two sentences (sketched below)
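
A minimal sketch of this summation, assuming each sentence is represented by its detected concepts and taking a link's strength to be the concept-similarity score from the sketch above; the slides do not specify any normalization, so none is applied.

def sentence_similarity(concepts_a, concepts_b):
    """Sum the strengths of the concept links between two sentences."""
    total = 0
    for ca in concepts_a:
        for cb in concepts_b:
            strength = concept_similarity(ca, cb)
            if strength >= THRESHOLD:  # only linked concept pairs contribute
                total += strength
    return total

# e.g. with (hypothetical) concepts detected from the two example sentences
print(sentence_similarity(["minister", "Argentina", "conflict"],
                          ["visitor", "Argentina", "war"]))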

Sentence ranker
Original Weight: Representative Power

Sentence ranker
A modified MMR (sketched below)
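
The slides give only the original weight (a sentence's representative power) and state that the MMR criterion is modified; below is a sketch of a standard MMR-style greedy selection that uses that weight in place of query relevance. The trade-off parameter lam, the summary size k, and the exact nature of the modification are assumptions.

def mmr_rank(sentences, weights, similarity, lam=0.7, k=10):
    """MMR-style greedy selection.  weights[i] is sentence i's original
    weight (representative power); similarity is the sentence-level
    similarity sketched above.  lam and k are illustrative values,
    not taken from the slides."""
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((similarity(sentences[i], sentences[j])
                              for j in selected), default=0)
            return lam * weights[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected  # indices of selected sentences, in rank order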

Evaluation: ROUGE

Evaluation: Pyramid

Experiments:

Conclusions:
    A simple system featuring:
Concept Link: a new way to calculate sentence similarity;
no chunker/parser is involved
concepts differ from the NPs used in Lexical Chains
Considering sentence similarity/relatedness via Concept Links:
alleviates the influence of expression variations (but might involve inaccurate sense guesses)
outperforms the word co-occurrence approach
Minimizing redundancy via modified MMR;
no extra heuristics involved.

Future work
Error analysis;
How to automatically set parameters;
Comparison with alternative similarity measures;
Would more knowledge (syntactic, semantic parsers …) help?