Automatic Text Summarization
           Katja Filippova
    filippova@eml-research.de

         EML Research gGmbH
            TU Darmstadt




                                Text Summarization – 25.02.2009 – p. 1
Text summarization
• A summary is a text that is produced from one or more
  texts, that contains a significant portion of the information in
  the original text(s), and that is no longer than half of the
  original text(s) (Hovy, 2003)


                              • information retrieval
                              • stock market prediction
                              • generation of abstracts
                              • online news summarization
                              • ...



Overview
• Introduction
   • classification of summarization systems
   • abstraction vs. extraction
• Text cohesion and coherence for summarization
   • graph based methods
   • discourse structure based methods
• Document Understanding Conference
   • tasks
   • an example
• Research directions
   • sentence fusion and compression
   • integrating world knowledge
Text summarization: types

 • A summary is a text that is produced from one or more
   texts, that contains a significant portion of the information in
   the original text(s), and that is no longer than half of the
   original text(s) (Hovy, 2003)
 • Indicative
  « indicates types of information
  « “alerts”
 • Informative
  « includes quantitative/qualitative information
  « “informs”
 • Critical/evaluative
  « evaluates the content of the document
Text summarization: types

INDICATIVE
 • The work of Consumer Advice Centres is examined. The
   information sources used to support this work are reviewed.
   The recent closure of many CACs has seriously affected the
   availability of consumer information and advice. The
   contribution that public libraries can make in enhancing the
   availability of consumer information and advice both to the
   public and other agencies involved in consumer information
   and advice, is discussed.




Text summarization: types

INFORMATIVE
 • An examination of the work of Consumer Advice Centres
   and of the information sources and support activities that
   public libraries can offer. CACs have dealt with pre-shopping
   advice, education on consumers’ rights and complaints
   about goods and services, advising the client and often
   obtaining expert assessment. They have drawn on a wide
   range of information sources including case records, trade
   literature, contact files and external links. The recent closure
   of many CACs has seriously affected the availability of
   consumer information and advice. Libraries can cooperate
   closely with advice agencies through local coordinating
   committees, shared premises, joint publicity, referral and the
   sharing of professional expertise.
Text summarization: types

 • Source: single-document vs. multi-document
  « research paper
  « proceedings of a conference
 • Content: generic vs. query-based vs. user-focused
  « equal coverage of all major topics
  « based on a question “what are the causes of the war?”
  « users interested in chemistry
 • Form: extract vs. abstract
  « fragments from the document
  « newly re-written text


Extraction vs. abstraction

How should a text summarization system proceed?

 • read the documents




 • understand them – build
   a semantic representation


 • generate a summary from
   this representation


Extraction vs. abstraction
 • unfortunately, a rich semantic representation is not
   possible yet
 • to date, most summarization systems are extractive

 • usually, extraction units are sentences

 • low-cost solution: can work without ontologies,
   complex representations, etc.
 • extractive summaries are usually incoherent

 • trade-off between non-redundancy and completeness




Extraction vs. abstraction

Three sentences from related documents (Oct. 27, 2008):
 • The Syrian foreign minister today condemned the killing of
   eight civilians in a US raid as an act of “criminal and terrorist
   aggression”. (The Guardian)
 • Syria accused the United States on Monday of carrying out
   a “terrorist aggression” after a deadly raid near its border
   with Iraq which it said killed eight civilians. (Reuters)
 • Lebanese President Michel Suleiman on Monday contacted
   his Syrian counterpart Bashar Assad to denounce
   “Sunday’s American aggression” against the Syrian village
   of Abu Kamal near the border with Iraq, local Elnashra
   website reported. (Aljazeera)

Extraction vs. abstraction
 • extractive summaries are not coherent: sentences pulled
   out of different documents each make sense on their own
   but sound awkward when put together
 • unresolved pronouns may distort the meaning

 • beginning a summary with a sentence that starts with
   However, ... is not a good idea
 • there is a striking difference from human-generated texts:
   pronouns and connectives are in the right place, and the
   flow of discourse makes sense
 • How could one use this property of natural discourse for
   summarization?
Text coherence vs. text cohesion
 • John enjoys playing the piano. John wants to become a
  famous piano player. John works hard and works hard every
  day. Working hard is necessary to become a famous piano
  player.
 • John enjoys playing the piano. However, he woke up early
  yesterday. But the day before yesterday the weather was
  wonderful, because rain and snow started immediately and
  continued the whole day through. By the way, his teacher
  did the same.
 • John enjoys playing the piano and wants to become famous.
  He works hard and does it every day because it is
  necessary for his goal.

Text coherence vs. text cohesion
 • Text coherence represents the overall structure of a
  multi-sentence text in terms of macro-level relations
  between clauses or sentences (Halliday & Hasan, 1996).
  « Rhetorical Structure Theory (Mann & Thompson, 1988)
  « Discourse Representation Theory (Kamp, 1981)
  « Discourse Lexicalized Tree Adjoining Grammar (Forbes,
     2001)
 • John enjoys playing the piano. [John wants to become a
   famous piano player.] (that’s why) [John works hard and
   works hard every day.] Working hard is necessary to
   become a famous piano player.


Text coherence vs. text cohesion
 • Text cohesion involves relations between words, word
  senses, or referring expressions, which determine how
  tightly connected the text is (Halliday & Hasan, 1996).
  « anaphora, ellipsis, connectives
  « synonymy and other lexical relations
 • John enjoys playing the piano. However, he woke up early
  yesterday. But the day before yesterday the weather was
  wonderful, because rain and snow started immediately and
  continued the whole day through. By the way, his teacher
  did the same.



Coherence based summarization
• earlier systems considered technical documents and aimed
  at identifying important information by assigning weights to
  sentences (Luhn, 1958; Edmundson, 1969)
• several weighted features were used:
 « word (stem) frequency
 « presence of cue words (e.g., as a result, significant)
   that signal important content
 « sentence position
 « document structure
• feature weights were tuned manually
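
The weighted-feature idea above can be sketched as follows. This is only an illustration, not the Luhn or Edmundson system: the cue words, the stopword list, the equal weights and the function names are all assumptions, no stemming is applied, and the document-structure feature is left out.

```python
import re
from collections import Counter

# Hypothetical cue words, stopwords and weights (hand-set, as in early systems).
CUE_WORDS = {"significant", "result", "conclusion"}
STOPWORDS = {"the", "a", "of", "is", "and", "to", "in", "this", "that", "it"}
WEIGHTS = {"frequency": 1.0, "cue": 1.0, "position": 1.0}

def tokens(sentence):
    """Lowercased content words of a sentence (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def score_sentences(sentences):
    """Return one weighted score per sentence."""
    doc_freq = Counter(w for s in sentences for w in tokens(s))
    scores = []
    for i, s in enumerate(sentences):
        words = tokens(s)
        # Average document-level frequency of the sentence's content words.
        freq = sum(doc_freq[w] for w in words) / len(words) if words else 0.0
        # 1.0 if the sentence contains any cue word, else 0.0.
        cue = 1.0 if any(w in CUE_WORDS for w in words) else 0.0
        # Earlier sentences receive a higher position score.
        pos = 1.0 - i / len(sentences)
        scores.append(WEIGHTS["frequency"] * freq
                      + WEIGHTS["cue"] * cue
                      + WEIGHTS["position"] * pos)
    return scores
```

Changing the values in WEIGHTS corresponds to the manual tuning mentioned on the slide.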



Coherence based summarization
• Rhetorical Structure Theory (Mann & Thompson, 1988)
   • elaboration
   • example
   • contrast
   • background
   • motivation
   • etc.

[Figure: RST tree with the relations Attribution and Circumstance over the segments “I am optimistic” / said Mr. Smith / as the market plunged. (from Sporleder & Lapata, 2005)]
Coherence based summarization
• one could use discourse structure for summarization
  (Marcu, 2000)
• however, this is not done often:
   • there are few discourse parsers and they are not very
     precise
   • it is debated whether a tree representation is
     sufficient for discourse (Wolf & Gibson, 2005)
   • it is not obvious how to classify rhetorical relations
   • some relations are argued to be anaphoric rather than
     discourse relations (Webber et al., 2003)



Cohesion based summarization
 • it is common to represent a text as a graph, where nodes
   are sentences and edges are some relations between them
   (e.g., discourse relations or just similarity)
 • a common graph connectivity assumption is that the nodes
   which are connected to many other nodes are likely to carry
   salient information
 • it is also assumed that nodes whose removal affects the
   structure of the document are important (Skorochodko, 1972,
   cited in Mani, 2001)
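
A minimal sketch of the connectivity assumption: sentences are nodes, an edge links two sentences that share enough words, and nodes are ranked by their number of neighbours. The edge criterion (`min_shared`) and the function names are assumptions made for illustration; the node-removal criterion is not covered here.

```python
import re

def bag(sentence):
    """Set of lowercased words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def build_graph(sentences, min_shared=2):
    """Connect two sentences when they share at least `min_shared` words."""
    bags = [bag(s) for s in sentences]
    edges = {i: set() for i in range(len(sentences))}
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            if len(bags[i] & bags[j]) >= min_shared:
                edges[i].add(j)
                edges[j].add(i)
    return edges

def degree_ranking(edges):
    """Node indices sorted by number of neighbours, most connected first."""
    return sorted(edges, key=lambda n: (-len(edges[n]), n))
```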




Cohesion based summarization
 • modern approaches extend this idea and use PageRank
   (Brin & Page, 1998) to find salient nodes in such a graph
   (Erkan & Radev, 2004; Mihalcea & Tarau, 2004)
    • similar sentences are connected
      (bag-of-words similarity)
    • a similarity threshold is used
    • the top N PageRank-ranked sentences are extracted
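
The three steps above can be sketched as follows. This is a simplified illustration in the spirit of LexRank and TextRank, not a reimplementation of either: it uses raw term counts for the cosine similarity, an unweighted graph, and a fixed number of power iterations. The threshold, the damping factor and the function names are assumptions.

```python
import math
import re
from collections import Counter

def bow(sentence):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def pagerank_sentences(sentences, threshold=0.2, damping=0.85, iterations=50):
    """PageRank scores on the thresholded sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = [bow(s) for s in sentences]
    # Undirected, unweighted edges between sufficiently similar sentences.
    neighbours = [[j for j in range(n)
                   if j != i and cosine(vecs[i], vecs[j]) >= threshold]
                  for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Isolated sentences only ever receive the base score (1 - damping) / n.
        rank = [(1 - damping) / n
                + damping * sum(rank[j] / len(neighbours[j]) for j in neighbours[i])
                for i in range(n)]
    return rank

def top_n(sentences, n=2, **kwargs):
    """The n best-ranked sentences, returned in document order."""
    rank = pagerank_sentences(sentences, **kwargs)
    best = sorted(range(len(sentences)), key=lambda i: -rank[i])[:n]
    return [sentences[i] for i in sorted(best)]
```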




Coherence vs. cohesion based TS
  • Coherence:
      + transparent; coherence of the output can be improved
      – annotation of relations is still a challenge; preprocessing
        difficulties
  • Cohesion:
      + intuitively appealing; low-cost; even unsupervised
      – requires WSD*, anaphora resolution; hard to pin down;
        tuned thresholds

* word sense disambiguation




DUC competitions

 • Document Understanding Conferences (2000-2007)
 • from 2008 Text Analysis Conference (TAC)

 • provide participants with
    - a task
    - data
    - manual and automatic evaluation
 • increasing challenge in tasks: from generic single-document
   summarization to multi-document update summary (2008)




DUC competitions

Sample topic:   D0740I


round-the-world balloon flight


Report on the planning, attempts and first
successful balloon circumnavigation of the earth
by Bertrand Piccard and his crew.




DUC competitions
 <DOC>
<DOCNO> APW19981112.0453 </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> 11/12/1998 08:21:00 </DATE_TIME>
<HEADER> w1942 &Cx1f; wstm- r i &Cx13; &Cx11; BC-Switzerland-BalloonQu
11-12 0355 </HEADER>
<BODY>
<SLUG> BC-Switzerland-Balloon Quest </SLUG> <HEADLINE> Swiss challenger
prepares third attempt at global record </HEADLINE> &UR; AP Photos GEV
101-102 &QL; <TEXT> GENEVA (AP) _ Swiss balloon pilot Bertrand Piccard
and his new teammate, British flight engineer Tony Brown, said Thursday
they will be ready later this month for a new attempt to fly nonstop
round the world. Their new Breitling Orbiter 3 balloon will take off
from Chateau d’Oex, in the Swiss Alps, as soon after Nov. 25 as weather
conditions are favorable, they said. It will be Piccard’s third attempt
to become the first to pilot a balloon around the world. In February
the Swiss pilot, along with British flight engineer Andy Elson and
The EML NLP group at DUC 2007




Preprocessing: Annotation

 • Sentence splitting
 • Tokenization
 • PoS tagging
 • Chunking
 • Named entity recognition




Preprocessing: Problems

 • Sentence splitting
   <sentence>At Pine Ridge, a scrolling marquee
   at Big Bat’s Texaco expressed both joy over
   Clinton’s visit and wariness of all the
   official attention: “Welcome President
   Clinton.</sentence> <sentence>Remember our
   treaties,” the sign read.
 • and cleaning
    <sentence>PINE RIDGE, S.D.</sentence>
   <sentence>(AP) - President Clinton turned the
   attention of his national poverty tour today
   to arguably the poorest, most forgotten U.S.
   citizens of them all: American
   Indians.</sentence>
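
Both problems above are easy to reproduce with a naive splitter. The sketch below is a hypothetical rule (split after sentence-final punctuation followed by whitespace), not the splitter used in the system described here.

```python
import re

def naive_split(text):
    """Split after '.', '!' or '?' whenever whitespace follows."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text) if part]

# The abbreviation in the dateline is treated as a sentence end.
dateline = "PINE RIDGE, S.D. (AP) - President Clinton turned the attention of his tour."
# The period inside the quotation splits one sentence in two.
quote = "“Welcome President Clinton. Remember our treaties,” the sign read."

dateline_parts = naive_split(dateline)
quote_parts = naive_split(quote)
```

Here `dateline_parts` begins with the fragment "PINE RIDGE, S.D." and `quote_parts` contains two pieces, mirroring the two errors shown on the slide.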
Preprocessing: Document filtering

 • Match topic with document extracts
 • Pick the top 5 matching documents
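
One way such a filter could look, assuming plain word overlap between the topic and the full document text as the matching score. The slide does not say how the match is computed, so the scoring and the function names are assumptions.

```python
import re

def words(text):
    """Set of lowercased words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def filter_documents(topic, documents, top_k=5):
    """Rank documents by word overlap with the topic and keep the top_k."""
    topic_words = words(topic)
    scored = [(len(topic_words & words(doc)), idx)
              for idx, doc in enumerate(documents)]
    # Highest overlap first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[idx] for _, idx in scored[:top_k]]
```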




Semantic analysis


 • Filter topic
 • Connect topic words with words in
   document sentences
 • Compute sentence scores
      matching words
      matching word sequences


« ranked list of sentences
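
A rough sketch of the scoring step, assuming that "matching words" means shared word types and "matching word sequences" means shared word bigrams. Topic filtering and the linking of topic words to document words are omitted; `sequence_weight` and the function names are hypothetical.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def ngrams(tokens, n):
    """Set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def score_sentence(topic, sentence, sequence_weight=2.0):
    """Shared words plus weighted shared word bigrams."""
    t, s = tokenize(topic), tokenize(sentence)
    word_matches = len(set(t) & set(s))
    sequence_matches = len(ngrams(t, 2) & ngrams(s, 2))
    return word_matches + sequence_weight * sequence_matches

def rank_sentences(topic, sentences):
    """Sentences sorted by descending score (stable for ties)."""
    return sorted(sentences, key=lambda s: -score_sentence(topic, s))
```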


Extractive summary generation




 • Rerank sentences
 • Select the top non-redundant sentences (250 word limit)
 • Re-arrange sentences
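
The selection and re-arrangement steps might be sketched as below. The redundancy test (Jaccard word overlap above `max_overlap`) and the re-arrangement by original document order are assumptions; the slide only names the steps and the 250-word limit.

```python
import re

def word_set(sentence):
    return set(re.findall(r"[a-z]+", sentence.lower()))

def overlap(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    wa, wb = word_set(a), word_set(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def build_summary(ranked, original_order, word_limit=250, max_overlap=0.5):
    """Greedily pick ranked sentences, then restore source order."""
    chosen, length = [], 0
    for sentence in ranked:
        n_words = len(sentence.split())
        if length + n_words > word_limit:
            continue  # would exceed the length budget
        if any(overlap(sentence, c) > max_overlap for c in chosen):
            continue  # too similar to a sentence already selected
        chosen.append(sentence)
        length += n_words
    return sorted(chosen, key=original_order.index)
```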
A good summary

Round-the-world balloon flight: Report on the planning, attempts
and first successful balloon circumnavigation of the earth by
Bertrand Piccard and his crew.

Swiss balloon pilot Bertrand Piccard announced Wednesday
that he has chosen Brian Jones as his teammate for his next
attempt at circling the world in a balloon. Jones, 52, replaces
fellow British flight engineer Tony Brown. Achieving what
promoters called the last great milestone of aviation, Bertrand
Piccard and Brian Jones joined legends like the Wright Brothers
and Charles Lindbergh with Saturday’s completion of the first
manned round-the-world balloon flight. At 4:54 a.m. EST
Saturday, the two balloonists crossed the line of longitude from
which they had departed on March 1 at Chateau D’Oex,
Switzerland, ...
A bad summary

Angelina Jolie: What have been the most recent significant
events in the life and career of actress Angelina Jolie?

Angelina Jolie’s win for best supporting actress for her role in
“Girl, Interrupted” came 21 years after father Jon Voight was
awarded best actor for “Coming Home.“ ANGELINA JOLIE’S
LIFE ON THE EDGE After all, her career is in overdrive. But
Jolie cautions that she’s still a serious actress. It’s not like I’m
suddenly a better actress because I have awards or this box
office clout,” she says. “I am secure in the fact that I do have
something to offer as an actress,”Jolie says. ‘...



Evaluation
• automatic evaluation with ROUGE (Lin, 2004)

• manual evaluation with respect to
 « responsiveness
 « linguistic quality
   1. grammaticality
   2. non-redundancy
   3. referential clarity
   4. focus
   5. structure and coherence
• our system scored above average and was in the top 5 for
  non-redundancy and coherence (recall the document
  filtering stage)
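
For orientation, the core of ROUGE-N is n-gram recall against one or more reference summaries. The sketch below follows that definition with clipped n-gram counts. It is not the official ROUGE toolkit and omits its options such as stemming and stopword removal; the function names are assumptions.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Counts of word n-grams in a lowercased text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, references, n=2):
    """Clipped n-gram matches divided by the number of reference n-grams."""
    cand = ngram_counts(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngram_counts(ref, n)
        total += sum(ref_counts.values())
        # A reference n-gram is matched at most as often as the candidate has it.
        matched += sum(min(count, cand[g]) for g, count in ref_counts.items())
    return matched / total if total else 0.0
```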
Research directions
 • as in information retrieval, query expansion is expected to
   improve recall
   « WordNet (Fellbaum, 1998) for similarity
   « Wikipedia for relatedness (Strube & Ponzetto, 2006)
   « paraphrases
 • coreference resolution is needed in preprocessing;
   otherwise pronouns, for example, are filtered out as stopwords
 • relevance vs. redundancy: in multi-document summarization
   (MDS), how can we ensure non-redundancy of the summary?
   (Carbonell & Goldstein, 1998)
 • sentence ordering for extractive MDS (Barzilay & Lapata,
   2005)
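
The relevance vs. redundancy trade-off is what Maximal Marginal Relevance (MMR; Carbonell & Goldstein, 1998) addresses: each step picks the candidate that maximizes lam * Sim(s, query) - (1 - lam) * max Sim(s, already selected). A minimal sketch, assuming Jaccard word overlap for both similarity functions; the function names and the default `lam` are hypothetical.

```python
import re

def word_set(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    wa, wb = word_set(a), word_set(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def mmr_select(query, candidates, k=2, lam=0.7):
    """Pick k sentences, trading query relevance against redundancy."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def mmr(s):
            # Redundancy is the highest similarity to any selected sentence.
            redundancy = max((jaccard(s, t) for t in selected), default=0.0)
            return lam * jaccard(s, query) - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A lower `lam` puts more weight on diversity, so near-duplicates of already selected sentences are passed over.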
Directions of research

 • abstractive summarization is a distant goal but there are
  ways to go beyond sentence extraction
  « sentence compression
  « sentence fusion




Sentence compression

This is true, regardless of the opinion that some people have of Syria, and of
their unhappiness at Syria’s presence in Lebanon.

  • summarization on the sentence level

  • in principle, a compression can be different from the input
    (different wording and structure)
  • to date, most systems use word deletion only

  • a compression corpus is now available online:
    http://homepages.inf.ed.ac.uk/s0460084/data
  • the performance can be evaluated automatically



Sentence fusion
 1 John Smith, born November 15 1900, studied chemistry and physics at
   the University of London.
 2 From 1917 Mr. Smith studied at the University of London and in 1921 he
   graduated with distinction.
« Mr. Smith studied chemistry and physics at the University of London
  from 1917.

 • pieces of related sentences are used to generate a novel
   sentence
 • can be seen as a middle ground between extractive and
   abstractive summarization
 • addresses the incompleteness-redundancy problem

Thank you!




             (FOR YOUR ATTENTION)




References
• R. Barzilay & M. Lapata, 2005: Modeling local coherence:
  An entity-based approach
• S. Brin & L. Page, 1998: The anatomy of a large-scale
  hypertextual web search engine
• J. G. Carbonell & J. Goldstein, 1998: The use of MMR,
  diversity-based reranking for reordering documents and
  producing summaries
• H. P. Edmundson, 1969: New methods in automatic
  extracting
• G. Erkan & D. Radev, 2004: LexRank: Graph-based lexical
  centrality as salience in text summarization
• C. Fellbaum, 1998: WordNet: An electronic lexical database

• K. Forbes, E. Miltsakaki, R. Prasad, A. Sarkar, A. Joshi, B.
  L. Webber, 2001: DLTAG system – discourse parsing with a
  Lexicalized Tree Adjoining Grammar
• M. Halliday & R. Hasan, 1996: Cohesion in text
• E. H. Hovy, 2003: Text summarization
• H. Kamp, 1981: A theory of truth and semantic
  representation
• C.-Y. Lin, 2004: Automatic evaluation of summaries using
  N-gram co-occurrence statistics
• H. P. Luhn, 1958: The automatic creation of literature
  abstracts
• I. Mani, 2001: Automatic summarization
• W. C. Mann & S. A. Thompson, 1988: Rhetorical structure
  theory. Towards a functional theory of text organization
• D. Marcu, 2000: The theory and practice of discourse
  parsing and summarization
• R. Mihalcea & P. Tarau, 2004: TextRank: Bringing order
  into text
• E. Skorochodko, 1972: Adaptive method of automatic
  abstracting and indexing
• C. Sporleder & M. Lapata, 2005: Discourse chunking and its
  application to sentence compression
• M. Strube & S. P. Ponzetto, 2006: WikiRelate! Computing
  semantic relatedness using Wikipedia
• B. L. Webber, M. Stone, A. Joshi, A. Knott, 2003: Anaphora
  and discourse structure
• F. Wolf & E. Gibson, 2005: Representing discourse
  coherence: A corpus-based study




Katja Filippova

  • 1. Automatic Text Summarization Katja Filippova filippova@eml-research.de EML Research gGmbH TU Darmstadt Text Summarization – 25.02.2009 – p. 1
  • 2. Text summarization • A summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s) (Hovy, 2003) • information retrieval • stock market prediction • generation of abstracts • online news summarization • ...
  • 3. Overview • Introduction • classification of summarization systems • abstraction vs. extraction • Text cohesion and coherence for summarization • graph based methods • discourse structure based methods • Document Understanding Conference • tasks • an example • Research directions • sentence fusion and compression • integrating world knowledge
  • 6. Text summarization: types • A summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s) (Hovy, 2003) • Indicative « indicates types of information « “alerts” • Informative « includes quantitative/qualitative information « “informs” • Critical/evaluative « evaluates the content of the document
  • 7. Text summarization: types INDICATIVE • The work of Consumer Advice Centres is examined. The information sources used to support this work are reviewed. The recent closure of many CACs has seriously affected the availability of consumer information and advice. The contribution that public libraries can make in enhancing the availability of consumer information and advice, both to the public and to other agencies involved in consumer information and advice, is discussed.
  • 8. Text summarization: types INFORMATIVE • An examination of the work of Consumer Advice Centres and of the information sources and support activities that public libraries can offer. CACs have dealt with pre-shopping advice, education on consumers’ rights and complaints about goods and services, advising the client and often obtaining expert assessment. They have drawn on a wide range of information sources including case records, trade literature, contact files and external links. The recent closure of many CACs has seriously affected the availability of consumer information and advice. Libraries can cooperate closely with advice agencies through local coordinating committees, shared premises, joint publicity, referral and the sharing of professional expertise.
  • 11. Text summarization: types • Source: single-document vs. multi-document « research paper « proceedings of a conference • Content: generic vs. query-based vs. user-focused « equal coverage of all major topics « based on a question “what are the causes of the war?” « users interested in chemistry • Form: extract vs. abstract « fragments from the document « newly re-written text
  • 12. Extraction vs. abstraction How should a text summarization system proceed? • read the documents • understand them – build a semantic representation • generate a summary from this representation
  • 13. Extraction vs. abstraction • unfortunately, a rich semantic representation is not possible yet • to date, most summarization systems are extractive • usually, extraction units are sentences • low cost solution: could work without ontologies, complex representations, etc. • extractive summaries are usually incoherent • trade-off between non-redundancy and completeness
  • 17. Extraction vs. abstraction Three sentences from related documents (Oct. 27, 2008): • The Syrian foreign minister today condemned the killing of eight civilians in a US raid as an act of "criminal and terrorist aggression". (The Guardian) • Syria accused the United States on Monday of carrying out a "terrorist aggression" after a deadly raid near its border with Iraq which it said killed eight civilians. (Reuters) • Lebanese President Michel Suleiman on Monday contacted his Syrian counterpart Bashar Assad to denounce "Sunday’s American aggression" against the Syrian village of Abu Kamal near the border with Iraq, local Elnashra website reported. (Aljazeera)
  • 22. Extraction vs. abstraction • extractive summaries are not coherent – sentences pulled out from different documents each make sense on their own but sound awkward when put together • unresolved pronouns may distort the meaning • beginning with a sentence which starts with However, ... is not a good idea • there is a striking difference from human-generated texts – pronouns and connectives are in the right place, the flow of discourse makes sense • How could one use this property of natural discourse for summarization?
  • 27. Text coherence vs. text cohesion • John enjoys playing the piano. John wants to become a famous piano player. John works hard and works hard every day. Working hard is necessary to become a famous piano player. • John enjoys playing the piano. However, he woke up early yesterday. But the day before yesterday the weather was wonderful, because rain and snow started immediately and continued the whole day through. By the way, his teacher did the same. • John enjoys playing the piano and wants to become famous. He works hard and does it every day because it is necessary for his goal.
  • 28. Text coherence vs. text cohesion • Text coherence represents the overall structure of a multi-sentence text in terms of macro-level relations between clauses or sentences (Halliday & Hasan, 1996). « Rhetorical Structure Theory (Mann & Thompson, 1988) « Discourse Representation Theory (Kamp, 1981) « Discourse Lexicalized Tree Adjoining Grammar (Forbes, 2001) • John enjoys playing the piano. [John wants to become a famous piano player.] (that’s why) [John works hard and works hard every day.] Working hard is necessary to become a famous piano player. Text Summarization – 25.02.2009 – p. 12
  • 29. Text coherence vs. text cohesion • Text cohesion involves relations between words, word senses, or referring expressions, which determine how tightly connected the text is (Halliday & Hasan, 1996). « anaphora, ellipsis, connectives « synonymy and other lexical relations • John enjoys playing the piano. However, he woke up early yesterday. But the day before yesterday the weather was wonderful, because rain and snow started immediately and continued the whole day through. By the way, his teacher did the same. Text Summarization – 25.02.2009 – p. 12
  • 30. Coherence based summarization • earlier systems targeted technical documents and identified important information by assigning weights to sentences (Luhn, 1958; Edmundson, 1969) • several weighted features were used: « word (stem) frequency « presence of cue words (e.g., as a result, significant), which signal important content « sentence position « document structure • feature weights were tuned manually
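The weighted-feature idea can be sketched as follows. This is only an illustration, not the scoring used by Luhn or Edmundson: the cue-word list, the weights, and the function names are hypothetical, and the document-structure feature is left out.

```python
import re
from collections import Counter

# Hypothetical cue words and hand-picked weights (illustrative values only).
CUE_WORDS = {"significant", "result"}
WEIGHTS = {"frequency": 1.0, "cue": 2.0, "position": 1.5}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(sentences):
    """Return one weighted feature score per sentence."""
    doc_counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(doc_counts.values()) or 1
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        # Average relative document frequency of the words in the sentence.
        frequency = sum(doc_counts[w] for w in words) / (total * max(len(words), 1))
        # 1.0 if the sentence contains at least one cue word.
        cue = 1.0 if CUE_WORDS & set(words) else 0.0
        # Simplest possible position feature: favour the first sentence.
        position = 1.0 if i == 0 else 0.0
        scores.append(
            WEIGHTS["frequency"] * frequency
            + WEIGHTS["cue"] * cue
            + WEIGHTS["position"] * position
        )
    return scores
```

Sentences with the highest scores would then be extracted; in the early systems the weights were set by hand, as noted above.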
  • 31. Coherence based summarization • Rhetorical Structure Theory (Mann & Thompson, 1988) • elaboration • example • contrast • background • motivation • etc. • Example with the relations Circumstance and Attribution: "I am optimistic" said Mr. Smith as the market plunged. (from Sporleder & Lapata, 2005)
  • 32. Coherence based summarization • one could use discourse structure for summarization (Marcu, 2000) • however, this is not done often: • there are few discourse parsers and they are not very precise • it is debated whether a tree representation is sufficient for discourse (Wolf & Gibson, 2005) • it is not obvious how to classify rhetorical relations • some relations are argued to be anaphoric rather than discourse relations (Webber et al., 2003)
  • 33. Cohesion based summarization • it is common to represent a text as a graph, where nodes are sentences and edges are some relations between them (e.g., discourse relations or just similarity) • a common graph connectivity assumption is that the nodes which are connected to many other nodes are likely to carry salient information • it is also assumed that nodes whose removal affects the structure of the document are important (Skorochodko, 1972, cited in Mani, 2001)
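The connectivity assumption can be illustrated with a small sketch. It assumes that edges are defined by plain word overlap (Jaccard similarity) above a hypothetical threshold; the function names and the threshold value are chosen for illustration only.

```python
import re


def tokens(sentence):
    """Set of lowercase alphabetic tokens in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def degree_centrality(sentences, threshold=0.2):
    """Count, for each sentence node, how many other sentences it is linked to."""
    bags = [tokens(s) for s in sentences]
    degrees = [0] * len(sentences)
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            # An undirected edge exists when the similarity reaches the threshold.
            if jaccard(bags[i], bags[j]) >= threshold:
                degrees[i] += 1
                degrees[j] += 1
    return degrees
```

Under this assumption, sentences with a high degree are the candidates for salient content, while an isolated sentence gets degree 0.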
  • 37. Cohesion based summarization • modern approaches extend this idea and use PageRank (Brin & Page, 1998) to find salient nodes in such a graph (Erkan & Radev, 2004; Mihalcea & Tarau, 2004) • similar sentences are connected (bag-of-words similarity) • a similarity threshold is used • the top N sentences by PageRank score are extracted
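The three steps above (connect similar sentences, apply a threshold, extract the top N) can be sketched as below. This is a simplified illustration in the spirit of LexRank and TextRank, not a reimplementation of either: the cosine similarity over raw counts, the threshold, the damping factor, and the fixed number of power iterations are all assumptions.

```python
import math
import re
from collections import Counter


def bag(sentence):
    """Bag-of-words representation of a sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_sentences(sentences, threshold=0.1, damping=0.85, iterations=50):
    """PageRank scores on the thresholded sentence similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [bag(s) for s in sentences]
    # Undirected, unweighted edges between sufficiently similar sentences.
    neighbours = [
        [j for j in range(n) if j != i and cosine(bags[i], bags[j]) >= threshold]
        for i in range(n)
    ]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] / len(neighbours[j]) for j in neighbours[i])
            for i in range(n)
        ]
    return scores


def extract_top(sentences, top_n=2, **kwargs):
    """Return the top_n highest-ranked sentences in document order."""
    scores = rank_sentences(sentences, **kwargs)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares no words with the rest of the document has no incoming edges and keeps only the baseline score, so it is ranked last.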
  • 38. Coherence vs. cohesion based TS • Coherence: + transparent; coherence of the output can be improved – annotation of relations is still a challenge; preprocessing difficulties • Cohesion: + intuitively appealing; low-cost; even unsupervised – requires word sense disambiguation (WSD) and anaphora resolution; hard to pin down; tuned thresholds
  • 39. DUC competitions • Document Understanding Conferences (2000-2007) • since 2008: Text Analysis Conference (TAC) • provide participants with - a task - data - manual and automatic evaluation • increasingly challenging tasks: from generic single-document summarization to multi-document update summarization (2008)
  • 40. DUC competitions Sample topic: D0740I round-the-world balloon flight Report on the planning, attempts and first successful balloon circumnavigation of the earth by Bertrand Piccard and his crew.
  • 41. DUC competitions <DOC> <DOCNO> APW19981112.0453 </DOCNO> <DOCTYPE> NEWS STORY </DOCTYPE> <DATE_TIME> 11/12/1998 08:21:00 </DATE_TIME> <HEADER> w1942 &Cx1f; wstm- r i &Cx13; &Cx11; BC-Switzerland-BalloonQu 11-12 0355 </HEADER> <BODY> <SLUG> BC-Switzerland-Balloon Quest </SLUG> <HEADLINE> Swiss challenger prepares third attempt at global record </HEADLINE> &UR; AP Photos GEV 101-102 &QL; <TEXT> GENEVA (AP) _ Swiss balloon pilot Bertrand Piccard and his new teammate, British flight engineer Tony Brown, said Thursday they will be ready later this month for a new attempt to fly nonstop round the world. Their new Breitling Orbiter 3 balloon will take off from Chateau d’Oex, in the Swiss Alps, as soon after Nov. 25 as weather conditions are favorable, they said. It will be Piccard’s third attempt to become the first to pilot a balloon around the world. In February the Swiss pilot, along with British flight engineer Andy Elson and
  • 42. The EML NLP group at DUC 2007
  • 43. Preprocessing: Annotation • Sentence splitting • Tokenization • PoS tagging • Chunking • Named entity recognition
  • 45. Preprocessing: Problems • Sentence splitting <sentence>At Pine Ridge, a scrolling marquee at Big Bat’s Texaco expressed both joy over Clinton’s visit and wariness of all the official attention: “Welcome President Clinton.</sentence> <sentence>Remember our treaties,” the sign read. • and cleaning <sentence>PINE RIDGE, S.D.</sentence> <sentence>(AP) - President Clinton turned the attention of his national poverty tour today to arguably the poorest, most forgotten U.S. citizens of them all: American Indians.</sentence>
  • 46. Preprocessing: Document filtering • Match topic with document extracts • Pick the top 5 matching documents
  • 47. Semantic analysis • Filter topic • Connect topic words with words in document sentences • Compute sentence scores from matching words and matching word sequences « ranked list of sentences
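A rough sketch of this kind of topic matching is given below. It is not the EML system: it assumes that "filtering" the topic means removing stopwords, that word sequences are adjacent word pairs (bigrams), and that a sequence match counts twice as much as a word match. The stopword list, the weight, and all names are hypothetical.

```python
import re

# Tiny illustrative stopword list; a real system would use a larger one.
STOPWORDS = {"the", "of", "and", "on", "by", "his", "a", "in", "to"}


def content_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def bigrams(words):
    """Set of adjacent word pairs."""
    return set(zip(words, words[1:]))


def rank_by_topic(topic, sentences, sequence_weight=2.0):
    """Score sentences by shared topic words and shared word sequences."""
    topic_words = content_words(topic)
    topic_set, topic_bigrams = set(topic_words), bigrams(topic_words)
    scored = []
    for sentence in sentences:
        words = content_words(sentence)
        word_hits = len(topic_set & set(words))
        sequence_hits = len(topic_bigrams & bigrams(words))
        scored.append((word_hits + sequence_weight * sequence_hits, sentence))
    # Highest score first: this is the ranked list of sentences.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For the sample topic "round-the-world balloon flight", a sentence containing "balloon flight" and "world" would score three word matches plus one sequence match under these assumptions.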
  • 48. Extractive summary generation • Rerank sentences • Select the top non-redundant sentences (250 word limit) • Re-arrange sentences
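The selection step can be sketched as a greedy loop over an already ranked list. The redundancy test used here (share of a sentence's words already covered by a chosen sentence) and its threshold are assumptions made for illustration; the re-arranging step is not shown.

```python
def select_sentences(ranked, word_limit=250, max_overlap=0.5):
    """Greedily pick sentences (best first) that fit the limit and add new content."""
    chosen, used_words = [], 0
    for sentence in ranked:
        words = sentence.lower().split()
        # Skip empty sentences and sentences that would exceed the word limit.
        if not words or used_words + len(words) > word_limit:
            continue
        word_set = set(words)
        # A sentence is redundant if too many of its words occur in a chosen sentence.
        redundant = any(
            len(word_set & set(c.lower().split())) / len(word_set) > max_overlap
            for c in chosen
        )
        if not redundant:
            chosen.append(sentence)
            used_words += len(words)
    return chosen
```

With a near-duplicate in second place, the loop keeps the first sentence, skips the duplicate, and moves on to the next sentence that adds new words.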
  • 49. A good summary Round-the-world balloon flight: Report on the planning, attempts and first successful balloon circumnavigation of the earth by Bertrand Piccard and his crew. Swiss balloon pilot Bertrand Piccard announced Wednesday that he has chosen Brian Jones as his teammate for his next attempt at circling the world in a balloon. Jones, 52, replaces fellow British flight engineer Tony Brown. Achieving what promoters called the last great milestone of aviation, Bertrand Piccard and Brian Jones joined legends like the Wright Brothers and Charles Lindbergh with Saturday’s completion of the first manned round-the-world balloon flight. At 4:54 a.m. EST Saturday, the two balloonists crossed the line of longitude from which they had departed on March 1 at Chateau D’Oex, Switzerland, ...
  • 50. A bad summary Angelina Jolie: What have been the most recent significant events in the life and career of actress Angelina Jolie? Angelina Jolie’s win for best supporting actress for her role in “Girl, Interrupted” came 21 years after father Jon Voight was awarded best actor for “Coming Home.“ ANGELINA JOLIE’S LIFE ON THE EDGE After all, her career is in overdrive. But Jolie cautions that she’s still a serious actress. It’s not like I’m suddenly a better actress because I have awards or this box office clout,” she says. “I am secure in the fact that I do have something to offer as an actress,”Jolie says. ‘...
  • 51. Evaluation • automatic evaluation with ROUGE (Lin, 2004) • manual evaluation with respect to « responsiveness « linguistic quality 1. grammaticality 2. non-redundancy 3. referential clarity 4. focus 5. structure and coherence • our system scored above the average, top 5 for non-redundancy and coherence (recall the document filtering stage)
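The core of ROUGE-N is n-gram recall against a reference summary. The sketch below covers only that core for a single reference, with whitespace tokenization and no stemming or stopword handling; it is a simplification, not the official ROUGE toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Share of reference n-grams that also occur in the candidate (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For the reference "the balloon circled the world" and the candidate "the balloon flew around the world", four of the five reference unigrams are matched, and two of the four reference bigrams.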
  • 55. Research directions • as in information retrieval, query expansion is expected to improve recall « WordNet (Fellbaum, 1998) for similarity « Wikipedia for relatedness (Strube & Ponzetto, 2006) « paraphrases • coreference resolution is needed during preprocessing; otherwise, e.g., pronouns are filtered out as stopwords • relevance vs. redundancy: in multi-document summarization (MDS), how can we ensure that the summary is non-redundant? (Carbonell & Goldstein, 1998) • sentence ordering for extractive MDS (Barzilay & Lapata, 2005)
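The relevance vs. redundancy trade-off is what maximal marginal relevance (MMR; Carbonell & Goldstein, 1998) addresses: each step picks the candidate that maximizes lam * relevance to the query minus (1 - lam) * maximum similarity to anything already selected. The sketch below uses Jaccard word overlap for both similarities, which is an assumption made for illustration; the function names and the default lam are hypothetical.

```python
import re


def word_set(text):
    """Set of lowercase alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(query, sentences, k=2, lam=0.5):
    """Greedy MMR: trade relevance to the query against redundancy with the selection."""
    query_words = word_set(query)
    sets = [word_set(s) for s in sentences]
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:

        def mmr(i):
            relevance = jaccard(sets[i], query_words)
            redundancy = max((jaccard(sets[i], sets[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With lam = 1.0 the selection is driven by relevance alone, so two near-duplicate sentences can both be chosen; lowering lam penalizes the second one in favour of a less similar sentence.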
  • 56. Directions of research • abstractive summarization is a distant goal but there are ways to go beyond sentence extraction « sentence compression « sentence fusion
  • 59. Sentence compression This is true, regardless of the opinion that some people have of Syria, and of their unhappiness at Syria’s presence in Lebanon. • summarization on the sentence level • in principle, a compression can differ from the input in wording and structure • to date, most systems use word deletion only • a compression corpus is now available online: http://homepages.inf.ed.ac.uk/s0460084/data • performance can be evaluated automatically
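Word deletion means that the compression is a subsequence of the input tokens. The sketch below shows this view with a keep/delete mask, a check that a given compression uses deletion only, and the compression rate as one simple automatic measure. The helper names are hypothetical, and how the mask is chosen (the actual modelling problem) is not shown.

```python
def apply_deletions(tokens, keep):
    """Keep the tokens whose mask entry is True, preserving their order."""
    return [token for token, flag in zip(tokens, keep) if flag]


def is_word_deletion(source_tokens, compressed_tokens):
    """True if the compression is a subsequence of the source (deletion only)."""
    remaining = iter(source_tokens)
    # Each membership test consumes the iterator up to the match, enforcing order.
    return all(token in remaining for token in compressed_tokens)


def compression_rate(source_tokens, compressed_tokens):
    """Fraction of source tokens retained in the compression."""
    return len(compressed_tokens) / len(source_tokens)
```

A reordered or reworded output fails the subsequence check, which is the case the slide describes as possible in principle but rarely handled by systems.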
  • 61. Sentence fusion 1 John Smith, born November 15 1900, studied chemistry and physics at the University of London. 2 From 1917 Mr. Smith studied at the University of London and in 1921 he graduated with distinction. « Mr. Smith studied chemistry and physics at the University of London from 1917. • pieces of related sentences are used to generate a novel sentence • can be seen as a middle ground between extractive and abstractive summarization • addresses the incompleteness-redundancy problem
  • 62. Thank you for your attention!
  • 63. References • R. Barzilay & M. Lapata, 2005: Modeling local coherence: An entity-based approach • S. Brin & L. Page, 1998: The anatomy of a large-scale hypertextual web search engine • J. G. Carbonell & J. Goldstein, 1998: The use of MMR, diversity-based reranking for reordering documents and producing summaries • H. P. Edmundson, 1969: New methods in automatic extracting • G. Erkan & D. Radev, 2004: LexRank: Graph-based lexical centrality as salience in text summarization • C. Fellbaum, 1998: WordNet: An electronic lexical database
  • 64. References • K. Forbes, E. Miltsakaki, R. Prasad, A. Sarkar, A. Joshi, B. L. Webber, 2001: DLTAG system – discourse parsing with a Lexicalized Tree Adjoining Grammar • M. Halliday & R. Hasan, 1996: Cohesion in text • E. H. Hovy, 2003: Text summarization • H. Kamp, 1981: A theory of truth and semantic representation • C.-Y. Lin, 2004: Automatic evaluation of summaries using N-gram co-occurrence statistics • H. P. Luhn, 1958: The automatic creation of literature abstracts • I. Mani, 2001: Automatic summarization
  • 65. References • W. C. Mann & S. A. Thompson, 1988: Rhetorical structure theory. Towards a functional theory of text organization • D. Marcu, 2000: The theory and practice of discourse parsing and summarization • R. Mihalcea & P. Tarau, 2004: TextRank: Bringing order into text • E. Skorochodko, 1972: Adaptive method of automatic abstracting and indexing • C. Sporleder & M. Lapata, 2005: Discourse chunking and its application to sentence compression • M. Strube & S. P. Ponzetto, 2006: WikiRelate! Computing semantic relatedness using Wikipedia
  • 66. References • B. L. Webber, M. Stone, A. Joshi, A. Knott, 2003: Anaphora and discourse structure • F. Wolf & E. Gibson, 2005: Representing discourse coherence: A corpus-based study