SlideShare ist ein Scribd-Unternehmen logo
1 von 52
The Rhetorical Parsing
of Unrestricted Texts:
A Surface-based
Approach
Daniel Marcu (USC)
ACL 2000
Abstract
This paper explores the extent to which well-formed
rhetorical structures can be automatically derived by
means of surface-form-based algorithms.
These algorithms:
   identify discourse usages of cue phrases
   break sentences into clauses
   hypothesize rhetorical relations that hold among textual
   units
   produce valid rhetorical structure trees for unrestricted
   natural language texts.

The algorithms are empirically grounded in a corpus
analysis of cue phrases and rely on a first-order
formalization of rhetorical structure trees.
Abstract
The algorithms are evaluated both intrinsically
and extrinsically.

The intrinsic evaluation assesses the
resemblance between automatically and
manually constructed rhetorical structure trees.

The extrinsic evaluation shows that automatically
derived rhetorical structures can be successfully
exploited in the context of text summarization.
Motivation
Motivation
Marcu explores the ground found at the
intersection of:
   theories from a traditional, truth-based semantic
   perspective (weak)
   theories aimed at characterizing the constraints
   that pertain to the structure of unrestricted texts
   and the computational mechanisms that would
   enable the derivation of these structures (which
   are weak).

Goal: Automatically build rhetorical constructions
by relying only on cohesion and connectives (i.e.,
phrases such as for example, and, although, and
however.)
Outline
2. Foundation

3. Corpus Analysis of Cue Phrases

4. Rhetorical Parsing Algorithm

5. Evaluation

6. Related Work

7. Conclusion (Recap)
Foundation
The hypothesis that underlies this work is that
connectives, cohesion, shallow processing, and
a well-constrained mathematical model of valid
rhetorical structure trees (RS-trees) can be used
to implement algorithms that determine
   the elementary units of a text, i.e., the units that
   constitute the leaves of the RS-tree of that text;
   the rhetorical relations that hold between
   elementary units and between spans of text;
   the relative importance (nucleus or satellite) and
   the size of the spans subsumed by these
   rhetorical relations.
Foundation
2.1 Determining the Elementary Units Using
Connectives and Shallow Processing
   Pros: punctuation, connectives
   Cons: Difficult (consider and)

Using these, elementary unit boundaries can be
determined with approximately 80% accuracy.
Foundation
2.2 Determining Rhetorical Relations Using
Connectives
   Pros: psychologically useful, common/regular
   Cons:
   1.   Sentence vs. discourse function unclear;
   2.   They do not indicate scope of relation;
   3.   Can signal different rhetorical relations.
Foundation
2.2.2 Discourse markers ambiguous as to
scope:
     Rhetorical relations that hold between large
     textual spans can be explained in terms of similar
     relations that hold between their most important
     elementary units. (Marcu, elsewhere)
     Compositionality criterion for valid rhetorical
     structures: posits that a rhetorical structure tree is
     valid only if each rhetorical relation that holds
     between two spans is either an extended
     rhetorical relation or can be explained in terms of
     a simple rhetorical relation.
Foundation
Discussion:
  The more complex the text, the harder to
  automatically identify.
  “I have never come across a case in which a
  simple connective signaled more than one
  rhetorical relation.”
  “I have never come across an example that would
  require one to deal with exclusively disjunctive
  hypotheses [that do not overlap in their scope].”
Foundation
2.3 Determining Rhetorical Relations Using
Cohesion
   Pros: co-occurrence can determine thematic
   continuity, correlation between cohesion-defined
   textual segments and hierarchical, intentionally
   defined segments, cohesion works for smaller
   relations too.
   Cons: Marcu uses a coarse model of the relation
   between cohesion and rhetorical relations (to
   make things simpler).
Foundation
2.4 Determining Rhetorical Structure Using a
Well-Constrained Mathematical Model
   Uses first order logic, with these features and
   constraints:
      A valid rhetorical structure is a binary tree whose
      leaves denote elementary textual units.
      Rhetorical relations hold between textual units and
      spans of various sizes. These relations are paratactic
      or hypotactic.
      Each node of a rhetorical structure tree has associated
      a status (NUCLEUS or SATELLITE),a type (the
      rhetorical relation that holds between the text spans
      that the node spans over), and a set of promotion
      units.
Foundation
2.4 Determining Rhetorical Structure Using a
Well-Constrained Mathematical Model
   The status and type associated with each node
   are unique.
   The rhetorical relations of a valid rhetorical
   structure hold only between adjacent spans.
   There exists a span, which corresponds to the
   root node of the structure, that spans over the
   entire text.
   The status, type, and promotion set associated
   with each node reflect the compositionality
   criterion.
Foundation
Foundation
Two problems to solve:
1. The problem of rhetorical grounding: Needs to
   show how starting from free, unrestricted text,
   connectives and cohesion can be used to
   automatically determine the elementary units of
   text and hypothesize simple, extended, and
   exclusively disjunctive rhetorical relations that
   hold between these units and spans of units.
2. The problem of rhetorical structure
   derivation: The rhetorical structures must be
   consistent with the constraints given. I refer to this
   as.”
Foundation
   Not modeled as an incremental process in which
   elementary units are determined and attached to an
   increasingly complex RS-tree.
Process:
   Determine elementary discourse units (edus) first
   That knowledge of connectives and cohesion is then used
   to (over-)hypothesize simple, extended, and exclusively
   disjunctive rhetorical relations;
   These hypotheses and the well-constrained model of valid
   RS- trees are used to determine the set of valid rhetorical
   interpretations that are consistent with both the
   mathematical model and the hypotheses.
Corpus Analysis
Prior to Marcu, no empirical data existed which could
answer the question of the extent to which connectives
could be used to identify elementary units and hypothesize
rhetorical relations.

So… corpus study.

The corpus study was designed to:
   investigate how cue phrases can be used to identify the
   elementary units of texts;
   to determine what rhetorical relations hold between units
   and spans of text, the nuclearity of the units, and the
   sizes of the related spans.

Developed his own annotation scheme.
Corpus Analysis
450 potential markers from previous lists of
potential discourse markers, cue phrases

Brown corpus examples for each: 300 words or
so per example.

Average of 17 text fragments per cue phrase.

Overall, more than 7,600 randomly selected
texts.
Corpus Analysis
Hardcoding?

“By encoding algorithmic specific information in
the corpus, I only bootstrap the step that can
take one from annotated data to algorithmic
information.”

Doesn’t preclude other methods.
Corpus Analysis
Manually analyzed 2,100 of the text fragments in
the corpus.

Annotated only 2,100 fragments (time). Of these:
   1,197 had a discourse function;
   773 were sentential;
   244 were pragmatic.
Corpus Analysis
“I did not use an objective definition of
elementary unit. Rather, I relied on a more
intuitive one.”

This corpus only used for development; testing
was done on independent corpora with non-
biased judges.
Rhetorical Parsing
    Algorithm
Rhetorical Parsing
    Algorithm
Rhetorical Parsing
         Algorithm
Orthographic markers, such as commas, periods,
dashes, paragraph breaks, etc., play an
important role in our surface-based approach to
discourse processing, so are included.

“By considering only cue phrases having a
discourse function in most of the cases, I
deliberately chose to focus more on precision
than on recall with respect to the task of
identifying the elementary units of text.”
Rhetorical Parsing
         Algorithm
Used Lex (Unix) to identify markers in text.

Shallow analyzer then used to perform 11
different actions, associated with the discourse
markers, as a foundation for identifying edus.
   NOTHING, NORMAL, COMMA, NORMAL_THEN_COMMA, END,
   MATCH_PARENS, COMMA_PAREN, MATCH_DASH,
   SET_AND/SET_OR, DUAL
Rhetorical Parsing
    Algorithm
Rhetorical Parsing
            Algorithm
Clause-like Unit Identification

   On the basis of the information derived from the
   corpus, Marcu designed an algorithm that
   identifies elementary textual unit boundaries in
   sentences and cue phrases that have a
   discourse function.
Rhetorical Parsing
    Algorithm
Rhetorical Parsing
    Algorithm
Rhetorical Parsing
         Algorithm
Used: an expository text of 5,036 words from
Scientific American; a magazine article of 1,588
words from Time; a narration of 583 words from
the Brown corpus (segment P25:1250-1710).

Manually broken into elementary units by three
slaves (coli masters students)
Rhetorical Parsing
    Algorithm
Rhetorical Parsing
    Algorithm
Rhetorical Parsing
         Algorithm
4.4.1 From Discourse Markers to Rhetorical
Relations

For each regex for discourse markers, six
features for each possible discourse role were
annotated:
   Status (SAT_NUCLEUS, SATELLITE_NUC, NULL)
   Where to link (BEFORE, AFTER)
   Types (CLAUSE, SENTENCE, PARAGRAPH)
   Rhetorical relation (CONCESSION, ELABORATION
   &c.)
   Clause distance / Sentence distance
   Distance to salient unit
Rhetorical Parsing
        Algorithm
4.4.2 A Discourse-marker-based Algorithm for
Hypothesizing Rhetorical Relations.
     Iterate over all textual units (sentence, clause,
     paragraph)
     For each discourse marker, the algorithm constructs
     an exclusively disjunctive hypothesis concerning the
     rhetorical relations that the marker under scrutiny
     may signal.
     The algorithm assumes that the rhetorical structure
     at each level can be derived by hypothesizing
     rhetorical relations that hold between the units at
     that level.
     It generates simple and extended relations.
Rhetorical Parsing
    Algorithm
Rhetorical Parsing
            Algorithm
4.4.3 A Word-co-occurrence-based Algorithm for
Hypothesizing Rhetorical Relations

   For paragraph-level analysis, discourse markers
   may not be enough.

   Here, cohesion, using word collocations, can be
   used to identify ELABORATION, BACKGROUND or JOINT
   relations.
Rhetorical Parsing
            Algorithm
4.5 A Proof-Theoretic Account of the Problem of
Rhetorical Structure Derivation

   “Once the elementary units of a text have been
   determined and the rhetorical relations between
   them have been hypothesized at sentence,
   paragraph, and section levels, we need to
   determine the rhetorical structures that are
   consistent with these hypotheses and with the
   constraints specific to valid RS-trees. That is, we
   need to solve the problem of rhetorical structure
   derivation.”
Rhetorical Parsing
            Algorithm
4.5 A Proof-Theoretic Account of the Problem of
Rhetorical Structure Derivation

   “Once the elementary units of a text have been
   determined and the rhetorical relations between
   them have been hypothesized at sentence,
   paragraph, and section levels, we need to
   determine the rhetorical structures that are
   consistent with these hypotheses and with the
   constraints specific to valid RS-trees. That is, we
   need to solve the problem of rhetorical structure
   derivation.”
Rhetorical Parsing
            Algorithm
4.5 A Proof-Theoretic Account of the Problem of
Rhetorical Structure Derivation
   Marcu devised a proof theory to determine all
   valid rhetorical structures of a text.
   The theory consists of a set of axioms and
   rewriting rules that encode all possible ways in
   which one can derive the valid RS- trees of a
   text. More in Marcu (2000).
   Lots of ways to implement.
Rhetorical Parsing
    Algorithm
Rhetorical Parsing
        Algorithm
Ambiguity in Discourse:
  Many different possibilities (as in parsing). Best
  are skewed to the right (style of writing,
  placement of important information)
  Weights are “computed recursively by summing
  up the weights of the left and right branches of a
  rhetorical structure and the difference between
  the depth of the right and left branches of the
  structure. Hence, the more skewed to the right a
  tree is, the greater its weight w is.”
  To disambiguate: keeps only partial structures
  that lead to maximal weighting
Evaluation

Two ways to evaluate:
   Compare the automatically derived trees with
   trees that have been built manually.
   Evaluate the impact that they have on the
   accuracy of other natural language processing
   tasks, such as anaphora resolution, intention
   recognition, or text summarization.
Evaluation
Evaluation
Evaluation

Qualitative:
   Good Discourse Structures at the Paragraph Level
   Good Discourse Structures at the Text Level, for Short
   Texts
   Good Discourse Structures for Sentences, Paragraphs,
   and Texts that Use Unambiguous Discourse Markers
   Good Discourse Structures for Sentences that Use
   Markers Other than And.
   Bad Discourse Structures for Sentences that Use the
   Discourse Marker
   Incorrectly Labeled Intentional Relations.
   Bad Discourse Structures for Very Large Texts
Evaluation

Summarization Algorithm:
   The algorithm uses the rhetorical parser
   described in this paper to determine the discourse
   structure of a text given as input, it uses the
   discourse structure to induce a partial ordering on
   the elementary units in the text, and then,
   depending on the desired compression rate, it
   selects the p most important units in the text.
   Corpora: 5 SA texts mentioned, and 40 articles
   from TREC collection.
Evaluation
Related Work

Corston-Oliver (1998): Syntactic information to
hypothesize relations
Sumita et al. (1992): Deep syntactic processing,
constrained trees, no weighting or disambiguation
Kurohashi and Nagao (1994) : discourse structure
generator that builds discourse trees in an
incremental fashion.
Strube and Hahn (1999): building hierarchies of
referential discourse elements.


None of these were compared due to inapplicability.
Conclusion

“Quantitative and qualitative analyses of the
results show that many relations can be identified
correctly within this framework.”

“The brightest side of the story is that the results
in this paper show that the rhetorical structures
derived by my parser can be used successfully in
the context of text summarization.”

Weitere ähnliche Inhalte

Was ist angesagt?

FCA-MERGE: Bottom-Up Merging of Ontologies
FCA-MERGE: Bottom-Up Merging of OntologiesFCA-MERGE: Bottom-Up Merging of Ontologies
FCA-MERGE: Bottom-Up Merging of Ontologiesalemarrena
 
Report
ReportReport
Reportbutest
 
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING cscpconf
 
Utilising wordsmith and atlas to explore, analyse and report qualitative data
Utilising wordsmith and atlas to explore, analyse and report qualitative dataUtilising wordsmith and atlas to explore, analyse and report qualitative data
Utilising wordsmith and atlas to explore, analyse and report qualitative dataMerlien Institute
 
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...Andre Freitas
 
Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)IJERA Editor
 
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconferenceMarcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconferenceDaniel Lewis
 
Legal Document
Legal DocumentLegal Document
Legal Documentlegal4
 
Lecture 2: From Semantics To Semantic-Oriented Applications
Lecture 2: From Semantics To Semantic-Oriented ApplicationsLecture 2: From Semantics To Semantic-Oriented Applications
Lecture 2: From Semantics To Semantic-Oriented ApplicationsMarina Santini
 
Semantic Ordering Relation- Applied to Agatha Christie Crime Thrillers
Semantic Ordering Relation- Applied to Agatha Christie Crime ThrillersSemantic Ordering Relation- Applied to Agatha Christie Crime Thrillers
Semantic Ordering Relation- Applied to Agatha Christie Crime Thrillersijcoa
 
Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...ijnlc
 
General theory: the Problems of Construction
General theory: the Problems of ConstructionGeneral theory: the Problems of Construction
General theory: the Problems of ConstructionSerge Patlavskiy
 

Was ist angesagt? (14)

FCA-MERGE: Bottom-Up Merging of Ontologies
FCA-MERGE: Bottom-Up Merging of OntologiesFCA-MERGE: Bottom-Up Merging of Ontologies
FCA-MERGE: Bottom-Up Merging of Ontologies
 
Report
ReportReport
Report
 
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
 
Utilising wordsmith and atlas to explore, analyse and report qualitative data
Utilising wordsmith and atlas to explore, analyse and report qualitative dataUtilising wordsmith and atlas to explore, analyse and report qualitative data
Utilising wordsmith and atlas to explore, analyse and report qualitative data
 
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
 
Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)
 
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconferenceMarcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
 
Study_Report
Study_ReportStudy_Report
Study_Report
 
Legal Document
Legal DocumentLegal Document
Legal Document
 
J79 1063
J79 1063J79 1063
J79 1063
 
Lecture 2: From Semantics To Semantic-Oriented Applications
Lecture 2: From Semantics To Semantic-Oriented ApplicationsLecture 2: From Semantics To Semantic-Oriented Applications
Lecture 2: From Semantics To Semantic-Oriented Applications
 
Semantic Ordering Relation- Applied to Agatha Christie Crime Thrillers
Semantic Ordering Relation- Applied to Agatha Christie Crime ThrillersSemantic Ordering Relation- Applied to Agatha Christie Crime Thrillers
Semantic Ordering Relation- Applied to Agatha Christie Crime Thrillers
 
Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...
 
General theory: the Problems of Construction
General theory: the Problems of ConstructionGeneral theory: the Problems of Construction
General theory: the Problems of Construction
 

Andere mochten auch

Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationRichard Littauer
 
Deep and surface_structures
Deep and surface_structuresDeep and surface_structures
Deep and surface_structuresAkzharka
 
Deep & surface learning
Deep & surface learningDeep & surface learning
Deep & surface learningRosa Martinez
 
Transformational grammar
Transformational grammarTransformational grammar
Transformational grammarJack Feng
 
Transformations in Transformational Generative Grammar
Transformations in Transformational Generative GrammarTransformations in Transformational Generative Grammar
Transformations in Transformational Generative GrammarBayu Jaka Magistra
 
Summary - Transformational-Generative Theory
Summary - Transformational-Generative TheorySummary - Transformational-Generative Theory
Summary - Transformational-Generative TheoryMarielis VI
 
Transformational generative grammar
Transformational generative grammarTransformational generative grammar
Transformational generative grammarKat OngCan
 
Transformational-Generative Grammar
Transformational-Generative GrammarTransformational-Generative Grammar
Transformational-Generative GrammarRuth Ann Llego
 
Deep structure and surface structure
Deep structure and surface structureDeep structure and surface structure
Deep structure and surface structureAsif Ali Raza
 
grammaticality, deep & surface structure, and ambiguity
grammaticality, deep & surface structure, and ambiguitygrammaticality, deep & surface structure, and ambiguity
grammaticality, deep & surface structure, and ambiguityDedew Deviarini
 
Transformational Grammar by: Noam Chomsky
Transformational Grammar by: Noam ChomskyTransformational Grammar by: Noam Chomsky
Transformational Grammar by: Noam ChomskyShiela May Claro
 

Andere mochten auch (12)

Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentation
 
Deep and surface_structures
Deep and surface_structuresDeep and surface_structures
Deep and surface_structures
 
Deep & surface learning
Deep & surface learningDeep & surface learning
Deep & surface learning
 
MORPHOSYNTAX: GENERATIVE MORPHOLOGY
MORPHOSYNTAX: GENERATIVE MORPHOLOGYMORPHOSYNTAX: GENERATIVE MORPHOLOGY
MORPHOSYNTAX: GENERATIVE MORPHOLOGY
 
Transformational grammar
Transformational grammarTransformational grammar
Transformational grammar
 
Transformations in Transformational Generative Grammar
Transformations in Transformational Generative GrammarTransformations in Transformational Generative Grammar
Transformations in Transformational Generative Grammar
 
Summary - Transformational-Generative Theory
Summary - Transformational-Generative TheorySummary - Transformational-Generative Theory
Summary - Transformational-Generative Theory
 
Transformational generative grammar
Transformational generative grammarTransformational generative grammar
Transformational generative grammar
 
Transformational-Generative Grammar
Transformational-Generative GrammarTransformational-Generative Grammar
Transformational-Generative Grammar
 
Deep structure and surface structure
Deep structure and surface structureDeep structure and surface structure
Deep structure and surface structure
 
grammaticality, deep & surface structure, and ambiguity
grammaticality, deep & surface structure, and ambiguitygrammaticality, deep & surface structure, and ambiguity
grammaticality, deep & surface structure, and ambiguity
 
Transformational Grammar by: Noam Chomsky
Transformational Grammar by: Noam ChomskyTransformational Grammar by: Noam Chomsky
Transformational Grammar by: Noam Chomsky
 

Ähnlich wie Marcu 2000 presentation

Application of rhetorical
Application of rhetoricalApplication of rhetorical
Application of rhetoricalcsandit
 
ASSESSING SIMILARITY BETWEEN ONTOLOGIES: THE CASE OF THE CONCEPTUAL SIMILARITY
ASSESSING SIMILARITY BETWEEN ONTOLOGIES: THE CASE OF THE CONCEPTUAL SIMILARITYASSESSING SIMILARITY BETWEEN ONTOLOGIES: THE CASE OF THE CONCEPTUAL SIMILARITY
ASSESSING SIMILARITY BETWEEN ONTOLOGIES: THE CASE OF THE CONCEPTUAL SIMILARITYdannyijwest
 
Identification of Rhetorical Structure Relation from Discourse Marker in Beng...
Identification of Rhetorical Structure Relation from Discourse Marker in Beng...Identification of Rhetorical Structure Relation from Discourse Marker in Beng...
Identification of Rhetorical Structure Relation from Discourse Marker in Beng...idescitation
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similaritypathsproject
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextFulvio Rotella
 
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...ijaia
 
What can a corpus tell us about discourse
What can a corpus tell us about discourseWhat can a corpus tell us about discourse
What can a corpus tell us about discoursePascual Pérez-Paredes
 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAijistjournal
 
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations onijistjournal
 
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...dannyijwest
 
Hyponymy extraction of domain ontology
Hyponymy extraction of domain ontologyHyponymy extraction of domain ontology
Hyponymy extraction of domain ontologyIJwest
 
6 shallow parsing introduction
6 shallow parsing introduction6 shallow parsing introduction
6 shallow parsing introductionThennarasuSakkan
 
Reasoning Over Knowledge Base
Reasoning Over Knowledge BaseReasoning Over Knowledge Base
Reasoning Over Knowledge BaseShubham Agarwal
 
Reasoning Over Knowledge Base
Reasoning Over Knowledge BaseReasoning Over Knowledge Base
Reasoning Over Knowledge BaseShubham Agarwal
 
An efficient concept based mining model for enhancing text clustering(synopsis)
An efficient concept based mining model for enhancing text clustering(synopsis)An efficient concept based mining model for enhancing text clustering(synopsis)
An efficient concept based mining model for enhancing text clustering(synopsis)Mumbai Academisc
 
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourseijitcs
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...acijjournal
 
Textual relations in the holy qur’an
Textual relations in the holy qur’anTextual relations in the holy qur’an
Textual relations in the holy qur’anMohammad HeadFather
 

Ähnlich wie Marcu 2000 presentation (20)

Application of rhetorical
Application of rhetoricalApplication of rhetorical
Application of rhetorical
 
ASSESSING SIMILARITY BETWEEN ONTOLOGIES: THE CASE OF THE CONCEPTUAL SIMILARITY
ASSESSING SIMILARITY BETWEEN ONTOLOGIES: THE CASE OF THE CONCEPTUAL SIMILARITYASSESSING SIMILARITY BETWEEN ONTOLOGIES: THE CASE OF THE CONCEPTUAL SIMILARITY
ASSESSING SIMILARITY BETWEEN ONTOLOGIES: THE CASE OF THE CONCEPTUAL SIMILARITY
 
Identification of Rhetorical Structure Relation from Discourse Marker in Beng...
Identification of Rhetorical Structure Relation from Discourse Marker in Beng...Identification of Rhetorical Structure Relation from Discourse Marker in Beng...
Identification of Rhetorical Structure Relation from Discourse Marker in Beng...
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
 
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
 
P13 corley
P13 corleyP13 corley
P13 corley
 
What can a corpus tell us about discourse
What can a corpus tell us about discourseWhat can a corpus tell us about discourse
What can a corpus tell us about discourse
 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
 
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations on
 
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
 
Hyponymy extraction of domain ontology
Hyponymy extraction of domain ontologyHyponymy extraction of domain ontology
Hyponymy extraction of domain ontology
 
6 shallow parsing introduction
6 shallow parsing introduction6 shallow parsing introduction
6 shallow parsing introduction
 
Reasoning Over Knowledge Base
Reasoning Over Knowledge BaseReasoning Over Knowledge Base
Reasoning Over Knowledge Base
 
Reasoning Over Knowledge Base
Reasoning Over Knowledge BaseReasoning Over Knowledge Base
Reasoning Over Knowledge Base
 
An efficient concept based mining model for enhancing text clustering(synopsis)
An efficient concept based mining model for enhancing text clustering(synopsis)An efficient concept based mining model for enhancing text clustering(synopsis)
An efficient concept based mining model for enhancing text clustering(synopsis)
 
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourse
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
 
Textual relations in the holy qur’an
Textual relations in the holy qur’anTextual relations in the holy qur’an
Textual relations in the holy qur’an
 

Mehr von Richard Littauer

Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Richard Littauer
 
Named Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 PresentationNamed Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 PresentationRichard Littauer
 
Building Corpora from Social Media
Building Corpora from Social MediaBuilding Corpora from Social Media
Building Corpora from Social MediaRichard Littauer
 
Visualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat MapsVisualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat MapsRichard Littauer
 
On Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossOn Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossRichard Littauer
 
The Evolution of Morphological Agreement
The Evolution of Morphological AgreementThe Evolution of Morphological Agreement
The Evolution of Morphological AgreementRichard Littauer
 
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...Trends in Use of Scientific Workflows: Insights from a Public Repository and ...
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...Richard Littauer
 
Evolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaEvolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaRichard Littauer
 
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...Richard Littauer
 
The Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer SimulationThe Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer SimulationRichard Littauer
 
Towards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsTowards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsRichard Littauer
 
A Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for LanguageA Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for LanguageRichard Littauer
 

Mehr von Richard Littauer (13)

Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
 
Named Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 PresentationNamed Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 Presentation
 
Saarland and UdS
Saarland and UdSSaarland and UdS
Saarland and UdS
 
Building Corpora from Social Media
Building Corpora from Social MediaBuilding Corpora from Social Media
Building Corpora from Social Media
 
Visualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat MapsVisualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat Maps
 
On Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossOn Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem Isogloss
 
The Evolution of Morphological Agreement
The Evolution of Morphological AgreementThe Evolution of Morphological Agreement
The Evolution of Morphological Agreement
 
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...Trends in Use of Scientific Workflows: Insights from a Public Repository and ...
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...
 
Evolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaEvolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche Kucha
 
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...
 
The Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer SimulationThe Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer Simulation
 
Towards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsTowards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in Linguistics
 
A Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for LanguageA Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for Language
 

Kürzlich hochgeladen

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Marcu 2000 presentation

  • 1. The Rhetorical Parsing of Unrestricted Texts: A Surface-based Approach Daniel Marcu (USC) ACL 2000
  • 2. Abstract This paper explores the extent to which well-formed rhetorical structures can be automatically derived by means of surface-form-based algorithms. These algorithms: identify discourse usages of cue phrases break sentences into clauses hypothesize rhetorical relations that hold among textual units produce valid rhetorical structure trees for unrestricted natural language texts. The algorithms are empirically grounded in a corpus analysis of cue phrases and rely on a first-order formalization of rhetorical structure trees.
  • 3. Abstract The algorithms are evaluated both intrinsically and extrinsically. The intrinsic evaluation assesses the resemblance between automatically and manually constructed rhetorical structure trees. The extrinsic evaluation shows that automatically derived rhetorical structures can be successfully exploited in the context of text summarization.
  • 5. Motivation Marcu explores the ground found at the intersection of: theories from a traditional, truth-based semantic perspective (weak) theories aimed at characterizing the constraints that pertain to the structure of unrestricted texts and the computational mechanisms that would enable the derivation of these structures (which are weak). Goal: Automatically build rhetorical constructions by relying only on cohesion and connectives (i.e., phrases such as for example, and, although, and however.)
  • 6. Outline 2. Foundation 3. Corpus Analysis of Cue Phrases 4. Rhetorical Parsing Algorithm 5. Evaluation 6. Related Work 7. Conclusion (Recap)
  • 7. Foundation The hypothesis that underlies this work is that connectives, cohesion, shallow processing, and a well-constrained mathematical model of valid rhetorical structure trees (RS-trees) can be used to implement algorithms that determine the elementary units of a text, i.e., the units that constitute the leaves of the RS-tree of that text; the rhetorical relations that hold between elementary units and between spans of text; the relative importance (nucleus or satellite) and the size of the spans subsumed by these rhetorical relations.
  • 8. Foundation 2.1 Determining the Elementary Units Using Connectives and Shallow Processing Pros: punctuation, connectives Cons: Difficult (consider and) Using these, elementary unit boundaries can be determined with approximately 80% accuracy.
  • 9. Foundation 2.2 Determining Rhetorical Relations Using Connectives Pros: psychologically useful, common/regular Cons: 1. Sentence vs. discourse function unclear; 2. They do not indicate scope of relation; 3. Can signal different rhetorical relations.
  • 10. Foundation 2.2.2 Discourse markers ambiguous as to scope: Rhetorical relations that hold between large textual spans can be explained in terms of similar relations that hold between their most important elementary units. (Marcu, elsewhere) Compositionality criterion for valid rhetorical structures: posits that a rhetorical structure tree is valid only if each rhetorical relation that holds between two spans is either an extended rhetorical relation or can be explained in terms of a simple rhetorical relation.
  • 11. Foundation Discussion: The more complex the text, the harder to automatically identify. “I have never come across a case in which a simple connective signaled more than one rhetorical relation.” “I have never come across an example that would require one to deal with exclusively disjunctive hypotheses [that do not overlap in their scope].”
  • 12. Foundation 2.3 Determining Rhetorical Relations Using Cohesion Pros: co-occurrence can determine thematic continuity, correlation between cohesion-defined textual segments and hierarchical, intentionally defined segments, cohesion works for smaller relations too. Cons: Marcu uses a coarse model of the relation between cohesion and rhetorical relations (to make things simpler).
  • 13. Foundation 2.4 Determining Rhetorical Structure Using a Well-Constrained Mathematical Model Uses first order logic, with these features and constraints: A valid rhetorical structure is a binary tree whose leaves denote elementary textual units. Rhetorical relations hold between textual units and spans of various sizes. These relations are paratactic or hypotactic. Each node of a rhetorical structure tree has associated a status (NUCLEUS or SATELLITE),a type (the rhetorical relation that holds between the text spans that the node spans over), and a set of promotion units.
  • 14. Foundation 2.4 Determining Rhetorical Structure Using a Well-Constrained Mathematical Model The status and type associated with each node are unique. The rhetorical relations of a valid rhetorical structure hold only between adjacent spans. There exists a span, which corresponds to the root node of the structure, that spans over the entire text. The status, type, and promotion set associated with each node reflect the compositionality criterion.
  • 16. Foundation Two problems to solve: 1. The problem of rhetorical grounding: Needs to show how starting from free, unrestricted text, connectives and cohesion can be used to automatically determine the elementary units of text and hypothesize simple, extended, and exclusively disjunctive rhetorical relations that hold between these units and spans of units. 2. The problem of rhetorical structure derivation: The rhetorical structures must be consistent with the constraints given. I refer to this as.”
  • 17. Foundation Not modeled as an incremental process in which elementary units are determined and attached to an increasingly complex RS-tree. Process: Determine elementary discourse units (edus) first That knowledge of connectives and cohesion is then used to (over-)hypothesize simple, extended, and exclusively disjunctive rhetorical relations; These hypotheses and the well-constrained model of valid RS- trees are used to determine the set of valid rhetorical interpretations that are consistent with both the mathematical model and the hypotheses.
  • 18. Corpus Analysis Prior to Marcu, no empirical data existed which could answer the question of the extent to which connectives could be used to identify elementary units and hypothesize rhetorical relations. So… corpus study. The corpus study was designed to: investigate how cue phrases can be used to identify the elementary units of texts; to determine what rhetorical relations hold between units and spans of text, the nuclearity of the units, and the sizes of the related spans. Developed his own annotation scheme.
  • 19. Corpus Analysis 450 potential markers from previous lists of potential discourse markers, cue phrases Brown corpus examples for each: 300 words or so per example. Average of 17 text fragments per cue phrase. Overall, more than 7,600 randomly selected texts.
  • 20.
  • 21. Corpus Analysis Hardcoding? “By encoding algorithmic specific information in the corpus, I only bootstrap the step that can take one from annotated data to algorithmic information.” Doesn’t preclude other methods.
  • 22. Corpus Analysis Manually analyzed 2,100 of the text fragments in the corpus. Annotated only 2,100 fragments (time). Of these: 1,197 had a discourse function; 773 were sentential; 244 were pragmatic.
  • 23. Corpus Analysis “I did not use an objective definition of elementary unit. Rather, I relied on a more intuitive one.” This corpus only used for development; testing was done on independent corpora with non- biased judges.
  • 24. Rhetorical Parsing Algorithm
  • 25. Rhetorical Parsing Algorithm
  • 26. Rhetorical Parsing Algorithm Orthographic markers, such as commas, periods, dashes, paragraph breaks, etc., play an important role in our surface-based approach to discourse processing, so are included. “By considering only cue phrases having a discourse function in most of the cases, I deliberately chose to focus more on precision than on recall with respect to the task of identifying the elementary units of text.”
  • 27. Rhetorical Parsing Algorithm Used Lex (Unix) to identify markers in text. Shallow analyzer then used to perform 11 different actions, associated with the discourse markers, as a foundation for identifying edus. NOTHING, NORMAL, COMMA, NORMAL_THEN_COMMA, END, MATCH_PARENS, COMMA_PAREN, MATCH_DASH, SET_AND/SET_OR, DUAL
  • 28. Rhetorical Parsing Algorithm
  • 29. Rhetorical Parsing Algorithm Clause-like Unit Identification On the basis of the information derived from the corpus, Marcu designed an algorithm that identifies elementary textual unit boundaries in sentences and cue phrases that have a discourse function.
  • 30. Rhetorical Parsing Algorithm
  • 31. Rhetorical Parsing Algorithm
  • 32. Rhetorical Parsing Algorithm Used: an expository text of 5,036 words from Scientific American; a magazine article of 1,588 words from Time; a narration of 583 words from the Brown corpus (segment P25:1250-1710). Manually broken into elementary units by three slaves (coli masters students)
  • 33. Rhetorical Parsing Algorithm
  • 34. Rhetorical Parsing Algorithm
  • 35. Rhetorical Parsing Algorithm 4.4.1 From Discourse Markers to Rhetorical Relations For each regex for discourse markers, six features for each possible discourse role were annotated: Status (SAT_NUCLEUS, SATELLITE_NUC, NULL) Where to link (BEFORE, AFTER) Types (CLAUSE, SENTENCE, PARAGRAPH) Rhetorical relation (CONCESSION, ELABORATION &c.) Clause distance / Sentence distance Distance to salient unit
  • 36. Rhetorical Parsing Algorithm 4.4.2 A Discourse-marker-based Algorithm for Hypothesizing Rhetorical Relations. Iterate over all textual units (sentence, clause, paragraph) For each discourse marker, the algorithm constructs an exclusively disjunctive hypothesis concerning the rhetorical relations that the marker under scrutiny may signal. The algorithm assumes that the rhetorical structure at each level can be derived by hypothesizing rhetorical relations that hold between the units at that level. It generates simple and extended relations.
  • 37. Rhetorical Parsing Algorithm
  • 38. Rhetorical Parsing Algorithm 4.4.3 A Word-co-occurrence-based Algorithm for Hypothesizing Rhetorical Relations For paragraph-level analysis, discourse markers may not be enough. Here, cohesion, using word collocations, can be used to identify ELABORATION, BACKGROUND or JOINT relations.
  • 39. Rhetorical Parsing Algorithm 4.5 A Proof-Theoretic Account of the Problem of Rhetorical Structure Derivation “Once the elementary units of a text have been determined and the rhetorical relations between them have been hypothesized at sentence, paragraph, and section levels, we need to determine the rhetorical structures that are consistent with these hypotheses and with the constraints specific to valid RS-trees. That is, we need to solve the problem of rhetorical structure derivation.”
  • 40. Rhetorical Parsing Algorithm 4.5 A Proof-Theoretic Account of the Problem of Rhetorical Structure Derivation “Once the elementary units of a text have been determined and the rhetorical relations between them have been hypothesized at sentence, paragraph, and section levels, we need to determine the rhetorical structures that are consistent with these hypotheses and with the constraints specific to valid RS-trees. That is, we need to solve the problem of rhetorical structure derivation.”
  • 41. Rhetorical Parsing Algorithm 4.5 A Proof-Theoretic Account of the Problem of Rhetorical Structure Derivation Marcu devised a proof theory to determine all valid rhetorical structures of a text. The theory consists of a set of axioms and rewriting rules that encode all possible ways in which one can derive the valid RS- trees of a text. More in Marcu (2000). Lots of ways to implement.
  • 42. Rhetorical Parsing Algorithm
  • 43. Rhetorical Parsing Algorithm Ambiguity in Discourse: Many different possibilities (as in parsing). Best are skewed to the right (style of writing, placement of important information) Weights are “computed recursively by summing up the weights of the left and right branches of a rhetorical structure and the difference between the depth of the right and left branches of the structure. Hence, the more skewed to the right a tree is, the greater its weight w is.” To disambiguate: keeps only partial structures that lead to maximal weighting
  • 44.
  • 45. Evaluation Two ways to evaluate: Compare the automatically derived trees with trees that have been built manually. Evaluate the impact that they have on the accuracy of other natural language processing tasks, such as anaphora resolution, intention recognition, or text summarization.
  • 48. Evaluation Qualitative: Good Discourse Structures at the Paragraph Level Good Discourse Structures at the Text Level, for Short Texts Good Discourse Structures for Sentences, Paragraphs, and Texts that Use Unambiguous Discourse Markers Good Discourse Structures for Sentences that Use Markers Other than And. Bad Discourse Structures for Sentences that Use the Discourse Marker Incorrectly Labeled Intentional Relations. Bad Discourse Structures for Very Large Texts
  • 49. Evaluation Summarization Algorithm: The algorithm uses the rhetorical parser described in this paper to determine the discourse structure of a text given as input, it uses the discourse structure to induce a partial ordering on the elementary units in the text, and then, depending on the desired compression rate, it selects the p most important units in the text. Corpora: 5 SA texts mentioned, and 40 articles from TREC collection.
  • 51. Related Work Corston-Oliver (1998): Syntactic information to hypothesize relations Sumita et al. (1992): Deep syntactic processing, constrained trees, no weighting or disambiguation Kurohashi and Nagao (1994) : discourse structure generator that builds discourse trees in an incremental fashion. Strube and Hahn (1999): building hierarchies of referential discourse elements. None of these were compared due to inapplicability.
  • 52. Conclusion “Quantitative and qualitative analyses of the results show that many relations can be identified correctly within this framework.” “The brightest side of the story is that the results in this paper show that the rhetorical structures derived by my parser can be used successfully in the context of text summarization.”

Hinweis der Redaktion

  1. Here we see an example of rhetorical structures in a text. The internal nodes are labeled with the names of the rhetorical relations that hold between the textual spans that are subsumed by their child nodes. Each relation between two nodes is represented graphically by means of a combination of straight lines and arcs. The material subsumed by the text span that corresponds to the starting point of an arc is subsidiary to the material subsumed by the text span that corresponds to the end point of an arc. A relation represented only by straight lines corresponds to cases in which the subsumed text spans are equally important. Text spans that subsume subsidiary information, i.e., text spans that correspond to starting points of arcs, are called satellites. All other text spans are called nuclei. Text fragments surrounded by curly brackets denote parenthetical units: their deletion does not affect the understanding of the textual unit to which they belong.
  2. 1 and 3 will be covered later.
  3. Regarding correlation: think of the earth and the moon.
  4. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  5. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  6. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  7. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  8. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  9. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  10. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  11. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  12. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  13. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  14. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  15. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  16. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  17. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  18. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  19. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  20. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  21. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  22. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  23. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  24. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  25. Last point: : if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  26. e results in Table 8 show, the rhetorical parser fails to identify a fair num- ber of elementary units (51.2% recall); but the units it identifies tend to be correct (95.9% precision). As a consequence, performance at all other levels is affected. With respect to identifying hierarchical spans, recall is about 25% lower than the average human performance; with respect to labeling the nuclear status of spans, recall is about 30% below human performance; and with respect to labeling the rhetorical relations that hold between spans, recall is about 40% below human performance. In general, the precision of the rhetorical parser comes close to the human performance level. However, since the level of granularity at which the rhetorical parser works is much coarser than that used by human judges, many sentences are assigned a much simpler structure than the structure built by humans. For example, whenever an analyst used a JOINTrelation to connect two clause-like units separated by an and,the rhetorical parser failed to identify the two units; it often treated them as a single elementary unit. As a consequence, the recall figures at all levels were significantly lower than those specific to the humans.
  27. e results in Table 8 show, the rhetorical parser fails to identify a fair num- ber of elementary units (51.2% recall); but the units it identifies tend to be correct (95.9% precision). As a consequence, performance at all other levels is affected. With respect to identifying hierarchical spans, recall is about 25% lower than the average human performance; with respect to labeling the nuclear status of spans, recall is about 30% below human performance; and with respect to labeling the rhetorical relations that hold between spans, recall is about 40% below human performance. In general, the precision of the rhetorical parser comes close to the human performance level. However, since the level of granularity at which the rhetorical parser works is much coarser than that used by human judges, many sentences are assigned a much simpler structure than the structure built by humans. For example, whenever an analyst used a JOINTrelation to connect two clause-like units separated by an and,the rhetorical parser failed to identify the two units; it often treated them as a single elementary unit. As a consequence, the recall figures at all levels were significantly lower than those specific to the humans.
  28. e results in Table 8 show, the rhetorical parser fails to identify a fair num- ber of elementary units (51.2% recall); but the units it identifies tend to be correct (95.9% precision). As a consequence, performance at all other levels is affected. With respect to identifying hierarchical spans, recall is about 25% lower than the average human performance; with respect to labeling the nuclear status of spans, recall is about 30% below human performance; and with respect to labeling the rhetorical relations that hold between spans, recall is about 40% below human performance. In general, the precision of the rhetorical parser comes close to the human performance level. However, since the level of granularity at which the rhetorical parser works is much coarser than that used by human judges, many sentences are assigned a much simpler structure than the structure built by humans. For example, whenever an analyst used a JOINTrelation to connect two clause-like units separated by an and,the rhetorical parser failed to identify the two units; it often treated them as a single elementary unit. As a consequence, the recall figures at all levels were significantly lower than those specific to the humans.
  29. e results in Table 8 show, the rhetorical parser fails to identify a fair num- ber of elementary units (51.2% recall); but the units it identifies tend to be correct (95.9% precision). As a consequence, performance at all other levels is affected. With respect to identifying hierarchical spans, recall is about 25% lower than the average human performance; with respect to labeling the nuclear status of spans, recall is about 30% below human performance; and with respect to labeling the rhetorical relations that hold between spans, recall is about 40% below human performance. In general, the precision of the rhetorical parser comes close to the human performance level. However, since the level of granularity at which the rhetorical parser works is much coarser than that used by human judges, many sentences are assigned a much simpler structure than the structure built by humans. For example, whenever an analyst used a JOINTrelation to connect two clause-like units separated by an and,the rhetorical parser failed to identify the two units; it often treated them as a single elementary unit. As a consequence, the recall figures at all levels were significantly lower than those specific to the humans.
  30. e results in Table 8 show, the rhetorical parser fails to identify a fair num- ber of elementary units (51.2% recall); but the units it identifies tend to be correct (95.9% precision). As a consequence, performance at all other levels is affected. With respect to identifying hierarchical spans, recall is about 25% lower than the average human performance; with respect to labeling the nuclear status of spans, recall is about 30% below human performance; and with respect to labeling the rhetorical relations that hold between spans, recall is about 40% below human performance. In general, the precision of the rhetorical parser comes close to the human performance level. However, since the level of granularity at which the rhetorical parser works is much coarser than that used by human judges, many sentences are assigned a much simpler structure than the structure built by humans. For example, whenever an analyst used a JOINTrelation to connect two clause-like units separated by an and,the rhetorical parser failed to identify the two units; it often treated them as a single elementary unit. As a consequence, the recall figures at all levels were significantly lower than those specific to the humans.
  31. e results in Table 8 show, the rhetorical parser fails to identify a fair num- ber of elementary units (51.2% recall); but the units it identifies tend to be correct (95.9% precision). As a consequence, performance at all other levels is affected. With respect to identifying hierarchical spans, recall is about 25% lower than the average human performance; with respect to labeling the nuclear status of spans, recall is about 30% below human performance; and with respect to labeling the rhetorical relations that hold between spans, recall is about 40% below human performance. In general, the precision of the rhetorical parser comes close to the human performance level. However, since the level of granularity at which the rhetorical parser works is much coarser than that used by human judges, many sentences are assigned a much simpler structure than the structure built by humans. For example, whenever an analyst used a JOINTrelation to connect two clause-like units separated by an and,the rhetorical parser failed to identify the two units; it often treated them as a single elementary unit. As a consequence, the recall figures at all levels were significantly lower than those specific to the humans.
  32. e results in Table 8 show, the rhetorical parser fails to identify a fair num- ber of elementary units (51.2% recall); but the units it identifies tend to be correct (95.9% precision). As a consequence, performance at all other levels is affected. With respect to identifying hierarchical spans, recall is about 25% lower than the average human performance; with respect to labeling the nuclear status of spans, recall is about 30% below human performance; and with respect to labeling the rhetorical relations that hold between spans, recall is about 40% below human performance. In general, the precision of the rhetorical parser comes close to the human performance level. However, since the level of granularity at which the rhetorical parser works is much coarser than that used by human judges, many sentences are assigned a much simpler structure than the structure built by humans. For example, whenever an analyst used a JOINTrelation to connect two clause-like units separated by an and,the rhetorical parser failed to identify the two units; it often treated them as a single elementary unit. As a consequence, the recall figures at all levels were significantly lower than those specific to the humans.