Robust Constituent-to-Dependency Conversion for English

The 9th International Workshop on Treebanks and Linguistic Theories

Jinho D. Choi & Martha Palmer
University of Colorado at Boulder
December 3rd, 2010
Dependency Structure
• What is dependency?
  - Syntactic or semantic relation between a pair of words.
    [Example dependencies over "places / events in this city / year", with labels LOC, TMP, PMOD, NMOD]
• Phrase structure vs. dependency structure
  - Constituents vs. dependencies
    [Example: phrase structure tree vs. dependency tree for "He bought a car" (dependency relations SBJ, OBJ, NMOD)]
Dependency Graph
• For a sentence s = w1 … wn, a dependency graph Gs = (Vs, Es)
  - Vs = {w0 = root, w1, …, wn}
  - Es = {(wi, r, wj) : wi ≠ wj, wi ∈ Vs, wj ∈ Vs − {w0}, r ∈ Rs}
  - Rs = the set of all dependency relations in s

• A well-formed dependency graph
  - Unique root, single head, connected, acyclic ➜ dependency tree
  - Projective vs. non-projective.
    [Examples: projective — "He bought a car that is red"; non-projective — "He bought a car yesterday that is red", where the arc from "car" to the relative clause crosses "yesterday". A checking sketch follows.]
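The conditions above can be checked mechanically. Below is a minimal, illustrative Python sketch (not from the paper or its tool; all names are invented) that verifies the unique-root, single-head, acyclic, and connected conditions and tests projectivity, given the head index of each token as in the definition above (0 = artificial root):

```python
# Minimal illustrative sketch (not the paper's tool): a dependency tree over
# tokens w1..wn is given as heads[i-1] = head index of wi (0 = artificial root).

def is_well_formed(heads):
    n = len(heads)
    roots = [i for i, h in enumerate(heads, 1) if h == 0]
    if len(roots) != 1:                       # unique root
        return False
    for i, h in enumerate(heads, 1):          # single head, no self-dependency
        if not 0 <= h <= n or h == i:
            return False
    for i in range(1, n + 1):                 # acyclic + connected:
        seen, j = set(), i                    # every token must reach the root
        while j != 0:
            if j in seen:
                return False                  # cycle detected
            seen.add(j)
            j = heads[j - 1]
    return True

def is_projective(heads):
    # Non-projectivity = two crossing arcs (one arc's span strictly
    # interleaves with another's).
    arcs = [(h, d) for d, h in enumerate(heads, 1)]
    for h1, d1 in arcs:
        lo1, hi1 = sorted((h1, d1))
        for h2, d2 in arcs:
            lo2, hi2 = sorted((h2, d2))
            if lo1 < lo2 < hi1 < hi2:
                return False
    return True

# "He bought a car yesterday that is red": the relative clause (headed by "is")
# modifies "car" across "yesterday", so the tree is non-projective.
heads = [2, 0, 4, 2, 2, 7, 4, 7]              # He bought a car yesterday that is red
print(is_well_formed(heads), is_projective(heads))   # True False
```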
Why Constituent-to-Dependency?
•   Dependency Treebanks in English
    -   Prague English Dependency Treebank (≈ 0.28M words)

    -   ISLE English Dependency Treebank (?)

•   Constituent Treebanks in English
    -   The Penn Treebank (> 1M words)

    -   The OntoNotes Treebank (> 2M words)

    -   The Genia Treebank (≈ 0.48M words)

    -   The Craft Treebank (≈ 0.79M words)

•   By performing the conversion, we get larger corpora with
    greater diversity.

Previous Conversion Tools
•   Constituent-to-dependency conversion tools
    -   Penn2Malt.

    -   LTH conversion tool (Johansson and Nugues, 2007).

    -   Stanford dependencies (de Marneffe et al., 2006).

•   LTH conversion tool
    -   Used for CoNLL 2007 - 2009.

    -   Generates semantic dependencies from function tags and
        non-projective dependencies using empty categories.

    -   Customized for the original Penn Treebank.
        •   The Penn Treebank-style phrase structure has since undergone some changes.


Changes in Penn Treebank Style Phrase Structure
 •   Tokenized hyphenated words, inserted NML phrases.
     [Example: "New York-based" — formerly (ADJP (NNP New) (JJ York-based)); now tokenized as (ADJP (NML (NNP New) (NNP York)) (HYPH -) (VBN based))]
 •   Introduced some new phrase/token-level tags.
     [Example: "He met these people, this group of people" — the repaired material "these people" is annotated as an EDITED phrase under the VP]
Updated Conversion Tool
•   Motivations
    -   The conversion tool needs to be updated as the phrase
        structure format changes.

    -   The conversion tool needs to perform robustly across
        different corpora.

    -   The conversion tool may generate dependency trees with
        empty categories.

•   Contributions
    -   Fewer unclassified dependencies.

    -   Fewer unnecessary non-projective dependencies.

    -   More robust parsing performance across different corpora.

Constituent-to-Dependency Conversion
• Conversion steps
  1. Use head-percolation rules to find the head of each
     constituent, and make it the parent of all other nodes in the
     constituent.
  2. For certain empty categories (e.g., *T*, *ICH*), make their
     antecedents children of the empty categories’ parents.
  3. Label all dependencies by comparing relations between all
     head-dependent pairs.
• Head-percolation rules
  • A set of rules that defines the head of each constituent.
  • e.g., the head of a noun phrase ::= the rightmost noun.
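Step 3 above assigns labels by inspecting each head–dependent pair, largely driven by function tags and constituent labels. The sketch below only shows the flavor of such rules; these particular heuristics are a simplified illustration, not the tool's actual label set.

```python
# Much-simplified labeling heuristics (illustrative only, not the tool's rules).
def label_dependency(head_pos, dep_label, dep_function_tags):
    """head_pos: POS tag of the head word; dep_label: constituent label of the
    dependent; dep_function_tags: function tags on the dependent (e.g. {'SBJ'})."""
    if "SBJ" in dep_function_tags:
        return "SBJ"
    if "TMP" in dep_function_tags:
        return "TMP"
    if "LOC" in dep_function_tags:
        return "LOC"
    if head_pos.startswith("VB") and dep_label == "NP":
        return "OBJ"
    if head_pos.startswith("NN"):
        return "NMOD"
    return "DEP"          # unclassified fallback

print(label_dependency("VBD", "NP", set()))       # OBJ
print(label_dependency("NN", "DT", set()))        # NMOD
print(label_dependency("VBD", "NP", {"SBJ"}))     # SBJ
```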

Head-percolation Rules
[Table of head-percolation rules (one entry per constituent label) omitted in this export; a sketch of how such rules are applied follows.]
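The rule table itself did not survive this export, but the mechanism it drives is simple: each constituent label maps to a search direction plus an ordered list of preferred child labels. The Python sketch below shows how such rules are typically applied; the rule entries (NP, VP, PP) are illustrative placeholders, not the tool's actual rule set.

```python
# Illustrative head-percolation sketch (the rule entries are placeholders,
# not the actual rules used by the conversion tool).
# Each rule: (search direction, ordered list of preferred child labels).
HEAD_RULES = {
    "NP": ("r", ["NN", "NNS", "NNP", "NNPS", "NML", "NP"]),   # rightmost noun
    "VP": ("l", ["VBD", "VBZ", "VBP", "VB", "VBN", "VBG", "MD", "VP"]),
    "PP": ("l", ["IN", "TO", "PP"]),
}

def find_head(label, children):
    """children: list of (child_label, child_node); returns the head child."""
    direction, priorities = HEAD_RULES.get(label, ("r", []))
    ordered = children if direction == "l" else list(reversed(children))
    for tag in priorities:                  # try preferred tags in order
        for child_label, child in ordered:
            if child_label == tag:
                return child
    return ordered[0][1]                    # default: first child in the search direction

# During conversion, every non-head node in the constituent is attached to the
# head found here; heads percolate bottom-up to the root of the tree.
print(find_head("NP", [("DT", "a"), ("JJ", "red"), ("NN", "car")]))   # car
```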
Constituent-to-Dependency Conversion
[Conversion example for "How far do you expect to run": the trace *T*-1 under the VP headed by "run" is co-indexed with WHNP-1 "How far", so the antecedent's head "far" becomes a child of "run" rather than of "do", yielding a non-projective dependency. The final dependency tree contains no empty categories.]
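Step 2 of the conversion is what introduces the non-projective arc in this example. The sketch below illustrates the idea with invented data structures (it is not the tool's code): once the trace *T*-1 is identified, the head word of its antecedent WHNP-1 ("far"), which would otherwise attach near the top of the question, is re-attached to the word governing the trace ("run").

```python
# Illustrative sketch of conversion step 2 (invented representation, not the tool's code).
# Tokens of "How far do you expect to run"; heads[word] = head word.
# Heads before trace handling (simplified): the antecedent "far" sits under "do".
heads = {"How": "far", "far": "do", "do": "root",
         "you": "do", "expect": "do", "to": "run", "run": "expect"}

def reattach_trace_antecedent(heads, antecedent_head, trace_parent):
    """Make the antecedent's head word a dependent of the trace's parent word."""
    heads[antecedent_head] = trace_parent

# Co-indexing recovered from the treebank: WHNP-1 "How far" <-> *T*-1,
# where the trace sits inside the VP headed by "run".
reattach_trace_antecedent(heads, antecedent_head="far", trace_parent="run")
print(heads["far"])   # 'run' -- the arc run -> far crosses "do you expect": non-projective
```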
Small Clauses
   •   Object predicates in adjectival small clauses
       -   LTH tool: direct children of the main verbs.

        -   Ours: direct children of the subject nouns.
[Example: "He made us happy", where the small clause S-1 contains NP-SBJ "us" and ADJP-PRD "happy" (PropBank: "He" = ARG0, the small clause = ARG1 of "made"). LTH attaches "happy" to "made" as OPRD; ours attaches "happy" to "us" as PRD.]
Coordination
•   A phrase contains coordination if
    -   It contains a conjunction (CC) or a conjunction phrase (CONJP).

    -   It is tagged as an unlike coordinated phrase (UCP).

    -   It contains a child annotated with the function tag ETC.

•   Find correct left and right conjunct pairs (see the sketch below).
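The three conditions above can be written as a single predicate over a constituent node. The sketch below is illustrative only; the Node class and helper names are invented, not the tool's API.

```python
# Illustrative sketch of the coordination test above (invented node structure).
class Node:
    def __init__(self, label, function_tags=(), children=()):
        self.label = label                        # e.g. "NP", "VP", "UCP", "CC"
        self.function_tags = set(function_tags)   # e.g. {"SBJ"}, {"ETC"}
        self.children = list(children)

def contains_coordination(phrase):
    if phrase.label == "UCP":                     # unlike coordinated phrase
        return True
    for child in phrase.children:
        if child.label in ("CC", "CONJP"):        # conjunction or conjunction phrase
            return True
        if "ETC" in child.function_tags:          # child tagged with ETC
            return True
    return False

vp = Node("VP", children=[Node("VP"), Node("CC"), Node("ADVP"), Node("VP")])
print(contains_coordination(vp))   # True
```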




Coordination
[Figure: dependency analyses of "We sold old books and then bought new books" (VP coordination: VP CC ADVP VP), illustrating how the left conjunct headed by "sold" and the right conjunct headed by "bought" are paired, with "and" and "then" in between.]
Gapping Relations
•   Parsing gapping relations is hard.
    -    It is hard to distinguish them from coordination.

    -    There are not many instances to train with.
[Example: "Some said Putin visited in April, some said May" — in the second clause the embedded material is gapped; its remnants are co-indexed with their correlates in the first clause (NP=1 ↔ NP-1, NP=2 ↔ NP-2).]
Gapping Relations
[Figure: two dependency analyses of the sentence above. One labels the remnants of the gapped clause with composite relations (GAP-SBJ, GAP-PMOD) and DEP; the other keeps their ordinary labels (SBJ, TMP) and marks the gapping with a single GAP relation between the clauses.]
Empty Category Mappings
•   *RNR* (right node raising)

    -   At least two *RNR* nodes are co-indexed with the same
        antecedent.

    -   Map the antecedent to its closest *RNR* node.
[Example: "I know his admiration for and trust in you" — both *RNR*-1 traces (after "for" and after "in") are co-indexed with NP-1 "you". The figure contrasts the LTH attachment of "you" with ours, which maps "you" to its closest *RNR* trace (the one under "in").]
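A hedged sketch of the heuristic above (data layout and names invented for illustration): given the antecedent's token position and the positions of its co-indexed *RNR* traces, pick the parent of the nearest trace as the antecedent's head.

```python
# Illustrative sketch (invented layout): attach a right-node-raised antecedent
# to the parent word of its closest co-indexed *RNR* trace.
def attach_rnr_antecedent(antecedent_pos, traces):
    """traces: list of (trace_position, trace_parent_word); returns the chosen head word."""
    _, closest_parent = min(traces, key=lambda t: abs(t[0] - antecedent_pos))
    return closest_parent

# "I know his admiration for *RNR*-1 and trust in *RNR*-1 you":
# traces at positions 6 (under "for") and 10 (under "in"); antecedent "you" at 11.
print(attach_rnr_antecedent(11, [(6, "for"), (10, "in")]))   # 'in'
```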
Experiments
•   Corpora
    -   OntoNotes v4.0.

           EBC       EBN     SIN       XIN     WEB       WSJ      ALL
    Train 14,873    11,968   7,259     3,156   13,419   12,311   62,986
    Eval. 1,291      1,339   1,066     1,296    1,172    1,381    7,545
    Avg. 15.21       19.49   23.36     29.77    22.01    24.03    21.02

•   Dependency parsers
    -   MaltParser: swap-lazy algorithm, LibLinear

    -   MSTParser: Chu-Liu-Edmonds algorithm, MIRA

    -   ClearParser: shift-eager algorithm, LibLinear


Constituent-to-Dependency Conversion
•   Distributions of unclassified dependencies (in %)
           EBC    EBN    SIN         XIN      WEB      WSJ      ALL
     LTH   4.77   1.51   1.16        1.63     1.93     1.93     2.20
     Our   0.86   0.57   0.33        0.44     1.03     0.25     0.60

•   Distributions of non-projective dependencies (in %; Dep = per dependency, Sen = per sentence)

           EBC EBN         SIN         XIN     WEB       WSJ      ALL
 LTH - Dep 1.44 0.81       0.70        0.29    0.95      0.51     0.82
 Our - Dep 1.29 0.73       0.69        0.21    0.83      0.46     0.73
 LTH - Sen 11.14 8.66      8.47        5.30    11.29     7.27     9.27
 Our - Sen 9.19 7.39       8.22        3.75     9.02     6.24     7.78



Constituent-to-Dependency Conversion
•   Parsing accuracy when trained and tested on the
    same corpora (in %)
            EBC EBN SIN XIN WEB WSJ ALL
 Malt - LTH 82.91 86.38 86.20 84.61 85.10 86.93 85.44
 Malt - Our 83.20 86.40 86.03 84.85 85.45 87.40 85.65
 MST - LTH 81.64 85.47 85.02 84.10 84.05 85.93 84.49
 MST - Our 82.54 85.68 85.11 83.85 84.03 86.43 84.69
 Clear - LTH 83.36 86.32 86.80 85.50 85.53 87.15 85.88
 Clear - Our 84.06 86.77 86.55 85.41 85.70 87.58 86.09




Constituent-to-Dependency Conversion
•   Parsing accuracy when trained and tested on
    different corpora (in %)
            EBC EBN SIN XIN WEB WSJ ALL
 Malt - LTH 74.80 82.40 81.74 79.39 80.42 80.59 80.01
 Malt - Our 75.60 83.05 81.81 81.46 80.81 81.17 80.85
 MST - LTH 76.65 82.45 82.29 80.46 80.64 80.02 80.49
 MST - Our 77.20 83.06 82.52 80.88 80.82 81.04 81.01
 Clear - LTH 76.37 83.16 83.53 81.29 81.83 81.29 81.36
 Clear - Our 77.14 84.16 83.66 82.45 82.26 82.32 82.16

      All parsers gave significantly more accurate results with our
      conversion when trained and tested on different corpora.


Conclusion
•   Aims
    -   Updated the conversion tool with respect to the changes in
        Penn Treebank style phrase structure.

    -   Robust conversion across different corpora.

•   Contributions
    -   Fewer unclassified dependencies.

    -   Fewer unnecessary non-projective dependencies.

    -   More robust parsing performance across different corpora.

•   ClearParser open-source project
    -   http://code.google.com/p/clearparser/

Acknowledgements
•   Special thanks to Joakim Nivre for helpful insights.
•   We gratefully acknowledge the support of the National
    Science Foundation Grants CISE-CRI-0551615, Towards
    a Comprehensive Linguistic Annotation, and CISE-CRI-
    0709167, Collaborative: A Multi-Representational and
    Multi-Layered Treebank for Hindi/Urdu, and a grant from
    the Defense Advanced Research Projects Agency
    (DARPA/IPTO) under the GALE program, DARPA/CMO
    Contract No. HR0011-06-C-0022, subcontract from
    BBN, Inc. Any opinions, findings, and conclusions or
    recommendations expressed in this material are those
    of the authors and do not necessarily reflect the views
    of the National Science Foundation.


