Dependency Analysis of Abstract Universal Structures in Korean and English
Jayeol Chun
Contents
1. Thesis Road Map
2. Background Part 1: [Constituency & Dependency Grammar]
3. Constituent-to-Dependency Conversion
4. Universal Dependency Treebanks in Korean
5. Background Part 2: [Predicate-Argument Structure & AMR]
6. PropBank-Augmented OntoNotes Corpus
7. Contributions
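Thesis Road Map
´ [Figure: Parsing divides into Syntactic Parsing (Constituency, Dependency) and Semantic Parsing (Semantic Role Labeling, AMR); PropBank and a “Korean AMR ..?” node also appear, and a legend marks stages as “done” or “in progress”]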
Constituency (Phrase Structure)
´ Constituent: a word or a phrase that acts as a single grammatical unit
´ Root: the top node of the tree
´ Terminals: the words at the leaves
´ Non-Terminals: the phrase and POS labels on the internal nodes
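To make these terms concrete, here is a minimal Python sketch that stores a constituency tree as nested tuples (an assumed representation, not a treebank file format) and lists its root, terminals, and non-terminals. The tree for “Michael played the guitar” is hand-built with Penn Treebank-style labels; POS tags are counted as non-terminals because they label internal (pre-terminal) nodes.

```python
# (label, [children]) for phrases, (POS, word) for pre-terminals.
TREE = ("S", [
    ("NP", [("NNP", "Michael")]),
    ("VP", [
        ("VBD", "played"),
        ("NP", [("DT", "the"), ("NN", "guitar")]),
    ]),
])

def is_preterminal(node):
    # A pre-terminal's second element is a word (str), not a list of children.
    return isinstance(node[1], str)

def terminals(node):
    """Words at the leaves, left to right."""
    if is_preterminal(node):
        return [node[1]]
    return [w for child in node[1] for w in terminals(child)]

def nonterminals(node):
    """Labels of all internal nodes (phrases and POS tags) in pre-order."""
    if is_preterminal(node):
        return [node[0]]
    return [node[0]] + [lab for child in node[1] for lab in nonterminals(child)]

print("root:", TREE[0])
print("terminals:", terminals(TREE))
print("non-terminals:", nonterminals(TREE))
```

The root is simply the label of the outermost tuple; every other label is a non-terminal somewhere below it.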
Dependency
´ Dependency: a directed arc that establishes a head-child relation between two nodes
´ The dependency label describes the child’s role in relation to the head
´ Can represent languages with flexible word order
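Well-Formed Dependency Graphs
´ [Figure: an arc labeled dep pointing from the head to the child]
1. Unique Root: a single root node with no incoming arcs
2. Single Head: every other node has exactly one head
3. Connected: every node is reachable from the root
4. Acyclic: no node is its own ancestor
5. Projective: every word between a head and its dependent is a descendant of that head (no crossing arcs)
Source: Jurafsky, D.; Martin, J. H., Speech and Language Processing: Dependency Parsing, Ch. 14, p. 5 [2]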
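The five properties can be checked mechanically. A minimal Python sketch, assuming arcs are given as (head, dependent, label) triples over token ids 1..n with an artificial ROOT node 0, so “unique root” becomes “exactly one word attaches to ROOT”. Projectivity is only tested once the graph is already a tree, since dominance is undefined otherwise.

```python
from collections import defaultdict

def check_well_formed(n, arcs):
    """Check the five properties for tokens 1..n; id 0 is an artificial ROOT node."""
    heads, children = defaultdict(list), defaultdict(list)
    for h, d, _ in arcs:
        heads[d].append(h)
        children[h].append(d)

    # Unique root: exactly one word attaches to the artificial ROOT node.
    unique_root = len(children[0]) == 1
    # Single head: every word has exactly one head, and ROOT has none.
    single_head = all(len(heads[d]) == 1 for d in range(1, n + 1)) and not heads[0]

    # Connected: every word is reachable from ROOT along head -> dependent arcs.
    seen, stack = {0}, [0]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    connected = seen == set(range(n + 1))

    # Acyclic: Kahn's algorithm removes every node only if there is no cycle.
    indegree = {v: len(heads[v]) for v in range(n + 1)}
    queue = [v for v, deg in indegree.items() if deg == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    acyclic = removed == n + 1

    # Projective: each head dominates every word between itself and its dependent.
    projective = False
    if single_head and connected and acyclic:
        head_of = {d: hs[0] for d, hs in heads.items() if hs}

        def dominates(h, k):
            while k != h and k != 0:
                k = head_of[k]
            return k == h

        projective = all(dominates(h, k) for h, d, _ in arcs
                         for k in range(min(h, d) + 1, max(h, d)))

    return {"unique_root": unique_root, "single_head": single_head,
            "connected": connected, "acyclic": acyclic, "projective": projective}

# "Michael played the guitar": 1 Michael, 2 played, 3 the, 4 guitar
print(check_well_formed(4, [(2, 1, "nsubj"), (0, 2, "root"), (4, 3, "det"), (2, 4, "obj")]))
```

For this sentence all five checks hold; attaching word 1 to word 4 instead (with word 4 under word 2) would keep the first four properties but break projectivity.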
Korean
´ One of the morphologically rich languages
´ Morphology: the study of how words are formed
´ Morphological analysis: kamsahamnida (thank you) -> kamsa (thank) + ha (verbalize) + mnida (ending marker)
´ Several large constituency treebanks exist
´ Q: What about dependency?
´ Relatively free word order
´ Morphemes encode syntactic function as well as the meaning of words
´ Lack of large, publicly available dependency corpora
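Because each Korean token bundles several morphemes, a treebank token has to be split before its syntactic function can be read off. A minimal Python sketch, assuming the '+'-separated morpheme and tag format of the Penn Korean Treebank tokens in the example tree on the Head-Percolation Rules slide; the final tag of each token (a particle such as PCA or PAU, or a verbal ending such as EFN) is where much of the syntactic information sits.

```python
def analyze(token, tags):
    """Pair each morpheme of a '+'-separated token with its '+'-separated POS tag."""
    morphemes, pos = token.split("+"), tags.split("+")
    if len(morphemes) != len(pos):
        raise ValueError(f"{token!r} and {tags!r} have different morpheme counts")
    return list(zip(morphemes, pos))

for token, tags in [("삼성+측+은", "NPR+NNX+PAU"),
                    ("논평+을", "NNC+PCA"),
                    ("거부+하+었+다", "NNC+XSV+EPF+EFN")]:
    print(analyze(token, tags))
```

Note that joining the morphemes does not always give the surface form: 거부+하+었+다 is written 거부했다, because 하 and 었 contract.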
Manual Annotation?
Approach
´ Leverage the large annotated constituency treebanks
´ Convert the constituent trees into dependency trees!
Constituent-to-Dependency Conversion [1]
0. Redirect Dependencies for Empty Categories (if they exist)
1. Establish Head-Child Dependency relations using Head-Percolation Rules
2. Infer Dependency Labels using Linguistic Heuristics
Empty Categories
´ Characteristic of treebanks annotated in the Penn Treebank [3] style
´ e.g., OntoNotes [4], Penn Korean Treebank [5]
´ Unpronounced units that indicate the location of their antecedent syntactic elements
´ Enable the representation of long-distance dependencies
´ Often break projectivity
Types of Empty Categories in the Penn Korean Treebank
1. Trace (*T*): an argument moved in front of the subject leaves a trace in its original position, a pointer to the index of the antecedent in the tree
´ Trace mapping
2. Ellipsis (*?*): a dropped predicate in a matrix clause or a clausal coordination
´ Heuristics identify the location of the shared predicate
3. Empty Assignment (*pro*): dropped arguments
4. Empty Operator (*op*): relative clauses
[Figures: “After Wh-Movement”; *?*]
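Trace mapping starts by pairing each trace with its co-indexed antecedent. A minimal Python sketch, assuming Penn Treebank-style co-indexing (a phrase label suffixed with -N and a trace *T*-N) on a single-line bracketed tree. The English example tree is hypothetical, and the sketch stops at finding antecedents; redirecting the dependency itself is not shown.

```python
import re

# Hypothetical tree for "What did you buy?": the object of "buy" has moved,
# leaving the trace *T*-1, which is co-indexed with WHNP-1.
TREE = ("(SBARQ (WHNP-1 (WP What)) (SQ (VBD did) (NP-SBJ (PRP you)) "
        "(VP (VB buy) (NP (-NONE- *T*-1)))) (. ?))")

def map_traces(tree):
    """Map each trace index to its co-indexed antecedent label (None if missing)."""
    # A phrase label ending in -<digits>, e.g. WHNP-1 or NP-SBJ-2, marks an antecedent.
    antecedents = {m.group(2): f"{m.group(1)}-{m.group(2)}"
                   for m in re.finditer(r"\(([^\s()]+?)-(\d+)\s", tree)}
    return {idx: antecedents.get(idx) for idx in re.findall(r"\*T\*-(\d+)", tree)}

print(map_traces(TREE))
```

Gapping indices written with = and other empty-category types (*?*, *pro*, *op*) would need their own handling.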
Head-Percolation Rules
(S (ADCP (ADC 반면[Meanwhile]))
(S (NP-SBJ (NPR+NNX+PAU 삼성+측+은[Samsung]))
(VP (NP-OBJ (NNC+PCA 논평+을[to comment]))
(VV (NNC+XSV+EPF+EFN 거부+하+었+다[refused]))))
(SFN .))
´ For every node in the tree, the head is located by iterating through the node’s immediate children and matching their POS tags against a priority list whose entries are delimited by ;
´ r: iterate from right to left (Korean is a head-final language)
´ A terminal node’s head is itself
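The procedure can be sketched on the example tree. This Python sketch assumes a priority-outer, children-inner matching order and uses a small set of hypothetical rules written in the slide’s format (direction, then a ;-delimited list); they are not the actual rules of [1]. Function tags are stripped and only the first morpheme tag of a compound POS is matched, both simplifying assumptions.

```python
# Example tree as nested tuples: (label, [children]) or (POS, word).
TREE = ("S", [
    ("ADCP", [("ADC", "반면")]),
    ("S", [
        ("NP-SBJ", [("NPR+NNX+PAU", "삼성+측+은")]),
        ("VP", [
            ("NP-OBJ", [("NNC+PCA", "논평+을")]),
            ("VV", [("NNC+XSV+EPF+EFN", "거부+하+었+다")]),
        ]),
    ]),
    ("SFN", "."),
])

# Hypothetical rules: direction ("r" = right to left) and ';'-delimited priorities.
HEAD_RULES = {"S": ("r", "VP;VV;S"), "VP": ("r", "VV;VP"), "NP": ("r", "NP;NNC;NPR")}

def base(label):
    """Strip function tags (NP-SBJ -> NP) and keep the first morpheme tag (NNC+PCA -> NNC)."""
    return label.split("-")[0].split("+")[0]

def head_index(label, children):
    """Try each priority tag in order, scanning the children in the rule's direction."""
    direction, priority = HEAD_RULES.get(base(label), ("r", ""))
    order = list(range(len(children)))
    if direction == "r":
        order.reverse()
    for tag in filter(None, priority.split(";")):
        for i in order:
            if base(children[i][0]) == tag:
                return i
    return order[0]  # no rule matched: first child in the scan direction

def convert(tree):
    """Return (tokens, arcs) with arcs as (head_id, dependent_id); 0 is the root."""
    tokens, arcs = [], []

    def visit(node):
        label, body = node
        if isinstance(body, str):  # a terminal's head is itself
            tokens.append(body)
            return len(tokens)     # 1-based token id
        heads = [visit(child) for child in body]
        h = head_index(label, body)
        arcs.extend((heads[h], dep) for i, dep in enumerate(heads) if i != h)
        return heads[h]

    arcs.append((0, visit(tree)))
    return tokens, sorted(arcs, key=lambda arc: arc[1])

tokens, arcs = convert(TREE)
for head, dep in arcs:
    print(tokens[dep - 1], "<-", "ROOT" if head == 0 else tokens[head - 1])
```

With these rules every other word attaches to the final verb 거부+하+었+다, which becomes the root, as expected for a head-final clause.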
Dependency Label Inference
´ Linguistic heuristics based on:
´ Morphological analysis of the head and the dependent
´ POS tags
´ Word forms
´ Function tags
´ Function Tags
´ Annotated in Penn Treebank-style treebanks
´ Provide additional syntactic / semantic information
´ Ex) NP-SBJ -> the NP (noun phrase) is the subject of a clause or sentence
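A minimal Python sketch of how such heuristics might be layered: function tags first, then the dependent’s final morpheme, then its POS tag. Every table entry below is a hypothetical illustration with UD-style label names, not the rule set of [1], and for brevity only the dependent (not the head) is inspected.

```python
# Hypothetical heuristics, tried in order.
FUNCTION_TAG_LABELS = {"SBJ": "nsubj", "OBJ": "obj"}
PARTICLE_LABELS = {("을", "PCA"): "obj", ("를", "PCA"): "obj",
                   ("이", "PCA"): "nsubj", ("가", "PCA"): "nsubj"}
POS_LABELS = {"SFN": "punct", "ADC": "advmod"}

def infer_label(phrase_label, morphemes, tags):
    """Guess a dependent's label from its phrase label, morphemes, and POS tags."""
    # 1. Function tags, e.g. NP-SBJ -> SBJ
    for ftag in phrase_label.split("-")[1:]:
        if ftag in FUNCTION_TAG_LABELS:
            return FUNCTION_TAG_LABELS[ftag]
    # 2. Morphological analysis: the dependent's final morpheme (often a particle)
    last = (morphemes.split("+")[-1], tags.split("+")[-1])
    if last in PARTICLE_LABELS:
        return PARTICLE_LABELS[last]
    # 3. POS tag of the final morpheme; "dep" when nothing applies
    return POS_LABELS.get(last[1], "dep")

print(infer_label("NP-SBJ", "삼성+측+은", "NPR+NNX+PAU"))  # function tag
print(infer_label("NP", "논평+을", "NNC+PCA"))             # object particle
print(infer_label("SFN", ".", "SFN"))                      # POS fallback
```

The ordering reflects the slide’s point that function tags, when present, give the most direct evidence; a treebank without them (such as the KAIST Treebank discussed later) must rely on the weaker fallbacks.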
Universal Dependencies [6]
´ An effort to create a consistent annotation scheme across languages
´ Encourages multilingual parsing experiments and comparative analysis
´ Defines universal POS and dependency label tagsets
´ Suggests a universal way of annotating certain sentence constructions, but allows room for language-specific extensions
´ Ex) Coordination
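Coordination is a good example of a construction whose UD treatment changed between versions. In UD v2, later conjuncts attach to the first conjunct with conj, and the coordinating conjunction (cc) attaches to the conjunct that follows it; in UD v1, cc attached to the first conjunct. A minimal Python sketch that reads a hand-written English sentence in CoNLL-U column order and checks the v2 convention; real CoNLL-U is tab-separated, spaces are used here only for readability, and multiword ranges and empty nodes are not handled.

```python
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
CONLLU = """
1  I        I       PRON   _  _  2  nsubj  _  _
2  like     like    VERB   _  _  0  root   _  _
3  apples   apple   NOUN   _  _  2  obj    _  _
4  and      and     CCONJ  _  _  5  cc     _  _
5  oranges  orange  NOUN   _  _  3  conj   _  _
6  .        .       PUNCT  _  _  2  punct  _  _
"""

def read_conllu(text):
    """Read basic token lines, skipping blanks and comments."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        rows.append({"id": int(cols[0]), "form": cols[1],
                     "head": int(cols[6]), "deprel": cols[7]})
    return rows

def follows_ud2_coordination(rows):
    """conj must point back to an earlier conjunct; cc must attach to the next conjunct."""
    by_id = {r["id"]: r for r in rows}
    for r in rows:
        if r["deprel"] == "conj" and r["head"] > r["id"]:
            return False
        if r["deprel"] == "cc":
            head = by_id.get(r["head"])
            if head is None or head["id"] < r["id"] or head["deprel"] != "conj":
                return False
    return True

print(follows_ud2_coordination(read_conllu(CONLLU)))
```

Re-attaching “and” to “apples” (the v1 analysis) makes the check fail, which is the kind of difference an outdated treebank has to be converted across.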
The Google UD Korean Treebank
´ McDonald et al. [10] released a UD Korean Treebank of 6K sentences
´ Issues:
´ Coarse tokenization of suffixes, particles, and punctuation marks
´ Outdated annotation scheme
´ Our approach:
´ Perform a systematic conversion, including re-tokenization, to match the latest guidelines
´ Shown step by step in the figures:
1. Morphological Analysis
2. Re-Tokenization
3. Head ID Remapping
4. Dependency Re-Labeling
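Steps 2 to 4 interact: once a token is split, every head ID after it shifts. A minimal Python sketch under simplifying assumptions: the sentence, its labels, and the (form, head, label) format are hypothetical; only trailing punctuation is split off (the real conversion also splits suffixes and particles); and the relabeling table holds just two renamings from UD v1 to v2 (dobj to obj, nsubjpass to nsubj:pass) for illustration.

```python
import re

# Hypothetical sentence: (form, head, label) with 1-based heads, 0 = root.
TOKENS = [("삼성측은", 3, "nsubj"), ("논평을", 3, "dobj"), ("거부했다.", 0, "root")]

TRAILING_PUNCT = re.compile(r"^(.*?)([.,!?]+)$")
RELABEL = {"dobj": "obj", "nsubjpass": "nsubj:pass"}  # two UD v1 -> v2 renamings

def retokenize(tokens):
    # Re-tokenization: split trailing punctuation off each token.
    pieces = []
    for form, _, _ in tokens:
        m = TRAILING_PUNCT.match(form)
        pieces.append([m.group(1), m.group(2)] if m and m.group(1) else [form])
    # Head ID remapping: old id -> new id of the token's first piece.
    new_id, next_id = {0: 0}, 1
    for old_id, parts in enumerate(pieces, start=1):
        new_id[old_id] = next_id
        next_id += len(parts)
    out = []
    for old_id, (parts, (_, head, label)) in enumerate(zip(pieces, tokens), start=1):
        # Dependency re-labeling: rename outdated labels on the original token.
        out.append((parts[0], new_id[head], RELABEL.get(label, label)))
        # Split-off punctuation attaches to the piece it was cut from.
        out.extend((p, new_id[old_id], "punct") for p in parts[1:])
    return out

for token in retokenize(TOKENS):
    print(token)
```

Building the old-to-new ID map before emitting any token keeps the remapping correct even when a split happens early in the sentence, e.g. a comma split off the first word shifts every later head.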
Corpus Analytics
Discussion
´ Google Korean Treebank
´ Further errors may remain
´ Ex) an abundance of flat dependency relations
´ KAIST Treebank
´ A small set of phrasal tags and the lack of function tags made dependency label inference difficult
´ Source code will be available at https://github.com/emorynlp/ud-korean.
Predicate Argument Structure
´ Predicate: says something about the subject
´ Usually a verb
´ Argument: helps the predicate complete its meaning
´ ARG0: agent, ARG1: patient, ARG2: instrument, attribute, benefactive (for …)
´ Ex 1) Michael played the guitar
´ play (ARG0: Michael, ARG1: the guitar)
´ Ex 2) Sam was awake by 9 a.m.
´ be (ARG1: Sam, ARG2: awake, ARGM-TMP: by 9 a.m.)
´ awake (ARG0: Sam, ARGM-TMP: by 9 a.m.)
´ The task of assigning semantic roles to words or phrases is known as Semantic Role Labeling.
PropBank [7]
´ Given a predicate of a sentence in the OntoNotes corpus,
´ Provides the sense ID to specify a particular meaning of the predicate
´ Lists the predicate’s arguments, along with their semantic roles
´ Ex) follow.01 : be subsequent
´ ARG0: causal agent
´ ARG1: thing following
´ ARG2: thing followed
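A predicate-argument structure pairs a sense ID with labeled arguments, and the frameset gives each numbered role its sense-specific meaning. A minimal Python sketch: the follow.01 role descriptions are taken from the slide, while the `Proposition` class and the example sentence are hypothetical illustrations.

```python
from dataclasses import dataclass, field

# Roles of follow.01 as listed on the slide.
FRAMESETS = {
    "follow.01": {"ARG0": "causal agent", "ARG1": "thing following", "ARG2": "thing followed"},
}

@dataclass
class Proposition:
    """One predicate instance with its sense ID and labeled arguments."""
    sense: str                                # PropBank sense ID, e.g. "follow.01"
    args: dict = field(default_factory=dict)  # role label -> argument text

    def describe(self):
        roles = FRAMESETS[self.sense]
        lines = []
        for label, text in self.args.items():
            # Numbered roles are defined per sense; ARGM-* modifiers are shared.
            meaning = roles.get(label, "modifier" if label.startswith("ARGM") else "undefined role")
            lines.append(f"{label} ({meaning}): {text}")
        return lines

# Hypothetical sentence: "A storm followed the heat wave on Monday."
prop = Proposition("follow.01", {"ARG1": "A storm", "ARG2": "the heat wave", "ARGM-TMP": "on Monday"})
for line in prop.describe():
    print(line)
```

Keeping the role meanings in the frameset rather than in the argument labels mirrors PropBank’s design: ARG1 means “thing following” only for follow.01.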
But…
´ A typical dependency parser cannot be guaranteed to represent in its parse tree every predicate-argument relation annotated in PropBank
´ Its output cannot break the properties that define a dependency tree; e.g., an argument shared by two predicates would need two heads
Deep Dependency Graph (DDG) [11]
´ Retains only two of the well-formedness properties:
1. Unique Root
2. Connected
´ Seeks to abstract away from syntactic idiosyncrasies and produce the same dependency graph (not a tree) for phrases/sentences with similar meaning
´ DDG can represent complete predicate argument structures
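Dropping the single-head requirement is what lets a graph keep a shared argument. A minimal Python sketch that checks only the two retained properties and lists tokens with more than one head; the arc format (an artificial ROOT node 0) and the graph for “John wants to leave” are hypothetical illustrations, not output of the DDG conversion in [11].

```python
from collections import defaultdict

def ddg_properties(n, arcs):
    """Check unique root and connectedness; list tokens that have several heads."""
    children, heads = defaultdict(list), defaultdict(list)
    for h, d, _ in arcs:
        children[h].append(d)
        heads[d].append(h)
    seen, stack = {0}, [0]  # walk from ROOT (id 0) along head -> dependent arcs
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return {
        "unique_root": len(children[0]) == 1,
        "connected": seen == set(range(n + 1)),
        "multi_headed": [d for d in range(1, n + 1) if len(heads[d]) > 1],
    }

# Hypothetical graph for "John wants to leave": John is the subject of both verbs.
ARCS = [(0, 2, "root"), (2, 1, "nsubj"), (2, 4, "xcomp"), (4, 3, "mark"), (4, 1, "nsubj")]
print(ddg_properties(4, ARCS))
```

Here “John” has two heads, so the graph is not a tree, yet it still has a unique root and is connected, and the agent of “leave” is represented directly.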
Abstract Meaning Representation (AMR) [8]
´ Represents meaning as a rooted, directed, and labeled graph
´ Variables easily handle intra-sentence coreference
´ Inherits the PropBank semantic roles (arg0, arg1, etc.)
´ Ex) “The professor likes to drink coffee.”
´ Note: “The” and “to” are omitted in the AMR because they contribute no meaning.
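A graph of this kind can be stored as (source, role, target) triples over variables. The Python sketch below encodes a hand-written AMR for the example sentence (an assumption, not necessarily identical to the slide’s figure) and shows how co-reference appears as a variable with more than one incoming edge.

```python
from collections import Counter

# Hand-written AMR in PENMAN notation:
# (l / like-01
#    :ARG0 (p / professor)
#    :ARG1 (d / drink-01
#             :ARG0 p
#             :ARG1 (c / coffee)))
CONCEPTS = {"l": "like-01", "p": "professor", "d": "drink-01", "c": "coffee"}
TRIPLES = [("l", "ARG0", "p"), ("l", "ARG1", "d"), ("d", "ARG0", "p"), ("d", "ARG1", "c")]

def root(triples):
    """The root is the variable with no incoming edge (assumed to be unique)."""
    roots = {s for s, _, _ in triples} - {t for _, _, t in triples}
    return roots.pop()

def reentrant(triples):
    """Variables with more than one incoming edge, i.e. co-referring arguments."""
    incoming = Counter(t for _, _, t in triples)
    return [v for v, k in incoming.items() if k > 1]

print(CONCEPTS[root(TRIPLES)], [CONCEPTS[v] for v in reentrant(TRIPLES)])
```

The professor is both the one who likes and the one who drinks, so variable p is reused rather than duplicated; no concept corresponds to “The” or “to”.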
AMR Parsing
´ Transition-based dependency-tree-to-AMR mapping [9]
´ Exploits the head-child dependency present in both representations
´ Two-step algorithm:
1. A dependency parser is run to obtain the dependency tree of the source text
2. A transition-based framework transforms the input dependency tree into an AMR
´ Adding linguistic features such as named entities as input to the mapping framework improves results
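To illustrate the second step, here is a toy Python sketch that applies a fixed sequence of graph-editing actions to a dependency tree. The action types loosely follow the kinds of actions described in [9] (deleting nodes, labeling nodes and edges, adding a re-entrant edge), but the action names are hypothetical, the dependency tree is hand-written rather than parser output, and the action sequence is scripted by hand instead of being predicted by a trained classifier.

```python
def transform(nodes, edges, actions):
    """nodes: id -> word; edges: {(head, dep): label}. Returns nodes and AMR triples."""
    nodes, edges = dict(nodes), dict(edges)
    for act, *params in actions:
        if act == "delete_node":        # drop a word with no semantic contribution
            (n,) = params
            nodes.pop(n)
            edges = {e: lab for e, lab in edges.items() if n not in e}
        elif act == "label_node":       # replace a word with an AMR concept
            n, concept = params
            nodes[n] = concept
        elif act == "label_edge":       # replace a dependency label with an AMR role
            h, d, role = params
            if (h, d) not in edges:
                raise KeyError((h, d))
            edges[(h, d)] = role
        elif act == "reentrance":       # add a second incoming edge (co-reference)
            h, d, role = params
            edges[(h, d)] = role
        else:
            raise ValueError(f"unknown action {act!r}")
    return nodes, sorted((nodes[h], role, nodes[d]) for (h, d), role in edges.items())

# Step 1 (assumed output): a dependency tree for "The professor likes to drink coffee".
WORDS = {1: "The", 2: "professor", 3: "likes", 4: "to", 5: "drink", 6: "coffee"}
DEPS = {(2, 1): "det", (3, 2): "nsubj", (5, 4): "mark", (3, 5): "xcomp", (5, 6): "obj"}
# Step 2: a hand-scripted action sequence.
ACTIONS = [("delete_node", 1), ("delete_node", 4),
           ("label_node", 3, "like-01"), ("label_node", 5, "drink-01"),
           ("label_edge", 3, 2, "ARG0"), ("label_edge", 3, 5, "ARG1"),
           ("label_edge", 5, 6, "ARG1"), ("reentrance", 5, 2, "ARG0")]

for triple in transform(WORDS, DEPS, ACTIONS)[1]:
    print(triple)
```

Because both structures are built on head-child relations, most of the work is relabeling existing arcs; only the re-entrant ARG0 edge from drink-01 to professor has no counterpart in the tree.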
Hypothesis
´ Premise
´ AMR inherits the core semantic roles from PropBank
´ DDG can produce dependency graphs with complete predicate-argument structure
´ Preliminary Step
´ Insert PropBank labels into OntoNotes in place of the dependency relations between each predicate and its arguments
´ Hypothesis
´ Training a dependency parser on the modified treebank will partially teach it semantic role labeling
´ The trained model can then be further trained on the AMR parsing task
Insertion of PropBank Labels into OntoNotes
´ Straightforward in the general case
´ For each predicate in an OntoNotes sentence,
1. retrieve the corresponding PropBank entry
2. identify the DDG dependency between the predicate and each of its arguments
3. replace the dependency relation with the PropBank label
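The three steps reduce to a lookup-and-replace over arcs, with unmatched arguments recorded as no-match. A minimal Python sketch, assuming each argument has already been reduced to its head token id; the DDG arcs and their labels for the example sentence below are hypothetical.

```python
def insert_propbank_labels(arcs, predicate, arguments):
    """Replace the label of each predicate -> argument arc with its PropBank label.

    arcs: {(head_id, dep_id): label}; a DDG may give one token several heads.
    arguments: [(argument_head_id, propbank_label)] for one predicate.
    Returns the updated arcs and the arguments with no matching arc.
    """
    updated, no_match = dict(arcs), []
    for arg_head, pb_label in arguments:
        if (predicate, arg_head) in updated:
            updated[(predicate, arg_head)] = pb_label
        else:
            no_match.append((arg_head, pb_label))
    return updated, no_match

# "And ad agencies insist that they do ." (empty category omitted), ids 1..8.
ARCS = {(4, 1): "cc", (3, 2): "compound", (4, 3): "nsubj", (0, 4): "root",
        (7, 5): "mark", (7, 6): "nsubj", (4, 7): "ccomp", (4, 8): "punct"}
# insist.01: ARG0 = "ad agencies" (head 3), ARG1 = "that they do" (head 7)
updated, no_match = insert_propbank_labels(ARCS, 4, [(3, "ARG0"), (7, "ARG1")])
print(updated[(4, 3)], updated[(4, 7)], no_match)
```

The no-match list is what the label-distribution analysis counts: an argument whose head is not directly attached to the predicate cannot be relabeled this way.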
Example
(TOP (S (CC And)
(NP-SBJ (NN ad)
(NNS agencies))
(VP (VBP insist)
(SBAR (IN that)
(S (NP-SBJ (PRP they))
(VP (VBP do)
(VP (-NONE- *?*))))))
(. .)))
nw/wsj/17/wsj_1705.parse 25 3 gold insist insist.01 ----- 1:1-ARG0 3:0-rel 4:1-ARG1
nw/wsj/17/wsj_1705.parse 25 6 gold do do.01 ----- 6:0-rel
´ [Figure labels: arg0, arg1; in a pointer such as 1:1-ARG0, the two numbers are the node (terminal) index and the height]
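Resolving the pointers against the tree shows which constituent each label covers. A minimal Python sketch, assuming the field layout file, sentence, predicate terminal, annotator, lemma, roleset, inflection, then pointers of the form terminal:height-LABEL, with terminals counted from 0 including empty categories; compound pointers joined by * or , are not handled.

```python
import re

TREE = """(TOP (S (CC And)
                (NP-SBJ (NN ad)
                        (NNS agencies))
                (VP (VBP insist)
                    (SBAR (IN that)
                          (S (NP-SBJ (PRP they))
                             (VP (VBP do)
                                 (VP (-NONE- *?*))))))
                (. .)))"""

class Node:
    def __init__(self, label, parent):
        self.label, self.parent = label, parent
        self.children, self.word = [], None

def parse_leaves(text):
    """Parse a bracketed tree; return its pre-terminal nodes in word order."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    leaves, stack, i = [], [], 0
    while i < len(tokens):
        if tokens[i] == "(":
            node = Node(tokens[i + 1], stack[-1] if stack else None)
            if stack:
                stack[-1].children.append(node)
            stack.append(node)
            i += 2
        elif tokens[i] == ")":
            stack.pop()
            i += 1
        else:  # a word: the open node is a pre-terminal
            stack[-1].word = tokens[i]
            leaves.append(stack[-1])
            i += 1
    return leaves

def words(node):
    """Words under a node, skipping empty categories."""
    if node.word is not None:
        return [] if node.label == "-NONE-" else [node.word]
    return [w for child in node.children for w in words(child)]

def resolve(prop_line, leaves):
    """Resolve each 'terminal:height-LABEL' pointer to (LABEL, phrase label, words)."""
    fields = prop_line.split()
    roleset, pointers = fields[5], fields[7:]
    resolved = []
    for pointer in pointers:
        location, label = pointer.split("-", 1)
        terminal, height = map(int, location.split(":"))
        node = leaves[terminal]
        for _ in range(height):  # climb `height` levels above the word
            node = node.parent
        resolved.append((label, node.label, " ".join(words(node))))
    return roleset, resolved

leaves = parse_leaves(TREE)
print(resolve("nw/wsj/17/wsj_1705.parse 25 3 gold insist insist.01 ----- "
              "1:1-ARG0 3:0-rel 4:1-ARG1", leaves))
```

ARG0 resolves to the NP-SBJ “ad agencies” and ARG1 to the SBAR “that they do”, while do.01 carries only a rel pointer because its complement is the ellipsis *?*.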
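Label Distribution
(Top three DDG labels for each PropBank argument, with counts; no-match %: share of that argument with no matching DDG dependency)
Labels | Top 1 | Top 2 | Top 3 | no-match %
ARG0 | nsubj (159,474) | no-match (25,721) | r-nsubj (8,472) | 12.5 %
ARG1 | obj (120,130) | nsubj (66,403) | no-match (51,553) | 16.2 %
ARG2 | no-match (47,000) | ppmod (23,428) | obj (7,507) | 48.0 %
ARG3 | ppmod (3,897) | no-match (914) | obj (563) | 13.2 %
ARG4 | ppmod (4,037) | adv (747) | no-match (182) | 3.4 %
ARG5 | | | |
Total | | | | 19.8 %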
Contributions
1. Systematic updates to the Google UD Korean Treebank to match the latest
UD annotation guidelines
2. Constituent-to-dependency conversion of the phrase structure trees in the
Penn Korean Treebank and the Kaist Treebank
3. Analysis of the three converted Korean dependency treebanks
4. Construction of a new corpus by replacing dependencies that represent predicate-argument structure in OntoNotes with PropBank labels
5. Analysis of mismatch cases between PropBank and DDG
References
´ [1] Choi, J. D.; Palmer, M., Guidelines for the Clear Style Constituent to Dependency Conversion, Technical Report 01-12, University of Colorado Boulder, 2012.
´ [2] Jurafsky, D.; Martin, J. H., Speech and Language Processing: Dependency Parsing, Ch. 14, p. 5.
´ [3] Marcus, M. et al., The Penn Treebank: Annotating Predicate Argument Structure, In Proceedings of the Workshop on Human Language Technology, HLT '94, Association for Computational Linguistics, pp. 114-119.
´ [4] Weischedel, R. et al., OntoNotes: A Large Training Corpus for Enhanced Processing.
´ [5] Han, C. et al., Development and Evaluation of a Korean Treebank and Its Application to NLP, In Proceedings of the Third International Conference on Language Resources and Evaluation, LREC 2002, May 29-31, 2002.
´ [6] Nivre, J.; Bosco, C.; Choi, J. et al., Universal Dependencies 1.0, 2015.
´ [7] Palmer, M. et al., The Proposition Bank: An Annotated Corpus of Semantic Roles, Computational Linguistics 31(1), 2005, pp. 71-106.
´ [8] Banarescu, L. et al., Abstract Meaning Representation for Sembanking, 2013.
´ [9] Wang, C. et al., A Transition-Based Algorithm for AMR Parsing, 2015.
´ [10] McDonald, R. et al., Universal Dependency Annotation for Multilingual Parsing, 2013.
´ [11] Choi, . D., Deep Dependency Graph Conversion in English, In Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories, TLT'17, pp. 35-62, Bloomington, IN, 2017.

Weitere ähnliche Inhalte

Was ist angesagt?

Constructive Hybrid Logics
Constructive Hybrid LogicsConstructive Hybrid Logics
Constructive Hybrid Logics
Valeria de Paiva
 
Constructive Description Logics 2006
Constructive Description Logics 2006Constructive Description Logics 2006
Constructive Description Logics 2006
Valeria de Paiva
 

Was ist angesagt? (19)

Constructive Hybrid Logics
Constructive Hybrid LogicsConstructive Hybrid Logics
Constructive Hybrid Logics
 
Ceis 3
Ceis 3Ceis 3
Ceis 3
 
Lfg and gpsg
Lfg and gpsgLfg and gpsg
Lfg and gpsg
 
Corpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of PersianCorpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of Persian
 
LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition
LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka CompositionLDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition
LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition
 
Substitutability
SubstitutabilitySubstitutability
Substitutability
 
Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...
Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...
Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Formal languages
Formal languagesFormal languages
Formal languages
 
Constructive Description Logics 2006
Constructive Description Logics 2006Constructive Description Logics 2006
Constructive Description Logics 2006
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...
 
CBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERCBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMER
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
 
NLP_KASHK:Text Normalization
NLP_KASHK:Text NormalizationNLP_KASHK:Text Normalization
NLP_KASHK:Text Normalization
 
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological Parsing
 
FIRE2014_IIT-P
FIRE2014_IIT-PFIRE2014_IIT-P
FIRE2014_IIT-P
 
Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)
 

Ähnlich wie Dependency Analysis of Abstract Universal Structures in Korean and English

Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Edmond Lepedus
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
Shashank Shisodia
 
referát.doc
referát.docreferát.doc
referát.doc
butest
 
Ijarcet vol-2-issue-2-323-329
Ijarcet vol-2-issue-2-323-329Ijarcet vol-2-issue-2-323-329
Ijarcet vol-2-issue-2-323-329
Editor IJARCET
 

Ähnlich wie Dependency Analysis of Abstract Universal Structures in Korean and English (20)

Sanskrit parser Project Report
Sanskrit parser Project ReportSanskrit parser Project Report
Sanskrit parser Project Report
 
Performance Grammar
Performance GrammarPerformance Grammar
Performance Grammar
 
LFG and GPSG.pptx
LFG and GPSG.pptxLFG and GPSG.pptx
LFG and GPSG.pptx
 
first_seminar
first_seminarfirst_seminar
first_seminar
 
Collin F. Baker - 2017 - Graph Methods for Multilingual FrameNets
Collin F. Baker - 2017 - Graph Methods for Multilingual FrameNetsCollin F. Baker - 2017 - Graph Methods for Multilingual FrameNets
Collin F. Baker - 2017 - Graph Methods for Multilingual FrameNets
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...
 
Arabic named entity recognition using deep learning approach
Arabic named entity recognition using deep learning approachArabic named entity recognition using deep learning approach
Arabic named entity recognition using deep learning approach
 
5a use of annotated corpus
5a use of annotated corpus5a use of annotated corpus
5a use of annotated corpus
 
p138-jiang
p138-jiangp138-jiang
p138-jiang
 
unit -3 part 1.ppt
unit -3 part 1.pptunit -3 part 1.ppt
unit -3 part 1.ppt
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
 
referát.doc
referát.docreferát.doc
referát.doc
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
bồn tắm jacuzzi.docx
bồn tắm jacuzzi.docxbồn tắm jacuzzi.docx
bồn tắm jacuzzi.docx
 
A Proposition Bank of Urdu
A Proposition Bank of UrduA Proposition Bank of Urdu
A Proposition Bank of Urdu
 
ijcai05_srl
ijcai05_srlijcai05_srl
ijcai05_srl
 
Turkish language modeling using BERT
Turkish language modeling using BERTTurkish language modeling using BERT
Turkish language modeling using BERT
 
Ijarcet vol-2-issue-2-323-329
Ijarcet vol-2-issue-2-323-329Ijarcet vol-2-issue-2-323-329
Ijarcet vol-2-issue-2-323-329
 
FinalDraftRevisisions
FinalDraftRevisisionsFinalDraftRevisisions
FinalDraftRevisisions
 

Mehr von Jinho Choi

Mehr von Jinho Choi (20)

Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
 
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
 
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
 
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
 
The Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference ResolutionThe Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference Resolution
 
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
 
Abstract Meaning Representation
Abstract Meaning RepresentationAbstract Meaning Representation
Abstract Meaning Representation
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
CKY Parsing
CKY ParsingCKY Parsing
CKY Parsing
 
CS329 - WordNet Similarities
CS329 - WordNet SimilaritiesCS329 - WordNet Similarities
CS329 - WordNet Similarities
 
CS329 - Lexical Relations
CS329 - Lexical RelationsCS329 - Lexical Relations
CS329 - Lexical Relations
 
Automatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue ManagementAutomatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue Management
 
Attention is All You Need for AMR Parsing
Attention is All You Need for AMR ParsingAttention is All You Need for AMR Parsing
Attention is All You Need for AMR Parsing
 
Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to Dialogue
 
Real-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue UnderstandingReal-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue Understanding
 
Topological Sort
Topological SortTopological Sort
Topological Sort
 
Tries - Put
Tries - PutTries - Put
Tries - Put
 
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's DiseaseMulti-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
 
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue ContextsBuilding Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
 
How to make Emora talk about Sports Intelligently
How to make Emora talk about Sports IntelligentlyHow to make Emora talk about Sports Intelligently
How to make Emora talk about Sports Intelligently
 

Kürzlich hochgeladen

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Dependency Analysis of Abstract Universal Structures in Korean and English

  • 1. Dependency Analysis of Abstract Universal Structures in Korean and English Jayeol Chun
  • 2. Contents 1. Thesis Road Map 2. Background Part 1: [Constituency & Dependency Grammar] 3. Constituent-to-Dependency Conversion 4. Universal Dependency Treebanks in Korean 5. Background Part 2: [Predicate-Argument Structure & AMR] 6. PropBank-Augmented OntoNotes Corpus 7. Contributions
  • 3. ParsingSyntactic Parsing Semantic Parsing Constituency Dependency Semantic Role Labeling AMR Korean AMR ..? done in progress.. PropBank
  • 4. Constituency (Phrase Structure) ´ Constituent: a word or a phrase that acts like a single grammatical unit ´ Root ´ Terminals ´ Non-Terminals
  • 5. Dependency ´ Dependency: A directed arc that establishes a head-child relation between two nodes ´ Dependency label describes the child’s role in relation to the head ´ Can represent languages with flexible word order
  • 6. Well-Formed Dependency Graphs head child dep 1. Unique Root 2. Single Head 3. Connected 4. Acyclic 5. Projective Jurafsky D.; Martin, J. H., Speech and Language Processing: Dependency Parsing, Ch. 14 pg. 5
  • 7. Korean ´ One of Morphologically Rich Languages ´ Morphology: study of how words are formed ´ Morphological Analysis: kamsahamnida (thank) -> kamsa (thank) + ham (verbalize) + nida (ending marker) ´ Several large constituency treebanks ´ Q: What about dependency? ´ Relatively free word order ´ Morphemes provide syntactic function as well as meaning of words ´ Lack of large publicly available dependency corpora
  • 9. Approach ´ Leverage the large annotated constituency treebanks ´ Convert the constituent trees into dependency trees!
  • 10. Constituent-to-Dependency Conversion [1] 0. Redirect Dependencies for Empty Categories (if they exist) 1. Establish Head-Child Dependency relations using Head-Percolation Rules 2. Infer Dependency Labels using Linguistic Heuristics
  • 11. Empty Categories ´ Characteristic of treebanks annotated in Penn Treebank [3] style ´ OntoNotes [4], Penn Korean Treebank [5] ´ Nominal units that indicate the location of their antecedent syntactic elements ´ Enables to represent long-term dependencies ´ often breaks the projectivity property
  • 12. Types of Empty Categories in the Penn Korean Treebank 1. Trace (*T*): Argument that precedes its subject leaves in its place a trace, a pointer to the index of the antecedent in the tree ´ Trace Mapping 2. Ellipsis (*?*): Dropped predicate in a matrix clause or a clausal coordination ´ Heuristics to identify the location of the shared predicate 3. Empty Assignment (*pro*): Dropped arguments 4. Empty Operator (*op*): Relative Clauses After Wh-Movement *?*
  • 13. (S (ADCP (ADC 반면[Meanwhile])) (S (NP-SBJ (NPR+NNX+PAU 삼성+측+은[Samsung])) (VP (NP-OBJ (NNC+PCA 논평+을[to comment])) (VV (NNC+XSV+EPF+EFN 거부+하+었+다[refused])))) (SFN .)) Head-Percolation Rules ´ For every node in the tree, locates the head by iterating through its immediate children and matching the POS in the order delimited by ; ´ r: Iterate from right to left (Korean is a head-final language) ´ Terminal node’s head is itself
  • 14. Dependency Label Inference ´ Linguistic heuristics: ´ Morphological analysis of the head and the dependent ´ POS ´ Word ´ Function tags ´ Function Tags ´ Annotated in the Penn Treebank style treebanks ´ Provides additional syntactic / semantic information ´ Ex) NP-SBJ -> The NP (Noun Phrase) is the subject of a clause or a sentence
  • 15. Universal Dependencies [6] ´ Effort to create a consistent annotation scheme for multiple languages ´ Encourage multi-lingual parsing experiments and comparative analysis ´ Defines a POS and dependency label tagset ´ Suggests a universal way of annotating certain sentence constructions, but allows room for language-specific extensions ´ Ex) Coordination
  • 16. The Google UD Korean Treebank ´ McDonald et al. [10] released a UD Korean Treebank of 6K sentences ´ Issues: ´ Coarse-tokenization regarding suffixes, particles, and punctuation marks ´ Outdated annotation scheme ´ Our approach: ´ perform a systematic conversion, including re-tokenization, to match the latest guidelines ´ shown image by image
  • 17. 1. Morphological Analysis2. Re-Tokenization3. Head ID Remapping4. Dependency Re-Labeling
  • 19. Discussion ´ Google Korean Treebank ´ Further possibilities for errors exist ´ Ex) abundance of flat dependency relation ´ Kaist Treebank ´ Small set of phrasal POS and lack of function tags rendered dependency inference difficult ´ Source code to be available at https://github.com/emorynlp/ud-korean.
  • 20. Predicate Argument Structure ´ Predicate: describes the subject ´ Usually a verb ´ Argument: helps the predicate complete its meaning ´ ARG0: agent, ARG1: patient, ARG2: instrument, attribute, benefactive (for …) ´ Ex 1) Michael played the guitar ´ play (ARG0: Michael, ARG1: the guitar) ´ Ex 2) Sam was awake by 9 a.m. ´ be (ARG1: Sam, ARG2: awake, ARGM-TMP: by 9 a.m.) ´ awake(ARG0: Sam, ARGM-TMP: by 9 a.m.) ´ The task of assigning semantic roles to words or phrases is known as Semantic Role Labeling.
  • 21. PropBank [7] ´ For each predicate of a sentence in the OntoNotes corpus, PropBank: ´ Provides a sense ID that specifies the particular meaning of the predicate ´ Lists the predicate's arguments along with their semantic roles ´ Ex) follow.01: be subsequent ´ ARG0: causal agent ´ ARG1: thing following ´ ARG2: thing followed
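A PropBank sense entry can be pictured as a small lookup table from numbered arguments to role descriptions. The Python sketch below encodes the follow.01 entry from the slide and glosses an annotated instance; the `Roleset` class, the `gloss` function, and the example sentence are hypothetical and do not reflect PropBank's own file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Roleset:
    roleset_id: str                                       # lemma.sense, e.g. "follow.01"
    name: str                                             # gloss of this sense
    roles: Dict[str, str] = field(default_factory=dict)  # argument label -> role description

FOLLOW_01 = Roleset("follow.01", "be subsequent", {
    "ARG0": "causal agent",
    "ARG1": "thing following",
    "ARG2": "thing followed",
})

def gloss(roleset: Roleset, arguments: List[Tuple[str, str]]):
    """Attach the roleset's role description to each (label, span) argument."""
    return [(label, span, roleset.roles.get(label, "not in roleset (e.g. an ARGM modifier)"))
            for label, span in arguments]

# Hypothetical annotation of "Dessert followed the main course."
for label, span, role in gloss(FOLLOW_01, [("ARG1", "Dessert"), ("ARG2", "the main course")]):
    print(f"{label:5} {span!r:20} {role}")
```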
  • 22. But… ´ It is hard to guarantee that a typical dependency parser's output tree represents all predicate-argument relations annotated in PropBank ´ The parser cannot break the properties that define a dependency tree (e.g., an argument shared by two predicates would need two heads)
  • 23. Deep Dependency Graph (DDG) [11] ´ Retains only two of the four properties: 1. Unique Root 2. Connected ´ Seeks to abstract away from syntactic idiosyncrasies and produce the same dependency graph (not a tree) for phrases/sentences with similar meanings ´ DDG can represent complete predicate-argument structures
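The difference between a dependency tree and a DDG can be checked mechanically. The Python sketch below tests four of the tree properties (projectivity is left out) on a tree for "John wants to eat" and on a graph that adds a second head for "John" as the subject of "eat". Treating that extra arc as what a DDG would add is an illustrative assumption, and `check_properties` is a hypothetical helper.

```python
from collections import defaultdict

def check_properties(n, arcs):
    """Check well-formedness properties of a dependency graph.

    n:    number of tokens (ids 1..n); 0 is the artificial root
    arcs: list of (head, dependent) pairs
    """
    heads, children = defaultdict(list), defaultdict(list)
    for h, d in arcs:
        heads[d].append(h)
        children[h].append(d)

    unique_root = len(children[0]) == 1
    single_head = all(len(heads[d]) == 1 for d in range(1, n + 1))

    # Connected: every token is reachable from the root.
    seen, stack = {0}, [0]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    connected = len(seen) == n + 1

    # Acyclic: Kahn's algorithm removes every node iff there is no cycle.
    indegree = {v: len(heads[v]) for v in range(n + 1)}
    queue = [v for v, k in indegree.items() if k == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    acyclic = removed == n + 1

    return {"unique_root": unique_root, "single_head": single_head,
            "connected": connected, "acyclic": acyclic}

# John(1) wants(2) to(3) eat(4)
tree = [(0, 2), (2, 1), (2, 4), (4, 3)]
graph = tree + [(4, 1)]          # "John" is also the subject of "eat"
print(check_properties(4, tree))
print(check_properties(4, graph))
```

The second graph keeps a unique root and stays connected, the two properties the slide says a DDG retains, but gives "John" two heads, so it is no longer a tree.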
  • 24. Abstract Meaning Representation (AMR) [8] ´ Represents meaning as a rooted, directed, labeled graph ´ Variables easily handle intra-sentence co-reference ´ Inherits the PropBank semantic roles (ARG0, ARG1, etc.) ´ Ex) "The professor likes to drink coffee." ´ Note that "The" and "to" are omitted from the AMR because they contribute no meaning.
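The AMR figure for this example is not preserved in this transcript. One AMR for "The professor likes to drink coffee.", written by hand in PENMAN notation, is shown in the comment below; it is an assumption, not the slide's figure. The Python sketch stores it as triples and finds the re-entrant variable `p`, which is how a single variable captures that the professor is also the one drinking.

```python
from collections import Counter

# (l / like-01
#    :ARG0 (p / professor)
#    :ARG1 (d / drink-01
#             :ARG0 p
#             :ARG1 (c / coffee)))
instances = {"l": "like-01", "p": "professor", "d": "drink-01", "c": "coffee"}
relations = [
    ("l", ":ARG0", "p"),
    ("l", ":ARG1", "d"),
    ("d", ":ARG0", "p"),   # re-entrancy: the same variable p fills a second role
    ("d", ":ARG1", "c"),
]

def reentrant_variables(relations):
    """Variables that are the target of more than one relation (co-reference)."""
    counts = Counter(target for _, _, target in relations)
    return sorted(v for v, k in counts.items() if k > 1)

print(reentrant_variables(relations))
```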
  • 25. AMR Parsing ´ Transition-based Dependency Tree to AMR Mapping [9] ´ Exploits the head-child structure shared by both representations ´ Two-step algorithm: 1. A dependency parser is run to obtain the dependency tree of the source text 2. A transition-based framework transforms the input dependency tree into an AMR graph ´ Adding linguistic features such as named entities as input to the mapping framework improves results
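A heavily simplified Python sketch of step 2, assuming a hand-written action sequence. In [9] a trained classifier chooses each action while traversing the dependency tree, and the action names here are only loosely modeled on that work. The `Graph` class and its four actions are hypothetical. The sketch turns a dependency tree of "The professor likes to drink coffee" into an AMR-like graph by deleting function words, relabeling nodes and edges, and adding a re-entrant edge.

```python
class Graph:
    """Dependency tree being rewritten into an AMR-like graph."""

    def __init__(self, nodes, edges):
        self.nodes = dict(nodes)   # node id -> word or concept
        self.edges = dict(edges)   # (head id, child id) -> relation label

    def apply(self, action, *args):
        if action == "DELETE-NODE":            # drop a node with no semantic content
            (n,) = args
            del self.nodes[n]
            self.edges = {e: r for e, r in self.edges.items() if n not in e}
        elif action == "NEXT-NODE":            # assign a concept to a node
            n, concept = args
            self.nodes[n] = concept
        elif action == "NEXT-EDGE":            # relabel an existing dependency edge
            h, c, rel = args
            if (h, c) not in self.edges:
                raise KeyError((h, c))
            self.edges[(h, c)] = rel
        elif action == "REENTRANCE":           # add a second incoming edge to a node
            h, c, rel = args
            self.edges[(h, c)] = rel
        else:
            raise ValueError(f"unknown action: {action}")

# Dependency tree of "The professor likes to drink coffee" (punctuation omitted).
g = Graph(
    nodes={1: "The", 2: "professor", 3: "likes", 4: "to", 5: "drink", 6: "coffee"},
    edges={(3, 2): "nsubj", (2, 1): "det", (3, 5): "xcomp", (5, 4): "mark", (5, 6): "obj"},
)

# Hand-written action sequence standing in for the classifier's decisions.
oracle = [
    ("DELETE-NODE", 1), ("DELETE-NODE", 4),
    ("NEXT-NODE", 2, "professor"), ("NEXT-NODE", 3, "like-01"),
    ("NEXT-NODE", 5, "drink-01"), ("NEXT-NODE", 6, "coffee"),
    ("NEXT-EDGE", 3, 2, "ARG0"), ("NEXT-EDGE", 3, 5, "ARG1"),
    ("NEXT-EDGE", 5, 6, "ARG1"), ("REENTRANCE", 5, 2, "ARG0"),
]
for action, *args in oracle:
    g.apply(action, *args)

print(g.nodes)
print(g.edges)
```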
  • 26. Hypothesis ´ Premise ´ AMR inherits the core semantic roles from PropBank ´ DDG can produce dependency graphs with complete predicate-argument structures ´ Preliminary Step ´ Insert PropBank labels into OntoNotes in place of the dependency relations between each predicate and its arguments ´ Hypothesis ´ Training a dependency parser on the modified treebank will partially teach it semantic role labeling ´ The trained model can then be further trained on the AMR parsing task
  • 27. Insertion of PropBank Labels into OntoNotes ´ Straightforward in the general case ´ For each predicate in an OntoNotes sentence: 1. retrieve the corresponding PropBank entry 2. identify the DDG dependency between the predicate and each of its arguments 3. replace that dependency's label with the PropBank label
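The three insertion steps can be sketched in Python, assuming each PropBank argument has already been reduced to the ID of its head token in the DDG. The sentence "And ad agencies insist that they do." is used with hypothetical token IDs and DDG labels; arguments with no matching DDG arc are collected as no-match cases rather than inserted. `insert_propbank_labels` is an illustrative name.

```python
def insert_propbank_labels(ddg_arcs, propositions):
    """Replace DDG labels on predicate-argument arcs with PropBank labels.

    ddg_arcs:     {(head_id, dependent_id): label}
    propositions: [(predicate_id, [(argument_head_id, propbank_label), ...]), ...]
    Returns (relabelled arcs, list of (predicate, argument, label) with no DDG arc).
    """
    arcs = dict(ddg_arcs)
    no_match = []
    for predicate, arguments in propositions:        # 1. each predicate's PropBank entry
        for argument, label in arguments:
            if (predicate, argument) in arcs:        # 2. DDG arc predicate -> argument?
                arcs[(predicate, argument)] = label  # 3. replace its label
            else:
                no_match.append((predicate, argument, label))
    return arcs, no_match

# And(1) ad(2) agencies(3) insist(4) that(5) they(6) do(7) .(8); labels are illustrative.
ddg = {(0, 4): "root", (4, 1): "cc", (3, 2): "com", (4, 3): "nsubj",
       (7, 5): "complm", (7, 6): "nsubj", (4, 7): "ccomp", (4, 8): "punct"}
insist = (4, [(3, "ARG0"), (7, "ARG1")])   # ARG0 = "ad agencies", ARG1 = "that they do"

arcs, no_match = insert_propbank_labels(ddg, [insist])
print(arcs[(4, 3)], arcs[(4, 7)], no_match)
```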
  • 28. Example
  (TOP (S (CC And) (NP-SBJ (NN ad) (NNS agencies)) (VP (VBP insist) (SBAR (IN that) (S (NP-SBJ (PRP they)) (VP (VBP do) (VP (-NONE- *?*)))))) (. .)))
  nw/wsj/17/wsj_1705.parse 25 3 gold insist insist.01 ----- 1:1-ARG0 3:0-rel 4:1-ARG1
  nw/wsj/17/wsj_1705.parse 25 6 gold do do.01 ----- 6:0-rel
  ´ Each argument pointer has the form node index:height-label (e.g., 1:1-ARG0, 4:1-ARG1)
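A Python sketch of how an argument pointer such as 1:1-ARG0 can be resolved against the bracketed tree: take the terminal at the given node index (counting from 0 and including the empty category *?*), then climb the given number of levels. Here height 0 is taken to be the preterminal over the word, which is how the example's pointers line up with NP-SBJ and SBAR. The small parser and helper names are hypothetical, and chained pointers (several locations joined into one argument) are not handled.

```python
import re

class Node:
    def __init__(self, label, word=None):
        self.label, self.word = label, word
        self.children, self.parent = [], None

def parse_tree(text):
    """Parse a Penn Treebank-style bracketed string into Node objects."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        if tokens[pos] != "(":            # preterminal: (TAG word)
            node = Node(label, tokens[pos])
            pos += 2                      # skip word and ")"
            return node
        node = Node(label)
        while tokens[pos] == "(":
            child = parse()
            child.parent = node
            node.children.append(child)
        pos += 1                          # skip ")"
        return node

    return parse()

def terminals(node):
    if node.word is not None:
        return [node]
    return [t for child in node.children for t in terminals(child)]

def resolve(tree, pointer):
    """Resolve 'index:height-LABEL' to (LABEL, node)."""
    location, label = pointer.split("-", 1)
    index, height = map(int, location.split(":"))
    node = terminals(tree)[index]
    for _ in range(height):
        node = node.parent
    return label, node

tree = parse_tree("(TOP (S (CC And) (NP-SBJ (NN ad) (NNS agencies)) (VP (VBP insist) "
                  "(SBAR (IN that) (S (NP-SBJ (PRP they)) (VP (VBP do) (VP (-NONE- *?*)))))) (. .)))")

for pointer in ["1:1-ARG0", "3:0-rel", "4:1-ARG1"]:
    label, node = resolve(tree, pointer)
    print(label, node.label, " ".join(t.word for t in terminals(node)))
```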
  • 29. Label Distribution (counts in parentheses)
  Label | Top 1 | Top 2 | Top 3 | no-match %
  ARG0 | nsubj (159,474) | no-match (25,721) | r-nsubj (8,472) | 12.5 %
  ARG1 | obj (120,130) | nsubj (66,403) | no-match (51,553) | 16.2 %
  ARG2 | no-match (47,000) | ppmod (23,428) | obj (7,507) | 48.0 %
  ARG3 | ppmod (3,897) | no-match (914) | obj (563) | 13.2 %
  ARG4 | ppmod (4,037) | adv (747) | no-match (182) | 3.4 %
  ARG5 | | | |
  Total | | | | 19.8 %
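The per-label counts and no-match percentages in this table can be computed with a simple tally. The Python sketch below assumes the input is one (PropBank label, DDG label) pair per argument, with `None` when no DDG dependency links the predicate and the argument; `label_distribution` is a hypothetical helper, and the toy pairs are made up and do not reproduce the corpus figures.

```python
from collections import Counter, defaultdict

def label_distribution(pairs, k=3):
    """pairs: (propbank_label, ddg_label) per argument; ddg_label is None when
    no DDG dependency links the predicate and the argument.
    Returns {propbank_label: (top-k (ddg_label, count) pairs, no-match %)}."""
    by_label = defaultdict(Counter)
    for propbank, ddg in pairs:
        by_label[propbank][ddg if ddg is not None else "no-match"] += 1
    table = {}
    for propbank, counts in sorted(by_label.items()):
        total = sum(counts.values())
        table[propbank] = (counts.most_common(k), round(100 * counts["no-match"] / total, 1))
    return table

# Toy data only.
toy = [("ARG0", "nsubj")] * 3 + [("ARG0", None)] + [("ARG1", "obj")] * 2 + [("ARG1", "nsubj")]
for label, (top, no_match) in label_distribution(toy).items():
    print(label, top, f"{no_match} %")
```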
  • 30. Contributions 1. Systematic updates to the Google UD Korean Treebank to match the latest UD annotation guidelines 2. Constituent-to-dependency conversion of the phrase structure trees in the Penn Korean Treebank and the KAIST Treebank 3. Analysis of the three converted Korean dependency treebanks 4. Construction of a new corpus by replacing the dependencies that represent predicate-argument structure in OntoNotes with PropBank labels 5. Analysis of mismatch cases between PropBank and DDG
  • 31. References ´ [1] Choi, J. D.; Palmer, M., Guidelines for the Clear Style Constituent to Dependency Conversion, Technical Report 01-12, University of Colorado Boulder, 2012. ´ [2] Jurafsky, D.; Martin, J. H., Speech and Language Processing: Dependency Parsing, Ch. 14, p. 5. ´ [3] Marcus, M. et al., The Penn Treebank: Annotating Predicate Argument Structure, In Proceedings of the Workshop on Human Language Technology, HLT '94, Association for Computational Linguistics, pp. 114-119. ´ [4] Weischedel, R. et al., OntoNotes: A Large Training Corpus for Enhanced Processing. ´ [5] Han, C. et al., Development and Evaluation of a Korean Treebank and Its Application to NLP, In Proceedings of the Third International Conference on Language Resources and Evaluation, LREC 2002, May 29-31, 2002. ´ [6] Nivre, J.; Bosco, C.; Choi, J.; et al., Universal Dependencies 1.0, 2015.
  • 32. References ´ [7] Palmer, M. et al., The Proposition Bank: An Annotated Corpus of Semantic Roles, Computational Linguistics 31(1), 2005, pp. 71-106. ´ [8] Banarescu, L. et al., Abstract Meaning Representation for Sembanking, 2013. ´ [9] Wang, C. et al., A Transition-Based Algorithm for AMR Parsing, 2015. ´ [10] McDonald, R. et al., Universal Dependency Annotation for Multilingual Parsing, 2013. ´ [11] Choi, J. D., Deep Dependency Graph Conversion in English, In Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories, TLT'17, Bloomington, IN, 2017, pp. 35-62.