Dependency Analysis of Abstract Universal Structures in Korean and English

Jayeol Chun

Contents
1. Thesis Road Map
2. Background Part 1: [Constituency & Dependency Grammar]
3. Constituent-to-Dependency Conversion
4. Universal Dependency Treebanks in Korean
5. Background Part 2: [Predicate-Argument Structure & AMR]
6. PropBank-Augmented OntoNotes Corpus
7. Contributions

Thesis Road Map
[Figure: Parsing splits into Syntactic Parsing (Constituency, Dependency) and Semantic Parsing (Semantic Role Labeling with PropBank, AMR), with Korean AMR marked as an open question; legend: done, in progress.]

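Constituency (Phrase Structure)
- Constituent: a word or a phrase that acts as a single grammatical unit
- Root: the topmost node of the tree
- Terminals: the words at the leaves
- Non-Terminals: the labeled internal nodes (phrase labels and POS tags)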
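To make the three kinds of nodes concrete, here is a minimal Python sketch that reads a tree in bracketed notation (the notation used by the treebanks in this thesis) and lists its root, terminals, and non-terminals. It assumes that every bracket opens with a label and that words contain no spaces or parentheses. `Node`, `parse_bracketed`, `terminals`, and `non_terminals` are hypothetical names. The sentence is Ex 1 from the predicate-argument slide.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str                                   # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on pre-terminals (POS nodes)

def parse_bracketed(text):
    """Read one tree such as '(S (NP (NNP Michael)) (VP ...))'."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                                 # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:                                # a bare token is the word itself
                node.word = tokens[pos]
                pos += 1
        pos += 1                                 # skip ")"
        return node

    return read()

def terminals(node):
    """Words at the leaves, left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in terminals(child)]

def non_terminals(node):
    """Labels of all labeled nodes (phrases and POS tags), in pre-order."""
    return [node.label] + [lab for child in node.children for lab in non_terminals(child)]

tree = parse_bracketed("(S (NP (NNP Michael)) (VP (VBD played) (NP (DT the) (NN guitar))))")
print(tree.label)           # S
print(terminals(tree))      # ['Michael', 'played', 'the', 'guitar']
print(non_terminals(tree))  # ['S', 'NP', 'NNP', 'VP', 'VBD', 'NP', 'DT', 'NN']
```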

Dependency
- Dependency: a directed arc that establishes a head-child relation between two nodes
- The dependency label describes the child's role in relation to the head
- Can represent languages with flexible word order

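Well-Formed Dependency Graphs
[Figure: an arc labeled dep pointing from the head to the child]
1. Unique Root: exactly one node has no head
2. Single Head: every other node has exactly one head
3. Connected: every node is reachable from the root
4. Acyclic: no node is its own ancestor
5. Projective: every word between a head and its child is a descendant of that head
Source: Jurafsky and Martin [2], Ch. 14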
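The five properties can be checked mechanically. The Python sketch below illustrates the definitions only and is not the thesis's code. It assumes that tokens are numbered 1..n, that an artificial ROOT is node 0, and that a graph is a list of (head, child) arcs. `check_properties` is a hypothetical name.

```python
from collections import defaultdict

def check_properties(n, arcs):
    """Check the five properties for tokens 1..n.

    arcs: (head, child) pairs; head 0 is an artificial ROOT node.
    Returns {property name: bool}.
    """
    heads, children = defaultdict(list), defaultdict(list)
    for h, d in arcs:
        heads[d].append(h)
        children[h].append(d)
    nodes = range(n + 1)                          # 0 = ROOT, 1..n = tokens

    # 1. Unique root: ROOT has exactly one child.
    unique_root = len(children[0]) == 1
    # 2. Single head: every token has exactly one head.
    single_head = all(len(heads[d]) == 1 for d in range(1, n + 1))

    # 3. Connected: every token is reachable from ROOT.
    seen, stack = {0}, [0]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    connected = all(v in seen for v in nodes)

    # 4. Acyclic: Kahn's topological sort reaches every node only if there is no cycle.
    indegree = {v: len(heads[v]) for v in nodes}
    queue = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while queue:
        v = queue.pop()
        visited += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    acyclic = visited == n + 1

    # 5. Projective: each head dominates every token between itself and its child.
    def dominates(h, k):
        stack, seen_up = [k], set()
        while stack:
            v = stack.pop()
            if v == h:
                return True
            if v not in seen_up:
                seen_up.add(v)
                stack.extend(heads[v])
        return False

    projective = all(dominates(h, k) for h, d in arcs
                     for k in range(min(h, d) + 1, max(h, d)))

    return {"unique_root": unique_root, "single_head": single_head,
            "connected": connected, "acyclic": acyclic, "projective": projective}

# "Michael played the guitar": all five properties hold
print(check_properties(4, [(2, 1), (0, 2), (4, 3), (2, 4)]))
# Arc 4 -> 1 spans tokens 2 and 3, which 4 does not dominate
print(check_properties(4, [(0, 2), (2, 4), (4, 1), (2, 3)])["projective"])  # False
```

A graph that gives a word two heads, as a deep dependency graph may, fails `single_head` but can still pass `unique_root` and `connected`.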

Korean
- One of the morphologically rich languages
  - Morphology: the study of how words are formed
  - Morphological analysis: kamsahamnida (thank) -> kamsa (thank) + ham (verbalize) + nida (ending marker)
- Several large constituency treebanks exist
- Q: What about dependency?
  - Relatively free word order
  - Morphemes provide the syntactic function as well as the meaning of words
  - Lack of large, publicly available dependency corpora
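The Penn Korean Treebank already stores this kind of analysis. Each word is written as '+'-joined morphemes with a matching '+'-joined tag sequence. For example, 거부+하+었+다 (refused) is tagged NNC+XSV+EPF+EFN in the example tree later in this deck. In the PKT tagset these tags correspond to noun, verbalizing suffix (like ham above), pre-final ending, and final ending. The minimal sketch below pairs morphemes with tags, assuming exactly one tag per morpheme. `split_morphemes` is a hypothetical helper.

```python
def split_morphemes(form, tags):
    """Pair each morpheme with its POS tag, assuming one tag per morpheme."""
    morphemes, pos_tags = form.split("+"), tags.split("+")
    if len(morphemes) != len(pos_tags):
        raise ValueError(f"{form!r} and {tags!r} have different lengths")
    return list(zip(morphemes, pos_tags))

print(split_morphemes("거부+하+었+다", "NNC+XSV+EPF+EFN"))
# [('거부', 'NNC'), ('하', 'XSV'), ('었', 'EPF'), ('다', 'EFN')]
```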

Approach
- Leverage the large annotated constituency treebanks
- Convert the constituent trees into dependency trees!

Constituent-to-Dependency Conversion [1]
0. Redirect dependencies for empty categories (if any)
1. Establish head-child dependency relations using head-percolation rules
2. Infer dependency labels using linguistic heuristics

Empty Categories
- Characteristic of treebanks annotated in the Penn Treebank [3] style
  - e.g., OntoNotes [4], Penn Korean Treebank [5]
- Null elements that indicate the location of their antecedent syntactic elements
- Enable the representation of long-distance dependencies
- Often break the projectivity property

Types of Empty Categories in the Penn Korean Treebank
1. Trace (*T*): an argument moved in front of the subject leaves a trace in its original position, a pointer to the index of its antecedent in the tree
  - Handled by trace mapping
2. Ellipsis (*?*): a dropped predicate in a matrix clause or a clausal coordination
  - Heuristics identify the location of the shared predicate
3. Empty Assignment (*pro*): dropped arguments
4. Empty Operator (*op*): relative clauses
[Figure: example trees for a trace after wh-movement and for an ellipsis (*?*)]
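Trace mapping relies on co-indexing. A trace such as *T*-1 points to the constituent whose label ends in -1. The Python sketch below pairs each trace with its antecedent label. It is a simplified illustration, not the redirection procedure of [1]. Trees are assumed to be (label, children) tuples with (tag, word) pre-terminals. `trace_map` is a hypothetical name, and the English wh-question is an illustrative Penn Treebank-style example, not taken from the Korean treebank.

```python
import re

def trace_map(tree):
    """Pair each *T*-n trace with the constituent whose label carries index n.

    Returns {n: antecedent label} for every trace found (None if unresolved).
    """
    antecedents, traces = {}, []

    def walk(node):
        label, rest = node
        if isinstance(rest, str):                     # pre-terminal (tag, word)
            m = re.fullmatch(r"\*T\*-(\d+)", rest)
            if label == "-NONE-" and m:
                traces.append(m.group(1))
            return
        m = re.search(r"-(\d+)$", label)              # co-indexed label, e.g. WHNP-1
        if m:
            antecedents[m.group(1)] = label
        for child in rest:
            walk(child)

    walk(tree)
    return {n: antecedents.get(n) for n in traces}

# "What did you buy?" after wh-movement (illustrative)
tree = ("SBARQ", [
    ("WHNP-1", [("WP", "What")]),
    ("SQ", [("VBD", "did"),
            ("NP-SBJ", [("PRP", "you")]),
            ("VP", [("VB", "buy"), ("NP", [("-NONE-", "*T*-1")])])]),
    (".", "?"),
])
print(trace_map(tree))   # {'1': 'WHNP-1'}
```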

Head-Percolation Rules
- For every node in the tree, the head is located by iterating through its immediate children and matching their POS tags against the rule's tags in priority order (tags are delimited by ;)
- r: iterate from right to left (Korean is a head-final language)
- A terminal node's head is itself

Example (Penn Korean Treebank):
(S (ADCP (ADC 반면[Meanwhile]))
   (S (NP-SBJ (NPR+NNX+PAU 삼성+측+은[Samsung]))
      (VP (NP-OBJ (NNC+PCA 논평+을[to comment]))
          (VV (NNC+XSV+EPF+EFN 거부+하+었+다[refused]))))
   (SFN .))
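The percolation procedure can be sketched in Python as follows. The rule table is hypothetical: it contains a few rules written only for the example tree above, not the rules of [1]. Each rule gives a scan direction and ';'-delimited tag patterns in priority order. Function tags such as -SBJ are stripped before matching, and a pre-terminal's head is its own word. Trees are (label, children) tuples with (tag, word) pre-terminals. `HEAD_RULES`, `head_index`, `find_head`, and `dependencies` are illustrative names.

```python
import re

# Hypothetical rules for the example tree only (not the rule set of [1]):
# phrase label -> (scan direction, ';'-delimited tag patterns in priority order)
HEAD_RULES = {
    "S":    ("r", "VP;S;VV"),
    "VP":   ("r", "VV;VP"),
    "VV":   ("r", r"NNC\+XSV.*;VV.*"),
    "NP":   ("r", "NNC.*;NPR.*;NP"),
    "ADCP": ("r", "ADC"),
}

def base_label(label):
    """Strip function tags: 'NP-SBJ' -> 'NP' ('-NONE-' is kept as is)."""
    return label if label.startswith("-") else label.split("-")[0]

def head_index(label, children):
    """Index of the head child: try each pattern over all children in scan order."""
    direction, patterns = HEAD_RULES.get(base_label(label), ("r", ""))
    n = len(children)
    order = range(n - 1, -1, -1) if direction == "r" else range(n)
    for pattern in filter(None, patterns.split(";")):
        for i in order:
            if re.fullmatch(pattern, base_label(children[i][0])):
                return i
    return order[0]  # no pattern matched: take the first child in scan order

def find_head(tree):
    """Head word of a tree; a pre-terminal (tag, word) is headed by its word."""
    label, rest = tree
    if isinstance(rest, str):
        return rest
    return find_head(rest[head_index(label, rest)])

def dependencies(tree, arcs=None):
    """(head word, dependent word) arcs: each non-head child's head attaches
    to the head of its parent constituent."""
    arcs = [] if arcs is None else arcs
    label, rest = tree
    if isinstance(rest, str):
        return arcs
    h = head_index(label, rest)
    for i, child in enumerate(rest):
        if i != h:
            arcs.append((find_head(tree), find_head(child)))
        dependencies(child, arcs)
    return arcs

tree = ("S", [
    ("ADCP", [("ADC", "반면")]),
    ("S", [
        ("NP-SBJ", [("NPR+NNX+PAU", "삼성+측+은")]),
        ("VP", [("NP-OBJ", [("NNC+PCA", "논평+을")]),
                ("VV", [("NNC+XSV+EPF+EFN", "거부+하+었+다")])]),
    ]),
    ("SFN", "."),
])
print(find_head(tree))  # 거부+하+었+다
for head, dep in dependencies(tree):
    print(head, "->", dep)
```

Every other word attaches to the sentence-final verb 거부+하+었+다, as expected for a head-final language. Arcs are reported as word pairs for readability. A real converter would use token indices, because words can repeat within a sentence.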

Dependency Label Inference
- Linguistic heuristics based on:
  - Morphological analysis of the head and the dependent
  - POS tags
  - Word forms
  - Function tags
- Function tags
  - Annotated in Penn Treebank-style treebanks
  - Provide additional syntactic/semantic information
  - Ex) NP-SBJ: the NP (noun phrase) is the subject of a clause or a sentence
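A few of these heuristics can be sketched as an ordered list of checks. Function tags are tried first, then the dependent's last morpheme (Korean case particles mark subjects and objects), then its POS tag. The rules, the particle table, and the name `infer_label` are hypothetical and target UD labels only for illustration. The topic particle 은 in 삼성+측+은 does not identify a grammatical role, which is the kind of case where the SBJ function tag helps.

```python
# Hypothetical heuristics for illustration; not the thesis's actual rule set.
CASE_PARTICLES = {"이": "nsubj", "가": "nsubj", "을": "obj", "를": "obj"}

def infer_label(phrase_label, pos, form):
    """Guess a UD label for a dependent, trying the cues in a fixed order.

    phrase_label: the dependent's constituent label with function tags (e.g. 'NP-SBJ')
    pos, form:    '+'-joined POS tags and morphemes of the dependent's head word
    """
    function_tags = set(phrase_label.split("-")[1:])
    if "SBJ" in function_tags:                                  # 1. function tags
        return "nsubj"
    if "OBJ" in function_tags:
        return "obj"
    tags, morphemes = pos.split("+"), form.split("+")
    if tags[-1] == "PCA" and morphemes[-1] in CASE_PARTICLES:   # 2. case particle
        return CASE_PARTICLES[morphemes[-1]]
    if tags[0].startswith("S"):                                 # 3. symbol tags such as SFN
        return "punct"
    return "dep"                                                # fallback: unspecified

print(infer_label("NP-SBJ", "NPR+NNX+PAU", "삼성+측+은"))  # nsubj (function tag)
print(infer_label("NP", "NNC+PCA", "논평+을"))              # obj (particle 을)
print(infer_label("SFN", "SFN", "."))                       # punct
```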

Universal Dependencies [6]
- An effort to create a consistent annotation scheme across multiple languages
- Encourages multilingual parsing experiments and comparative analysis
- Defines a POS tagset and a dependency label set
- Suggests a universal way of annotating certain sentence constructions, but allows room for language-specific extensions
  - Ex) Coordination

The Google UD Korean Treebank
- McDonald et al. [10] released a UD Korean treebank of 6K sentences
- Issues:
  - Coarse tokenization of suffixes, particles, and punctuation marks
  - Outdated annotation scheme
- Our approach:
  - Perform a systematic conversion, including re-tokenization, to match the latest guidelines
  - Each conversion step is illustrated with an example figure

1. Morphological Analysis
2. Re-Tokenization
3. Head ID Remapping
4. Dependency Re-Labeling
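Head ID remapping can be illustrated with a small Python sketch. It assumes that re-tokenization splits each original token into k pieces. One piece (index h) keeps the original attachment, and the other pieces attach to that piece. Labels are fixed later, in the re-labeling step. `remap_heads` and the (k, h) split format are hypothetical, and the split in the example is illustrative, not a claim about the latest UD Korean guidelines.

```python
def remap_heads(heads, splits):
    """Re-number head IDs after re-tokenization.

    heads:  heads[i] is the 1-based head of old token i+1 (0 = root)
    splits: splits[i] = (k, h): old token i+1 became k new tokens, and its
            h-th new token (0-based) keeps the original attachment
    Returns 1-based heads for the new tokens; the other pieces of a split
    token attach to that token's head piece (relabeled in a later step).
    """
    offset, head_piece = 0, {0: 0}     # old ID -> new ID of its head piece
    for old_id, (k, h) in enumerate(splits, start=1):
        head_piece[old_id] = offset + h + 1
        offset += k
    new_heads = []
    for old_id, (k, h) in enumerate(splits, start=1):
        for j in range(k):
            if j == h:   # head piece: follow the old head, renumbered
                new_heads.append(head_piece[heads[old_id - 1]])
            else:        # other pieces: attach to this token's head piece
                new_heads.append(head_piece[old_id])
    return new_heads

# Old tokens: 1 삼성측은, 2 논평을, 3 거부했다.  (heads 3, 3, 0)
# Hypothetical split: 삼성측 + 은 and 거부했다 + .
print(remap_heads([3, 3, 0], [(2, 0), (1, 0), (2, 0)]))  # [4, 1, 4, 0, 4]
```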

Discussion
- Google Korean Treebank
  - Errors may still remain
    - Ex) an abundance of flat dependency relations
- KAIST Treebank
  - A small set of phrasal POS tags and the lack of function tags made dependency label inference difficult
- Source code will be made available at https://github.com/emorynlp/ud-korean.

Predicate-Argument Structure
- Predicate: expresses what is said about the subject
  - Usually a verb
- Argument: an expression that completes the meaning of the predicate
  - ARG0: agent, ARG1: patient, ARG2: instrument, attribute, benefactive (for …)
- Ex 1) Michael played the guitar.
  - play (ARG0: Michael, ARG1: the guitar)
- Ex 2) Sam was awake by 9 a.m.
  - be (ARG1: Sam, ARG2: awake, ARGM-TMP: by 9 a.m.)
  - awake (ARG0: Sam, ARGM-TMP: by 9 a.m.)
- The task of assigning semantic roles to words or phrases is known as Semantic Role Labeling.

PropBank [7]
- Given a predicate in an OntoNotes sentence, PropBank
  - provides a sense ID that specifies the particular meaning of the predicate
  - lists the predicate's arguments along with their semantic roles
- Ex) follow.01: be subsequent
  - ARG0: causal agent
  - ARG1: thing following
  - ARG2: thing followed

But…
- It is hard to guarantee that a typical dependency parser will represent all the predicate-argument relations annotated in PropBank in its parse tree
- A tree cannot break the properties that define it: e.g., an argument shared by two predicates would need two heads, violating the single-head property

Deep Dependency Graph (DDG) [11]
- Retains only two of the well-formedness properties:
  1. Unique Root
  2. Connected
- Seeks to abstract away from syntactic idiosyncrasies and produce the same dependency graph (not a tree) for phrases/sentences with similar meanings
- As a result, DDG can represent complete predicate-argument structures

Abstract Meaning Representation (AMR) [8]
- Represents meaning as a rooted, directed, and labeled graph
- Variables make intra-sentence coreference easy to handle
- Inherits the PropBank semantic roles (ARG0, ARG1, etc.)
- Ex) "The professor likes to drink coffee."
  - Note that "The" and "to" are omitted from the AMR because they contribute no meaning.
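The AMR figure for this example is not reproduced here. The Python sketch below prints one plausible AMR for the sentence in PENMAN notation. It assumes that a graph is stored as a concept table plus role edges. `to_penman` is a hypothetical helper, and the concepts like-01 and drink-01 are PropBank framesets chosen for illustration. Writing an already-printed variable bare is how PENMAN expresses reentrancy: p is both the one who likes and the one who drinks.

```python
def to_penman(var, concepts, edges, seen=None, indent=1):
    """Serialize an AMR stored as {var: concept} and {var: [(role, var), ...]}.

    A variable that was already printed is written bare (reentrancy).
    """
    seen = set() if seen is None else seen
    if var in seen:
        return var
    seen.add(var)
    parts = [f"({var} / {concepts[var]}"]
    for role, child in edges.get(var, []):
        parts.append("\n" + "    " * indent + f":{role} "
                     + to_penman(child, concepts, edges, seen, indent + 1))
    return "".join(parts) + ")"

concepts = {"l": "like-01", "p": "professor", "d": "drink-01", "c": "coffee"}
edges = {"l": [("ARG0", "p"), ("ARG1", "d")],
         "d": [("ARG0", "p"), ("ARG1", "c")]}   # p reused: the professor drinks
print(to_penman("l", concepts, edges))
# (l / like-01
#     :ARG0 (p / professor)
#     :ARG1 (d / drink-01
#         :ARG0 p
#         :ARG1 (c / coffee)))
```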

AMR Parsing
- Transition-based dependency-tree-to-AMR mapping [9]
  - Exploits the head-child dependencies present in both representations
  - Two-step algorithm:
    1. A dependency parser is run to obtain the dependency tree of the source text
    2. A transition-based framework transforms the input dependency tree into an AMR
  - Adding linguistic features such as named entities as input to the mapping framework yields better results

Hypothesis
- Premises
  - AMR inherits the core semantic roles from PropBank
  - DDG can produce dependency graphs with complete predicate-argument structures
- Preliminary step
  - Insert PropBank labels into OntoNotes in place of the dependency relations between each predicate and its arguments
- Hypothesis
  - Training a dependency parser on the modified treebank will partially teach it to perform semantic role labeling
  - The trained model can then be further trained on the AMR parsing task

Insertion of PropBank Labels into OntoNotes
- Straightforward in the general case
- For each predicate in an OntoNotes sentence:
  1. Look up the corresponding PropBank entry
  2. Identify the DDG dependency between the predicate and each of its arguments
  3. Replace that dependency relation with the PropBank label
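The three insertion steps can be sketched in Python. The sketch assumes that a sentence's DDG arcs form a dict from (head, dependent) token indices to labels. It also assumes that each PropBank argument has already been reduced to its head token. `insert_propbank_labels` is a hypothetical name. Arguments with no matching predicate-to-argument arc are collected separately, corresponding to the no-match cases in the label distribution.

```python
def insert_propbank_labels(arcs, propositions):
    """Relabel predicate->argument arcs with PropBank labels.

    arcs:         {(head, dependent): label} for one sentence
    propositions: [(predicate, [(argument_head, propbank_label), ...]), ...]
    Returns (relabeled arcs, [(predicate, argument, label), ...] with no DDG arc).
    """
    relabeled, no_match = dict(arcs), []
    for predicate, args in propositions:                  # 1. each PropBank entry
        for argument, label in args:
            if (predicate, argument) in relabeled:        # 2. DDG arc found
                relabeled[(predicate, argument)] = label  # 3. replace its label
            else:
                no_match.append((predicate, argument, label))
    return relabeled, no_match

# "And ad agencies insist that they do ." with 1-based word indices
# (the empty category *?* is left out); the DDG labels are placeholders.
arcs = {(4, 1): "cc", (3, 2): "compound", (4, 3): "nsubj", (0, 4): "root",
        (7, 5): "mark", (7, 6): "nsubj", (4, 7): "ccomp", (4, 8): "punct"}
# insist.01: assumed argument heads are agencies (ARG0) and do (ARG1)
propositions = [(4, [(3, "ARG0"), (7, "ARG1")])]
relabeled, no_match = insert_propbank_labels(arcs, propositions)
print(relabeled[(4, 3)], relabeled[(4, 7)], no_match)  # ARG0 ARG1 []
```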

Example
(TOP (S (CC And)
        (NP-SBJ (NN ad)
                (NNS agencies))
        (VP (VBP insist)
            (SBAR (IN that)
                  (S (NP-SBJ (PRP they))
                     (VP (VBP do)
                         (VP (-NONE- *?*))))))
        (. .)))
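nw/wsj/17/wsj_1705.parse 25 3 gold insist insist.01 ----- 1:1-ARG0 3:0-rel 4:1-ARG1
nw/wsj/17/wsj_1705.parse 25 6 gold do do.01 ----- 6:0-rel
Each argument pointer has the form node index:height-label.
[Figure: the sentence's dependency graph with the arguments of insist labeled arg0 and arg1]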
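Each prop line above is whitespace-separated. Its fields are: file, sentence number, predicate terminal index, annotator, lemma, frameset, inflection, and then argument pointers of the form terminal:height-label. The minimal Python parser below handles simple pointers only. Chained pointers joined with * or , are left out. `Proposition` and `parse_prop_line` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Proposition:
    file: str
    sentence: int
    predicate: int      # terminal index of the predicate
    annotator: str
    lemma: str
    frameset: str
    inflection: str
    args: list          # [(terminal index, height, label), ...]

def parse_prop_line(line):
    """Parse one PropBank line (simple pointers such as '1:1-ARG0' only)."""
    fields = line.split()
    file, sent, term, annotator, lemma, frameset, inflection = fields[:7]
    args = []
    for pointer in fields[7:]:
        location, label = pointer.split("-", 1)   # keeps 'ARGM-TMP' whole
        terminal, height = location.split(":")
        args.append((int(terminal), int(height), label))
    return Proposition(file, int(sent), int(term), annotator, lemma,
                       frameset, inflection, args)

p = parse_prop_line("nw/wsj/17/wsj_1705.parse 25 3 gold insist insist.01 "
                    "----- 1:1-ARG0 3:0-rel 4:1-ARG1")
print(p.frameset, p.args)  # insist.01 [(1, 1, 'ARG0'), (3, 0, 'rel'), (4, 1, 'ARG1')]
```

Resolving these pointers against the tree above works as follows. Terminals are counted from 0 and include empty categories such as *?*. Terminal 1 is ad, and one level up is NP-SBJ (ad agencies). Terminal 3 is insist, the predicate itself. Terminal 4 is that, and one level up is the SBAR.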

Label Distribution
(Top three DDG labels, with counts, on the predicate-argument arcs of each PropBank label; no-match: no corresponding DDG arc)

Label   Top 1               Top 2               Top 3               no-match %
ARG0    nsubj 159,474       no-match 25,721     r-nsubj 8,472       12.5%
ARG1    obj 120,130         nsubj 66,403        no-match 51,553     16.2%
ARG2    no-match 47,000     ppmod 23,428        obj 7,507           48.0%
ARG3    ppmod 3,897         no-match 914        obj 563             13.2%
ARG4    ppmod 4,037         adv 747             no-match 182        3.4%
ARG5
Total                                                               19.8%

Contributions
1. Systematic updates to the Google UD Korean Treebank to match the latest UD annotation guidelines
2. Constituent-to-dependency conversion of the phrase structure trees in the Penn Korean Treebank and the KAIST Treebank
3. Analysis of the three converted Korean dependency treebanks
4. Construction of a new corpus by replacing the dependencies that represent predicate-argument structure in OntoNotes with PropBank labels
5. Analysis of mismatch cases between PropBank and DDG

References
[1] Choi, J. D. and Palmer, M., Guidelines for the Clear Style Constituent to Dependency Conversion, Technical Report 01-12, University of Colorado Boulder, 2012.
[2] Jurafsky, D. and Martin, J. H., Speech and Language Processing, Ch. 14: Dependency Parsing, p. 5.
[3] Marcus, M. et al., The Penn Treebank: Annotating Predicate Argument Structure, In Proceedings of the Workshop on Human Language Technology (HLT '94), Association for Computational Linguistics, 1994, pp. 114-119.
[4] Weischedel, R. et al., OntoNotes: A Large Training Corpus for Enhanced Processing.
[5] Han, C. et al., Development and Evaluation of a Korean Treebank and Its Application to NLP, In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002), May 29-31, 2002.
[6] Nivre, J., Bosco, C., Choi, J., et al., Universal Dependencies 1.0, 2015.

[7] Palmer, M. et al., The Proposition Bank: An Annotated Corpus of Semantic Roles, Computational Linguistics 31(1), 2005, pp. 71-106.
[8] Banarescu, L. et al., Abstract Meaning Representation for Sembanking, 2013.
[9] Wang, C. et al., A Transition-Based Algorithm for AMR Parsing, 2015.
[10] McDonald, R. et al., Universal Dependency Annotation for Multilingual Parsing, 2013.
[11] Choi, J. D., Deep Dependency Graph Conversion in English, In Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories (TLT'17), Bloomington, IN, 2017, pp. 35-62.
