This paper presents two ways of improving semantic role labeling (SRL). First, we introduce a novel transition-based SRL algorithm that takes a quite different approach to SRL. Inspired by shift-reduce parsing, it brings the advantages of the transition-based approach to SRL. Second, we present a self-learning clustering technique that effectively improves labeling accuracy in the test domain. For better generalization of the statistical models, we cluster verb predicates by comparing their predicate argument structures and apply the clustering information to the final labeling decisions. All approaches are evaluated on the CoNLL’09 English data. The new algorithm shows results comparable to another state-of-the-art system, and the clustering technique improves labeling accuracy for both in-domain and out-of-domain tasks.
Transition-based Semantic Role Labeling Using Predicate Argument Clustering
1. Transition-based Semantic Role Labeling
Using Predicate Argument Clustering
Workshop on Relational Models of Semantics
Jinho D. Choi & Martha Palmer
University of Colorado at Boulder
June 23rd, 2011
Thursday, June 23, 2011
2. Dependency-based SRL
• Semantic role labeling
- Task of identifying arguments of each predicate and labeling
them with semantic roles in relation to the predicate.
• Dependency-based semantic role labeling
- Advantages over constituent-based semantic role labeling.
• Dependency parsing is faster (2.29 milliseconds / sentence).
• Dependency structure is more similar to predicate argument
structure.
- Labels headwords instead of phrases.
• Can still recover the original semantic chunks most of the time
(Choi and Palmer, LAW 2010).
3. Dependency-based SRL
• Constituent-based vs. dependency-based SRL
[Figure: constituent parse of “He opened the door with his foot at ten” (S → NP VP; the VP containing NP, PP, PP), with phrases labeled Agent (He), Theme (the door), Instrument (with his foot), and Temporal (at ten).]
4. Dependency-based SRL
• Constituent-based vs. dependency-based SRL
[Figure: dependency trees of the same sentence. Syntactic: opened → He (SBJ), door (OBJ), with (ADV), at (TMP). Semantic: opened → He (ARG0), door (ARG1), with (ARG2), at (TMP); only headwords are labeled.]
5. Motivations
• Do argument identification and classification need to be
in separate steps?
- They may require two different feature sets.
- Training them in a pipeline takes less time than training them as a joint-inference task.
- We have seen advantages of dealing with them as a joint-inference task in dependency parsing; why not in SRL?
6. Transition-based SRL
• Dependency parsing vs. dependency-based SRL
- Both try to find relations between word pairs.
- Dep-based SRL is a special kind of dep. parsing.
• It restricts the search only to top-down relations between
predicate (head) and argument (dependent) pairs.
• It allows multiple predicates for each argument.
• Transition-based SRL algorithm
- Top-down, bidirectional search. → More suitable for SRL
- Easier to develop a joint-inference system between
dependency parsing and semantic role labeling.
7. Transition-based SRL
• Parsing states
- (λ1, λ2, p, λ3, λ4, A)
- p - index of the current predicate candidate.
- λ1 - indices of lefthand-side argument candidates.
- λ4 - indices of righthand-side argument candidates.
- λ2,3 - indices of processed tokens.
- A - labeled arcs with semantic roles
• Initialization: ([ ], [ ], 1, [ ], [2, ..., n], ∅)
• Termination: (λ1, λ2, ∄, [ ], [ ], A)
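The state tuple above can be expressed as a small data structure. This is a minimal sketch with our own names, not the authors' code:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Sketch of the parsing state (lambda_1, lambda_2, p, lambda_3, lambda_4, A).
@dataclass
class SRLState:
    lambda1: List[int]   # lefthand-side argument candidates
    lambda2: List[int]   # processed tokens (left side)
    p: Optional[int]     # current predicate candidate; None once exhausted
    lambda3: List[int]   # processed tokens (right side)
    lambda4: List[int]   # righthand-side argument candidates
    arcs: Set[Tuple[int, int, str]] = field(default_factory=set)  # (pred, arg, role)

def initial_state(n: int) -> SRLState:
    # Initialization: ([], [], 1, [], [2, ..., n], {}) for an n-token sentence
    return SRLState([], [], 1, [], list(range(2, n + 1)))

def is_terminal(state: SRLState) -> bool:
    # Termination: no predicate candidate left, both right-side lists empty
    return state.p is None and not state.lambda3 and not state.lambda4
```
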
8. Transition-based SRL
• Transitions
- No-Pred - finds the next predicate candidate.
- Shift - finishes the current predicate candidate and moves to the next.
- No-Arc← - rejects the lefthand-side argument candidate.
- No-Arc→ - rejects the righthand-side argument candidate.
- Left-Arc← - accepts the lefthand-side argument candidate.
- Right-Arc→ - accepts the righthand-side argument candidate.
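These transitions can be sketched as operations on the state tuple from the previous slide. This is our own simplified encoding, not the authors' code; the Shift transition appears in the worked example on the next slide:

```python
# State: l1/l4 hold argument candidates, l2/l3 hold processed tokens,
# p is the current predicate candidate, A collects (pred, arg, role) arcs.

def make_state(n):
    # initialization for an n-token sentence: ([], [], 1, [], [2..n], [])
    return {"n": n, "l1": [], "l2": [], "p": 1, "l3": [],
            "l4": list(range(2, n + 1)), "A": []}

def _advance_pred(s):
    # move to the next predicate candidate; candidate lists are re-derived
    p = s["p"] + 1
    if p > s["n"]:
        s["p"], s["l1"], s["l4"] = None, [], []
    else:
        s["p"] = p
        s["l1"] = list(range(1, p))
        s["l4"] = list(range(p + 1, s["n"] + 1))
    s["l2"], s["l3"] = [], []

def no_pred(s):          # w_p is not a predicate: skip to the next candidate
    _advance_pred(s)

def shift(s):            # all arguments of w_p are found: move on
    _advance_pred(s)

def no_arc_left(s):      # reject the nearest lefthand-side candidate
    s["l2"].insert(0, s["l1"].pop())

def no_arc_right(s):     # reject the nearest righthand-side candidate
    s["l3"].append(s["l4"].pop(0))

def left_arc(s, role):   # accept the lefthand-side candidate as an argument
    arg = s["l1"].pop()
    s["A"].append((s["p"], arg, role))
    s["l2"].insert(0, arg)

def right_arc(s, role):  # accept the righthand-side candidate as an argument
    arg = s["l4"].pop(0)
    s["A"].append((s["p"], arg, role))
    s["l3"].append(arg)
```

Replaying the sequence for “John1 wants2 to3 buy4 a5 car6” yields the four arcs John ← wants (A0), wants → to (A1), John ← buy (A0), and buy → car (A1).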
9. Transition-based SRL
• Example: John1 wants2 to3 buy4 a5 car6
- Arcs: John ← wants (A0), wants → to (A1); John ← buy (A0), buy → car (A1).
[Figure: the parsing states (λ1, λ2, p, λ3, λ4, A) after each transition.]
• Transitions for the predicate “wants”:
- No-Pred
- Left-Arc: John ← wants
- Right-Arc: wants → to
- No-Arc x 3
- Shift
• Transitions for the predicate “buy”:
- No-Pred
- No-Arc x 2
- Left-Arc: John ← buy
- No-Arc
- Right-Arc: buy → car
10. Features
• Baseline features
- N-gram and binary features
(similar to ones in Johansson and Nugues, EMNLP 2008).
- Structural features.
[Figure: dependency tree fragment - “wants” with SBJ dependent “John” (PRP) and OPRD dependent “to” (TO); “to” with IM dependent “buy” (VB).]
- Subcategorization of “wants”: SBJ ← V → OPRD
- POS path from “John” to “buy”: PRP ↑ LCA ↓ TO ↓ VB
- Dependency-label path from “John” to “buy”: SBJ ↑ LCA ↓ OPRD ↓ IM
- Depth from “John” to “buy”: 1 ↑ LCA ↓ 2
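An illustrative sketch (our own code, not the authors') of these structural features: the predicate's subcategorization frame, and the path/depth between an argument candidate and the predicate via their lowest common ancestor (LCA) in the dependency tree:

```python
def ancestors(heads, i):
    # chain from token i up to the root (head index 0)
    chain = [i]
    while heads[i] != 0:
        i = heads[i]
        chain.append(i)
    return chain

def path_feature(heads, tags, arg, pred):
    # tag path from arg up to the LCA, then down to pred
    up, down = ancestors(heads, arg), ancestors(heads, pred)
    lca = next(n for n in up if n in down)
    up_part = up[:up.index(lca)]        # nodes strictly below the LCA, arg side
    down_part = down[:down.index(lca)]  # nodes strictly below the LCA, pred side
    return ("".join(tags[n] + "↑" for n in up_part) + "LCA"
            + "".join("↓" + tags[n] for n in reversed(down_part)))

def depth_feature(heads, arg, pred):
    # number of edges from arg up to the LCA and from the LCA down to pred
    up, down = ancestors(heads, arg), ancestors(heads, pred)
    lca = next(n for n in up if n in down)
    return f"{up.index(lca)}↑LCA↓{down.index(lca)}"

def subcat_feature(heads, labels, pred):
    # dependency labels of the predicate's left and right dependents
    left = [labels[d] for d in sorted(heads) if heads[d] == pred and d < pred]
    right = [labels[d] for d in sorted(heads) if heads[d] == pred and d > pred]
    return "←".join(left) + "←V→" + "→".join(right)
```

On the tree for “John1 wants2 to3 buy4” (heads {1: 2, 2: 0, 3: 2, 4: 3}), these reproduce the feature strings shown on this slide.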
11. Features
• Dynamic features
- Derived from previously identified arguments.
- Previously identified argument labels of w_arg.
[Figure: “John1 wants2 to3 buy4 a5 car6” with A0/A1 arcs for both “wants” and “buy”.]
- Label of the very last predicted numbered argument of w_pred.
- These features can narrow down the scope of expected arguments of w_pred.
12. Experiments
• Corpora
- CoNLL’09 English data.
- In-domain task: the Wall Street Journal.
- Out-of-domain task: the Brown corpus.
• Input to our semantic role labeler
- Automatically generated dependency trees.
- Used our open-source dependency parser, ClearParser.
• Machine learning algorithm
- Liblinear L2-L1 SVM.
13. Experiments
• Results
- AI - Argument Identification.
- AC - Argument Classification.
                      In-domain               Out-of-domain
Task                  P      R      F1       P      R      F1
Baseline   AI         92.57  88.44  90.46    90.96  81.57  86.01
           AI+AC      87.20  83.31  85.21    77.11  69.14  72.91
+ Dynamic  AI         92.38  88.76  90.54    90.90  82.25  86.36
           AI+AC      87.33  83.91  85.59    77.41  70.05  73.55
JN’08      AI+AC      88.46  83.55  85.93    77.67  69.63  73.43
14. Summary
• Introduced a transition-based SRL algorithm, showing
near state-of-the-art results.
- No need to design separate systems for argument identification and classification.
- Makes it easier to develop a joint-inference system between dependency parsing and semantic role labeling.
• Future work
- Several techniques designed to improve transition-based parsing can be applied (e.g., dynamic programming, k-best ranking).
- More features, such as clustering information, can be applied to improve labeling accuracy.
15. Predicate Argument Clustering
• Verb clusters can give more generalization to the
statistical models.
- Clustering verbs using bag-of-words, syntactic structure.
- Clustering verbs using predicate argument structure.
• Self-learning clustering
- Cluster verbs in the test data using automatically generated
predicate argument structures.
- Cluster verbs in the training data using the verb clusters
found in the test data as seeds.
- Re-run our semantic role labeler on the test data using the
clustering information.
16. Predicate Argument Clustering
[Figure 2: Projecting the predicate argument structure of each verb into vector space.]
• Vector representation
- Each verb (e.g., “want”, “buy”) is a row vector over semantic role labels (A0, A1, ...) and lemma:label pairs (john:A0, to:A1, car:A1, ...).
- Some features matter more than others: ARG0 and ARG1 are generally predicted with higher confidence than modifiers, and nouns give more important information than some other grammatical categories. Instead of binary values, each existing feature is assigned a value computed by the following equations:
  s(lj | vi) = 1 / (1 + exp(−score(lj | vi)))
  s(mj, lj) = exp( count(mj, lj) / Σ∀k count(mk, lk) )
- vi is the current verb, lj is the j’th label of vi, and mj is lj’s corresponding lemma.
- score(lj | vi) is the score of lj being a correct argument label of vi; it is always 1 for training data and is provided by our semantic role labeler for the test data.
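The two weighting equations transcribe directly into code; variable names below follow the slide's notation:

```python
import math
from collections import Counter

def label_weight(score):
    # s(l_j | v_i) = 1 / (1 + exp(-score(l_j | v_i)))
    # score is 1 for gold (training) labels, or the labeler's
    # confidence for automatically labeled test data.
    return 1.0 / (1.0 + math.exp(-score))

def lemma_label_weight(counts, lemma, label):
    # s(m_j, l_j) = exp( count(m_j, l_j) / sum_k count(m_k, l_k) )
    total = sum(counts.values())
    return math.exp(counts[(lemma, label)] / total)

# e.g., with two lemma/label pairs observed twice each:
counts = Counter({("john", "A0"): 2, ("car", "A1"): 2})
w = lemma_label_weight(counts, "john", "A0")  # exp(2/4) = exp(0.5)
```
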
17. Predicate Argument Clustering
• Clustering verbs in the test data
- K-best hierarchical agglomerative clustering.
• Merges k-best pairs at each iteration.
• Uses a threshold to dynamically determine the top k clusters.
- We set another threshold for early break-out.
• Clustering verbs in the training data
- K-means clustering.
• Starts with centroids estimated from the clusters found in the test
data.
• Uses a threshold to filter out verbs not close enough to any
cluster.
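A simplified sketch (our own code, not the authors') of the k-best agglomerative step: at each iteration, merge up to the k best disjoint cluster pairs whose cosine similarity clears a threshold, breaking out early when no pair qualifies. The slide's dynamic choice of k and the seeded k-means step for the training data are omitted here; k is taken as given.

```python
import math

def cosine(u, v):
    # cosine similarity of two sparse vectors (dicts)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def merge(u, v):
    # average the two sparse vectors (a centroid-style merge)
    return {k: (u.get(k, 0.0) + v.get(k, 0.0)) / 2 for k in set(u) | set(v)}

def kbest_agglomerative(vectors, k=1, threshold=0.5):
    clusters = [({name}, vec) for name, vec in vectors.items()]
    while len(clusters) > 1:
        pairs = sorted(
            ((cosine(a[1], b[1]), i, j)
             for i, a in enumerate(clusters)
             for j, b in enumerate(clusters) if i < j),
            reverse=True)
        good = [(s, i, j) for s, i, j in pairs if s >= threshold]
        if not good:
            break  # early break-out: no pair is similar enough
        merged, used = [], set()
        for s, i, j in good[:k]:
            if i in used or j in used:
                continue  # each cluster joins at most one merge per iteration
            merged.append((clusters[i][0] | clusters[j][0],
                           merge(clusters[i][1], clusters[j][1])))
            used |= {i, j}
        clusters = merged + [c for idx, c in enumerate(clusters)
                             if idx not in used]
    return [names for names, _ in clusters]
```

On toy predicate-argument vectors, verbs sharing argument features (e.g., “want” and “buy”, which share john:A0) merge, while dissimilar verbs stay apart.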
18. Experiments
• Results
                      In-domain               Out-of-domain
Task                  P      R      F1       P      R      F1
Baseline   AI         92.57  88.44  90.46    90.96  81.57  86.01
           AI+AC      87.20  83.31  85.21    77.11  69.14  72.91
+ Dynamic  AI         92.38  88.76  90.54    90.90  82.25  86.36
           AI+AC      87.33  83.91  85.59    77.41  70.05  73.55
+ Cluster  AI         92.62  88.90  90.72    90.87  82.43  86.44
           AI+AC      87.43  83.92  85.64    77.47  70.28  73.70
JN’08      AI+AC      88.46  83.55  85.93    77.67  69.63  73.43
19. Conclusion
• Introduced a self-learning clustering technique with potential for improving labeling accuracy in new domains.
- Needs to be tried on large-scale data to see a clear impact of the clustering.
- Can also be improved by using different features or clustering algorithms.
• ClearParser open-source project
- http://code.google.com/p/clearparser/
20. Acknowledgements
• We gratefully acknowledge the support of the National
Science Foundation Grants CISE-IIS-RI-0910992, Richer
Representations for Machine Translation, a subcontract
from the Mayo Clinic and Harvard Children’s Hospital
based on a grant from the ONC, 90TR0002/01, Strategic
Health Advanced Research Project Area 4: Natural
Language Processing, and a grant from the Defense
Advanced Research Projects Agency (DARPA/IPTO)
under the GALE program, DARPA/CMO Contract No.
HR0011-06-C-0022, subcontract from BBN, Inc. Any
opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and
do not necessarily reflect the views of the National
Science Foundation.