This paper presents two ways of improving semantic role labeling (SRL). First, we introduce a novel transition-based SRL algorithm that takes a quite different approach to SRL. Inspired by shift-reduce parsing, it brings the advantages of the transition-based approach to SRL. Second, we present a self-learning clustering technique that effectively improves labeling accuracy in the test domain. For better generalization of the statistical models, we cluster verb predicates by comparing their predicate argument structures and apply the clustering information to the final labeling decisions. All approaches are evaluated on the CoNLL’09 English data. The new algorithm shows results comparable to another state-of-the-art system, and the clustering technique improves labeling accuracy for both in-domain and out-of-domain tasks.
Transition-based Semantic Role Labeling Using Predicate Argument Clustering
1. Transition-based Semantic Role Labeling
Using Predicate Argument Clustering
Workshop on Relational Models of Semantics
Jinho D. Choi & Martha Palmer
University of Colorado at Boulder
June 23rd, 2011
Thursday, June 23, 2011
2. Dependency-based SRL
• Semantic role labeling
- Task of identifying arguments of each predicate and labeling
them with semantic roles in relation to the predicate.
• Dependency-based semantic role labeling
- Advantages over constituent-based semantic role labeling.
• Dependency parsing is faster (2.29 milliseconds / sentence).
• Dependency structure is more similar to predicate argument
structure.
- Labels headwords instead of phrases.
• Can still recover the original semantic chunks most of the time
(Choi and Palmer, LAW 2010).
3. Dependency-based SRL
• Constituent-based vs. dependency-based SRL
[Figure: constituent parse of “He opened the door with his foot at ten” (S → NP VP; the VP containing NP, PP, PP), with phrases labeled Agent (He), Theme (the door), Instrument (with his foot), and Temporal (at ten).]
4. Dependency-based SRL
• Constituent-based vs. dependency-based SRL
[Figure: dependency trees of the same sentence. Syntactic: opened → He (SBJ), door (OBJ), with (ADV), at (TMP). Semantic: opened → He (ARG0), door (ARG1), with (ARG2), at (TMP); only headwords are labeled.]
5. Motivations
• Do argument identification and classification need to be
in separate steps?
- They may require two different feature sets.
- Training them in a pipeline takes less time than training them as a joint-inference task.
- We have seen advantages of dealing with them as a joint-inference task in dependency parsing; why not in SRL?
6. Transition-based SRL
• Dependency parsing vs. dependency-based SRL
- Both try to find relations between word pairs.
- Dep-based SRL is a special kind of dep. parsing.
• It restricts the search only to top-down relations between
predicate (head) and argument (dependent) pairs.
• It allows multiple predicates for each argument.
• Transition-based SRL algorithm
- Top-down, bidirectional search. → More suitable for SRL
- Easier to develop a joint-inference system between
dependency parsing and semantic role labeling.
7. Transition-based SRL
• Parsing states
- (λ1, λ2, p, λ3, λ4, A)
- p - index of the current predicate candidate.
- λ1 - indices of lefthand-side argument candidates.
- λ4 - indices of righthand-side argument candidates.
- λ2,3 - indices of processed tokens.
- A - labeled arcs with semantic roles
• Initialization: ([ ], [ ], 1, [ ], [2, ..., n], ∅)
• Termination: (λ1, λ2, ∄, [ ], [ ], A)
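The state tuple above can be expressed as a small data structure. This is a minimal sketch with our own names, not the authors' code:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Sketch of the parsing state (lambda_1, lambda_2, p, lambda_3, lambda_4, A).
@dataclass
class SRLState:
    lambda1: List[int]   # lefthand-side argument candidates
    lambda2: List[int]   # processed tokens (left side)
    p: Optional[int]     # current predicate candidate; None once exhausted
    lambda3: List[int]   # processed tokens (right side)
    lambda4: List[int]   # righthand-side argument candidates
    arcs: Set[Tuple[int, int, str]] = field(default_factory=set)  # (pred, arg, role)

def initial_state(n: int) -> SRLState:
    # Initialization: ([], [], 1, [], [2, ..., n], {}) for an n-token sentence
    return SRLState([], [], 1, [], list(range(2, n + 1)))

def is_terminal(state: SRLState) -> bool:
    # Termination: no predicate candidate left, both right-side lists empty
    return state.p is None and not state.lambda3 and not state.lambda4
```
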
8. Transition-based SRL
• Transitions
- No-Pred - finds the next predicate candidate.
- Shift - finishes the current predicate candidate and moves to the next.
- No-Arc← - rejects the lefthand-side argument candidate.
- No-Arc→ - rejects the righthand-side argument candidate.
- Left-Arc← - accepts the lefthand-side argument candidate.
- Right-Arc→ - accepts the righthand-side argument candidate.
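These transitions can be sketched as operations on the state tuple from the previous slide. This is our own simplified encoding, not the authors' code; the Shift transition appears in the worked example on the next slide:

```python
# State: l1/l4 hold argument candidates, l2/l3 hold processed tokens,
# p is the current predicate candidate, A collects (pred, arg, role) arcs.

def make_state(n):
    # initialization for an n-token sentence: ([], [], 1, [], [2..n], [])
    return {"n": n, "l1": [], "l2": [], "p": 1, "l3": [],
            "l4": list(range(2, n + 1)), "A": []}

def _advance_pred(s):
    # move to the next predicate candidate; candidate lists are re-derived
    p = s["p"] + 1
    if p > s["n"]:
        s["p"], s["l1"], s["l4"] = None, [], []
    else:
        s["p"] = p
        s["l1"] = list(range(1, p))
        s["l4"] = list(range(p + 1, s["n"] + 1))
    s["l2"], s["l3"] = [], []

def no_pred(s):          # w_p is not a predicate: skip to the next candidate
    _advance_pred(s)

def shift(s):            # all arguments of w_p are found: move on
    _advance_pred(s)

def no_arc_left(s):      # reject the nearest lefthand-side candidate
    s["l2"].insert(0, s["l1"].pop())

def no_arc_right(s):     # reject the nearest righthand-side candidate
    s["l3"].append(s["l4"].pop(0))

def left_arc(s, role):   # accept the lefthand-side candidate as an argument
    arg = s["l1"].pop()
    s["A"].append((s["p"], arg, role))
    s["l2"].insert(0, arg)

def right_arc(s, role):  # accept the righthand-side candidate as an argument
    arg = s["l4"].pop(0)
    s["A"].append((s["p"], arg, role))
    s["l3"].append(arg)
```

Replaying the sequence for “John1 wants2 to3 buy4 a5 car6” yields the four arcs John ← wants (A0), wants → to (A1), John ← buy (A0), and buy → car (A1).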
9. Transition-based SRL
• Example: John1 wants2 to3 buy4 a5 car6
- Arcs: John ← wants (A0), wants → to (A1); John ← buy (A0), buy → car (A1).
[Figure: the parsing states (λ1, λ2, p, λ3, λ4, A) after each transition.]
• Transitions for the predicate “wants”:
- No-Pred
- Left-Arc: John ← wants
- Right-Arc: wants → to
- No-Arc x 3
- Shift
• Transitions for the predicate “buy”:
- No-Pred
- No-Arc x 2
- Left-Arc: John ← buy
- No-Arc
- Right-Arc: buy → car
10. Features
• Baseline features
- N-gram and binary features
(similar to ones in Johansson and Nugues, EMNLP 2008).
- Structural features.
[Figure: dependency tree fragment - “wants” with SBJ dependent “John” (PRP) and OPRD dependent “to” (TO); “to” with IM dependent “buy” (VB).]
- Subcategorization of “wants”: SBJ ← V → OPRD
- POS path from “John” to “buy”: PRP ↑ LCA ↓ TO ↓ VB
- Dependency-label path from “John” to “buy”: SBJ ↑ LCA ↓ OPRD ↓ IM
- Depth from “John” to “buy”: 1 ↑ LCA ↓ 2
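An illustrative sketch (our own code, not the authors') of these structural features: the predicate's subcategorization frame, and the path/depth between an argument candidate and the predicate via their lowest common ancestor (LCA) in the dependency tree:

```python
def ancestors(heads, i):
    # chain from token i up to the root (head index 0)
    chain = [i]
    while heads[i] != 0:
        i = heads[i]
        chain.append(i)
    return chain

def path_feature(heads, tags, arg, pred):
    # tag path from arg up to the LCA, then down to pred
    up, down = ancestors(heads, arg), ancestors(heads, pred)
    lca = next(n for n in up if n in down)
    up_part = up[:up.index(lca)]        # nodes strictly below the LCA, arg side
    down_part = down[:down.index(lca)]  # nodes strictly below the LCA, pred side
    return ("".join(tags[n] + "↑" for n in up_part) + "LCA"
            + "".join("↓" + tags[n] for n in reversed(down_part)))

def depth_feature(heads, arg, pred):
    # number of edges from arg up to the LCA and from the LCA down to pred
    up, down = ancestors(heads, arg), ancestors(heads, pred)
    lca = next(n for n in up if n in down)
    return f"{up.index(lca)}↑LCA↓{down.index(lca)}"

def subcat_feature(heads, labels, pred):
    # dependency labels of the predicate's left and right dependents
    left = [labels[d] for d in sorted(heads) if heads[d] == pred and d < pred]
    right = [labels[d] for d in sorted(heads) if heads[d] == pred and d > pred]
    return "←".join(left) + "←V→" + "→".join(right)
```

On the tree for “John1 wants2 to3 buy4” (heads {1: 2, 2: 0, 3: 2, 4: 3}), these reproduce the feature strings shown on this slide.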
11. Features
• Dynamic features
- Derived from previously identified arguments.
- Previously identified argument labels of w_arg.
[Figure: “John1 wants2 to3 buy4 a5 car6” with A0/A1 arcs for both “wants” and “buy”.]
- Label of the very last predicted numbered argument of w_pred.
- These features can narrow down the scope of expected arguments of w_pred.
12. Experiments
• Corpora
- CoNLL’09 English data.
- In-domain task: the Wall Street Journal.
- Out-of-domain task: the Brown corpus.
• Input to our semantic role labeler
- Automatically generated dependency trees.
- Used our open-source dependency parser, ClearParser.
• Machine learning algorithm
- Liblinear L2-L1 SVM.
13. Experiments
• Results
- AI - Argument Identification.
- AC - Argument Classification.
                      In-domain               Out-of-domain
Task                  P      R      F1       P      R      F1
Baseline   AI         92.57  88.44  90.46    90.96  81.57  86.01
           AI+AC      87.20  83.31  85.21    77.11  69.14  72.91
+ Dynamic  AI         92.38  88.76  90.54    90.90  82.25  86.36
           AI+AC      87.33  83.91  85.59    77.41  70.05  73.55
JN’08      AI+AC      88.46  83.55  85.93    77.67  69.63  73.43
14. Summary
• Introduced a transition-based SRL algorithm, showing
near state-of-the-art results.
- No need to design separate systems for argument identification and classification.
- Makes it easier to develop a joint-inference system between dependency parsing and semantic role labeling.
• Future work
- Several techniques designed to improve transition-based parsing can be applied (e.g., dynamic programming, k-best ranking).
- More features, such as clustering information, can be applied to improve labeling accuracy.
15. Predicate Argument Clustering
• Verb clusters can give more generalization to the
statistical models.
- Clustering verbs using bag-of-words, syntactic structure.
- Clustering verbs using predicate argument structure.
• Self-learning clustering
- Cluster verbs in the test data using automatically generated
predicate argument structures.
- Cluster verbs in the training data using the verb clusters
found in the test data as seeds.
- Re-run our semantic role labeler on the test data using the
clustering information.
16. Predicate Argument Clustering
[Figure 2: Projecting the predicate argument structure of each verb into vector space.]
• Vector representation
- Each verb (e.g., “want”, “buy”) is a row vector over semantic role labels (A0, A1, ...) and lemma:label pairs (john:A0, to:A1, car:A1, ...).
- Some features matter more than others: ARG0 and ARG1 are generally predicted with higher confidence than modifiers, and nouns give more important information than some other grammatical categories. Instead of binary values, each existing feature is assigned a value computed by the following equations:
  s(lj | vi) = 1 / (1 + exp(−score(lj | vi)))
  s(mj, lj) = exp( count(mj, lj) / Σ∀k count(mk, lk) )
- vi is the current verb, lj is the j’th label of vi, and mj is lj’s corresponding lemma.
- score(lj | vi) is the score of lj being a correct argument label of vi; it is always 1 for training data and is provided by our semantic role labeler for the test data.
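The two weighting equations transcribe directly into code; variable names below follow the slide's notation:

```python
import math
from collections import Counter

def label_weight(score):
    # s(l_j | v_i) = 1 / (1 + exp(-score(l_j | v_i)))
    # score is 1 for gold (training) labels, or the labeler's
    # confidence for automatically labeled test data.
    return 1.0 / (1.0 + math.exp(-score))

def lemma_label_weight(counts, lemma, label):
    # s(m_j, l_j) = exp( count(m_j, l_j) / sum_k count(m_k, l_k) )
    total = sum(counts.values())
    return math.exp(counts[(lemma, label)] / total)

# e.g., with two lemma/label pairs observed twice each:
counts = Counter({("john", "A0"): 2, ("car", "A1"): 2})
w = lemma_label_weight(counts, "john", "A0")  # exp(2/4) = exp(0.5)
```
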
17. Predicate Argument Clustering
• Clustering verbs in the test data
- K-best hierarchical agglomerative clustering.
• Merges k-best pairs at each iteration.
• Uses a threshold to dynamically determine the top k clusters.
- We set another threshold for early break-out.
• Clustering verbs in the training data
- K-means clustering.
• Starts with centroids estimated from the clusters found in the test
data.
• Uses a threshold to filter out verbs not close enough to any
cluster.
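A simplified sketch (our own code, not the authors') of the k-best agglomerative step: at each iteration, merge up to the k best disjoint cluster pairs whose cosine similarity clears a threshold, breaking out early when no pair qualifies. The slide's dynamic choice of k and the seeded k-means step for the training data are omitted here; k is taken as given.

```python
import math

def cosine(u, v):
    # cosine similarity of two sparse vectors (dicts)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def merge(u, v):
    # average the two sparse vectors (a centroid-style merge)
    return {k: (u.get(k, 0.0) + v.get(k, 0.0)) / 2 for k in set(u) | set(v)}

def kbest_agglomerative(vectors, k=1, threshold=0.5):
    clusters = [({name}, vec) for name, vec in vectors.items()]
    while len(clusters) > 1:
        pairs = sorted(
            ((cosine(a[1], b[1]), i, j)
             for i, a in enumerate(clusters)
             for j, b in enumerate(clusters) if i < j),
            reverse=True)
        good = [(s, i, j) for s, i, j in pairs if s >= threshold]
        if not good:
            break  # early break-out: no pair is similar enough
        merged, used = [], set()
        for s, i, j in good[:k]:
            if i in used or j in used:
                continue  # each cluster joins at most one merge per iteration
            merged.append((clusters[i][0] | clusters[j][0],
                           merge(clusters[i][1], clusters[j][1])))
            used |= {i, j}
        clusters = merged + [c for idx, c in enumerate(clusters)
                             if idx not in used]
    return [names for names, _ in clusters]
```

On toy predicate-argument vectors, verbs sharing argument features (e.g., “want” and “buy”, which share john:A0) merge, while dissimilar verbs stay apart.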
18. Experiments
• Results
                      In-domain               Out-of-domain
Task                  P      R      F1       P      R      F1
Baseline   AI         92.57  88.44  90.46    90.96  81.57  86.01
           AI+AC      87.20  83.31  85.21    77.11  69.14  72.91
+ Dynamic  AI         92.38  88.76  90.54    90.90  82.25  86.36
           AI+AC      87.33  83.91  85.59    77.41  70.05  73.55
+ Cluster  AI         92.62  88.90  90.72    90.87  82.43  86.44
           AI+AC      87.43  83.92  85.64    77.47  70.28  73.70
JN’08      AI+AC      88.46  83.55  85.93    77.67  69.63  73.43
19. Conclusion
• Introduced a self-learning clustering technique with potential for improving labeling accuracy in new domains.
- Needs to be tried on large-scale data to see a clear impact of the clustering.
- Can also be improved by using different features or clustering algorithms.
• ClearParser open-source project
- http://code.google.com/p/clearparser/
20. Acknowledgements
• We gratefully acknowledge the support of the National
Science Foundation Grants CISE-IIS-RI-0910992, Richer
Representations for Machine Translation, a subcontract
from the Mayo Clinic and Harvard Children’s Hospital
based on a grant from the ONC, 90TR0002/01, Strategic
Health Advanced Research Project Area 4: Natural
Language Processing, and a grant from the Defense
Advanced Research Projects Agency (DARPA/IPTO)
under the GALE program, DARPA/CMO Contract No.
HR0011-06-C-0022, subcontract from BBN, Inc. Any
opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and
do not necessarily reflect the views of the National
Science Foundation.