Slides for my EMNLP2017 presentation (edited for slideshare).
Paper: http://aclweb.org/anthology/D/D17/D17-1001.pdf
Supplementary material: http://aclweb.org/anthology/attachments/D/D17/D17-1001.Attachment.zip
Abstract: We propose an efficient method to conduct phrase alignment on parse forests for paraphrase detection. Unlike previous studies, our method identifies syntactic paraphrases under linguistically motivated grammar. In addition, it allows phrases to non-compositionally align to handle paraphrases with non-homographic phrase correspondences.
A dataset that provides gold parse trees and their phrase alignments is created. The experimental results confirm that the proposed method conducts highly accurate phrase alignment compared to human performance.
Monolingual Phrase Alignment on Parse Forests (EMNLP2017 presentation)
1. Monolingual Phrase Alignment
on Parse Forests
Yuki Arase*† and Junichi Tsujii†◊
*Osaka University, Japan
†Artificial Intelligence Research Center (AIRC), AIST, Japan
◊NaCTeM, School of Computer Science, University of Manchester, UK
2. Develop an efficient method to conduct
phrase alignment on parse forests
for paraphrase detection
Relying on team spirit, expedition members defeated difficulties.
Members of the scientific team overcame challenges
through teamwork.
3. Develop an efficient method to conduct
phrase alignment on parse forests
for paraphrase detection
[Figure: parse trees of the two sentences, segmented as "Members of | the scientific team | overcame | challenges | through | teamwork" and "Relying on | team spirit | expedition | members | defeated | difficulties", with node labels NP, VP, PP, and S]
4. Scope: Paraphrase Types
•Paraphrases by linguistic operations
•Paraphrases with simple summarization
Relying on team spirit, expedition members defeated difficulties.
Members of the scientific team overcame challenges living on Mars
through teamwork.
•Paraphrases involving inferences/entailments
Scientists overcame challenges living on Mars.
Scientists overcame water and oxygen scarcity on the red planet.
5. Non-homographic Nature of Paraphrases
Phrase correspondences in paraphrases are often
non-homographic
• Synchronous parsing of paraphrases (Weese+ 2014)
•Only 9.1% of paraphrases were reachable, even
when using an SCFG extracted from paraphrase
corpora
6. Related Work
•Phrase alignment in paraphrases (MacCartney+
2008, Thadani+ 2012, Yao+ 2013)
• Phrases are simply n-grams, NOT syntactic phrases
•PPDB (Ganitkevitch+ 2013) provides syntactic
paraphrases of SCFG
• Captures only a fraction of paraphrasing
phenomena
7. Related Work
•Parallel parsing with increased flexibility
(Burkett+ 2010)
• Allow disagreements when possible
• Alignments are restricted to conform to ITG
•Parallel parsing of paraphrases (Choe & McClosky
2015)
• Alignment quality was beyond their scope
8. Our Contributions
• Formalize the problem of identifying a legitimate set
of syntactic paraphrases under linguistically
motivated grammar
• Design a computationally feasible method using
dynamic programming à la CKY
• Improve alignment quality using n-best parse forests
• Allow non-homographic correspondences of
phrases
9. Alignment Model: Overview
•Input: Sentential paraphrase pair
•Bottom-up process like CKY assuming
compositionality in alignments
• From words to phrases
•Allow phrases to be aligned to null nodes when
they do NOT have correspondences
10. Alignment Model: Basic
Lowest common
ancestor
Null-alignments
Construct phrase alignments
supported by alignments of
their descendant nodes
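The composition step on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' implementation: it handles a single tree per sentence rather than a parse forest, ignores null alignments and scoring, and the names `Node`, `lca`, and `compose_alignments` are hypothetical. Given two alignments between descendant nodes, it proposes an alignment between the lowest common ancestors on each side.

```python
from itertools import combinations


class Node:
    """A parse-tree node with a label and a parent pointer (hypothetical helper)."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent

    def ancestors(self):
        """Return the path from this node up to the root, inclusive."""
        node, path = self, []
        while node is not None:
            path.append(node)
            node = node.parent
        return path


def lca(a, b):
    """Lowest common ancestor of two nodes in the same tree."""
    seen = {id(n) for n in a.ancestors()}
    for n in b.ancestors():
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def compose_alignments(alignments):
    """Propose phrase alignments by pairing the LCAs of already aligned node pairs."""
    proposed = []
    for (s1, t1), (s2, t2) in combinations(alignments, 2):
        pair = (lca(s1, s2), lca(t1, t2))
        if pair not in proposed:
            proposed.append(pair)
    return proposed


# Toy trees for "overcame challenges" and "defeated difficulties".
vp_s = Node("VP")
w1_s = Node("overcame", vp_s)
w2_s = Node("challenges", vp_s)

vp_t = Node("VP")
w1_t = Node("defeated", vp_t)
w2_t = Node("difficulties", vp_t)

proposed = compose_alignments([(w1_s, w1_t), (w2_s, w2_t)])
print([(s.label, t.label) for s, t in proposed])  # [('VP', 'VP')]
```

Repeating this step on the newly proposed pairs moves the alignment upward from words to larger phrases, which is the bottom-up behaviour the overview slide describes.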
11. Alignment Model: Non-homographic
• Compositionality is often violated in paraphrases
• Allow non-monotonic alignments if two alignments are
compatible
20. Parameterization
• Apply the feature-enhanced EM (Berg-Kirkpatrick+
2010)
• Ability to use dependent features without an unreasonable
independence assumption
• $\Pr(\tau^s, \tau^t)$ is parameterized using features, as in a logistic
regression model:
$$\Pr(\tau^s, \tau^t) = \frac{\exp(\boldsymbol{w} \cdot \mathbf{f}(\tau^s, \tau^t))}{\sum_{\tau_i^t} \exp(\boldsymbol{w} \cdot \mathbf{f}(\tau^s, \tau_i^t))}$$
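As a rough illustration of the log-linear form above, the following Python sketch scores one source node against a set of candidate target nodes and normalizes with a softmax. The feature function `toy_features`, the weights, and the use of plain strings as nodes are assumptions made for the example; the real features are those listed on the next slide, and the weights would be learned with feature-enhanced EM.

```python
import math


def alignment_probability(weights, source, target, candidates, features):
    """Softmax of w . f(source, t) over all candidate target nodes t."""
    def score(t):
        return sum(w * f for w, f in zip(weights, features(source, t)))

    scores = {t: score(t) for t in candidates}
    max_score = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - max_score) for s in scores.values())
    return math.exp(scores[target] - max_score) / z


def toy_features(s, t):
    """Hypothetical features: exact surface match and negative length difference."""
    return [1.0 if s == t else 0.0, -abs(len(s) - len(t))]


weights = [2.0, 0.5]  # made-up values, not learned parameters
candidates = ["challenges", "difficulties", "teamwork"]
probs = {
    t: alignment_probability(weights, "challenges", t, candidates, toy_features)
    for t in candidates
}
```

The probabilities over the candidate set sum to one, and the candidate with the highest feature score receives the largest share.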
21. Features
• Semantic heads
• Surface similarity
• WordNet similarity
• Word embedding similarity
• Combination of prepositions
(Srikumar & Roth, 2013)
• Combination of syntactic categories
Syntactic category
Members
NP
of the scientific team
NP
NP
PP
Semantic head
22. Combination with Parse Probability
• Interpolate alignment and parse probabilities inspired
by parallel parsing
• Use HPSG parser Enju, whose parameters have been
tuned
$$(1 - \mu)\,\alpha_i + \mu\,\rho(\tau_i^s)\,\rho(\tau_i^t)$$
$\mu \in [0, 1]$: hyper-parameter for balancing
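A minimal sketch of the interpolation, assuming the slide's formula reads as (1 − μ) times the alignment probability plus μ times the product of the parse probabilities of the two aligned nodes. The function and argument names are hypothetical, and the input values below are made up for illustration.

```python
def combined_score(alpha, rho_s, rho_t, mu):
    """Interpolate an alignment probability with the two parse probabilities.

    alpha: alignment probability of the node pair
    rho_s, rho_t: parse probabilities of the source and target nodes
    mu: balancing hyper-parameter in [0, 1]
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return (1.0 - mu) * alpha + mu * rho_s * rho_t


alignment_only = combined_score(0.8, 0.5, 0.4, mu=0.0)  # uses only alpha
parse_only = combined_score(0.8, 0.5, 0.4, mu=1.0)      # uses only rho_s * rho_t
balanced = combined_score(0.8, 0.5, 0.4, mu=0.5)
```

With μ = 0 the score ignores the parser entirely, and with μ = 1 it ignores the alignment model, so μ is the natural knob to tune on the development set.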
23. Evaluation Data
• Training: 41K sentential paraphrases
• From MT evaluation sets (NIST OpenMT corpora)
• Paired references of 10-30 words
• Development: 50 sentence pairs with human
annotation for hyper-parameter tuning
• Test: 151 sentence pairs with annotation
24. Gold-Standard
•Annotation on 201 sentential paraphrases:
• Gold parse trees by a linguistic expert
• Phrase alignments by 3 English translators:
agreed that 77% of phrases are paraphrases
•14K phrase alignments are obtained as
gold-standard
25. Evaluation Metric
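• See how well gold alignments can be replicated by
automatic alignment
$$\text{Recall} = \frac{|\{h \mid h \in \mathbb{H}_a \wedge h \in \mathbb{G} \cap \mathbb{G}'\}|}{|\mathbb{G} \cap \mathbb{G}'|}$$
• See how well automatic alignments overlap with alignments
that at least one annotator aligned
$$\text{Precision} = \frac{|\{h \mid h \in \mathbb{H}_a \wedge h \in \mathbb{G} \cup \mathbb{G}'\}|}{|\mathbb{H}_a|}$$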
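The two metrics translate directly into set operations. The sketch below assumes each alignment is a hashable pair of phrase identifiers and that `gold_a` and `gold_b` are the alignment sets of two annotators; the function names and the toy data are illustrative only.

```python
def alignment_recall(auto, gold_a, gold_b):
    """Share of alignments agreed on by both annotators that the system found."""
    agreed = gold_a & gold_b
    return len(auto & agreed) / len(agreed) if agreed else 0.0


def alignment_precision(auto, gold_a, gold_b):
    """Share of system alignments that at least one annotator also made."""
    either = gold_a | gold_b
    return len(auto & either) / len(auto) if auto else 0.0


# Toy alignment sets (source phrase id, target phrase id).
gold_a = {("a", "x"), ("b", "y"), ("c", "z")}
gold_b = {("a", "x"), ("b", "y"), ("d", "w")}
auto = {("a", "x"), ("d", "w"), ("e", "v")}

recall = alignment_recall(auto, gold_a, gold_b)        # 1 of 2 agreed alignments
precision = alignment_precision(auto, gold_a, gold_b)  # 2 of 3 system alignments
```

Recall is measured against the intersection, so it only counts alignments both annotators endorse, while precision uses the union and therefore credits any alignment that some annotator made.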
30. Example
Whenever I go to the ground floor for a smoke, I always come face to face with them.
Whenever I go down to smoke a cigarette, I come face to face with one of them.
[Figure: parse trees of "⋯ go to the ground floor for a smoke" and "go down to smoke a cigarette ⋯", with node labels NP, PP, VP, and CP]
31. Example
Whenever I go to the ground floor for a smoke, I always come face to face with them.
Whenever I go down to smoke a cigarette, I come face to face with one of them.
[Figure: parse trees of "⋯ go to the ground floor for a smoke" and "go down to smoke a cigarette ⋯", with node labels NP, PP, VP, and CP, and the annotation "1-best tree"]
32. Future (Ongoing) Work
• Tackle paraphrases with inferences/entailments
• Common among paraphrases in practice
• Contribution to knowledge extraction
• Application to:
• Bilingual alignment on Jp-En
• Phrase embedding
• Working on release of the annotation data
33. Glance at on-going study:
Application to Jp-En alignment
…the proposed PSRF has simpler structure than that of modulated PSRF
…提案PSRF は 変調型PSRF より 構造 が簡単 で ある
[Figure: parse trees of the English and Japanese sentences, with node labels NP, VP, and IP-EMB]
34. Glance at on-going study:
Alignment on Paraphrases with
Inferences/Entailments
•Paraphrases in practice are more challenging
The no-shows were John William of Massachusetts and David
Miller of Florida.
John William and David Miller declined invitations to speak.
•Alignment on MSR paraphrase corpus
• Our method succeeds in partial alignment
• Extend it to conduct top-down alignment to handle
phrases with inferences/entailments
35. Glance at on-going study:
Alignment on Paraphrases with
Inferences/Entailments
[Figure: parse trees of "The no-shows were John William of Massachusetts and David Miller of Florida" and "John William and David Miller declined invitations to speak", with node labels NP, VP, COOD, and S]