Monolingual Phrase Alignment
on Parse Forests
Yuki Arase*† and Junichi Tsujii†◊
*Osaka University, Japan
†Artificial Intelligence Research Center (AIRC), AIST, Japan
◊NaCTeM, School of Computer Science, University of Manchester, UK
Develop an efficient method to conduct
phrase alignment on parse forests
for paraphrase detection
Relying on team spirit, expedition members defeated difficulties.
Members of the scientific team overcame challenges
through teamwork.
[Figure: parse trees over the phrases of the two example sentences; node labels S, NP, VP, PP.]
Scope: Paraphrase Types
•Paraphrases by linguistic operations
•Paraphrases with simple summarization
Relying on team spirit, expedition members defeated difficulties.
Members of the scientific team overcame challenges living on Mars
through teamwork.
•Paraphrases involving inferences/entailments
Scientists overcame challenges living on Mars.
Scientists overcame water and oxygen scarcity on the red planet.
Non-homographic Nature of Paraphrases
Phrase correspondences in paraphrases are often
non-homographic
• Synchronous parsing of paraphrases (Weese+ 2014)
•Only 9.1% of paraphrases were reachable, even
when using an SCFG extracted from paraphrase
corpora
Related Work
•Phrase alignment in paraphrases (MacCartney+
2008, Thadani+ 2012, Yao+ 2013)
• Phrases are simply n-grams, NOT syntactic phrases
•PPDB (Ganitkevitch+ 2013) provides syntactic
paraphrases of SCFG
• Captures only a fraction of paraphrasing
phenomena
Related Work
•Parallel parsing with increased flexibility
(Burkett+ 2010)
• Allow disagreements when possible
• Alignments are restricted to conform to ITG
•Parallel parsing of paraphrases (Choe & McClosky
2015)
• Alignment quality was beyond their scope
Our Contributions
• Formalize the problem of identifying a legitimate set
of syntactic paraphrases under linguistically
motivated grammar
• Design a computationally feasible method using
dynamic programming à la CKY
• Improve alignment quality using n-best parse forests
• Allow non-homographic correspondences of
phrases
Alignment Model: Overview
•Input: Sentential paraphrase pair
•Bottom-up process like CKY assuming
compositionality in alignments
• From word to phrases
•Allow phrases to be aligned to null nodes when
they do NOT have correspondences
Alignment Model: Basic
Lowest common
ancestor
Null-alignments
Construct phrase alignments
supported by alignments of
their descendant nodes
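The bottom-up step can be sketched as follows. This is only an illustration, assuming each parse tree is stored as a child-to-parent map; the node names and helper functions are hypothetical and not taken from the talk. Given two node pairs that are already aligned, the candidate parent alignment pairs the lowest common ancestor of the two source nodes with the lowest common ancestor of the two target nodes.

```python
def ancestors(node, parent):
    """Return the path from node up to the root, node included."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(a, b, parent):
    """Lowest node that dominates both a and b in the tree given by parent."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def parent_alignment(align_m, align_n, src_parent, tgt_parent):
    """Combine two (source, target) alignments into a candidate parent alignment."""
    (s_m, t_m), (s_n, t_n) = align_m, align_n
    return (lowest_common_ancestor(s_m, s_n, src_parent),
            lowest_common_ancestor(t_m, t_n, tgt_parent))


# Toy trees with hypothetical node names, stored as child -> parent.
src_parent = {"defeated": "VP_s", "difficulties": "VP_s", "VP_s": "S_s"}
tgt_parent = {"overcame": "VP_t", "challenges": "VP_t", "VP_t": "S_t"}

pair = parent_alignment(("defeated", "overcame"),
                        ("difficulties", "challenges"),
                        src_parent, tgt_parent)
print(pair)
```

The sketch handles a single tree per sentence; running it over a parse forest and handling null alignments are left out.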
Alignment Model: Non-homographic
• Compositionality is often violated in paraphrases
• Allow non-monotonic alignments if two alignments are
compatible
Alignment Model: Non-homographic
Trace back alignment histories to check if two alignments are
compatible
Alignment Probability
\[ \alpha_i = \alpha_m \, \alpha_n \, \Pr(\tau_i^s, \tau_i^t) \cdot \Pr(\tau, \tau_\emptyset) \]
[Figure: the alignment $\alpha_i$ of $(\tau_i^s, \tau_i^t)$ is built from the alignments $\alpha_m$ of $(\tau_m^s, \tau_m^t)$ and $\alpha_n$ of $(\tau_n^s, \tau_n^t)$; $\tau_\emptyset$ is the null node.]
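As a worked illustration of this recursion, here is a minimal sketch that multiplies the scores of two child alignments by the probability of the new node pair and by a null-alignment factor. The function name, the argument names, and all numbers are made up for illustration, and treating the null-alignment term as a single factor that defaults to 1 is an assumption of this sketch.

```python
def alignment_score(alpha_m, alpha_n, pr_pair, pr_null=1.0):
    """alpha_i = alpha_m * alpha_n * Pr(tau_i^s, tau_i^t) * Pr(tau, tau_null).

    pr_null defaults to 1.0, i.e. no node is aligned to the null node
    (an assumption made for this sketch).
    """
    for p in (alpha_m, alpha_n, pr_pair, pr_null):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return alpha_m * alpha_n * pr_pair * pr_null


# Hypothetical scores of two child alignments and of the parent pair.
alpha_m = 0.9
alpha_n = 0.8
alpha_i = alignment_score(alpha_m, alpha_n, pr_pair=0.5, pr_null=0.5)
print(alpha_i)
```

Because every factor is at most 1, the score can only shrink as alignments are composed upward, so a practical implementation would probably work in log space.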
Parameterization
• Apply the feature-enhanced EM (Berg-Kirkpatrick+
2010)
• Ability to use dependent features without an irrational
independence assumption
• $\Pr(\tau^s, \tau^t)$ is parameterized using features, as in a logistic
regression model:
\[ \Pr(\tau^s, \tau^t) = \frac{\exp(\boldsymbol{w} \cdot \boldsymbol{f}(\tau^s, \tau^t))}{\sum_{\tau_i^t} \exp(\boldsymbol{w} \cdot \boldsymbol{f}(\tau^s, \tau_i^t))} \]
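The log-linear form can be evaluated as in the sketch below. It assumes the weight vector is already given (in the talk it is learned with feature-enhanced EM, which is not shown here), and the feature vectors, node names, and function name are hypothetical. The probability of a source node paired with one target node is normalized over all candidate target nodes.

```python
import math


def pair_probability(weights, feats_by_target, target):
    """exp(w . f(tau_s, tau_t)) normalized over all candidate target nodes."""
    def score(feats):
        return sum(w * x for w, x in zip(weights, feats))

    scores = {t: score(f) for t, f in feats_by_target.items()}
    shift = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {t: math.exp(s - shift) for t, s in scores.items()}
    return exp_scores[target] / sum(exp_scores.values())


# Hypothetical weights and feature vectors f(tau_s, tau_t) for one source node.
weights = [1.0, 0.5]
feats_by_target = {
    "NP_1": [1.0, 0.0],
    "NP_2": [0.0, 2.0],
    "VP_1": [0.0, 0.0],
}
probs = {t: pair_probability(weights, feats_by_target, t) for t in feats_by_target}
print(probs)
```

Subtracting the maximum score before exponentiating leaves the result unchanged and avoids overflow for large feature scores.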
Features
• Semantic heads
• Surface similarity
• WordNet similarity
• Word embedding similarity
• Combination of prepositions
(Srikumar & Roth, 2013)
• Combination of syntactic categories
[Figure: parse of "Members of the scientific team" (an NP over an NP and a PP), marking the syntactic category and the semantic head.]
Combination with Parse Probability
• Interpolate alignment and parse probabilities inspired
by parallel parsing
• Use HPSG parser Enju, whose parameters have been
tuned
\[ (1 - \mu)\,\alpha_i + \mu\,\rho(\tau_i^s)\,\rho(\tau_i^t) \]
$\mu \in [0,1]$: hyper-parameter for balancing; $\rho(\cdot)$: parse probability
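The interpolation can be written as a one-line function. This is a minimal sketch with made-up numbers; the function and argument names are hypothetical, and the parse probabilities would come from the parser rather than being supplied by hand.

```python
def combined_score(alpha_i, rho_s, rho_t, mu):
    """(1 - mu) * alpha_i + mu * rho(tau_i^s) * rho(tau_i^t)."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return (1.0 - mu) * alpha_i + mu * rho_s * rho_t


# Hypothetical alignment score and parse probabilities of the two nodes.
alpha_i, rho_s, rho_t = 0.4, 0.5, 0.6
for mu in (0.0, 0.5, 1.0):
    print(mu, combined_score(alpha_i, rho_s, rho_t, mu))
```

With mu = 0 only the alignment score is used, and with mu = 1 only the parse probabilities are used.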
Evaluation Data
• Training: 41K sentential paraphrases
• From MT evaluation sets (NIST OpenMT corpora)
• Paired references of 10-30 words
• Development: 50 sentence pairs with human
annotation for hyper-parameter tuning
• Test: 151 sentence pairs with annotation
Gold-Standard
•Annotation on 201 sentential paraphrases:
• Gold parse trees by a linguistic expert
• Phrase alignments by 3 English translators,
who agreed that 77% of phrases are paraphrases
•14K phrase alignments are obtained as
gold-standard
Evaluation Metric
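• See how gold alignments can be replicated by
automatic alignment
\[ \mathrm{Recall} = \frac{|\{h \mid h \in \mathbb{H}_a \wedge h \in \mathbb{G} \cap \mathbb{G}'\}|}{|\mathbb{G} \cap \mathbb{G}'|} \]
• See how automatic alignments overlap with alignments
that at least one annotator aligned
\[ \mathrm{Precision} = \frac{|\{h \mid h \in \mathbb{H}_a \wedge h \in \mathbb{G} \cup \mathbb{G}'\}|}{|\mathbb{H}_a|} \]
($\mathbb{H}_a$: automatic alignments; $\mathbb{G}$, $\mathbb{G}'$: gold alignments by two annotators)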
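Both metrics reduce to set operations, as the sketch below shows. It assumes each alignment is a hashable item such as a (source phrase, target phrase) tuple; the alignment labels and function names are hypothetical. Recall is measured against the alignments both annotators agree on, and precision against the alignments at least one annotator made.

```python
def recall(auto, gold_a, gold_b):
    """Share of alignments agreed on by both annotators that were found."""
    agreed = gold_a & gold_b
    if not agreed:
        raise ValueError("the annotators share no alignment")
    return len(auto & agreed) / len(agreed)


def precision(auto, gold_a, gold_b):
    """Share of automatic alignments made by at least one annotator."""
    if not auto:
        raise ValueError("no automatic alignment to evaluate")
    either = gold_a | gold_b
    return len(auto & either) / len(auto)


# Hypothetical alignments, each a (source phrase, target phrase) pair.
a, b, c, d, e = [("s%d" % i, "t%d" % i) for i in range(5)]
gold_a = {a, b, c}
gold_b = {b, c, d}
auto = {b, c, d, e}

print(recall(auto, gold_a, gold_b), precision(auto, gold_a, gold_b))
```

Using the intersection for recall and the union for precision makes recall strict about what must be found and precision lenient about what counts as correct.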
Result: Alignment Quality
[Figure: bar chart of recall and precision (axis range 70 to 90) for Human, Proposed, Monotonic, and 1-best tree.]
• Human: regard one annotator as the test and
the other two as the gold-standard
• Proposed: 92% & 89% of humans' performance
• Alignment using forests
greatly improves recall/precision
• Non-monotonic alignment is effective
Example
Whenever I go to the ground floor for a smoke, I always come face to face with them.
Whenever I go down to smoke a cigarette, I come face to face with one of them.
[Figure: parse trees of "go to the ground floor for a smoke" (nodes NP, PP, VP) and "go down to smoke a cigarette" (nodes NP, VP, CP), with one NP node marked as coming from the 1-best tree.]
Future (On-going) Work
• Tackle paraphrases with inferences/entailments
• Common among paraphrases in practice
• Contribution to knowledge extraction
• Application to:
• Bilingual alignment on Jp-En
• Phrase embedding
• Working on release of the annotation data
Glance at on-going study:
Application to Jp-En alignment
…the proposed PSRF has simpler structure than that of modulated PSRF
…提案PSRF は 変調型PSRF より 構造 が簡単 で ある
[Figure: parse trees of the English and Japanese sentences; node labels NP, VP, IP-EMB.]
Glance at on-going study:
Alignment on Paraphrases with
Inferences/Entailments
•Paraphrases in practice are more challenging
The no-shows were John William of Massachusetts and David
Miller of Florida.
John William and David Miller declined invitations to speak.
•Alignment on MSR paraphrase corpus
• Our method succeeds in partial alignment
• Extend it to conduct top-down alignment to handle
phrases with inferences/entailments
[Figure: parse trees of the two sentences above; node labels S, NP, VP, COOD.]

Monolingual Phrase Alignment on Parse Forests (EMNLP2017 presentation)
