Institut für Anthropomatik (Institute for Anthropomatics), 24.06.13, Simon Hummel, Chair of Prof. Waibel
Grammatical Agreement in SMT
Seminar Sprach-zu-Sprach-Übersetzung (Speech-to-Speech Translation)
Summer semester 2013
Inflection and Agreement
Inflection
– Modification of a word
– Signals grammatical variants (tense, gender, case, …)
– e.g. walk vs. walked
Agreement
– Inflections of related words in a sentence have to agree
– e.g. das Haus (correct) vs. die Haus (wrong gender)
Some languages are weakly inflected (e.g. English)
Some are highly inflected (e.g. German, Arabic, …)
Agreement Errors
Local Agreement Errors
Ref: the-car[F] go[F] with-speed
Hypo: the-car[F] go[M] with-speed
Long-distance Agreement Errors
Ref: celle qui parle , c’est ma femme
one[F] who speak , is my wife[F]
Hypo: celui qui parle est ma femme
one[M] who speak is my spouse[F]
([F] = feminine, [M] = masculine; the English lines are word-by-word glosses)
Approaches for SMT
Morphological Generation
– Create raw stems, then modify them with a predicted inflection
Agreement Constraints
– Use a synchronous context-free grammar (SCFG) with target-side syntax and add constraints to it
Class-based Agreement Model
– Use morphological word classes, e.g. “Noun+Def+Sg+Fem”
Morphological Generation: Idea
“Generating Complex Morphology for Machine Translation” (Minkov and Toutanova, 2007)
Convert MT output to a stem sequence
Predict an inflection for every stem
Predicted inflections should reflect the meaning and comply with agreement rules
Morphological Generation: Lexicons
Lexicons provide morphological analysis and generation
Operations:
– Stemming
– Inflection
– Morphological analysis
Lexicons can be created manually or automatically from data
Here: assumed as given
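The three lexicon operations listed above can be pictured with a tiny dictionary-backed lexicon. This is only a sketch: the entries, the feature names, and the function names `stem`, `inflections` and `analyze` are hypothetical and are not taken from the cited paper.

```python
# Hypothetical toy lexicon: surface form -> (stem, morphological features).
TOY_LEXICON = {
    "walk": ("walk", frozenset({"Pres"})),
    "walks": ("walk", frozenset({"Pres", "3Sg"})),
    "walked": ("walk", frozenset({"Past"})),
}


def stem(word):
    """Stemming: map a surface form to its stem (unknown words map to themselves)."""
    entry = TOY_LEXICON.get(word)
    return entry[0] if entry else word


def inflections(stem_form):
    """Inflection: list every surface form the lexicon knows for a stem."""
    return sorted(w for w, (s, _) in TOY_LEXICON.items() if s == stem_form)


def analyze(word):
    """Morphological analysis: return (stem, features), or None if unknown."""
    return TOY_LEXICON.get(word)


print(stem("walked"))       # stem of an inflected form
print(inflections("walk"))  # candidate inflections for a stem
print(analyze("walks"))     # stem plus feature set
```

In the generation approach, `stem` would be applied to the MT output and `inflections` would supply the candidate set from which one inflection is predicted per stem.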
Morphological Generation: Inflection Prediction
Maximum entropy Markov model (2nd order)
Features:
– Monolingual
– Bilingual
– Lexical
– Morphological
– Syntactic
$$p(\bar{y} \mid \bar{x}) = \prod_{t=1}^{n} p(y_t \mid y_{t-1}, y_{t-2}, x_t), \qquad y_t \in I_t$$
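The second-order factorization on this slide can be sketched in a few lines. The local model below is a hand-written stand-in for the trained maximum entropy classifier, and `STEM_HINT`, the label names and the weights are all hypothetical. Reading I_t as the candidate inflection set at position t is also an assumption. The search is brute force over all combinations, which is only feasible for toy inputs; a real system would use dynamic programming or beam search.

```python
from itertools import product

BOS = "<s>"  # padding label for the two positions before the sentence start

# Hypothetical cue: the stem "car" prefers the feminine label.
STEM_HINT = {"car": "Fem"}


def local_prob(y_t, y_prev, y_prev2, x_t, candidates):
    """Toy stand-in for p(y_t | y_{t-1}, y_{t-2}, x_t), normalized over I_t."""
    def score(y):
        s = 1.0
        if y == y_prev:
            s *= 2.0   # prefer agreement with the previous label
        if y == y_prev2:
            s *= 1.5   # weaker preference for the label two steps back
        if STEM_HINT.get(x_t) == y:
            s *= 4.0   # evidence from the stem itself
        return s
    return score(y_t) / sum(score(y) for y in candidates)


def sequence_prob(ys, xs, cands):
    """Product over t of the local probabilities, as in the factorization."""
    padded = [BOS, BOS] + list(ys)
    p = 1.0
    for t, (x_t, c_t) in enumerate(zip(xs, cands)):
        p *= local_prob(padded[t + 2], padded[t + 1], padded[t], x_t, c_t)
    return p


def best_sequence(xs, cands):
    """Exhaustive argmax over all label sequences with y_t drawn from I_t."""
    return max(product(*cands), key=lambda ys: sequence_prob(ys, xs, cands))


stems = ["car", "go", "fast"]
candidate_sets = [["Fem", "Masc"], ["Fem", "Masc"], ["Unmarked"]]
print(best_sequence(stems, candidate_sets))
```

Because each local distribution is normalized over its candidate set, the sequence probabilities sum to one over all label sequences.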
Morphological Generation: Evaluation
English-Russian and English-Arabic
Technical (software manual) domain
Input: aligned sentence pairs of reference translations (no MT system output) → reduces noise
Accuracy (%) results
Morphological Generation: Conclusion
Needed resources:
– Large corpus of aligned sentence pairs
– Lexicons (source and target) with the three operations
+ Better accuracy than simple LM (even with small training data)
+ Easy to add to existing MT system
- Expensive creation of lexicons
Constraints: Idea
“Agreement Constraints for Statistical Machine Translation into German” (Williams and Koehn, 2011)
String-to-tree model
Synchronous grammar for the target language
Learned constraints and probabilities are added
Constraints are evaluated during decoding
Constraints: Feature Structure
Feature structure
Unification
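Feature structures and unification can be illustrated with nested dictionaries. The sketch below is a generic textbook-style unifier, not the implementation from the cited paper, and the lexical entries are simplified, hypothetical examples (a real entry for German "die" would be ambiguous between several readings).

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns a new merged structure, or None if two atomic values clash.
    """
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            sub = unify(a_val, b_val)
            if sub is None:
                return None  # clash somewhere below this feature
            result[key] = sub
        elif a_val != b_val:
            return None      # atomic values disagree
    return result


# Hypothetical, simplified entries.
die = {"agr": {"gender": "fem"}}
das = {"agr": {"gender": "neut", "number": "sg"}}
katze = {"agr": {"gender": "fem", "number": "sg"}}

print(unify(die, katze))  # compatible: the merged structure
print(unify(das, katze))  # gender clash: None
```

A failed unification is what signals an agreement violation: "das Katze" is rejected because the gender values cannot be merged.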
Constraints: Grammar
Synchronous grammar learned from a parallel corpus
Extended by constraints on the target side
Sample rule/constraint:
NP-SB → the X_1 cat | die AP_1 Katze
Constraints: Training
Propagation rules to capture NP/PP agreements:
Applied bottom-up
Constraints: Decoding
Model:
$$\hat{t} = \arg\max_{t} p(t \mid s), \qquad p(t \mid s) = \frac{1}{Z} \sum_{i=1}^{n} \lambda_i h_i(s, t)$$
Every element of a rule/constraint has a feature structure
Constraint evaluation: each hypothesis stores the set of feature structures corresponding to its root rule element
Recombination of hypotheses is possible
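The decoding model on this slide picks the translation with the highest weighted feature sum. A minimal sketch follows, assuming a fixed list of candidate translations. The feature names `lm` and `tm`, their values and the weights are hypothetical. The normalizer Z is the same for every candidate of a given source sentence, so it is omitted from the argmax.

```python
def score(features, weights):
    """Weighted sum over feature functions: sum_i lambda_i * h_i(s, t)."""
    return sum(weights[name] * value for name, value in features.items())


def best_translation(candidates, weights):
    """Argmax over candidate translations; Z is constant and left out."""
    return max(candidates, key=lambda c: score(c["features"], weights))


# Hypothetical feature values (log-probabilities) for two candidates.
weights = {"lm": 0.5, "tm": 1.0}
candidates = [
    {"text": "die Katze", "features": {"lm": -2.0, "tm": -1.0}},
    {"text": "das Katze", "features": {"lm": -5.0, "tm": -0.8}},
]
print(best_translation(candidates, weights)["text"])
```

A real decoder builds candidates incrementally from grammar rules rather than enumerating them, and evaluates the agreement constraints while doing so.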
Constraints: Evaluation
English-German
Europarl and News Commentary
Parsing: BitPar; Alignment: GIZA++; SCFG rules: Moses toolkit
Treebank for target
Grammar: ~140 m rules
BLEU scores and p-values for three test sets
Constraints: Conclusion
Needed resources:
– Parallel corpus
– Heuristics for constraint extraction
+ Improvement in translation accuracy
- Improvement is quite small
Class-Based: Idea
“A Class-Based Agreement Model for Generating Accurately Inflected Translations” (Green and DeNero, 2012)
During decoding
Target-side
Three steps:
1. Segmentation
2. Tagging
3. Scoring
Class-Based: Segmentation
Train a conditional random field (CRF)
Features: centered 5-character window
Applied during decoding, not as a preprocessing step
Labels:
– I: Continuation (inside)
– O: Outside (whitespace)
– B: Beginning
– F: Non-native characters
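The four character labels can be illustrated by converting between a segmented text and its label string. This sketch shows only the label scheme, not the CRF that predicts the labels. The transliterated toy words, the `is_native` test (here simply `str.isalpha`) and both function names are hypothetical.

```python
def label_chars(words, is_native=str.isalpha):
    """Turn segmented words into (text, labels) using the I/O/B/F scheme.

    `words` is a list of words, each given as a list of segments.
    """
    text, labels = [], []
    for w_idx, segments in enumerate(words):
        if w_idx > 0:
            text.append(" ")
            labels.append("O")      # whitespace between words
        for seg in segments:
            for c_idx, ch in enumerate(seg):
                text.append(ch)
                if not is_native(ch):
                    labels.append("F")  # non-native character
                elif c_idx == 0:
                    labels.append("B")  # first character of a segment
                else:
                    labels.append("I")  # continuation of a segment
    return "".join(text), "".join(labels)


def segment(text, labels):
    """Inverse direction: rebuild the segmented words from predicted labels."""
    words, current = [], []
    for ch, lab in zip(text, labels):
        if lab == "O":
            if current:
                words.append(current)
                current = []
        elif lab == "B" or not current:
            current.append(ch)      # start a new segment
        else:
            current[-1] += ch       # "I" or "F": extend the open segment
    if current:
        words.append(current)
    return words


gold = [["w", "ktb"], ["al", "ktab"]]  # hypothetical transliterated segments
text, labels = label_chars(gold)
print(text)
print(labels)
print(segment(text, labels))
```

At decoding time only the second direction is needed: the tagger predicts one label per character of a hypothesis and `segment` turns the labels into segments.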
Class-Based: Tagging
Train CRF on full sentences with gold classes
Features:
– Current and previous words, affixes, etc.
Labels:
– Morphological classes
→ Gender, number, person, definiteness
– e.g. 89 classes for Arabic
Example:
'the car'
Tagged: “Noun+Def+Sg+Fem”
Class-Based: Scoring
Scores of word sequences are not comparable across hypotheses
→ Score class sequences with a generative model
Simple bigram LM over gold class sequences (add-1 smoothed)
$$\tau' = \arg\max_{\tau} p(\tau \mid \hat{s})$$
$$q(e) = p(\tau') = \prod_{i=1}^{I} p(\tau'_i \mid \tau'_{i-1})$$
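An add-1 smoothed bigram model over class sequences fits in a few lines. The sketch below assumes a start symbol for the first class in each sequence and uses made-up gold sequences; apart from “Noun+Def+Sg+Fem”, the class names are hypothetical.

```python
import math
from collections import Counter

BOS = "<s>"  # assumed start symbol standing in for the class before position 1


def train_bigram(class_sequences):
    """Count class bigrams and their contexts over gold class sequences."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for seq in class_sequences:
        prev = BOS
        for tag in seq:
            bigrams[(prev, tag)] += 1
            contexts[prev] += 1
            vocab.add(tag)
            prev = tag
    return bigrams, contexts, vocab


def log_score(seq, model):
    """log q(e): sum of add-1 smoothed log p(tau_i | tau_{i-1})."""
    bigrams, contexts, vocab = model
    total, prev = 0.0, BOS
    for tag in seq:
        total += math.log((bigrams[(prev, tag)] + 1) / (contexts[prev] + len(vocab)))
        prev = tag
    return total


gold = [
    ["Noun+Def+Sg+Fem", "Verb+Sg+Fem"],
    ["Noun+Def+Sg+Fem", "Verb+Sg+Fem"],
    ["Noun+Def+Sg+Masc", "Verb+Sg+Masc"],
]
model = train_bigram(gold)
agree = ["Noun+Def+Sg+Fem", "Verb+Sg+Fem"]
clash = ["Noun+Def+Sg+Fem", "Verb+Sg+Masc"]
print(log_score(agree, model) > log_score(clash, model))
```

The hypothesis whose predicted classes agree receives the higher score, which is the signal the decoder uses as an additional feature.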
Class-Based: Evaluation
English-Arabic
Training data: variety of sources (e.g. web)
Development and test: NIST sets (newswire and mixed genre [broadcast news, newsgroups, weblog])
Phrase-based decoder
BLEU score for newswire sets
BLEU score for mixed genre sets
Class-Based: Conclusion
Needed resources:
– Treebank for target (existing for many languages)
– Large target corpus
+ Improves translation quality
+ Easy to integrate in existing MT system
- Increases decoding time
- Not very good for mixed genres
References
Green, S. and DeNero, J. (2012). “A Class-Based Agreement Model for Generating Accurately Inflected Translations”. In: ACL.
Williams, P. and Koehn, P. (2011). “Agreement Constraints for Statistical Machine Translation into German”. In: Sixth Workshop on Statistical Machine Translation.
Minkov, E. and Toutanova, K. (2007). “Generating Complex Morphology for Machine Translation”. In: ACL.
