BLEU: a Method for Automatic
Evaluation of Machine Translation
(BiLingual Evaluation Understudy)
  Kishore Papineni, Salim Roukos, Todd
        Ward, and Wei-Jing Zhu
  Proceedings of the 40th Annual Meeting of the
  Association for Computational Linguistics (ACL),
       Philadelphia, July 2002, pp. 311-318
Viewpoint
• The idea: the closer a machine translation is to a
  professional human translation, the better it is.
• To judge the quality, we need:
   – A numerical metric
• So, an MT evaluation system requires:
   1. A numerical “translation closeness” metric
   2. A corpus of good quality human reference translations
• Modeled on the word error rate metric used in speech recognition,
  adapted for multiple reference translations
   – Idea: use a weighted average of variable-length phrase
     matches against the reference translations
Baseline BLEU Metric
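• The primary programming task for a BLEU
  implementor is to compare n-grams of the
  candidate with the n-grams of the reference
  translation and count the number of matches
• These matches are position-independent: the more
  matches, the better the candidate translation
• For simplicity, we first look at computing unigram matches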
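The matching step can be sketched in Python as below, assuming lowercased, whitespace-tokenized input. The helper names `ngrams` and `count_matches` and the candidate sentence are hypothetical illustrations, not taken from the paper. Matches here are unclipped, which is the weakness the following slides address.

```python
def ngrams(tokens, n):
    """List of n-grams (as tuples) in a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_matches(candidate, references, n=1):
    """Count candidate n-grams that appear in at least one reference.

    Matching is position-independent and unclipped: each candidate
    n-gram occurrence counts if it appears anywhere in any reference.
    """
    ref_ngrams = set()
    for ref in references:
        ref_ngrams.update(ngrams(ref, n))
    return sum(1 for gram in ngrams(candidate, n) if gram in ref_ngrams)


cand = "the cat sat on the mat".split()  # hypothetical candidate
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(count_matches(cand, refs, 1))  # 5 ("sat" is unmatched)
print(count_matches(cand, refs, 2))  # 3
```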
n-gram precision
• Precision measure
   – Counts up the number of candidate translation words
     (unigrams) that occur in any reference translation and
     then divides by the total number of words in the candidate
     translation
• However, MT systems can overgenerate "reasonable" words, producing
  improbable but high-precision translations (e.g., "the the the the
  the the the" has unigram precision 7/7 against "The cat is on the mat")
   – Fix: a reference word is considered exhausted once a matching
     candidate word has been identified
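A minimal Python sketch of plain (unmodified) unigram precision, using the degenerate candidate from the BLEU paper, lowercased and with punctuation dropped. `unigram_precision` is a hypothetical helper name. Every candidate word occurs in some reference, so the score is a perfect 1.0 even though the translation is useless.

```python
def unigram_precision(candidate, references):
    """Plain unigram precision: the fraction of candidate words that
    occur anywhere in any reference translation (no clipping)."""
    vocab = set()
    for ref in references:
        vocab.update(ref)
    matched = sum(1 for word in candidate if word in vocab)
    return matched / len(candidate)


cand = "the the the the the the the".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(unigram_precision(cand, refs))  # 1.0
```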
Modified n-gram precision
• Modified unigram precision
    – Counts the maximum number of times a word occurs in any single reference
      translation
    – Clips the total count of each candidate word by its maximum reference count
    – Adds these clipped counts up
    – Divides by the total (unclipped) number of candidate words
• Modified n-gram precision
    – All candidate n-gram counts & corresponding maximum reference counts are
      collected
    – The candidate counts are clipped by their corresponding reference maximum
      value, summed and divided by the total number of candidate n-grams
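The clipping procedure above could be sketched as follows, assuming tokenized input. `ngram_counts` and `modified_precision` are hypothetical names, and `Fraction` is used only to keep the arithmetic exact.

```python
from collections import Counter
from fractions import Fraction


def ngram_counts(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Modified n-gram precision of one candidate sentence.

    Each candidate n-gram count is clipped by the maximum number of times
    that n-gram occurs in any single reference; the clipped counts are
    summed and divided by the total (unclipped) number of candidate n-grams.
    """
    cand_counts = ngram_counts(candidate, n)
    max_ref_counts = Counter()
    for ref in references:
        for gram, cnt in ngram_counts(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], cnt)
    # Counter returns 0 for n-grams absent from every reference.
    clipped = sum(min(cnt, max_ref_counts[gram]) for gram, cnt in cand_counts.items())
    total = sum(cand_counts.values())
    return Fraction(clipped, total) if total else Fraction(0)


cand = "the the the the the the the".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision(cand, refs, 1))  # 2/7
print(modified_precision(cand, refs, 2))  # 0
```

On this example the modified unigram precision drops from 7/7 to 2/7, because "the" occurs at most twice in any single reference.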
Modified n-gram precision on text blocks
•   Basic unit of evaluation is the sentence
•   Compute the n-gram matches sentence by sentence
•   Add clipped n-gram counts for all the candidate sentences
•   Divide by the number of candidate n-grams in the test corpus to compute
    a modified precision score
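A corpus-level sketch in Python, again assuming tokenized input; the helper names and the two-sentence corpus are hypothetical. Clipped counts and n-gram totals are summed over the whole corpus before dividing, so p_n is not the average of per-sentence precisions.

```python
from collections import Counter
from fractions import Fraction


def ngram_counts(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_and_total(candidate, references, n):
    """Clipped match count and total n-gram count for one sentence."""
    cand_counts = ngram_counts(candidate, n)
    max_ref = Counter()
    for ref in references:
        max_ref |= ngram_counts(ref, n)  # element-wise max across references
    clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
    return clipped, sum(cand_counts.values())


def corpus_modified_precision(candidates, reference_sets, n):
    """p_n for a test corpus: sum the clipped counts over all sentences,
    then divide by the total number of candidate n-grams in the corpus."""
    clipped_sum = total_sum = 0
    for cand, refs in zip(candidates, reference_sets):
        clipped, total = clipped_and_total(cand, refs, n)
        clipped_sum += clipped
        total_sum += total
    return Fraction(clipped_sum, total_sum) if total_sum else Fraction(0)


refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
cands = ["the the the the the the the".split(), "the cat sat on the mat".split()]
ref_sets = [refs, refs]
print(corpus_modified_precision(cands, ref_sets, 1))  # 7/13
```

The per-sentence precisions here are 2/7 and 5/6, but the corpus score pools the counts: (2 + 5) / (7 + 6) = 7/13.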
Ranking systems
• Compared a good human translation with a standard (poor) machine translation
• 4 reference translations for each of 127 source sentences
• Result: [figure: modified n-gram precision of the human vs. the machine translation]
•   From this result:
     –   Any single n-gram precision score can distinguish a good translation from a bad one
•   To be useful, however, the metric must also distinguish between translations that differ
    less in quality, such as two human translations
Ranking systems
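• Translations done by:
    – H1: a human translator lacking native proficiency in both the source and target languages
    – H2: a native English speaker
    – S1, S2, S3: three commercial systems
• Result:
    – Ranking the systems by modified n-gram precision gives the same rank order as
      the one assigned by human judges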
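Combining the modified n-gram precisions
• The results on the previous slide show that:
  – Modified n-gram precision decays roughly exponentially with n
  – Modified unigram precision > bigram precision > trigram precision
• BLEU therefore uses the average logarithm with uniform
  weights, which is equivalent to taking the geometric mean
  of the modified n-gram precisions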
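A small Python sketch of why the average logarithm amounts to a geometric mean. The precision values are made up for illustration, not taken from the paper, and `combine_precisions` is a hypothetical name.

```python
import math


def combine_precisions(precisions, weights=None):
    """exp(sum_n w_n * log p_n); with uniform weights w_n = 1/N this is
    the geometric mean of the modified n-gram precisions p_1..p_N."""
    n = len(precisions)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weights summing to one
    if any(p == 0 for p in precisions):
        return 0.0  # log(0) is undefined; the geometric mean is 0
    return math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)))


p = [0.8, 0.4, 0.2, 0.1]  # made-up p_1..p_4, decaying roughly exponentially
print(combine_precisions(p))  # about 0.283 (geometric mean)
print(sum(p) / len(p))        # about 0.375 (arithmetic mean, for comparison)
```

Because each p_n enters through its logarithm, a drop in the higher-order precisions lowers the combined score proportionally, instead of being swamped by the much larger unigram precision as in an arithmetic mean.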
Recall
• BLEU considers multiple reference translations,
  each of which may use a different word choice
  to translate the same source word.
• A good candidate translation will use (recall) only one of these
  possible choices, not all of them. Indeed, recalling all choices
  leads to a bad translation.
Sentence brevity penalty
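• Candidate translations longer than their references are already penalized by the
  modified n-gram precision measure, so only overly short candidates need an extra penalty
• Brevity penalty factor:
    – With it, a high-scoring candidate translation must match the reference translations in
      length, in word choice, and in word order
        • The brevity penalty is 1.0 when the candidate's length equals the length of any reference translation
• c: the total length of the candidate translation corpus
• r: the effective reference corpus length (the sum of the best match lengths, i.e. the closest
  reference sentence length for each candidate sentence)
• Brevity penalty:
  $$\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{(1 - r/c)} & \text{if } c \le r \end{cases}$$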
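The brevity penalty on this slide can be sketched in Python as below. The helper names, the tie-breaking rule in `best_match_length` (ties go to the shorter reference), and the sentence lengths in the example are all assumptions for illustration.

```python
import math


def brevity_penalty(c, r):
    """Corpus-level brevity penalty.

    c: total length of the candidate translation corpus
    r: effective reference corpus length
    """
    if c == 0:
        return 0.0  # guard: an empty candidate gets no credit
    if c > r:
        return 1.0  # longer candidates are already penalized by p_n
    return math.exp(1 - r / c)


def best_match_length(candidate_len, reference_lens):
    """Reference length closest to the candidate's; ties go to the
    shorter reference (an assumption, not specified on the slide)."""
    return min(reference_lens, key=lambda length: (abs(length - candidate_len), length))


# Hypothetical corpus: two candidate sentences of lengths 7 and 9,
# each with three reference translations.
cand_lens = [7, 9]
ref_lens = [[6, 8, 10], [12, 11, 14]]
c = sum(cand_lens)
r = sum(best_match_length(cl, rl) for cl, rl in zip(cand_lens, ref_lens))
print(c, r, brevity_penalty(c, r))  # c = 16, r = 17, BP = exp(-1/16)
```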
BLEU details
• Take the geometric mean of the test corpus’ modified precision scores and
  then multiply the result by an exponential brevity penalty factor.
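• We first compute the geometric average of the modified n-gram precisions,
  p_n, using n-grams up to length N and positive weights w_n summing to one:
  $$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$
•   The ranking behavior is more apparent in the log domain:
  $$\log \mathrm{BLEU} = \min\left(1 - \frac{r}{c},\, 0\right) + \sum_{n=1}^{N} w_n \log p_n$$
•   Baseline: N = 4 and uniform weights w_n = 1/N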
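Putting the pieces together, a baseline corpus BLEU might be sketched in Python as follows, assuming tokenized input, N = 4 and uniform weights. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the tie-breaking for best match length is an assumption, and no smoothing is applied, so any corpus with zero matching 4-grams scores 0.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, reference_sets, max_n=4):
    """Baseline corpus BLEU = BP * exp(sum_n w_n * log p_n) with w_n = 1/N."""
    clipped = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # candidate n-gram counts, summed over the corpus
    c = r = 0              # candidate length and effective reference length
    for cand, refs in zip(candidates, reference_sets):
        c += len(cand)
        # Best match length: the reference length closest to the candidate's
        # (ties broken toward the shorter reference, an assumption).
        r += min((len(ref) for ref in refs),
                 key=lambda length: (abs(length - len(cand)), length))
        for n in range(1, max_n + 1):
            cand_counts = ngram_counts(cand, n)
            max_ref = Counter()
            for ref in refs:
                max_ref |= ngram_counts(ref, n)  # per-n-gram max over references
            clipped[n - 1] += sum(min(cnt, max_ref[g]) for g, cnt in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    # Any p_n of zero makes the geometric mean, and hence BLEU, zero.
    if c == 0 or min(clipped) == 0:
        return 0.0
    log_avg = sum(math.log(cl / t) for cl, t in zip(clipped, totals)) / max_n
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_avg)


ref = "the cat is on the mat".split()
print(corpus_bleu([ref], [[ref]]))  # 1.0: identical to the reference
print(corpus_bleu(["the the the the the the the".split()], [[ref]]))  # 0.0
```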
The BLEU Evaluation
• The BLEU metric ranges from 0 to 1
• A score of 1 is very rare: only a translation identical to a reference translation attains it
• Higher scores are better
• A human translator scored 0.3468 against four references and 0.2571
  against two references
• Table 1: BLEU scores of the 5 systems against two references
•   Is the difference in BLEU scores reliable?
•   What is the variance of the BLEU score?
•   If we were to pick another random set of 500 sentences, would we still judge S3 to
    be better than S2?
• Split the test corpus into 20 blocks of 25 sentences each and computed BLEU on each block
• Computed the means, variances, and paired t-statistics
• Reading Table 2:
     – Table 1 scores are computed on all 500 sentences; Table 2 means are averages over 25-sentence blocks
     – A t-statistic of 1.7 or above is considered 95% significant
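The block-wise significance test can be sketched with Python's standard `statistics` module, assuming per-block BLEU scores for two systems are already available. `paired_t_statistic` is a hypothetical helper, and the scores below are made up. The 1.7 threshold on the slide applies to the paper's 20 blocks; with a different number of blocks the critical value changes.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t-statistic for per-block scores of two systems.

    t = mean(d) / (stdev(d) / sqrt(k)), where d are the per-block
    differences and k is the number of blocks.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(k))


system_a = [0.30, 0.40, 0.50, 0.60]  # hypothetical per-block BLEU scores
system_b = [0.10, 0.20, 0.20, 0.30]
t = paired_t_statistic(system_a, system_b)
print(f"t = {t:.2f}")
```

Pairing the scores block by block removes the variation that comes from some blocks simply being easier to translate than others.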
Evaluation
• Two groups of human judges, 10 people each
  – Monolingual group
  – Bilingual group
• Each judge rated the same 5 systems (S1, S2, S3, H1, H2)
• Rating scale: 1 (very bad) to 5 (very good)
• Some judges were more liberal in their ratings than others
Pairwise Judgments
BLEU predictions
BLEU vs. Bilingual and Monolingual Judgments

FSB Advising Checklist - Orientation 2024
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 

5. bleu

  • 1. BLEU: a Method for Automatic Evaluation of Machine Translation (BiLingual Evaluation Understudy)
     Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu
     Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311-318
  • 2. Viewpoint
     • The idea: the closer a machine translation is to a professional human translation, the better it is.
     • Judging quality this way requires a numerical metric.
     • So an MT evaluation system requires:
        1. A numerical “translation closeness” metric
        2. A corpus of good-quality human reference translations
     • The closeness metric is modeled on the word error rate metric from speech recognition, modified to allow multiple reference translations
        – Idea: use a weighted average of variable-length phrase matches against the reference translations
  • 3. Baseline BLEU Metric
     • The primary programming task for a BLEU implementor is to compare the n-grams of the candidate with the n-grams of the reference translations and count the number of matches
     • Matches are position-independent: the more matches, the better the candidate translation
     • For simplicity, we start by computing unigram matches
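  • 4. n-gram precision
     • Standard precision measure
        – Count the candidate translation words (unigrams) that occur in any reference translation, then divide by the total number of words in the candidate translation
     • However, MT systems can overgenerate "reasonable" words, producing improbable but high-precision translations
        – e.g., the candidate "the the the the the the the" against the references "The cat is on the mat." and "There is a cat on the mat." has a unigram precision of 7/7
        – Intuitively, a reference word should be considered exhausted once a matching candidate word has been identified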
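A minimal Python sketch of the plain (unclipped) unigram precision described above, reproducing the 7/7 score for the degenerate candidate. The function name `naive_unigram_precision` is illustrative, and the lowercased, punctuation-free tokenization is a simplifying assumption.

```python
from typing import List


def naive_unigram_precision(candidate: List[str], references: List[List[str]]) -> float:
    """Plain unigram precision: the fraction of candidate words that occur
    in at least one reference (how often they occur there is ignored)."""
    ref_vocab = {word for ref in references for word in ref}
    matches = sum(1 for word in candidate if word in ref_vocab)
    return matches / len(candidate) if candidate else 0.0


# Assumed tokenization: lowercased, whitespace-split, punctuation removed.
candidate = "the the the the the the the".split()
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
print(naive_unigram_precision(candidate, references))  # 1.0 (= 7/7)
```

Every "the" finds a match, so the nonsense candidate gets a perfect score; this is the failure the clipping on the next slide addresses.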
  • 5. Modified n-gram precision
     • Modified unigram precision
        – Count the maximum number of times a word occurs in any single reference translation
        – Clip the total count of each candidate word by its maximum reference count: $\mathrm{Count}_{clip} = \min(\mathrm{Count}, \mathrm{Max\_Ref\_Count})$
        – Add these clipped counts up
        – Divide by the total (unclipped) number of candidate words
     • Modified n-gram precision
        – Collect all candidate n-gram counts and the corresponding maximum reference counts
        – Clip the candidate counts by their corresponding reference maximum, sum them, and divide by the total number of candidate n-grams
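The clipping steps above can be sketched in Python as follows. This is an illustrative single-sentence version under the same tokenization assumption as before; `ngram_counts` and `modified_precision` are hypothetical helper names, not code from the paper.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count each contiguous n-gram (stored as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: List[str], references: List[List[str]], n: int) -> float:
    """Modified n-gram precision of one candidate sentence against its references."""
    cand_counts = ngram_counts(candidate, n)
    # Max_Ref_Count: most occurrences of each n-gram in any *single* reference.
    max_ref_counts: Counter = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Count_clip = min(Count, Max_Ref_Count); an unseen n-gram has Max_Ref_Count 0.
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())  # total (unclipped) candidate n-grams
    return clipped / total if total else 0.0


candidate = "the the the the the the the".split()
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
print(modified_precision(candidate, references, 1))  # 2/7, about 0.2857
print(modified_precision(candidate, references, 2))  # 0.0
```

"the" occurs at most twice in a single reference, so the seven candidate occurrences are clipped to 2, giving 2/7 instead of 7/7.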
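  • 6. Modified n-gram precision on text blocks
     • The basic unit of evaluation is the sentence
     • Compute the n-gram matches sentence by sentence
     • Add the clipped n-gram counts for all the candidate sentences
     • Divide by the number of candidate n-grams in the test corpus to compute a modified precision score, $p_n$:
        – $p_n = \frac{\sum_{C \in \{\mathrm{Candidates}\}} \sum_{n\text{-gram} \in C} \mathrm{Count}_{clip}(n\text{-gram})}{\sum_{C' \in \{\mathrm{Candidates}\}} \sum_{n\text{-gram}' \in C'} \mathrm{Count}(n\text{-gram}')}$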
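A minimal Python sketch of the corpus-level score, assuming each candidate sentence comes with its own list of tokenized references. The key point it illustrates is that clipped counts and candidate n-gram counts are summed over the whole corpus before a single division. The names are hypothetical.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count each contiguous n-gram (stored as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_modified_precision(candidates: List[List[str]],
                              references: List[List[List[str]]],
                              n: int) -> float:
    """p_n over a test corpus: sum clipped counts and candidate n-gram counts
    over all sentences, then divide once (not an average of sentence scores)."""
    clipped_total = 0
    candidate_total = 0
    for cand, refs in zip(candidates, references):
        cand_counts = ngram_counts(cand, n)
        max_ref_counts: Counter = Counter()
        for ref in refs:
            for gram, count in ngram_counts(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped_total += sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        candidate_total += sum(cand_counts.values())
    return clipped_total / candidate_total if candidate_total else 0.0


refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
candidates = ["the the the the the the the".split(), "the cat is on the mat".split()]
p1 = corpus_modified_precision(candidates, [refs, refs], 1)
print(p1)  # 8/13: (2 + 6) clipped matches over (7 + 6) candidate words
```

Averaging the two sentence-level scores instead would give (2/7 + 1)/2, about 0.643, which weights a short sentence as heavily as a long one.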
  • 7. Ranking systems
     • Human translation vs. machine translation
     • 4 reference translations for each of 127 source sentences
     • Result: [figure not reproduced: modified n-gram precision of human vs. machine translation]
     • From this result:
        – A single n-gram precision score can distinguish a good translation from a bad one
     • To be useful, however, the metric must also distinguish translations that do not differ so greatly in quality, including two human translations of differing quality
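  • 8. Ranking systems
     • Translations produced by:
        – H1: a human translator lacking native proficiency in both the source language (SL) and the target language (TL)
        – H2: a native English speaker
        – S1, S2, S3: three commercial systems
     • Result:
        – The rank order of the systems by modified n-gram precision is the same as the rank order assigned by human judges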
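  • 9. Combining the modified n-gram precisions
     • The result on the previous slide shows that the modified n-gram precision:
        – decays roughly exponentially with n
        – modified unigram precision > bigram precision > trigram precision
     • The averaging scheme must take this exponential decay into account: BLEU uses the average logarithm with uniform weights, which is equivalent to the geometric mean of the modified n-gram precisions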
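A short Python check of the equivalence stated above: exponentiating the uniformly weighted average of log precisions gives the geometric mean. The precision values are hypothetical, chosen only to halve with each n, and are not taken from the paper.

```python
import math
from typing import Sequence


def geometric_mean_via_logs(precisions: Sequence[float]) -> float:
    """exp of the uniformly weighted average of log p_n, i.e. the geometric mean."""
    if any(p <= 0 for p in precisions):
        return 0.0  # log(0) is undefined; the geometric mean of such values is 0
    weight = 1.0 / len(precisions)  # uniform weights w_n = 1/N
    return math.exp(sum(weight * math.log(p) for p in precisions))


# Hypothetical p_1..p_4 that halve with each n (illustrative, not from the paper).
p = [0.8, 0.4, 0.2, 0.1]
print(geometric_mean_via_logs(p))  # about 0.283, i.e. (0.8 * 0.4 * 0.2 * 0.1) ** 0.25
print(sum(p) / len(p))             # arithmetic mean, about 0.375, pulled up by p_1
```

With exponentially decaying precisions, an arithmetic mean is dominated by the unigram term, while the log average gives each order an equal multiplicative say.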
  • 10. Recall
     • BLEU considers multiple reference translations, each of which may use a different word choice to translate the same source word.
     • A good candidate translation will use (recall) only one of these possible choices, not all of them. Indeed, recalling all the choices leads to a bad translation, so standard recall is not a suitable measure here.
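  • 11. Sentence brevity penalty
     • Candidate translations longer than their references are already penalized by the modified n-gram precision measure, so there is no need to penalize them again
     • Brevity penalty factor:
        – A high-scoring candidate translation must match the reference translations in length, in word choice, and in word order
        – The brevity penalty is 1.0 when the candidate's length is the same as any reference translation's length
     • c: the total length of the candidate translation corpus
     • r: the effective reference corpus length (the sum of the best-match reference lengths, i.e. the closest reference length for each candidate sentence)
     • Brevity penalty: $\mathrm{BP} = 1$ if $c > r$, and $\mathrm{BP} = e^{(1 - r/c)}$ if $c \le r$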
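A minimal Python sketch of the corpus-level brevity penalty, computed from sentence lengths only. How ties between two equally close references are broken is not stated on the slide; picking the shorter one is an assumption here, as is returning 0 for an empty output. Function names are hypothetical.

```python
import math
from typing import List


def best_match_length(candidate_len: int, reference_lens: List[int]) -> int:
    """Reference length closest to the candidate's length.
    Ties go to the shorter reference: an assumption, the slide does not say."""
    return min(reference_lens, key=lambda length: (abs(length - candidate_len), length))


def brevity_penalty(candidate_lens: List[int], reference_lens: List[List[int]]) -> float:
    """Corpus-level BP: 1 if c > r, else exp(1 - r/c)."""
    c = sum(candidate_lens)
    r = sum(best_match_length(cl, rl) for cl, rl in zip(candidate_lens, reference_lens))
    if c > r:
        return 1.0
    if c == 0:
        return 0.0  # empty output: treated as maximally penalized (an assumption)
    return math.exp(1 - r / c)


# Two hypothetical candidate sentences of 7 and 4 words, each with two references.
bp = brevity_penalty([7, 4], [[6, 7], [6, 8]])
print(round(bp, 3))  # r = 7 + 6 = 13, c = 11, so BP = exp(1 - 13/11), about 0.834
```

The penalty is computed once over the whole corpus rather than per sentence, so a few short sentences are not punished harshly if other sentences are slightly long.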
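  • 12. BLEU details
     • Take the geometric mean of the test corpus' modified precision scores, then multiply the result by an exponential brevity penalty factor.
     • First compute the geometric average of the modified n-gram precisions, $p_n$, using n-grams up to length N and positive weights $w_n$ summing to one:
        – $\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$
     • The ranking behavior is more apparent in the log domain:
        – $\log \mathrm{BLEU} = \min\left(1 - \frac{r}{c}, 0\right) + \sum_{n=1}^{N} w_n \log p_n$
     • Baseline: N = 4 and uniform weights $w_n = 1/N$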
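Putting the pieces together, here is a compact Python sketch of corpus BLEU in the log-domain form above, with uniform weights and N = 4 by default. It is not the authors' scoring script: the names are hypothetical, there is no smoothing (so any $p_n = 0$ yields BLEU 0), and ties in best-match length go to the shorter reference by assumption.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count each contiguous n-gram (stored as a tuple) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates: List[List[str]],
                references: List[List[List[str]]],
                max_n: int = 4) -> float:
    """BLEU = BP * exp(sum_n w_n log p_n) with uniform weights w_n = 1/max_n."""
    clipped = [0] * max_n  # numerators of p_1..p_N
    totals = [0] * max_n   # denominators of p_1..p_N
    c = 0  # total candidate length
    r = 0  # effective reference length: sum of best-match lengths
    for cand, refs in zip(candidates, references):
        c += len(cand)
        # Closest reference length, ties to the shorter one (an assumption).
        r += min((len(ref) for ref in refs),
                 key=lambda length: (abs(length - len(cand)), length))
        for n in range(1, max_n + 1):
            max_ref: Counter = Counter()
            for ref in refs:
                for gram, count in ngram_counts(ref, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            cand_counts = ngram_counts(cand, n)
            clipped[n - 1] += sum(min(cnt, max_ref[g]) for g, cnt in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if c == 0 or 0 in clipped:
        return 0.0  # some p_n is zero, so the geometric mean (and BLEU) is zero
    log_p = sum(math.log(num / den) for num, den in zip(clipped, totals)) / max_n
    log_bp = min(1 - r / c, 0.0)  # log-domain brevity penalty
    return math.exp(log_bp + log_p)


ref = "the cat is on the mat".split()
print(corpus_bleu([ref], [[ref]]))  # 1.0: identical to a reference
cand = "the cat sat on the mat".split()
print(corpus_bleu([cand], [[ref]], max_n=2))  # sqrt(5/6 * 3/5), about 0.7071
```

In the second call c = r = 6, so BP = 1, and the score is just the geometric mean of $p_1 = 5/6$ and $p_2 = 3/5$.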
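  • 13. The BLEU Evaluation
     • The BLEU metric ranges from 0 to 1
     • A score of 1 is very rare: it requires the candidate to be identical to a reference translation, so even a human translator will not necessarily score 1
     • The more reference translations per sentence, the higher the score
        – e.g., a human translation scored 0.3468 against four references and 0.2571 against two references
     • Table 1: BLEU scores of the 5 systems against two references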
  • 14. Is the difference in BLEU metric reliable?
     • What is the variance of the BLEU score?
     • If we were to pick another random set of 500 sentences, would we still judge S3 to be better than S2?
     • The 500-sentence test corpus was split into 20 blocks of 25 sentences each, and BLEU was computed on each block
     • The means, variances, and paired t-statistics were computed
     • What Table 2 indicates:
        – Table 1 scores are computed on all 500 sentences, Table 2 scores on blocks of 25 sentences
        – A t-statistic of 1.7 or above is considered 95% significant
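A minimal Python sketch of a paired t-statistic over per-block scores, as used to compare two systems on the same 20 blocks. The block scores below are made up for illustration and are not the values in Table 2. With 20 blocks there are 19 degrees of freedom, and the one-sided 95% critical value of Student's t is about 1.73, which is presumably where the slide's 1.7 threshold comes from.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Paired t-statistic for two systems scored on the same blocks:
    mean of the per-block differences divided by its standard error."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Made-up BLEU scores for 20 blocks (illustrative only, not Table 2's numbers).
system_b = [0.10] * 20
system_a = [0.12, 0.11, 0.13, 0.12] * 5
t = paired_t_statistic(system_a, system_b)
print(round(t, 2))  # sqrt(152), about 12.33
print(t >= 1.7)     # True: above the slide's 95% threshold
```

Pairing by block removes block-to-block difficulty from the comparison, so a consistent small gap between systems can still be significant.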
  • 15. Evaluation
     • Two groups of human judges, 10 people each
        – Monolingual group
        – Bilingual group
     • Both groups evaluated the same 5 systems
     • Rating scale: 1 (very bad) to 5 (very good)
     • Some judges were more liberal in their ratings than others
  • 18. BLEU vs. Bilingual and Monolingual Judgements