No Hardware. No Software. No Hassle MT.
KantanMT Analytics - The Missing Link
What we aim to cover today
• The MT & Quality Relationship
• What is quality?
• Possible ways of measuring it
• Automated/Manual methods
• Who needs to measure quality
• Localisation stakeholders
• The Missing Link - KantanMT Analytics
  • Segment level quality analysis
  • Helping to build predictable business models
• Q&A
45 Mins Presentation, 15 Mins Q&A

What is KantanMT.com?
• Statistical MT System
• Cloud-based
  • Highly scalable
  • Inexpensive to operate
  • Quick to deploy
• Our Vision
  • To put Machine Translation customization, improvement and deployment into your hands
• Fully operational for 7 months
  • Active KantanMT Engines: 6,632
  • Training Words Uploaded: 23,653,605,925
  • Member Words Translated: 362,291,925

The Quality & MT Relationship
• Let’s agree a model for defining quality!
  • Quality Target (defined by client)
  • No Quality (baseline)
• The model takes into consideration both the quality of the MT outputs and the level of quality defined by your clients.

Attributes of Quality
• Language Attributes
  • Adequacy
    • How much of the meaning expressed in the source is expressed in the generated target text
  • Fluency
    • Comprehensibility & readability
    • Factors include grammar errors, word selection and syntax
• Task-oriented Attributes
  • Productivity
    • Post-editing speed
  • Acceptability
    • Fit-for-purpose measurement
    • Usable translations within the context of the end user/client
Attributes of Quality – Model
• Language (Translation Style): Fluency, Adequacy
• Task (Business Model): Productivity, Acceptability

Attributes of Quality
Attributes of Quality – Model
• Language (Translation Style): Fluency, Adequacy
• Task (Business Model): Productivity, Acceptability
• What we want? FuzzyMatch

Measuring MT Quality
• Automated
  • Fast
  • Repeatable
  • Objective
  • Scalable
  • Cheap
  • Based on samples
  • Can’t be used by PMs for scope/cost predictions
• Manual
  • Slow
  • Cumbersome
  • Subjective
  • Not scalable
  • Expensive
  • Based on samples
  • Can’t be used by PMs for scope/cost predictions

Measuring MT by hand!
• Sample translations are scored against a template
• Template entries: Style; Wrong terminology; Wrong Spelling; Source not Translated/Omissions; Capitalization; Syntax & Grammar; Compliance with client specs; Wrong Word Form; Literal translation; Wrong Part of Speech; Text/Information added; Punctuation; Technical; Tags and Markup; Sentence Structure; Locale Adaptation; Spacing
• Overall: Adequacy Score, Fluency Score, Overall Quality Score

Manual Framework
• Adequacy Score (Range 1 – 5)
  • 5 (Full Meaning): All meaning expressed in the source segment appears in the translated segment
  • 4 (Most Meaning): Most of the source segment meaning is expressed in the translated segment
  • 3 (Much Meaning): Much of the source segment meaning is expressed in the translated segment
  • 2 (Little Meaning): Little of the source segment meaning is expressed in the translated segment
  • 1 (No Meaning): None of the meaning expressed in the source segment is expressed in the translated segment

Manual Framework
• Fluency Score (Range 1 – 5)
  • 5 (Native language fluency): No grammar errors, excellent word selection and good syntax. No post-editing required.
  • 4 (Near native fluency): Few terminology/grammar errors. No impact on overall understanding of the meaning. Little post-editing required.
  • 3 (Not very fluent): About half of the translation contains errors and requires post-editing.
  • 2 (Little fluency): Wrong word choice, poor grammar and syntax. A lot of post-editing required.
  • 1 (No fluency): Completely ungrammatical and doesn’t make any sense. Re-translate from scratch.

Manual Framework
• Evaluation sheet columns: Source; MT Target; Spacing; Syntax and Grammar; Locale Adaptation; Tags and Markup; Sentence Structure; Punctuation; Wrong Part of Speech; Style; Wrong Word Form; Capitalization; Text/Information added; Literal translation; Compliance with client specs; Source not Translated/Omissions; Wrong Spelling; Wrong terminology; Overall quality (1-4); Fluency (Score 1-5); Adequacy (Score 1-5)
• Column group label: Tech

Manual Framework
Attributes of Quality – Model
• Language (Translation Style): Fluency, Adequacy
• Task (Business Model): Productivity, Acceptability
• Manual Methods

Automated Methods
• Many different methods available
  • BLEU, F-Measure, GTM, TER, NIST, METEOR, etc.
• Common characteristics
  • Compute similarity of generated texts to reference texts
  • The smaller the difference, the better the quality!
• Broad adoption
  • Industry & Academia

Automated Methods
• F-Measure
  • Recall & Precision metric, comparing the MT Output with a Reference Translation
  • Recall = correct / Ref-Len = 80%
  • Precision = correct / MT-Len = 66%
  • F-Measure = (Precision * Recall) / ((Precision + Recall) / 2) = 73%
• Flaw: no penalty for reordering

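The F-Measure arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenisation and counting a word as "correct" when it occurs in both sentences regardless of position. The function name and the example sentences are hypothetical and are not taken from the slides.

```python
from collections import Counter


def precision_recall_f(reference: str, mt_output: str):
    """Return (precision, recall, f_measure) for one segment pair."""
    ref_tokens = reference.split()
    mt_tokens = mt_output.split()
    # Words shared by both sentences, ignoring order; each word is
    # counted at most as often as it appears in either sentence.
    overlap = Counter(ref_tokens) & Counter(mt_tokens)
    correct = sum(overlap.values())
    recall = correct / len(ref_tokens) if ref_tokens else 0.0
    precision = correct / len(mt_tokens) if mt_tokens else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = (precision * recall) / ((precision + recall) / 2)
    return precision, recall, f_measure


# Illustrative sentences (not from the slides): 4 of 5 reference words found.
reference = "the cat sat on mat"
mt_output = "the cat is sitting on mat"
precision, recall, f_measure = precision_recall_f(reference, mt_output)

# Scrambling the MT output leaves every score unchanged,
# which is the reordering flaw noted on the slide.
scrambled = precision_recall_f(reference, "mat on sitting is cat the")
```

With these sentences recall is 4/5 and precision is 4/6, so the F-Measure rounds to 73%, and the scrambled output receives exactly the same three scores.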
Automated Methods
• WER (Word Error Rate)
  • Minimum number of edits needed to transform the MT Output into the Reference Translation
  • WER = (substitutions + insertions + deletions) / reference-length
  • Levenshtein distance measure
  • General indicator of Post-Editing Effort

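A word-level Levenshtein distance divided by the reference length gives the WER described above. The sketch below assumes whitespace tokenisation and a cost of 1 for every substitution, insertion and deletion. The function name and the test sentences are illustrative only.

```python
def word_error_rate(reference: str, mt_output: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = mt_output.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the MT output.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not from the slides).
reference = "the cat sat on the mat"
wer_same = word_error_rate(reference, reference)
wer_two_edits = word_error_rate(reference, "the cat is sitting on the mat")
```

Because the edit count is divided by the reference length, WER can exceed 1.0 when the MT output is much longer than the reference.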
Automated Methods
• BLEU Score
  • Put simply, it measures how many words overlap between the MT Output and the Reference Translation, giving higher scores to sequential words
  • High correlation between BLEU and human judgement of translation quality

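To make "higher scores to sequential words" concrete, here is a simplified single-reference, sentence-level sketch of BLEU: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It applies no smoothing and assumes whitespace tokenisation, so it is not a substitute for a standard BLEU implementation. The function names and sentences are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference: str, mt_output: str, max_n: int = 4) -> float:
    ref = reference.split()
    hyp = mt_output.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clipped matches: an n-gram scores at most as often as it
        # appears in the reference.
        matched = sum((hyp_counts & ref_counts).values())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: outputs shorter than the reference are penalised.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Illustrative sentences (not from the slides).
reference = "the cat sat on the mat"
bleu_same = simple_bleu(reference, reference)
bleu_close = simple_bleu(reference, "the cat sat on a mat", max_n=2)
bleu_scrambled = simple_bleu(reference, "mat the on sat cat the", max_n=2)
```

The scrambled output contains every reference word yet scores 0.0 because none of its word pairs occur in the reference, whereas a pure word-overlap measure would rate it as perfect.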
Automated Methods
• KantanWatch™ can be used to track and monitor automated scores
• Improvements can be monitored during the build-measure-learn cycle of a KantanMT deployment
• Time-graphs offer a good overview of the maturing of a KantanMT engine
• Can also present a holistic view of the potential quality of KantanMT outputs
* KantanWatch Reports

Automated Methods
Attributes of Quality – Model
• Language (Translation Style): Fluency, Adequacy
• Task (Business Model): Productivity, Acceptability
• Automated metrics shown on the model: NIST, GTM, F-Measure, TER, BLEU, METEOR
• Major Flaw: All measurements based on reference translations

Who uses these measurements?
• The Localisation Stakeholder Dilemma
• Developers of MT Engines
  • Automated BLEU, METEOR, F-Measure and TER are ideal and practical
  • No individual measurement has absolute meaning, but each points the quality curve in the right direction within a domain

Who needs to measure Quality?
• The Localisation Stakeholder Dilemma
• Production Teams (PMs, LEs and QEs)
  • Need segment measurements on quality and post-editing (PE) efforts
    • Determine tiered segment post-edit rate
    • Distribution of post-editing tasks based on segment quality
• Localisation Managers
  • Need productivity measurements to predict budget and schedule
    • Aka Project Segment Reports
    • MT measurements need to ‘fit’ business planning and charge models
• Translators
  • Unfortunately, don’t get a fair deal
  • No segment information, just top-level project ‘inferences’ based on samples

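A tiered segment post-edit rate could be derived from segment-level scores along the following lines. Everything specific here is an assumption made for illustration: the 0 to 100 score range, the tier boundaries, the tier names and the per-word rates are hypothetical and do not come from the slides or from any KantanMT product.

```python
from collections import defaultdict

# Hypothetical tiers: (minimum score, tier name, per-word post-edit rate).
TIERS = [
    (85, "light post-edit", 0.02),
    (60, "medium post-edit", 0.05),
    (0, "full post-edit", 0.09),
]


def tier_for(score):
    """Return the (name, rate) of the first tier whose minimum the score meets."""
    for min_score, name, rate in TIERS:
        if score >= min_score:
            return name, rate
    raise ValueError("score must be non-negative")


def summarise(segments):
    """Group (word_count, score) pairs by tier and total words and cost."""
    summary = defaultdict(lambda: {"words": 0, "cost": 0.0})
    for word_count, score in segments:
        name, rate = tier_for(score)
        summary[name]["words"] += word_count
        summary[name]["cost"] += word_count * rate
    return dict(summary)


# Illustrative project of four segments: (word count, segment score).
project = [(10, 90), (20, 70), (5, 30), (10, 85)]
summary = summarise(project)
```

The per-tier word counts give a basis for distributing post-editing tasks, and the per-tier costs feed a budget estimate.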
The Quality & MT Relationship
• Measurements: Manual Methods, TER, BLEU, GTM, METEOR, F-Measure, NIST
• Stakeholders: MT Developers, Production

Conclusions
• There are many automated MT quality measurements
  • Mostly suitable for MT developers
  • Not optimal for production teams
  • Of no use to translators
• All rely on reference texts to compute measurements
• What’s needed?
  • Segment level measurements
    • Drive project schedule and charge model
    • High correlation to human effort
  • Do not rely on reference texts to compute measurements

Attributes of Quality
Attributes of Quality – Model
• Language (Translation Style): Fluency, Adequacy
• Task (Business Model): Productivity, Acceptability
• What you want… KantanMT Analytics

Introducing KantanMT Analytics™
• Segment level scoring for MT output
• Designed to make it possible to create predictable
  • Business Models
  • Project Schedules
  • Cost Models
• Co-developed by
  • KantanMT.com
  • CNGL – Centre for Next Generation Localisation

KantanMT Analytics™
• Select Analyse feature
• KantanMT Analytics Report created
  • XML based for consumption by TMS/GMS platforms
• XLIFF document created
  • Contains scores for each segment

The Missing Link
Attributes of Quality – Model
• Language (Translation Style): Fluency, Adequacy
• Task (Business Model): Productivity, Acceptability
• The missing link: KantanMT Analytics™

KantanMT Analytics: The Missing Link in Machine Translation

  • 1. No Hardware. No Software. No Hassle MT.
  • 2. KantanMT Analytics - The Missing Link
  • 3. What we aim to cover today?  The MT & Quality Relationship  What is quality?  Possible ways of measuring it  Automated/Manual methods  Who needs to measure quality  Localisation stakeholders  The Missing Link - KantanMT Analytics   Segment level quality analysis Helping to build predictable business models 45 Mins Presentation 15 Mins Q&A  Q&A KantanMT Analytics - The Missing Link
  • 4. What is KantanMT.com?  Statistical MT System  Cloud-based    Highly scalable Inexpensive to operate Quick to deploy  Our Vision  To put Machine Translation    Customization Improvement Deployment  into your hands Fully Operational 7 months Active KantanMT Engines 6,632 Training Words Uploaded 23,653,605,925 Member Words Translated 362,291,925 KantanMT Analytics - The Missing Link
  • 5. The Quality & MT Relationship  Let’s agree a model for defining quality! Quality Target (defined by client) No Quality (baseline)  Taking into consideration quality of MT outputs and level of quality defined by your clients. KantanMT Analytics - The Missing Link
  • 6. Attributes of Quality Attributes of Quality – Model Language Attributes  Adequacy   Fluency Adequacy Meaning of generated texts expressed in source/target  Fluency   Comprehensibility & readability Factors include    Task-oriented Attributes  Productivity  Post-editing speed  Acceptability   Fit-for-purpose measurement Usable translations within the context of the end user/client Acceptability Grammar errors word selection syntax Language Productivity Task KantanMT Analytics - The Missing Link
  • 7. Attributes of Quality Attributes of Quality – Model Language Attributes  Adequacy   Fluency Adequacy Meaning of generated texts expressed in source/target  Fluency   Comprehensibility & readability Factors include    Task-oriented Attributes  Productivity  Post-editing speed  Acceptability   Fit-for-purpose measurement Usable translations within the context of the end user/client Acceptability Grammar errors word selection syntax Language Translation Style Productivity Task Business Model KantanMT Analytics - The Missing Link
  • 8. Attributes of Quality Attributes of Quality – Model Language Attributes Task-oriented Attributes What we want? Fluency Adequacy Productivity Acceptability FuzzyMatch Language Translation Style Task Business Model KantanMT Analytics - The Missing Link
  • 9. Measuring MT Quality  Automated  Fast  Repeatable  Objective  Scalable  Cheap  Based on samples  Can’t be used by PMs  Scope/Cost predictions  Manual  Slow  Cumbersome  Subjective  Not scalable  Expensive  Based on samples  Can’t be used by PMs  Scope/Cost predictions KantanMT Analytics - The Missing Link
  • 10. Measuring MT by hand!  Sample Translations based on template Style Wrong terminology Wrong Spelling Source not Capitalization Translated/Omissions Syntax & Grammar Compliance with client specs Wrong Word Form Literal translation Part of Speech Wrong Text/Information added Punctuation Technical Tags and Markup Sentence Structure Locale Adaptation Overall Spacing Adequacy Score Fluency Score Overall Quality Score KantanMT Analytics - The Missing Link
  • 11. Manual Framework  Adequacy Score (Range 1 – 5) 5  Full Meaning  All meaning expressed in the source segment appears in the translated segment  Most Meaning  Most of the source segment meaning is expressed in the translated segment  Much Meaning  Much of the source segment meaning is expressed in the translated segment  Little Meaning  Little of the source segment is expressed in the translated segment  No Meaning  None of the meaning expressed in the source segment is expressed in the translated segment 1 KantanMT Analytics - The Missing Link
  • 12. Manual Framework  Fluency Score (Range 1 – 5) 5  Native language fluency  No grammar errors, excellent word selection and good syntax. No post-editing required.  Near native fluency  Few terminology/grammar errors. No impact on overall understanding of the meaning. Little post-editing required.  Not very fluent  About half of translation contains errors and requires post-editing.  Little fluency  Wrong word choice, poor grammar and syntax. A lot of post-editing required.  No fluency  Absolutely ungrammatical and doesn’t make any sense. Re-translate from scratch . 1 KantanMT Analytics - The Missing Link
  • 13. Manual Framework: scoring sheet. Columns: Source; MT Target; Spacing; Syntax and Grammar; Locale Adaptation; Tags and Markup; Sentence Structure; Punctuation; Wrong Part of Speech; Style; Wrong Word Form; Capitalization; Text/Information added; Literal translation; Compliance with client specs; Source not Translated/Omissions; Wrong Spelling; Wrong terminology; Overall quality (1-4); Fluency (score 1-5); Adequacy (score 1-5). Group label: Tech.
  • 14. Manual Framework. Attributes of Quality – Model. Language attributes: Fluency, Adequacy. Task-oriented attributes: Productivity, Acceptability. [Diagram labels: Manual Methods; Language; Translation Style; Task; Business Model]
  • 15. Automated Methods. Many different methods available: BLEU, F-Measure, GTM, TER, NIST, Meteor, etc. Common characteristics: compute similarity of generated texts to reference texts; the smaller the difference, the better the quality. Broad adoption in industry and academia.
  • 16. Automated Methods: F-Measure (recall and precision metric). Recall = correct / Ref-Len = 80%. Precision = correct / MT-Len = 66%. F-Measure = (Precision × Recall) / ((Precision + Recall) / 2) = 73%. [Example figure: Reference Translation vs. MT Output] Flaw: no penalty for reordering.
  • 17. Automated Methods: WER (Word Error Rate). Minimum number of edits to transform the output into the reference. WER = (substitutions + insertions + deletions) / reference length. Levenshtein distance measure; general indicator of post-editing effort. [Example figure: Reference Translation vs. MT Output]
  • 18. Automated Methods: BLEU Score. Put simply, it measures how many words overlap, giving higher scores to sequential words. High correlation between BLEU and human judgement of translation quality. [Example figure: Reference Translation vs. MT Output]
  • 19. Automated Methods. KantanWatch™ can be used to track and monitor automated scores. [Figure: KantanWatch Reports]
  • 20. Automated Methods. Improvements can be monitored during the build-measure-learn cycle of a KantanMT deployment. [Figure: KantanWatch Reports]
  • 21. Automated Methods. Time-graphs offer a good overview of the maturing of a KantanMT engine. [Figure: KantanWatch Reports]
  • 22. Automated Methods. Can also present a holistic view of the potential quality of KantanMT outputs. [Figure: KantanWatch Reports]
  • 23. Automated Methods. Attributes of Quality – Model. Language attributes: Fluency, Adequacy. Task-oriented attributes: Productivity, Acceptability. [Diagram labels: NIST; GTM; F-Measure; TER; BLEU; METEOR; Language; Translation Style; Task; Business Model] Major flaw: all measurements are based on reference translations.
  • 24. Who uses these measurements? The Localisation Stakeholder Dilemma. Developers of MT engines: automated metrics (BLEU, METEOR, F-Measure, TER) are ideal and practical. No individual measurement has absolute meaning, but each points the quality curve in the right direction within a domain.
  • 25. Who needs to measure Quality? The Localisation Stakeholder Dilemma. Production teams (PMs, LEs and QEs): need segment measurements on quality and PE efforts, to determine a tiered segment post-edit rate and to distribute post-editing tasks based on segment quality. Localisation managers: need productivity measurements to predict budget and schedule (aka Project Segment Reports); MT measurements need to ‘fit’ business planning and charge models. Translators: unfortunately, don’t get a fair deal; no segment information, just top-level project ‘inferences’ based on samples.
  • 26. The Quality & MT Relationship. [Diagram labels: Manual Methods; TER; BLEU; GTM; METEOR; F-Measure; NIST; MT Developers; Production]
  • 27. Conclusions. There are many automated MT quality measurements: mostly suitable for MT developers; not optimal for production teams; of no use to translators. All rely on reference texts to compute measurements. What’s needed? Segment-level measurements that drive project schedule and charge model, with high correlation to human effort, and that do not rely on reference texts to compute measurements.
  • 28. Attributes of Quality – Model. Language attributes: Fluency, Adequacy. Task-oriented attributes: Productivity, Acceptability. [Diagram labels: What you want…; KantanMT Analytics; Language; Translation Style; Task; Business Model]
  • 29. Introducing KantanMT Analytics™. Segment-level scoring for MT output. Designed to make it possible to create predictable business models, project schedules and cost models. Co-developed by KantanMT.com and CNGL (Centre for Next Generation Localisation).
  • 30. KantanMT Analytics™. Select Analyse feature.
  • 31. KantanMT Analytics™. Select Analyse feature.
  • 32. KantanMT Analytics™. KantanMT Analytics Report created; XML-based for consumption by TMS/GMS platforms.
  • 33. KantanMT Analytics™. XLIFF document created; contains scores for each segment.
  • 34. The Missing Link. Attributes of Quality – Model. Language attributes: Fluency, Adequacy. Task-oriented attributes: Productivity, Acceptability. [Diagram labels: KantanMT Analytics™; Language; Translation Style; Task; Business Model]
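
The F-Measure and WER calculations from slides 16 and 17 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not KantanMT's implementation: the function names `f_measure` and `wer`, the whitespace tokenisation and the example sentences are all hypothetical. The example pair was chosen so that it reproduces the 80% / 66% / 73% figures shown on slide 16.

```python
from collections import Counter


def f_measure(reference: str, mt_output: str) -> tuple[float, float, float]:
    """Return (recall, precision, f_measure) from bag-of-words overlap.

    Assumption: tokens are whitespace-separated words.
    """
    ref_tokens = reference.split()
    mt_tokens = mt_output.split()
    # "correct" = words present in both texts, counted with multiplicity.
    correct = sum((Counter(ref_tokens) & Counter(mt_tokens)).values())
    if correct == 0:
        return 0.0, 0.0, 0.0
    recall = correct / len(ref_tokens)      # correct / Ref-Len
    precision = correct / len(mt_tokens)    # correct / MT-Len
    f = (precision * recall) / ((precision + recall) / 2)
    return recall, precision, f


def wer(reference: str, mt_output: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = mt_output.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j output words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example sentences (not taken from the slides).
    reference = "the cat sat on mat"
    mt_output = "the the cat sat on rug"
    r, p, f = f_measure(reference, mt_output)
    print(f"recall {r:.3f}, precision {p:.3f}, F {f:.3f}")
    # recall 0.800, precision 0.667, F 0.727
    print(f"WER {wer(reference, mt_output):.1f}")
    # WER 0.4
```

Because `f_measure` only counts shared words, a fully reversed output such as "mat on sat cat the" still scores 100%, which is the reordering flaw noted on slide 16; `wer` gives the same output an error rate of 0.8.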

Editor's notes

  1. No more expensive deployments: monthly subscription plan, customised subscription plan. No more complexity: KantanMT does all the heavy lifting. You focus on what you do best: grow and develop your business.
  2. Flaw – no penalty for reordering