www.kantanmt.com.
Tony O’Dowd discusses the major developments in Machine Translation over the last few years with a particular focus on measurement technologies.
In the past, users of Machine Translation have had considerable difficulty in pre-evaluating the quality of their Machine Translation output. This has led to industry confusion with regard to both post-editing pricing and client Machine Translation project pricing. Speaking about the ‘confidence scoring’ technology which has been co-developed with CNGL, Tony illustrates how LSPs and other users of Machine Translation can now accurately predict the quality of their Machine Translation output on a segment-by-segment basis.
3. What we aim to cover today?
The MT & Quality Relationship
What is quality?
Possible ways of measuring it
Automated/Manual methods
Who needs to measure quality
Localisation stakeholders
The Missing Link - KantanMT Analytics
Segment level quality analysis
Helping to build predictable business models
45 Mins Presentation
15 Mins Q&A
KantanMT Analytics - The Missing Link
4. What is KantanMT.com?
Statistical MT System
Cloud-based
Highly scalable
Inexpensive to operate
Quick to deploy
Our Vision
To put Machine Translation
Customization
Improvement
Deployment
into your hands
Fully Operational 7 months
Active KantanMT Engines
6,632
Training Words Uploaded
23,653,605,925
Member Words Translated
362,291,925
5. The Quality & MT Relationship
Let’s agree a model for defining quality!
Quality Target (defined by client)
No Quality (baseline)
Taking into consideration quality of MT outputs and level of quality defined by your clients.
6. Attributes of Quality
Attributes of Quality – Model
Language Attributes
Adequacy
Meaning of generated texts expressed in source/target
Fluency
Comprehensibility & readability
Factors include grammar errors, word selection, syntax
Task-oriented Attributes
Productivity
Post-editing speed
Acceptability
Fit-for-purpose measurement
Usable translations within the context of the end user/client
Language
Task
7. Attributes of Quality
Attributes of Quality – Model
Language Attributes
Adequacy
Meaning of generated texts expressed in source/target
Fluency
Comprehensibility & readability
Factors include grammar errors, word selection, syntax
Task-oriented Attributes
Productivity
Post-editing speed
Acceptability
Fit-for-purpose measurement
Usable translations within the context of the end user/client
Language
Translation Style
Task
Business Model
8. Attributes of Quality
Attributes of Quality – Model
Language Attributes
Task-oriented Attributes
What we want?
Fluency
Adequacy
Productivity
Acceptability
FuzzyMatch
Language
Translation Style
Task
Business Model
9. Measuring MT Quality
Automated
Fast
Repeatable
Objective
Scalable
Cheap
Based on samples
Can’t be used by PMs for Scope/Cost predictions
Manual
Slow
Cumbersome
Subjective
Not scalable
Expensive
Based on samples
Can’t be used by PMs for Scope/Cost predictions
10. Measuring MT by hand!
Sample Translations based on template
Style
Wrong terminology
Wrong Spelling
Source not Translated/Omissions
Capitalization
Syntax & Grammar
Compliance with client specs
Wrong Word Form
Literal translation
Wrong Part of Speech
Text/Information added
Punctuation
Technical
Tags and Markup
Sentence Structure
Locale Adaptation
Spacing
Overall
Adequacy Score
Fluency Score
Overall Quality Score
11. Manual Framework
Adequacy Score (Range 1 – 5)
5 Full Meaning
All meaning expressed in the source segment appears in the translated segment
4 Most Meaning
Most of the source segment meaning is expressed in the translated segment
3 Much Meaning
Much of the source segment meaning is expressed in the translated segment
2 Little Meaning
Little of the source segment is expressed in the translated segment
1 No Meaning
None of the meaning expressed in the source segment is expressed in the translated segment
12. Manual Framework
Fluency Score (Range 1 – 5)
5 Native language fluency
No grammar errors, excellent word selection and good syntax. No post-editing required.
4 Near native fluency
Few terminology/grammar errors. No impact on overall understanding of the meaning. Little post-editing required.
3 Not very fluent
About half of translation contains errors and requires post-editing.
2 Little fluency
Wrong word choice, poor grammar and syntax. A lot of post-editing required.
1 No fluency
Absolutely ungrammatical and doesn’t make any sense. Re-translate from scratch.
13. Source
MT Target
Spacing
Syntax and Grammar
Locale Adaptation
Tags and Markup
Sentence Structure
Punctuation
Wrong Part of Speech
Style
Wrong Word Form
Capitalization
Text/Information added
Literal translation
Compliance with client specs
Source not Translated/Omissions
Wrong Spelling
Wrong terminology
Overall quality (1-4)
Fluency (Score 1-5)
Adequacy (Score 1-5)
Manual Framework
Tech
14. Manual Framework
Attributes of Quality – Model
Language Attributes
Fluency
Task-oriented Attributes
Productivity
Manual
Methods
Adequacy
Acceptability
Language
Translation Style
Task
Business Model
15. Automated Methods
Many different methods available
BLEU, F-Measure, GTM, TER, NIST, Meteor, etc.
Common characteristics
Compute similarity of generated texts to reference texts
The smaller the difference => the better the quality!
Broad adoption
Industry & Academia
17. Automated Methods
WER (Word Error Rate)
Min number of edits to transform output to reference
Reference Translation
MT Output
WER = (Substitutions + insertions + deletions) / Reference-length
Levenshtein distance measure
General indicator of Post-Editing Effort
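The WER definition above can be sketched in a few lines of Python. This is a minimal illustration, not KantanMT's implementation: the function name `wer` and the whitespace tokenisation are assumptions made for the example.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()   # assumption: simple whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty output
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A lower value means fewer edits are needed to turn the MT output into the reference, which is why WER is read as a rough indicator of post-editing effort.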
18. Automated Methods
BLEU Score
Put simply – measures how many words overlap, giving higher scores to sequential words
High correlation between BLEU and human judgement of translation quality
Reference Translation
MT Output
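To make the "overlap, with higher scores for sequential words" idea concrete, here is a simplified single-sentence, single-reference sketch in Python. It is an assumption-laden illustration (the names `ngrams` and `simple_bleu`, whitespace tokenisation and the absence of smoothing are choices made for this example), not the full corpus-level BLEU used in practice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty. No smoothing."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each n-gram's count at the number of times it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(matches / total))

    # Brevity penalty: penalise output that is shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)
```

With `max_n=1` only word overlap counts; raising `max_n` rewards words that appear in the same sequence as in the reference, so a scrambled output with the right vocabulary scores much lower.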
19. Automated Methods
KantanWatch™ can be used to track and monitor
automated scores
* KantanWatch Reports
20. Automated Methods
Improvements can be monitored during the build-measure-learn cycle of a KantanMT deployment
* KantanWatch Reports
21. Automated Methods
Time-graphs offer a good overview of the maturing of a KantanMT engine
* KantanWatch Reports
22. Automated Methods
Can also present a holistic view of the potential quality of KantanMT outputs
* KantanWatch Reports
23. Automated Methods
Attributes of Quality – Model
Language Attributes
Task-oriented Attributes
NIST
Fluency
Productivity
GTM
F-Measure
Adequacy
TER
Acceptability
BLEU
METEOR
Language
Task
Translation Style
Business Model
Major Flaw: All measurements based on reference translations
24. Who uses these measurements?
The Localisation Stakeholder Dilemma
Developers of MT Engines
Automated BLEU, METEOR, F-Measure and TER are ideal and practical
No individual measurement has absolute meaning, but each points the quality curve in the right direction within a domain
KantanMT Analytics - The Missing Link
25. Who needs to measure Quality?
The Localisation Stakeholder Dilemma
Production Teams (PMs, LEs and QEs)
Need segment measurements on quality and PE efforts
Determine tiered segment post-edit rate
Distribution of post-editing tasks based on segment quality
Localisation Managers
Need productivity measurements to predict budget and schedule
Aka Project Segment Reports
MT Measurements need to ‘fit’ business planning and charge models
Translators
Unfortunately, don’t get a fair deal
No segment information, just top level project ‘inferences’ based on samples
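The production-team needs listed above (a tiered segment post-edit rate, and distributing post-editing work by segment quality) can be sketched as follows. Everything specific here is hypothetical: the 0-100 score scale, the tier thresholds, the rate factors and the names `Segment`, `TIERS`, `tier_for` and `estimate_cost` are assumptions for illustration and do not come from the presentation.

```python
from dataclasses import dataclass

# Hypothetical tiers: (minimum score, label, fraction of the full per-word rate).
# Ordered from best to worst so the first match wins.
TIERS = [
    (85.0, "light post-edit", 0.30),
    (60.0, "medium post-edit", 0.60),
    (0.0, "heavy post-edit", 1.00),
]


@dataclass
class Segment:
    text: str
    score: float  # assumed segment-level quality score, 0-100, higher is better


def tier_for(score: float):
    """Return the (label, rate factor) of the first tier whose minimum is met."""
    for minimum, label, factor in TIERS:
        if score >= minimum:
            return label, factor
    raise ValueError("score must not be negative")


def estimate_cost(segments, full_rate_per_word: float) -> float:
    """Sum the post-editing cost of all segments using their tier's rate factor."""
    total = 0.0
    for seg in segments:
        _, factor = tier_for(seg.score)
        total += len(seg.text.split()) * full_rate_per_word * factor
    return total
```

With per-segment scores available before post-editing starts, the same tier lookup could also be used to route high-scoring segments to light review and low-scoring ones to full post-editing.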
27. Conclusions
There are many automated MT quality measurements
Mostly suitable for MT developers
Not optimal for production teams
Of no use to translators
All rely on reference texts to compute measurements
What’s needed?
Segment level measurements
Drive project schedule and charge model
High correlation to human effort
Do not rely on reference texts to compute measurements
28. Attributes of Quality
Attributes of Quality – Model
Language Attributes
Task-oriented Attributes
What you want…
Fluency
Adequacy
Productivity
Acceptability
KantanMT Analytics
Language
Translation Style
Task
Business Model
29. Introducing KantanMT Analytics™
Segment level scoring for MT output
Designed to make it possible to create predictable
Business Models
Project Schedule
Cost Models
Co-developed
KantanMT.com
CNGL – Centre for Next Generation Localisation
32. KantanMT Analytics™
KantanMT Analytics Report created
XML based for consumption by TMS/GMS platforms
33. KantanMT Analytics™
XLIFF document created
Contains scores for each segment
34. The Missing Link
Attributes of Quality – Model
Language Attributes
Task-oriented Attributes
Fluency
Productivity
KantanMT Analytics™
Adequacy
Language
Translation Style
Acceptability
Task
Business Model
Editor's Notes
No more expensive deployments
Monthly subscription plan
Customised subscription plan
No more complexity
KantanMT does all the heavy lifting
You focus on what you do best – grow and develop your business