KantanMT Analytics: The Missing Link in Machine Translation

No Hardware. No Software. No Hassle MT.

What do we aim to cover today?
 The MT & Quality Relationship
   What is quality?
   Possible ways of measuring it (Automated/Manual methods)
 Who needs to measure quality
   Localisation stakeholders
 The Missing Link - KantanMT Analytics
   Segment level quality analysis
   Helping to build predictable business models
 Q&A

45 Mins Presentation
15 Mins Q&A

What is KantanMT.com?
 Statistical MT System
 Cloud-based
   Highly scalable
   Inexpensive to operate
   Quick to deploy
 Our Vision
   To put Machine Translation Customization, Improvement and Deployment into your hands

Fully Operational 7 months
Active KantanMT Engines: 6,632
Training Words Uploaded: 23,653,605,925
Member Words Translated: 362,291,925

The Quality & MT Relationship
 Let’s agree a model for defining quality!

Quality Target (defined by client)

No Quality (baseline)



The model takes into account both the quality of the MT output and the level of quality defined by your client.


Attributes of Quality
Attributes of Quality – Model
Language Attributes
 Adequacy
   Meaning of generated texts expressed in source/target
 Fluency
   Comprehensibility & readability
   Factors include grammar errors, word selection and syntax

Task-oriented Attributes
 Productivity
   Post-editing speed
 Acceptability
   Fit-for-purpose measurement
   Usable translations within the context of the end user/client

[Diagram: the quality model. Language: Fluency, Adequacy. Task: Productivity, Acceptability.]

What do we want?
[Diagram: the quality model with FuzzyMatch. Language (Translation Style): Fluency, Adequacy. Task (Business Model): Productivity, Acceptability.]

Measuring MT Quality
 Automated
   Fast
   Repeatable
   Objective
   Scalable
   Cheap
   Based on samples
   Can’t be used by PMs for scope/cost predictions

 Manual
   Slow
   Cumbersome
   Subjective
   Not scalable
   Expensive
   Based on samples
   Can’t be used by PMs for scope/cost predictions


Measuring MT by hand!
 Sample Translations based on template
Style
Wrong terminology
Wrong Spelling
Capitalization
Source not Translated/Omissions
Syntax & Grammar
Compliance with client specs
Wrong Word Form
Literal translation
Wrong Part of Speech
Text/Information added
Punctuation
Technical
Tags and Markup
Sentence Structure
Locale Adaptation
Spacing

Overall
Adequacy Score
Fluency Score
Overall Quality Score


Manual Framework
 Adequacy Score (Range 1 – 5)
   5: Full Meaning. All meaning expressed in the source segment appears in the translated segment.
   4: Most Meaning. Most of the source segment meaning is expressed in the translated segment.
   3: Much Meaning. Much of the source segment meaning is expressed in the translated segment.
   2: Little Meaning. Little of the source segment meaning is expressed in the translated segment.
   1: No Meaning. None of the meaning expressed in the source segment is expressed in the translated segment.


Manual Framework
 Fluency Score (Range 1 – 5)
   5: Native language fluency. No grammar errors, excellent word selection and good syntax. No post-editing required.
   4: Near native fluency. Few terminology/grammar errors. No impact on overall understanding of the meaning. Little post-editing required.
   3: Not very fluent. About half of the translation contains errors and requires post-editing.
   2: Little fluency. Wrong word choice, poor grammar and syntax. A lot of post-editing required.
   1: No fluency. Absolutely ungrammatical and doesn’t make any sense. Re-translate from scratch.


Manual Framework
[Scoring template, one row per sampled segment. Columns: Source; MT Target; error categories (Wrong terminology, Wrong Spelling, Source not Translated/Omissions, Compliance with client specs, Literal translation, Text/Information added, Capitalization, Wrong Word Form, Style, Wrong Part of Speech, Punctuation, Sentence Structure, Tags and Markup, Locale Adaptation, Syntax and Grammar, Spacing); Adequacy (Score 1-5); Fluency (Score 1-5); Overall quality (1-4).]

Manual Framework
[Diagram: Manual Methods shown on the quality model. Language (Translation Style): Fluency, Adequacy. Task (Business Model): Productivity, Acceptability.]

Automated Methods
 Many different methods available
 BLEU, F-Measure, GTM, TER, NIST, Meteor, etc.
 Common characteristics
 Compute similarity of generated texts to reference texts
 The smaller the difference => the better the quality!
 Broad adoption
 Industry & Academia


Automated Methods
 F-Measure
 Recall & Precision Metric (compares the MT Output with a Reference Translation)
   Recall = correct / Ref-Len = 80%
   Precision = correct / MT-Len = 66%
   F-Measure = (Precision * Recall) / ((Precision + Recall) / 2) = 73%

 Flaw: no penalty for reordering
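
The formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenisation, lowercasing and a single reference; the function name `f_measure` and the example sentences are made up for this sketch and are not part of KantanMT.

```python
from collections import Counter


def f_measure(mt_output: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, f_measure) based on word overlap only."""
    mt_tokens = mt_output.lower().split()
    ref_tokens = reference.lower().split()
    # A word counts as correct at most as often as it occurs in both texts.
    correct = sum((Counter(mt_tokens) & Counter(ref_tokens)).values())
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(mt_tokens)   # correct / MT-Len
    recall = correct / len(ref_tokens)     # correct / Ref-Len
    f = (precision * recall) / ((precision + recall) / 2)
    return precision, recall, f


reference = "the cat sat on the mat"
in_order = f_measure("the cat sat on a mat", reference)
scrambled = f_measure("mat a on sat cat the", reference)
print(in_order, scrambled)
```

Both calls return the same scores, because the metric only counts overlapping words and ignores their order. That is the reordering flaw named on the slide.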

Automated Methods
 WER (Word Error Rate)
 Min number of edits to transform output to reference
   WER = (substitutions + insertions + deletions) / reference-length

Levenshtein distance measure
General indicator of Post-Editing Effort
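
As a rough sketch of the WER calculation, the following Python computes a word-level Levenshtein distance and divides it by the reference length. Whitespace tokenisation and the function name `word_error_rate` are assumptions made for this illustration.

```python
def word_error_rate(mt_output: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp = mt_output.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the MT words processed so far
    # (before the current one) and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution = 0 if hyp_word == ref_word else 1
            curr.append(min(
                prev[j] + 1,                 # drop a word from the MT output
                curr[j - 1] + 1,             # add a missing reference word
                prev[j - 1] + substitution,  # keep or substitute a word
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on a mat", "the cat sat on the mat"))
```

A lower value means fewer edits are needed, which is why WER is read as a general indicator of post-editing effort.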

Automated Methods
 BLEU Score
 Put simply – measures how many words overlap, giving
higher scores to sequential words
 High correlation between BLEU and human judgement of
translation quality
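
The idea of rewarding sequential words can be illustrated with a simplified, sentence-level BLEU. This sketch assumes a single reference, whitespace tokenisation and no smoothing, so it is not a substitute for a standard BLEU implementation; `simple_bleu` and `ngrams` are names invented for the example.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(mt_output: str, reference: str, max_n: int = 4) -> float:
    """Geometric mean of clipped 1..max_n-gram precisions times a brevity penalty."""
    hyp, ref = mt_output.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_ngrams.values())
        # Clip each n-gram count to the number of times it occurs in the reference.
        clipped = sum((hyp_ngrams & ref_ngrams).values())
        if total == 0 or clipped == 0:
            return 0.0  # unsmoothed: one n-gram order without matches zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Penalise output that is shorter than the reference.
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on a mat", reference))
print(simple_bleu("mat the on sat cat the", reference))
```

Unlike the F-Measure, the scrambled sentence scores lower here because none of its longer word sequences appear in the reference.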



Automated Methods
 KantanWatch™ can be used to track and monitor automated scores
* KantanWatch Reports
 Improvements can be monitored during the build-measure-learn cycle of a KantanMT deployment
 Time-graphs offer a good overview of the maturing of a KantanMT engine
 Can also present a holistic view of the potential quality of KantanMT outputs



Automated Methods
[Diagram: NIST, GTM, F-Measure, TER, BLEU and METEOR shown on the quality model. Language (Translation Style): Fluency, Adequacy. Task (Business Model): Productivity, Acceptability.]
Major Flaw: All measurements based on reference translations

Who uses these measurements?
 The Localisation Stakeholder Dilemma
 Developers of MT Engines
   Automated metrics (BLEU, METEOR, F-Measure, TER) are ideal and practical
   No individual measurement has absolute meaning, but each points the quality curve in the right direction within a domain


Who needs to measure Quality?
 The Localisation Stakeholder Dilemma
 Production Teams (PMs, LEs and QEs)
   Need segment measurements on quality and PE efforts
   Determine tiered segment post-edit rate
   Distribution of post-editing tasks based on segment quality
 Localisation Managers
   Need productivity measurements to predict budget and schedule
   Aka Project Segment Reports
   MT Measurements need to ‘fit’ business planning and charge models
 Translators
   Unfortunately, don’t get a fair deal

No segment information, just top level project ‘inferences’ based on samples
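
To make the idea of a tiered segment post-edit rate concrete, here is a small sketch of how segment scores could drive task distribution and a weighted word count. Everything specific in it is hypothetical: the 0-100 score scale, the tier thresholds, the tier names and the rate fractions are placeholders chosen for illustration, not values from KantanMT.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    score: float  # hypothetical segment quality score on a 0-100 scale


# Hypothetical tiers: (minimum score, tier name, fraction of the full word rate).
TIERS = [
    (85.0, "light post-edit", 0.30),
    (50.0, "full post-edit", 0.60),
    (0.0, "re-translate", 1.00),
]


def tier_for(score: float) -> tuple[str, float]:
    """Return the (tier name, rate fraction) for a segment score."""
    for minimum, name, rate in TIERS:
        if score >= minimum:
            return name, rate
    raise ValueError(f"score below the lowest tier: {score}")


def weighted_word_count(segments: list[Segment]) -> tuple[dict[str, int], float]:
    """Return words per tier and the rate-weighted word total for a project."""
    words_per_tier: dict[str, int] = {}
    weighted_total = 0.0
    for segment in segments:
        words = len(segment.text.split())
        name, rate = tier_for(segment.score)
        words_per_tier[name] = words_per_tier.get(name, 0) + words
        weighted_total += words * rate
    return words_per_tier, weighted_total


project = [
    Segment("Click the Save button", 90.0),
    Segment("Restart now", 60.0),
    Segment("See chapter three", 10.0),
]
print(weighted_word_count(project))
```

The words-per-tier figures indicate how post-editing tasks might be distributed, while the weighted total is the kind of number a charge model could be built on.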

[Diagram: Manual Methods and the automated metrics (TER, BLEU, GTM, METEOR, F-Measure, NIST) shown against their users, MT Developers and Production.]

The Quality & MT Relationship


Conclusions
 There are many automated MT quality measurements
   Mostly suitable for MT developers
   Not optimal for production teams
   Of no use to translators
 All rely on reference texts to compute measurements
 What’s needed?
 Segment level measurements
   Drive project schedule and charge model
   High correlation to human effort
 Measurements that do not rely on reference texts


What you want…
[Diagram: the quality model with KantanMT Analytics. Language (Translation Style): Fluency, Adequacy. Task (Business Model): Productivity, Acceptability.]

Introducing KantanMT Analytics™
 Segment level scoring for MT output
 Designed to make it possible to create predictable
   Business Models
   Project Schedules
   Cost Models
 Co-developed by
   KantanMT.com
   CNGL – Centre for Next Generation Localisation


KantanMT Analytics™
 Select Analyse feature
 KantanMT Analytics Report created
 XML based for consumption by TMS/GMS platforms

 XLIFF document created

 Contains scores for each segment
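
A TMS/GMS integration might read per-segment scores from the XLIFF roughly as follows. The slides do not show how KantanMT stores the score, so this sketch assumes an XLIFF 1.2 file in which each `trans-unit` carries the score in the standard `match-quality` attribute of an `alt-trans` element. The sample document and the function name `segment_scores` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace; the sample below is a made-up document, not real KantanMT output.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

SAMPLE_XLIFF = """
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="sample.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Hello world</source>
        <alt-trans match-quality="92" origin="MT">
          <target>Hallo Welt</target>
        </alt-trans>
      </trans-unit>
      <trans-unit id="2">
        <source>Restart the device</source>
        <alt-trans match-quality="41%" origin="MT">
          <target>Das Neustart Gerät</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def segment_scores(xliff_text: str) -> dict[str, float]:
    """Map each trans-unit id to its score (assumed to live in match-quality)."""
    root = ET.fromstring(xliff_text.strip())
    scores: dict[str, float] = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        alt = unit.find("x:alt-trans", NS)
        if alt is None or alt.get("match-quality") is None:
            continue  # no score recorded for this segment
        # match-quality is free text in XLIFF 1.2, so tolerate a trailing percent sign.
        scores[unit.get("id")] = float(alt.get("match-quality").rstrip("%"))
    return scores


print(segment_scores(SAMPLE_XLIFF))
```

If the real report uses a different element or attribute for the score, only the lookup inside the loop would need to change.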


The Missing Link
[Diagram: the quality model. Language (Translation Style): Fluency, Adequacy. Task (Business Model): Productivity, Acceptability.]
