Learn the different approaches to machine translation and how to improve the quality of your global strategy with machine translation

SDL Proprietary and Confidential
How to Attain Maximum Machine Translation Quality
Rodrigo Fuentes Corradi, MT Consultant
SDL Language Customer Success Summit | June 7, 2016

2
Overview: The SDL MT Team
Who we are
First to commercialize Statistical Machine Translation
o 50+ Professionals
o Over 10 Nationalities
o Across 5 Time Zones
o 8 Locations
Widespread team of language lovers:
o Computational Linguists
o Project Managers
o Data Specialists
o Post-Editors
…all gathered from the four corners of SDL!
What we do
Drive MT Adoption:
Educate, promote and support MT
usage in existing SDL accounts
& new opportunities
Custom Engine Builds:
o Design
o Create
o Test
o Implement
o Monitor
…custom Statistical Machine Translation engines
Linguistic Projects:
Semantic annotation projects
for US Government bodies
& academic institutions
How we do it
Two Research Labs:
o Los Angeles, CA
o Cambridge, UK
PE Production offices:
o 30+ Production offices resourcing MTPE
o Custom Training for MTPE resources
o Investment in Universities and future supply chain
We’re Evangelists…about Machine Translation, using automation to accelerate productivity

3
Post-Edit
SDL’s Intelligent Machine Translation (iMT):
Key steps in MT life cycle
Evaluate Train MT Test
SDL Approach
Refine
Engineers Developers Scientists Post-Editor
Process Workflow
Resource Pool
Computational
Linguists

4
Teamwork for MT success
○ The MT market is undergoing radical transformation
○ Scepticism remains in terms of what benefit MT can bring to business
○ Increasing numbers of mature MT players opt for a structured MT approach to match current communication demands
○ The secret of MTPE success lies in a step-by-step, resource-by-resource approach to Enterprise scale Post-Editing
Account Managers
& Consultants
o Technical consulting
o Research & implement
specific solutions
o Sales support
PJMs
o Communications
o Project coordination
o Reporting
o Support for
consulting
Linguists
o Prepare
customized
material
o Deliver training
online or on-site
Linguists
o Data cleaning
o Expert training
o Engine testing
o Maintenance
Engineers
o Data evaluation
o Alignment
o Conversion
Translation
Manager
o Consolidate
feedback on quality
o Run PE Certification
to improve quality
SDL MT Team Roles
Post-Edit Training
Engine Building
& Testing
Data Analysis
& Management
Quality Management
Project Management

Just Starting: Content,
Use Case & Solutions

6
○ Faster throughput without sacrificing quality
○ To meet aggressive turnarounds
○ Ability to handle increasing content volume / volume fluctuation
○ Lower production costs
○ For high volume, MT can be more consistent
The demand for MT solutions is growing quickly & post-editing
is rapidly becoming a basic skill for translators
Why companies use MT post-editing

7
Right translation method, right price, right time
Quality
Volume
Human Translation Machine Translation
Blogs
User Forums
Reviews
Chat
Email
Support
FAQ
Websites
Wikis
Knowledge Base
Alerts/Notifications
Help
User Guides
Documentation
Post-Edit
Newsletters
Advertising Marketing
Legal
Light Post-Edit

8
SDL’s solutions for increasing MT quality
Customized Engines
Domain Verticals
Baselines
Language Verticals

Engine Creation &
Data Best Practices

10
Good data for customized engines
Engineers
Computational Linguists
How much?
o More is better. The statistical algorithms work better with many words to analyse. Upwards of one million words for best success. For very consistent, clean data, half of that may work.
What content?
o Content should all be from one content type, using similar terminology. A mix of content types (e.g., technical manuals, advertising, etc.) may produce poor results.
What style?
o Style should be consistent. The algorithms learn patterns from similarities, and perform better if data is in similar form. Very long sentences, or creative and varied styles, can negatively affect trainings.
Vertical engines or baselines may work better if you don’t have enough or the right type of content
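The sizing guidance on this slide can be turned into a quick pre-check on a candidate corpus. The following is a minimal sketch only: the function names are hypothetical, word counting is a plain whitespace split, and the two thresholds simply restate the slide's rules of thumb (about one million words, or about half of that for very consistent, clean data). It is not an SDL tool.

```python
def count_words(segments):
    """Count whitespace-separated words across a list of source segments."""
    return sum(len(segment.split()) for segment in segments)


def corpus_size_advice(word_count, clean_and_consistent=False):
    """Suggest an engine type from corpus size (rule of thumb, not a guarantee).

    Thresholds restate the slide: ~1,000,000 words, or roughly half of that
    when the data is very consistent and clean.
    """
    threshold = 500_000 if clean_and_consistent else 1_000_000
    if word_count >= threshold:
        return "customized engine"
    # Below the threshold, the slide suggests vertical engines or baselines.
    return "vertical engine or baseline"


if __name__ == "__main__":
    sample = ["Press the power button.", "Close the lid."]
    print(count_words(sample), corpus_size_advice(count_words(sample)))
```

A size check like this says nothing about content type or style consistency, which the slide treats as equally important.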

11
Types of training data
Bilingual
o Core training data: translated content, usually in a translation memory. This is the content that works best and can be processed the fastest.
Parallel
o Translated content, but in separate files. This can be used if the content has been translated exactly, and the format is the same. If for example the document has extra tables in one language, or has been rewritten substantially to fit a different market, it is hard to find matching sentences.
Terminology
o Added to the training data to ensure corporate terms and brands are translated consistently. This can be a termbase or a simple bilingual word list.
Source Only
o Representative documents of the content that will be translated. They are used in initial evaluations of suitability for MT and to test the quality of the engine. Depending on their size, some 50-100 documents are ideal.
Target Only
o Representative documents in the translated language. They are used during the training and contribute to the fluency of the output. To have an effect, large numbers are needed; several million words are ideal.

12
Goals:
o Enable volume translations
o Migrate content from HT to PE
o Provide accuracy and term
consistency
o Provide productivity increases
New MT customization workflow
Client Request
SDL Assessment
Data Intake & Processing
Engine Trainings
Auto Eval Metrics
Blind Human Evaluation
Utility and / or Productivity Testing
Deploy Engine
Feedback
Method
o Iterative engine trainings: several engines are
created and the best one is deployed
o Output matches your style and terminology
o Engines “learn” from your Translation
Memories and terminology
o Work in combination with Baseline language
engines
Post-Editor
Computational
Linguists

14
MT testing approaches
Automated Measures
o Useful for comparing competing engines and identifying the best engine with high reliability
o No predictive value for post-editing productivity, but can validate post-editors’ feedback on MT output
o All automated measures have their flaws, but SDL has found that a weighted combination of measures gives
significant insights.
Human - Quality Scoring
o Resources are asked to score the MT output according to instructions, with a focus on understandability.
o Advantage of method: human evaluation is considered more robust to alternative but equally valid translations.
Note: Human evaluations are prone to subjectivity, so multiple evaluators are needed. This kind of test is more
expensive and time-consuming than an automated approach, but it can give an absolute value for one engine, not just a comparison.
Human – Productivity Testing
o Productivity gain for MT is calculated by comparing post-editing speed with conventional translation speed, so
evaluators can assess how much value post-editing would add in a production environment.
o Advantage of method: for post-editing, results are a good indicator of the suitability of the MT output.
Note: Productivity increase is difficult to predict for all cases, and this is also the most expensive and time-consuming
test of the three.
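The productivity comparison described above can be illustrated with a short calculation. This is a minimal sketch: the slide only says that post-editing speed is compared with conventional translation speed, so expressing the result as a relative gain is an assumption, the function name is hypothetical, and the speeds in the example are invented figures in words per hour (WPH), not SDL results.

```python
def productivity_gain(pe_wph, ht_wph):
    """Relative gain of post-editing (PE) speed over human translation (HT) speed.

    Both speeds are in words per hour. A result of 0.5 means post-editing
    was 50% faster than conventional translation in the test.
    """
    if ht_wph <= 0:
        raise ValueError("human translation speed must be positive")
    return (pe_wph - ht_wph) / ht_wph


if __name__ == "__main__":
    # Invented figures for illustration only.
    gain = productivity_gain(pe_wph=1500, ht_wph=1000)
    print(f"Productivity gain: {gain:.0%}")
```

Because the measured gain depends on the evaluator, the content and the engine, it is best averaged over several evaluators working on the same segments.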
Engineers Developers
MT evaluations should be relevant to your content, from the method of testing
(Automatic vs. Human Evaluation) to the testbed. The testbed should represent
real-life scenarios, taking the available science and applying it commercially.
Computational
Linguists

15
SDL’s custom MT evaluation platform
○ Data is presented to evaluators in a blind test scenario
in order to safeguard validity of results
○ Evaluation speed is recorded per segment
○ Multiple evaluators assess the same set of sentences
○ Each individual performance is compared to ensure
consistency
Additional measures for productivity tests:
○ Productivity increase from HT to PE
○ Translator’s editing actions (insert, copy-paste, pause)
○ Percentage of MT segments that do not require editing
○ Levenshtein edit distance from MT to final translation
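Two of the additional measures listed above are easy to sketch in code: the Levenshtein edit distance between the MT output and the final translation, and the percentage of MT segments that needed no editing. The version below is an illustrative assumption, not the platform's implementation: it works at character level (a word-level variant is equally common), and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Character-level Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def unedited_share(pairs):
    """Fraction of (mt_output, final_translation) pairs left unchanged by the post-editor."""
    if not pairs:
        return 0.0
    return sum(1 for mt, final in pairs if mt == final) / len(pairs)


if __name__ == "__main__":
    pairs = [
        ("Lits très à l'aise", "Lits très confortables"),
        ("Le service était exceptionnel", "Le service était exceptionnel"),
    ]
    for mt, final in pairs:
        print(levenshtein(mt, final), "edits:", mt, "->", final)
    print("Unedited segments:", f"{unedited_share(pairs):.0%}")
```

Raw distances grow with segment length, so they are often normalised by the length of the final translation before being compared across segments.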
Chart: Speed (WPH), series Human and Baseline, for Client resource, SDL resource 1, SDL resource 2 and Total
Data labels: 1,127 / 1,510 / 1,026 / 1,188 / 1,123 / 1,816 / 1,470 / 1,414
The custom testing platform can evaluate both sentence-level quality and
post-edit productivity gain, and helps ensure the validity of results
Customization-Baseline: Average scores
                evaluator1   evaluator2   Average total
Customization   3.15         3.04         3.09
Baseline        3.01         2.92         2.97
Delta           0.13         0.12         0.13

17
Achieving effective post-editing process
Raw output:
Building blocks
are in place
Linguists focus on
refining the output
Terminology &
style are applied
At high volume,
MT can deliver
greater consistency
Trained linguists
certified in MT
post-editing
Post-Editor

18
Post-editing quality guidelines
When post-editing to publishable quality, the following basic principles still apply:
o The same references must be used as for conventional translation (project-specific guidelines, TMs, glossaries, termbases, etc.)
o Grammar, spelling and punctuation must be correct
o Appropriate style & correct terminology must be used consistently
o The translation must read well and be suitable for its intended purpose
Customer
User Guide

19
What is your quality requirement?
Error Category   Specific Issue        Translation ($$$)   Publishable PE ($$)   Light PE ($)
Mistranslation   Error                 ✓                   ✓                     ✓
Terminology      Glossary adherence    ✓                   ✓                     ✓
                 Consistency           ✓                   ✓                     x
Accuracy         Omissions/Additions   ✓                   ✓                     ✓
Language         Grammar               ✓                   ✓                     x
                 Spelling              ✓                   ✓                     x
                 Punctuation           ✓                   ✓                     x
Style            General Style         ✓                   ✓                     x
Country          Country Standards     ✓                   ✓                     x
                 Register & Tone       ✓                   ✓                     x

How to Maintain & Improve
Future Performance

21
Technical
support
Product
development
Product
development
iMT
consultants
Scientific
development
Hotfix
Terms & brands
Python filters to
protect and
transform patterns
Fundamental
problem
Influence long term
scientific strategy
iMT
consultants
Scheduled fix for
future product
release
Analysis of setup,
technical advice
Major tool issue
Minor tool issue
Protected content
translated, wrong
terminology
Translation errors
following patterns,
like dates
Expected MT
behaviour
Linguistic
Technical
The effects of post-editor feedback
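The slide mentions Python filters that protect and transform patterns, with dates as an example of pattern-based translation errors. The sketch below shows one way such a filter could work; it is an assumption, not SDL's filter. It masks US-style dates with placeholder tokens before translation and restores them in day/month/year order afterwards. The function names and the `__DATE0__` placeholder format are hypothetical, and it assumes the MT engine passes the placeholder through unchanged.

```python
import re

# Matches US-style dates such as 6/7/2016 (month/day/year).
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def protect_dates(text):
    """Replace each date with a placeholder token; return (masked_text, dates)."""
    dates = []

    def _mask(match):
        dates.append(match.groups())
        return f"__DATE{len(dates) - 1}__"

    return DATE_RE.sub(_mask, text), dates


def restore_dates(text, dates):
    """Put the dates back into translated text, reordered as day/month/year."""

    def _unmask(match):
        month, day, year = dates[int(match.group(1))]
        return f"{int(day):02d}/{int(month):02d}/{year}"

    return re.sub(r"__DATE(\d+)__", _unmask, text)


if __name__ == "__main__":
    masked, dates = protect_dates("Ships on 6/7/2016.")
    print(masked)
    # The translated string stands in for MT output that kept the placeholder.
    print(restore_dates("Expédié le __DATE0__.", dates))
```

The same protect-then-restore shape applies to other protected content such as brand names or product codes, with a different pattern and no transformation step.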

22
Post-editors identify expected SMT misbehavior
o Incorrect formatting
o Additional or missing words
o Words not localised
o Gender, number, agreement or verb inflection issues
o Compound formation issues
o Syntax and word order issues
o Wrong punctuation
o Inconsistent or non-compliant terminology
o Mistranslations

23
Punctuation not following
the specific language rules
Syntax and word order issues
very frequently observed
Inconsistent or wrong terminology
very frequently observed
Examples of unexpected misbehavior
HTML entities instead of the correct
character (e.g. &amp; instead of &)
Words in a language other than
the target
Engineers
Scientists
Post-Editor
Computational
Linguists
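One of the unexpected misbehaviors listed on this slide, HTML entities appearing instead of the correct character, is straightforward to detect automatically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the regular expression covers named and numeric entities only, and repair relies on the standard library's `html.unescape`. It is a possible check, not part of any SDL product.

```python
import html
import re

# Named (&amp;), decimal (&#38;) and hexadecimal (&#x26;) character references.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#\d+|#[xX][0-9A-Fa-f]+);")


def find_entities(segment):
    """Return the HTML entities left in an MT output segment."""
    return ENTITY_RE.findall(segment)


def repair_entities(segment):
    """Replace HTML entities with the characters they stand for."""
    return html.unescape(segment)


if __name__ == "__main__":
    mt_output = "Terms &amp; conditions"
    print(find_entities(mt_output))
    print(repair_entities(mt_output))
```

Flagged segments are worth reporting back as feedback as well as repairing, since the slide treats this behavior as unexpected rather than as normal SMT output.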

25
SDL iMT Group are constantly researching
ways to improve Vertical and Customized
MT Engines
SDL Research Scientists are continuously
improving the Statistical Machine Translation
algorithms (e.g. Language Models, Translation
Models, Reordering Models, Syntax,
Transliteration, Rule-Based Components)
SDL Data Engineers are
continuously mining large
amounts of good data used
by the statistical algorithms
Continuous improvement

26
Legacy MT
Legacy MT (Monolithic Phrase-based)
Foreign Language
Your Language

27
SDL XMT: Next generation technology, higher quality
Foreign Language
XMT
Your Language
MODULAR COMPONENTS
o Neural Networks
o Compound Splitting
o Phrase-Based
o Finite State Automata
o String to Tree
o Rule-Based
o Tree to String
o Pre-Ordering
o Transliteration
o Hidden Markov Model
o Hyper Graphs
o …
Modular & Flexible
“State-of-the-Art” Machine Learning
Better Translation Quality
Rapid Research Transition

28
Legacy MT systems are static
MT Provider Post-Editor
MT
Engine
MT Output
PE Edited

29
SDL MT innovation – Adaptive MT
○ New technology developed by SDL Research
○ An Adaptive MT engine that learns interactively from
the post-editor’s edits
SDL Adaptive MT Post-Editor
MT
Engine
Adaptive MT
Processor
MT Output
PE Edited

30
Adaptive MT key Features & Benefits
○ Creates a personal
adaptive MT engine
for the user
○ Interactive
o Improves
post-editor’s
productivity
○ Reduces the
frustration of editing
the same incorrect MT
○ Cumulative learning
over time – saved
from job to job
○ No need to wait for a
retrain

31
Before Adaptive MT
English Document
The customer service was outstanding
Very comfortable beds
The view was breathtaking
Machine Translation (French)
Le service était exceptionnel
Lits très à l'aise
La vue était breathtaking
User Feedback (post-edited French Translation)
Le service clientèle était exceptionnel
Lits très confortables (replacing "à l'aise")
La vue était à couper le souffle (replacing "breathtaking")
Next English Document
The customer service was excellent
The beds were very comfortable
What a breathtaking view!
Machine Translation (French), repeating the same errors
Le service ____ était excellent
Les lits étaient très à l'aise
Quelle breathtaking vue!

32
With Adaptive MT
Engineers Post-Editor Developers Scientists Computational Linguists
English Document
The customer service was outstanding
Very comfortable beds
The view was breathtaking
Machine Translation (French)
Le service était exceptionnel
Lits très à l'aise
La vue était breathtaking
User Feedback (post-edited French Translation)
Le service clientèle était exceptionnel
Lits très confortables (replacing "à l'aise")
La vue était à couper le souffle (replacing "breathtaking")
Next English Document
The customer service was excellent
The beds were very comfortable
What a breathtaking view!
Adaptive MT (French), applying the earlier edits
Le service clientèle était excellent
Les lits étaient très confortables
Quelle vue à couper le souffle!
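The hotel-review example can be mimicked with a toy correction memory, purely to illustrate the idea of learning from a post-editor's edits. This is not how SDL's Adaptive MT works: a real adaptive engine updates its translation models, whereas this sketch only records word-level replacements found with the standard library's `difflib` and reapplies them as string substitutions. The class and method names are hypothetical.

```python
import difflib


class CorrectionMemory:
    """Toy store of phrase replacements learned from post-edits."""

    def __init__(self):
        self.rules = {}

    def learn(self, mt_output, post_edited):
        """Record each replaced word span as a wrong -> right rule."""
        a, b = mt_output.split(), post_edited.split()
        matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":  # pure insertions and deletions are ignored here
                self.rules[" ".join(a[i1:i2])] = " ".join(b[j1:j2])

    def apply(self, mt_output):
        """Apply every learned replacement to a new MT segment."""
        for wrong, right in self.rules.items():
            mt_output = mt_output.replace(wrong, right)
        return mt_output


if __name__ == "__main__":
    memory = CorrectionMemory()
    memory.learn("Lits très à l'aise", "Lits très confortables")
    print(memory.apply("Les lits étaient très à l'aise"))
```

A substitution table like this cannot reorder words, which is why it would not reproduce the slide's "Quelle vue à couper le souffle!" from "Quelle breathtaking vue!"; handling that requires the engine itself to adapt.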

34
Focus on Canada’s market challenges
Flavor
requirements
Large retail
projects, no or
small starting
TMs
High
turnover
High quality
requirements
Traditional
offer (SDL
prior to 2014,
Google, Bing)
Mixed
French
flavor
Mixed
domains,
no retail
vertical
Lack of suitable
generic solutions
prevents MTPE
from the start
Lack of flavor &
domain-specific
terminology
increases PE
effort and review
costs
   

35
Engine performance summary
Figure: one panel per engine (FR Baseline, FR-CA Language Vertical, FR Domain Verticals, Customizations), each rated on Flavor, Terminology and Fluency

36
SDL’s solution maturity roadmap
MATURITY
Generic FR-CA solutions
o Win clients
o Meet deadlines
o Collect project-specific data
Customizations
o Improve productivity & quality
o Collect more data and share feedback
Retrainings
o Further improvement to productivity and quality

37
SDL’s answer to Canada’s market challenges
Flavor
requirements
Large retail
projects, no or
small starting
TMs
High
turnover
High quality
requirements
SDL’s offer
after 2014
Training
material is
handpicked to
ensure correct
flavor
We have grown
retail solutions
to fit current
& new
opportunities
We have a
portfolio of
training material
& success
recipes for a
quick start
Combination of
adapted MT
solutions &
shrewd testing
and feedback
processes
   

39
How do I get started?
Let’s have a conversation:
What content do you
need translated?
What are your quality
requirements?
What can you use for
a training corpus?

40
Takeaway
o MT can be complex, so choose your MT provider wisely
o Document your quality requirement
o Integrate MT within your larger localization infrastructure
o Use trained, certified post-editors
o Measure & improve

Copyright © 2008-2016 SDL plc. All rights reserved. All company names, brand names, trademarks,
service marks, images and logos are the property of their respective owners.
This presentation and its content are SDL confidential unless otherwise specified, and may not be
copied, used or distributed except as authorised by SDL.
Global Customer Experience Management
