SDL provides machine translation solutions to customers. They have a team of over 50 professionals across various locations that work on driving MT adoption, building custom engines, and conducting linguistic projects. SDL's approach involves evaluating data, training machine translation engines, testing outputs, and refining engines through an iterative process with a focus on maximizing quality. They provide customized solutions through domain-specific engines and language verticals to meet the needs of different customers and content types.
Learn the different approaches to machine translation and how to improve the quality of your global strategy with machine translation
1. SDL Proprietary and Confidential
How to Attain Maximum Machine Translation Quality
Rodrigo Fuentes Corradi, MT Consultant
SDL Language Customer Success Summit | June 7, 2016
2. 2
Overview: The SDL MT Team
Who we are
First to commercialize Statistical Machine Translation
o 50+ Professionals
o Over 10 Nationalities
o Across 5 Time Zones
o 8 Locations
Widespread team of language lovers:
o Computational Linguists
o Project Managers
o Data Specialists
o Post-Editors
…all gathered from the four corners of SDL!
What we do
Drive MT Adoption:
Educate, promote and support MT usage in existing SDL accounts & new opportunities
Custom Engine Builds:
o Design
o Create
o Test
o Implement
o Monitor
…custom Statistical Machine Translation engines
Linguistic Projects:
Semantic annotation projects for US Government bodies & academic institutions
How we do it
Two Research Labs:
o Los Angeles, CA
o Cambridge, UK
PE Production offices:
o 30+ Production offices resourcing MTPE
o Custom Training for MTPE resources
o Investment in Universities and future supply chain
We’re Evangelists…about Machine Translation, using automation to accelerate productivity
3. 3
SDL’s Intelligent Machine Translation (iMT): Key steps in MT life cycle
SDL Approach
Process Workflow: Evaluate, Train MT, Test, Post-Edit, Refine
Resource Pool: Engineers, Developers, Scientists, Post-Editor, Computational Linguists
4. 4
Teamwork for MT success
○ The MT market is undergoing radical transformation
○ Scepticism remains in terms of what benefit MT can bring to business
○ Increasing numbers of mature MT players opt for a structured MT approach to match current communication demands
○ The secret of MTPE success lies in a step-by-step, resource-by-resource approach to Enterprise scale Post-Editing
SDL MT Team Roles
Account Managers & Consultants
o Technical consulting
o Research & implement specific solutions
o Sales support
PJMs
o Communications
o Project coordination
o Reporting
o Support for consulting
Linguists
o Prepare customized material
o Give training online or on-site
Linguists
o Data cleaning
o Expert training
o Engine testing
o Maintenance
Engineers
o Data evaluation
o Alignment
o Conversion
Translation Manager
o Consolidate feedback on quality
o Run PE Certification to improve quality
Areas: Post-Edit Training, Engine Building & Testing, Data Analysis & Management, Quality Management, Project Management
6. 6
Why companies use MT post-editing
○ Faster throughput without sacrificing quality
○ To meet aggressive turnarounds
○ Ability to handle increasing content volume / volume fluctuation
○ Lower production costs
○ For high volume, MT can be more consistent
The demand for MT solutions is growing quickly & post-editing is rapidly becoming a basic skill for translators
7. 7
Right translation method, right price, right time
[Chart: content types positioned by Quality and Volume. Translation methods shown: Human Translation, Post-Edit, Light Post-Edit, Machine Translation. Content types shown: Blogs, User Forums, Reviews, Chat, Email, Support, FAQ, Websites, Wikis, Knowledge Base, Alerts/Notifications, Help, User Guides, Documentation, Newsletters, Advertising, Marketing, Legal]
8. 8
SDL’s solutions for increasing MT quality
o Customized Engines
o Domain Verticals
o Baselines
o Language Verticals
10. 10
Good data for customized engines
How much?
o More is better. The statistical algorithms work better with many words to analyse. Upwards of one million words for best success. For very consistent, clean data, half of that may work.
What content?
o Content should all be from one content type, using similar terminology. A mix of content types (e.g., technical manuals, advertising, etc.) may produce poor results.
What style?
o Style should be consistent. The algorithms learn patterns from similarities, and perform better if data is in similar form. Very long sentences, or creative and varied styles, can negatively affect trainings.
Vertical engines or baselines may work better if you don’t have enough or the right type of content
Engineers, Computational Linguists
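The volume guidance above can be turned into a quick pre-check before data is submitted for training. This is a minimal sketch, not SDL tooling: the function names are hypothetical, words are counted by simple whitespace splitting, and reading "half of that" as a hard 500,000-word cut-off is an assumption made for illustration.

```python
def count_words(segments):
    """Count whitespace-separated words across all source segments."""
    return sum(len(segment.split()) for segment in segments)


def assess_corpus_size(source_segments, clean_and_consistent=False):
    """Compare the source-side word count with the slide's rough guidance.

    The 1,000,000-word figure comes from the slide. Treating "half of that"
    as exactly 500,000 words is an assumption for this sketch.
    """
    words = count_words(source_segments)
    threshold = 500_000 if clean_and_consistent else 1_000_000
    return words, words >= threshold


# Tiny illustrative corpus (made-up segments).
segments = ["Press the power button.", "Remove the battery cover."]
words, enough = assess_corpus_size(segments)
print(words, enough)
```

A corpus that falls short of the threshold is the case where, per the slide, a vertical engine or baseline may be the better starting point.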
11. 11
Types of training data
Bilingual
o Core training data: translated content, usually in a translation memory. This is the content that works best and can be processed the fastest.
Parallel
o Translated content, but in separate files. This can be used if the content has been translated exactly and the format is the same. If, for example, the document has extra tables in one language, or has been rewritten substantially to fit a different market, it is hard to find matching sentences.
Terminology
o Added to the training data to ensure corporate terms and brands are translated consistently. This can be a termbase or a simple bilingual word list.
Source Only
o Representative documents of the content that will be translated. They are used in initial evaluations of suitability for MT and to test the quality of the engine. Depending on their size, some 50-100 documents are ideal.
Target Only
o Representative documents in the translated language. They are used during the training and contribute to the fluency of the output. To have an effect, large volumes are needed; several million words are ideal.
12. 12
New MT customization workflow
Goals:
o Enable volume translations
o Migrate content from HT to PE
o Provide accuracy and term consistency
o Provide productivity increases
Workflow stages shown:
o Client Request
o SDL Assessment
o Data Intake & Processing
o Engine Trainings
o Auto Eval Metrics
o Blind Human Evaluation
o Utility and / or Productivity Testing
o Deploy Engine
o Feedback
Method
o Iterative engine trainings, with several engines created and the best being deployed
o Output matches your style and terminology
o Engines “learn” from your Translation Memories and terminology
o Work in combination with Baseline language engines
Post-Editor, Computational Linguists
14. 14
MT testing approaches
Automated Measures
o Useful to compare competing engines and identify the best engine with high reliability
o No predictive value for post-editing productivity, but can validate post-editor’s feedback on MT output
o All automated measures have their flaws, but SDL has found a weighted combination of measures that gives significant insights.
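To illustrate what a weighted combination of automated measures could look like, here is a minimal sketch. The slide does not name the measures or the weights SDL uses, so the metric names (`metric_a` and so on), the weights and the scores below are all hypothetical, and the scores are assumed to be already normalised to a 0-1 scale where higher is better.

```python
def combined_score(metric_scores, weights):
    """Weighted average of metric scores (assumed 0-1, higher is better)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(metric_scores[name] * weight for name, weight in weights.items())
    return weighted_sum / total_weight


# Hypothetical weights and per-engine scores, for illustration only.
weights = {"metric_a": 0.5, "metric_b": 0.3, "metric_c": 0.2}
engines = {
    "engine_1": {"metric_a": 0.40, "metric_b": 0.60, "metric_c": 0.50},
    "engine_2": {"metric_a": 0.50, "metric_b": 0.50, "metric_c": 0.50},
}

# Rank competing engines from best to worst combined score.
ranking = sorted(engines, key=lambda name: combined_score(engines[name], weights), reverse=True)
print(ranking)
```

As the slide notes, a ranking like this helps compare competing engines; it does not predict post-editing productivity.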
Human - Quality Scoring
o Resources are asked to score the MT output according to instructions, with a focus on understandability.
o Advantage of method: Human evaluation is considered more robust to alternative but equally valid translations.
Note: Human evaluations are prone to subjectivity, so you need multiple test subjects. Performing this kind of test is more expensive and time-consuming than an automated approach, but can give an absolute value for one engine, not just a comparison.
Human – Productivity Testing
o Productivity gain for MT is calculated by comparing post-editing speed with conventional translation speed so
evaluators can assess how much value post-editing would add in a production environment.
o Advantage of method: For Post-Editing, results are a good indicator of the suitability of the MT output.
Note: Productivity increase is difficult to predict for all cases, and this is also the most expensive and time-consuming test of the three.
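The productivity comparison described above can be sketched as a small calculation. The slide only says that post-editing speed is compared with conventional translation speed; expressing the gain as a relative difference in words per hour is an assumption here, and the function names and sample figures are illustrative.

```python
def words_per_hour(word_count, seconds):
    """Throughput in words per hour for a timed task."""
    return word_count * 3600 / seconds


def productivity_gain(pe_wph, ht_wph):
    """Relative gain of post-editing (PE) over human translation (HT).

    Returned as a fraction: 0.5 means post-editing was 50% faster.
    """
    return (pe_wph - ht_wph) / ht_wph


# Illustrative timings: the same translator, one hour on each task.
ht_speed = words_per_hour(500, 3600)  # conventional translation
pe_speed = words_per_hour(750, 3600)  # post-editing MT output
gain = productivity_gain(pe_speed, ht_speed)
print(f"{gain:.0%}")  # prints 50%
```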
MT evaluations should be relevant to your content, from the method of testing (Automatic vs. Human Evaluation) to the testbed. They should represent real-life scenarios, taking the available science and applying it commercially.
Engineers, Developers, Computational Linguists
15. 15
SDL’s custom MT evaluation platform
○ Data is presented to evaluators in a blind test scenario
in order to safeguard validity of results
○ Evaluation speed is recorded per segment
○ Multiple evaluators assess the same set of sentences
○ Each individual performance is compared to ensure
consistency
Additional measures for productivity tests:
○ Productivity increase from HT to PE
○ Translator’s editing actions (insert, copy-paste, pause)
○ Percentage of MT segments that do not require editing
○ Levenshtein edit distance from MT to final translation
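Two of the measures listed above are easy to sketch in code: the Levenshtein edit distance between the MT output and the final translation, and the share of MT segments that needed no editing. This is an illustrative implementation, not the SDL platform's code. The distance is computed per character (a word-level variant is equally possible) and the sample segments are made up for the example.

```python
def levenshtein(a, b):
    """Character-level Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def unedited_share(mt_segments, final_segments):
    """Fraction of MT segments that the post-editor left unchanged."""
    pairs = list(zip(mt_segments, final_segments))
    if not pairs:
        return 0.0
    return sum(1 for mt, final in pairs if mt == final) / len(pairs)


# Made-up sample: one edited segment, one accepted as-is.
mt = ["Lits très à l'aise", "Le service était exceptionnel"]
final = ["Lits très confortables", "Le service était exceptionnel"]
distances = [levenshtein(m, f) for m, f in zip(mt, final)]
print(distances, unedited_share(mt, final))
```

A distance of 0 means the segment was accepted unchanged; larger values indicate heavier editing.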
[Chart: Speed (WPH), Human vs. Baseline, for Client resource, SDL resource 1, SDL resource 2 and Total. Values shown: 1,127; 1,510; 1,026; 1,188; 1,123; 1,816; 1,470; 1,414]
Can evaluate both Sentence level quality & post-edit productivity gain via a custom testing platform and ensure the validity of results
Customization-Baseline: Average scores
                evaluator1   evaluator2   Average total
Customization   3.15         3.04         3.09
Baseline        3.01         2.92         2.97
Delta           0.13         0.12         0.13
17. 17
Achieving effective post-editing process
o Raw output: Building blocks are in place
o Linguists focus on refining the output
o Terminology & style are applied
o At high volume, MT can deliver greater consistency
o Trained linguists certified in MT post-editing
Post-Editor
18. 18
Post-editing quality guidelines
When post-editing to publishable quality, the following basic principles still apply:
o The same references must be used as for conventional translation (project-specific guidelines, TMs, glossaries, termbases, etc.)
o Grammar, spelling and punctuation must be correct
o Appropriate style & correct terminology must be used consistently
o The translation must read well and be suitable for its intended purpose
Customer
User Guide
19. 19
What is your quality requirement?
Service levels compared: Translation ($$$), Publishable PE ($$), Light PE ($)
Error Category: Specific Issue (x as marked on the slide)
o Mistranslation: Error
o Terminology: Glossary adherence; Consistency (x)
o Accuracy: Omissions/Additions
o Language: Grammar (x); Spelling (x); Punctuation (x)
o Style: General Style (x)
o Country: Country Standards (x); Register & Tone (x)
21. 21
The effects of post-editor feedback
Issue types: Linguistic, Technical
Reported issues:
o Major tool issue
o Minor tool issue
o Protected content translated, wrong terminology
o Translation errors following patterns, like dates
o Expected MT behaviour
o Fundamental problem
Teams involved: Technical support, Product development, iMT consultants, Scientific development
Outcomes:
o Hotfix
o Scheduled fix for future product release
o Analysis of setup, technical advice
o Terms & brands
o Python filters to protect and transform patterns
o Influence long term scientific strategy
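The slide mentions Python filters that protect and transform patterns such as dates. The sketch below shows one way such a filter could work. It is an assumption-laden illustration, not SDL's implementation: the placeholder format, the date pattern (MM/DD/YYYY) and all function names are hypothetical, and a simple string replacement stands in for the MT engine.

```python
import re

# Hypothetical pattern: US-style dates such as 06/07/2016.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def protect(text, pattern=DATE_PATTERN):
    """Swap each match for a numbered placeholder and remember the original."""
    protected = {}

    def _swap(match):
        key = f"__PH{len(protected)}__"
        protected[key] = match.group(0)
        return key

    return pattern.sub(_swap, text), protected


def restore(text, protected, transform=lambda value: value):
    """Put the protected values back, optionally transforming each one."""
    for key, value in protected.items():
        text = text.replace(key, transform(value))
    return text


def us_to_day_first(date):
    """Transform MM/DD/YYYY into DD/MM/YYYY."""
    month, day, year = date.split("/")
    return f"{day}/{month}/{year}"


masked, saved = protect("Delivery on 06/07/2016 at the latest.")
# Stand-in for the MT step: the placeholder passes through untouched.
translated = masked.replace("Delivery on", "Livraison le").replace("at the latest", "au plus tard")
print(restore(translated, saved, us_to_day_first))
```

The point of the design is that the engine never sees the date itself, so it cannot mistranslate it; the transformation to the target convention is applied deterministically afterwards.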
22. 22
Post-editors identify expected SMT misbehavior
o Incorrect formatting
o Additional or missing words
o Words not localised
o Gender, number, agreement or verb inflection issues
o Compound formation issues
o Syntax and word order issues
o Wrong punctuation
o Inconsistent or non-compliant terminology
o Mistranslations
23. 23
Examples of unexpected misbehavior
o Punctuation not following the specific language rules
o Syntax and word order issues very frequently observed
o Inconsistent or wrong terminology very frequently observed
o HTML entities instead of the correct character (i.e. &amp; instead of &)
o Words in a language other than the target
Engineers
Scientists
Post-Editor
Computational
Linguists
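One of the unexpected misbehaviors above, HTML entities appearing in place of the correct character, can be detected and repaired automatically. The following is a minimal sketch using Python's standard `html` and `re` modules. The function names and the sample segment are illustrative, and the regular expression is a simplified approximation of entity syntax.

```python
import html
import re

# Simplified pattern for named (&amp;), decimal (&#38;) and hex (&#x26;) entities.
ENTITY_PATTERN = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#\d+|#[xX][0-9A-Fa-f]+);")


def find_entities(segment):
    """Return the HTML entities left unresolved in an MT output segment."""
    return ENTITY_PATTERN.findall(segment)


def fix_entities(segment):
    """Replace HTML entities with the characters they stand for."""
    return html.unescape(segment)


segment = "Recherche &amp; développement"
print(find_entities(segment))  # ['&amp;']
print(fix_entities(segment))   # Recherche & développement
```

A check like `find_entities` could flag affected segments for reporting, so the issue reaches the engineers instead of being silently fixed by each post-editor.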
25. 25
Continuous improvement
o SDL iMT Group are constantly researching ways to improve Vertical and Customized MT Engines
o SDL Research Scientists are continuously improving the Statistical Machine Translation algorithms (e.g. Language Models, Translation Models, Reordering Models, Syntax, Transliteration, Rule-Based Components, etc.)
o SDL Data Engineers are continuously mining large amounts of good data used by the statistical algorithms
28. 28
Legacy MT systems are static
[Diagram: MT Provider side with MT Engine and MT Output; Post-Editor side with PE Edited text]
29. 29
SDL MT innovation – Adaptive MT
○ New technology developed by SDL Research
○ An Adaptive MT engine that learns interactively from
the post-editor’s edits
[Diagram: SDL Adaptive MT side with MT Engine, Adaptive MT Processor and MT Output; Post-Editor side with PE Edited text]
30. 30
Adaptive MT key Features & Benefits
○ Creates a personal adaptive MT engine for the user
○ Interactive
o Improves post-editor’s productivity
○ Reduces the frustration of editing the same incorrect MT
○ Cumulative learning over time – saved from job to job
○ No need to wait for a retrain
31. 31
Before Adaptive MT
English Document with Machine Translation (French) and User Feedback (post-edited French Translation):
o The customer service was outstanding → MT: Le service était exceptionnel → post-edited: Le service clientèle était exceptionnel
o Very comfortable beds → MT: Lits très à l'aise → post-edited: Lits très confortables
o The view was breathtaking → MT: La vue était breathtaking → post-edited: La vue était à couper le souffle
English Document with French Translation (Machine Translation):
o The customer service was excellent → Le service ____ était excellent
o The beds were very comfortable → Les lits étaient très à l'aise
o What a breathtaking view! → Quelle breathtaking vue!
32. 32
With Adaptive MT
English Document with Machine Translation (French) and User Feedback (post-edited French Translation):
o The customer service was outstanding → MT: Le service était exceptionnel → post-edited: Le service clientèle était exceptionnel
o Very comfortable beds → MT: Lits très à l'aise → post-edited: Lits très confortables
o The view was breathtaking → MT: La vue était breathtaking → post-edited: La vue était à couper le souffle
English Document with French Translation (Adaptive MT):
o The customer service was excellent → Le service clientèle était excellent
o The beds were very comfortable → Les lits étaient très confortables
o What a breathtaking view! → Quelle vue à couper le souffle!
Engineers, Post-Editor, Developers, Scientists, Computational Linguists
34. 34
Focus on Canada’s market challenges
Traditional offer (SDL prior to 2014, Google, Bing):
o Flavor requirements: Mixed French flavor
o Large retail projects, no or small starting TMs: Mixed domains, no retail vertical
o High turnover: Lack of suitable generic solutions prevents MTPE from the start
o High quality requirements: Lack of flavor & domain-specific terminology increases PE effort and review costs
36. 36
SDL’s solution maturity roadmap
Generic FR-CA solutions
o Win clients
o Meet deadlines
o Collect project-specific data
Customizations
o Improve productivity & quality
o Collect more data and share feedback
Retrainings
o Further improvement to productivity and quality
MATURITY
37. 37
SDL’s answer to Canada’s market challenges
SDL’s offer after 2014:
o Flavor requirements: Training material is handpicked to ensure correct flavor
o Large retail projects, no or small starting TMs: We have grown retail solutions to fit current & new opportunities
o High turnover: We have a portfolio of training material & success recipes for a quick start
o High quality requirements: Combination of adapted MT solutions & shrewd testing and feedback processes
39. 39
How do I get started?
Let’s have a conversation:
o What content do you need translated?
o What are your quality requirements?
o What can you use for a training corpus?
40. 40
Takeaway
o Measure & improve
o MT can be complex, so choose your MT provider wisely
o Document your quality requirement
o Integrate MT within your larger localization infrastructure
o Use trained, certified post-editors
Fix technical issues
Get useful hints for retraining the engines
Empower post-editors and make them part of the process
Gather experience and improve MT in the long run through continuous research
Major tool issue: no tags, every second word Spanish, systematic errors; hotfix, product management
Minor tool issue: doesn’t hamper everyday work, is registered, goes to product development but can take a bit longer to fix
Building on over 16 years of Machine Translation leadership, SDL has developed the next-generation machine translation platform – SDL XMT.
While the legacy SDL MT platform was designed as a monolithic phrase-based system, SDL XMT has a unique modular design, which allows the rapid development and integration of special-purpose modules that can address specific challenges.
SDL XMT incorporates all previous innovations and algorithms but also enables rapid transition of new technologies and innovations thanks to its unique modular and robust design.
Different translation technologies can be applied to different language pairs, depending on what produces the best translation quality. You will begin to see much higher language quality, particularly for languages such as English to Japanese and Chinese.
If you are not using MT in your Translation process you are missing out on a powerful tool.
User provides feedback by fixing machine translation errors
Machine cannot learn from the feedback
Without Language Learning, Machine repeats the same errors again and again
With Language Learning, the Machine learns in real time, seamlessly, and continuously from the user feedback
The machine improves over time and does not repeat the same mistakes
Your MT provider must have the right combo: Technology, people, process