This was a talk given at the annual GALA conference in Amsterdam on March 27th, 2017. The topic: "Neural Machine Translation: Where are we now?"
Neural Machine Translation is at the peak of a hype cycle. There is no doubt it is an emerging technology with massive potential, but it is not yet a sweeping solution to all ills. Several factors prevent NMT from being commercially ready. Expectations, therefore, need to be managed. That is the goal of this presentation.
3. What we’re actually going to cover this morning!
How does it work?
What’s all the fuss about?
“Neural machine translation is ______.”
What is the status as of today?
Is it really that good?
What does all this mean for the future?
5. What they actually said...
"In some cases human and GNMT translations are nearly indistinguishable on the relatively simplistic and isolated sentences sampled from Wikipedia and news articles for this experiment."
What was reported...
MT developers around the world
8. A brief history of MT: Rule-Based → Statistical → Neural
Source: (modified from) http://nlp.stanford.edu/projects/nmt/Luong-Cho-Manning-NMT-ACL2016-v4.pdf
9. "State of the Union"
[Chart: MT quality over time, as of March 27th 2017. Labels: the initial splash made by statistical MT; 20+ years' worth of research; "we're about here now"; the initial splash made by neural MT ("wow, that's pretty good!"); "this is where the excitement is coming from"; and a "?" for what comes next.]
15. Neural Machine Translation, March 27th 2017: moving from academia into industry
• Still early stage
• Language independent
• Output can be insanely fluent!
• Fundamental practical considerations not yet addressed
• Generic applications only; no flexibility for customisation
• Significant hurdles for cost-effective, scalable production performance
18. What evaluations are out there?
Anecdotal
• "Yeah, it looks better"
Academic
• Generally, neural is better*
• More obviously so for complex languages
• It falls over badly on long sentences
WIPO
• Stark improvements for Chinese and Arabic
• Comparable performance on other languages
19. WIPO large-scale apples-to-apples comparison
English to Chinese
Arabic to Chinese
Spanish to Chinese
French to Chinese
20. What evaluations are out there?
Anecdotal
• "Yeah, it looks better"
Academic
• Generally, neural is better*
• More obviously so for complex languages
• It falls over badly on long sentences
WIPO
• Stark improvements for Chinese and Arabic
• Comparable performance on other languages
Iconic
• Practical comparison with production MT
• Mixed results depending on content type
• Clear strengths and weaknesses emerging
21. "Real-world" comparative use case
Real-world languages and content
• Chinese-to-English patents; mature production engine, highly tuned
Apples-to-apples comparison
• Access to the same training data and test data, including all of the ugly parts
Effective qualitative evaluation
• There is no one-size-fits-all, so where is MT good, and where does it fall down?
22. Neural MT works – and it's good! It is not a silver bullet.
[Chart: automatic scores for Iconic Production MT vs. Iconic Neural MT, on short sentences and on all sentences]
Strengths: + word order, + agreement, + terminology, + error-free output
Weaknesses: - omitting phrases, - sentence structure
23. New Opportunities = New Challenges
New challenges:
• Black box: "Why is this error happening?"
• Customisation: "Can you fix this error please?"
• Production: "How much is that GPU??!"
Old challenges:
• Data: still needed, now more than ever!
• Evaluation: do we know how to quantify "quality"?
• Pricing: how much does it cost now?
24. What does this mean for the future?
Short term
• Research, which takes time
• More effective use of general machine translation
2–5 years
• Emerging use cases, new types of hybrid, and clarity
Longer term
• "Zero-shot" translation?
[Timeline: Rule-based → Statistical → Neural. "You are here"]
What this talk is not! => An intractable, impenetrable technical deep dive into how neural MT works!
Fuss – context
NMT is – significance / impact
Status – not media / meaning for YOU
Good? – examples and case studies
Future – short, mid, longer term
Before we get into it…
Fake news
Embellished, sensationalised, out-of-context reporting – not doing anyone any favours
Example
Google paper, 23 pages
Not the first time
History of false dawns
I’m going to be a more friendly expectation manager!
Not bursting bubble – just bringing down to earth
Bombastic!
What better way to start
Answer needs historical context
Paradigm shift
Rule based didn’t go away
Neural last 2 years – why the fuss?
SMT became incremental
NMT here out of the box
Excitement – haven’t even had the chance to try all these things!
Hype cycle
Brand new runway
Light at the end of the tunnel – EXCITING
Hard to timeframe – will try later
Run in parallel
But definitely the way forward
HOWEVER
still just MT + ML, like SMT
Same UI, same integrations, same problems (later)
But better quality = ?
Not US vs THEM, MAN vs MACHINE
Reframe conversation – competition, frustrating
Complementary technology
Own use cases
Promising, way forward, needs time…
“Do you do neural machine translation?”
With that being said, where are we today
Early – fringe in 2015
General – “German nouns”
Practical – not yet, glossaries, customisation BUT GOOD people
Generic – no use cases
Flexibility – no customisation
GPUs! revisit
MUCH OF IT IS A MATTER OF TIME BUT THAT’S WHERE WE STAND NOW
Obvious question – depends on many factors, let’s look at evals
Anecdotal – good, surface, needs deeper
Academic – more effective in some cases. Room for improvement = improvement
WIPO – broadly on par but some interesting. LET’S LOOK
Automatic scores
We don’t know why
most highly optimised type of MT
we want to know ourselves!
My first experience, very cool.
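The "automatic scores" referred to here are typically BLEU-style n-gram overlap metrics. As a rough, hypothetical illustration only (not the actual metric implementation used in the WIPO or Iconic comparisons, which would rely on standardised tools such as sacrebleu), sentence-level BLEU with simple +1 smoothing can be sketched as:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of smoothed n-gram
    precisions times a brevity penalty. Whitespace tokenisation only."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped matches: a candidate n-gram only counts as many
        # times as it also appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        precisions.append((overlap + 1) / (total + 1))  # +1 smoothing
    # Brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

An identical candidate and reference score 1.0; the score falls as n-gram overlap drops, and the brevity penalty pulls down translations shorter than the reference. Such surface-overlap scores are exactly why the deeper human evaluations above were needed: a fluent neural output can score well while omitting phrases.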
What are the practical implications of this? What do we do now? What’s holding us back?
Now we have some direction
Leveraging and utilising it in industry is challenging!
How do we make decisions – where to use it, when, and how! Needs more practical field testing…
"This first wave of NMT solutions are mostly generic systems, which are clearly improved in most language combinations over existing generic SMT solutions, especially to human evaluators. While we need to be wary of over exuberance about the progress, there is reason for optimism and we can expect further quality improvements as our understanding of the mystery of 'hidden layers' of deep learning improves."
“MT must be adaptable/customizable for specific business purposes, i.e. they need to learn specific terminology and specific customer domain. Comprehensive customization will take significantly more computing time and all the requirements for good quality data will only intensify.”