4. How to NMT – The Recipe
Hardware + Software:
GPUs, Torch, Theano
Nematus, OpenNMT
Know-how, Support
Integration, Deployment
Training data
31/07/2017 KantanFest, Dublin, Ireland 4
5. How to NMT – KantanNeural™
Hardware + Software:
GPUs, Torch, Theano
Nematus, OpenNMT
Know-how, Support
Integration, Deployment
Training data
KantanNeural™
6. KantanNeural™: black board to production
Proof of Concept:
AWS, NVIDIA K520 GPUs
Nematus, ADAM, BPE, SCN
MT engine build time: 4 weeks
Quality: impressive
01 Nov 2016
• ADAM: parameter update algorithm
• Byte-pair encoding (BPE)
• Single-character n-gram (SCN)
Subword segmentation, e.g.:
lower → low er
tallest → tall est
almost → al most
lowest, taller, allow
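The byte-pair encoding idea named above can be sketched in a few lines of Python. This is a minimal illustration of the general BPE merge-learning procedure, not the Nematus or KantanNeural™ implementation. The function names (learn_bpe, merge_pair, segment), the "</w>" end-of-word marker, the tie-breaking rule and the toy word frequencies are all assumptions made for the example. The segmentations it produces depend on the training words and the number of merges, so they will not necessarily match the ones shown on the slide.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: frequency} dict."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically (an assumption,
        # chosen only to make the sketch deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split `word` into subword units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    # Drop the end-of-word marker for display.
    return [s.replace("</w>", "") for s in symbols if s != "</w>"]


if __name__ == "__main__":
    # Hypothetical toy corpus built from the words on the slide, each seen once.
    merges = learn_bpe({"lowest": 1, "taller": 1, "allow": 1}, num_merges=6)
    for word in ("lower", "tallest", "almost"):
        print(word, "->", " ".join(segment(word, merges)))
```

Because unseen words are always split into known units (down to single characters in the worst case), an engine trained this way never meets a truly out-of-vocabulary token.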
7. KantanNeural™: black board to production
KantanNeural™ α:
OpenNMT, ADAM, BPE
MT build time: 4 days
Quality: on a par with Nematus
KantanFleet™
01 Nov 2016 → 01 Feb 2017
8. KantanNeural™: black board to production
KantanNeural™ β:
Build-your-own NMT
Available to all clients (no extra charge)
Extended KantanFleet™
01 Nov 2016 → 01 Feb 2017 → 15 March 2017
9. KantanNeural™: black board to production
Currently:
Build-your-own NMT
NVIDIA K80 GPUs
AdaptiveMT
Incremental Retraining
4 hours?
01 Nov 2016 → 01 Feb 2017 → 15 March 2017 → 30 June 2017
12. KantanMT.com – A Complete Platform
Build Improve Deploy
Select a KantanFleet™ engine
KantanFleet™ Neural (18 language pairs)
Multiple domains
Create new NMT engine
Import library data
Import your own data
Convert an SMT profile:
… just two clicks away from NMT
13. KantanMT.com – A Complete Platform
Build Improve Deploy
Select a KantanFleet™ engine
14. KantanMT.com – A Complete Platform
Build Improve Deploy
Create a blank KantanNeural™ engine
15. KantanMT.com – A Complete Platform
Build Improve Deploy
Convert a PBSMT engine into a KantanNeural™ engine
16. KantanMT.com – A Complete Platform
Build Improve Deploy
Artificial Neural Networks train iteratively:
While stopping condition not met:
    While training data not exhausted:
        Take a batch
        Learn from it
    Repeat
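The iterative scheme above can be made concrete with a small, self-contained Python sketch. It fits a straight line rather than a translation model, and it uses plain mini-batch gradient descent in place of ADAM to keep the update rule to two lines. The function names (train, mean_squared_error), the learning rate, the batch size and the stopping rule (an epoch limit, or a loss change below a tolerance) are all assumptions for illustration, not KantanNeural™ settings.

```python
import random


def mean_squared_error(examples, w, b):
    """Average squared error of the line y = w*x + b over all examples."""
    return sum((w * x + b - y) ** 2 for x, y in examples) / len(examples)


def train(examples, batch_size=4, lr=0.05, max_epochs=500, tol=1e-6, seed=0):
    """Fit y = w*x + b with mini-batch gradient descent; returns (w, b, epochs run)."""
    rng = random.Random(seed)
    examples = list(examples)          # work on a copy so the caller's data is untouched
    w, b = 0.0, 0.0
    prev_loss = float("inf")
    epochs = 0
    while epochs < max_epochs:         # while stopping condition not met
        rng.shuffle(examples)
        start = 0
        while start < len(examples):   # while training data not exhausted
            batch = examples[start:start + batch_size]   # take a batch
            start += batch_size
            # Learn from it: gradient of the batch mean squared error.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
        epochs += 1
        loss = mean_squared_error(examples, w, b)
        if abs(prev_loss - loss) < tol:   # second stopping condition: loss has plateaued
            break
        prev_loss = loss
    return w, b, epochs


if __name__ == "__main__":
    # Noise-free toy data drawn from y = 2x + 1.
    data = [(k / 10, 2 * (k / 10) + 1) for k in range(20)]
    print(train(data))
```

The same two-level loop is what makes incremental retraining possible: training can resume from the current parameters with new batches instead of starting again from scratch.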
17. KantanMT.com – A Complete Platform
Build Improve Deploy
Augment data
Parallel corpora
Preprocessing rules
(PEX, tokeniser exceptions, etc.)
F-Measure, BLEU, TER
KantanLQR
(Error typology, AB Testing)
New Preprocessing rules
New data
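To give a feel for the automatic scores listed in the improvement cycle, the sketch below computes two simplified sentence-level measures in Python. Both are assumptions made for illustration and may differ from how KantanMT computes its scores: f_measure is a unigram F1 over whitespace tokens, and word_edit_rate is word-level edit distance divided by reference length, i.e. TER without the block-shift operation. BLEU is left out because a faithful version needs n-gram clipping and a brevity penalty.

```python
from collections import Counter


def f_measure(hypothesis, reference):
    """Unigram F1 between a whitespace-tokenised hypothesis and reference."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each matching word at most as often as it
    # appears in both sentences.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def word_edit_rate(hypothesis, reference):
    """Word-level edit distance over reference length (TER without shifts)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Standard dynamic-programming edit distance, one row at a time.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[len(ref)] / len(ref)


if __name__ == "__main__":
    mt_output = "the cat sat"
    post_edit = "the cat sat down"
    print(f_measure(mt_output, post_edit), word_edit_rate(mt_output, post_edit))
```

A higher F-measure and a lower edit rate both indicate output that needs less post-editing.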
19. KantanMT.com – A Complete Platform
Build Improve Deploy
4 hours?
20. KantanMT.com – A Complete Platform
Build Improve Deploy
API
Connectors
KantanWidgets™
As with every other KantanMT engine
21. Conclusions…
KantanMT:
A complete MT platform for both NMT and PBSMT engines
Easy access to powerful MT technology
How to train, improve and deploy KantanNeural™ engines
Seamless switch from PBSMT to NMT
Incremental retraining to improve, adapt and specialize engines
4 hours training?
23. … and future work
Better control:
Terminology
Tags
NTAs
Learn from postedits:
Exploit feedback from KantanLQR™
Exploit feedback from connectors
Models:
Add language knowledge
Hybrid MT
Convolutional Neural Networks (CNN)
…
A translation production line nowadays typically combines an MT component with human post-editing. The MT component is simply a means of obtaining a raw translation of the original text, which is then modified in the next step to meet certain translation quality standards. Even so, the choice of the right MT toolset affects the efficiency of this pipeline.