KantanNeural™ from A to Z
3/3: NMT in 4 weeks → 4 days → 4 hours
Dimitar Shterionov
What is NMT?
31/07/2017 KantanFest, Dublin, Ireland 2
What is NMT?
[Figure: encoder-decoder diagram. Source tokens x1, x2, x3 → context vector c → target tokens y1, y2, y3]
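The diagram on this slide follows the usual encoder-decoder picture of NMT. As a hedged sketch in standard notation (the symbols f and p are assumptions of this note, not taken from the slide): the encoder compresses the source tokens into a context vector c, and the decoder predicts each target token from c and the tokens generated so far.

```latex
c = f(x_1, x_2, x_3), \qquad
p(y_1, y_2, y_3 \mid x_1, x_2, x_3) = \prod_{t=1}^{3} p(y_t \mid y_1, \dots, y_{t-1}, c)
```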
How to NMT – The Recipe
Hardware + Software: GPUs, Torch, Theano, Nematus, OpenNMT
Know-how, Support
Integration, Deployment
Training data
How to NMT – KantanNeural™
Hardware + Software: GPUs, Torch, Theano, Nematus, OpenNMT
Know-how, Support
Integration, Deployment
Training data
KantanNeural™
KantanNeural™: blackboard to production
• Proof of Concept:
AWS, NVIDIA K520 GPUs
Nematus, ADAM, BPE, SCN
MT (engines) build: 4 weeks
Quality: impressive
01 Nov 2016
• ADAM: Parameter update algorithm
• Byte-pair encoding (BPE)
• Single-character n-gram (SCN)
lower → low er
tallest → tall est
almost → al most
lowest
taller
allow
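The examples above suggest why subword units help: once "low", "er", "tall", "est", "al" and "most" are known, words such as "lowest", "taller" and "allow" can be composed from them. The sketch below illustrates only that composition step, using a greedy longest-match split over a fixed subword set. It is not the BPE merge-learning procedure itself, and the function name `segment` and the set `SUBWORDS` are assumptions introduced for illustration.

```python
def segment(word, subwords):
    """Greedy longest-match split of `word` into known subword units.

    Characters that are not covered by any known unit are emitted one by one.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(word), i, -1):
            if word[i:j] in subwords or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


# Subword units taken from the slide's three segmented examples.
SUBWORDS = {"low", "er", "tall", "est", "al", "most"}

for w in ["lowest", "taller", "allow"]:
    print(w, "→", " ".join(segment(w, SUBWORDS)))
```

With these units, none of the three new words needs a vocabulary entry of its own.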
• KantanNeural™ α:
OpenNMT, ADAM, BPE
MT build time: 4 days
Quality: on a par with Nematus
KantanFleet™
01 Feb 2017
• KantanNeural™ β:
Build-your-own NMT
Available to all clients (no extra charge)
Extended KantanFleet™
15 March 2017
• Currently:
Build-your-own NMT
NVIDIA K80 GPUs
AdaptiveMT
Incremental Retraining
4 hours?
30 June 2017
KantanMT.com – A Complete Platform
Build Improve Deploy
KantanTemplates
KantanNER
KantanLibrary
KantanFleet
KantanBuildAnalytics
KantanAnalytics
KantanPEX
KantanLQR
AdaptiveMT
KantanGENTRY
KantanTotalRecall
KantanNeural™
KantanTranslate
KantanSwift
KantanAPI
KantanAutoScale
KantanOfficeMT
KantanConnectors
KantanSnippets
KantanMT.com – A Complete Platform
Build Improve Deploy
• Select a KantanFleet™ engine
• KantanFleet™ Neural (18 language pairs)
• Multiple domains
• Create new NMT engine
• Import library data
• Import your own data
• Convert an SMT profile:
… just two clicks away from NMT
KantanMT.com – A Complete Platform
Build Improve Deploy
• Select a KantanFleet™ engine
KantanMT.com – A Complete Platform
Build Improve Deploy
• Create a blank KantanNeural™ engine
KantanMT.com – A Complete Platform
Build Improve Deploy
• Convert a PBSMT engine into a KantanNeural™ engine
Artificial Neural Networks train iteratively:
While stopping condition not met:
    While training data not exhausted:
        Take a batch
        Learn from it
    Repeat
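The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not KantanNeural™'s training code: the model is a single weight fitted to y = w·x by mini-batch gradient descent, and the function name `train`, the data set `TOY_DATA` and the stopping rule (loss change below `tol`, or `max_epochs` reached) are all hypothetical choices made for this example.

```python
import random


def train(pairs, batch_size=4, max_epochs=200, tol=1e-6, lr=0.05, seed=0):
    """Fit y = w * x by mini-batch gradient descent; return (w, epochs_run)."""
    data = list(pairs)              # copy so the caller's list is not reordered
    rng = random.Random(seed)
    w, prev_loss = 0.0, float("inf")
    epochs_run = 0
    while epochs_run < max_epochs:                      # stopping condition not met
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):   # training data not exhausted
            batch = data[start:start + batch_size]      # take a batch
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad                              # learn from it
        epochs_run += 1
        loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
        if abs(prev_loss - loss) < tol:                 # loss has stopped improving
            break
        prev_loss = loss
    return w, epochs_run


TOY_DATA = [(i / 10, 3 * i / 10) for i in range(1, 11)]  # y = 3x, no noise
w, epochs = train(TOY_DATA)
print(f"w = {w:.3f} after {epochs} epochs")
```

Because each pass only nudges the weights, training can be resumed later from the same weights on new batches, which is the property that incremental retraining relies on.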
KantanMT.com – A Complete Platform
Build Improve Deploy
Augment data
Parallel corpora
Preprocessing rules (PEX, tokeniser exceptions, etc.)
F-Measure, BLEU, TER
KantanLQR (error typology, A/B testing)
New Preprocessing rules
New data
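Of the automatic scores named on this slide, F-Measure is the simplest to illustrate. The sketch below computes a word-level F-measure as the harmonic mean of unigram precision and recall between a translation and a reference. The bag-of-words matching, whitespace tokenisation and the function name `f_measure` are assumptions of this example; the slides do not say how KantanMT computes the score.

```python
from collections import Counter


def f_measure(hypothesis, reference):
    """Unigram F-measure between two whitespace-tokenised sentences."""
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    overlap = sum((hyp & ref).values())   # words matched, respecting counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(f_measure("the cat", "the cat sat"))
```

A score like this can be tracked after each round of new data or new preprocessing rules to check whether the engine is actually improving.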
4 hours?
KantanMT.com – A Complete Platform
Build Improve Deploy
• API
• Connectors
• KantanWidgets™
• Same as for every other KantanMT engine
Conclusions…
• KantanMT:
  • A complete MT platform for both NMT and PBSMT engines
  • Easy access to powerful MT technology
  • How to train, improve and deploy KantanNeural™ engines
  • Seamless switch from PBSMT to NMT
  • Incremental retraining to improve, adapt and specialize engines
4 hours training?
… and future work
• Better control:
  • Terminology
  • Tags
  • NTAs
• Learn from post-edits:
  • Exploit feedback from KantanLQR™
  • Exploit feedback from connectors
• Models:
  • Add language knowledge
  • Hybrid MT
  • Convolutional Neural Networks (CNN)
  • …
Thank you…
Laura Casanellas: laurac@kantanmt.com
Dimitar Shterionov: dimitars@kantanmt.com
KantanLabs: labs@kantanmt.com
KantanMT: info@kantanmt.com

08448380779 Call Girls In Civil Lines Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Kantanfest: Dimitar Shterionov - Part 2

  • 1. KantanNeural™ from A to Z 3/3: NMT in 4 weeks → 4 days → 4 hours Dimitar Shterionov
  • 2. What is NMT? 31/07/2017 KantanFest, Dublin, Ireland 2
  • 3. What is NMT? [Figure: diagram with inputs x1, x2, x3, an intermediate c, and outputs y1, y2, y3]
  • 4. How to NMT – The Recipe: Hardware + Software: GPUs, Torch, Theano, Nematus, OpenNMT; Know-how, Support; Integration, Deployment; Training data
  • 5. How to NMT – KantanNeural™: Hardware + Software: GPUs, Torch, Theano, Nematus, OpenNMT; Know-how, Support; Integration, Deployment; Training data; KantanNeural™
  • 6. KantanNeural™: black board to production. Proof of Concept: AWS, NVIDIA K520 GPUs; Nematus, ADAM, BPE, SCN; MT (engines) build: 4 weeks; Quality: impressive. Timeline: 01 Nov 2016. Notes: ADAM: parameter update algorithm; Byte-pair encoding (BPE); Single-character n-gram (SCN). Segmentation examples: lower → low er; tallest → tall est; almost → al most. Other words shown: lowest, taller, allow.
  • 7. KantanNeural™: black board to production. KantanNeural™ α: OpenNMT, ADAM, BPE; MT build time: 4 days; Quality: on a par with Nematus; KantanFleet™. Timeline: 01 Nov 2016, 01 Feb 2017.
  • 8. KantanNeural™: black board to production. KantanNeural™ β: Build-your-own NMT; Available to all clients (no extra charge); Extended KantanFleet™. Timeline: 01 Nov 2016, 01 Feb 2017, 15 March 2017.
  • 9. KantanNeural™: black board to production. Currently: Build-your-own NMT; NVIDIA K80 GPUs; AdaptiveMT; Incremental Retraining; 4 hours? Timeline: 01 Nov 2016, 01 Feb 2017, 15 March 2017, 30 June 2017.
  • 10. KantanMT.com – A Complete Platform (Build, Improve, Deploy): Kantan Templates, Kantan NER, Kantan Library, Kantan Fleet, Kantan BuildAnalytics, Kantan Analytics, Kantan PEX, Kantan LQR, Adaptive MT, Kantan GENTRY, Kantan TotalRecall, KantanNeural™, Kantan Translate, Kantan Swift, Kantan API, Kantan AutoScale, Kantan OfficeMT, Kantan Connectors, Kantan Snippets
  • 11. KantanMT.com – A Complete Platform: Build, Improve, Deploy
  • 12. KantanMT.com – A Complete Platform (Build, Improve, Deploy): Select a KantanFleet™ engine (KantanFleet™ Neural, 18 language pairs; multiple domains); Create new NMT engine (import library data; import your own data); Convert an SMT profile: … just two clicks away from NMT
  • 13. KantanMT.com – A Complete Platform (Build, Improve, Deploy): Select a KantanFleet™ engine
  • 14. KantanMT.com – A Complete Platform (Build, Improve, Deploy): Create a blank KantanNeural™ engine
  • 15. KantanMT.com – A Complete Platform (Build, Improve, Deploy): Convert a PBSMT engine into a KantanNeural™ engine
  • 16. KantanMT.com – A Complete Platform (Build, Improve, Deploy): Artificial Neural Networks train iteratively. While the stopping condition is not met: while the training data is not exhausted: take a batch, learn from it, repeat.
  • 17. KantanMT.com – A Complete Platform (Build, Improve, Deploy): [Diagram labels] Parallel corpora; Preprocessing rules (PEX, tokeniser exceptions, etc.); F-Measure, BLEU, TER; KantanLQR (Error typology, A/B Testing); New preprocessing rules; New data; Augment data
  • 18. KantanMT.com – A Complete Platform (Build, Improve, Deploy): [Diagram labels] Parallel corpora; Preprocessing rules (PEX, tokeniser exceptions, etc.); F-Measure, BLEU, TER; KantanLQR (Error typology, A/B Testing); New preprocessing rules; New data; Augment data
  • 19. KantanMT.com – A Complete Platform (Build, Improve, Deploy): [Diagram labels] Parallel corpora; Preprocessing rules (PEX, tokeniser exceptions, etc.); F-Measure, BLEU, TER; KantanLQR (Error typology, A/B Testing); New preprocessing rules; New data; Augment data; 4 hours?
  • 20. KantanMT.com – A Complete Platform (Build, Improve, Deploy): API; Connectors; KantanWidgets™; like every other KantanMT engine
  • 21. Conclusions… KantanMT: a complete MT platform for both NMT and PBSMT engines; easy access to powerful MT technology. How to train, improve and deploy KantanNeural™ engines: seamless switch from PBSMT to NMT; incremental retraining to improve, adapt and specialize engines.
  • 22. Conclusions… KantanMT: a complete MT platform for both NMT and PBSMT engines; easy access to powerful MT technology. How to train, improve and deploy KantanNeural™ engines: seamless switch from PBSMT to NMT; incremental retraining to improve, adapt and specialize engines. 4 hours training?
  • 23. … and future work. Better control: Terminology; Tags; NTAs. Learn from post-edits: exploit feedback from KantanLQR™; exploit feedback from connectors. Models: add language knowledge; Hybrid MT; Convolutional Neural Networks (CNN); …
  • 24. Solving Thank you… Laura Casanellas: laurac@kantanmt.com Dimitar Shterionov: dimitars@kantanmt.com KantanLabs: labs@kantanmt.com KantanMT: info@kantanmt.com
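
Slide 6 lists byte-pair encoding (BPE) as one of the subword techniques used. The following is a minimal sketch of the general BPE idea (repeatedly merging the most frequent adjacent symbol pair), not the segmentation code used in KantanNeural™. The function names, the toy word list and the number of merges are illustrative assumptions, and the sketch omits the end-of-word marker that practical implementations usually add.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: frequency} dict."""
    # Start with every word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy corpus built from the words on the slide.
    corpus = {"lower": 2, "lowest": 2, "taller": 2, "tallest": 2, "almost": 1, "allow": 1}
    merges = learn_bpe(corpus, num_merges=8)
    for word in ("lower", "tallest", "almost"):
        print(word, "->", " ".join(segment(word, merges)))
```

The exact segments depend on the corpus and on the number of merges, so this toy run is not guaranteed to reproduce the splits shown on the slide.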
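
Slide 16 describes iterative training as two nested loops: an outer loop that runs until a stopping condition is met, and an inner loop that consumes the training data batch by batch. A minimal sketch of that structure follows, using a toy one-variable linear model in place of an NMT network. The model, the learning rate, the batch size and the stopping rule (a small change in loss or a maximum number of epochs) are all assumptions made for illustration and are not taken from the slides.

```python
import random


def batches(data, batch_size):
    """Yield consecutive slices of `data` until it is exhausted."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def train(data, batch_size=4, lr=0.01, max_epochs=200, tol=1e-6, seed=0):
    """Fit y = w*x + b with mini-batch gradient descent; return (w, b, epochs)."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    prev_loss = float("inf")
    epochs = 0
    while epochs < max_epochs:                    # stopping condition not met
        rng.shuffle(data)
        for batch in batches(data, batch_size):   # training data not exhausted
            # Take a batch and learn from it: one gradient step on squared error.
            grad_w = grad_b = 0.0
            for x, y in batch:
                err = (w * x + b) - y
                grad_w += 2 * err * x / len(batch)
                grad_b += 2 * err / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
        epochs += 1
        # Second stopping condition: the loss has stopped changing.
        loss = sum((w * x + b - y) ** 2 for x, y in data) / len(data)
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return w, b, epochs


if __name__ == "__main__":
    # Hypothetical toy data generated from y = 2x + 1.
    toy = [(x, 2 * x + 1) for x in range(-5, 6)]
    w, b, epochs = train(toy)
    print(f"w={w:.3f} b={b:.3f} after {epochs} epochs")
```

In a real NMT setting the "learn from it" step would be a forward and backward pass through the network with an optimizer such as ADAM (named on slide 6), and the stopping condition would typically be tied to a validation metric.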

Editor's notes

  1. A translation production line nowadays typically combines an MT component with human post-editing. The MT component is simply a means of obtaining a raw translation of the original text, which is then modified in the next step to meet certain translation quality standards; even so, the choice of the right MT toolset affects the efficiency of this pipeline.