Learning to Translate with Joey NMT
PyData Meetup Montreal
Julia Kreutzer
Feb 25, 2021
Today
1. Neural Machine Translation 101
a. Translation as an ML problem
b. Transformer model
c. The role of data
2. Joey NMT
a. Features and purpose
b. Demo
c. Use cases
3. Q & A
Assumes basic ML knowledge and familiarity with neural networks.
What's the technology behind modern machine translation?
How can you get started?
Why open-sourcing?
Why another toolkit?
[Optional] Demo Preparation
If you want to train your own translation model during this presentation:
1. Open joey_demo.ipynb on Colab.
2. Create a copy.
3. Select GPU runtime: Runtime -> Change runtime type -> Hardware accelerator: GPU
4. Run all cells: Runtime -> Run all
5. Come back to the talk :)
We'll look at what's happening there later.
Neural Machine Translation 101
Translation as an ML Problem
Challenges
➢ Unlimited length
➢ Structural dependencies
➢ Unseen words
➢ Figurative language
Seq2Seq
➢ Modeling sentences (mostly)
➢ Connections between all words
➢ Sub-word modeling
➢ A lot of training data
Input: What is a poutine ?
Output: Qu'est-ce qu'une poutine ?
The Transformer
"Attention Is All You Need" (Vaswani et al. 2017)
[Figure: Transformer architecture, labeled "Decoder" and "Specialties". Source: Vaswani et al. 2017]
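The figure shows the architecture but not its core operation. Below is a minimal pure-Python sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, the operation that gives every position a direct "connection" to every other word. The optional causal mask is the one used in decoder self-attention, so a target position cannot look at future tokens. The function names are hypothetical, and multi-head attention, learned projections and batching (which real implementations do with tensor libraries) are deliberately left out.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries get probability 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix, causal: bool = False) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Every query position scores every key position. With causal=True,
    position i only sees positions j <= i, as in decoder self-attention.
    """
    d_k = len(k[0])
    output = []
    for i, q_i in enumerate(q):
        scores = [
            float("-inf") if causal and j > i
            else sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k)
            for j, k_j in enumerate(k)
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        output.append([sum(w * v_j[c] for w, v_j in zip(weights, v))
                       for c in range(len(v[0]))])
    return output


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(x, x, x, causal=True)[0])  # → [1.0, 0.0] (position 0 sees only itself)
```

Without the mask, each output row is a weighted average of all value vectors; with it, the first row can only copy the first value vector.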
Training vs. Inference
Conditional language modeling: predict the next token y_t:
● given source X and all previous tokens of the reference during training.
● given source X and previously predicted tokens during inference.
Training with maximum likelihood estimation (MLE); inference with greedy or beam search.
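To make the training/inference difference concrete, here is a toy sketch. `teacher_forced_nll` computes the MLE training loss by feeding the reference prefix at every step (commonly called teacher forcing), while `greedy_decode` feeds back the model's own previous predictions. The lookup-table "model" and its probabilities are made up for illustration; in a real system these distributions would come from a network such as the Transformer.

```python
import math
from typing import Callable, Dict, List

EOS = "</s>"
# Hypothetical stand-in for a trained NMT model: given the source sentence and
# a target prefix, return a probability distribution over the next token.
NextTokenModel = Callable[[List[str], List[str]], Dict[str, float]]


def teacher_forced_nll(model: NextTokenModel, source: List[str],
                       reference: List[str]) -> float:
    """Training (MLE): condition every step on the *reference* prefix y_<t."""
    targets = reference + [EOS]
    nll = 0.0
    for t, y_t in enumerate(targets):
        probs = model(source, targets[:t])
        nll -= math.log(probs.get(y_t, 1e-12))  # small floor avoids log(0)
    return nll


def greedy_decode(model: NextTokenModel, source: List[str],
                  max_len: int = 50) -> List[str]:
    """Inference: condition every step on the model's *own* previous outputs."""
    prefix: List[str] = []
    for _ in range(max_len):
        probs = model(source, prefix)
        y_t = max(probs, key=probs.get)  # pick the single most likely token
        if y_t == EOS:
            break
        prefix.append(y_t)
    return prefix


# Toy lookup-table "model" for the running example (probabilities are made up).
TABLE = {
    (): {"qu'est-ce": 0.6, "est-ce": 0.4},
    ("qu'est-ce",): {"qu'une": 0.9, EOS: 0.1},
    ("qu'est-ce", "qu'une"): {"poutine": 0.8, "poutin": 0.2},
    ("qu'est-ce", "qu'une", "poutine"): {"?": 0.95, EOS: 0.05},
}


def toy_model(source: List[str], prefix: List[str]) -> Dict[str, float]:
    return TABLE.get(tuple(prefix), {EOS: 1.0})


src = "What is a poutine ?".split()
print(greedy_decode(toy_model, src))  # → ["qu'est-ce", "qu'une", 'poutine', '?']
```

During training, a wrong prediction never changes the prefix the model sees next; at inference time it does, which is one reason the two settings can behave differently.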
Beam Search
Source: G. Neubig's course on MT and Seq2Seq
Keep the k most likely prediction sequences at each step.
➢ more expensive than greedy search
➢ a closer approximation to exact search than greedy
Implementation on mini-batches is tricky!
[Figure: beam search example with k=2]
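A minimal, unbatched sketch of the idea, scoring hypotheses by summed log-probabilities. The model interface and the toy probabilities are hypothetical, chosen so that beam search with k=2 finds a more probable sequence than greedy search (k=1). Practical implementations usually add length normalization and run the search over mini-batches; neither is shown here.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"
# Hypothetical model interface: target prefix -> next-token distribution
# (the source sentence is left implicit to keep the sketch short).
Model = Callable[[List[str]], Dict[str, float]]
Hypothesis = Tuple[List[str], float]  # (tokens, log-probability)


def beam_search(model: Model, k: int = 2, max_len: int = 20) -> List[Hypothesis]:
    """Keep the k highest-scoring prefixes at each step; k=1 is greedy search."""
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every surviving prefix by every possible next token.
        candidates = [
            (prefix + [token], score + math.log(p))
            for prefix, score in beams
            for token, p in model(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:k]:
            if tokens[-1] == EOS:
                # Simplification: finished hypotheses leave the beam, so it shrinks.
                finished.append((tokens[:-1], score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished prefixes if max_len was reached
    return sorted(finished, key=lambda c: c[1], reverse=True)


TOY = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.4, "y": 0.3, "z": 0.3},
    ("B",): {"w": 0.9, "v": 0.1},
}


def toy_model(prefix: List[str]) -> Dict[str, float]:
    return TOY.get(tuple(prefix), {EOS: 1.0})


print(beam_search(toy_model, k=1)[0][0])  # greedy → ['A', 'x'] (p = 0.24)
print(beam_search(toy_model, k=2)[0][0])  # beam   → ['B', 'w'] (p = 0.36)
```

Greedy search commits to "A" because it is locally more likely, while a beam of two keeps "B" alive long enough to discover that "B w" is the better full sequence.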
Words?
Pre-processing plays a huge role in NMT.
qu'est-ce qu'une poutine ? 4 tokens, 4 types
vs
qu ' est - ce qu ' une poutine ? 10 tokens, 8 types
➢ Sub-words instead of words: frequency-based automatic segmentation.
➢ Algorithms: BPE, unigram LM.
➢ Implementations: subword-nmt, SentencePiece.
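The merge-learning loop at the heart of BPE fits in a few lines: start from characters, repeatedly merge the most frequent adjacent symbol pair, and replay the learned merges to segment new words. This is a simplified illustration with hypothetical function names and a toy word-frequency table; it is not the API of subword-nmt or SentencePiece, which add many practical details.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
EOW = "</w>"  # end-of-word marker, so word-final "est</w>" differs from inner "est"


def merge_pair(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Pair]:
    """Repeatedly merge the most frequent adjacent symbol pair in the corpus."""
    vocab = {tuple(word) + (EOW,): freq for word, freq in word_freqs.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair seen wins
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges


def apply_bpe(word: str, merges: List[Pair]) -> List[str]:
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + (EOW,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(corpus, num_merges=3)
print(merges)                       # → [('e', 's'), ('es', 't'), ('est', '</w>')]
print(apply_bpe("lowest", merges))  # unseen word → ['l', 'o', 'w', 'est</w>']
```

The word "lowest" never occurs in the toy corpus, yet it is segmented into known pieces; this is how sub-word modeling handles unseen words. The number of merges controls the trade-off between vocabulary size and sequence length.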
The Role of Data
A "base"-sized Transformer has ~65M weights. How much data does it need?
➢ It depends!
➢ "As much as you can find" heuristic
➢ Beyond parallel data
○ unsupervised NMT
○ data augmentation
○ dictionaries
○ pre-trained embeddings
○ multilingual modeling
How similar are source and target language?
What kind of quality are you expecting?
How complex is the text?
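Where does a number like ~65M come from? The back-of-the-envelope count below assumes the base configuration reported by Vaswani et al. 2017 (d_model=512, d_ff=2048, 6 encoder and 6 decoder layers, a shared source/target vocabulary of roughly 37k BPE tokens, one embedding matrix shared by both embedding layers and the output projection). It ignores small terms such as final layer norms and the output bias, so it is a rough sketch rather than an exact reproduction.

```python
def linear(n_in: int, n_out: int) -> int:
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def transformer_params(d_model: int = 512, d_ff: int = 2048, n_layers: int = 6,
                       vocab: int = 37000, tie_embeddings: bool = True) -> int:
    """Rough parameter count of an encoder-decoder Transformer.

    Ignores small terms such as final layer norms and the output-layer bias.
    """
    attention = 4 * linear(d_model, d_model)             # Q, K, V, output projections
    ffn = linear(d_model, d_ff) + linear(d_ff, d_model)  # two-layer feed-forward
    layer_norm = 2 * d_model                             # gain + bias

    encoder_layer = attention + ffn + 2 * layer_norm
    decoder_layer = 2 * attention + ffn + 3 * layer_norm  # self- + cross-attention
    # Source embeddings, target embeddings and pre-softmax projection:
    # one shared matrix if tied, three separate matrices otherwise.
    embeddings = vocab * d_model * (1 if tie_embeddings else 3)
    return n_layers * (encoder_layer + decoder_layer) + embeddings


print(f"{transformer_params():,}")  # → 63,082,496
```

Under these assumptions the count lands at roughly 63M, in the same ballpark as the ~65M quoted above. Note that the embedding matrix alone accounts for about 19M weights, so the vocabulary size chosen during pre-processing directly affects model size.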
Evaluation
Input: What is a poutine ?
Reference: Qu'est-ce qu'une poutine ?
Outputs:
1. Est-ce qu'une poutine ?
2. Que-ce une poutine ?
3. Qu'une poutine ?
4. Qu'est-ce qu'un poutin ?
5. C'est qu'une poutine .
How should these outputs be ranked / scored?
Output                          BLEU   ChrF
1. Est-ce qu'une poutine ?      59.5   82.8
2. Que-ce une poutine ?         32.0   51.4
3. Qu'une poutine ?             39.4   58.3
4. Qu'est-ce qu'un poutin ?     19.0   74.4
5. C'est qu'une poutine .       32.0   60.8
BLEU: geometric average of token n-gram precisions, with a brevity penalty
ChrF: character n-gram F-score
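A simplified sketch of both metrics for a single sentence follows. It assumes whitespace tokenization for BLEU, add-one smoothing for n > 1 so that one missing 4-gram does not zero the score, and ChrF with character n-grams up to 6 (spaces removed) and beta=2. Standard tools such as sacrebleu make their own tokenization and smoothing choices, so this sketch is not expected to reproduce the scores in the table exactly. It does show the same qualitative pattern for output 4: BLEU punishes it hard because no token bigram matches, while ChrF still credits its many shared character n-grams.

```python
import math
from collections import Counter
from typing import Sequence


def ngrams(seq: Sequence, n: int) -> Counter:
    """Counts of all length-n windows over a token list or a string."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def sentence_bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """BLEU: geometric mean of clipped token n-gram precisions x brevity penalty."""
    h, r = hyp.split(), ref.split()
    if not h:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        h_counts, r_counts = ngrams(h, n), ngrams(r, n)
        matches = sum((h_counts & r_counts).values())  # clip by reference counts
        total = sum(h_counts.values())
        if n > 1:  # add-one smoothing (an assumption; tools differ here)
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0
        log_precision += math.log(matches / total) / max_n
    brevity_penalty = min(1.0, math.exp(1 - len(r) / len(h)))
    return 100 * brevity_penalty * math.exp(log_precision)


def chrf(hyp: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """ChrF: F-beta over averaged character n-gram precision and recall."""
    h, r = hyp.replace(" ", ""), ref.replace(" ", "")
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h_counts, r_counts = ngrams(h, n), ngrams(r, n)
        matches = sum((h_counts & r_counts).values())
        if h_counts:
            precisions.append(matches / sum(h_counts.values()))
        if r_counts:
            recalls.append(matches / sum(r_counts.values()))
    if not precisions or not recalls:
        return 0.0
    p = sum(precisions) / len(precisions)
    rc = sum(recalls) / len(recalls)
    if p + rc == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * rc / (beta**2 * p + rc)


ref = "Qu'est-ce qu'une poutine ?"
hyp = "Qu'est-ce qu'un poutin ?"
print(round(sentence_bleu(hyp, ref), 1), round(chrf(hyp, ref), 1))
```

With beta=2, ChrF weights recall more heavily than precision, which is one reason it is more forgiving of small morphological errors such as "poutin" vs. "poutine".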
Joey NMT
Joint work with Jasmijn Bastings, Mayumi Ohta and Joey NMT contributors
Problem
+ A lot of code for NMT is online.
+ Free compute through Colab.
+ Data is freely available. Is it clean?
How long would I have to study it?
Are all features documented?
How can I run it on Colab?
How do I need to prepare the data?
Does that mean it's accessible?
Solution
Joey NMT: clean, minimalist, documented.
➢ Much smaller than other toolkits
➢ Covers core features
➢ User study on usability
➢ The core API changes very little.
➢ Examples, pre-trained models, tutorials, FAQ
➢ Based on PyTorch
Does not do everything, does not grow much.
Features
You can:
● train an RNN or Transformer model
● on CPU, one or multiple GPUs
● monitor the training process
● configure hyperparameters
● store it, load it, test it
And more:
● follow training recipes
● modify the code easily
● get inspiration from other extensions
● share/load pre-trained models
It's cute, but can it compete?
Quality?
➢ Comparable to other toolkits.
Adoption?
➢ Not as popular.
Innovation?
➢ More and more research.
It might not be the best choice for
➢ exact replication of another paper
-> use their code instead
➢ non-seq2seq applications
➢ performance-critical applications (not optimized for it)
➢ loading BERT (not implemented)
Demo
Cool stuff feat. Joey NMT
Grassroots research communities
➢ Masakhane: NLP for African languages
➢ Turkic Interlingua: NLP for Turkic languages
Extensions
➢ Reinforcement learning
➢ Sign language translation
➢ Speech translation
➢ Image captioning
➢ Slack bot
More on this list.
Material
➢ Neural networks in NLP
○ Y. Goldberg: A Primer on Neural Network Models for Natural Language Processing
○ G. Neubig: CMU CS 11-747: Neural Networks for NLP
➢ Neural Machine Translation
○ P. Koehn: Neural Machine Translation (Draft Chapter of the Statistical MT book)
○ G. Neubig: Tutorial on Neural Machine Translation
○ A. Rush: The Annotated Transformer
○ J. Bastings: The Annotated Encoder-Decoder
○ M. Müller: Seven Recommendations for MT Evaluation
➢ Joey NMT
○ Joey NMT paper
○ Joey NMT tutorial
○ Masakhane notebooks and YouTube tutorial
○ Turkic Interlingua YouTube tutorial
Thank you!
jkreutzer@google.com
Twitter: @KreutzerJulia
Q & A