SlideShare ist ein Scribd-Unternehmen logo
1 von 67
Downloaden Sie, um offline zu lesen
NLP using Transformer models
How translation works
Questions to ponder
What is Deep learning for NLP
How Machine translation (NMT) works
How Google translate drastically improved after 2017
Why Transformer (gave rise to BERT, GPT, XLNet)
Deep learning for NLP
Neural Machine Translation
Major improvement from the earlier SMT (Statistical MT)
Transformer model introduced in 2017 revolutionized and became SOTA
Models
1. Neural Networks
2. Recurrent Neural Networks
3. Encoder-Decoder
4. Attention Models
5. Transformer
Models
1. Neural Networks
2. Recurrent Neural Networks
3. Encoder-Decoder
4. Attention Models
5. Transformer
Models with Applications
1. Neural Networks - Word representation (word2vec)
2. Recurrent Neural Networks - Language Generation
3. Encoder-Decoder - Translation
4. Attention Models - Translation, Image recognition
5. Transformer - Translation
Model 1
Neural Networks
Neural Network
Neural Networks act as a ‘black box’ that takes inputs and predicts an output.
Neural Network
Neural Network Training
Neural Network
Neural Network
‘Black box’ that takes inputs and predicts an output.
Trained using known (input,output) , approximates the function and maps new inputs
Learns the function mapping inputs to output by adjusting the internal parameters (weights)
Word2vec using neural networks
Word2vec using neural networks
Word2vec using neural networks
From words to sentences
Word embedding captures meaning of words
How about sentences?
Can we encode meaning of sentences
You can't cram the meaning of the entire $%!!* sentence into a !!$!$ vector
Sentences are variable length
Need fixed length representation
Model 2
RNN
RNN
RNNs are used when the inputs have some state information
Examples include time series and word sequences
Can captures the essence of the sequence in its hidden state (context)
https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9
RNN - unrolled in time
RNN
RNN
The unreasonable effectiveness of neural networks by Andrej Karpathy
RNN can learn sentence representations
RNNs can predict
RNN can learn a probability function p( word | previous-words )
RNN Training
RNN - Summary
RNN can encode sentence meaning
Can predict next word given a sequence of words.
Can be trained to learn a language model
RNN - problems
● Hard to parallelize efficiently
● Back propagation through variable length sequence
● Transmitting information through one bottleneck (final hidden state)
● longer sentences lose context
Example I was born in France..I speak fluent French
(There are improvements to RNN structure that retain context)
Translation - First Attempt
Now we can have two RNNs jointly trained.
The first RNN creates context
The second RNN uses the final context of the first RNN.
Training using Parallel corpora
Model 3
Encoder Decoder
Encoder Decoder
http://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
Encoder Decoder
Encoder Decoder
models the conditional probability p(y/x) of
translating a source X into target Y
Encoder-Decoder in Translation
Encoder - Decoder summary
2014 -Google replaced their Statistical model with NMT
Due to its flexibility it is the goto framework for NLG with different models taking
roles of encoder and decoder
The decoder can not only be conditioned on a sequence but on arbitrary
representation enabling lot of use cases (like generating caption from image)
Translation using Encoder-Decoder
Issues
● Meaning is crammed
● Long term dependencies
● Word alignment
● Word interdependencies
Translation issues (long term dependencies)
The model cannot remember enough
Only final Encoder state is used
Each encoder state has valuable information
difficult to retain >30 words.
"Born in France..went to EPFL Switzerland..I speak fluent ..."
Translation issues ( Word alignment )
Translation issues (word alignment)
The European Economic Area → Le zone économique européenne
The European Economic Area
Le 1
Zone 1
Economique 1
Europeene 1
Translation issues ( Interdepencies )
Translation issues
Interdependencies
Crammed meaning (due to context)
Word alignment
Words may or may not compose ( green apple, hot dog)
Word ordering is important ( I hit him on the eye yesterday)
How to retain more meaning (Context)
European 1 Economic 2 Area 3 → zone économique européenne
Decoding is done using only final context.
The final context doesn’t capture entire information
Can we use intermediate contexts ?
Model 4
Attention
Need for Attention
Attention removes bottleneck of Encoder-Decoder model
● Pay selective attention to relevant parts of input
● Utilize all intermediate states while decoding
(Don’t encode the whole input into one vector losing information)
● Do Alignment while translating
Pay selective attention
Pay selective attention
First introduced in Image Recognition
Without attention
Decode using only final state
Attention
Attention captures alignment
Align while translating
Attention summary
Pay selective attention to words we need to translate
Compute weighted sum of all hidden states (called attention weight)
Use attention weights while decoding
Attention also take care of alignment
Model 5
Transformers
Transformers
Can we get rid of RNN completely ?
RNN are too sequential. Parallelization is not possible since these intermediate
contexts are generated sequentially
Transformer
Transformer
Transformers
There are three kinds of attention
1. within encoder
2. encoder decoder
3. within decoder
Transformers (Self attention)
A new representation Z
Transformers (Self attention)
Calculate for each word , its
dependencies on others
‘it’ = 0.7 * street + .2 * animal + …..
Transformers (Self attention)
Remember that Encoder creates input representation
Transformers create a rich representation that captures interdependencies
(This rich representation captures relationship between words compared to
simple word embeddings)
This rich representation is called Self-attention
Transformers (Self attention)
Self-attention mechanism directly models relationships between all words in a
sentence, regardless of their respective position
Self attention allows connections between words within a sentence.
Eg. I arrived at the bank after crossing the river
Transformers
Use self attention instead of RNNs, CNN
Computation can be parallelized
Ability to learn long term dependencies
Transformers
Transformer model
Transformer
Two types of attention
1. Source Target Attention
2. Self Attention
Thank you for you attention
Attention is all you need
Summary
NN can encode words
RNN can encode sentences
Long sentences need changes to RNN architecture
Two RNNs can act as encoder and decoder (of any representation)
Encoding everything into single context loses information
Selectively pay attention to the inputs we need.
Get rid of RNN and use only attention mechanism. Make it parallelizable
Richer representation of inputs using Self-attention.
Use Encoder-Decoder attention as usual for translation

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
Attention in Deep Learning
Attention in Deep LearningAttention in Deep Learning
Attention in Deep Learning
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
Notes on attention mechanism
Notes on attention mechanismNotes on attention mechanism
Notes on attention mechanism
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
Bert
BertBert
Bert
 
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Transformers in 2021
Transformers in 2021Transformers in 2021
Transformers in 2021
 
Attention Mechanism in Language Understanding and its Applications
Attention Mechanism in Language Understanding and its ApplicationsAttention Mechanism in Language Understanding and its Applications
Attention Mechanism in Language Understanding and its Applications
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 

Ähnlich wie NLP using transformers

Ähnlich wie NLP using transformers (20)

Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application Trends
 
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
 
Deep network notes.pdf
Deep network notes.pdfDeep network notes.pdf
Deep network notes.pdf
 
Deep Learning for Machine Translation - A dramatic turn of paradigm
Deep Learning for Machine Translation - A dramatic turn of paradigmDeep Learning for Machine Translation - A dramatic turn of paradigm
Deep Learning for Machine Translation - A dramatic turn of paradigm
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
 
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
 
Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...
Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...
Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...
 
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISHA NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Introduction to Transformers
Introduction to TransformersIntroduction to Transformers
Introduction to Transformers
 
Transformer Zoo
Transformer ZooTransformer Zoo
Transformer Zoo
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
 
BERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil KumarBERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil Kumar
 
Sepformer&DPTNet.pdf
Sepformer&DPTNet.pdfSepformer&DPTNet.pdf
Sepformer&DPTNet.pdf
 
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
 
Transformers and BERT with SageMaker
Transformers and BERT with SageMakerTransformers and BERT with SageMaker
Transformers and BERT with SageMaker
 
PL Lecture 01 - preliminaries
PL Lecture 01 - preliminariesPL Lecture 01 - preliminaries
PL Lecture 01 - preliminaries
 
Turkish language modeling using BERT
Turkish language modeling using BERTTurkish language modeling using BERT
Turkish language modeling using BERT
 

Mehr von Arvind Devaraj

Yourstory Android Workshop
Yourstory Android WorkshopYourstory Android Workshop
Yourstory Android Workshop
Arvind Devaraj
 
Android High performance in GPU using opengles and renderscript
Android High performance in GPU using opengles and renderscriptAndroid High performance in GPU using opengles and renderscript
Android High performance in GPU using opengles and renderscript
Arvind Devaraj
 
Sorting (introduction)
 Sorting (introduction) Sorting (introduction)
Sorting (introduction)
Arvind Devaraj
 
Data structures (introduction)
 Data structures (introduction) Data structures (introduction)
Data structures (introduction)
Arvind Devaraj
 
Signal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier TransformsSignal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier Transforms
Arvind Devaraj
 

Mehr von Arvind Devaraj (20)

Nodejs presentation
Nodejs presentationNodejs presentation
Nodejs presentation
 
Career hunt pitch
Career hunt pitchCareer hunt pitch
Career hunt pitch
 
Career options for CS and IT students
Career options for CS and IT studentsCareer options for CS and IT students
Career options for CS and IT students
 
Careerhunt ebook
Careerhunt ebookCareerhunt ebook
Careerhunt ebook
 
Static Analysis of Computer programs
Static Analysis of Computer programs Static Analysis of Computer programs
Static Analysis of Computer programs
 
Hyperbook
HyperbookHyperbook
Hyperbook
 
Yourstory Android Workshop
Yourstory Android WorkshopYourstory Android Workshop
Yourstory Android Workshop
 
Android High performance in GPU using opengles and renderscript
Android High performance in GPU using opengles and renderscriptAndroid High performance in GPU using opengles and renderscript
Android High performance in GPU using opengles and renderscript
 
OpenGLES Android Graphics
OpenGLES Android GraphicsOpenGLES Android Graphics
OpenGLES Android Graphics
 
Broadcast Receiver
Broadcast ReceiverBroadcast Receiver
Broadcast Receiver
 
AIDL - Android Interface Definition Language
AIDL  - Android Interface Definition LanguageAIDL  - Android Interface Definition Language
AIDL - Android Interface Definition Language
 
NDK Programming in Android
NDK Programming in AndroidNDK Programming in Android
NDK Programming in Android
 
Google Cloud Messaging
Google Cloud MessagingGoogle Cloud Messaging
Google Cloud Messaging
 
OpenGLES - Graphics Programming in Android
OpenGLES - Graphics Programming in Android OpenGLES - Graphics Programming in Android
OpenGLES - Graphics Programming in Android
 
Operating system
Operating systemOperating system
Operating system
 
Sorting (introduction)
 Sorting (introduction) Sorting (introduction)
Sorting (introduction)
 
Data structures (introduction)
 Data structures (introduction) Data structures (introduction)
Data structures (introduction)
 
Graphics programming in open gl
Graphics programming in open glGraphics programming in open gl
Graphics programming in open gl
 
Signal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier TransformsSignal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier Transforms
 
Thesis: Slicing of Java Programs using the Soot Framework (2006)
Thesis:  Slicing of Java Programs using the Soot Framework (2006) Thesis:  Slicing of Java Programs using the Soot Framework (2006)
Thesis: Slicing of Java Programs using the Soot Framework (2006)
 

Kürzlich hochgeladen

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 

Kürzlich hochgeladen (20)

Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 

NLP using transformers