27 Jan 2016
Sequence-to-sequence models
Seq2seq (Sutskever et al., 2014)
Source: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns
Encoder RNN
Decoder RNN
Seq2seq overview and applications
• Encoder-decoder
• Two RNNs (typically LSTMs or GRUs)
• Can be deterministic or variational
• Applications:
• Machine translation
• Question answering
• Dialogue models (conversational agents)
• Summarization
• Etc.
LSTM cell
Seq2Seq
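• Source sequence $x = (x_1, x_2, \ldots, x_{|x|})$, represented as word embedding vectors
• Target sequence $y = (y_1, y_2, \ldots, y_{|y|})$
• At the end of the encoding process, we have the final hidden and cell states of the encoder, $h^{(src)}_{|x|}$ and $c^{(src)}_{|x|}$
• Hidden state initialization:
• Set the initial states of the decoder to the final states of the encoder: $h^{(tar)}_0 = h^{(src)}_{|x|}$, $c^{(tar)}_0 = c^{(src)}_{|x|}$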
Seq2seq (cont.)
• At each step j of the decoder, compute the hidden state $h_j = \mathrm{LSTM}_\theta(h_{j-1}, y_{j-1})$
• $y_{j-1}$ – the ground-truth previous word during training (“teacher forcing”), and the previously predicted word at inference time
• θ – parameters (weights) of the network
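The difference between teacher forcing and inference-time decoding can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the model from the slides: a plain tanh recurrent cell stands in for the LSTM, the encoder and decoder share weights for brevity, and the sizes, the `BOS` token id, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HID = 6, 4, 5   # toy sizes (assumptions)
BOS = 0                     # hypothetical start-of-sequence token id

E = rng.normal(size=(VOCAB, EMB))        # word embedding table
W_x = rng.normal(size=(HID, EMB)) * 0.5  # input-to-hidden weights
W_h = rng.normal(size=(HID, HID)) * 0.5  # hidden-to-hidden weights
W_out = rng.normal(size=(VOCAB, HID))    # output projection to the vocabulary


def rnn_step(h_prev, token):
    """One recurrent step; a tanh cell is used here in place of an LSTM."""
    return np.tanh(W_x @ E[token] + W_h @ h_prev)


def encode(source):
    """Run the encoder over the source tokens and return its final state."""
    h = np.zeros(HID)
    for token in source:
        h = rnn_step(h, token)
    return h


def decode(h0, steps, targets=None):
    """Decode `steps` tokens, starting from the encoder's final state h0.

    With `targets` given, the ground-truth word is fed back (teacher forcing);
    without it, the decoder's own previous prediction is fed back (inference).
    """
    h, prev, preds = h0, BOS, []
    for j in range(steps):
        h = rnn_step(h, prev)
        pred = int(np.argmax(W_out @ h))  # greedy choice for illustration
        preds.append(pred)
        prev = targets[j] if targets is not None else pred
    return preds


h0 = encode([1, 2, 3])
forced = decode(h0, steps=4, targets=[2, 3, 4, 5])  # training-style decoding
free = decode(h0, steps=4)                          # inference-style decoding
```

Both modes produce the same first prediction, because the first decoder input is the start token in either case; they can diverge from the second step onward.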
Seq2seq (cont.)
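• Predicted word at time step j is given by a softmax layer: $p(y_j) = \mathrm{softmax}(W_{out} h_j)$
• $W_{out}$ is a weight matrix
• Softmax function: $\mathrm{softmax}(y_j)_k = \exp(y_{jk}) / \sum_{k'=1}^{|V|} \exp(y_{jk'})$
• $y_{jk}$ is the value of the kth dimension of the output vector at time step j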
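As a small numerical illustration of the softmax layer, the following sketch turns an output vector into a probability distribution and picks the most probable index. The three-dimensional `logits` vector is made up for the example; a real output vector has |V| dimensions.

```python
import numpy as np


def softmax(logits):
    """Exponentiate and normalize; subtracting the max avoids overflow."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


logits = np.array([2.0, 1.0, 0.1])  # hypothetical output vector
probs = softmax(logits)             # non-negative values that sum to 1
predicted = int(np.argmax(probs))   # index of the most probable word
```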
Softmax example
Source: (Bahuleyan, 2018)
Seq2seq model
Source: (Bahuleyan, 2018)
Selecting the word at each time step of the decoder
• Greedy search: select the word with the highest $p(y_j)$ given by the softmax layer
• Beam search: keep the k partial sequences with the highest combined probability at each time step
• k – beam width (typically 5-10)
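The two strategies can be compared on a toy model. The sketch below is a minimal beam search, assuming a hypothetical `step_fn` that maps a prefix to next-word probabilities; the lookup table and its tokens are invented so that the greedy choice is not the best overall sequence. Greedy search corresponds to k = 1.

```python
import math


def beam_search(step_fn, start, k, max_len, eos):
    """Keep the k best partial sequences by summed log-probability."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:               # finished sequences carry over
                candidates.append((seq, score))
                continue
            for tok, p in step_fn(seq).items():
                if p > 0:
                    candidates.append((seq + [tok], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return beams[0]                          # (best sequence, log-probability)


# Hypothetical next-word distributions, keyed by the prefix so far.
TABLE = {
    ("<s>",): {"a": 0.6, "b": 0.4},
    ("<s>", "a"): {"x": 0.5, "y": 0.5},
    ("<s>", "b"): {"z": 0.9, "x": 0.1},
}


def toy_step(prefix):
    """Return the table entry, or end the sequence if the prefix is unknown."""
    return TABLE.get(tuple(prefix), {"</s>": 1.0})


greedy_seq, greedy_lp = beam_search(toy_step, "<s>", k=1, max_len=5, eos="</s>")
beam_seq, beam_lp = beam_search(toy_step, "<s>", k=2, max_len=5, eos="</s>")
```

With k = 1 the search commits to "a" (probability 0.6) and ends with a sequence of probability 0.3, while k = 2 keeps "b" alive and finds the sequence through "z" with probability 0.36.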
Beam search
• Multiple possible replies can be generated in response to “Who does John like?”
Image source: https://www.analyticsvidhya.com/blog/2018/03/essentials-of-deep-learning-sequence-to-sequence-modelling-with-attention-part-i/
Beam search (cont.)
Image source: https://www.analyticsvidhya.com/blog/2018/03/essentials-of-deep-learning-sequence-to-sequence-modelling-with-attention-part-i/
• Choose the proposed path with the maximum combined probability
Seq2seq resources
• https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
Attention mechanism in RNN encoder-decoder networks – Intuitions
• Dynamically align the target sequence with the source sequence in the decoder
• Pay a different level of attention to each word in the input sequence at each time step in the decoder
• At each time step, the decoder is provided access to all encoded source tokens
• The decoder gives higher weights to certain source tokens and lower weights to others
Attention mechanism – Formal definition
• Compute a probability distribution over the source positions at each decoding time step j:
$\alpha_{ji} = \exp(\tilde{\alpha}_{ji}) / \sum_{i'=1}^{|x|} \exp(\tilde{\alpha}_{ji'})$
where $\alpha_{ji}$ is the weight given to source output i and $\tilde{\alpha}_{ji}$ is a pre-normalized score
Attention mechanism – Formal definition (cont.)
• Two methods to compute $\tilde{\alpha}_{ji}$:
• Multiplicative (Luong et al., 2015)
• Additive (Bahdanau et al., 2014)
Attention mechanism – Formal definition (cont.)
• Take the sum of the source outputs weighted by $\alpha_{ji}$ to get the context vector $c_j$
• Compute attention vector
• Finally, feed attention vector to the softmax layer
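The attention computation (scores, normalized weights, context vector) can be sketched with NumPy as follows. The function names, the weight matrices `W`, `W_t`, `W_s` and the vector `v` are illustrative assumptions; the two score functions follow the general multiplicative (Luong et al., 2015) and additive (Bahdanau et al., 2014) forms rather than any specific implementation.

```python
import numpy as np


def softmax(scores):
    """Normalize pre-normalized scores into attention weights."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


def multiplicative_scores(h_tar, H_src, W):
    """Multiplicative form: score_i = h_tar^T W h_src_i for every source i."""
    return H_src @ (W.T @ h_tar)


def additive_scores(h_tar, H_src, W_t, W_s, v):
    """Additive form: score_i = v^T tanh(W_t h_tar + W_s h_src_i)."""
    return np.tanh(W_t @ h_tar + H_src @ W_s.T) @ v


def attend(scores, H_src):
    """Return attention weights and the weighted sum of source outputs."""
    alpha = softmax(scores)
    context = alpha @ H_src
    return alpha, context


# Three source outputs of dimension 2 and one decoder state (toy values).
H_src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
h_tar = np.array([1.0, 0.0])

scores = multiplicative_scores(h_tar, H_src, np.eye(2))
alpha, context = attend(scores, H_src)
```

When all scores are equal, the weights are uniform and the context vector reduces to the mean of the source outputs.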
Seq2seq model with attention
Figure source: (Bahuleyan, 2018)
Visualizing Attention in Machine Translation (1)
Source: https://aws.amazon.com/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/
Visualizing Attention in Machine Translation (2)
Source: https://aws.amazon.com/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/
Variational Attention for Sequence-to-Sequence Models
Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, Pascal Poupart
In Proc. COLING 2018
Deterministic Attention in Variational Encoder-Decoder (VED)
• The decoder LSTM has direct access to source via cj
• This may cause the decoder to ignore z – bypassing phenomenon (Bahuleyan et al., 2018)
Figure source: (Bahuleyan, 2018)
Variational Attention
• The context vector cj is modelled as a Gaussian random variable
• ELBO for the standard VAE:
• ELBO for VAE with variational attention:
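For orientation, the two objectives can be written in the usual evidence lower bound form. This is a sketch in assumed notation (source x, target y, sentence latent code z, context vector c_j), not a transcription of the slide's equations; in particular, conditioning the posterior on x alone is an assumption.

```latex
% Standard variational encoder-decoder: a single latent variable z
\mathcal{L}(x, y) =
  \mathbb{E}_{q(z \mid x)}\big[\log p(y \mid z)\big]
  - \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big)

% Variational attention: z and the context vector c_j are both latent
\mathcal{L}_j(x, y) =
  \mathbb{E}_{q(z, c_j \mid x)}\big[\log p(y \mid z, c_j)\big]
  - \mathrm{KL}\big(q(z, c_j \mid x) \,\|\, p(z, c_j)\big)
```

In both cases the first term rewards reconstruction of the target and the KL term keeps the posterior close to the prior.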
Variational Attention (continued)
• Given x, we can assume conditional independence between z and cj
• Hence, the posterior factorizes as $q(z, c_j \mid x) = q(z \mid x)\, q(c_j \mid x)$
• Assume separate priors for z and cj
• Sampling is done separately and the KL loss can be computed independently for z and cj
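Separate sampling and separate KL terms might look like the sketch below, assuming diagonal Gaussian posteriors parameterized by a mean and a log-variance, and a standard normal prior for both variables. The helper names and the toy posterior parameters are hypothetical.

```python
import numpy as np


def sample_gaussian(mu, log_var, rng):
    """Reparameterized sample: mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)


rng = np.random.default_rng(0)

# Toy posterior parameters for the sentence code z and one context vector c_j.
mu_z, log_var_z = np.ones(3), np.zeros(3)
mu_c, log_var_c = np.zeros(4), np.zeros(4)

z = sample_gaussian(mu_z, log_var_z, rng)      # sampled independently of c_j
c_j = sample_gaussian(mu_c, log_var_c, rng)
kl_z = kl_to_standard_normal(mu_z, log_var_z)  # each KL computed on its own
kl_c = kl_to_standard_normal(mu_c, log_var_c)
```

Because the posterior factorizes, the two KL values simply add up in the training objective.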
Seq2Seq VED with
Variational Attention
Figure source: (Bahuleyan, 2018)
Seq2Seq VED with Variational Attention
• Loss function:
• 𝛌KL – coefficient for both KL terms
• 𝛄𝑎 – coefficient for the context vector’s KL term (kept constant)
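One way to read the two coefficients is sketched below: the reconstruction loss plus a weighted sum of the KL terms, where the first coefficient scales both KL terms and the second additionally scales the context vector's KL term. The exact form is an assumption based on the bullet descriptions, and the function and argument names are hypothetical.

```python
def ved_loss(rec_loss, kl_z, kl_c_steps, lambda_kl, gamma_a):
    """Assumed form: rec + lambda_kl * (KL_z + gamma_a * sum_j KL_{c_j}).

    rec_loss   -- reconstruction (cross-entropy) loss for the target sequence
    kl_z       -- KL term for the sentence latent code z
    kl_c_steps -- KL terms for the context vectors, one per decoder time step
    lambda_kl  -- coefficient applied to both KL terms
    gamma_a    -- extra coefficient on the context vector's KL term
    """
    return rec_loss + lambda_kl * (kl_z + gamma_a * sum(kl_c_steps))


loss = ved_loss(rec_loss=10.0, kl_z=2.0, kl_c_steps=[0.5, 0.5],
                lambda_kl=0.1, gamma_a=0.1)
```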
Seq2Seq VED with Variational Attention – Prior
• Sentence latent code z prior: p(z)=N(0,I) (same as in the VAE)
• Context vector cj prior:
• Option 1: p(cj) = N(0,I)
• Option 2:
where is the mean of the source hidden states
and
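For the second prior option, a sketch of the corresponding KL term is given below. It assumes that the prior is a Gaussian centred at the mean of the source hidden states with identity covariance, and that the posterior is a diagonal Gaussian; both the assumption about the covariance and the function names are illustrative.

```python
import numpy as np


def mean_source_state(source_states):
    """Mean of the source hidden states, one row per source position."""
    return source_states.mean(axis=0)


def kl_to_shifted_unit_gaussian(mu, log_var, prior_mean):
    """KL( N(mu, diag(exp(log_var))) || N(prior_mean, I) )."""
    return 0.5 * np.sum(
        np.exp(log_var) + (mu - prior_mean) ** 2 - 1.0 - log_var
    )


source_states = np.array([[0.0, 2.0], [2.0, 4.0]])  # toy encoder outputs
h_bar = mean_source_state(source_states)            # prior mean

kl_same = kl_to_shifted_unit_gaussian(h_bar, np.zeros(2), h_bar)
kl_off = kl_to_shifted_unit_gaussian(h_bar + 1.0, np.zeros(2), h_bar)
```

With the first option, N(0, I), the same function applies with `prior_mean` set to a zero vector.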
Seq2Seq VED with Variational Attention – Posterior
• Both posterior distributions q(z|x) and q(cj|x) are parameterized by the encoder LSTM
• For the sentence latent space (same as VAE):
• For the context vector cj at time step j:
Where , and is computed using feed-forward neural
network
Evaluation
• Tasks and datasets
• Question generation (SQuAD dataset) ~100K QA pairs
• Dialogue (Cornell Movie dialogs corpus) >200K conversational exchanges
• Evaluation measures:
• BLEU scores
• Entropy
• Distinct
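The two diversity measures can be sketched as follows. Entropy is computed over the unigram distribution P(w) = count(w) / total number of tokens; for Distinct, the usual definition (number of distinct n-grams divided by the total number of n-grams) is assumed here, since the slides do not spell it out. Whitespace tokenization and the natural logarithm are further assumptions.

```python
import math
from collections import Counter


def unigram_entropy(sentences):
    """Entropy of the unigram distribution over all generated tokens."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def distinct_n(sentences, n=1):
    """Ratio of distinct n-grams to total n-grams across the sentences."""
    ngrams = [
        tuple(toks[i:i + n])
        for s in sentences
        for toks in [s.split()]
        for i in range(len(toks) - n + 1)
    ]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


samples = ["who is john", "who is mary", "what is that"]  # toy model outputs
entropy = unigram_entropy(samples)
distinct_1 = distinct_n(samples, n=1)
distinct_2 = distinct_n(samples, n=2)
```

Higher values of either measure indicate more varied output across the generated samples.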
Results on the question generation task
Source: Bahuleyan et al (2018) https://arxiv.org/abs/1712.08207
Results on the conversational (dialogue) system experiment
Examples from the question generation task
Source: Bahuleyan et al (2018) https://arxiv.org/abs/1712.08207

Editor's Notes

  1. Problem with “teacher forcing”: exposure bias. From “Adversarial generation of natural language” (Rajeswar et al., 2017), https://arxiv.org/pdf/1705.10929.pdf: “However this one-step ahead prediction during training makes the model prone to exposure bias (Ranzato et al., 2015; Bengio et al., 2015). Exposure bias occurs when a model is only trained conditioned on ground-truth contexts and is not exposed to its own errors (Wiseman and Rush, 2016). An important consequence to exposure bias is that generated sequences can degenerate as small errors accumulate.” Alternative to “teacher forcing”: scheduled sampling.
  2. The output vector at each time step has |V| dimensions
  3. Entropy: generate k samples for a given input and compute the entropy of the unigram probability distribution, P(w) = count(w) / total number of tokens. A higher value of entropy corresponds to more randomness in the system. Distinct: the higher the distinct score, the more diverse the output sentences will be.
  4. MAP: maximum a posteriori inference