SlideShare a Scribd company logo
1 of 28
Lecture 11.
Convolution Neural Networks for NLP
CS224n Natural Language Processing with Deep Learning
UOS STAT NLP Study
Changdae Oh
2021. 01. 16
1
Contents
1. From RNN to CNN
2. Introduction to CNN
3. Simple CNN for Sentence Classification
4. Toolkit & ideas for NLP task
5. Deep CNN for Sentence Classification
6. Quasi-recurrent Neural Networks
2
Contents
1. From RNN to CNN
2. Introduction to CNN
3. Simple CNN for Sentence Classification
4. Toolkit & ideas for NLP task
5. Deep CNN for Sentence Classification
6. Quasi-recurrent Neural Networks
3
From RNN to CNN
4
Recurrent Neural Networks
• Cannot capture phrases without prefix context
• Often capture too much of last words in final vector
• What if we compute vectors for every possible
word subsequence of a certain length?
• And calculate a representation for it
• Another solution to bottleneck problem !
1) Can get N-gram features
2) Capture patterns locally
3) Fastet than RNN
RNN consider the final step’s hidden state
as a representation about input sequence !
Bottleneck,,,
Convolution Neural Networks
For NLP
Contents
1. From RNN to CNN
2. Introduction to CNN
3. Simple CNN for Sentence Classification
4. Toolkit & ideas for NLP task
5. Deep CNN for Sentence Classification
6. Quasi-recurrent Neural Networks
5
Introduction to CNN
6
• Sliding window function applied to a matrix
• The sliding window is called a filter (also kernel)
• Use n*n filter, multiply its values element-wise
with the original matrix, then sum them up
 Channel
 Filter
 Kernel
 Stride
What is a convolution ?
For image
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolutio
n
For text
< Various terms used on CNN >
Convolution is classically used
to extract features from images
 Padding
 Pooling
 Feature Map
 Activation Map
Introduction to CNN
7
• Adjust
the size of output sequence
• Preserve dimensions of data
flowing across the layer
1D convolution for text
(zero) padding Multiple filters pooling
• Increase output channels
• Capture various features
• The filters to be specialized
for different domain
• Summarize the output
of convolution
• Capture sparse signal
• Subsampling(down-sampling)
• Give some invariance
to fluctuation of inputs
Introduction to CNN
8
Other notions
increase stride
local max pooling
k-max pooling dilated convolution
Compress data
Compress representation
Focus on
more local characteristics
Compress representation
Keep more information
than 1-max pooling
Compress data
Use a relatively small number of
parameters to have a larger
receptive field
Introduction to CNN
9
Tools to see a wider range at once
1)Use bigger filters
2)Dilated convolution
3)Make CNNs deeper
•https://www.slideshare.net/modulabs/2-cnn-rnn
https://zzsza.github.io/data/2018/02/23/i
ntroduction-convolution/
https://zzsza.github.io/data/2018/
05/25/cs231n-cnn-architectures/
Contents
1. From RNN to CNN
2. Introduction to CNN
3. Simple CNN for Sentence Classification
4. Toolkit & ideas for NLP task
5. Deep CNN for Sentence Classification
6. Quasi-recurrent Neural Networks
10
Simple CNN for Sentence Classification
11
• Goal : Sentence Classification
• A simple use of one convolutional layer and pooling
https://arxiv.org/pdf/1408.5882.pdf
Yoon Kim (2014)
Notation
Note,
filter is a vector (all inputs and filters are flatten)
Feature extraction
Simple CNN for Sentence Classification
12
Max-over-time pooling
Take only one maximum value as the feature from
each feature maps
Use multiple filters(100) with various widths
• Different window sizes h(3, 4, 5) -> look at various n-grams
• Can represent lots of domain features
Multi-channel input idea
• Having two channels of word vectors
-- one that is kept static throughout training and one that is fine-tuned via backpropagation
• Both channel sets are added to feature map before max-pooling
Simple CNN for Sentence Classification
13
Regularization
Dropout
Max-norm regularization
https://d2l.ai/
• Constrain l2 norms of weights vectors
of each class to fixed threshold value s
• prevents overfitting
• Masking vector r of Bernoulli r.v.
with prob. p of being 1
• Delete feature during training
-> prevents co-adaptation / overfitting
• At testing, scale final vector by keep probability p
Simple CNN for Sentence Classification
14
Model Architecture
https://arxiv.org/pdf/1510.03820.pdf,
Simple CNN for Sentence Classification
15
Conclusion
the model is very simple,
but it’s quite powerful !
Contents
1. From RNN to CNN
2. Introduction to CNN
3. Simple CNN for Sentence Classification
4. Toolkit & ideas for NLP task
5. Deep CNN for Sentence Classification
6. Quasi-recurrent Neural Networks
16
Toolkit & ideas for NLP task
17
Bag of Vectors
The average of word embeddings in sentence can be the vector of that sentence
Text classification can be performed by just average of word vectors.
Good baseline method !
Model comparison
Window Model
CNNs
RNNs
Good for single word classification for problems that do not need wide context
(POS, NER)
Advanced Model
 Good for representing sentence meaning -> sentence classification
 Easy to parallelize on GPUs.
 Efficient and versatile
 Cognitively plausible, good for sequence tagging / classification
 Great for language models
 Can be amazing with attention mechanisms
 Much slower than CNNs
Toolkit & ideas for NLP task
18
Gated units used vertically
1 x 1 convolutions
• The gating in LSTMs / GRUs is a general idea
• Can also gate vertically (in CNNs, deeps FNN)
Gating = Skipping
• Add shortcut connection
• Can stack many layers without gradient vanishing
• Convolution with kernels _ size : 1
• Add additional neural network layers
with very few additional parameters
• Can be used to map
from many channels to fewer channels
(preserves spatial dimensions, reduces depth) http://d2l.ai/chapter_convolutional-neural-networks/channels.html
19
• It is widely used in CNNs and Deep FNNs
• In CNNs, transform the convolution output of a batch
by scaling the activations to have zero mean and unit variance (Z-transform)
Toolkit & ideas for NLP task
Batch Normalization
advantage disadvantage
 Learn faster
 Less sensitive to initialization
 Prevent overfitting
 Increase complexity
 Computing resource
20
NEXT CHAPTER
Toolkit & ideas for NLP task
Encoder(CNN)-Decoder(RNN) networks Character-level representations
• Shrink the input data continuously
by stacking up convolutional layers
• Take the final pulled vector as a sentence representation
Contents
1. From RNN to CNN
2. Introduction to CNN
3. Simple CNN for Sentence Classification
4. Toolkit & ideas for NLP task
5. Deep CNN for Sentence Classification
6. Quasi-recurrent Neural Networks
21
Deep CNN for Sentence Classification
22
• Starting point : sequence models have been very dominant in NLP
but all the models are basically not very deep
Conneau, Schwenk, Lecun, Barrault. EACL 2017
Note
 Build a vision-like system for NLP
 Works from the character-level
 Result is constant size, since text is truncated or padded
 Local max pooling
https://arxiv.org/abs/1606.01781
VeryDeep-CNN
Deep CNN for Sentence Classification
23
Classification tasks
 Two conv. Layers, each one followed by a temporal BN
and an ReLU activation
 Padding for preservation of temporal resolution
 Use small size filters (3) so that networks can be deep
and get more expressive in a few parameters
Convolutional block
 Temporal resolution of the output
is first down-sampled to a fixed dimension using k-max pooling
(extracts the k most important features indep. position)
 the 512 x k resulting features are transformed into a single vector
(Flatten) which is the input to a three FC layer with ReLU
 The number of output neurons in there depends on clf task
Deep CNN for Sentence Classification
24
Conclusion
 deeper networks are better
 However, if it is too deep, its performance will be degraded
 Shrinkage method – “MaxPooling“ is easier
Contents
1. From RNN to CNN
2. Introduction to CNN
3. Simple CNN for Sentence Classification
4. Toolkit & ideas for NLP task
5. Deep CNN for Sentence Classification
6. Quasi-recurrent Neural Networks
25
Quasi-recurrent Neural Networks
26
QRNNs : have the advantages of both CNN and RNN
- Parallel computation across both timestep & minibatch dimensions
- Output depend on the overall order of elements in the sequence
(*) Denotes a masked convolution
along the timestep dimension
https://arxiv.org/abs/1611.01576
LSTM notation
QRNN notation
Parallelism across timesteps
function controlled by gates
that can mix states across timesteps,
but which acts independently
on each channel of the state vector
Parallelism across channels
“ Dynamic average pooling ”
Reference
27
Zhang, Xiang, Junbo Zhao, and Yann LeCun. "Character-level convolutional networks for text
classification." Advances in neural information processing systems 28 (2015): 649-657.
Conneau, Alexis, et al. "Very deep convolutional networks for text classification." arXiv preprint arXiv:1606.01781 (2016).
Bradbury, J., et al. "Quasi-recurrent neural networks. arXiv 2016." arXiv preprint arXiv:1611.01576.
Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882 (2014).
Zhang, Ye, and Byron Wallace. "A sensitivity analysis of (and practitioners' guide to) convolutional neural
networks for sentence classification." arXiv preprint arXiv:1510.03820 (2015).
https://zzsza.github.io/data/2018/05/25/cs231n-cnn-architectures/
https://www.slideshare.net/modulabs/2-cnn-rnn
http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
https://happyzipsa.tistory.com/10?category=748723
http://taewan.kim/post/cnn/
https://wikidocs.net/103496
Changdae Oh
bnormal16@naver.com
https://velog.io/@changdaeoh
https://github.com/changdaeoh

More Related Content

What's hot

Packet hiding methods for preventing selective jamming attacks
Packet hiding methods for preventing selective jamming attacksPacket hiding methods for preventing selective jamming attacks
Packet hiding methods for preventing selective jamming attacksJPINFOTECH JAYAPRAKASH
 
Packet hiding methods for preventing selective jamming attacks
Packet hiding methods for preventing selective jamming attacksPacket hiding methods for preventing selective jamming attacks
Packet hiding methods for preventing selective jamming attacksJPINFOTECH JAYAPRAKASH
 
Link Capacity Estimation in Wireless Software Defined Networks
Link Capacity Estimation in Wireless Software Defined NetworksLink Capacity Estimation in Wireless Software Defined Networks
Link Capacity Estimation in Wireless Software Defined NetworksFarzaneh Pakzad
 
A Real Time Framework of Multiobjective Genetic Algorithm for Routing in Mobi...
A Real Time Framework of Multiobjective Genetic Algorithm for Routing in Mobi...A Real Time Framework of Multiobjective Genetic Algorithm for Routing in Mobi...
A Real Time Framework of Multiobjective Genetic Algorithm for Routing in Mobi...IDES Editor
 
Towards Seamless TCP Congestion Avoidance in Multiprotocol Environments
Towards Seamless TCP Congestion Avoidance in Multiprotocol EnvironmentsTowards Seamless TCP Congestion Avoidance in Multiprotocol Environments
Towards Seamless TCP Congestion Avoidance in Multiprotocol EnvironmentsIDES Editor
 
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...Taegyun Jeon
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054Jinwon Lee
 
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)Márton Miháltz
 
Understanding RNN and LSTM
Understanding RNN and LSTMUnderstanding RNN and LSTM
Understanding RNN and LSTM健程 杨
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...Jinwon Lee
 
An Analytical Approach To Analyze The Impact Of Gray Hole Attacks In Manet
An Analytical Approach To Analyze The Impact Of Gray Hole Attacks In ManetAn Analytical Approach To Analyze The Impact Of Gray Hole Attacks In Manet
An Analytical Approach To Analyze The Impact Of Gray Hole Attacks In Manetidescitation
 
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...Lviv Startup Club
 
RNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataRNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataYao-Chieh Hu
 
NLP using Deep learning
NLP using Deep learningNLP using Deep learning
NLP using Deep learningBabu Priyavrat
 
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...Performance evaluation of rapid and spray and-wait dtn routing protocols unde...
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...eSAT Journals
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysisodsc
 
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...Performance evaluation of rapid and spray and-wait dtn routing protocols unde...
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...eSAT Publishing House
 
Combining cryptographic primitives to prevent jamming attacks in wireless net...
Combining cryptographic primitives to prevent jamming attacks in wireless net...Combining cryptographic primitives to prevent jamming attacks in wireless net...
Combining cryptographic primitives to prevent jamming attacks in wireless net...JPINFOTECH JAYAPRAKASH
 

What's hot (20)

Packet hiding methods for preventing selective jamming attacks
Packet hiding methods for preventing selective jamming attacksPacket hiding methods for preventing selective jamming attacks
Packet hiding methods for preventing selective jamming attacks
 
Packet hiding methods for preventing selective jamming attacks
Packet hiding methods for preventing selective jamming attacksPacket hiding methods for preventing selective jamming attacks
Packet hiding methods for preventing selective jamming attacks
 
Link Capacity Estimation in Wireless Software Defined Networks
Link Capacity Estimation in Wireless Software Defined NetworksLink Capacity Estimation in Wireless Software Defined Networks
Link Capacity Estimation in Wireless Software Defined Networks
 
A Real Time Framework of Multiobjective Genetic Algorithm for Routing in Mobi...
A Real Time Framework of Multiobjective Genetic Algorithm for Routing in Mobi...A Real Time Framework of Multiobjective Genetic Algorithm for Routing in Mobi...
A Real Time Framework of Multiobjective Genetic Algorithm for Routing in Mobi...
 
Towards Seamless TCP Congestion Avoidance in Multiprotocol Environments
Towards Seamless TCP Congestion Avoidance in Multiprotocol EnvironmentsTowards Seamless TCP Congestion Avoidance in Multiprotocol Environments
Towards Seamless TCP Congestion Avoidance in Multiprotocol Environments
 
LSTM Tutorial
LSTM TutorialLSTM Tutorial
LSTM Tutorial
 
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054
 
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
 
Understanding RNN and LSTM
Understanding RNN and LSTMUnderstanding RNN and LSTM
Understanding RNN and LSTM
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
 
An Analytical Approach To Analyze The Impact Of Gray Hole Attacks In Manet
An Analytical Approach To Analyze The Impact Of Gray Hole Attacks In ManetAn Analytical Approach To Analyze The Impact Of Gray Hole Attacks In Manet
An Analytical Approach To Analyze The Impact Of Gray Hole Attacks In Manet
 
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
 
RNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataRNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential Data
 
NLP using Deep learning
NLP using Deep learningNLP using Deep learning
NLP using Deep learning
 
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...Performance evaluation of rapid and spray and-wait dtn routing protocols unde...
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysis
 
lpc and horn noise detection
lpc and horn noise detectionlpc and horn noise detection
lpc and horn noise detection
 
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...Performance evaluation of rapid and spray and-wait dtn routing protocols unde...
Performance evaluation of rapid and spray and-wait dtn routing protocols unde...
 
Combining cryptographic primitives to prevent jamming attacks in wireless net...
Combining cryptographic primitives to prevent jamming attacks in wireless net...Combining cryptographic primitives to prevent jamming attacks in wireless net...
Combining cryptographic primitives to prevent jamming attacks in wireless net...
 

Similar to CNNs for NLP Lecture

Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningJunaid Bhat
 
Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Vishal Mishra
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep LearningYasas Senarath
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer ConnectAnuj Gupta
 
A Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksRimzim Thube
 
240115_Attention Is All You Need (2017 NIPS).pptx
240115_Attention Is All You Need (2017 NIPS).pptx240115_Attention Is All You Need (2017 NIPS).pptx
240115_Attention Is All You Need (2017 NIPS).pptxthanhdowork
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Saurabh Kaushik
 
Applying Deep Learning Machine Translation to Language Services
Applying Deep Learning Machine Translation to Language ServicesApplying Deep Learning Machine Translation to Language Services
Applying Deep Learning Machine Translation to Language ServicesYannis Flet-Berliac
 
GNR638_Course Project for spring semester
GNR638_Course Project for spring semesterGNR638_Course Project for spring semester
GNR638_Course Project for spring semesterBijayChandraDasTECH0
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learningPoo Kuan Hoong
 
Complete solution for Recurrent neural network.pptx
Complete solution for Recurrent neural network.pptxComplete solution for Recurrent neural network.pptx
Complete solution for Recurrent neural network.pptxArunKumar674066
 
Sequence Modelling with Deep Learning
Sequence Modelling with Deep LearningSequence Modelling with Deep Learning
Sequence Modelling with Deep LearningNatasha Latysheva
 
Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & OpportunityiTrain
 

Similar to CNNs for NLP Lecture (20)

Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.
 
Recurrent Neural Network
Recurrent Neural NetworkRecurrent Neural Network
Recurrent Neural Network
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep Learning
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer Connect
 
Conformer review
Conformer reviewConformer review
Conformer review
 
GNR638_project ppt.pdf
GNR638_project ppt.pdfGNR638_project ppt.pdf
GNR638_project ppt.pdf
 
A Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural Networks
 
CNN for modeling sentence
CNN for modeling sentenceCNN for modeling sentence
CNN for modeling sentence
 
240115_Attention Is All You Need (2017 NIPS).pptx
240115_Attention Is All You Need (2017 NIPS).pptx240115_Attention Is All You Need (2017 NIPS).pptx
240115_Attention Is All You Need (2017 NIPS).pptx
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Applying Deep Learning Machine Translation to Language Services
Applying Deep Learning Machine Translation to Language ServicesApplying Deep Learning Machine Translation to Language Services
Applying Deep Learning Machine Translation to Language Services
 
GNR638_Course Project for spring semester
GNR638_Course Project for spring semesterGNR638_Course Project for spring semester
GNR638_Course Project for spring semester
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learning
 
Complete solution for Recurrent neural network.pptx
Complete solution for Recurrent neural network.pptxComplete solution for Recurrent neural network.pptx
Complete solution for Recurrent neural network.pptx
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
Sequence Modelling with Deep Learning
Sequence Modelling with Deep LearningSequence Modelling with Deep Learning
Sequence Modelling with Deep Learning
 
Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & Opportunity
 
Deep Learning for Machine Translation
Deep Learning for Machine TranslationDeep Learning for Machine Translation
Deep Learning for Machine Translation
 

Recently uploaded

Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 

Recently uploaded (20)

Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 

CNNs for NLP Lecture

  • 1. Lecture 11. Convolution Neural Networks for NLP CS224n Natural Language Processing with Deep Learning UOS STAT NLP Study Changdae Oh 2021. 01. 16 1
  • 2. Contents 1. From RNN to CNN 2. Introduction to CNN 3. Simple CNN for Sentence Classification 4. Toolkit & ideas for NLP task 5. Deep CNN for Sentence Classification 6. Quasi-recurrent Neural Networks 2
  • 3. Contents 1. From RNN to CNN 2. Introduction to CNN 3. Simple CNN for Sentence Classification 4. Toolkit & ideas for NLP task 5. Deep CNN for Sentence Classification 6. Quasi-recurrent Neural Networks 3
  • 4. From RNN to CNN 4 Recurrent Neural Networks • Cannot capture phrases without prefix context • Often capture too much of last words in final vector • What if we compute vectors for every possible word subsequence of a certain length? • And calculate a representation for it • Another solution to bottleneck problem ! 1) Can get N-gram features 2) Capture patterns locally 3) Fastet than RNN RNN consider the final step’s hidden state as a representation about input sequence ! Bottleneck,,, Convolution Neural Networks For NLP
  • 5. Contents 1. From RNN to CNN 2. Introduction to CNN 3. Simple CNN for Sentence Classification 4. Toolkit & ideas for NLP task 5. Deep CNN for Sentence Classification 6. Quasi-recurrent Neural Networks 5
  • 6. Introduction to CNN 6 • Sliding window function applied to a matrix • The sliding window is called a filter (also kernel) • Use n*n filter, multiply its values element-wise with the original matrix, then sum them up  Channel  Filter  Kernel  Stride What is a convolution ? For image http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolutio n For text < Various terms used on CNN > Convolution is classically used to extract features from images  Padding  Pooling  Feature Map  Activation Map
  • 7. Introduction to CNN 7 • Adjust the size of output sequence • Preserve dimensions of data flowing across the layer 1D convolution for text (zero) padding Multiple filters pooling • Increase output channels • Capture various features • The filters to be specialized for different domain • Summarize the output of convolution • Capture sparse signal • Subsampling(down-sampling) • Give some invariance to fluctuation of inputs
  • 8. Introduction to CNN 8 Other notions increase stride local max pooling k-max pooling dilated convolution Compress data Compress representation Focus on more local characteristics Compress representation Keep more information than 1-max pooling Compress data Use a relatively small number of parameters to have a larger receptive field
  • 9. Introduction to CNN 9 Tools to see a wider range at once 1)Use bigger filters 2)Dilated convolution 3)Make CNNs deeper •https://www.slideshare.net/modulabs/2-cnn-rnn https://zzsza.github.io/data/2018/02/23/i ntroduction-convolution/ https://zzsza.github.io/data/2018/ 05/25/cs231n-cnn-architectures/
  • 10. Contents 1. From RNN to CNN 2. Introduction to CNN 3. Simple CNN for Sentence Classification 4. Toolkit & ideas for NLP task 5. Deep CNN for Sentence Classification 6. Quasi-recurrent Neural Networks 10
  • 11. Simple CNN for Sentence Classification 11 • Goal : Sentence Classification • A simple use of one convolutional layer and pooling https://arxiv.org/pdf/1408.5882.pdf Yoon Kim (2014) Notation Note, filter is a vector (all inputs and filters are flatten) Feature extraction
  • 12. Simple CNN for Sentence Classification 12 Max-over-time pooling Take only one maximum value as the feature from each feature maps Use multiple filters(100) with various widths • Different window sizes h(3, 4, 5) -> look at various n-grams • Can represent lots of domain features Multi-channel input idea • Having two channels of word vectors -- one that is kept static throughout training and one that is fine-tuned via backpropagation • Both channel sets are added to feature map before max-pooling
  • 13. Simple CNN for Sentence Classification 13 Regularization Dropout Max-norm regularization https://d2l.ai/ • Constrain l2 norms of weights vectors of each class to fixed threshold value s • prevents overfitting • Masking vector r of Bernoulli r.v. with prob. p of being 1 • Delete feature during training -> prevents co-adaptation / overfitting • At testing, scale final vector by keep probability p
  • 14. Simple CNN for Sentence Classification 14 Model Architecture https://arxiv.org/pdf/1510.03820.pdf,
  • 15. Simple CNN for Sentence Classification 15 Conclusion the model is very simple, but it’s quite powerful !
  • 16. Contents 1. From RNN to CNN 2. Introduction to CNN 3. Simple CNN for Sentence Classification 4. Toolkit & ideas for NLP task 5. Deep CNN for Sentence Classification 6. Quasi-recurrent Neural Networks 16
  • 17. Toolkit & ideas for NLP task 17 Bag of Vectors The average of word embeddings in sentence can be the vector of that sentence Text classification can be performed by just average of word vectors. Good baseline method ! Model comparison Window Model CNNs RNNs Good for single word classification for problems that do not need wide context (POS, NER) Advanced Model  Good for representing sentence meaning -> sentence classification  Easy to parallelize on GPUs.  Efficient and versatile  Cognitively plausible, good for sequence tagging / classification  Great for language models  Can be amazing with attention mechanisms  Much slower than CNNs
  • 18. Toolkit & ideas for NLP task 18 Gated units used vertically 1 x 1 convolutions • The gating in LSTMs / GRUs is a general idea • Can also gate vertically (in CNNs, deeps FNN) Gating = Skipping • Add shortcut connection • Can stack many layers without gradient vanishing • Convolution with kernels _ size : 1 • Add additional neural network layers with very few additional parameters • Can be used to map from many channels to fewer channels (preserves spatial dimensions, reduces depth) http://d2l.ai/chapter_convolutional-neural-networks/channels.html
  • 19. 19 • It is widely used in CNNs and Deep FNNs • In CNNs, transform the convolution output of a batch by scaling the activations to have zero mean and unit variance (Z-transform) Toolkit & ideas for NLP task Batch Normalization advantage disadvantage  Learn faster  Less sensitive to initialization  Prevent overfitting  Increase complexity  Computing resource
  • 20. 20 NEXT CHAPTER Toolkit & ideas for NLP task Encoder(CNN)-Decoder(RNN) networks Character-level representations • Shrink the input data continuously by stacking up convolutional layers • Take the final pulled vector as a sentence representation
  • 21. Contents 1. From RNN to CNN 2. Introduction to CNN 3. Simple CNN for Sentence Classification 4. Toolkit & ideas for NLP task 5. Deep CNN for Sentence Classification 6. Quasi-recurrent Neural Networks 21
  • 22. Deep CNN for Sentence Classification 22 • Starting point : sequence models have been very dominant in NLP but all the models are basically not very deep Conneau, Schwenk, Lecun, Barrault. EACL 2017 Note  Build a vision-like system for NLP  Works from the character-level  Result is constant size, since text is truncated or padded  Local max pooling https://arxiv.org/abs/1606.01781 VeryDeep-CNN
  • 23. Deep CNN for Sentence Classification 23 Classification tasks  Two conv. Layers, each one followed by a temporal BN and an ReLU activation  Padding for preservation of temporal resolution  Use small size filters (3) so that networks can be deep and get more expressive in a few parameters Convolutional block  Temporal resolution of the output is first down-sampled to a fixed dimension using k-max pooling (extracts the k most important features indep. position)  the 512 x k resulting features are transformed into a single vector (Flatten) which is the input to a three FC layer with ReLU  The number of output neurons in there depends on clf task
  • 24. Deep CNN for Sentence Classification 24 Conclusion  deeper networks are better  However, if it is too deep, its performance will be degraded  Shrinkage method – “MaxPooling“ is easier
  • 25. Contents 1. From RNN to CNN 2. Introduction to CNN 3. Simple CNN for Sentence Classification 4. Toolkit & ideas for NLP task 5. Deep CNN for Sentence Classification 6. Quasi-recurrent Neural Networks 25
  • 26. Quasi-recurrent Neural Networks 26 QRNNs : have the advantages of both CNN and RNN - Parallel computation across both timestep & minibatch dimensions - Output depend on the overall order of elements in the sequence (*) Denotes a masked convolution along the timestep dimension https://arxiv.org/abs/1611.01576 LSTM notation QRNN notation Parallelism across timesteps function controlled by gates that can mix states across timesteps, but which acts independently on each channel of the state vector Parallelism across channels “ Dynamic average pooling ”
  • 27. Reference 27 Zhang, Xiang, Junbo Zhao, and Yann LeCun. "Character-level convolutional networks for text classification." Advances in neural information processing systems 28 (2015): 649-657. Conneau, Alexis, et al. "Very deep convolutional networks for text classification." arXiv preprint arXiv:1606.01781 (2016). Bradbury, J., et al. "Quasi-recurrent neural networks. arXiv 2016." arXiv preprint arXiv:1611.01576. Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882 (2014). Zhang, Ye, and Byron Wallace. "A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification." arXiv preprint arXiv:1510.03820 (2015). https://zzsza.github.io/data/2018/05/25/cs231n-cnn-architectures/ https://www.slideshare.net/modulabs/2-cnn-rnn http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/ https://happyzipsa.tistory.com/10?category=748723 http://taewan.kim/post/cnn/ https://wikidocs.net/103496