Deep Learning for Natural Language Processing
Viet-Trung Tran
Some of the challenges in Language Understanding
• Language is ambiguous:
– Every sentence has many possible interpretations.
• Language is productive:
– We will always encounter new words or new constructions.
• Language is culturally specific.
Example: “fruit flies like a banana” is ambiguous; each row below is one possible part-of-speech tagging.
fruit flies like a banana
NN NN VB DT NN
NN VB P DT NN
NN NN P DT NN
NN VB VB DT NN
ML: Traditional Approach
• For each new problem/question
– Gather as much LABELED data as you can get
– Throw some algorithms at it (mainly put in an SVM and keep it at that)
– If you actually have tried more algos: pick the best
– Spend hours hand engineering some features / feature selection / dimensionality reduction (PCA, SVD, etc.)
– Repeat…
Deep learning vs the rest
Deep Learning: Why for NLP?
• Beats the state of the art in:
– Language Modeling (Mikolov et al. 2011) [WSJ AR task]
– Speech Recognition (Dahl et al. 2012, Seide et al. 2011; following Mohammed et al. 2011)
– Sentiment Classification (Socher et al. 2011)
– MNIST hand-written digit recognition (Ciresan et al. 2010)
– Image Recognition (Krizhevsky et al. 2012) [ImageNet]
Language semantics
• What is the meaning of a word? (Lexical semantics)
• What is the meaning of a sentence? ([Compositional] semantics)
• What is the meaning of a longer piece of text? (Discourse semantics)
One-hot encoding
•  Form a vocabulary that maps each lemmatized word to a unique ID (the word's position in the vocabulary)
•  Typical vocabulary sizes vary between 10,000 and 250,000
One-hot encoding
•  The one-hot vector of an ID is a vector filled with 0s, except for a 1 at the position associated with the ID
–  for vocabulary size D=10, the one-hot vector of word ID w=4 is e(w) = [ 0 0 0 1 0 0 0 0 0 0 ]
•  A one-hot encoding makes no assumption about word similarity
•  All words are equally different from each other
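A minimal sketch of this encoding in Python/NumPy, using a hypothetical toy vocabulary and 1-based word IDs as in the example above:

```python
import numpy as np

def one_hot(word_id, vocab_size):
    """Return the one-hot vector for a 1-based word ID."""
    e = np.zeros(vocab_size, dtype=int)
    e[word_id - 1] = 1          # the single 1 sits at the position of the ID
    return e

vocab = {"the": 1, "cat": 2, "sat": 3, "mat": 4}   # hypothetical toy vocabulary
print(one_hot(vocab["mat"], vocab_size=10))        # -> [0 0 0 1 0 0 0 0 0 0]
```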
Word representation
•  Standard
–  Bag of Words
–  A one-hot encoding
–  20k to 50k dimensions
–  Can be improved by factoring in document frequency
•  Word embedding
–  Neural word embeddings
–  Uses a vector space that attempts to predict a word given a context window
–  200-400 dimensions
Word embeddings make semantic similarity and synonyms possible.
Distributional representations
•  “You shall know a word by the company it keeps” (J. R. Firth 1957)
•  One of the most successful ideas of modern statistical NLP!
•  Word Embeddings (Bengio et al. 2001; Bengio et al. 2003), based on the idea of distributed representations for symbols (Hinton 1986)
•  Neural Word embeddings (Mnih and Hinton 2007, Collobert & Weston 2008, Turian et al. 2010; Collobert et al. 2011, Mikolov et al. 2011)
Neural distributional representations
•  Neural word embeddings
•  Combine vector space semantics with the prediction of probabilistic models
•  Words are represented as dense vectors, e.g. "Human" = a dense real-valued vector
Vector space model
Word embeddings
Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning.
•  What words have embeddings closest to a given word?
From Collobert et al. (2011)

Word Embeddings for MT: Mikolov (2013)
Word Embeddings
•  One of the most exciting areas of research in deep learning
•  Introduced by Bengio et al. 2003
•  W: words → R^n is a parameterized function mapping words in some language to high-dimensional vectors (200 to 500 dimensions)
–  W("cat") = (0.2, -0.4, 0.7, ...)
–  W("mat") = (0.0, 0.6, -0.1, ...)
•  Typically, the function is a lookup table, parameterized by a matrix θ with a row for each word: Wθ(wn) = θn
•  W is initialized with random vectors for each word
•  The word embedding learns meaningful vectors by being trained to perform some task
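A minimal sketch of such a lookup-table embedding in Python/NumPy; the toy vocabulary, dimensionality, and initialization scale below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"cat": 0, "mat": 1, "sat": 2}   # hypothetical toy vocabulary
dim = 5                                  # 200-500 dimensions in practice

# theta: one randomly initialized row per word; the rows are adjusted during training
theta = rng.normal(scale=0.1, size=(len(vocab), dim))

def W(word):
    """Lookup-table embedding: W_theta(w_n) = theta_n."""
    return theta[vocab[word]]

print(W("cat"))   # a dense vector such as (0.2, -0.4, 0.7, ...) once trained
```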


Learning word vectors (Collobert et al. JMLR 2011)
•  Idea: a word and its context form a positive training example; the same context with a random word substituted in gives a negative training example


Example
•  Train a network to predict whether a 5-gram (sequence of five words) is ‘valid’
•  Source
– any text corpus (e.g. Wikipedia)
•  Corrupt half of the 5-grams to obtain negative training examples
– make the 5-gram nonsensical
– "cat sat song the mat"


Neural network to determine if a 5-gram is 'valid' (Bottou 2011)
•  Look up each word in the 5-gram through W
•  Feed those vectors into the network R
•  R tries to predict whether the 5-gram is 'valid' or 'invalid'
–  R(W("cat"), W("sat"), W("on"), W("the"), W("mat")) = 1
–  R(W("cat"), W("sat"), W("song"), W("the"), W("mat")) = 0
•  The network needs to learn good parameters for both W and R
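A minimal sketch of such a scorer in Python/NumPy: a single hidden layer over the concatenated word vectors. This is not the exact architecture of the cited model; the sizes, initialization, and names (R, U, V) are assumptions, and training (backprop on valid vs. corrupted 5-grams) is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, hidden = 50, 100          # embedding and hidden-layer sizes (assumed)

def R(word_vectors, U, V):
    """Score a 5-gram: concatenate its embeddings, apply one tanh layer,
    then a sigmoid that should be ~1 for valid and ~0 for corrupted 5-grams."""
    x = np.concatenate(word_vectors)          # shape (5 * dim,)
    h = np.tanh(U @ x)                        # hidden layer
    return 1.0 / (1.0 + np.exp(-(V @ h)))     # probability of 'valid'

# Randomly initialized parameters; W (the embedding table), U and V are all
# trained jointly on valid vs. corrupted 5-grams.
U = rng.normal(scale=0.1, size=(hidden, 5 * dim))
V = rng.normal(scale=0.1, size=hidden)
five_gram = [rng.normal(size=dim) for _ in range(5)]   # stand-ins for W("cat"), ...
print(R(five_gram, U, V))
```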
Idea
•  “a few people sing well” → “a couple people sing well”
•  the validity of the sentence doesn’t change
•  if W maps synonyms (like “few” and “couple”) close together
– little changes from R’s perspective


Bingo
•  The number of possible 5-grams is massive
•  But there is only a small number of data points to learn from
•  Similar class of words
– “the wall is blue” → “the wall is red”
•  Multiple words
– “the wall is blue” → “the ceiling is red”
•  Shifting “red” closer to “blue” makes the network R perform better
Word embedding property
•  Analogies between words are encoded in the difference vectors between words
– W("woman") − W("man") ≃ W("aunt") − W("uncle")
– W("woman") − W("man") ≃ W("queen") − W("king")
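A minimal sketch of how such analogies can be queried, assuming W is a dictionary mapping words to trained NumPy vectors (the function names are illustrative):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(W, a, b, c):
    """Answer 'a is to b as c is to ?' by searching for the word whose
    vector is closest to W[b] - W[a] + W[c]."""
    target = W[b] - W[a] + W[c]
    candidates = (w for w in W if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(W[w], target))

# With a trained embedding dictionary W (word -> vector), one would expect
# analogy(W, "man", "woman", "king") to return "queen".
```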
Linguistic Regularities: Mikolov (2013)
Word embedding property: Shared representations
•  The use of word representations… has become a key “secret sauce” for the success of many NLP systems in recent years, across tasks including named entity recognition, part-of-speech tagging, parsing, and semantic role labeling. (Luong et al. 2013)

•  W and F learn to perform task A. Later, G can learn to perform task B based on W

Bilingual word-embedding
English – Chinese word mapping
Embed images and words in a single representation
Feedforward neural net language model (NNLM), Bengio et al. 2003
• Long training time
Recurrent neural network based language model (Mikolov et al. 2010)
• Elman Network
Simple RNN training
• Input vector: 1-of-N encoding (one-hot)
• Repeated epochs
– s(0): vector of small values (e.g. 0.1)
– Hidden layer: 30 – 500 units
– All training data from the corpus are presented sequentially
– Initial learning rate: 0.1
– Error function
– Standard backpropagation with stochastic gradient descent
• Convergence is achieved after 10 – 20 epochs
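A minimal sketch of one step of such an Elman network in Python/NumPy; the sizes, the sigmoid/softmax choices, and the parameter names follow the usual simple-RNN formulation and are assumptions here, not the exact setup of the cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 10_000, 300     # vocabulary size and hidden units (30-500 on the slide)
U = rng.normal(scale=0.1, size=(H, V))   # input (one-hot word) -> hidden
Wh = rng.normal(scale=0.1, size=(H, H))  # previous hidden state -> hidden
Vo = rng.normal(scale=0.1, size=(V, H))  # hidden -> output distribution

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(word_id, s_prev):
    """One Elman step: combine the current one-hot word with the previous
    hidden state, and predict a distribution over the next word."""
    x = np.zeros(V); x[word_id] = 1.0
    s = sigmoid(U @ x + Wh @ s_prev)
    y = softmax(Vo @ s)
    return s, y

s = np.full(H, 0.1)    # s(0): a vector of small values, as on the slide
s, y = step(42, s)     # feed one (hypothetical) word ID
```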
Word2vec (Mikolov et al. 2013)
• Log-linear model
• Previous models use a non-linear hidden layer, which adds complexity
• Continuous word vectors are learned using a simple model
Continuous BoW (CBOW) Model
• Similar to the feed-forward NNLM, but
– Non-linear hidden layer removed
• Called CBOW (continuous BoW) because the order of the words is lost
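A minimal sketch of the CBOW forward pass in Python/NumPy: context vectors are averaged (so word order is lost) and fed to a log-linear softmax over the vocabulary. The sizes and parameter names are illustrative assumptions, and the training loop is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
V, dim = 10_000, 300                              # vocabulary and embedding sizes (assumed)
W_in = rng.normal(scale=0.01, size=(V, dim))      # input (context) vectors
W_out = rng.normal(scale=0.01, size=(V, dim))     # output (center-word) vectors

def cbow_probs(context_ids):
    """Average the context vectors (word order is lost), then score every
    vocabulary word as the center word with a log-linear softmax layer."""
    h = W_in[context_ids].mean(axis=0)            # projection only, no hidden non-linearity
    logits = W_out @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = cbow_probs([12, 7, 431, 98])              # a window of four (hypothetical) context IDs
```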
CBOW Model
Continuous Skip-gram Model
• Similar to CBOW, but
– Tries to maximize classification of a word based on another word in the same sentence
• Predicts words within a certain window
• Observations
– Larger window size => better quality of the resulting word vectors, higher training time
– More distant words are usually less related to the current word than those close to it
– Give less weight to the distant words by sampling less from those words in the training examples
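Assuming the gensim library is available (4.x API), a short usage sketch of training skip-gram vectors; the toy corpus and parameter values are illustrative only:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences; any plain-text corpus would do.
sentences = [["the", "wall", "is", "blue"], ["the", "ceiling", "is", "red"]]

# sg=1 selects the skip-gram architecture (sg=0 gives CBOW);
# window sets how far from the current word context words may be drawn.
model = Word2Vec(sentences, vector_size=100, window=5, sg=1, min_count=1)
print(model.wv.most_similar("blue", topn=3))
```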
Continuous Skip-gram Model
RECURSIVE NEURAL NETWORKS
Modular Network that learns word embeddings
•  Fixed number of inputs

Recursive neural networks
•  The output of a module goes into a module of the same type
•  Tree-structured neural networks
•  No fixed number of inputs
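A minimal sketch of the recursive composition step in Python/NumPy: two child vectors are merged into a parent vector of the same size, so the same module can be reapplied up the tree. This follows the common tanh composition used in Socher-style recursive networks; the sizes and parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
Wc = rng.normal(scale=0.1, size=(dim, 2 * dim))   # composition matrix
b = np.zeros(dim)

def compose(left, right):
    """Merge two child vectors (words or phrases) into one parent vector
    of the same size, so the module can be applied again further up the tree."""
    return np.tanh(Wc @ np.concatenate([left, right]) + b)

# ("the", "cat") -> phrase vector; (phrase, "sat") -> longer phrase, and so on.
the, cat, sat = (rng.normal(size=dim) for _ in range(3))
noun_phrase = compose(the, cat)
clause = compose(noun_phrase, sat)
```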
Building on Word Vector Space Models
• But how can we represent the meaning of longer phrases?
• By mapping them into the same vector space!
How should we map phrases into a vector space?
Sentence Parsing: What we want
Learn Structure and Representation
Recursive Neural Networks for Structure Prediction
Recursive Neural Network Denition
Recursive Application of Relational Operators
Parsing a sentence with an RNN
Parsing a sentence
Labeling in Recursive Neural Networks
Recursive matrix-vector model
Recursive neural tensor network 

Socher et al. 2013: Sentence sentiment analysis
Neural tensor network
Reversible sentence representation (Bottou 2011)
•  Bilingual sentence representation
Cho et al. (2014)
Credits
•  Richard Socher, Christopher Manning
– Stanford University
– nlp.stanford.edu/courses/NAACL2013/
•  Roelof Pieters, PhD candidate KTH/CSC
•  http://colah.github.io/
•  Bengio GSS 2012
Language Modeling
•  A language model is a probabilistic model that assigns a probability to any sequence of words p(w1, ..., wT)
•  Language modeling is the task of learning a language model that assigns high probabilities to well-formed sentences
•  Plays a crucial role in speech recognition and machine translation systems
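By the chain rule, p(w1, ..., wT) = p(w1) p(w2 | w1) ... p(wT | w1, ..., wT-1), so learning a language model amounts to estimating the conditional probability of each word given the words that precede it.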
N-gram models
•  An n-gram is a sequence of n words
–  unigrams (n=1): “is”, “a”, “sequence”, etc.
–  bigrams (n=2): [“is”, “a”], [“a”, “sequence”], etc.
–  trigrams (n=3): [“is”, “a”, “sequence”], [“a”, “sequence”, “of”], etc.
•  n-gram models estimate the conditional probability of a word from n-gram counts
•  The counts are obtained from a training corpus (a dataset of text)
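For example, a trigram model (n = 3) uses the estimate p(wt | wt-2, wt-1) ≈ count(wt-2 wt-1 wt) / count(wt-2 wt-1), where the counts come from the training corpus.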