Paper Presentation
Word Representations
in Vector Space
Abdullah Khan Zehady
Department of Computer Science,
Purdue University.
E-mail: azehady@purdue.edu
Word Representation
Neural Word Embedding
● Continuous vector space representation
o Words are represented as dense real-valued vectors in R^d
● Distributed word representation ↔ word embedding
o Embed an entire vocabulary into a relatively low-dimensional linear
space whose dimensions are latent continuous features.
● Classical n-gram models work in terms of discrete units
o There is no inherent relationship between n-gram units.
● In contrast, word embeddings capture regularities and relationships
between words (a toy illustration follows).
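As a toy illustration (made-up numbers, not trained embeddings): in such a space, related words end up with high cosine similarity.

```python
import numpy as np

# Toy 4-dimensional "embeddings" (illustrative values only, not trained).
vec = {
    "king":  np.array([0.80, 0.10, 0.70, 0.30]),
    "queen": np.array([0.78, 0.85, 0.68, 0.30]),
    "apple": np.array([0.05, 0.40, 0.02, 0.90]),
}

def cosine(a, b):
    """Cosine similarity between two dense word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vec["king"], vec["queen"]))  # related words -> high similarity
print(cosine(vec["king"], vec["apple"]))  # unrelated words -> lower similarity
```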
Syntactic & Semantic Relationship
Regularities are observed as a roughly constant offset vector between
pairs of words that share a particular relationship (see the sketch after the examples below).
Gender Relation
KING - QUEEN ~ MAN - WOMAN
Singular/Plural Relation
KING - KINGS ~ QUEEN - QUEENS
Other Relations:
● Language
France - French
~
Spain - Spanish
● Past Tense
Go – Went
~
Capture - Captured
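A small sketch of how such offsets are used in practice. Here `vec` is assumed to be a dictionary of trained word vectors (a hypothetical name, not defined in the slides); the missing word of an analogy a : b :: c : ? is found by nearest-neighbour search around b - a + c.

```python
import numpy as np

def analogy(vec, a, b, c, exclude=()):
    """Return the word d whose vector is closest (by cosine) to vec[b] - vec[a] + vec[c].

    `vec` is assumed to map words to trained embedding vectors, e.g.
    analogy(vec, "man", "king", "woman") should ideally return "queen".
    """
    target = vec[b] - vec[a] + vec[c]
    best, best_sim = None, -np.inf
    for w, v in vec.items():
        if w in (a, b, c) or w in exclude:
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# Hypothetical usage, given trained vectors:
# analogy(vec, "man", "king", "woman")       -> ideally "queen"
# analogy(vec, "france", "french", "spain")  -> ideally "spanish"
```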
Vector Space Model
[Figure: word vector spaces for Language 1 (English) and Language 2 (Estonian)]
[Figure: neural net with input, hidden, and output layers]
Language Model (LM)
● Different models exist for estimating continuous representations of words:
○ Latent Semantic Analysis (LSA)
○ Latent Dirichlet Allocation (LDA)
○ Neural Network Language Model (NNLM)
Feed Forward NNLM
● Consists of input, projection, hidden and output layers.
● The N previous words are encoded using 1-of-V coding, where V is the size of the
vocabulary. Ex: A = (1,0,...,0), B = (0,1,...,0), … , Z = (0,0,...,1) in R^26 (see the sketch below)
● The NNLM becomes computationally complex between the projection (P) and
hidden (H) layers
○ For N=10, size of P = 500-2000, size of H = 500-1000
○ The hidden layer is used to compute a prob. dist. over all the words in
the vocabulary V
● Hierarchical softmax to the rescue.
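A minimal sketch of the 1-of-V coding from the bullet above, plus the rough per-example cost that makes the projection-to-hidden and hidden-to-softmax steps the bottleneck (toy numbers, illustrative only):

```python
import numpy as np

# 1-of-V coding over a toy 26-word vocabulary (A..Z), as in the slide's example.
vocab = {chr(ord('A') + i): i for i in range(26)}

def one_of_v(word, V=26):
    """1-of-V (one-hot) coding: a length-V vector with a single 1."""
    x = np.zeros(V)
    x[vocab[word]] = 1.0
    return x

print(one_of_v('A'))   # [1, 0, ..., 0]
print(one_of_v('Z'))   # [0, 0, ..., 1]

# Rough per-example cost of the feed-forward NNLM (N context words, D-dim
# projections, H hidden units, vocabulary size V): Q = N*D + N*D*H + H*V.
# The N*D*H (projection -> hidden) and H*V (hidden -> softmax) terms dominate,
# which is why the hidden layer and the full softmax are the bottleneck.
N, D, H, V = 10, 500, 500, 1_000_000
print(N * D + N * D * H + H * V)   # ~5.0e8 multiplications per example
```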
Recurrent NNLM
● No projection layer; consists of input, hidden and output layers only.
● No need to specify the context length as in the feed-forward NNLM.
● What is special about the RNN model?
○ A recurrent matrix that connects the hidden layer to itself.
○ Allows the model to form a short-term memory
■ Information from the past is represented by the hidden layer state
● Word vectors from the RNN model achieved state-of-the-art results on a
relational similarity identification task.
RNN Model
Recurrent NNLM
w(t): Input word at time t
y(t): Output layer; produces a prob. dist. over words
s(t): Hidden layer (state)
U: Each column represents a word
(The recurrence relating these quantities is sketched below.)
● Four-gram neural net language model architecture (Bengio 2001)
● The RNN is trained with SGD and backpropagation to maximize the
log likelihood.
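For reference, a standard formulation of the recurrent NNLM relating these quantities (the slide itself only names the variables; the equations below follow the common RNNLM formulation, where W and V denote the recurrent and output weight matrices rather than the vocabulary size, f is a sigmoid and g a softmax):

s(t) = f\big( U\, w(t) + W\, s(t-1) \big), \qquad y(t) = g\big( V\, s(t) \big)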
Bringing efficiency...
● The computational complexity of NNLMs is high.
● We can remove the hidden layer and get a ~1000x speed-up:
○ Continuous bag-of-words model
○ Continuous skip-gram model
● The full softmax can be replaced by:
○ Hierarchical softmax (Morin and Bengio)
○ Hinge loss (Collobert and Weston)
○ Noise contrastive estimation (Mnih et al.)
Continuous Bag-of-Words Model (CBOW)
● The non-linear hidden layer is removed.
● The projection layer is shared for all words (not just the projection matrix).
● All words get projected into the same position (their vectors are averaged).
● Naming reason: the order of words in the history does not influence the projection.
● Best performance is obtained by a log-linear classifier with four future and
four history words at the input.
Predicts the current word based on the context (see the sketch below).
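A minimal sketch of this architecture in Python (toy sizes and random weights, illustrative only; not the paper's implementation):

```python
import numpy as np

# W_in:  (V, D) input word vectors;  W_out: (V, D) output word vectors.
rng = np.random.default_rng(0)
V, D = 1000, 50                       # toy vocabulary and embedding sizes
W_in = rng.normal(scale=0.1, size=(V, D))
W_out = rng.normal(scale=0.1, size=(V, D))

def cbow_probs(context_ids):
    """Predict the current word from its context: average the context vectors
    (word order is ignored -- the 'bag of words'), then apply a log-linear
    softmax classifier over the vocabulary."""
    h = W_in[context_ids].mean(axis=0)   # shared projection (averaged)
    scores = W_out @ h                   # one score per vocabulary word
    scores -= scores.max()               # numerical stability
    p = np.exp(scores)
    return p / p.sum()

probs = cbow_probs([3, 17, 256, 874])    # four history/future words (toy ids)
print(probs.shape, probs.sum())          # (1000,) ~1.0
```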
Continuous Skip-gram Model
● Objective: maximize classification of a word based on another word in the
same sentence, i.e., maximize the average log probability (given below).
● p(w_{t+j} | w_t) is defined using the softmax function (given below).
Predicts the surrounding words given the current word.
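The two formulas referenced above appeared as images in the original slides; they are reconstructed here in LaTeX from the paper's formulation. The average log probability over a corpus of T words with context size c:

\frac{1}{T} \sum_{t=1}^{T} \; \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t)

and the basic skip-gram softmax, where v_w and v'_w are the input and output vector representations of w and W is the vocabulary size:

p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}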
Bringing efficiency...
● The computational complexity of NNLMs is high.
● We can remove the hidden layer and get a ~1000x speed-up:
○ Continuous bag-of-words model
○ Continuous skip-gram model
● The full softmax can be replaced by:
○ Hierarchical softmax (Morin and Bengio)
○ Hinge loss (Collobert and Weston)
○ Noise contrastive estimation (Mnih et al.)
Hierarchical Softmax for efficient computation
● The full-softmax formulation is impractical because the cost of computing ∇ log p(wO | wI)
is proportional to W, which is often large (10^5 – 10^7 terms).
● With hierarchical softmax, the cost per word is reduced to roughly log2(W).
Hierarchical Softmax
● Uses a binary tree (Huffman coding) representation of the output layer, with the W
words as its leaves.
o Defines a random walk that assigns probabilities to words.
● Instead of evaluating W output nodes, only about log2(W) nodes are evaluated to
compute the prob. dist.
● Each word w can be reached by an appropriate path from the root of the tree
(the resulting probability is given below):
● n(w, j): the j-th node on the path from the root to w
● L(w): the length of this path
● n(w, 1) = root and n(w, L(w)) = w
● ch(n): an arbitrary fixed child of an inner node n
● [x] = 1 if x is true and -1 otherwise
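Putting the definitions above together, the hierarchical-softmax probability (shown as an image in the original slide; reconstructed from the paper, with σ the logistic sigmoid) is

p(w \mid w_I) = \prod_{j=1}^{L(w)-1} \sigma\!\left( \left[\, n(w, j{+}1) = \mathrm{ch}(n(w, j)) \,\right] \cdot {v'_{n(w,j)}}^{\top} v_{w_I} \right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}

Evaluating this product touches only the L(w) ≈ log2(W) nodes on the path from the root to w.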
Negative Sampling
● Noise Contrastive Estimation (NCE)
o A good model should be able to differentiate data from noise by means of
logistic regression.
o An alternative to the hierarchical softmax.
o Introduced by Gutmann and Hyvärinen and applied to language modeling by
Mnih and Teh.
● NCE approximates the log probability of the softmax.
● Negative Sampling (NEG) is defined by an objective that replaces log p(wO | wI) in the
skip-gram objective (given below).
● Task: distinguish the target word wO from draws from the noise distribution.
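The NEG objective referenced above, reconstructed from the paper. It replaces every log p(w_O | w_I) term in the skip-gram objective; k is the number of negative samples and P_n(w) is the noise distribution (in the paper, the unigram distribution raised to the 3/4 power):

\log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \left[ \log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right) \right]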
Subsampling of Frequent Words
● The most frequent words provide less information than rare words.
o The co-occurrence of “France” and “Paris” is informative.
o The co-occurrence of “France” and “the” is much less informative.
● A simple subsampling approach counters this imbalance:
o Each word wi in the training set is discarded with the probability given below,
where f(wi) is the frequency of word wi and t is a chosen threshold,
typically around 10^-5.
● This aggressively subsamples words whose frequency is greater than
t while preserving the ranking of the frequencies.
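The discard probability referenced above (shown as an image in the original slide; the paper's formula):

P(\text{discard } w_i) = 1 - \sqrt{\frac{t}{f(w_i)}}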
Empirical Results
Automatic learning by the skip-gram model
● No supervised information
about what a capital city
means.
● But the model is still
capable of
o Automatic
organization of
concepts
o Learning implicit
relationships
PCA projection of 100-dimensional skip-gram vectors
Analogical Reasoning Performance
● Analogical reasoning task introduced by Mikolov et al.
o Syntactic analogies: “quick” : “quickly” :: “slow” : ? (“slowly”)
o Semantic analogies: “Germany” : “Berlin” :: “France” : ? (“Paris”)
Learning Phrases
● To learn phrase vectors:
o First find words that appear frequently together, and infrequently in
other contexts.
o Replace them with unique tokens. Ex: “New York Times” ->
New_York_Times
● Phrases are formed based on the unigram and bigram counts, using the score
below; the discounting coefficient δ prevents too many phrases consisting of very
infrequent words from being formed.
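The phrase score referenced above, as given in the paper; bigrams whose score exceeds a threshold are joined into a single token:

\mathrm{score}(w_i, w_j) = \frac{\mathrm{count}(w_i w_j) - \delta}{\mathrm{count}(w_i) \times \mathrm{count}(w_j)}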
Learning Phrases
Goal: Compute the fourth phrase using the first three.
(Best model accuracy: 72%)
Phrase Skip-gram Results
● Accuracies of the skip-gram models on the phrase analogy dataset
o Using different hyperparameters
o Models trained on approximately one billion words from the news
dataset
● The size of the training data matters:
o HS-Huffman (dimensionality = 1000) trained on 33 billion words
reaches an accuracy of 72%.
Additive compositionality
● It is possible to meaningfully combine words by element-wise addition of their
vector representations (see the sketch below).
○ A word vector represents the distribution of the contexts in which the word appears.
● Vector values are related logarithmically to the probabilities computed by the output layer.
○ The sum of two word vectors is therefore related to the product of the two context
distributions, which acts like an AND over contexts.
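A small sketch of this idea, again assuming a hypothetical dictionary `vec` of trained word and phrase vectors:

```python
import numpy as np

def closest(vec, query, topn=3, exclude=()):
    """Nearest neighbours of a query vector by cosine similarity."""
    q = query / np.linalg.norm(query)
    sims = []
    for w, v in vec.items():
        if w in exclude:
            continue
        sims.append((float(v @ q / np.linalg.norm(v)), w))
    return [w for _, w in sorted(sims, reverse=True)[:topn]]

# Element-wise addition acts like an AND over contexts: words that are plausible
# in both contexts get high similarity. Hypothetical usage with trained vectors:
# closest(vec, vec["Russian"] + vec["river"])    -> ["Volga_River", ...]
# closest(vec, vec["Vietnam"] + vec["capital"])  -> ["Hanoi", ...]
```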
Closest Entities
Closest entity search using two methods: negative sampling and hierarchical softmax.
Comparison with published word representations
Comments
● The reduction in computational complexity is impressive.
● Works with unsupervised/unlabelled data.
● The vector representation can be extended to larger pieces of text:
Paragraph Vector (Le and Mikolov, 2014)
● Applicable to many NLP tasks
o Tagging
o Named Entity Recognition
o Translation
o Paraphrasing
Thank you.


Speaker notes

  1. Neg-k: negative sampling with k negative samples.
  2. The vectors can be seen as representing the distribution of the contexts in which a word appears. These values are related logarithmically to the probabilities computed by the output layer, so the sum of two word vectors is related to the product of the two context distributions. The product works here as an AND function: words that are assigned high probabilities by both word vectors get high probability, and the other words get low probability. Thus, if “Volga River” appears frequently in the same sentences together with the words “Russian” and “river”, the sum of these two word vectors will result in a feature vector that is close to the vector of “Volga River”.