Representation Learning of Vectors of Words and Phrases
Felipe Moraes
felipemoraes@dcc.ufmg.br
LATIN - LAboratory for Treating INformation
Agenda
• Motivation
• One-hot encoding
• Language Models
• Neural Language Models
• Neural Net Language Models (NN-LMs) (Bengio et al., '03)
• Word2Vec (Mikolov et al., '13)
• Supervised Prediction Tasks
• Recursive NNs (Socher et al., '11)
• Paragraph Vector (Le & Mikolov, '14)
Motivation
Input → Learning Algorithm → Output
Let's start with words for now. How can we learn good representations for texts?
One-hot encoding
• Form a vocabulary that maps each lemmatized word to a unique ID (the word's position in the vocabulary)
• Typical vocabulary sizes vary between 10,000 and 250,000
One-hot encoding
• From its word ID, we get a basic representation of a word through the one-hot encoding of the ID
• the one-hot vector of an ID is a vector filled with 0s, except for a 1 at the position associated with the ID
• ex.: for vocabulary size D = 10, the one-hot vector of word ID w = 4 is
e(w) = [ 0 0 0 1 0 0 0 0 0 0 ]
• a one-hot encoding makes no assumption about word similarity: all words are equally different from each other
• this is a natural representation to start with, though a poor one (see the sketch below)
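As a concrete illustration, here is a minimal sketch of the encoding above in Python/NumPy; the function name and the 1-based IDs are our assumptions, following the slide's w = 4 example:

```python
import numpy as np

def one_hot(word_id: int, vocab_size: int) -> np.ndarray:
    """Return e(w): all 0s except a 1 at the position of the word's ID.

    Assumes 1-based IDs, matching the slide's example (w = 4 sets the
    fourth component).
    """
    e = np.zeros(vocab_size)
    e[word_id - 1] = 1.0
    return e

print(one_hot(4, 10))  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```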
One-hot encoding
• The major problem with the one-hot representation is that it is very high-dimensional
• the dimensionality of e(w) is the size of the vocabulary
• a typical vocabulary size is ≈ 100,000
• a window of 10 words would correspond to an input vector of at least 1,000,000 units
• This has two consequences:
• vulnerability to overfitting: millions of inputs mean millions of parameters to train in a regular neural network
• computational expense
Language Modeling
• A language model is a probabilistic model that assigns a probability to any sequence of words, p(w1, ..., wT)
• by the chain rule, p(w1, ..., wT) = p(w1) p(w2 | w1) ... p(wT | w1, ..., wT−1), so the model reduces to estimating conditionals
• language modeling is the task of learning a language model that assigns high probabilities to well-formed sentences
• it plays a crucial role in speech recognition and machine translation systems
• ex. (machine translation): for the Portuguese input "uma pessoa inteligente", a good model assigns a higher probability to "a smart person" than to "a person smart"
N-gram Model
• An n-gram is a sequence of n words
• unigrams (n=1): "is", "a", "sequence", etc.
• bigrams (n=2): ["is", "a"], ["a", "sequence"], etc.
• trigrams (n=3): ["is", "a", "sequence"], ["a", "sequence", "of"], etc.
• n-gram models estimate the conditional p(wt | wt−(n−1), ..., wt−1) from n-gram counts
• the counts are obtained from a training corpus (a data set of text); a count-based sketch follows below
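A minimal sketch of count-based estimation for the bigram case (n = 2); the toy corpus is an assumption for illustration, and real models are trained on far larger corpora:

```python
from collections import Counter

tokens = "a smart person is a person who thinks".split()  # toy corpus (assumption)

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def p_next(w2: str, w1: str) -> float:
    """MLE estimate of p(w2 | w1) = count(w1, w2) / count(w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1] if unigram_counts[w1] else 0.0

print(p_next("person", "smart"))  # 1.0: "smart" is always followed by "person" here
```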
N-gram Model
• Issue: data sparsity
• we want n to be large, for the model to be realistic
• however, for large values of n, it is likely that a given n-gram will not have been observed in the training corpus
• smoothing the counts can help
• combine count(w1, w2, w3, w4), count(w2, w3, w4), count(w3, w4), and count(w4) to estimate p(w4 | w1, w2, w3), as in the interpolation sketch below
• this only partly solves the problem
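A sketch of the interpolation idea from the bullet above, under illustrative assumptions (toy corpus, hand-picked lambda weights; in practice the weights are tuned on held-out data):

```python
tokens = "a smart person is a person who thinks".split()  # toy corpus (assumption)
total = len(tokens)

def count(*ngram) -> int:
    """Number of times the given n-gram occurs in the toy corpus."""
    n = len(ngram)
    return sum(tuple(tokens[i:i + n]) == ngram for i in range(total - n + 1))

def p_interp(w1, w2, w3, w4, lambdas=(0.1, 0.2, 0.3, 0.4)) -> float:
    """Estimate p(w4 | w1, w2, w3) by mixing 4-gram, trigram, bigram, and
    unigram relative frequencies; the lambda weights are assumptions."""
    l1, l2, l3, l4 = lambdas
    return (l4 * count(w1, w2, w3, w4) / max(count(w1, w2, w3), 1)
            + l3 * count(w2, w3, w4) / max(count(w2, w3), 1)
            + l2 * count(w3, w4) / max(count(w3), 1)
            + l1 * count(w4) / total)

print(p_interp("a", "smart", "person", "is"))  # 0.8125 on this toy corpus
```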
Neural Network Language Model
• Solution:
• model the conditional p(wt | wt−(n−1), ..., wt−1) with a neural network (a minimal sketch follows below)
• learn word representations to allow transfer to n-grams not observed in the training corpus
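A minimal sketch of this solution, a Bengio-style forward pass with assumed (illustrative) dimensions: look up and concatenate the context word representations, pass them through a hidden layer, and softmax over the whole vocabulary. The size-V output layer is exactly the bottleneck noted two slides below:

```python
import numpy as np

V, d, n, H = 10_000, 64, 4, 128        # vocab size, embedding dim, n-gram order, hidden units
rng = np.random.default_rng(0)
C = rng.normal(size=(V, d))            # learned word representations (embedding table)
W = rng.normal(size=(H, (n - 1) * d))  # context-to-hidden weights
U = rng.normal(size=(V, H))            # hidden-to-output weights: one row per word

def next_word_probs(context_ids):
    """p(w_t | w_{t-(n-1)}, ..., w_{t-1}) over the whole vocabulary."""
    x = C[context_ids].reshape(-1)     # concatenate the n-1 context embeddings
    h = np.tanh(W @ x)                 # hidden representation
    logits = U @ h                     # scoring all V words is the costly step
    e = np.exp(logits - logits.max())
    return e / e.sum()                 # softmax

p = next_word_probs([12, 7, 104])      # three (= n-1) arbitrary context word IDs
print(p.shape, round(float(p.sum()), 6))  # (10000,) 1.0
```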
Neural Network Language Model
Bengio, Y., Schwenk, H., Senécal, J.-S., Morin, F., & Gauvain, J.-L. (2006). Neural probabilistic language models. In Innovations in Machine Learning (pp. 137-186). Springer Berlin Heidelberg.
NNLMs
• Predicting the probability of each next word is slow in NNLMs because the output layer of the network is the size of the dictionary.
Word2Vec
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013.
Word2Vec
[figure: word vector illustration featuring "San Francisco" and "France"]
• It was recently shown that the word vectors capture many linguistic regularities, for example:
• the vector operation vector('Paris') - vector('France') + vector('Italy') results in a vector that is very close to vector('Rome')
• vector('king') - vector('man') + vector('woman') is close to vector('queen'); both analogies are reproduced in the sketch below
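These analogies can be reproduced with pretrained vectors; a sketch using the gensim library (an assumption, since the slides name no implementation; the pretrained model is a sizable download):

```python
import gensim.downloader as api

# pretrained Google News word2vec vectors, loaded via gensim-data
wv = api.load("word2vec-google-news-300")

# vector('king') - vector('man') + vector('woman') ≈ vector('queen')
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# vector('Paris') - vector('France') + vector('Italy') ≈ vector('Rome')
print(wv.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1))
```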
From Words to Phrases
• How could we learn representations for phrases of arbitrary length?
• can we model relationships between words and multiword expressions?
• ex.: "consider" ≈ "take into account"
• can we extract a representation of full sentences that preserves some of their semantic meaning?
• ex.: "word representations were learned from a corpus" ≈ "we trained word representations on a text data set"
Recursive Neural Networks
• Idea: recursively merge pairs of word/phrase representations
• We need two things:
• a model that merges pairs of representations (see the sketch below)
• a model that determines the tree structure
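A minimal sketch of the first ingredient, the merge model, with assumed dimensions and untrained weights; here the tree structure is given by hand, whereas the full model (Socher et al., '11) also learns to score candidate merges and thereby choose the tree:

```python
import numpy as np

d = 50                                   # representation size (assumption)
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(d, 2 * d))    # composition weights (untrained here)
b = np.zeros(d)

def merge(c1: np.ndarray, c2: np.ndarray) -> np.ndarray:
    """Merge two child representations: p = tanh(W [c1; c2] + b)."""
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

# hand-given tree structure ((the cat) sat):
the, cat, sat = (rng.normal(size=d) for _ in range(3))
phrase = merge(merge(the, cat), sat)
print(phrase.shape)                      # (50,): one vector for the whole phrase
```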
Paragraph Vector
Paragraph Vector Applications
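A hedged sketch of learning paragraph vectors with the gensim library's Doc2Vec implementation of Le & Mikolov's method (an assumption, since the slides name no implementation; gensim 4.x API):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument("word representations were learned from a corpus".split(), ["d0"]),
    TaggedDocument("we trained word representations on a text data set".split(), ["d1"]),
]
# tiny illustrative settings; real training uses far more data and epochs
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

v0 = model.dv["d0"]                                   # learned paragraph vector
v_new = model.infer_vector("learning vectors for sentences".split())
print(v0.shape, v_new.shape)                          # (50,) (50,)
```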