Challenges in Transfer
Learning in NLP
MADRID NLP MEETUP
29th May 2019
Lara Olmos Camarena
Olmos Camarena, Lara (2019). “Challenges in Transfer Learning in NLP”
Except as otherwise noted, this material is licensed under the Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International.
Contents
Introduction
‱ Motivation
‱ Evolution
‱ First definitions, challenges, frontiers
‱ Formal definition
Word embeddings
‱ Word embedding: typical definition
‱ Pre-trained word embedding
‱ Conceptual architecture
‱ Training word embeddings
‱ Evaluation of word embeddings
Next steps!
‱ Why? Word embedding limitations and new directions
‱ Universal Language Model Fine-tuning
‱ Transformer and BERT
Appendix
‱ Frameworks related
‱ NLP&DL: Education
‱ More references and papers
Introduction
Motivation
Natural Language Processing challenges
‱ Word enrichment and training meaningful representations require a lot of effort
‱ Enterprise tasks face specific, domain data of different sizes (difficulties with small datasets!)
‱ Text annotation is time-consuming
Machine Learning models
‱ Suffer with vector space models (VSM) built from tokens (bag of words), which are high-dimensional and sparse
‱ Loss of sequential and hierarchical information
‱ Loss of relations between words and semantics
Deep Learning and Transfer Learning!
‱ Transferring information from one machine learning task to another: no need to train from scratch, saving effort and time.
Transfer learning might involve
transferring knowledge from the
solution of a simpler task to a more
complex one, or involve transferring
knowledge from a task where there
is more data to one where there is
less data.
Most machine learning systems
solve a single task. Transfer
learning is a baby step towards
artificial intelligence in which a
single program can
solve multiple tasks.
Transfer learning definition from: https://developers.google.com/machine-learning/glossary/
More on Transfer Learning! https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a
Evolution
 Language modelling, often with neural network methods, addresses this problem with low-
dimensional vectors (embeddings) for the textual units selected: words, phrases or documents.
 Word embeddings obtained as the output of different models and algorithms:
 Random: uniform, Xavier
 Predictive models: word2vec (skip-gram, CBOW)
 Count-based or co-occurrence models: GloVe
 Deep neural network models with different blocks trained: CNN, RNN, LSTM, biLSTM
 Recent and more complex ones: subword-based FastText, biLSTM-based ELMo
 New perspectives in transfer learning in NLP:
 Use pre-trained word embeddings to initialize word vectors in embedding layers or other tasks (see the sketch after this list)
 Use pre-trained language models to directly represent text: Prodigy (spaCy), zero-shot learning for classifiers
(Parallel Dots), Universal Sentence Encoder (Google), ULMFit, BERT and the OpenAI Transformer,
to use later in more complex or task-specific models/systems.
NEW!
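As a concrete illustration of the first new perspective above (initializing an embedding layer with pre-trained word vectors), here is a minimal PyTorch sketch; the vocabulary, the stand-in vectors and the dimensions are illustrative assumptions, not part of the original slides.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical task vocabulary and pre-trained vectors (e.g. from word2vec or GloVe).
vocab = {"<unk>": 0, "client": 1, "branch": 2}
dim = 50
pretrained = {"client": np.random.rand(dim), "branch": np.random.rand(dim)}  # stand-in values

# Build the weight matrix: pre-trained vector where available, small random init otherwise.
weights = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype("float32")
for word, i in vocab.items():
    if word in pretrained:
        weights[i] = pretrained[word]

# Embedding layer initialized from the matrix; freeze=False lets it be fine-tuned on the task.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(weights), freeze=False)
print(embedding(torch.tensor([1, 2])).shape)  # (2, 50): vectors for "client" and "branch"
```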
First definitions, challenges, frontiers
Initial context: Neural network models in NLP, catastrophic forgetting
and frontiers
‱ A review of the recent history of natural language processing - Sebastian Ruder
‱ Neural language models (2001), multi-task learning (2008), word embeddings (2013), seq2seq (2014), 

‱ 2018-2019: pretrained language models
‱ Fighting with catastrophic forgetting in NLP - Matthew Honnibal (2017)
‱ Frontiers of natural language processing
Transfer learning in NLP: outreach articles and talks
‱ Rodríguez, Horacio. Embeddings, Deep Learning, Distributional Semantics en el Procesamiento de la
lengua, ... ÂżmĂĄs que marketing?
‱ NLP’s ImageNet moment
‱ ULMFiT, ELMo, and the OpenAI transformer represent one key paradigm shift: going from just initializing the
first layer of our models to pretraining the entire model with hierarchical representations. If learning
word vectors is like only learning edges, these approaches are like learning the full hierarchy of
features, from edges to shapes to high-level semantic concepts.
‱ Effective transfer learning for NLP – Madison May – Indico
‱ Transfer learning with language models – Sebastian Ruder
‱ T3chFest – Transfer learning for NLP and Computer Vision - Pablo Vargas Ibarra, Manuel López Sheriff –
Kernel Analytics
Formal definition
 Neural Transfer Learning for Natural Language Processing – Sebastian Ruder
 Previous work found: “Transfer learning for Speech and Language Processing”
https://arxiv.org/pdf/1511.06066.pdf
Word embeddings
Word embedding: typical definitions
Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language
processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers. Conceptually it involves a
mathematical embedding from a space with one dimension per word to a continuous vector space with a much lower dimension.
Methods to generate this mapping include neural networks, dimensionality reduction on the word co-occurrence
matrix, probabilistic models, explainable knowledge base method, and explicit representation in terms of the context in which
words appear.
Word and phrase embeddings, when used as the underlying input representation, have been shown to boost the performance in
NLP tasks such as syntactic parsing and sentiment analysis.
(Ref.: Wikipedia)
A word embedding is a learned representation from text where words that have the same meaning have a similar
representation.
(Ref.: Machine Learning Mastery)
The Distributional Hypothesis is that words that occur in the same contexts tend to have similar meanings (Harris, 1954). The
underlying idea that "a word is characterized by the company it keeps" was popularized by Firth (1957), and it is implicit in
Weaver's (1955) discussion of word sense disambiguation (originally written as a memorandum, in 1949). The Distributional
Hypothesis is the basis for Statistical Semantics. Although the Distributional Hypothesis originated in Linguistics, it is now
receiving attention in Cognitive Science (McDonald and Ramscar, 2001). The origin and theoretical basis of the Distributional
Hypothesis is discussed by Sahlgren (2008).
(Ref: ACL Wiki)
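As a toy illustration of the count-based route mentioned in the Wikipedia definition (dimensionality reduction on the word co-occurrence matrix), here is a small sketch; the three-sentence corpus, window size and output dimension are made up for the example.

```python
import numpy as np

corpus = ["the client visits the branch",
          "the client opens an account",
          "the branch opens early"]
window = 2
vocab = sorted({w for sentence in corpus for w in sentence.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a fixed window.
cooc = np.zeros((len(vocab), len(vocab)))
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                cooc[idx[w], idx[words[j]]] += 1

# Low-dimensional word vectors from a truncated SVD of the co-occurrence matrix.
U, S, _ = np.linalg.svd(cooc)
k = 2
word_vectors = U[:, :k] * S[:k]  # one k-dimensional dense vector per word
for w in vocab:
    print(w, word_vectors[idx[w]].round(2))
```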
Pre-trained word embedding
 A pre-trained word embedding helps to capture vocabulary from a corpus
(e.g. public or web corpora) that is not seen in the task-specific data
or in a limited training time, but that is used in general common language.
 The representation allows us to establish linear relations and capture
similarities between words from the syntax and contexts seen in the language
and domain where the word embedding was trained. Be careful with bias!
 If pre-trained on public and external corpora, some words will not be
recognized (the unknown-word problem) when dealing with jargon and slang!
Orthographic errors also show up as unknown words!
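A minimal gensim sketch of loading a pre-trained embedding and checking for the unknown-word problem described above; the file path and probe words (domain jargon and a typo) are assumptions for the example.

```python
from gensim.models import KeyedVectors

# Hypothetical pre-trained vectors in word2vec binary format (the path is an assumption).
vectors = KeyedVectors.load_word2vec_format("pretrained_vectors.bin", binary=True)

for word in ["client", "hipoteca", "acount"]:  # common word, domain jargon, spelling error
    if word in vectors:
        print(word, "-> nearest neighbours:", vectors.most_similar(word, topn=3))
    else:
        print(word, "-> out of vocabulary (unknown-word problem)")
```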
Conceptual Architecture
(common to offline/training and online/prediction/execution time)
Pipeline shown in the slide diagram:
‱ Text entry, processed (sentence, short text, long text)
‱ Mapping / look-up table from vocabulary to word vectors, e.g.
client → 0.29, 0.53, -1.1, 0.23, 
, 0.43
branch → 0.39, 0.49, -1.2, 0.42, 
, 0.98
‱ Representation obtained with the word embedding, e.g. concatenation as input:
[ [ 0.29, 0.53, -1.1, 0.23, 
, 0.43 ], 
, [ 0.39, 0.49, -1.2, 0.42, 
, 0.98 ] ]
‱ Downstream models: neural network model, unsupervised model, classifiers (
)
‱ Response for the specific task
‱ Evaluation at each stage
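A small sketch of the look-up and concatenation step in the pipeline above; the vocabulary, vector values and fallback strategy for unknown words are illustrative assumptions.

```python
import numpy as np

dim = 50
lookup = {                                   # mapping / look-up table: vocabulary -> word vectors
    "client": np.random.rand(dim),
    "branch": np.random.rand(dim),
}
unk = np.zeros(dim)                          # fallback vector for out-of-vocabulary tokens

def represent(text):
    """Map a processed text entry to a (num_tokens, dim) matrix for a downstream model."""
    tokens = text.lower().split()
    return np.stack([lookup.get(token, unk) for token in tokens])

features = represent("client branch")        # fed to a neural network, unsupervised model or classifier
print(features.shape)                        # (2, 50)
```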
Training word embeddings
 Different word embeddings trained for benchmarking – from the perspective of pre-trained
initialization of the embedding layer of our neural network model (a gensim training sketch follows this list)
 Initialization with random distributions: uniform, Xavier
 Trained with word2vec
 Trained with GloVe
 Trained with TensorFlow with noise-contrastive estimation (NCE)
 Using different corpora for training:
 public corpora (e.g. projects like the Billion Word corpus)
 force-context phrases created for supervised synonyms
 domain specific document corpus
 real text users/industry dataset
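A minimal gensim training sketch along the lines of the word2vec example on the next slide (vector size 50, window 3); the toy corpus stands in for the force-context phrases, and a recent gensim version is assumed (older releases use size/iter instead of vector_size/epochs).

```python
from gensim.models import Word2Vec

# Toy stand-in for the force-context phrases created for supervised synonyms.
sentences = [
    ["client", "customer", "account", "holder"],
    ["branch", "office", "agency"],
    ["client", "visits", "branch"],
]

# Skip-gram word2vec with hyperparameters mirroring the example slide.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("client", topn=3))
```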
GloVe example
Trained with force-context phrases built from the vocabulary
Vocabulary size = 856, window size = 3, vector size = 50
word2vec example
Corpus: force-context phrases, vocabulary size: 856 words
Algorithm: word2vec, word vector size: 50
Evaluation of word embeddings
 Intrinsic tests: discovering internal word synonyms. Are trained
synonyms still synonyms after embedding? Which embedding
improves similarity for our task?
 Qualitative evaluation: visualization of word nearest-neighbour clusters (t-SNE and PCA),
Brown clustering
 Quantitative evaluation: similarity tests with the cosine function and a decision threshold
 Extrinsic tests: testing word embedding improvement when changing
the initialization versus keeping it fixed in the task-specific model.
 Dataset, domain and task dependent
 The NLP processing pipeline must match between the input sentence and the word embedding
construction (stopword removal, normalization, stemming...)
Enso Benchmark - https://github.com/IndicoDataSolutions/Enso
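A sketch of the quantitative (cosine similarity with a threshold) and qualitative (t-SNE projection of neighbour clusters) checks listed above; the vectors, synonym pair and threshold value are made up for the example.

```python
import numpy as np
from sklearn.manifold import TSNE

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical trained vectors and a synonym pair to verify (intrinsic test).
vectors = {w: np.random.rand(50) for w in ["client", "customer", "branch", "office"]}
threshold = 0.7                                        # illustrative decision threshold
similarity = cosine(vectors["client"], vectors["customer"])
print("client ~ customer:", round(similarity, 3), "synonym?", similarity >= threshold)

# Qualitative check: project word vectors to 2D to visualize neighbour clusters.
words = list(vectors)
points = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(
    np.stack([vectors[w] for w in words])
)
for w, (x, y) in zip(words, points):
    print(w, round(float(x), 1), round(float(y), 1))
```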
Next steps!
Why? Word embedding limitations, new directions and challenges
 Handling out-of-vocabulary or unknown words
 Solved with subword-level embeddings (character and n-gram based): FastText embedding (see the sketch below)
 A unique representation per word: problems with polysemy and word senses!
 Solved with ELMo: Deep contextualized word representations
 Word embedding domain adaptation
 Lack of studies on bias and on the vector spaces themselves, also in evaluation
 Research transition from feature-based approaches to fine-tuning complex pre-trained Deep Learning architectures for
the principal NLP tasks: classification, inference, question answering, natural language generation, 

 Started in language modelling: sentence embeddings like spaCy vectors, Universal Sentence Encoder (DAN and Transformer)
 State of the art: ULMFit, Transformer, BERT, OpenAI GPT, 

 Challenges with Deep Learning difficulties: explainability of neural network models, the knowledge
representations they learn, and how to use and adapt them to real business problems ethically!
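A minimal sketch of the subword solution with gensim's FastText implementation: out-of-vocabulary words still receive a vector built from their character n-grams. The toy corpus and hyperparameters are assumptions.

```python
from gensim.models import FastText

sentences = [["client", "clients", "clientele"], ["branch", "branches"]]

# Character n-gram (subword) embeddings, so unseen words can still be represented.
model = FastText(sentences, vector_size=50, window=3, min_count=1, min_n=3, max_n=5, epochs=50)

print("clienta" in model.wv.key_to_index)   # False: never seen during training
vector = model.wv["clienta"]                # still gets a vector from its shared n-grams
print(vector.shape)                         # (50,)
```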
Universal Language Model Fine-tuning
Neural Transfer Learning for Natural Language Processing – Sebastian Ruder
Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint
arXiv:1801.06146 (2018).
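Two of the ULMFiT ideas (discriminative learning rates and gradual unfreezing) can be sketched in plain PyTorch as below. This is not the paper's full recipe (no slanted triangular learning rates, no AWD-LSTM); the layer groups, the factor of 2.6 per group and the optimizer choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the layer groups of a pre-trained language model plus a new head.
embedding = nn.Embedding(10000, 128)
encoder = nn.LSTM(128, 128, batch_first=True)
head = nn.Linear(128, 5)
layer_groups = [embedding, encoder, head]

# Discriminative learning rates: earlier (more general) groups get smaller rates.
base_lr = 1e-3
param_groups = [
    {"params": g.parameters(), "lr": base_lr / (2.6 ** (len(layer_groups) - 1 - i))}
    for i, g in enumerate(layer_groups)
]
optimizer = torch.optim.Adam(param_groups)

def set_trainable(group, flag):
    for p in group.parameters():
        p.requires_grad = flag

# Gradual unfreezing: start with only the head trainable, then unfreeze one more group per stage.
for stage in range(len(layer_groups)):
    for i, group in enumerate(layer_groups):
        set_trainable(group, i >= len(layer_groups) - 1 - stage)
    # ... run one fine-tuning epoch here with `optimizer` before the next stage ...
```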
Transformer
https://ai.google/research/pubs/pub46201
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N.,
... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural
information processing systems (pp. 5998-6008).
https://arxiv.org/pdf/1706.03762.pdf
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K.
(2018). Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint
arXiv:1810.04805.
https://arxiv.org/pdf/1810.04805.pdf
TO START: http://jalammar.github.io/illustrated-bert/
BERT
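To start experimenting, a short sketch of extracting contextual vectors from a pre-trained BERT model with a recent version of the Hugging Face transformers library (at the time of this talk the package was called pytorch-pretrained-bert, with a slightly different API); the model name and example sentence are assumptions.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The client visits the branch", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

token_vectors = outputs.last_hidden_state   # (1, seq_len, 768): one contextual vector per token
sentence_vector = token_vectors[:, 0]       # the [CLS] vector, one common pooling choice
print(token_vectors.shape, sentence_vector.shape)
```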
Conclusions
Thank you for your
attention!
Any questions?
Appendix
Frameworks related
 Gensim: https://radimrehurek.com/gensim/models/word2vec.html
 Glovepy: https://pypi.org/project/glovepy/
 spaCy: https://spacy.io/
 FastText – Facebook Research: https://github.com/facebookresearch/fastText
 Universal Sentence Encoder - TensorFlow:
https://tfhub.dev/google/universal-sentence-encoder/2
 AllenNLP - ELMo - http://aclweb.org/anthology/N18-1202
 https://allennlp.org/elmo
 Enso - https://github.com/IndicoDataSolutions/Enso
 BERT - https://github.com/google-research/bert
NLP&DL: Education
 Stanford University - Christopher Manning – Natural Language Processing with
Deep Learning
 https://www.youtube.com/watch?v=OQQ-
W_63UgQ&list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6
 http://cs224d.stanford.edu/
 http://web.stanford.edu/class/cs224n/
 University of Oxford - Deep Natural Language Processing
 https://github.com/oxford-cs-deepnlp-2017/lectures
 Bar-Ilan University – Yoav Goldberg – Senior Lecturer, Computer Science
Department, NLP Lab - http://u.cs.biu.ac.il/~nlp/
 Book: Neural Network Methods for Natural Language Processing
 Universitat PolitĂšcnica de Catalunya – Horacio RodrĂ­guez - Embeddings working
notes - http://www.cs.upc.edu/~horacio/docencia.html
http://www.lsi.upc.edu/~ageno/anlp/embeddings.pdf
More references and papers
 Academic material:
 Ruder, Sebastian. On word embeddings. http://ruder.io/word-embeddings-2017/, http://ruder.io/word-embeddings-1/
 Jurafsky, Daniel, & Martin, James H (2017). Speech and Language Processing. Draft 3rd Edition. Chapters 15-16. Vector Semantics and Semantics with Dense Vectors
 Goldberg, Yoav (2017). Neural Network Methods for Natural Language Processing (Book)
 Word embedding papers:
 Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
 Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
 Le, Quoc, and Mikolov, Tomas. "Distributed representations of sentences and documents." International conference on machine learning. 2014.
 Pennington, Jeffrey, Richard Socher, and Christopher Manning. "GloVe: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in
natural language processing (EMNLP). 2014.
 Mnih, Andriy, and Koray Kavukcuoglu. "Learning word embeddings efficiently with noise-contrastive estimation." Advances in neural information processing systems. 2013.
 Schnabel, Tobias and Labutov, Igor (2015). Evaluation methods for unsupervised word embeddings.
 Wang, Yanshan, and Liu, Sijia (2018). A Comparison of Word Embeddings for the Biomedical Natural Language Processing. arXiv:1802.00400v3
 Next steps!
 http://jalammar.github.io/illustrated-bert/
 Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint arXiv:1801.06146 (2018).
 Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.