4. Language Representation
[Figure: a shared language representation feeds two separate task models (Task 1 Model, Task 2 Model).]
Language representation is common across tasks, enabling transfer learning.
7. Bag of Words (<2013)
Input text: This film really dragged
What the model sees: words as symbols.
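A minimal bag-of-words sketch using scikit-learn's CountVectorizer (the library choice is illustrative, not from the talk): each text becomes a vector of word counts, so word identity is all the model sees.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus: each review becomes a vector of word counts over the
# whole vocabulary; word identity is the only information kept.
corpus = [
    "This film really dragged",
    "The movie was boring",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())
# ['boring' 'dragged' 'film' 'movie' 'really' 'the' 'this' 'was']
print(bow.toarray())
# [[0 1 1 0 1 0 1 0]
#  [1 0 0 1 0 1 0 1]]
```

Note that the two reviews, although close in meaning, share no features at all; word embeddings (next slides) address exactly this.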
8. Word embeddings (2013-)
Word2Vec / GloVe / FastText
[Figure: each vocabulary entry (aardvark, abstain, abstract, ablation, …) maps to a learned vector; a context window over "[I] [pet] [the] dog [and] [it] [barked]" relates the center word "dog" to its context words; a task model over the embeddings of "This film really dragged" outputs sentiment: 92% negative, 3% neutral, 5% positive.]
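A sketch of how skip-gram training pairs are built from the slide's example sentence (window of ±3 to match the bracketed context words; Word2Vec then learns vectors by predicting each context word from the center word):

```python
# Skip-gram training pairs: for each center word, collect the words
# in a +/-3 window as prediction targets.
sentence = "I pet the dog and it barked".split()
window = 3

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

print([context for (center, context) in pairs if center == "dog"])
# ['I', 'pet', 'the', 'and', 'it', 'barked']  -- the slide's bracketed words
```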
9. Word embeddings (2013-)
Input text: This film really dragged
What the model sees: one embedding vector per word: (This), (film), (really), (dragged).
10. Word embeddings (2013-)
[Figure: the word vectors of "This film really dragged" and "The movie was boring" plotted in the same space; related words such as (film) and (movie) land near each other.]
Similar words are close in embedding space.
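"Close" is usually measured with cosine similarity; the 4-dimensional vectors below are invented for illustration (real Word2Vec/GloVe embeddings have 100-300 dimensions):

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical embeddings, made up for this example.
emb = {
    "film":  np.array([0.9, 0.1, 0.3, 0.0]),
    "movie": np.array([0.8, 0.2, 0.4, 0.1]),
    "dog":   np.array([0.0, 0.9, 0.1, 0.7]),
}

print(cosine(emb["film"], emb["movie"]))  # ~0.98: related words are close
print(cosine(emb["film"], emb["dog"]))    # ~0.11: unrelated words are far
```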
11. Contextual embeddings (2017-)
Word: ELMo, CoVe
Sentence: USE, GPDS, InferSent
[Figure: ELMo encodes "This film really dragged"; a task model on top outputs sentiment: 92% negative, 3% neutral, 5% positive. Diagram from "NLP's ImageNet Moment Has Arrived".]
12. Contextual embeddings (2017-)
Input text: This film really dragged
What the model sees: per-word vectors (This), (film), (really), (dragged), each one dependent on the other words in the sentence.
13. Contextual embeddings (2017-)
[Figure: contextual vectors for "This film really dragged" and "The movie was boring"; (dragged) and (boring) land near each other.]
In this context, "boring" and "dragged" are semantically similar.
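The slides use ELMo here; as an easy-to-run stand-in, the sketch below gets contextual token vectors from BERT (also a contextual encoder) via the HuggingFace transformers library and compares "dragged" with "boring" in their sentences:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    # Assumes `word` survives as a single WordPiece token, which holds
    # for common words like "dragged" and "boring" in this vocabulary.
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

v1 = word_vector("this film really dragged", "dragged")
v2 = word_vector("the movie was boring", "boring")
# Each vector depends on its whole sentence; in these contexts the
# similarity should come out fairly high.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```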
14. Fine-tuning (2018)
ULMFiT, OpenAI GPT, BERT, LM-LSTM (2015)
[Figure: one pre-trained model with two interchangeable heads: an LM head completing "The quick brown ?" as "fox", and a task head scoring "This film really dragged" as 92% negative, 3% neutral, 5% positive.]
15. Fine-tuning (2018)
[Figure: same diagram as the previous slide.]
Almost the entire model is pre-trained and then fine-tuned, with only a thin task-specific layer on top. The result is higher accuracy and reduced training-data requirements.
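A sketch of the "thin task-specific layer" idea in PyTorch, assuming a BERT-style encoder loaded through HuggingFace transformers (names are illustrative; the training loop is omitted):

```python
import torch.nn as nn
from transformers import AutoModel

class SentimentClassifier(nn.Module):
    """Pre-trained encoder plus a thin task head; during fine-tuning
    all weights, encoder included, are updated on the task data."""

    def __init__(self, encoder_name="bert-base-uncased", n_classes=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # The only task-specific parameters: one linear layer.
        self.head = nn.Linear(self.encoder.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] summary vector
        return self.head(cls)              # logits: negative / neutral / positive
```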
17.
Partial dialog:
Student: I'm looking for an engineering course
Advisor: How about CS481?
Student: I already have a heavy course load
Advisor: I suggest CS425
Candidate responses (100 in total):
Nice talking to you
CS221 is not too demanding
Hi, how can I help you?
What are you interested in?
+ 95 more
[Figure: a ranking model orders all 100 candidates; ranked responses: 1. I suggest CS425, 2. CS221 is not too demanding, …, 100. Hi, how can I help you?]
18. Response ranking as a classification problem
[Figure: the same partial dialog, paired with one candidate response at a time, is fed to a classifier that outputs Correct 0.87 / Incorrect 0.13; sorting the candidates by the "Correct" score produces the ranking.]
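A minimal sketch of this reduction, assuming a two-label sequence-pair classifier in the style of transformers' AutoModelForSequenceClassification (the function, the interface, and the label order are our illustration, not the talk's code):

```python
import torch

def rank_responses(model, tokenizer, dialog, candidates):
    """Score each (dialog, candidate) pair with a binary classifier and
    sort by P(correct): ranking reduced to classification."""
    scored = []
    for cand in candidates:
        # Encode dialog context and candidate as a sentence pair.
        enc = tokenizer(dialog, cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**enc).logits          # shape (1, 2)
        # Assumption: index 0 = incorrect, index 1 = correct.
        p_correct = torch.softmax(logits, dim=-1)[0, 1].item()
        scored.append((p_correct, cand))
    return sorted(scored, reverse=True)           # best candidate first
```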
19. Models Evaluated
Pre-trained, fine-tuned: OpenAI GPT**, BERT***
Feature-based: Multi-turn ESIM + ELMo (MT-EE)
Submitted model: Multi-Turn ESIM*
*Enhanced Sequential Inference Model
**Generative Pretrained Transformer
***Bidirectional Encoder Representations from Transformers
21. OpenAI GPT vs. BERT
OpenAI GPT:
• Unidirectional self-attention
• Standard language model pretraining
• BooksCorpus
• Designed for arbitrary text inputs (single sentence, two sentences, multiple choice, etc.), accomplished through delimiter tokens
BERT:
• Bidirectional self-attention
• “Masked” language model pretraining
• BooksCorpus + Wikipedia
• Optimizations for sentence pairs:
  • Architecture: segment embedding
  • Pre-training: next sentence prediction
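To make the masked-LM contrast concrete, here is a simplified sketch of how a BERT-style masked-LM example could be built (the real recipe also keeps 10% of selected tokens unchanged and swaps 10% for random tokens; a standard GPT-style LM instead predicts each next token left to right):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Simplified BERT-style masking: hide a random subset of tokens;
    the model is trained to recover the originals at masked positions."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(mask_token)
            labels.append(tok)        # loss is computed here
        else:
            inputs.append(tok)
            labels.append(None)       # no loss at unmasked positions
    return inputs, labels

inputs, labels = mask_tokens("the quick brown fox jumps over the lazy dog".split())
# e.g. inputs: ['the', 'quick', '[MASK]', 'fox', ...]
#      labels: [None,  None,    'brown',  None, ...]
```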
23.
Dialog context (Student): Do I need to study any math courses?
Candidate response (Advisor): You have completed all math required for your degree
[Figure: the dialog context and candidate response are fed to BERT, which predicts whether the response is correct (the "?" in the diagram).]
24. Input to BERT-Base:
[CLS] do I need to study any math courses ? [SEP] you have completed all math required for your degree . [SEP]
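This packing can be reproduced with a BERT WordPiece tokenizer; the sketch below uses HuggingFace transformers as a convenience:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

# Dialog context and candidate response as a sentence pair.
enc = tok("do i need to study any math courses?",
          "you have completed all math required for your degree.")

print(tok.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'do', 'i', 'need', 'to', 'study', 'any', 'math', 'courses', '?',
#  '[SEP]', 'you', 'have', 'completed', 'all', 'math', 'required', 'for',
#  'your', 'degree', '.', '[SEP]']
print(enc["token_type_ids"])  # segment embedding ids: 0s for the context, 1s for the response
```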