4. Language Representation
[Figure: a shared language representation feeds two separate task models (Task 1 Model, Task 2 Model).]
Language representation is common across tasks, enabling transfer learning.
7. Bag of Words (<2013)
Input text: This film really dragged
What the model sees: words as symbols.
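A minimal bag-of-words sketch using scikit-learn's CountVectorizer (the library choice is illustrative, not from the talk): each text becomes a vector of word counts, so word identity is all the model sees.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus: each review becomes a vector of word counts over the
# whole vocabulary; word identity is the only information kept.
corpus = [
    "This film really dragged",
    "The movie was boring",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())
# ['boring' 'dragged' 'film' 'movie' 'really' 'the' 'this' 'was']
print(bow.toarray())
# [[0 1 1 0 1 0 1 0]
#  [1 0 0 1 0 1 0 1]]
```

Note that the two reviews, although close in meaning, share no features at all; word embeddings (next slides) address exactly this.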
8. Word embeddings (2013-)
Word2Vec / GloVe / FastText
[Figure: each vocabulary entry (aardvark, abstain, abstract, ablation, …) maps to a learned vector; a context window over "[I] [pet] [the] dog [and] [it] [barked]" relates the center word "dog" to its context words; a task model over the embeddings of "This film really dragged" outputs sentiment: 92% negative, 3% neutral, 5% positive.]
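A sketch of how skip-gram training pairs are built from the slide's example sentence (window of ±3 to match the bracketed context words; Word2Vec then learns vectors by predicting each context word from the center word):

```python
# Skip-gram training pairs: for each center word, collect the words
# in a +/-3 window as prediction targets.
sentence = "I pet the dog and it barked".split()
window = 3

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

print([context for (center, context) in pairs if center == "dog"])
# ['I', 'pet', 'the', 'and', 'it', 'barked']  -- the slide's bracketed words
```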
9. Word embeddings (2013-)
Input text: This film really dragged
What the model sees: one embedding vector per word: (This), (film), (really), (dragged).
10. Word embeddings (2013-)
[Figure: the word vectors of "This film really dragged" and "The movie was boring" plotted in the same space; related words such as (film) and (movie) land near each other.]
Similar words are close in embedding space.
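"Close" is usually measured with cosine similarity; the 4-dimensional vectors below are invented for illustration (real Word2Vec/GloVe embeddings have 100-300 dimensions):

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical embeddings, made up for this example.
emb = {
    "film":  np.array([0.9, 0.1, 0.3, 0.0]),
    "movie": np.array([0.8, 0.2, 0.4, 0.1]),
    "dog":   np.array([0.0, 0.9, 0.1, 0.7]),
}

print(cosine(emb["film"], emb["movie"]))  # ~0.98: related words are close
print(cosine(emb["film"], emb["dog"]))    # ~0.11: unrelated words are far
```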
11. Contextual embeddings (2017-)
Word: ELMo, CoVe
Sentence: USE, GPDS, InferSent
[Figure: ELMo encodes "This film really dragged"; a task model on top outputs sentiment: 92% negative, 3% neutral, 5% positive. Diagram from "NLP's ImageNet Moment Has Arrived".]
12. Contextual embeddings (2017-)
Input text: This film really dragged
What the model sees: per-word vectors (This), (film), (really), (dragged), each one dependent on the other words in the sentence.
13. Contextual embeddings (2017-)
[Figure: contextual vectors for "This film really dragged" and "The movie was boring"; (dragged) and (boring) land near each other.]
In this context, "boring" and "dragged" are semantically similar.
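The slides use ELMo here; as an easy-to-run stand-in, the sketch below gets contextual token vectors from BERT (also a contextual encoder) via the HuggingFace transformers library and compares "dragged" with "boring" in their sentences:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    # Assumes `word` survives as a single WordPiece token, which holds
    # for common words like "dragged" and "boring" in this vocabulary.
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

v1 = word_vector("this film really dragged", "dragged")
v2 = word_vector("the movie was boring", "boring")
# Each vector depends on its whole sentence; in these contexts the
# similarity should come out fairly high.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```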
14. Fine-tuning (2018)
ULMFiT, OpenAI GPT, BERT, LM-LSTM (2015)
[Figure: one pre-trained model with two interchangeable heads: an LM head completing "The quick brown ?" as "fox", and a task head scoring "This film really dragged" as 92% negative, 3% neutral, 5% positive.]
15. Fine-tuning (2018)
[Figure: same diagram as the previous slide.]
Almost the entire model is pre-trained and then fine-tuned, with only a thin task-specific layer on top. The result is higher accuracy and reduced training-data requirements.
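A sketch of the "thin task-specific layer" idea in PyTorch, assuming a BERT-style encoder loaded through HuggingFace transformers (names are illustrative; the training loop is omitted):

```python
import torch.nn as nn
from transformers import AutoModel

class SentimentClassifier(nn.Module):
    """Pre-trained encoder plus a thin task head; during fine-tuning
    all weights, encoder included, are updated on the task data."""

    def __init__(self, encoder_name="bert-base-uncased", n_classes=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # The only task-specific parameters: one linear layer.
        self.head = nn.Linear(self.encoder.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] summary vector
        return self.head(cls)              # logits: negative / neutral / positive
```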
17.
Partial dialog:
Student: I'm looking for an engineering course
Advisor: How about CS481?
Student: I already have a heavy course load
Advisor: I suggest CS425
Candidate responses (100 in total):
Nice talking to you
CS221 is not too demanding
Hi, how can I help you?
What are you interested in?
+ 95 more
[Figure: a ranking model orders all 100 candidates; ranked responses: 1. I suggest CS425, 2. CS221 is not too demanding, …, 100. Hi, how can I help you?]
18. Response ranking as a classification problem
[Figure: the same partial dialog, paired with one candidate response at a time, is fed to a classifier that outputs Correct 0.87 / Incorrect 0.13; sorting the candidates by the "Correct" score produces the ranking.]
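A minimal sketch of this reduction, assuming a two-label sequence-pair classifier in the style of transformers' AutoModelForSequenceClassification (the function, the interface, and the label order are our illustration, not the talk's code):

```python
import torch

def rank_responses(model, tokenizer, dialog, candidates):
    """Score each (dialog, candidate) pair with a binary classifier and
    sort by P(correct): ranking reduced to classification."""
    scored = []
    for cand in candidates:
        # Encode dialog context and candidate as a sentence pair.
        enc = tokenizer(dialog, cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**enc).logits          # shape (1, 2)
        # Assumption: index 0 = incorrect, index 1 = correct.
        p_correct = torch.softmax(logits, dim=-1)[0, 1].item()
        scored.append((p_correct, cand))
    return sorted(scored, reverse=True)           # best candidate first
```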
19. Models Evaluated
Pre-trained, fine-tuned: OpenAI GPT**, BERT***
Feature-based: Multi-turn ESIM + ELMo (MT-EE)
Submitted model: Multi-Turn ESIM*
*Enhanced Sequential Inference Model
**Generative Pretrained Transformer
***Bidirectional Encoder Representations from Transformers
21. OpenAI GPT vs. BERT
OpenAI GPT:
• Unidirectional self-attention
• Standard language model pretraining
• BooksCorpus
• Designed for arbitrary text inputs (single sentence, two sentences, multiple choice, etc.), accomplished through delimiter tokens
BERT:
• Bidirectional self-attention
• “Masked” language model pretraining
• BooksCorpus + Wikipedia
• Optimizations for sentence pairs:
  • Architecture: segment embedding
  • Pre-training: next sentence prediction
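To make the masked-LM contrast concrete, here is a simplified sketch of how a BERT-style masked-LM example could be built (the real recipe also keeps 10% of selected tokens unchanged and swaps 10% for random tokens; a standard GPT-style LM instead predicts each next token left to right):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Simplified BERT-style masking: hide a random subset of tokens;
    the model is trained to recover the originals at masked positions."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(mask_token)
            labels.append(tok)        # loss is computed here
        else:
            inputs.append(tok)
            labels.append(None)       # no loss at unmasked positions
    return inputs, labels

inputs, labels = mask_tokens("the quick brown fox jumps over the lazy dog".split())
# e.g. inputs: ['the', 'quick', '[MASK]', 'fox', ...]
#      labels: [None,  None,    'brown',  None, ...]
```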
23.
Dialog context (Student): Do I need to study any math courses?
Candidate response (Advisor): You have completed all math required for your degree
[Figure: the dialog context and candidate response are fed to BERT, which predicts whether the response is correct (the "?" in the diagram).]
24. Input to BERT-Base:
[CLS] do I need to study any math courses ? [SEP] you have completed all math required for your degree . [SEP]
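This packing can be reproduced with a BERT WordPiece tokenizer; the sketch below uses HuggingFace transformers as a convenience:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

# Dialog context and candidate response as a sentence pair.
enc = tok("do i need to study any math courses?",
          "you have completed all math required for your degree.")

print(tok.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'do', 'i', 'need', 'to', 'study', 'any', 'math', 'courses', '?',
#  '[SEP]', 'you', 'have', 'completed', 'all', 'math', 'required', 'for',
#  'your', 'degree', '.', '[SEP]']
print(enc["token_type_ids"])  # segment embedding ids: 0s for the context, 1s for the response
```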