1. Building a dialogue system
using a generative model
2020/06/25 1
M1 Kento Tanaka
(⽣成モデルに基づく対話システムの構築)
2. Background
Introduction
▶ Users increasingly rely on systems that can support an interaction for
information seeking (Siri, Alexa, etc.) [Zhou+, 2018]
▶ The use of neural networks (NN) has led to a flurry of research on
large-scale, non-task-oriented dialogue systems (DS). [Sordoni+, 2015]
Goal
Create a smooth and sociable dialogue system
3. Introduction (What is a ‘good’ chatbot?)
▶ One crucial step in the development of DS is evaluation.
[Deriu, 2019]
Human evaluations:
・High accuracy, but expensive and hard to scale
Automatic evaluations:
・Cheap, but low accuracy
・Metrics borrowed from MT (comparing a generated response to a target)
correlate only very weakly with human judgments.
4. Related works (Word overlap-based Metrics)
▶ BLEU-N [Liu, 2016]
・Analyzes the co-occurrences of n-grams between target and prediction.
・BP (brevity penalty): penalizes sentences that are too short.
▶ ROUGE-L [Liu, 2016]
・An F-measure based on the
LCS (Longest Common Subsequence).
Example:
- tgt : I work on machine learning.
- pred : He works on machine learning.
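These two metrics can be sketched in a few lines of Python. The snippet below (my illustration, not the slides' implementation; whitespace tokenization and the F-measure weight β are assumptions) computes clipped unigram precision, as in BLEU's modified precision, and ROUGE-L on the example pair above:

```python
from collections import Counter

def ngram_precision(pred, tgt, n=1):
    """Fraction of prediction n-grams that also appear in the target
    (clipped counts, as in BLEU's modified n-gram precision)."""
    p = Counter(tuple(pred[i:i + n]) for i in range(len(pred) - n + 1))
    t = Counter(tuple(tgt[i:i + n]) for i in range(len(tgt) - n + 1))
    overlap = sum(min(c, t[g]) for g, c in p.items())
    return overlap / max(sum(p.values()), 1)

def lcs_len(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f(pred, tgt, beta=1.2):
    """ROUGE-L: F-measure over LCS-based precision and recall."""
    lcs = lcs_len(pred, tgt)
    if lcs == 0:
        return 0.0
    r, p = lcs / len(tgt), lcs / len(pred)
    return (1 + beta ** 2) * p * r / (r + beta ** 2 * p)

tgt = "I work on machine learning .".split()
pred = "He works on machine learning .".split()
print(ngram_precision(pred, tgt, n=1))  # 4 of 6 unigrams overlap -> 0.666...
print(rouge_l_f(pred, tgt))             # LCS is "on machine learning ."
```

With the slide's example, the overlapping unigrams are "on", "machine", "learning", and the period, so both metrics land around 2/3.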
5. Related works (Embedding-based Metrics)
▶ Embedding Average [Liu, 2016]
・Sentence-level embedding computed by averaging word embeddings.
▶ Vector Extrema [Liu, 2016]
・Sentence-level embedding computed by taking the most extreme value
in each dimension.
▶ Greedy Matching [Liu, 2016]
・Average of the cosine similarity of each word with its most similar
word in the other sentence.
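A minimal sketch of all three embedding-based metrics, using made-up 3-dimensional word vectors (real implementations use pretrained embeddings such as word2vec; the toy vectors and sentences here are my assumptions):

```python
import math

# Toy 3-d word vectors (illustrative values, not trained embeddings)
VEC = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "movie": [0.1, 0.9, 0.3],
    "film":  [0.2, 0.8, 0.4],
}

def cos(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def embedding_average(sent):
    """Sentence vector = mean of its word vectors."""
    vs = [VEC[w] for w in sent]
    return [sum(col) / len(vs) for col in zip(*vs)]

def vector_extrema(sent):
    """Sentence vector = per dimension, the value with the largest magnitude."""
    vs = [VEC[w] for w in sent]
    return [max(col, key=abs) for col in zip(*vs)]

def greedy_match(s1, s2):
    """For each word, take the best cosine match in the other sentence;
    average, and symmetrize over both directions."""
    def one_way(a, b):
        return sum(max(cos(VEC[w], VEC[v]) for v in b) for w in a) / len(a)
    return (one_way(s1, s2) + one_way(s2, s1)) / 2

ref, hyp = ["good", "movie"], ["great", "film"]
print(cos(embedding_average(ref), embedding_average(hyp)))
print(cos(vector_extrema(ref), vector_extrema(hyp)))
print(greedy_match(ref, hyp))
```

All three compare a generated response to a target in embedding space rather than by surface word overlap, which is why semantically close but lexically different pairs still score high.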
6. Method.1
▶ OpenNMT
- An open-source ecosystem for neural machine translation
and neural sequence learning.
▶ Dataset
- Training data : 1.2M Twitter pairs.
- Test data : 100 Twitter pairs.
7. Method.2 (without proper nouns)
▶ OpenNMT (same as Method.1)
▶ Dataset
- Training data : 0.7M Twitter pairs.
- Test data : 100 Twitter pairs.
▶ Preprocessing
- Removing proper nouns from the dataset.
- Converting word endings to Kansai dialect (Kansai-ben) using rules.
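The slides do not list the actual conversion rules, so the sketch below is purely illustrative: it applies a small table of well-known standard-Japanese-to-Kansai sentence-ending correspondences (the rule table and function name are my assumptions, not the presenter's rule set):

```python
# Hypothetical word-ending rules (standard Japanese -> Kansai-ben).
# Longer patterns first, so the most specific ending wins.
KANSAI_RULES = [
    ("だよね", "やんな"),
    ("でしょう", "やろ"),
    ("だね", "やな"),
    ("だよ", "やで"),
    ("だ", "や"),
]

def to_kansai(utterance: str) -> str:
    """Rewrite the sentence ending using the first matching rule;
    leave the utterance unchanged if no rule matches."""
    for std, kansai in KANSAI_RULES:
        if utterance.endswith(std):
            return utterance[: -len(std)] + kansai
    return utterance

print(to_kansai("今日はいい天気だね"))  # -> 今日はいい天気やな ("Nice weather today, isn't it")
print(to_kansai("おはよう"))            # no rule matches -> unchanged
```

Ordering the rules from longest to shortest ending matters: otherwise the bare "だ" rule would fire before the more specific "だね" or "だよね" patterns.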
8. Method.3 (Considering context)
▶ OpenNMT (same as Method.1)
▶ Dataset
- Training data : 1.5M Twitter triple sets.
- Test data : 100 Twitter triple sets.
▶ Preprocessing
- Removing proper nouns from the dataset.
- Converting word endings to Kansai dialect (Kansai-ben) using rules.
▶ Context
- Learning with triples: two preceding turns as input, reply as output.
Input: 「晩ご飯どうする?」 & 「ハンバーグはどう?」
("What should we have for dinner?" & "How about hamburger steak?")
Output: 「昨日も食べたやん!カレーがええなぁ。」
("We had that yesterday! Curry sounds good." — in Kansai-ben)
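A sketch of how a dialogue triple (u1, u2, u3) can be turned into a (source, target) training pair for a seq2seq model: the two preceding turns are concatenated as the source. The slides do not state the concatenation format, so the "<sep>" separator token and the function name here are my assumptions:

```python
# Assumed separator token between context turns; the actual format
# used in the slides is not stated.
SEP = " <sep> "

def triple_to_pair(u1: str, u2: str, u3: str) -> tuple[str, str]:
    """Source = first two turns joined by SEP; target = third turn."""
    return (u1 + SEP + u2, u3)

src, tgt = triple_to_pair(
    "晩ご飯どうする?",                    # "What should we have for dinner?"
    "ハンバーグはどう?",                  # "How about hamburger steak?"
    "昨日も食べたやん!カレーがええなぁ。",  # "We had that yesterday! Curry sounds good."
)
print(src)
print(tgt)
```

Flattening the context into the source string this way lets a standard encoder-decoder (such as the OpenNMT models above) condition on two turns of history without any architectural change.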
9. Evaluation
Evaluation criteria

Human evaluations:
▶ Four criteria based on Grice’s maxims of conversation [Grice, 1975]
(1. Quality, 2. Quantity, 3. Relation, 4. Manner):
・Adaptability of dialogue
・Informativeness
・Completeness of utterance
・Consideration of context
▶ Grading on a 5-point scale
▶ Grading by 4 people

Automatic evaluations:
▶ Embedding Average
10. Result
Table 1. Human evaluations (5-point scale) and automatic evaluation

| Model  | Adaptability | Informative | Completeness | Context | Embedding Average |
|--------|--------------|-------------|--------------|---------|-------------------|
| Model1 | 3.045        | 2.185       | 3.195        | 2.45    | 0.51555           |
| Model2 | 2.94         | 2.05        | 2.97         | 2.285   | 0.52623           |
| Model3 | 3.11         | 3.18        | 2.92         | 2.58    | 0.93575           |

(Adaptability, Informative, Completeness, and Context are human evaluations;
Embedding Average is the automatic evaluation.)
・The increased input (two turns of context) may have contributed to the gains.
・Model3 scores best on Embedding Average.
11. Conclusion
▶ Created a generation-based dialogue system.
▶ Adaptability was low.
- Increase the amount of good-quality training data.
▶ The system yielded commonplace responses.
- Ideal: more diverse, interesting, and appropriate responses.
▶ Automatic evaluations that correlate highly with human judgment
are needed.