Chat bot made with Chainer
Chainer is a neural network framework.
The Japanese sentences have been removed from these slides. I do not know why they were removed.
This is the presentation material for PyCon JP 2016.
*******************************************
大串 正矢
Ogushi Masaya
Twitter:https://twitter.com/SnowGushiGit
Qiita:http://qiita.com/GushiSnow
Github:https://github.com/SnowMasaya
*******************************************
1. Chat Bot Made by Chainer
Chainer is a Neural Network Framework
PyCon JP 2016
Masaya Ogushi
2. Attention
I will not show any mathematical formulas.
If you already understand the machine learning models, I recommend reading the papers listed on the last page.
3. Agenda
Self Introduction
Dialogue Value
Character of the Bot
System
Future Plan
[Figure: chat bot overview. Components: Choose the Topic; Neural Network 1 and Neural Network 2 (each understands the contents, controls the dialogue, and generates an answer); answer candidates; questions are handled by Question Answer.]
5. Self Introduction
Name: Masaya Ogushi (@SnowGushiGit)
PORT Inc., Web Development, Research and Development team
Tech-Circle staff
Machine learning, natural language processing, crawler development, automated infrastructure construction, parallel processing, search functions
11. Dialogue Value
Continuity
The bot can use information from earlier in the conversation.
Example dialogue:
User: I really love playing tennis.
Chat Bot: That sounds great.
User: Yeah, well, I'm looking for a part-time job. Do you know any good ones?
Chat Bot (finds part-time job candidates): Well, a friend of mine owns a sports shop and is looking for help.
12. Dialogue Value
Interactive
The bot can react to new information.
Example dialogue:
Chat Bot: I found some delicious sweets.
User: I'm on a diet. Don't say such things while I'm on a diet.
Chat Bot: This food is good for losing weight.
User: Really???
19. Character of the Bot
Which one has a big gap between its looks and its character?
[Figure: two characters, each saying: I read "Statistical Machine Translation".]
20. Character of the Bot
Character is very important.
Example dialogue:
User: Do you know any good part-time jobs?
Chat Bot: I recommend Matsuya.
User: No, something more suited to me.
Chat Bot: How about a sweets shop?
User: Sounds good.
Notes on the slide:
Looks funny: "Food shop... maybe sweets are better." "Uhhh... he only thinks about food."
Learning allowance, surprise: "You understand it better than I expected."
21. Character of the Bot
Improves the experience of new users.
Lowers the expected value of the answer and makes the bot easier to talk to.
Image of the icon
Conversation of the bot
Cognitive science
First icon: expected value of the answer: high; easy to talk to: low. "Feel free to talk to me."
Second icon: expected value of the answer: low; easy to talk to: high. "Feel free to talk to me."
Talk to it
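22. Character of the Bot
Preparing separate sentences for each character is costly.
We have to make small changes to the same sentence.
*The example uses different levels of politeness in Japanese, the nuances of which are hard to translate into English.
あなたは食べたパンの数を把握されていますか? (very polite: "Are you aware of how many pieces of bread you have eaten?")
お前は食ったパンの数を覚えているか? (rough: "Do you remember how many pieces of bread you've eaten?")
あなたは食べたパンの数を覚えているの? (casual: "Do you remember how many pieces of bread you ate?")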
24. Character of the Bot
We would like to change the character without changing the content.
あなたは食べたパンの数を把握されていますか?
お前は食ったパンの数を覚えているか?
あなたは食べたパンの数を覚えているの?
25. Character of the Bot
Base sentence: "Do you remember how many pieces of bread you have eaten?"
Add the character: Woman, Fat Man, Steward
26. Character of the Bot
NeuralStoryTeller
Adds a character to ordinary sentences. The example on the slide adds romantic elements.
27. Character of the Bot
This can be applied to a variety of situations if we prepare sentences for each character.
28. Character of the Bot
I'm sorry.
I could not implement this character function.
31. System Architecture
The dialogue interface is Slack.
Conversation data is prepared from Twitter.
Pre-training uses Wikipedia data and the Dialogue Breakdown Collection.
The topic is chosen using WordNet and the Wikipedia Entity Vector.
The dialogue model is built with Chainer.
The question-answering function uses Elasticsearch.
[Figure: the chat bot overview diagram, now also showing WordNet and the Wikipedia Entity Vector.]
32. Agenda
Choose the Topic
33. System
Choose the Topic
The content of a conversation changes depending on the conversation partner.
Boyfriend: "Which is more important to you, me or your job?"
Younger sister: "Please tell me where you bought your clothes."
Father: "Please give me some money."
34. System
Choose the Topic
WordNet
A dataset in which words are grouped into sets of cognitive synonyms, each expressing a distinct concept.
Example: Scottish Fold, black cat, and orange cat all belong to the concept "Cat".
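As a rough illustration of this kind of grouping, the sketch below stores a few concepts as Python sets and looks up the concept a word belongs to. The dictionary contents and the function name `concept_of` are made up for illustration. The real WordNet data is far larger and is not loaded here.

```python
# Hypothetical miniature of a WordNet-style grouping: concept -> member words.
concept_members = {
    "cat": {"scottish fold", "black cat", "orange cat"},
    "dog": {"shiba", "tosa"},
}


def concept_of(word, groups):
    """Return the concept whose member set contains `word`, or None."""
    normalized = word.lower()
    for concept, members in groups.items():
        if normalized in members:
            return concept
    return None


print(concept_of("Black Cat", concept_members))
print(concept_of("koala", concept_members))
```

A word that is missing from every set, such as "koala" here, returns None. That is the unknown-word case the later slides deal with.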
37. System
How to group
Map the words into the concept space
Group them by distance
[Figure: points labeled cat, cat, cat, and Tiger in the concept space.]
38. System
Mapping to the concept space
Facebook and Twitter: are they close?
We cannot tell the distance just by comparing the words themselves.
We have to map the words into a space that makes it possible to measure the distance.
39. System
Mapping to the concept space
Entity Linking
Maps a keyword to the knowledge space.
[Figure: Facebook and Twitter, whose closeness is unclear as plain words, are placed in the knowledge space together with SNS.]
40. System
Choose the Topic
Mapping concepts to the knowledge space
Japanese Wikipedia Entity Vector!
Vector representations of words and of Wikipedia entries (knowledge)
(A Wikipedia entry is called an entity.)
43. System
Measuring the Concept Distance
Choose an appropriate distance measure for the mapping space.
If we choose the wrong distance measure, yellow looks close, although light blue is actually closer than yellow.
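The effect of the distance measure can be seen with a small, made-up example. The sketch below compares Euclidean distance with cosine similarity. The slide does not say which measures were compared, so both measures and all coordinates are assumptions chosen only to show that the nearest point can change.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical coordinates, not taken from the slides.
query = [1.0, 1.0]
points = {"yellow": [1.0, 0.0], "light blue": [4.0, 4.0]}

# Smallest distance wins for Euclidean; largest similarity wins for cosine.
nearest_by_euclidean = min(points, key=lambda name: euclidean_distance(query, points[name]))
nearest_by_cosine = max(points, key=lambda name: cosine_similarity(query, points[name]))

print(nearest_by_euclidean, nearest_by_cosine)
```

With these values, Euclidean distance picks "yellow", while cosine similarity picks "light blue" because it points in the same direction as the query.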
47. System
Choose the Topic
Add words that are unknown to WordNet by using the Wikipedia Entity Vector.
[Figure: the WordNet concept "Cat" contains Black Cat, White Cat, and so on. "calico cat" is close to them by cosine similarity in the Wikipedia Entity Vector space, so it is added to the concept as an unknown word.]
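One way to read this step is sketched below, under stated assumptions: the toy vectors stand in for Wikipedia Entity Vector lookups, and the function name `add_unknown_word` and the `threshold` value are hypothetical. The slides do not give the actual threshold or data format.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for Wikipedia Entity Vector lookups.
entity_vectors = {
    "black cat": [0.9, 0.1, 0.0],
    "white cat": [0.8, 0.2, 0.1],
    "calico cat": [0.85, 0.15, 0.05],
    "shiba": [0.1, 0.9, 0.2],
}
# Hypothetical WordNet-style concepts that do not yet know "calico cat".
concepts = {"cat": ["black cat", "white cat"], "dog": ["shiba"]}


def add_unknown_word(word, concepts, vectors, threshold=0.9):
    """Add `word` to the concept holding its most similar member, if similar enough."""
    best_concept, best_score = None, threshold
    for concept, members in concepts.items():
        score = max(cosine_similarity(vectors[word], vectors[m]) for m in members)
        if score >= best_score:
            best_concept, best_score = concept, score
    if best_concept is not None:
        concepts[best_concept].append(word)
    return best_concept


assigned = add_unknown_word("calico cat", concepts, entity_vectors)
print(assigned, concepts["cat"])
```

The threshold keeps a word out of every concept when nothing is similar enough. Whether the original system used such a cutoff is not stated.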
48. System
Choose the Topic
Calculate the average vector of each concept.
Cat: Black Cat: [0.2, 0.3, 0.4…], White Cat: [0.1, 0.3, 0.…], and so on, giving one average vector
Dog: Shiba: [0.1, 0.3, 0.4…], Tosa: [0.1, 0.2, 0.…], and so on, giving one average vector
49. System
Choose the Topic
If the average vectors of two concepts are close to each other, group those concepts together.
Cat: Black Cat: [0.2, 0.3, 0.4…], White Cat: [0.1, 0.3, 0.…], and so on, giving one average vector
Dog: Shiba: [0.1, 0.3, 0.4…], Tosa: [0.1, 0.2, 0.…], and so on, giving one average vector
The concepts are grouped by comparing their average vectors.
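The averaging and grouping steps can be sketched as follows. This is a minimal illustration, not the talk's implementation: the vectors, the concept names, the greedy grouping strategy, and the `threshold` value are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def group_concepts(concept_vectors, threshold=0.95):
    """Greedily merge concepts whose average vectors are similar enough."""
    averages = {name: average_vector(vecs) for name, vecs in concept_vectors.items()}
    groups = []
    for name in averages:
        for group in groups:
            # Compare against the first concept that started the group.
            if cosine_similarity(averages[name], averages[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical word vectors per concept, not the values from the slides.
concept_vectors = {
    "cat": [[0.9, 0.1], [0.8, 0.2]],
    "kitten": [[0.85, 0.2]],
    "dog": [[0.1, 0.9], [0.2, 0.8]],
}
print(group_concepts(concept_vectors))
```

Here "cat" and "kitten" end up in one group because their average vectors point in nearly the same direction, while "dog" stays separate.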
51. System
Choose the Topic
Choose the concepts that contain over 1000 words, because they match phrases easily.
[Figure: choosing the concept. Cat: Black Cat, White Cat, …; Dog: Shiba, Tosa, …; Bird: swan, duck, …; koala: Koala.]
53. System
Choose the Topic
How to choose the dialogue
Choose the concept with the highest word match rate.
Input: "Where can I buy cute clothes?"
Boyfriend: cool, nice guy, …
Younger sister: cute, clothes, …
Father: money, gentle, …
Calculate the word match rate for each concept.
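A minimal sketch of choosing a topic by word match rate is shown below. The slides do not define the rate, so it is assumed here to be the share of input words found in a concept's word list. The word lists, the whitespace tokenization, and the function names are also assumptions. Japanese input would need a morphological analyzer instead of `split()`.

```python
def word_match_rate(sentence_words, concept_words):
    """Share of the sentence's words that appear in the concept's word list."""
    if not sentence_words:
        return 0.0
    concept = set(concept_words)
    matched = sum(1 for word in sentence_words if word in concept)
    return matched / len(sentence_words)


def choose_topic(sentence, topics):
    """Return the topic whose word list best matches the sentence."""
    words = sentence.lower().replace("?", " ").split()
    return max(topics, key=lambda name: word_match_rate(words, topics[name]))


# Hypothetical word lists per conversation partner, loosely following the slide.
topics = {
    "boyfriend": ["cool", "nice", "guy"],
    "younger sister": ["cute", "clothes"],
    "father": ["money", "gentle"],
}

print(choose_topic("Where can I buy cute clothes?", topics))
```

For the example sentence, two of the six words ("cute" and "clothes") match the "younger sister" list, so that topic is chosen.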
54. Agenda
Understand Contents, Control Dialogue, Generate the Answer
59. System
Mapping natural language to a vector space using a bag of words
(Prepare a dictionary and count the words that appear in it)
Expression level, from low to high: word, phrase, sentence
Dictionary: I, show, am, me, your, you, …, when, are
"I":     1, 0, 0, 0, 0, 0, …, 0, 0
"am":    0, 0, 1, 0, 0, 0, …, 0, 0
"Shota": 0, 0, 0, 0, 0, 0, …, 0, 0
This is rarely used as is, because real data contains over a million words.
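The dictionary lookup on this slide can be sketched in a few lines. The dictionary below contains only the words visible on the slide (the elided middle part is left out), and the function names are hypothetical.

```python
from collections import Counter

# Only the dictionary words visible on the slide; the elided entries are omitted.
dictionary = ["I", "show", "am", "me", "your", "you", "when", "are"]


def one_hot(word, dictionary):
    """Vector with a 1 at the word's dictionary position, all zeros if unknown."""
    return [1 if word == entry else 0 for entry in dictionary]


def bag_of_words(words, dictionary):
    """Count how often each dictionary word occurs in `words`."""
    counts = Counter(words)
    return [counts[entry] for entry in dictionary]


print(one_hot("I", dictionary))
print(one_hot("am", dictionary))
print(one_hot("Shota", dictionary))
print(bag_of_words(["I", "am", "Shota"], dictionary))
```

"Shota" is not in the dictionary, so its vector is all zeros, matching the third row on the slide. The vector length always equals the dictionary size, which is why this representation becomes unwieldy for very large vocabularies.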
61. System
Deep learning is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships.
Expression level, from low to high: word, phrase, sentence
Distributed representations of words in a vector space, learned from data by deep learning:
"I":     0.5, 0.0, 1.0, 1.0, 0.3, 0.0
"am":    0.5, 0.0, 1.0, 1.0, 0.0, 0.0
"Shota": 0.5, 0.0, 1.0, 0.5, 0.3, 0.0
65. System
太郎 さん こんにちは ("Hello, Mr. Taro")
Focusing on the important words matters when phrasing a reply.
An attention model (a neural network) considers which words to focus on.
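To make the idea of "focus" concrete, here is a minimal sketch of dot-product attention, one of the scoring options described in the Luong et al. paper in the reference list. It is not the talk's model: the state vectors, the query, and the function names are made-up values for illustration, and a trained network would learn them.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(query, encoder_states):
    """Return attention weights per word and the weighted sum of the states."""
    # Dot product between the query and each word's state gives the raw score.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    size = len(encoder_states[0])
    context = [
        sum(weight * state[i] for weight, state in zip(weights, encoder_states))
        for i in range(size)
    ]
    return weights, context


words = ["太郎", "さん", "こんにちは"]
# Hypothetical hidden states for each word and a hypothetical decoder query.
encoder_states = [[1.0, 0.0], [0.0, 0.2], [0.1, 1.0]]
query = [2.0, 0.0]

weights, context = attend(query, encoder_states)
for word, weight in zip(words, weights):
    print(word, round(weight, 3))
```

With these made-up numbers the largest weight falls on 太郎, meaning the model would focus on that word when producing the next output.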
66. System
Value of the neural network
Expression
Continuity
Focus
Boyfriend: "Which is more important to you, me or your job?"
Younger sister: "Please tell me where you bought your clothes."
Father: "Please give me money."
69. System
Mapping phrases to a neural network space.
The middle layer expresses the neural network space.
Input phrase: 太郎 さん こんにちは
[Figure: the one-hot vector (太郎: 1, さん: 0, こんにちは: 0, …) appears twice in the network diagram.]
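70. System
Continuously learn from phrases
[Figure: a recurrent network processing 太郎 さん こんにちは. The hidden layer value from the time step of 太郎 is copied (copy the past value), passed through a transform matrix, and fed into the hidden layer for the next words (さん, こんにちは). The output layer holds a one-hot vector (0, 0, 0, 0, 1, …, 0).]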
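The "copy the past value" idea can be sketched as a plain recurrent step, where the new hidden layer depends on the current word and on the previous hidden layer. This is only an assumed, simplified form: the weights are fixed made-up numbers instead of learned ones, the hidden size is 2, and the talk's actual Chainer model may use a different recurrent unit.

```python
import math

vocabulary = ["太郎", "さん", "こんにちは"]

# Hypothetical fixed weights: 2 hidden units, 3 vocabulary entries.
w_input = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]]
# Hypothetical transform matrix applied to the copied past hidden layer.
w_recurrent = [[0.1, 0.7], [-0.4, 0.3]]


def one_hot(word):
    """One-hot vector for a word in the toy vocabulary."""
    return [1.0 if word == entry else 0.0 for entry in vocabulary]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(x, h_prev):
    """New hidden layer from the current input and the past hidden layer."""
    from_input = matvec(w_input, x)
    from_past = matvec(w_recurrent, h_prev)
    return [math.tanh(a + b) for a, b in zip(from_input, from_past)]


def encode(words):
    """Run the recurrent step over the words and return every hidden state."""
    hidden = [0.0, 0.0]
    states = []
    for word in words:
        hidden = rnn_step(one_hot(word), hidden)
        states.append(hidden)
    return states


states = encode(["太郎", "さん", "こんにちは"])
print(states[-1])
```

Because the past hidden layer is fed back in, the state for さん after 太郎 differs from the state for さん alone, which is how the network carries context through a phrase.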
78. System
Deciding whether the input is a question is very simple:
Is there a question mark (?)
If you are interested in detecting questions, I recommend reading the paper below.
Li, Baichuan, et al. "Question identification on twitter." Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 2011.
"Where can I buy cute clothes?" goes to Question Answer.
"Please tell me where I can find cute clothes" goes to Neural Network 1 (understand contents, control dialogue, generate answer).
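The rule above fits in a few lines. In this sketch the function names and the route labels are hypothetical, and checking the full-width question mark "？" is an added assumption for Japanese input.

```python
def is_question(text):
    """Rule from the slide: the input is a question if it has a question mark."""
    # The full-width mark is an assumption for Japanese text.
    return "?" in text or "？" in text


def route(text):
    """Send questions to the QA function and everything else to the dialogue model."""
    return "question_answer" if is_question(text) else "neural_network"


print(route("Where can I buy cute clothes?"))
print(route("Please tell me where I can find cute clothes"))
```

The second sentence is a request for the same information but has no question mark, so this simple rule sends it to the dialogue model instead of the QA function.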
82. Future Plan
Prepare enough tests: there is not enough test code yet
Evaluation: F-measure
Apply the latest Chainer: I hear the Trainer function is good
Combine rule-based and neural network approaches
NeuralStoryTeller: add the character
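For the planned evaluation, the F-measure can be computed as the harmonic mean of precision and recall. The sketch below assumes binary labels (for example, "is this input a question?"). The label lists are made up and are not results from the talk.

```python
def f_measure(predicted, expected):
    """F-measure (harmonic mean of precision and recall) for boolean labels."""
    pairs = list(zip(predicted, expected))
    true_positive = sum(1 for p, e in pairs if p and e)
    false_positive = sum(1 for p, e in pairs if p and not e)
    false_negative = sum(1 for p, e in pairs if not p and e)
    detected = true_positive + false_positive
    relevant = true_positive + false_negative
    precision = true_positive / detected if detected else 0.0
    recall = true_positive / relevant if relevant else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical predictions and gold labels, for illustration only.
predicted = [True, True, False, True]
expected = [True, False, True, True]
print(f_measure(predicted, expected))
```

In this example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F-measure is also 2/3.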
83. Conclusion
WordNet is a concept dataset
It is possible to find other data that express concepts
Words are mapped to a vector space using Wikipedia Entity Vectors
We can also build vector spaces from our own dataset
Hybrid function (neural network and rule-based)
Please search GitHub for “Chainer Slack Twitter”
Please give me a star
I have prepared a Docker container
Please search for “Docker hub Chainer-Slack-Twitter-Dialogue”
85. Reference
• Using a dialogue bot trained with Chainer on Slack, plus fine-tuning with training data collected from Twitter (Chainerで学習した対話用のボットをSlackで使用+Twitterから学習データを取得してファインチューニン)
• http://qiita.com/GushiSnow/items/79ca7deeb976f50126d7
• WordNet
• http://nlpwww.nict.go.jp/wn-ja/
• Japanese Wikipedia Entity Vector (日本語 Wikipedia エンティティベクトル)
• http://www.cl.ecei.tohoku.ac.jp/~m-suzuki/jawiki_vector/
• PAKUTASO
• https://www.pakutaso.com/
• Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
• Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015).
• Tech Circle #15 Possibility Of BOT
• http://www.slideshare.net/takahirokubo7792/tech-circle-15-possibility-of-bot
• Generating Stories about Images
• https://medium.com/@samim/generating-stories-about-images-d163ba41e4ed#.h80qhbd54
• Similarity of two strings (二つの文字列の類似度)
• http://d.hatena.ne.jp/ktr_skmt/20111214/1323835913
• Li, Baichuan, et al. "Question identification on twitter." Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 2011.
• Sound source: Skywalking (音源:スカイウォーキング)
• http://dova-s.jp/bgm/download5052.html
• Sound source: get into the rhythm (音源:get into the rhythm)
• http://dova-s.jp/bgm/download5145.html
• Syntactic parsing (構文解析)
• http://qiita.com/laco0416/items/b75dc8689cf4f08b21f6