Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling for Dialogue Topic Tracking
Seokhwan Kim, Rafael E. Banchs, Haizhou Li
Human Language Technology Department, Institute for Infocomm Research (I²R), Singapore
Dialogue Topic Tracking
Categorizing the topic state at each time step
f(t) =
  B-{c ∈ C}  if u_t is at the beginning of a segment belonging to c,
  I-{c ∈ C}  else if u_t is inside a segment belonging to c,
  O          otherwise.
Examples of dialogue topic tracking
Speaker | Utterance | Topic
Guide | How can I help you? | B-OPEN
Tourist | Can you recommend some good places to visit in Singapore? | B-ATTR
Guide | Well if you like to visit an icon of Singapore, Merlion will be a nice place to visit. | I-ATTR
Tourist | Okay. But I’m particularly interested in amusement parks. | B-ATTR
Guide | Then, what about Universal Studio? | I-ATTR
Tourist | Good! How can I get there from Orchard Road by public transportation? | B-TRSP
Guide | You can take the red line train from Orchard and transfer to the purple line at Dhoby Ghaut. Then, you could reach HarbourFront where Sentosa Express departs. | I-TRSP
Tourist | How long does it take in total? | I-TRSP
Guide | It’ll take around half an hour. | I-TRSP
Tourist | Alright. | I-TRSP
Guide | Or, you can use the shuttle bus service from the hotels in Orchard, which is free of charge. | B-TRSP
Tourist | Great! That would be definitely better. | I-TRSP
Guide | After visiting the park, you can enjoy some seafoods at the riverside on the way back. | B-FOOD
Tourist | Do you have any recommendations for food to try there? | I-FOOD
Guide | If you like spicy foods, you must try chilli crab which is one of our favourite dishes. | I-FOOD
Tourist | Great! I’ll try that. | I-FOOD
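The B-/I-/O labelling defined by f(t) above can be sketched as a small helper that expands per-segment annotations into one label per utterance. The function name and input layout are illustrative, not from the paper:

```python
# Hypothetical sketch: deriving the B-/I-/O label sequence f(t) from
# segment annotations. Input layout is illustrative, not the paper's format.

def bio_labels(segments):
    """segments: list of (num_utterances, category) pairs in dialogue order;
    category is None for out-of-topic spans. Returns one label per utterance."""
    labels = []
    for length, cat in segments:
        for i in range(length):
            if cat is None:
                labels.append("O")          # utterance outside any topic segment
            elif i == 0:
                labels.append(f"B-{cat}")   # first utterance of a segment
            else:
                labels.append(f"I-{cat}")   # utterance inside the segment
    return labels

# A 3-utterance ATTR segment followed by a 2-utterance TRSP segment:
print(bio_labels([(3, "ATTR"), (2, "TRSP")]))
# → ['B-ATTR', 'I-ATTR', 'I-ATTR', 'B-TRSP', 'I-TRSP']
```

This matches the example dialogue above, where each topic change starts a new B-labelled segment.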
Model 1: Convolutional Neural Networks (CNNs)
Convolutional neural network architecture for dialogue topic tracking
[Figure: CNN architecture. Input utterances u_{t-h+1} … u_t within window size h → embedding layer with three channels (current, previous, and history utterances) → convolutional layer with multiple kernel sizes → max-pooling layer → dense layer with softmax output]
Representing an utterance as a matrix with n rows of k-dimensional word vectors
Each input has three channels for the current, previous, and history utterances
A convolutional filter has the same width k and a window size m as its height
The maximum value is selected from each feature map in the max-pooling layer
The values from max pooling are forwarded to the fully-connected softmax layer
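A minimal numpy sketch of the convolution-and-max-pooling pipeline described above. The dimensions, random (untrained) weights, and single input channel are illustrative assumptions, not the paper's configuration; the 19 output labels correspond to B-/I- labels for nine categories plus O:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 50            # n words per utterance, k-dimensional word vectors
num_labels = 19          # 2 × 9 B-/I- labels plus O (nine topic categories)

utterance = rng.standard_normal((n, k))   # one channel only, for brevity

def convolve(u, filt):
    """Slide an m×k filter down the n×k utterance matrix (valid convolution)."""
    m = filt.shape[0]
    return np.array([np.sum(u[i:i + m] * filt) for i in range(u.shape[0] - m + 1)])

# Filters share the embedding width k and use window sizes m ∈ {2, 3, 4}
filters = [rng.standard_normal((m, k)) for m in (2, 3, 4) for _ in range(32)]

# Max pooling: keep the single maximum value of each feature map
pooled = np.array([convolve(utterance, f).max() for f in filters])

# Fully connected softmax layer over the pooled feature vector
W = rng.standard_normal((num_labels, pooled.size))
logits = W @ pooled
probs = np.exp(logits - logits.max()); probs /= probs.sum()
print(probs.shape)   # a distribution over the 19 topic labels
```

In the full model, the three input channels (current, previous, history) would each be convolved and pooled before the dense layer.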
Model 2: Recurrent Neural Networks (RNNs)
Recurrent neural network architecture for dialogue topic tracking
[Figure: RNN architecture. Input utterances u_{t-h+1} … u_t → utterance-level embedding layer → forward layer (s^f_{t-h+1} … s^f_t) and backward layer (s^b_{t-h+1} … s^b_t) → output labels y_{t-h+1} … y_t]
Each utterance is represented with k-dimensional pre-trained embeddings
The sequence of utterance vectors within h time steps is connected
Hidden states from uni-/bi-directional recurrent layers are passed to softmax
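The bi-directional recurrence over utterance embeddings can be sketched in numpy with simple (Elman-style) updates. All dimensions and random untrained weights are illustrative assumptions; the paper's variants also include LSTM cells, which replace the tanh update below:

```python
import numpy as np

rng = np.random.default_rng(1)
h, k, d = 5, 50, 32      # h time steps, k-dim utterance embeddings, d hidden units
utts = rng.standard_normal((h, k))   # stand-in for pre-trained utterance embeddings

Wf, Uf = rng.standard_normal((d, k)) * 0.1, rng.standard_normal((d, d)) * 0.1
Wb, Ub = rng.standard_normal((d, k)) * 0.1, rng.standard_normal((d, d)) * 0.1

def run(inputs, W, U):
    s, states = np.zeros(d), []
    for x in inputs:
        s = np.tanh(W @ x + U @ s)   # simple recurrent update (Elman RNN)
        states.append(s)
    return states

forward = run(utts, Wf, Uf)                # s^f_{t-h+1} … s^f_t
backward = run(utts[::-1], Wb, Ub)[::-1]   # s^b_{t-h+1} … s^b_t, time-aligned

# Concatenated hidden states are passed to a softmax layer at every step
V = rng.standard_normal((19, 2 * d))       # 19 = 2 × 9 B-/I- labels plus O
outputs = []
for sf, sb in zip(forward, backward):
    z = V @ np.concatenate([sf, sb])
    p = np.exp(z - z.max()); p /= p.sum()
    outputs.append(p)                      # one label distribution y_t per step
print(len(outputs), outputs[0].shape)
```

A uni-directional variant simply drops the backward pass and feeds s^f_t alone to the softmax.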
Model 3: Recurrent Convolutional Networks (RCNNs)
Recurrent convolutional network architecture for dialogue topic tracking
[Figure: RCNN architecture. Input utterances u_{t-h+1} … u_t → convolutional layer → max-pooling layer → forward layer (s^f_{t-h+1} … s^f_t) and backward layer (s^b_{t-h+1} … s^b_t) → output labels y_{t-h+1} … y_t]
Each feature vector generated after the max pooling layers in the CNN architecture
is connected to the recurrent layers in the RNN architecture
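The composition can be sketched by feeding each utterance's max-pooled CNN feature vector into a recurrent layer. A uni-directional forward pass with random untrained weights is shown; dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
h, n, k, d = 5, 10, 50, 32   # window of h utterances, n words each, k-dim vectors

def cnn_features(u, filters):
    """Max-pooled CNN feature vector for one n×k utterance matrix."""
    feats = []
    for f in filters:
        m = f.shape[0]
        maps = [np.sum(u[i:i + m] * f) for i in range(u.shape[0] - m + 1)]
        feats.append(max(maps))           # max pooling over each feature map
    return np.array(feats)

filters = [rng.standard_normal((m, k)) * 0.1 for m in (2, 3) for _ in range(16)]
window = rng.standard_normal((h, n, k))   # h input utterances

# Each max-pooled feature vector is connected to the recurrent layer
W = rng.standard_normal((d, len(filters))) * 0.1
U = rng.standard_normal((d, d)) * 0.1
s, states = np.zeros(d), []
for u in window:
    s = np.tanh(W @ cnn_features(u, filters) + U @ s)
    states.append(s)                      # hidden state per utterance in the window
print(len(states), states[0].shape)
```

The bi-directional (RCNN) and LSTM-cell (LRCN) variants extend this in the same way as for the plain RNN models.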
Evaluation
TourSG corpus
Human-human mixed-initiative dialogues
35 sessions, 21 hours, 31,034 utterances
Manually annotated with nine topic categories
Models
Baselines
Support Vector Machines (SVM)
Conditional Random Fields (CRF)
CNNs: learned from scratch/pre-trained word2vec
RNNs: uni-directional/bi-directional RNNs/LSTMs
RCNNs: uni-directional/bi-directional RCNNs/LRCNs
Results
Model | Features | P (%) | R (%) | F (%)
SVM | bag-of-ngrams, speaker | 59.85 | 59.94 | 59.90
SVM | doc2vec, speaker | 46.66 | 52.31 | 49.32
SVM | bag-of-ngrams, speaker, doc2vec | 59.91 | 60.01 | 59.96
CRF | bag-of-ngrams, speaker | 60.05 | 60.97 | 60.51
CRF | doc2vec, speaker | 61.77 | 49.57 | 55.00
CRF | bag-of-ngrams, speaker, doc2vec | 60.08 | 61.00 | 60.54
CNN | learned from scratch | 63.88 | 62.87 | 63.37
CNN | learned from pre-trained word2vec | 66.91 | 68.61 | 67.75
RNN | uni-directional | 49.51 | 53.75 | 51.55
RNN | bi-directional | 48.73 | 49.82 | 49.27
LSTM | uni-directional | 49.45 | 50.23 | 49.84
LSTM | bi-directional | 48.42 | 48.77 | 48.59
RCNN | uni-directional | 67.08 | 68.67 | 67.86
RCNN | bi-directional | 67.25 | 69.39 | 68.30
LRCN | uni-directional | 67.50 | 69.04 | 68.26
LRCN | bi-directional | 67.60 | 69.62 | 68.59
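Each F score is the harmonic mean of precision and recall, which can be checked against any row; the best bi-directional LRCN row is used here (small rounding differences are expected since P and R are themselves rounded):

```python
# F-measure as the harmonic mean of precision and recall,
# using the bi-directional LRCN row (P = 67.60, R = 69.62, F ≈ 68.59)
p, r = 67.60, 69.62
f = 2 * p * r / (p + r)
print(round(f, 2))   # agrees with the table up to rounding
```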
Error Distributions
[Figure: number of errors per model (SVM, CRF, CNN, LRCN), broken down by error type: missing, extraneous, wrong category, wrong boundary; y-axis: number of errors, 0–7500]
1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632 Email: kims@i2r.a-star.edu.sg