A self training framework for exploratory discourse detection final

A self-training framework for
exploratory discourse detection
Zhongyu Wei
SoLAR symposium, Open University, UK, 26 June 2012

PhD student, SEEM, The Chinese University of Hong Kong, Hong Kong
SocialLearn intern, Open University, UK
zywei@se.cuhk.edu.hk

Outline
 Exploratory dialogue analysis
 A self-training framework
 Datasets and experiments
 Applications

Online learning resources explosion

Learning Forum
Online Seminar
Online Conference
Distant Education Platform

the critical, knowledge-building
discourse?...

How many points in the webinar
triggered learning/knowledge-building?

This person contributes a lot during the
chat.

This part appears to have very good
content that will provoke deeper learning

Data in this study taken from a 2 day OU conference in Elluminate & Cloudworks:

Exploratory dialogue analysis
 Exploratory dialogue
 … represents a joint, coordinated form of co-reasoning in language, with speakers sharing knowledge, challenging ideas, evaluating evidence and considering options …

Category | Description | Example
Challenge | Identifies that something may be wrong and in need of correction | I disagree. Freemind is a superb piece of software to use...
Evaluation | Has a descriptive quality | That's a really interesting approach
Extension | Builds on or provides resources that support discussion | I've embedded helen's slide share over in cloudworks http://link.com
Reasoning | The process of thinking an idea through. | Why intranet only? What meaning CLOSED in

Mercer, N. (2004). Sociocultural discourse analysis: analysing classroom talk as a social mode of thinking. Journal of Applied Linguistics, 1(2), 137-168.

Low exploratory dialogue
Time Contribution
3:12 PM LOL
3:12 PM It's not looking good.
3:13 PM Sorry, had to do that.
3:13 PM jaaa
3:13 PM Ouch!
3:13 PM It was a vuvuzela.
3:13 PM I though that was you @Alistair
3:13 PM I've taken away the vuvuzela from you now!
3:13 PM LOL

Higher exploratory dialogue
Time Contribution
2:42 PM I hate talking. :-P My question was whether "gadgets" were just
basically widgets and we could embed them in various web sites,
like Netvibes, Google Desktop, etc.
2:42 PM Thanks, that's great! I am sure I understood everything, but looks
inspiring!
2:43 PM Yes why OU tools not generic tools?
2:43 PM Issues of interoperability
2:43 PM The "new" SocialLearn site looks a lot like a corkboard where you
can add various widgets, similar to those existing web start pages.
2:43 PM What if we end up with as many apps/gadgets as we have social
networks and then we need a recommender for the apps!
2:43 PM My question was on the definition of the crowd in the wisdom of
crowds we acsess in the service model?

Exploratory dialogue detection
 Problem Statement
 Given an online chat session S = {d0, d1, …, dn}, where dk denotes the kth dialogue, classify each dk as exploratory or non-exploratory.
 Solution from learning analytics
 Sociocultural discourse analysis method
 Manual
 High precision and low recall
Category Cue phrases
Challenge But if, have to respond, my view
Evaluation Good example, good point
Extension More links, for example
Reasoning That is why, next step
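
The cue-phrase lookup in the table above can be sketched in a few lines. This is a minimal illustration, assuming case-insensitive substring matching; the names CUE_PHRASES, matched_categories and is_exploratory are hypothetical, and only the phrases listed on the slide are included.

```python
# Cue phrases copied from the slide; the matching strategy is an assumption.
CUE_PHRASES = {
    "challenge": ["but if", "have to respond", "my view"],
    "evaluation": ["good example", "good point"],
    "extension": ["more links", "for example"],
    "reasoning": ["that is why", "next step"],
}


def matched_categories(post):
    """Return the categories whose cue phrases occur in the post."""
    text = post.lower()
    return [
        category
        for category, phrases in CUE_PHRASES.items()
        if any(phrase in text for phrase in phrases)
    ]


def is_exploratory(post):
    """A post counts as exploratory if at least one cue phrase matches."""
    return bool(matched_categories(post))
```

A post with no cue phrase (for example "LOL") is labeled non-exploratory, which is why this kind of lookup tends towards high precision and low recall.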

Exploratory dialogue classification

[Diagram: Dialogue → Classifier → Exploratory Discourse / Non-Exploratory]
 Dialogue is represented by a feature vector.
 {I think she is right} → {I, think, she, is, right, I-think, think-she, she-is, is-right, I-think-she, think-she-is, she-is-right}
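
The feature vector above consists of unigrams, bigrams and trigrams joined with hyphens. A minimal sketch of that extraction, assuming whitespace tokenization (the function name ngram_features and the max_n parameter are hypothetical):

```python
def ngram_features(text, max_n=3):
    """Return all 1..max_n-grams of the text, joined with hyphens."""
    tokens = text.split()
    return [
        "-".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


features = ngram_features("I think she is right")
```

For the example sentence this yields the same twelve features listed on the slide, in the same order.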

Exploratory dialogue classification
 Instance-based supervised classifier training
[Diagram: Exploratory / Non-Exploratory instances → Classifier Training → Classifier → Exploratory Discourse / Non-Exploratory]

 Feature-based supervised classifier training
[Diagram: Exploratory / Non-Exploratory instances → Feature Generation → Exploratory / Non-Exploratory Feature List → Classifier Training → Classifier → Exploratory Discourse / Non-Exploratory]

An example of feature list

Feature | Exploratory | Non-Exploratory
what-is | 0.9992 | 0.0008
good-point | 0.9995 | 0.0005
your-audio-should | 0.001 | 0.999
thank-you | 0.004 | 0.996
my-name | 0.07 | 0.93
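
The slides do not say how the per-feature class probabilities are estimated. As one possible illustration only, the sketch below uses plain relative frequencies over labeled posts and keeps features whose dominant class probability reaches a threshold. The function name build_feature_list, the label strings and the use of relative frequencies are all assumptions, not the authors' method.

```python
from collections import Counter, defaultdict


def build_feature_list(labeled_posts, threshold=0.65):
    """labeled_posts: iterable of (features, label) pairs, with label
    "exploratory" or "non-exploratory". Returns a dict mapping each kept
    feature to (P(exploratory | f), P(non-exploratory | f))."""
    counts = defaultdict(Counter)  # feature -> label counts
    for features, label in labeled_posts:
        for feature in set(features):  # count each feature once per post
            counts[feature][label] += 1

    feature_list = {}
    for feature, by_label in counts.items():
        total = sum(by_label.values())
        p_exp = by_label["exploratory"] / total
        # Keep only features that lean clearly towards one class.
        if max(p_exp, 1.0 - p_exp) >= threshold:
            feature_list[feature] = (p_exp, 1.0 - p_exp)
    return feature_list
```

Features that occur equally often in both classes are dropped, so the resulting list contains only class-indicative n-grams.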

A self-training framework

[Diagram: Annotated data → Classifier training → Classifier]

 Step 1: Train an initial classifier on the annotated data.
 Annotated data is time-consuming to obtain.

[Diagram: Unlabeled data → Classifier → Instance selection → Pseudo-annotated data; Annotated data + Pseudo-annotated data → Classifier training → Classifier]

 Step 2: Classify the unlabeled data, select high-confidence instances and combine them with the annotated data.
 Step 3: Re-train the classifier on the augmented training data.

[Diagram: Unlabeled data → Classifier → Instance selection → Pseudo-annotated data; Annotated data + Pseudo-annotated data → Classifier training → Classifier; Test data → Classifier → Exploratory discourse detection → Results]

 Step 4: Obtain the final classifier when a stopping criterion is met: no improvement on the validation dataset, a fixed number of iterations is reached, or no class labels change.
 Step 5: Detect exploratory dialogues on the test data.
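
Steps 1 to 3 and part of Step 4 can be sketched as a short loop. This is a simplified illustration, not the authors' implementation: the classifier is a toy centroid model over a single numeric feature (train_centroid is a hypothetical stand-in for the Max Entropy classifiers), and only two stopping rules are shown (an iteration cap and "no confident instances left"); validation-based stopping is omitted.

```python
import math


def train_centroid(train):
    """Toy stand-in classifier: returns a function giving P(exploratory | x)
    from the distance of x to each class mean. Needs both classes present."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return lambda x: 1.0 / (1.0 + math.exp(abs(x - mu_pos) - abs(x - mu_neg)))


def self_train(labeled, unlabeled, train_fn=train_centroid,
               threshold=0.8, max_iter=10):
    """labeled: list of (x, y) with y in {0, 1}; unlabeled: list of x."""
    train, pool = list(labeled), list(unlabeled)
    model = train_fn(train)                     # Step 1: initial classifier
    for _ in range(max_iter):                   # stopping rule: iteration cap
        confident, rest = [], []
        for x in pool:                          # Step 2: classify and select
            p = model(x)
            if max(p, 1.0 - p) > threshold:
                confident.append((x, 1 if p >= 0.5 else 0))
            else:
                rest.append(x)
        if not confident:                       # nothing new to add: stop
            break
        train, pool = train + confident, rest
        model = train_fn(train)                 # Step 3: re-train
    return model, train


model, augmented = self_train([(0.0, 0), (10.0, 1)], [1.0, 9.0, 5.2])
```

In this toy run the points 1.0 and 9.0 are added with pseudo labels, while 5.2 stays in the unlabeled pool because its confidence never exceeds the 0.8 threshold.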

 Self-training can introduce noisy instances.

KNN based Instance Selection approach
 K nearest neighbors classification
Blue stands for “exploratory”; gray stands for “non-exploratory”.
K = 1: the nearest neighbor votes “exploratory”.
K = 2: the 2 nearest neighbors vote “exploratory”.
K = 5: the 5 nearest neighbors vote “non-exploratory”.

KNN based Instance Selection approach
Pseudo-annotated instances P = {p1, p2, …, pn}, with pk = (lk, ck), where lk is the pseudo label and ck is the confidence value.

Form a candidate list: choose the instances with ck > r.

For each pk in the candidate list, identify its K nearest neighbors and update the pseudo label of pk by KNN.

Obtain the new pseudo-annotated instances, P-updated.
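
The selection procedure above can be sketched as follows. The slides do not state which instances serve as neighbors or which distance is used; this sketch assumes neighbors are drawn from the other candidates and uses Euclidean distance over numeric feature vectors. The function name knn_select and the tuple layout are hypothetical.

```python
import math
from collections import Counter


def knn_select(pseudo, r=0.8, k=3):
    """pseudo: list of (vector, pseudo_label, confidence) tuples.
    Keeps instances with confidence > r and relabels each one by a
    majority vote of its k nearest neighbors among the other candidates."""
    candidates = [p for p in pseudo if p[2] > r]
    updated = []
    for i, (vec, label, conf) in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        neighbors = sorted(others, key=lambda c: math.dist(vec, c[0]))[:k]
        if neighbors:
            # Vote uses the neighbors' original pseudo labels.
            label = Counter(n[1] for n in neighbors).most_common(1)[0][0]
        updated.append((vec, label, conf))
    return updated
```

An instance whose pseudo label disagrees with its neighborhood is relabeled rather than discarded, which is one way to reduce the noise that self-training introduces.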

Data source: OU online conference
 4 sessions including 2634 posts.


Annotation
 Two annotators, each with one morning of training.
 Four categories are given.
 Kappa value (binary) is 0.5978 (moderate).
 Only posts given consistent labels by both annotators are kept.

Session | Total # | Exploratory # | Non-Exploratory #
OU_22AM | 529 | 380 | 149
OU_22PM | 661 | 508 | 153
OU_23AM | 456 | 310 | 146

Experiment Setup
 Baseline:
 CP: Cue-phrase-based method
 MEGE: Supervised Max Entropy GE (Generalized Expectation) approach (feature based)
 ME: Supervised Max Entropy approach (instance based)
 SMEGE: Self-training Max Entropy GE approach (feature based)
 SME: Self-training Max Entropy approach (instance based)
 Experiment Setup
 Use one session for training, one for testing and one for validation.
 During self-training, examples that include cue phrases are added to the training dataset in the first stage.
 Pseudo-labeled samples are added with the same ratio of exploratory to non-exploratory as in the training dataset.
 Confidence value: 0.8
 Feature threshold: 0.65

Evaluation Criterion

[Figure: predicted labels compared against annotated labels (Exp / NonExp)]
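
The results in this deck are reported as accuracy, precision, recall and F1. A minimal sketch of how these four measures are computed from gold and predicted labels, assuming "exploratory" is the positive class (the function name evaluate and the label strings are hypothetical):

```python
def evaluate(gold, predicted, positive="exploratory"):
    """Compute accuracy, precision, recall and F1 for a binary labeling."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Precision penalizes posts wrongly flagged as exploratory, while recall penalizes exploratory posts that were missed; F1 is their harmonic mean.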

Experiment Result
Approach | Accuracy | Precision | Recall | F1
Cue-Phrase | 0.5389 | 0.9523 | 0.4241 | 0.5865
MaxEnt | 0.8099 | 0.8526 | 0.8675 | 0.8499
MaxEntGE | 0.7932 | 0.8817 | 0.8078 | 0.8292
Self-training MaxEnt | 0.8088 | 0.8331 | 0.9011 | 0.8574
Self-training MaxEntGE | 0.8181 | 0.8818 | 0.8406 | 0.8554

 The cue-phrase method gives high precision but low accuracy.
 The feature-based self-training approach improves on all criteria (the last row).
 The instance-based self-training algorithm (4th row) performs even worse than supervised MaxEnt on accuracy and precision.

Experiment Result
Session | MaxEnt | MaxEnt-Selftrain | MaxEntGE | MaxEntGE-Selftrain
OU_22AM | 0.8190 | 0.8467 | 0.7887 | 0.8270
OU_22PM | 0.8034 | 0.8311 | 0.7738 | 0.8116
OU_23AM | 0.8268 | 0.8282 | 0.8114 | 0.8297
OU_23PM | 0.7906 | 0.7294 | 0.7989 | 0.8042

 The instance-based self-training algorithm (2nd column) is sensitive to the initial classifier’s performance.
 The feature-based self-training approach gives more stable results (the last column).

Transcript level visualization

Time line Visualization

[Chart: "Average Exploratory …" series over session time; x-axis from 9:28 to 12:05, y-axis from -60 to 80]

Chart callouts (sample posts):
1. anybody else with poor audio?
2. is anyone else having difficulty hearing this?
3. background noise makes it difficult to hear

1. Sheffield, UK not as sunny as yesterday - still warm
2. Greetings from Hong Kong
3. Morning from Wiltshire, sunny here!

1. See you!
2. bye for now!
3. bye, and thank you
4. Bye all for now

Time line Visualization
[Chart: "Average Exploratory …" series over session time; x-axis from 9:28 to 12:05, y-axis from -60 to 80]

Time | User Id | Content
11:46 AM | User_2 | added to which 2M often drops to 10% of that in peak times
11:47 AM | User_3 | I really disagree - ECDL was the starting point for many many first time users
11:47 AM | User_1 | online basics won't load in final third first
11:47 AM | User_1 | mobile won't work round her
11:47 AM | User_1 | and satlellite costs 40 a month for 1 gig data transfer
11:47 AM | User_4 | I think the issue about the skills needed to really embrace technologies is a huge one and with web 2.0 technologies things are becoming more complicated, as I say often you dont just get this stuff by attending a workshop, you have to participate and appropriate them to your interests and context and network of others.
11:47 AM | User_5 | We use myguide on mobile broadband for outreach. Works OK, but not great and thats in city centre boardering 3G/GPS.

User Visualization
Contribution Distribution of Users
[Scatter plot: Exploratory Message Count (y-axis, 0 to 50) against Total Message Count (x-axis, 0 to 60), one point per user]

Time | User Id | Content
11:42 AM | User_1 | because although some people can get 'online' the feed is so poor that many pages won't load. eg myguide
11:42 AM | User_1 | how much time and money was spent getting everyone to use a mobile phone?
11:43 AM | User_1 | nothing. because it was perceived to be useful, therefore there is no need to spend time and money on digitalinclusion, until the access to the internet works
11:44 AM | User_1 | in order to get a 2meg connection to everyone we need fibre to the final third

User Visualization
Contribution Distribution of Users
[Scatter plot: Exploratory Message Count (y-axis, 0 to 50) against Total Message Count (x-axis, 0 to 60), one point per user]

Time | User Id | Content
9:51 AM | User_6 | Hello Im a tutor at Saudi arabia branch
9:51 AM | Moderator | hello Saudi Arabia!
9:51 AM | User_6 | hi
9:52 AM | Moderator | Welcome Ashawa - did we meet in Kuwait a couple of years ago?
9:52 AM | User_6 | no actually
9:52 AM | Moderator | @ashawa - maybe next time
9:52 AM | User_6 | yes I wish

This step appears to have very good content that will provoke deeper learning

This step appears to have some content that will provoke deeper learning

This step appears to have little content that will provoke deeper learning

Conclusion
 We have extended our previously proposed self-training framework for exploratory discourse detection in synchronous text chat (Elluminate conference sessions).
 We proposed a K-nearest-neighbors-based instance selection method.
 We applied the proposed approach to the SocialLearn platform.

Future Work
Text analytics:
 Integrate KNN instance selection method into the
self-training framework
 Explore other features for exploratory dialogue
classification: inter-dialogue features, global
features.
 Build a more reliable dataset for sub-category classification: challenge, evaluation, reasoning, extension.

Future Work
Visual analytics:
 Investigate how these can be rendered most
usefully for educators and learners
 Investigate user feedback when deployed
 Different users will appreciate different levels of
detail
 Purdue Signals experience suggests that complex
underlying analytics should be usefully distilled into
very simple feedback
 But as analytics literacy grows, will users value
more powerful insights?

Acknowledgments
 Thanks for the guidance and consideration of Dr.
He Yulan, Dr. Simon and Dr. Rebecca.
 Thanks for the consideration from all the other
colleagues in Knowledge Media Institute.

Zhongyu Wei
The Chinese University of Hong Kong, Hong Kong
http://www.se.cuhk.edu.hk/~zywei/

Yulan He
The Open University, UK
http://people.kmi.open.ac.uk/yulan/

Simon Buckingham Shum
http://oro.open.ac.uk/view/person/sjb72.html

Rebecca Ferguson
http://oro.open.ac.uk/view/person/rf2656.html
