Invited talk given on 12-06-2015 at the University of Oxford, Oxford e-Research Centre.
The talk introduces our notion of socio-technical computation as the implicit purposeful collective action of human collectives on the Web, and transcendental information cascades as a means to capture it.
Relevant references:
[1] Luczak-Roesch, M., Tinati, R., Simperl, E., Van Kleek, M., Shadbolt, N., & Simpson, R. (2014). Why won't aliens talk to us? Content and community dynamics in online citizen science. Proceedings of the Eighth AAAI Conference on Weblogs and Social Media, ICWSM 2014, Ann Arbor, Michigan, USA, June 1-4, 2014.
[2] Markus Luczak-Roesch, Ramine Tinati, Kieron O'Hara, and Nigel Shadbolt. 2015. Socio-technical Computation. In Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing (CSCW'15 Companion). ACM, New York, NY, USA, 139-142. http://doi.acm.org/10.1145/2685553.2698991
[3] Markus Luczak-Roesch, Ramine Tinati, and Nigel Shadbolt. 2015. When Resources Collide: Towards a Theory of Coincidence in Information Spaces. To appear in WWW’15 Companion, May 18–22, 2015, Florence, Italy. http://dx.doi.org/10.1145/2740908.2743973
From coincidence to purposeful flow? Properties of transcendental information cascades.
1. From coincidence to purposeful flow? Properties of transcendental information cascades.
Markus Luczak-Roesch
University of Southampton, Web and Internet Science Group
@mluczak | http://sociam.org
6. Dominance of microposts and implicit coordination
[Chart: microposts and vocabulary shift across the projects PH, SG, SW, NN, GZ, CC, PF, SF, AP and WS; annotated value: 91%]
Luczak-Roesch, M., Tinati, R., Simperl, E., Van Kleek, M., Shadbolt, N., & Simpson, R. (2014). Why
won't aliens talk to us? Content and community dynamics in online citizen science. Proceedings of
the Eighth AAAI Conference on Weblogs and Social Media, ICWSM 2014, Ann Arbor, Michigan,
USA, June 1-4, 2014.
10. A qualitative investigation of crowdsourced
disaster response
• Haiti (Ushahidi, N=298)
– requests for help from
identified local source
• Congo (Ushahidi, N=102)
– information about the
situation but not who is
responsible for this
information
– more non-local sources
• Ebola (Twitter, N=298)
– comments
• tasteless jokes
• racist comments
• concern that the crisis could
spread and call to
governments to close the
borders
11. Boundaries of crowdsourced disaster response
• Wrong things go viral
• Crowdsourcing informativeness
of social media information not
synchronized with crises*
[Chart: sentiment scale: negative, neutral, positive]
“When you tell a […] kid that is has got Ebola”
*Olteanu, A., Vieweg, S., & Castillo, C. (2015). What to Expect When the Unexpected Happens: Social
Media Communications Across Crises. In Proc. of 18th ACM Computer Supported Cooperative Work
and Social Computing (CSCW’15), (No. EPFL-CONF-203562).
12. The future of disaster crowd work
Synchronization
Coordination
13. We can observe situations in which online communication does not
happen along explicit social ties (especially in critical situations
when time to make decisions is scarce). Instead of talking
explicitly with each other, people are
broadcasting about the same event or topic.
Source: United Nations Development Programme, https://goo.gl/Z1uXdV, CC BY-NC-ND 2.0
14. “An informational
cascade occurs when it is
optimal for an individual,
having observed the actions of
those ahead of him, to follow the
behavior of the preceding
individual without regard to
his own information.” [1]
[2]
[1] Bikhchandani, Sushil, David Hirshleifer, and Ivo Welch. "A theory of fads, fashion, custom, and cultural
change as informational cascades." Journal of Political Economy (1992): 992-1026.
[2] Cheng, Justin, et al. "Can cascades be predicted?." Proceedings of the 23rd international conference
on World wide web. International World Wide Web Conferences Steering Committee, 2014.
Boundaries of context-rich approaches
18. Does the accumulated information propagation behaviour on the
Web form giant purposeful processes?
Source: Michael Dales, https://goo.gl/IKXs4X, CC BY-NC 2.0
19. Discovering the algorithms of Social Machines
Socio-technical Computation
The computational capability embodied in cascades of information
sharing activities on the Web that are not necessarily conditioned by
system-specific or social network features but only time and inherent
properties of pairs of resources.
Markus Luczak-Roesch, Ramine Tinati, Kieron O'Hara, and Nigel Shadbolt. 2015. Socio-technical Computation. In
Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing
(CSCW'15 Companion). ACM, New York, NY, USA, 139-142. http://doi.acm.org/10.1145/2685553.2698991
20. Content streams as automata [3]
[Figure: 2-state model (states HF and LF) and infinite-state model; axes: time vs. number of observed documents]
[3] Kleinberg, Jon. "Bursty and hierarchical structure in streams." Data
Mining and Knowledge Discovery 7.4 (2003): 373-397.
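As a rough illustration of the 2-state model on this slide, the following sketch assigns each gap between consecutive documents to a low-rate state (0) or a high-rate "burst" state (1) by minimising a total cost, in the spirit of Kleinberg [3]. The function name `two_state_bursts`, the default values `s=2.0` and `gamma=1.0`, and the sample gaps are assumptions made for this example; they are not taken from the talk.

```python
import math


def two_state_bursts(gaps, s=2.0, gamma=1.0):
    """Label each inter-arrival gap with state 0 (base rate) or 1 (burst).

    Assumptions of this sketch: gaps are positive numbers, there are at
    least two of them, and the stream starts in state 0.
    """
    n = len(gaps)
    total = sum(gaps)
    # Exponential rates: the base rate n / total, and a burst rate scaled by s.
    rates = [n / total, s * n / total]
    # Moving up into the burst state costs gamma * ln(n); moving down is free.
    up_cost = gamma * math.log(n)

    def emit(state, gap):
        # Negative log-density of an exponential gap under the state's rate.
        return rates[state] * gap - math.log(rates[state])

    cost = [0.0, math.inf]  # the stream starts in state 0
    back = []
    for gap in gaps:
        new_cost, pointers = [], []
        for state in (0, 1):
            candidates = [
                cost[prev] + (up_cost if state > prev else 0.0)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best_prev] + emit(state, gap))
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards (Viterbi decoding).
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()
    return states


# Hypothetical stream: slow arrivals, a run of fast arrivals, slow again.
sample_gaps = [10.0] * 5 + [1.0] * 10 + [10.0] * 5
print(two_state_bursts(sample_gaps))
```

Larger values of `gamma` make the sketch more reluctant to enter the burst state, while `s` controls how much faster the burst state is than the base rate.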
23. Building transcendental information cascades
[Excerpt from the accompanying paper; text lost at the slide's crop is marked with […]. The citations [2] and [20] refer to that paper's own reference list.]
[…]nding of its use but also an abstract global […] propose a new model that we call transcendental information cascades. Informed by Kleinberg's work on […] document streams [2] it regards time as […] condition for relationships between any […] meaning that we focus on coincidence of […] activities rather than socially-determined conditionality.
In [20] we presented the initial definition of a transcendental information cascade as a 4-tuple $TC = (V, E, R, F)$. This 4-tuple represents a directed network consisting of a set of nodes $V$ and edges $E$, derived when applying a set of matching functions $F$ to a set of resources $R = \{r_1, r_2, \ldots, r_m\}$, $r_i = (u_i, t_i, c_i)$, where every $u_i$ is a unique identifier of a resource $r_i$ that was shared at the time $t_i$ with the content $c_i$. Nodes in the network are those resources from $R$ that contain a set $I_i$ of one or multiple cascade identifiers. A cascade identifier is any unique informational pattern that is recognized by applying a matching function to the content or any other inherent properties of a resource (e.g. simple string matching algorithms to identify keywords in content). Formally a matching function $f_k \in F$, $k \in \mathbb{N}$, $k \le n$ is defined as:
\[
f_k(c_i) =
\begin{cases}
\{i_1, i_2, \ldots, i_x\} & \text{if } f_k \text{ matches patterns } \{i_1, i_2, \ldots, i_x\} \text{ in } c_i,\ x \in \mathbb{N} \\
\emptyset & \text{otherwise}
\end{cases}
\]
Nodes $V$ and edges $E$ are then given as follows
\[
\begin{aligned}
V &= \{v_1, v_2, \ldots, v_p\}, & v_y &= (u_y, t_y, I_y), \\
E &= \{e_1, e_2, \ldots, e_q\}, & e_z &= (u_a, u_b, \Lambda_z)
\end{aligned}
\]
with $I_i = \{i_1, i_2, \ldots, i_o\} = f_1(c_i) \cup f_2(c_i) \cup \ldots \cup f_n(c_i)$ being the result of the concatenation of all identifiers found by all matching functions (footnote 2). An edge exists between any two nodes that share a unique subset of all the cascade identifiers that were found for them. This subset and none of its subsets is part of the identifiers found for any node that was created in the time period between when the two linked nodes were created.
\[
\Lambda_z = \{\, i_r \mid i_r \in I_a \wedge i_r \in I_b,\ \forall i_r \rightarrow V' = \{ v_c \mid v_c = (u_c, t_c, I_c),\ i_r \in I_c \wedge t_a \le t_c \le t_b \} = \emptyset,\ v_c \in V,\ r \in \mathbb{N},\ r \le |I_b| \,\}
\]
A node that contains a cascade identifier that was not detected for any other nodes before is called the identifier root. Besides this we call a node without any incoming edges a network root and a node that has no outgoing edges a stub.
Our cascade model clearly yields different outputs depending on the data to hand (e.g. determined by the extent of the Web crawl), and the matching algorithms determining which cascade identifiers will be spotted (e.g. reuse of hashtags, URIs, quotes, images, or maybe exploiting wider semantics or sentiment) as depicted in Figure 1.
Footnote 2: Please note that [20] contains an unintentionally malformed equation for this as the wrong symbol was used to refer to the concatenation of the matching functions.
Fig. 1. Depending on the applied matching functions, different transcendental information cascade representations can be generated for the same input data.
A fictive example of a transcendental cascade based on our model is shown in Figure 2. Consider a system that features hashtags as an established form of identifying content patterns. The visualisation uses the following approach to represent distinct identifiers and time: Nodes are chronologically ordered alongside the horizontal dimension from left (the oldest node) to right (the most recent node); additionally nodes are ordered alongside the vertical dimension depending on the set of identifiers present in a node (each unique set is assigned to a distinct level). Consequently, the visualisation represents the content creation sequence (“#A”) - (“#A#B”) - (“#A”) - (“#A”) - (“#A#B#C”) - (“#C”) - (“#A”) - (“#B#D”) - (“#A”).
Fig. 2. Example of a cascade that emerges along five different identifiers. #A, #B, #A#B#C, #B#D and #C are fictive hashtags (or hashtag combinations respectively) treated as the identifying content patterns.
In order to understand how edges are labelled we highlight the sub-graph involving the nodes 2, 3, 4, and 5. Conforming to our cascade model an edge exists between nodes 2 and 3 […]
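To make the definition above more concrete, here is a minimal Python sketch that builds a cascade for the fictive hashtag sequence of the example. It rests on several assumptions that are not part of the slides: individual hashtags (found with a simple regular expression) serve as cascade identifiers, timestamps are distinct, "between" is read as excluding the two linked nodes themselves, and the names `Resource`, `hashtag_matcher`, `build_cascade` and `roots_and_stubs` are made up for this illustration.

```python
import re
from collections import namedtuple

# A resource r_i = (u_i, t_i, c_i): unique id, sharing time, content.
Resource = namedtuple("Resource", "uid time content")


def hashtag_matcher(content):
    """Example matching function f_k: the set of hashtags found in c_i."""
    return set(re.findall(r"#\w+", content))


def build_cascade(resources, matchers):
    """Return (nodes, edges) for the given resources and matching functions.

    nodes: list of (uid, time, identifiers), ordered by time.
    edges: dict mapping (uid_a, uid_b) to the set of identifiers that the
           two nodes share and that no node in between contains.
    """
    nodes = []
    for res in sorted(resources, key=lambda r: r.time):
        identifiers = set()
        for match in matchers:
            identifiers |= match(res.content)  # union over all f_k
        if identifiers:  # only resources with identifiers become nodes
            nodes.append((res.uid, res.time, frozenset(identifiers)))

    edges = {}
    last_seen = {}  # identifier -> uid of the most recent node containing it
    for uid, _time, identifiers in nodes:
        for ident in identifiers:
            if ident in last_seen:
                edges.setdefault((last_seen[ident], uid), set()).add(ident)
            last_seen[ident] = uid
    return nodes, edges


def roots_and_stubs(nodes, edges):
    """Network roots have no incoming edges; stubs have no outgoing edges."""
    sources = {a for a, _ in edges}
    targets = {b for _, b in edges}
    roots = [uid for uid, _, _ in nodes if uid not in targets]
    stubs = [uid for uid, _, _ in nodes if uid not in sources]
    return roots, stubs


# The fictive content creation sequence from the example, nodes numbered 1..9.
posts = ["#A", "#A#B", "#A", "#A", "#A#B#C", "#C", "#A", "#B#D", "#A"]
resources = [Resource(i + 1, i, text) for i, text in enumerate(posts)]
nodes, edges = build_cascade(resources, [hashtag_matcher])
for (a, b), label in sorted(edges.items()):
    print(a, "->", b, sorted(label))
print(roots_and_stubs(nodes, edges))
```

Under these assumptions, nodes 2 and 3 are linked via #A, while nodes 2 and 5 are linked via #B because no node between them carries #B. Passing a different list of matching functions yields a different cascade for the same resources.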
25. Capturing the unintended action resulting from information
sharing activities of human collectives.
[Figure: a document stream along the time axis t and the corresponding transcendental information cascade]
26. Temporal text/data mining
[Cropped excerpt from [5]; text lost at the slide's crop is marked with […].]
[Equation cropped on the slide: a normalised sum over the time window [t − W/2, t + W/2]]
[…] of each theme can then be modeled as the […] theme strengths over time.
[…] of theme life cycles thus involves the follow[…] (1) Construct an HMM to model how themes […] each other in the collection. (2) Estimate the parameters of the HMM using the whole stream […] observed example sequence. (3) Decode the collection […] label each word with the hidden theme model […] is generated. (4) For each trans-collection […] when it starts, when it terminates, and how […]
EXPERIMENTS AND RESULTS
[…] Preparation
[…] are constructed to evaluate the proposed […] methods. The first, tsunami news data, con[…] articles about the event of Asia Tsunami dated […] to Feb. 8 2005. We downloaded 7468 news […]0 selected sources, with the keyword query […] shown in Table 1, three of the sources are in […] are in Europe and the rest are in the U.S.
[Table 1 in [5]: news sources of the Asia Tsunami data set. Legible rows: Times of India (India), VOA (US), Washington Post (US), Washington Times (US), Xinhua News (China); the remaining rows are cropped.]
[…] with the previous one. We use the mixture model discussed in Section 3 to extract the most salient themes in each time interval. We set the background parameter λB = 0.95 and number of themes in each time interval to be 6. The variation of λB is discussed later. Table 3 shows the top 10 words with the highest probabilities in each theme span. We see that most of these themes suggest meaningful subtopics in the context of the Asia tsunami event.
[Figure 6 in [5]: Theme evolution graph for Asia Tsunami]
With these theme spans, we use KL-divergence to further identify evolutionary transitions. Figure 6 shows a theme evolution graph discovered from Asia Tsunami data when the threshold for evolution distance is set to ξ = 12. From Figure 6, we can see several interesting evolution threads which are annotated with symbols.
The thread labeled with a may be about warning systems […]
[4] Subašić, I., & Berendt, B. (2013). Story graphs: Tracking document set evolution using dynamic graphs.
Intelligent Data Analysis, 17(1), 125-147.
[5] Mei, Q., & Zhai, C. (2005, August). Discovering evolutionary theme patterns from text: an exploration of
temporal text mining. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge
discovery in data mining (pp. 198-207). ACM.
[5]
“The key notion of
TTM is burstiness –
sudden increases in
frequency of text
fragments, and all TTM
methods aim to model
burstiness.” [4]
27. There is more than one “reality”
[Figure: time axis t; matching functions F1 … Fn; cascades C11, C21, C22 and C23; nodes labelled with the time points t0 … t8]
28. Analyzing low-level properties of the multiple
states of a system that exist at the same time
[Figure: panels for Tags, URIs, and KID & APH; numeric labels 4, 1, 15 and 10; annotations: single node motifs, long uniform paths, short uniform paths, long non-uniform paths]
29. Analyzing low-level properties of the multiple
states of a system that exist at the same time
[Figure panels: Tags, URIs, KID & APH; measure shown: identifier entropy]
Fig. 4. Overview of the results of the cascade comparison. Cascade size distribution and Wiener index are plotted on a log-log scale; identifier entropy is plotted with a log scale on the y-axis.
[…] one or few identifiers equally distributed. Very large identifiers (KID, APH, URIs), cascades which are based on […]
Varying profiles of increasing randomness with growing cascade size
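The slides do not spell out how identifier entropy is computed. As one plausible reading, offered purely as an assumption, the sketch below takes the Shannon entropy (in bits) of how often each identifier occurs across the nodes of a cascade; the function name `identifier_entropy` and the sample data are hypothetical.

```python
import math
from collections import Counter


def identifier_entropy(node_identifier_sets):
    """Shannon entropy (bits) of identifier occurrences across cascade nodes.

    Assumption of this sketch: every occurrence of an identifier in a node
    counts once, and the entropy is taken over the relative frequencies.
    """
    counts = Counter(
        ident for identifiers in node_identifier_sets for ident in identifiers
    )
    total = sum(counts.values())
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


# A cascade dominated by one identifier versus one with four equally
# frequent identifiers (hypothetical data).
print(identifier_entropy([{"#A"}, {"#A"}, {"#A"}]))
print(identifier_entropy([{"#A"}, {"#B"}, {"#C"}, {"#D"}]))
```

With this reading, a cascade built on a single identifier has entropy 0, and the value grows as more identifiers occur with similar frequency, which is one way to quantify the "increasing randomness" noted on the slide.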
31. [Figure: time axis t; matching functions F1 … Fn; cascades C11, C21, C22 and C23]
Formalising the
multiple possible
representations of
a system at any time
and their relationships.
Not all representing
purposeful action but
reflecting useful
informational properties.
32. By focusing only on the
coincidence of
information occurrence,
we can capture and
analyse emergent
collective action across
system boundaries and
independent from social
network contexts.
Markus Luczak-Roesch
@mluczak
http://markus-luczak.de
Source: Giulia Forsythe, http://goo.gl/6hpZ0W, CC BY-NC-SA 2.0
33. References
• Markus Luczak-Roesch, Ramine Tinati, Kieron O'Hara, and Nigel Shadbolt. 2015. Socio-technical Computation. In Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing (CSCW'15 Companion). ACM, New York, NY, USA, 139-142. http://doi.acm.org/10.1145/2685553.2698991
• Markus Luczak-Roesch, Ramine Tinati, and Nigel Shadbolt. 2015. When Resources Collide: Towards a Theory of Coincidence in Information Spaces. To appear in WWW’15 Companion, May 18–22, 2015, Florence, Italy. http://dx.doi.org/10.1145/2740908.2743973