CIKM 2011 | Ed Chi on Model-Driven Research in Social Computing
1. CIKM 2011 | Invited Talk
Model-Driven Research in Social Computing
Ed H. Chi
Google Research
Work done while at Palo Alto Research Center (PARC)
2011-10-27 CIKM 2011 Invited Talk 1
2. Some Google Social Stats
• 250,000 words are written each minute on Blogger - that’s 360 million words a day
• Every 16 seconds people view enough photos from Picasa Web Albums to cover an entire football field
• Every 8 minutes, more photos are viewed on Picasa Web Albums than exist in the entire Time-LIFE photo collection
3. YouTube Stats
• 150 years of YouTube video are watched every day on Facebook (up 2.5x y/y)
• Every minute, 400+ tweets contain YouTube links (up 3x y/y) [Q1 2011]
• 100M+ people take a social action with YouTube (likes, shares, comments, etc.) every week (10/15/10)
4. Google+ Stats
• 40 million people have joined Google+ since launch.
• People are 2x-3x more likely to share content with one of their circles than to make a public post.
5. Social Stream Research
• Analytics
  – Factors impacting retweetability [Suh et al., IEEE Social Computing 2010]
  – Location field of user profiles [Hecht et al., CHI 2011]
  – Organic Q&A behaviors [Paul et al., ICWSM’11]
  – Languages used in Twitter [Hong et al., ICWSM’11]
• Improving Stream Experience
  – Topic-based summarization & browsing of tweets [Bernstein et al., UIST 2010]
  – Tweet recommendation [Chen et al., CHI 2010 & CHI 2011]
6. Invisible Brokerage Signals across Language Barriers
Joint work w/ Lichan Hong, Gregorio Convertino
[Hong et al., ICWSM July 2011]
7. Motivation for Studying Languages
• Twitter is an international phenomenon
  – Most research focused on English users
  – Question about generalization to non-English
  – Understand cross-language usage differences
  – Design implications for international users
• Research Questions:
  – What is the language distribution in Twitter?
  – How do users of different languages use Twitter?
  – How do bilingual users spread information across languages?
8. Data Collection & Processing
Twitter stream, 04/18/10-05/16/10 (4 weeks)
62M tweets
Google Language API & LingPipe
104 languages
Top 10 languages
9. Top 10 Languages in Twitter
Language     Tweets       %      Users
English      31,952,964   51.1   5,282,657
Japanese     11,975,429   19.1   1,335,074
Portuguese   5,993,584    9.6    993,083
Indonesian   3,483,842    5.6    338,116
Spanish      2,931,025    4.7    706,522
Dutch        883,942      1.4    247,529
Korean       754,189      1.2    116,506
French       603,706      1.0    261,481
German       588,409      1.0    192,477
Malay        559,381      0.9    180,147
10. Human-Coding Study
• 2,000 random tweets from 62M tweets
• 2 human judges for each of top 10 languages
  – native speakers or proficient
  – discuss to resolve disagreement
• Hard to find Indonesian & Malay judges
• Presented 2,000 tweets to each judge
• Judge selected tweets in his/her language
11. Machine vs. Human
T-P: true positive, T-N: true negative, F-N: false negative, F-P: false positive
Language     T-P   T-N     F-N   F-P   Cohen’s Kappa
English      974   971     20    35    0.95
Japanese     370   1,595   0     35    0.94
Portuguese   170   1,803   19    8     0.92
Indonesian   106   1,875   15    4     0.91
Spanish      96    1,889   11    4     0.92
Dutch        18    1,978   2     2     0.90
Korean       24    1,976   0     0     1.00
French       13    1,980   0     7     0.79
German       12    1,979   2     7     0.72
Malay        8     1,979   4     9     0.55
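The Cohen's Kappa column can be recomputed from the four counts in each row. The sketch below uses the standard two-rater kappa formula and assumes that a "positive" means the detector (or the judges) labeled the tweet as being in the row's language. The function name `cohens_kappa` is illustrative and not from the talk.

```python
def cohens_kappa(tp: int, tn: int, fn: int, fp: int) -> float:
    """Cohen's kappa for a 2x2 machine-vs-human agreement table."""
    total = tp + tn + fn + fp
    observed = (tp + tn) / total              # fraction of tweets where both agree
    machine_pos = tp + fp                     # tweets the detector put in the language
    human_pos = tp + fn                       # tweets the judges put in the language
    # Agreement expected by chance, from the two marginal distributions.
    expected = (machine_pos * human_pos
                + (total - machine_pos) * (total - human_pos)) / total ** 2
    return (observed - expected) / (1 - expected)


# Counts copied from the table (T-P, T-N, F-N, F-P).
rows = {
    "Japanese": (370, 1595, 0, 35),
    "Korean": (24, 1976, 0, 0),
    "Malay": (8, 1979, 4, 9),
}
for language, counts in rows.items():
    print(language, round(cohens_kappa(*counts), 2))
```

Kappa corrects raw agreement for chance. This is why Malay scores low even though only 13 of its 2,000 tweets are misclassified: with so few positives, nearly all of the raw agreement is expected by chance.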
12. Accuracy of Language Detection
• Two Types of Errors
  – Got ur dirct msg.i’m lukng 4wrd 2 twt wit u too.so,wat doing ha…(detected as Afrikaans)
  – High error rate for tweets of 1~2 words
13. Machine vs. Human
Language     T-P   T-N     F-N   F-P   Cohen’s Kappa
French       13    1,980   0     7     0.79
German       12    1,979   2     7     0.72
Malay        8     1,979   4     9     0.55
• French: 5/7 F-Ps have 2 words
• German: 1/2 F-Ns has 1 word; 6/7 F-Ps are in English
• Malay: 3/4 F-Ns & 7/9 F-Ps are in Indonesian
15. Use of URLs in 62M Tweets
Language     URLs
All          21%
English      25%
Japanese     13%
Portuguese   13%
Indonesian   13%
Spanish      15%
Dutch        17%
Korean       17%
French       37%
German       39%
Malay        17%
• Chi Square tests confirmed that differences by language are significant.
16. Significant Cross-Language Differences
Language     URLs   Hashtags   Mentions   Replies   Retweets
All          21%    11%        49%        31%       13%
English      25%    14%        47%        29%       13%
Japanese     13%    5%         43%        33%       7%
Portuguese   13%    12%        50%        32%       12%
Indonesian   13%    5%         72%        20%       39%
Spanish      15%    11%        58%        39%       14%
Dutch        17%    13%        50%        35%       11%
Korean       17%    11%        73%        59%       11%
French       37%    12%        48%        36%       9%
German       39%    18%        36%        25%       8%
Malay        17%    5%         62%        23%       29%
Chi Square tests confirmed that differences by language are significant
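To illustrate what a chi-square test of URL usage by language looks like, the sketch below computes the Pearson chi-square statistic for a 10 x 2 table (language by tweet with or without a URL). The counts are approximations, reconstructed by multiplying the tweet totals by the rounded URL percentages given in the talk. The study's actual test setup is not described in the slides, so this shows the general technique, not the authors' procedure.

```python
# Tweet totals and URL percentages as reported in the talk's tables.
TWEETS = {
    "English": 31_952_964, "Japanese": 11_975_429, "Portuguese": 5_993_584,
    "Indonesian": 3_483_842, "Spanish": 2_931_025, "Dutch": 883_942,
    "Korean": 754_189, "French": 603_706, "German": 588_409, "Malay": 559_381,
}
URL_PCT = {
    "English": 25, "Japanese": 13, "Portuguese": 13, "Indonesian": 13,
    "Spanish": 15, "Dutch": 17, "Korean": 17, "French": 37, "German": 39,
    "Malay": 17,
}


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Approximate [with URL, without URL] counts per language (an assumption:
# derived from rounded percentages, not the study's raw counts).
table = []
for language, n in TWEETS.items():
    with_url = round(n * URL_PCT[language] / 100)
    table.append([with_url, n - with_url])

statistic = chi_square_statistic(table)
degrees_of_freedom = (len(table) - 1) * (2 - 1)
CRITICAL_VALUE_DF9_P05 = 16.919  # chi-square critical value, df = 9, alpha = 0.05

print(degrees_of_freedom, statistic > CRITICAL_VALUE_DF9_P05)
```

With samples in the millions, even small percentage differences give a statistic far above the critical value, so the effect sizes in the table say more than the significance result alone.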
17. Implications
Language     URLs   Hashtags   Mentions   Replies   Retweets
All          21%    11%        49%        31%       13%
Korean       17%    11%        73%        59%       11%
German       39%    18%        36%        25%       8%
• Use of Twitter for social networking vs. information sharing differs across languages
• Design of recommendation engines
  – Korean users: promote conversational tweets
  – German users: promote tweets with URLs
18. Studying Bilingual Brokers
• Importance of brokers
  – Structural holes (Burt’92), LiveJournal (Herring et al.’07)
• Define bilingual brokers as users who tweeted in a pair of languages
• Caveat
  – Under-estimated due to 4-week time limit
  – Over-estimated due to language detection errors
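The broker definition above can be sketched as a counting procedure: collect the set of detected languages per user, then credit each user to every pair of languages they tweeted in. The input format (user id, language code) and the function name are assumptions for illustration. The talk does not describe the actual pipeline.

```python
from collections import defaultdict
from itertools import combinations


def count_bilingual_brokers(tweets):
    """Count users per language pair.

    tweets: iterable of (user_id, detected_language) tuples.
    Returns {(lang_a, lang_b): number of users who tweeted in both}.
    """
    languages_by_user = defaultdict(set)
    for user_id, language in tweets:
        languages_by_user[user_id].add(language)

    brokers = defaultdict(int)
    for languages in languages_by_user.values():
        # A user with 3 languages is a broker for each of the 3 pairs.
        for pair in combinations(sorted(languages), 2):
            brokers[pair] += 1
    return dict(brokers)


# Hypothetical sample data.
sample = [
    ("u1", "en"), ("u1", "ja"), ("u1", "en"),
    ("u2", "en"),
    ("u3", "en"), ("u3", "ja"), ("u3", "pt"),
]
print(count_bilingual_brokers(sample))
```

Under this definition a single misdetected tweet is enough to make a user a broker, which is the over-estimation caveat noted on the slide.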
19. Number of Bilingual Brokers
(E English, J Japanese, P Portuguese, I Indonesian, S Spanish, D Dutch, K Korean, F French, G German, M Malay)
   E         J         P         I         S         D         K         F         G
J  140,730
P  488,545   13,228
I  230,023   4,825     29,405
S  359,117   10,139    112,524   36,068
D  150,041   6,383     30,855    34,906    30,916
K  19,722    6,384     906       2,014     1,109     972
F  194,931   10,463    53,607    34,586    49,445    33,568    1,244
G  110,748   6,053     22,106    21,471    21,989    22,162    786       24,763
M  148,365   4,208     31,184    135,427   31,967    29,331    1,518     30,257    18,301
20. Sharing URLs Across Languages
   E        J        P        I        S        D        K        F        G        M
E  -        3,013    18,399   985      4,986    1,144    212      1,791    1,647    540
J  3,013    -        77       37       58       29       43       59       46       18
P  18,399   77       -        74       1,644    198      2        453      168      123
I  985      37       74       -        67       64       1        53       38       279
S  4,986    58       1,644    67       -        139      0        286      139      53
D  1,144    29       198      64       139      -        2        112      126      48
K  212      43       2        1        0        2        -        3        3        1
F  1,791    59       453      53       286      112      3        -        157      53
G  1,647    46       168      38       139      126      3        157      -        40
M  540      18       123      279      53       48       1        53       40       -
21. Sharing Hashtags Across Languages
   E        J        P        I        S        D        K        F        G        M
E  -        8,178    33,197   14,969   27,284   6,685    798      9,410    7,208    5,517
J  8,178    -        331      135      351      218      149      352      260      100
P  33,197   331      -        535      4,682    604      13       1,231    580      400
I  14,969   135      535      -        762      684      25       713      415      6,046
S  27,284   351      4,682    762      -        819      28       1,468    708      463
D  6,685    218      604      684      819      -        26       851      769      424
K  798      149      13       25       28       26       -        25       18       20
F  9,410    352      1,231    713      1,468    851      25       -        879      411
G  7,208    260      580      415      708      769      18       879      -        265
M  5,517    100      400      6,046    463      424      20       411      265      -
22. Implications
• Indicators of connection strength between languages
  – Number of bilingual brokers
  – Acts of brokerage: sharing URLs & hashtags
• English well connected to others, and may function as a hub
• Need to improve cross-language communications
23. Visible Social Signals from Shared Items
Kudos to Jilin Chen, Rowan Nairn
[Chen et al., CHI 2010] [Chen et al., CHI 2011]
25. Information Gathering/Seeking
• The Filtering Problem:
  – “I get 1,000+ items in my stream daily but only have time to read 10 of them. Which ones should I read?”
• The Discovery Problem:
  – “There are millions of URLs posted daily on Twitter. Am I missing something important there outside my own Twitter stream?”
26. Stream Recommender
• Zerozero88.com
  – Twitter as the platform
  – URLs as the medium
  – Produces your personal headlines
27. [Diagram: URL Sources, Topic Relevance Scores (from User Topic Profiles), and Social Network Scores (from Local Social Network) feed the Recommendation Engine]
Recommendation Engine
  – Multiply scores
  – Rank URLs using multiplied scores
  – Recommend highest ranked URLs
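The three engine steps (multiply, rank, recommend) can be sketched in a few lines. The function name, the dictionary inputs, and the example URLs and score values are all hypothetical. The talk does not show how zerozero88 stores or scales its scores.

```python
def recommend(urls, topic_scores, social_scores, k=5):
    """Rank candidate URLs by the product of their two scores; return the top k."""
    ranked = sorted(
        urls,
        key=lambda url: topic_scores[url] * social_scores[url],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical candidates and scores.
candidates = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
topic = {candidates[0]: 0.9, candidates[1]: 0.2, candidates[2]: 0.5}
social = {candidates[0]: 1.0, candidates[1]: 3.0, candidates[2]: 2.0}

print(recommend(candidates, topic, social, k=2))
```

Because the scores are multiplied, a URL needs to do reasonably well on both signals to rank highly: a near-zero topic score cannot be rescued by many social votes.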
28. URL Sources
• Considering all URLs was impossible
• FoF: URLs from followee-of-followees
  – Social Local News is Better
• Popular: URLs that are popular across whole Twitter
  – Popular News is Better
Component      Possible Design Choices
URL Sources    FoF (followee-of-followees)
               Popular
29. [Recommender diagram repeated: URL Sources, Topic Relevance Scores, Social Network Scores, Recommendation Engine]
31. Topic Profile of URLs
• Built from tweets that contain the URL
• However, tweets are short
  – term vectors for URLs are often too sparse
• Adopt a term expansion technique using a search engine
Example tweet: Best of Show CES 2011: The Motorola Atrix http://tcrn.ch/e0g3Oh
Add to Profile: smartphone, mobility, …
32. Topic Profile of Users
• Self-Topic: content profile based on my posts
  – My Interest as Information Producer
• Followee-Topic: content profile based on my followees’ posts
  – My Interest as Information Gatherer
• None, for comparison purpose
Component                Possible Design Choices
Topic Relevance Scores   Self-Topic
                         Followee-Topic
                         None
33. [Diagram: My Followees -> Collect & Profile -> Find Top Key Terms -> Aggregate Terms -> Profile]
A term is weighted higher in your profile if more of your followees have the term as their top key terms
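The aggregation rule can be sketched as a simple count. This minimal version assumes each followee's top key terms are already extracted, and it weights a term by the number of followees who have it. The exact weighting used in the system is not given in the slides, so the plain count is an assumption.

```python
from collections import Counter


def followee_topic_profile(followee_top_terms):
    """Build a term profile from followees' top key terms.

    followee_top_terms: {followee_id: list of that followee's top key terms}.
    Returns a Counter mapping term -> number of followees with that term.
    """
    profile = Counter()
    for terms in followee_top_terms.values():
        # Use a set so a followee counts at most once per term.
        profile.update(set(terms))
    return profile


# Hypothetical followees and terms.
followees = {
    "alice": ["search", "mobile"],
    "bob": ["mobile", "privacy"],
    "carol": ["mobile", "search"],
}
profile = followee_topic_profile(followees)
print(profile.most_common())
```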
34. [Recommender diagram repeated: URL Sources, Topic Relevance Scores, Social Network Scores, Recommendation Engine]
35. Social Network Scores
• “Popular Vote” among my followees-of-followees
  – People “vote” for a URL by tweeting it
  – URLs with more votes in total are assigned higher score
  – Votes are weighted using social network structure
• None, for comparison purpose
Component                Possible Design Choices
Social Network Scores    Social Voting
                         None
36. The Intuition: Local Influence
[Diagram: "Me" follows two accounts; one group is labeled "15 People" and another "5 People", connected by "follow(s)" arrows]
Whose URLs should be weighted higher?
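Weighted social voting can be sketched as follows. The slides say only that votes are weighted using social network structure. The specific weight used here, log(1 + number of my followees who follow the poster), is an assumption chosen to illustrate "local influence", and the function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def social_vote_scores(url_posts, my_followees, follows):
    """Score URLs by weighted votes.

    url_posts: iterable of (poster, url) pairs; tweeting a URL is one vote.
    my_followees: set of accounts I follow.
    follows: {account: set of accounts that account follows}.
    """
    def voter_weight(poster):
        # Assumed weight: posters followed by more of my followees count more.
        local_followers = sum(
            1 for followee in my_followees if poster in follows.get(followee, set())
        )
        return math.log(1 + local_followers)

    scores = defaultdict(float)
    counted = set()
    for poster, url in url_posts:
        if (poster, url) in counted:   # one vote per poster per URL
            continue
        counted.add((poster, url))
        scores[url] += voter_weight(poster)
    return dict(scores)


# Hypothetical network: x is followed by three of my followees, y by one.
my_followees = {"a", "b", "c"}
follows = {"a": {"x", "y"}, "b": {"x"}, "c": {"x"}}
posts = [("x", "url1"), ("y", "url2"), ("x", "url1")]
scores = social_vote_scores(posts, my_followees, follows)
print(scores)
```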
37. Possible Recommender Designs
Component                Possible Design Choices
URL Sources              FoF (followee-of-followees)
                         Popular
Topic Relevance Scores   Self-Topic
                         Followee-Topic
                         None
Social Network Scores    Social Voting
                         None
Recommendation Engine
  – Multiply scores
  – Rank URLs using multiplied scores
  – Recommend highest ranked URLs
• 2 (URL source) x 3 (topic score) x 2 (social score) = 12 possible algorithm designs in total
• Random selection if for both scores we chose None
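The 12 designs are the Cartesian product of the three components' choices, which can be enumerated directly. The variable names below are illustrative; the option labels are taken from the table.

```python
from itertools import product

URL_SOURCES = ["FoF", "Popular"]
TOPIC_SCORES = ["Self-Topic", "Followee-Topic", "None"]
SOCIAL_SCORES = ["Social Voting", "None"]

# Every combination of one choice per component.
designs = list(product(URL_SOURCES, TOPIC_SCORES, SOCIAL_SCORES))

# Designs with no scoring at all fall back to random selection from the source.
random_baselines = [d for d in designs if d[1] == "None" and d[2] == "None"]

print(len(designs))
for design in random_baselines:
    print(design)
```

Two of the 12 designs are random baselines, one per URL source.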
38. Study Design
• Within-subject design
• Each subject evaluated 5 URL recommendations from each of the 12 algorithms
  – Show 60 URLs in random order, and ask for binary rating
  – 60 ratings x 44 subjects = 2,640 ratings in total
39. Summary of Results
[Chart: results by algorithm; labels: Popular URLs, FoF URLs, Social Vote Only, Best Performing]
40. Algorithms Differ Not Only in Accuracy!
• Relevance vs. Serendipity in recommendations
• From a subject in the pilot interview of zerozero88:
  – “There is a tension between the discovery and the affirming aspect of things. I am getting tweets about things that I am already interested in. Something I crave …, is an element of surprise or whimsy. ... I am getting a lot of things I am interested in, but that is not necessarily a good thing for me personally”
41. Design Rule
• Interaction costs determine number of people who participate
  – Surplus of attention & motivation at small transaction costs
[Chart: # People willing to participate vs. Cost of participation]
• Therefore:
• Important to keep interaction costs low
  – Recommendation
  – Summarization
• Or bring new benefits
2008-05-13 CSCL 2011 Keynote