It’s not in their tweets: Modeling topical expertise of Twitter users

Claudia Wagner, Vera Liao, Peter Pirolli, Les Nelson and Markus Strohmaier
Amsterdam, 16.4.2012

with Vera Liao, Peter Pirolli, Markus Strohmaier and Les Nelson

Motivation
On Twitter, information consumption is mainly driven by social networks

Users need to decide whom to follow in order to get trustworthy and relevant information about the topics they are interested in
Evidence from real life
Search online for evidence

Searching for evidence on a Twitter user’s profile page

Bio
List Memberships

Tweets and Retweets

Research Questions
How useful are different types of user-related data for humans to inform their expertise judgments of Twitter users?

How useful are different types of user-related data for learning computational expertise models of users?

User Study
Expertise Judgments of humans
16 participants
Task: Rate (1-5) the expertise level of selected Twitter users (with high and low expertise) for the topic “semanticweb”
3 conditions under which the user accounts were presented to subjects:
Condition 1: Tweets, Retweets, List, Bio
Condition 2: Only Tweets and Retweets are shown
Condition 3: Only List and Bio are shown
For each condition and expertise level we have 4 Twitter pages (4 replicates)
4 * 3 * 2 = 24 pages to rate per subject
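The page count follows from fully crossing the three factors. A minimal sketch of that enumeration is shown here; the level labels are illustrative names chosen for this example, not identifiers from the study.

```python
from itertools import product

# Factor levels as described on the slide; the label strings are illustrative.
conditions = ["tweets+retweets+lists+bio", "tweets+retweets only", "lists+bio only"]
expertise_levels = ["high", "low"]
replicates = range(1, 5)  # 4 Twitter pages per condition and expertise level

# Each subject rates one page per (condition, expertise level, replicate) cell.
pages = list(product(conditions, expertise_levels, replicates))

print(len(pages))  # 3 * 2 * 4 = 24
```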

User Study
Expertise Judgments of humans
2-way ANOVA
Within-Subject Variables:
•Twitter user expertise (high/low)
•3 Conditions

Interaction between conditions and Twitter user expertise is significant (F(2) = 8.326, p < 0.01)

Post-Hoc Test shows that users’ ability to correctly judge expertise of Twitter users differs significantly between conditions 1 and 2 and between conditions 2 and 3.

[Figure: mean rating per Twitter user (y-axis) for low and high expertise users (x-axis), one line per condition: cond 1 (tweets, bio and lists), cond 2 (only tweets), cond 3 (only bio and lists)]

Research Questions
How useful are different types of user-related data for learning computational expertise models of users?

Dataset
10 topics
semanticweb, biking, wine, democrat, republican,
medicine, surfing, dogs, nutrition and diabetes
We use Wefollow directories as a manually
created proxy ground truth for expertise
Top 150 users per Wefollow directory
Excluded users who appear in more than one of the 10 directories and users who mainly tweet in languages other than English

Dataset
1145 users
Most recent 1000 tweets and retweets
Most recent 300 user lists
Bio info
Information on Twitter is sparse
Extend URLs in Tweets, RTs and bio
Use list names as search query terms
Use the top 5 search result snippets obtained from Yahoo BOSS to enrich list information

Computational Expertise Models
Methodology
Learn latent semantic structures (topics) from Twitter
communication by fitting an LDA model

[Figure: top 20 stemmed words of 3 randomly selected topics (T1, T2, T3) learned by an LDA model with T=50]

Methodology
Associate users with topics by using statistical inference based on different types of user-related data; the result is the user’s topical expertise profile

[Figure: one topic distribution over T1, T2, T3 for each data type: Bio, Lists, Tweets, RTs]

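Topical Similarity between lists/bio/tweets/RTs

[Figure: JS-Divergence (y-axis, 0.0 to 1.0) between the topic distributions of each pair of data types (List-Bio, List-Tweet, List-Retweet, Bio-Tweet, Bio-Retweet, Tweet-Retweet), plotted against #Topics (x-axis: 10, 50, 80, 200, 400, 600)]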
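For readers unfamiliar with the measure, the following is a minimal sketch of the Jensen-Shannon divergence between two topic distributions. The base-2 logarithm is an assumption made here because it bounds the value between 0 and 1; the slides do not state the base, and the two example distributions are invented.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical topic distributions of one user over three topics.
list_topics = [0.7, 0.2, 0.1]
tweet_topics = [0.1, 0.3, 0.6]

print(js_divergence(list_topics, tweet_topics))
```

Identical distributions give 0 and distributions with no shared topics give 1, so higher values mean two data types describe the same user with more dissimilar topics.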

Types of User Lists
Manual inspection of user lists
Selected 10 users at random and inspected their
user list memberships (455 user lists)

We found 3 main classes of user lists:
Personal judgments (e.g., “great people”, “geeks”)
Personal relationships (e.g., “my family”, “colleagues”)
Topical Lists (e.g., “science”, “researcher”, “healthcare”)

Value of User Lists
3 human raters judged whether a list (label and/or description) belongs to the class Topical Lists

77.67% of user lists were topical lists
Inter-rater agreement Kappa=0.62
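The slides do not say which kappa statistic was used. With three raters, one common choice is Fleiss' kappa, so the sketch below assumes that variant; the rating counts are invented for illustration and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] is the number of raters who put item i into category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])  # assumes every item was rated by the same number of raters
    n_categories = len(counts[0])
    # Share of all ratings that fall into each category.
    p_category = [sum(row[j] for row in counts) / (n_items * n_raters)
                  for j in range(n_categories)]
    # Per-item agreement: fraction of rater pairs that chose the same category.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_category)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical ratings of 5 lists by 3 raters: columns are (topical, not topical).
example_counts = [[3, 0], [2, 1], [0, 3], [3, 0], [1, 2]]
print(fleiss_kappa(example_counts))
```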

Quantify the Value of Lists/Bio/Tweets/RTs
Which type of information best reflects the topical expertise of a user?
Information-Theoretic Evaluation
Which type of topic distribution best reflects the underlying category information of the user?
Normalized Mutual Information (NMI) between users’ topic distributions and users’ Wefollow directories
Task-based Evaluation
Which type of topic distribution is most useful for classifying users into their Wefollow directories?
F1-score of classification models
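As an illustration of the information-theoretic evaluation, here is a minimal NMI sketch. It makes two assumptions that the slides do not confirm: each user is reduced to a single dominant topic, and mutual information is normalized by the geometric mean of the two entropies (other normalizations exist). The example labels are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    """Mutual information (in nats) between two aligned label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def nmi(xs, ys):
    """Mutual information divided by sqrt(H(xs) * H(ys)); 0.0 if either entropy is 0."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0
    return mutual_information(xs, ys) / math.sqrt(hx * hy)

# Hypothetical data: dominant topic per user vs. the user's Wefollow directory.
dominant_topics = ["t1", "t1", "t2", "t2", "t3", "t1"]
directories = ["wine", "wine", "dogs", "dogs", "biking", "biking"]
print(nmi(dominant_topics, directories))
```

A value near 1 means the topic assignments almost determine the directory; a value near 0 means they carry little information about it.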

Information-Theoretic Evaluation

[Figure: NMI (y-axis, 0.2 to 0.7) of topic distributions inferred from Tweet, Bio, List and Retweet data, plotted against #Topics (x-axis: T=10, T=50, T=80, T=200, T=400, T=600)]

Task-based Evaluation
Compare topic distributions inferred via different types of user-related data within a classification task

Objective: Classifying users into Wefollow directories by using topic distributions as features

Classification Task:
Train a Partial Least Squares classifier with topic distributions inferred via different types of user-related data as features
Perform 5-fold cross-validation
Use F-measure (harmonic mean of precision and recall) to compare classifiers’ performance
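The evaluation side of this setup can be sketched without the classifier itself. The snippet below shows a per-directory F-measure and a simple 5-fold index split; the round-robin fold assignment and the example labels are assumptions for illustration, and the Partial Least Squares model is deliberately left out.

```python
def f_measure(reference, predicted, target):
    """Harmonic mean of precision and recall for one target class."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == target and p == target)
    fp = sum(1 for r, p in pairs if r != target and p == target)
    fn = sum(1 for r, p in pairs if r == target and p != target)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists; item i is assigned to fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test

# Hypothetical reference directories and classifier predictions.
reference = ["wine", "wine", "dogs", "dogs"]
predicted = ["wine", "dogs", "dogs", "dogs"]
print(f_measure(reference, predicted, "wine"))
print(f_measure(reference, predicted, "dogs"))
```

In a full run, each fold's held-out users would be classified by a model trained on the remaining four folds, and the F-measure would be computed on the held-out predictions.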

[Figure: F-measure (x-axis, 0.0 to 0.9) per Wefollow directory (biking, democrat, diabetes, dogs, medicine, nutrition, republican, semanticweb, surfing, wine) for classifiers based on Bio, List, Tweet and Retweet topic distributions]

[Figure: F-measure (x-axis, 0.0 to 0.7) per number of topics (T=10, 30, 50, 70, 80, 100, 200, 300, 400, 500, 600, 700) for classifiers based on Bio, List, Tweet and Retweet topic distributions]

[Figure: predictions vs. reference values for the List-based and Tweet-based classifiers at T=300, over the 10 Wefollow directories (biking, democrat, diabetes, dogs, medicine, nutrition, republican, semanticweb, surfing, wine)]

x-axis shows reference values
y-axis shows predictions
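A plot of predictions against reference values can be read as a table of counts per (prediction, reference) pair. A minimal sketch of such a tabulation follows; the labels are invented and are not the study's results.

```python
from collections import Counter

def confusion_counts(reference, predicted):
    """Map each (predicted, reference) pair to the number of users in that cell."""
    return Counter(zip(predicted, reference))

# Hypothetical reference directories and predictions for five users.
reference_labels = ["wine", "wine", "dogs", "biking", "dogs"]
predicted_labels = ["wine", "dogs", "dogs", "biking", "dogs"]

cells = confusion_counts(reference_labels, predicted_labels)
for (pred, ref), count in sorted(cells.items()):
    print(f"predicted={pred:<8} reference={ref:<8} users={count}")
```

Cells where prediction and reference match are correct classifications; off-diagonal cells show which directories a classifier confuses with each other.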

Conclusions
Different types of user-related data lead to
different topic annotations
List-based topic annotations are most distinct from all others
Bio-, tweet- and retweet-based topic annotations are quite similar

For creating topical expertise profiles of users
information about their list memberships is most
useful
For informing humans’ expertise judgments about Twitter users, contextual information (a user’s bio and list memberships) is most useful

Implications & Limitations
User Interface
Make user lists and bio information more prominent
Incentives for people to use lists more heavily
E.g., provide weekly list summaries

Search and Recommender Systems could benefit
from exploiting user list information

Results are biased towards users with high
Wefollow rank

Bio and User Lists are useful for judging topical expertise

THANK YOU

claudia.wagner@joanneum.at
http://claudiawagner.info

src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/
