1. The Wisdom of the Audience: An Empirical Study of
Social Semantics in Twitter Streams
Claudia Wagner, Philipp Singer, Lisa Posch and Markus Strohmaier
10th Extended Semantic Web Conference, Montpellier, 29.5.2013
2. Problem
#music
#fashion
Authors make their messages as informative as required but do not provide
more information than necessary (Maxim of Quantity by Grice (1975))
[src: http://www.techweekeurope.co.uk/wp-content/uploads/2012/07/Twitter.jpg]
3. Research Questions
RQ 1: To what extent is the background knowledge of audiences useful for
analyzing the semantics of social media messages?
RQ 2: What are the characteristics of an audience which possesses useful
background knowledge for interpreting the meaning of a stream's messages
and which types of streams tend to have useful audiences?
[src: http://www.teachthought.com/twitter-hashtags-for-teacher/]
4. Methodology
Message Classification Task
Use hashtags as ground truth
Laniado and Mika (2010) showed that around half of all hashtags can
be associated with Freebase concepts
Compare real audience with random audience - how well can an
audience predict the hashtag of a tweet?
The audience which is better at guessing the hashtag of a Twitter
message is better at interpreting the meaning of the message
Null hypothesis: If the audience of a stream does not possess
more knowledge about the semantics of the stream's messages
than a randomly selected baseline audience, the results from
both classification models should not differ significantly
5. Methodology
Train different multiclass classifiers on the background
knowledge of the audience
Logistic Regression, Stochastic Gradient Descent, Multinomial Naive
Bayes and Linear SVM
Compare different approaches for estimating the
background knowledge
Different audience and content selection approaches
Different methods for estimating the background knowledge
Test how well each model can predict the hashtag of
future messages
Weighted Macro F1
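The classification setup can be sketched with scikit-learn. This is a minimal illustration with toy data, not the authors' pipeline; the texts and labels are invented, and "Weighted Macro F1" corresponds to scikit-learn's `average="weighted"` option:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

# Toy stand-in for audience messages labeled with their stream's hashtag.
train_texts = ["chelsea won the league", "new album drops tonight",
               "transfer window rumours", "tour dates announced"]
train_labels = ["#football", "#music", "#football", "#music"]
test_texts = ["chelsea transfer rumours", "new album tour dates"]
test_labels = ["#football", "#music"]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_texts)
X_test = vec.transform(test_texts)

# Multiclass linear SVM (one-vs-rest), the model the talk reports on.
clf = LinearSVC().fit(X_train, train_labels)
pred = clf.predict(X_test)

# Weighted macro F1: per-class F1 averaged, weighted by class support.
score = f1_score(test_labels, pred, average="weighted")
```

The same skeleton works for the other classifiers mentioned (LogisticRegression, SGDClassifier, MultinomialNB) by swapping the estimator.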
6. Dataset
Diverse sample of hashtags
Romero et al. (2011) identified eight categories of
hashtags on a large data sample
celebrity, games, idioms, movies/TV, music, political, sports, and
technology
We randomly draw ten hashtags from each category
which were still in use
8. Dataset
[Timeline figure: three crawl points t0 (3/4/2012), t1 (4/1/2012), t2 (4/29/2012); at each point we crawl one week of stream tweets, the social structure, and the audience's tweets]

                  t0          t1          t2
Stream Tweets     94,634      94,984      95,105
Stream Authors    53,593      54,099      53,750
Friends           7,312,792   7,896,758   8,390,143
Audience Tweets   29,144,641  29,126,487  28,513,876
9. Audience Selection
[Figure: the #football hashtag stream with sample tweets and their authors; audience users are ranked by the number of stream authors they are friends with: user B (a friend of all authors) receives rank 1, user C rank 2, and user A rank 3]
10. Background Knowledge
Content Selection
Recent
The most recent messages authored by the
audience users
Top Links (plain and enriched)
the messages authored by the audience which
contain one of the top links of that audience
Top Tags
the messages authored by the audience which
contain one of the top hashtags of that audience
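The "Top Tags" selection can be sketched as follows. The data is a toy example; the study selects messages containing any of the audience's most frequent hashtags, here reduced to the single top tag for brevity:

```python
from collections import Counter
import re

# Toy audience messages; content selection decides which of them feed
# the background-knowledge model.
audience_msgs = [
    "great match tonight #football #epl",
    "new single out now #music",
    "derby day! #football",
    "coffee first, then work",
]

def hashtags(msg):
    """Extract lowercase hashtags from a message."""
    return re.findall(r"#\w+", msg.lower())

# Rank the audience's hashtags by frequency, then keep only messages
# that contain one of the top hashtags.
tag_counts = Counter(t for m in audience_msgs for t in hashtags(m))
top_tags = {t for t, _ in tag_counts.most_common(1)}
selected = [m for m in audience_msgs if top_tags & set(hashtags(m))]
```

The "Top Links" variants follow the same pattern with URLs instead of hashtags.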
11. Background Knowledge
Representation
Preprocessing: remove stopwords and Twitter
syntax, apply stemming
Represent background knowledge of the audience
via the most likely topics or most important words
of their messages
Bag of Words: TF and TFIDF
Topic Models: LDA
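Both representations can be sketched with scikit-learn. The corpus is an illustrative toy; the study's actual preprocessing additionally stems tokens and strips Twitter syntax:

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["goal scored in the final minute", "band releases new record",
        "coach announces starting lineup", "concert tickets on sale"]

# Bag of words weighted by TF-IDF (stopword removal only; stemming omitted).
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(docs)

# Topic representation via LDA: each document becomes a distribution
# over k latent topics (k=2 here for the toy corpus).
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
X_topics = lda.fit_transform(counts)
```

Either matrix (`X_tfidf` or `X_topics`) can then serve as the feature representation of the audience's background knowledge.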
12. Empirical Evaluation
RQ 1: To what extent does the background
knowledge of the audience support the semantic
annotation of individual messages?
Combine audience selection and background
knowledge estimation approaches to generate
semantic features of the messages authored by an
audience
Train model on the audience’s messages crawled at t0
Test model using messages of the hashtag streams
crawled at t1
13. Results
                               F1 (TF-IDF)   F1 (LDA)
Random Guessing                1/78          1/78
Baseline (random audience)     0.01          0.01
Audience – recent              0.25          0.23
Audience – top links enriched  0.13          0.10
Audience – top links plain     0.12          0.10
Audience – top tags            0.24          0.21

The audience of a hashtag stream contains knowledge
which is useful for predicting the hashtags of future
messages.
14. Results
Per-category performance of the best approach (most recent messages authored by the top audience):

            F1 (TF-IDF)   F1 (LDA)
celebrity   0.17          0.15
games       0.25          0.22
idioms      0.09          0.05
movies      0.22          0.18
music       0.23          0.18
political   0.36          0.33
sports      0.45          0.42
technology  0.22          0.22
15. Empirical Evaluation
RQ 2: What are the characteristics of an
audience which possesses useful background
knowledge for interpreting the meaning of a
stream's messages and which types of streams
tend to have useful audiences?
Correlation analysis between the ability of an
audience to interpret the meaning of
messages and structural properties of the
stream
16. Structural Stream Properties
Static Measures
Coverage: informational, hashtag, retweet and
conversational extent of a stream
Entropy: randomness of a stream's authors and their
followers, followees and friends
Overlap: overlap between authors and followers,
authors and followees and authors and friends
Dynamic Measures
KL divergence between the author-, the follower-, and
the friend-distributions of a stream at different time
points
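The entropy and KL-divergence measures can be illustrated with scipy. The author lists are toy data, and the smoothing choice here is an assumption for illustration, not necessarily the study's:

```python
from collections import Counter
import numpy as np
from scipy.stats import entropy

def distribution(items, support):
    """Relative frequency of each element of `support` in `items`."""
    counts = Counter(items)
    p = np.array([counts[s] for s in support], dtype=float)
    return p / p.sum()

# Toy author lists of one stream at two crawl points.
authors_t0 = ["a", "a", "a", "b", "c"]
authors_t1 = ["a", "b", "b", "c", "d"]
support = sorted(set(authors_t0) | set(authors_t1))

p = distribution(authors_t0, support)
q = distribution(authors_t1, support)

# Static measure: randomness of the author distribution (low entropy
# means a focused set of authors).
author_entropy = entropy(p)

# Dynamic measure: KL divergence between the author distributions at
# two time points (low divergence means a socially stable stream).
smoothing = 1e-9  # avoid infinite divergence on zero-probability authors
kl = entropy(p + smoothing, q + smoothing)
```

The follower-, followee- and friend-based variants replace the author lists with the corresponding neighbor lists.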
17. Stat. Significant Spearman Rank Correlation (p < 0.05)
                         F1 (TF-IDF)   F1 (LDA)
Overlap Author-Follower  0.675         0.655
Overlap Author-Followee  0.642         0.628
Overlap Author-Friend    0.612         0.602

Streams which are produced and consumed by a community of
users who are tightly interconnected tend to have a useful
audience. A useful audience possesses background knowledge
which helps to interpret the meaning of messages.
18. Stat. Significant Spearman Rank Correlation (p < 0.05)
                        F1 (TF-IDF)   F1 (LDA)
Conversation Coverage   0.256         0.256

Conversational streams tend to have a useful audience.
19. Stat. Significant Spearman Rank Correlation (p < 0.05)
                               F1 (TF-IDF)   F1 (LDA)
Entropy Author Distribution    -0.270        -0.400
Entropy Friend Distribution    -0.307        -
Entropy Follower Distribution  -0.400        -0.319
Entropy Followee Distribution  -0.401        -0.368

Streams which are produced and consumed by a focused set of
authors, followers, followees and friends tend to have a
useful audience.
20. Stat. Significant Spearman Rank Correlation (p < 0.05)
                          F1 (TF-IDF)   F1 (LDA)
KL Follower Distribution  -0.281        -
KL Followee Distribution  -0.343        -0.302
KL Author Distribution    -0.359        -0.307

Socially stable streams tend to have an audience which is
good at interpreting the meaning of a stream's messages.
21. Summary & Conclusions
The audience of a social stream possesses knowledge which
may indeed help to interpret the meaning of a stream's
messages
But not all streams have similarly useful audiences
The audience of a social stream seems to be most useful if
the stream is created and consumed by a stable, focused and
communicative community – i.e., a group of users who are
interconnected and have few core users to whom almost
everyone is connected
We do not know if those relations are causal but we got
similar results when repeating our experiments on t1 and t2
22. Current and Future Work
Compare the utility of ontological knowledge with
audience background knowledge for the hashtag
prediction task
Algorithmic exploitation of our results
Hybrid hashtag recommendation algorithm
Structural stream measures may inform weighting (how much
can we count on the audience?)
Differentiate between social and topical hashtags
User-centric algorithms work only for active users who used
hashtags before
An audience-integrated approach only requires an active audience
23. References
Grice, H. P. (1975). Logic and conversation. In Speech Acts (Syntax and
Semantics, Vol. 3, pp. 41–58). New York: Academic Press.
Laniado, D., & Mika, P. (2010). Making sense of Twitter. In Proceedings of
the 9th International Semantic Web Conference (pp. 470–485). Shanghai,
China.
Romero, D. M., Meeder, B., & Kleinberg, J. (2011). Differences in the
mechanics of information diffusion across topics: idioms, political hashtags,
and complex contagion on Twitter. In Proceedings of the 20th International
Conference on World Wide Web (pp. 695–704). Hyderabad, India.
So let's start by exploring what problem we are trying to address with this work. Take a minute to read those two sample tweets and think about how you would tag them. What are the topics of those tweets? You will notice that it is often pretty hard to predict the topics of a tweet without further contextual information. In this example the solution is music and fashion. Why do the authors not provide more context? They make the message as short as possible and as informative as required, and they know, or estimate, their audience. But what does it mean if they make their messages as informative as required? It implies that they must estimate what their audience knows. The audience of the second author will know who Rafael Cennamo is. The audience of the first author will know that she is a singer. This example nicely shows that there are certain people who have the right background knowledge to make sense of the sparse information.
So this work builds on the intuition that access to the background knowledge of the audience would help to interpret the meaning of tweets and annotate them with semantics. We therefore hypothesize that semantic technologies which aim to enrich social media with semantics would benefit from incorporating the background knowledge of the audience. Concretely, we explore the following two research questions.
To address these research questions we conduct a message classification task to figure out how well the audience can predict the hashtags of a stream's messages. That means we use tweets of the audience of hashtag streams as training samples and the hashtag itself as label. We test how well the audience-informed classifier can predict the hashtag of future messages compared to a classifier trained on messages of random audiences.
We trained different classification models and found that the linear SVM works best; we therefore report its results here. To estimate the background knowledge for each hashtag stream we compare different audience and content selection approaches and different methods for estimating the background knowledge.
As a dataset we created a diverse sample of hashtags. We used the eight categories of hashtags which were identified by Romero et al. in their previous work and drew ten hashtags from each category (we only used hashtags which were still active). We biased our random sample towards active hashtag streams by re-sampling hashtags for which we found fewer than 1,000 messages when crawling (4 March 2012). For the categories for which we could not find ten hashtags with more than 1,000 messages (games and celebrity), we selected the most active hashtags per category (i.e., the hashtags for which we found the most messages). Since two hashtags, #bsb and #mj, appeared in the samples of two different categories, we ended up with a sample of 78 different hashtags (see Table 1).
Here you can see some sample hashtags for each category included in our sample. We ended up with 78 hashtag categories.
After having selected a diverse sample of hashtags, we started the crawling process. We crawled the data for each hashtag at three different time points. First we retrieved hashtag-stream tweets that were authored within the last week. Next we extracted the authors of those tweets and crawled their social network. Finally, we also crawled the most recent 3,200 tweets (or fewer, if fewer were available) of all users who belonged either to the top hundred authors or to the top hundred audience users of each hashtag stream. We ranked authors by the number of tweets they published within the hashtag stream, and audience users by the number of authors they are friends with. We repeated this process three times.
To estimate the audience of a hashtag stream, we rank the friends (or followers) of the stream's authors by the number of authors they are related with. In this example, the hashtag stream #football has four authors. User B is a friend of all four authors of the stream and is therefore most likely to be exposed to the messages of the stream and to be able to interpret them. Consequently, user B receives the highest rank. User C is a friend of two authors and receives the second highest rank. The user with the lowest rank (user A) is a friend of only one author of the stream.
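The ranking just described can be sketched in a few lines of Python; the user and author names are toy data mirroring the example:

```python
from collections import Counter

# Friend sets of the stream's four authors (toy data mirroring the
# example: B is a friend of all four authors, C of two, A of one).
author_friends = {
    "author1": {"A", "B"},
    "author2": {"B", "C"},
    "author3": {"B", "C"},
    "author4": {"B"},
}

# Count, for every friend, how many stream authors they are connected
# to: the more authors a user is friends with, the more likely they are
# exposed to the stream's messages and able to interpret them.
exposure = Counter()
for friends in author_friends.values():
    exposure.update(friends)

# Rank audience users by exposure, highest first.
audience_ranking = [user for user, _ in exposure.most_common()]
```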
After having selected the audience users per hashtag stream, we also need a way to select the content source which we later use for estimating the audience's background knowledge. We tested four different approaches here.
Finally, after having selected the top audience users and their content for a hashtag stream, we also need a way to abstract the content and represent it in the form of features.
To answer our first research question we combine different audience selection and background knowledge estimation strategies and train a linear SVM using messages authored by the audience at t0 or before as training samples. We extract features from those messages using TF-IDF and topic models. We test the performance of the classifiers using future hashtag stream messages.
Let's look at the results of this study. First note that since we have 78 classes, a random guesser would achieve a very low performance. As a baseline we used a classifier trained with a random audience: we randomly swapped audiences and hashtag streams before training, thereby destroying the relation between training data and labels. The baseline classifier achieves an F1 score of 0.01. When using the real audience of a hashtag stream (the top 10 friends) and their most recently published messages, we achieve a performance of 0.25, which is significantly better.
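The random-audience baseline amounts to permuting the stream-audience pairing before training. A minimal stand-alone sketch of that permutation idea (toy corpora, illustrative names):

```python
import random

# Toy pairing of hashtag streams with their audiences' training corpora.
streams = ["#football", "#music", "#politics"]
audience_corpora = {
    "#football": ["match report", "transfer news"],
    "#music": ["album review", "tour dates"],
    "#politics": ["election polls", "debate recap"],
}

# Random-audience baseline: permute which audience corpus is paired
# with which hashtag label, destroying the relation between training
# data and labels while keeping data sizes identical.
rng = random.Random(0)
shuffled = streams[:]
rng.shuffle(shuffled)
baseline_pairs = {tag: audience_corpora[src]
                  for tag, src in zip(streams, shuffled)}
```

A classifier trained on `baseline_pairs` instead of `audience_corpora` gives the chance-level reference that the real audience is compared against.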
So far I showed you the overall performance, but we also looked at the performance of individual categories of hashtags using the best method (most recent messages authored by the top audience). One can see that there is one negative outlier: idioms. Since those hashtags do not represent semantic concepts, we are not too worried about that.
Since we observed that the audience of certain hashtag streams is more useful than that of others, we wanted to know what types of streams tend to have useful audiences. Therefore we performed a correlation analysis between the audience's classification performance and structural stream properties. We do not want to claim that there is a causal relation, but we repeated our experiment at different time points and observed the same correlations.
To quantify structural properties of social streams we use a set of static and dynamic stream measures.
Across all categories, the overlap measures show the highest positive correlation with the F1 scores. This indicates that streams which are produced and consumed by a tightly interconnected community of users tend to have a useful audience.
The only significant coverage measure is the conversational coverage measure. It indicates that the audiences of conversational streams are better at interpreting the meaning of a stream's messages. This suggests that it is not only important that a community exists around a stream, but also that the community is communicative.
All entropy measures show significant negative correlations with the F1 scores. This shows that the more focused the author, follower, followee, and friend distributions of a stream are (i.e., the lower their entropy), the higher the F1 scores of an audience-based classification model. Entropy measures the randomness of a random variable. For example, the author entropy describes how random the set of authors contributing to a hashtag stream is, i.e., how well one can predict who will author the next message in the stream. The friend entropy describes how random the friends of the hashtag stream's authors are, i.e., how well one can predict who will be a friend of most of the stream's authors. Our results suggest that streams tend to have a better audience if their authors and their authors' followers, followees and friends are less random.
The KL divergences of the author, follower, and followee distributions show a significant negative correlation with the F1 scores. The author distribution of a hashtag stream reveals how many messages each author contributed. The follower distribution reveals how many authors a follower is following. The followee distribution reveals how many authors are following a followee. This indicates that the more stable the author, follower and followee distributions are over time, the better the audience of a stream. If, for example, the followee distribution of a stream changes heavily over time, the authors are shifting their social focus. If the author distribution of a stream has a high KL divergence, the set of authors of the stream is changing over time.
To summarize, our work provides initial empirical evidence that the audience of a social stream possesses knowledge which helps to interpret the meaning of the stream's messages. We also found that certain stream characteristics are strongly correlated with the usefulness of the audience. We do not know if this is a causal relation, but we repeated our experiment at a later time point and found comparable results.
Currently and in the near future we are working on comparing the utility of ontological knowledge with audience background knowledge for the hashtag prediction task. We also want to exploit our results to develop a new hashtag recommendation algorithm.
To understand whether the structure of a stream has an effect on the usefulness of its audience for interpreting the meaning of its messages, we perform a correlation analysis and investigate to what extent the ability of an audience to interpret the meaning of messages correlates with structural stream properties introduced in Section Structural Stream Measures.