Predicting News Popularity by Mining Online Discussions

Predicting News Popularity by
Mining Online Discussions
Georgios Rizos, Symeon Papadopoulos and Yiannis Kompatsiaris
Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
SNOW/WWW 2016, April 12, 2016, Montréal, Québec, Canada.

Popularity
Operational definition ( $$$)
• Number of views
• Number of comments
• Number of users commenting
• Number of shares
• Number of thumbs up, thumbs down
….
• Bonus: Controversiality
#2

Overview
#4
Online discussions 
Comment trees/user graphs 
Tree/Graph Features 
Predictive model

Comment Trees and User Graphs
#5
User X
I blah blah….
User Y
But bla bla….
User X
No!
User W
Are you kiddin?
User Z
Wat?
X
X
X
Z
Y
W
Y
WZ
A
Comment Tree
User Graph
User A story poster
A

#6
Sony Hack #1 VS Sony Hack #2

#13
Degree
Depth
Elements of an Engaging Discussion

Related Work: Post Popularity
News story discussion size prediction:
• Modelling the comment timestamp time-series
(Tatar et al., 2014)
• Feature set relevant to time of posting (hour, day),
entities mentioned, etc. (Tsagkias et al., 2009)
These do not leverage the graph structure!
Online forum thread analysis:
• Prediction of thread overall quality. Also includes
rudimentary graph features (Lee et al. 2014)
Only very simple comment tree features.
#14

Related Work: Post Diffusion
Twitter hashtag popularity prediction:
• Prediction based on adoption graph properties and
communities (Weng et al., 2014)
• Prediction based on geolocation and adoption graph
conductance (Bora et al., 2015)
Facebook post share count prediction:
• Prediction based on share graph, author and
temporal property features (Cheng et al., 2014)
Setting different than online discussion mining.
#15

Related Work: Discussion Mining
Uncovering other graph based qualities:
• Comment tree h-index hypothesized to be a proxy
for discussion controversiality (Gomez et al., 2008)
• A user-comment h-index variation hypothesized to
be a proxy for political discussion deliberation
(Gonzalez-Bailon et al., 2010)
• Share graph Wiener index shown to be a proxy of
quality/interestingness of the initial post via an SIS
diffusion simulation (Goel et al., 2015)
Indices not applied for popularity prediction!
#16

Comment Tree Features
#17
• Quantification of depth, width, bushy-ness, the
existence of multiple long threads and branching
complexity of comment tree structure.

User Graph Features
#18
• Quantification of user recurrence and branching
complexity of user graph structure.

Temporal Features
#19
• Quantification of growth rate of the discussion using
simple measures borrowed from (Cheng et al. 2014)

Datasets
#20
• We used three datasets of news story posts and
online discussions.
• RedditNews dataset: Sample of posts in news-based
subreddits made in 2014 (thanks derp.institute!)

Evaluation: Model building
• Random Forest regression to handle inhomogeneous
features
• Prediction targets:
– Comment count: all datasets
– User count: eponymous users only; all datasets
– Score: #upvotes - #downvotes; RedditNews only
– Controversiality: #disagreements; RedditNews only
• Score and Controversiality are penalized for small
number of votes
• Different models built for different timepoints in the
discussion evolution corresponding to 1%-14% of the
stories lifetime (1% ~ 10 minutes)
#21

Evaluation: VS Simple Graph Features
#22
• The proposed graph features capture lead to better
prediction compared to rudimentary graph features.
• Large improvement when target is controversiality.

Evaluation: VS Simple Graph Features
#23

Evaluation: Feature Type Comparison
#24
• Comment tree features good for comment
prediction and user graph for user count.
• Integration of all feature types yields best results,
except for controversiality where all_graph is best.

Evaluation: Feature Type Comparison
#25

Evaluation: Top-100 Controversial
#26
• Which stories will be the most controversial?
• We report the Jaccard Coefficient (x100) between
true top-100 and the top-100 predicted by the
prediction framework using the three feature sets.

Conclusion
• Key contributions
– Improved popularity prediction using lightweight graph-
based features
– Post controversiality prediction showed significant
improvement
• Future Work
– Leveraging other information modalities, such as text
– Investigate dependence on the topic category of a story or
the type of the post (e.g., text post or multimedia).
#27

Thank you!
• Resources:
Code: https://github.com/MKLab-ITI/news-popularity-prediction
Online demo: http://reveal-mklab.iti.gr/reveal/popularity
• Get in touch:
@sympap / papadop@iti.gr
@georgios_rizos/ georgerizos@iti.gr
#28

References (1/2)
• A. Tatar, P. Antoniadis, M. D. De Amorim, and S. Fdida. From
popularity prediction to ranking online news. Social Network
Analysis and Mining, 4(1):1–12, 2014.
• M. Tsagkias, W. Weerkamp, and M. De Rijke. Predicting the volume
of comments on online news stories. In Proceedings of the 18th
ACM Conference on Information and Knowledge Management,
pages 1765–1768. ACM, 2009.
• J. Lee, M. Yang, and H. Rim. Discovering high-Quality threaded
discussions in online forums. Journal of Computer Science and
Technology, 29(3):519–531, 2014.
• L. Weng, F. Menczer, and Y.-Y. Ahn. Predicting successful
memes using network and community structure. arXiv
preprint arXiv:1403.6199, 2014.
• S. Bora, H. Singh, A. Sen, A. Bagchi, and P. Singla. On the
role of conductance, geography and topology in predicting
hashtag virality. arXiv preprint arXiv:1504.05351, 2015.
#29

References (2/2)
• J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and
J. Leskovec. Can cascades be predicted? In Proceedings of
the 23rd international conference on World Wide Web,
pages 925–936, 2014.
• V. G´omez, A. Kaltenbrunner, and V. L´opez. Statistical
analysis of the social network and discussion threads in
slashdot. In Proceedings of the 17th intern. conference on
World Wide Web, pages 645–654. ACM, 2008.
• S. Gonzalez-Bailon, A. Kaltenbrunner, and R. E. Banchs. The
structure of political discussion networks: a model for the
analysis of online deliberation. Journal of Information
Technology, 25(2):230–243, 2010.
• S. Goel, A. Anderson, J. Hofman, and D.J. Watts. The structural
virality of online diffusion. Management Science, 62(1): 180–
196, 2015
#30

Predicting News Popularity by Mining Online Discussions

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Ähnlich wie Predicting News Popularity by Mining Online Discussions

Ähnlich wie Predicting News Popularity by Mining Online Discussions (20)

Mehr von Symeon Papadopoulos

Mehr von Symeon Papadopoulos (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Predicting News Popularity by Mining Online Discussions

Hinweis der Redaktion