SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Predicting News Popularity by
Mining Online Discussions
Georgios Rizos, Symeon Papadopoulos and Yiannis Kompatsiaris
Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
SNOW/WWW 2016, April 12, 2016, Montréal, Québec, Canada.
Popularity
Operational definition ( $$$)
• Number of views
• Number of comments
• Number of users commenting
• Number of shares
• Number of thumbs up, thumbs down
….
• Bonus: Controversiality
#2
#3
Overview
#4
Online discussions 
Comment trees/user graphs 
Tree/Graph Features 
Predictive model
Comment Trees and User Graphs
#5
User X
I blah blah….
User Y
But bla bla….
User X
No!
User W
Are you kiddin?
User Z
Wat?
X
X
X
Z
Y
W
Y
WZ
A
Comment Tree
User Graph
User A story poster
A
#6
Sony Hack #1 VS Sony Hack #2
#7
#8
10 mins
22
#9
30 mins
44
#10
1 hour
713
#11
1 ½ hour
57 10
#12
14358
2 ¼ hours
#13
Degree
Depth
Elements of an Engaging Discussion
Related Work: Post Popularity
News story discussion size prediction:
• Modelling the comment timestamp time-series
(Tatar et al., 2014)
• Feature set relevant to time of posting (hour, day),
entities mentioned, etc. (Tsagkias et al., 2009)
These do not leverage the graph structure!
Online forum thread analysis:
• Prediction of thread overall quality. Also includes
rudimentary graph features (Lee et al. 2014)
Only very simple comment tree features.
#14
Related Work: Post Diffusion
Twitter hashtag popularity prediction:
• Prediction based on adoption graph properties and
communities (Weng et al., 2014)
• Prediction based on geolocation and adoption graph
conductance (Bora et al., 2015)
Facebook post share count prediction:
• Prediction based on share graph, author and
temporal property features (Cheng et al., 2014)
Setting different than online discussion mining.
#15
Related Work: Discussion Mining
Uncovering other graph based qualities:
• Comment tree h-index hypothesized to be a proxy
for discussion controversiality (Gomez et al., 2008)
• A user-comment h-index variation hypothesized to
be a proxy for political discussion deliberation
(Gonzalez-Bailon et al., 2010)
• Share graph Wiener index shown to be a proxy of
quality/interestingness of the initial post via an SIS
diffusion simulation (Goel et al., 2015)
Indices not applied for popularity prediction!
#16
Comment Tree Features
#17
• Quantification of depth, width, bushy-ness, the
existence of multiple long threads and branching
complexity of comment tree structure.
User Graph Features
#18
• Quantification of user recurrence and branching
complexity of user graph structure.
Temporal Features
#19
• Quantification of growth rate of the discussion using
simple measures borrowed from (Cheng et al. 2014)
Datasets
#20
• We used three datasets of news story posts and
online discussions.
• RedditNews dataset: Sample of posts in news-based
subreddits made in 2014 (thanks derp.institute!)
Evaluation: Model building
• Random Forest regression to handle inhomogeneous
features
• Prediction targets:
– Comment count: all datasets
– User count: eponymous users only; all datasets
– Score: #upvotes - #downvotes; RedditNews only
– Controversiality: #disagreements; RedditNews only
• Score and Controversiality are penalized for small
number of votes
• Different models built for different timepoints in the
discussion evolution corresponding to 1%-14% of the
stories lifetime (1% ~ 10 minutes)
#21
Evaluation: VS Simple Graph Features
#22
• The proposed graph features capture lead to better
prediction compared to rudimentary graph features.
• Large improvement when target is controversiality.
Evaluation: VS Simple Graph Features
#23
Evaluation: Feature Type Comparison
#24
• Comment tree features good for comment
prediction and user graph for user count.
• Integration of all feature types yields best results,
except for controversiality where all_graph is best.
Evaluation: Feature Type Comparison
#25
Evaluation: Top-100 Controversial
#26
• Which stories will be the most controversial?
• We report the Jaccard Coefficient (x100) between
true top-100 and the top-100 predicted by the
prediction framework using the three feature sets.
Conclusion
• Key contributions
– Improved popularity prediction using lightweight graph-
based features
– Post controversiality prediction showed significant
improvement
• Future Work
– Leveraging other information modalities, such as text
– Investigate dependence on the topic category of a story or
the type of the post (e.g., text post or multimedia).
#27
Thank you!
• Resources:
Code: https://github.com/MKLab-ITI/news-popularity-prediction
Online demo: http://reveal-mklab.iti.gr/reveal/popularity
• Get in touch:
@sympap / papadop@iti.gr
@georgios_rizos/ georgerizos@iti.gr
#28
References (1/2)
• A. Tatar, P. Antoniadis, M. D. De Amorim, and S. Fdida. From
popularity prediction to ranking online news. Social Network
Analysis and Mining, 4(1):1–12, 2014.
• M. Tsagkias, W. Weerkamp, and M. De Rijke. Predicting the volume
of comments on online news stories. In Proceedings of the 18th
ACM Conference on Information and Knowledge Management,
pages 1765–1768. ACM, 2009.
• J. Lee, M. Yang, and H. Rim. Discovering high-Quality threaded
discussions in online forums. Journal of Computer Science and
Technology, 29(3):519–531, 2014.
• L. Weng, F. Menczer, and Y.-Y. Ahn. Predicting successful
memes using network and community structure. arXiv
preprint arXiv:1403.6199, 2014.
• S. Bora, H. Singh, A. Sen, A. Bagchi, and P. Singla. On the
role of conductance, geography and topology in predicting
hashtag virality. arXiv preprint arXiv:1504.05351, 2015.
#29
References (2/2)
• J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and
J. Leskovec. Can cascades be predicted? In Proceedings of
the 23rd international conference on World Wide Web,
pages 925–936, 2014.
• V. G´omez, A. Kaltenbrunner, and V. L´opez. Statistical
analysis of the social network and discussion threads in
slashdot. In Proceedings of the 17th intern. conference on
World Wide Web, pages 645–654. ACM, 2008.
• S. Gonzalez-Bailon, A. Kaltenbrunner, and R. E. Banchs. The
structure of political discussion networks: a model for the
analysis of online deliberation. Journal of Information
Technology, 25(2):230–243, 2010.
• S. Goel, A. Anderson, J. Hofman, and D.J. Watts. The structural
virality of online diffusion. Management Science, 62(1): 180–
196, 2015
#30

Weitere ähnliche Inhalte

Ähnlich wie Predicting News Popularity by Mining Online Discussions

Exploring the Bi-verse. A trip across the digital and physical ecospheres
Exploring the Bi-verse.A trip across the digital and physical ecospheresExploring the Bi-verse.A trip across the digital and physical ecospheres
Exploring the Bi-verse. A trip across the digital and physical ecospheres
Marco Brambilla
 
The Elusive Nature of Software Documentation
The Elusive Nature of Software DocumentationThe Elusive Nature of Software Documentation
The Elusive Nature of Software Documentation
Margaret-Anne Storey
 

Ähnlich wie Predicting News Popularity by Mining Online Discussions (20)

From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?
 
Eavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging SiteEavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging Site
 
User Behaviour Pattern Recognition On Twitter Social Network
User Behaviour Pattern Recognition On Twitter Social NetworkUser Behaviour Pattern Recognition On Twitter Social Network
User Behaviour Pattern Recognition On Twitter Social Network
 
New Methodologies for Capturing and Working with Publicly Available Twitter Data
New Methodologies for Capturing and Working with Publicly Available Twitter DataNew Methodologies for Capturing and Working with Publicly Available Twitter Data
New Methodologies for Capturing and Working with Publicly Available Twitter Data
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Approaches of Data Analysis: Networks generated through Social Media
Approaches of Data Analysis: Networks generated through Social MediaApproaches of Data Analysis: Networks generated through Social Media
Approaches of Data Analysis: Networks generated through Social Media
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
 
Social media as a tool for terminological research
Social media as a tool for terminological researchSocial media as a tool for terminological research
Social media as a tool for terminological research
 
Methods and Tools for Facilitating Social Participation
Methods and Tools for Facilitating Social ParticipationMethods and Tools for Facilitating Social Participation
Methods and Tools for Facilitating Social Participation
 
A framework for real time semantic social media analysis
A framework for real time semantic social media analysis A framework for real time semantic social media analysis
A framework for real time semantic social media analysis
 
Insights From Social Media
Insights From Social MediaInsights From Social Media
Insights From Social Media
 
AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...
 
Univ. of AZ Global Racing Symposium 2015 - Digital Strategies
Univ. of AZ Global Racing Symposium 2015 - Digital StrategiesUniv. of AZ Global Racing Symposium 2015 - Digital Strategies
Univ. of AZ Global Racing Symposium 2015 - Digital Strategies
 
Exploring the Bi-verse. A trip across the digital and physical ecospheres
Exploring the Bi-verse.A trip across the digital and physical ecospheresExploring the Bi-verse.A trip across the digital and physical ecospheres
Exploring the Bi-verse. A trip across the digital and physical ecospheres
 
New skill sets for journalism
New skill sets for journalismNew skill sets for journalism
New skill sets for journalism
 
Twitter in Academic Conferences
Twitter in Academic ConferencesTwitter in Academic Conferences
Twitter in Academic Conferences
 
Who are We Studying: Humans or Bots?
Who are We Studying: Humans or Bots? Who are We Studying: Humans or Bots?
Who are We Studying: Humans or Bots?
 
The Elusive Nature of Software Documentation
The Elusive Nature of Software DocumentationThe Elusive Nature of Software Documentation
The Elusive Nature of Software Documentation
 
Inferring social media user attributes using language and network information
Inferring social media user attributes using language and network informationInferring social media user attributes using language and network information
Inferring social media user attributes using language and network information
 
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the MapNew Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
 

Mehr von Symeon Papadopoulos

Mehr von Symeon Papadopoulos (20)

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their Detection
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering Localization
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact Tracing
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia content
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air Quality
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the Internet
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering Detection
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on Twitter
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016
 
Multimedia Privacy
Multimedia PrivacyMultimedia Privacy
Multimedia Privacy
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging Performance
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News Professionals
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015
 

Kürzlich hochgeladen

Capstone slidedeck for my capstone final edition.pdf
Capstone slidedeck for my capstone final edition.pdfCapstone slidedeck for my capstone final edition.pdf
Capstone slidedeck for my capstone final edition.pdf
eliklein8
 
Sociocosmos empowers you to go trendy on social media with a few clicks..pdf
Sociocosmos empowers you to go trendy on social media with a few clicks..pdfSociocosmos empowers you to go trendy on social media with a few clicks..pdf
Sociocosmos empowers you to go trendy on social media with a few clicks..pdf
SocioCosmos
 
Capstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdfCapstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdf
eliklein8
 
Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
ZurliaSoop
 
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
Cara Menggugurkan Kandungan 087776558899
 
Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...
Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...
Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...
Heena Escort Service
 
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
Health
 
JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...
JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...
JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...
Cara Menggugurkan Kandungan 087776558899
 

Kürzlich hochgeladen (20)

Capstone slidedeck for my capstone final edition.pdf
Capstone slidedeck for my capstone final edition.pdfCapstone slidedeck for my capstone final edition.pdf
Capstone slidedeck for my capstone final edition.pdf
 
Capstone slide deck on the TikTok revolution
Capstone slide deck on the TikTok revolutionCapstone slide deck on the TikTok revolution
Capstone slide deck on the TikTok revolution
 
Sociocosmos empowers you to go trendy on social media with a few clicks..pdf
Sociocosmos empowers you to go trendy on social media with a few clicks..pdfSociocosmos empowers you to go trendy on social media with a few clicks..pdf
Sociocosmos empowers you to go trendy on social media with a few clicks..pdf
 
Capstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdfCapstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdf
 
Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
 
Coorg Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Coorg Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsCoorg Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Coorg Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
 
SEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdf
SEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdfSEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdf
SEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdf
 
Marketing Plan - Social Media. The Sparks Foundation
Marketing Plan -  Social Media. The Sparks FoundationMarketing Plan -  Social Media. The Sparks Foundation
Marketing Plan - Social Media. The Sparks Foundation
 
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
 
Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...
Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...
Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...
 
Enhancing Consumer Trust Through Strategic Content Marketing
Enhancing Consumer Trust Through Strategic Content MarketingEnhancing Consumer Trust Through Strategic Content Marketing
Enhancing Consumer Trust Through Strategic Content Marketing
 
Kayamkulam Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Kayamkulam Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsKayamkulam Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Kayamkulam Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
 
BVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIR
BVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIRBVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIR
BVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIR
 
The Butterfly Effect
The Butterfly EffectThe Butterfly Effect
The Butterfly Effect
 
Sri Ganganagar Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Sri Ganganagar Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsSri Ganganagar Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Sri Ganganagar Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
 
Madikeri Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Madikeri Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsMadikeri Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Madikeri Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
 
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
 
Jhunjhunu Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Jhunjhunu Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsJhunjhunu Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Jhunjhunu Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
 
JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...
JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...
JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...
 
Content strategy : Content empire and cash in
Content strategy : Content empire and cash inContent strategy : Content empire and cash in
Content strategy : Content empire and cash in
 

Predicting News Popularity by Mining Online Discussions

  • 1. Predicting News Popularity by Mining Online Discussions Georgios Rizos, Symeon Papadopoulos and Yiannis Kompatsiaris Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI) SNOW/WWW 2016, April 12, 2016, Montréal, Québec, Canada.
  • 2. Popularity Operational definition ( $$$) • Number of views • Number of comments • Number of users commenting • Number of shares • Number of thumbs up, thumbs down …. • Bonus: Controversiality #2
  • 3. #3
  • 4. Overview #4 Online discussions  Comment trees/user graphs  Tree/Graph Features  Predictive model
  • 5. Comment Trees and User Graphs #5 User X I blah blah…. User Y But bla bla…. User X No! User W Are you kiddin? User Z Wat? X X X Z Y W Y WZ A Comment Tree User Graph User A story poster A
  • 6. #6 Sony Hack #1 VS Sony Hack #2
  • 7. #7
  • 13. #13 Degree Depth Elements of an Engaging Discussion
  • 14. Related Work: Post Popularity News story discussion size prediction: • Modelling the comment timestamp time-series (Tatar et al., 2014) • Feature set relevant to time of posting (hour, day), entities mentioned, etc. (Tsagkias et al., 2009) These do not leverage the graph structure! Online forum thread analysis: • Prediction of thread overall quality. Also includes rudimentary graph features (Lee et al. 2014) Only very simple comment tree features. #14
  • 15. Related Work: Post Diffusion Twitter hashtag popularity prediction: • Prediction based on adoption graph properties and communities (Weng et al., 2014) • Prediction based on geolocation and adoption graph conductance (Bora et al., 2015) Facebook post share count prediction: • Prediction based on share graph, author and temporal property features (Cheng et al., 2014) Setting different than online discussion mining. #15
  • 16. Related Work: Discussion Mining Uncovering other graph based qualities: • Comment tree h-index hypothesized to be a proxy for discussion controversiality (Gomez et al., 2008) • A user-comment h-index variation hypothesized to be a proxy for political discussion deliberation (Gonzalez-Bailon et al., 2010) • Share graph Wiener index shown to be a proxy of quality/interestingness of the initial post via an SIS diffusion simulation (Goel et al., 2015) Indices not applied for popularity prediction! #16
  • 17. Comment Tree Features #17 • Quantification of depth, width, bushy-ness, the existence of multiple long threads and branching complexity of comment tree structure.
  • 18. User Graph Features #18 • Quantification of user recurrence and branching complexity of user graph structure.
  • 19. Temporal Features #19 • Quantification of growth rate of the discussion using simple measures borrowed from (Cheng et al. 2014)
  • 20. Datasets #20 • We used three datasets of news story posts and online discussions. • RedditNews dataset: Sample of posts in news-based subreddits made in 2014 (thanks derp.institute!)
  • 21. Evaluation: Model building • Random Forest regression to handle inhomogeneous features • Prediction targets: – Comment count: all datasets – User count: eponymous users only; all datasets – Score: #upvotes - #downvotes; RedditNews only – Controversiality: #disagreements; RedditNews only • Score and Controversiality are penalized for small number of votes • Different models built for different timepoints in the discussion evolution corresponding to 1%-14% of the stories lifetime (1% ~ 10 minutes) #21
  • 22. Evaluation: VS Simple Graph Features #22 • The proposed graph features capture lead to better prediction compared to rudimentary graph features. • Large improvement when target is controversiality.
  • 23. Evaluation: VS Simple Graph Features #23
  • 24. Evaluation: Feature Type Comparison #24 • Comment tree features good for comment prediction and user graph for user count. • Integration of all feature types yields best results, except for controversiality where all_graph is best.
  • 25. Evaluation: Feature Type Comparison #25
  • 26. Evaluation: Top-100 Controversial #26 • Which stories will be the most controversial? • We report the Jaccard Coefficient (x100) between true top-100 and the top-100 predicted by the prediction framework using the three feature sets.
  • 27. Conclusion • Key contributions – Improved popularity prediction using lightweight graph- based features – Post controversiality prediction showed significant improvement • Future Work – Leveraging other information modalities, such as text – Investigate dependence on the topic category of a story or the type of the post (e.g., text post or multimedia). #27
  • 28. Thank you! • Resources: Code: https://github.com/MKLab-ITI/news-popularity-prediction Online demo: http://reveal-mklab.iti.gr/reveal/popularity • Get in touch: @sympap / papadop@iti.gr @georgios_rizos/ georgerizos@iti.gr #28
  • 29. References (1/2) • A. Tatar, P. Antoniadis, M. D. De Amorim, and S. Fdida. From popularity prediction to ranking online news. Social Network Analysis and Mining, 4(1):1–12, 2014. • M. Tsagkias, W. Weerkamp, and M. De Rijke. Predicting the volume of comments on online news stories. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 1765–1768. ACM, 2009. • J. Lee, M. Yang, and H. Rim. Discovering high-Quality threaded discussions in online forums. Journal of Computer Science and Technology, 29(3):519–531, 2014. • L. Weng, F. Menczer, and Y.-Y. Ahn. Predicting successful memes using network and community structure. arXiv preprint arXiv:1403.6199, 2014. • S. Bora, H. Singh, A. Sen, A. Bagchi, and P. Singla. On the role of conductance, geography and topology in predicting hashtag virality. arXiv preprint arXiv:1504.05351, 2015. #29
  • 30. References (2/2) • J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec. Can cascades be predicted? In Proceedings of the 23rd international conference on World Wide Web, pages 925–936, 2014. • V. G´omez, A. Kaltenbrunner, and V. L´opez. Statistical analysis of the social network and discussion threads in slashdot. In Proceedings of the 17th intern. conference on World Wide Web, pages 645–654. ACM, 2008. • S. Gonzalez-Bailon, A. Kaltenbrunner, and R. E. Banchs. The structure of political discussion networks: a model for the analysis of online deliberation. Journal of Information Technology, 25(2):230–243, 2010. • S. Goel, A. Anderson, J. Hofman, and D.J. Watts. The structural virality of online diffusion. Management Science, 62(1): 180– 196, 2015 #30

Hinweis der Redaktion

  1. Identifying top online news stories very early after their posting is very important. Recently, the NY Times have launched Blossom tool that recommends which of their already published stories will become viral on Facebook. Funny stories or pictures may attract lots of upvotes but insightful stories may raise discussion. The discussion structure can be analyzed in order to reveal more intricate qualities about the initial post, such as controversiality. Web publishers Little featured content real estate  select only best performing content Social media accounts  select only content that is going to create engagement NYT recently built the Blossom Slack bot to help decide which stories to post to social media
  2. For relatively early time (5%), top controversial story predicted by all_graph: Gun deaths for U.S. officers rose by 56 percent in 2014: report. Users link to related info Others claim title is worded to evoke sensationalism Others claim that civilian deaths are more important
  3. http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/