1. Detecting, Modeling, & Predicting
User Temporal Intention
in Social Media
Hany M. SalahEldeen
Old Dominion University
Department of Computer Science
Advisor: Dr. Michael L. Nelson
TPDL ’12 Doctoral Consortium
Hany SalahEldeen & Michael Nelson Doctoral Consortium
2. Let’s break down the title first…
Detecting, Modeling, & Predicting
User Temporal Intention
in Social Media
5. Scenario 1:
Jenny reading Jeff’s tweets
6. Michael Jackson Dies
Snapshot on: June 25th 2009
http://web.archive.org/web/20090625232522/http://www.cnn.com/
7. Jeff tweets about it…
Published on: June 25th 2009
https://twitter.com/mdnitehk/status/2333993907
8. Jenny is off the grid…
Jeff’s friend Jenny was on vacation in Hawaii for a month
9. Jenny starts catching up a month later
When she came back, she checked Jeff’s tweets and
was shocked!
Read on: July 26th 2009
https://twitter.com/mdnitehk/status/2333993907
10. Jenny follows the link on July 26th
She quickly clicked on the link in the tweet…
CNN page on:
July 26th 2009
http://web.archive.org/web/20090726234411/http://www.cnn.com/
11. Jenny is confused!
• Implication:
• Jenny thought Jeff was making a joke about her
favorite singer, and she got mad at him
• Problem:
• The tweet and the resource the tweet links
to have become unsynchronized.
12. Scenario 2:
The Egyptian Revolution
14. Reading about it on Storify.com a year
later, in March 2012
http://storify.com/maq4sure/egypts-revolution
15. I noticed some shared images are missing
http://storify.com/maq4sure/egypts-revolution
16. Some tweets are still intact
https://twitter.com/miss_amy_qb/status/32477898581483521
17. …and some lost their meaning with
the disappearance of the images
https://twitter.com/aishes/status/32485352102952960
Missing?
https://twitter.com/omar_chaaban/status/32203697597452289
18. The tweet remains but the shared
image disappeared…
http://yfrog.com/h5923xrvbqqvgzj
19. Cairo… we have a problem!
• Implication:
• The reader cannot understand what the
author of the tweet meant because the image
is not available.
• Problem:
• The post is available but the linked resource
(image) is completely missing.
20. …back to the title
Detecting, Modeling, & Predicting
User Temporal Intention
in Social Media
23. The Anatomy of a Tweet
• Author’s username
• Other user mention
• Tweet body
• Shortened URL to resource
• Hash tag
• Post publishing timestamp
• Social interaction options
• Shared resource
24. 3 URIs = 3 Chances to fail
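One way to read this slide, assuming the three URIs are the tweet itself, the shortened URI, and the shared resource it points to (as labeled in the tweet anatomy above), is as a chain in which every link must still resolve. A minimal sketch, assuming each URI fails independently with a hypothetical survival probability; the numbers are illustrative, not measured:

```python
from math import prod


def chain_survival(survival_probs: dict[str, float]) -> float:
    """Probability that every URI in the chain still resolves.

    Assumes the URIs fail independently of one another, which is a
    simplification (a shortener shutting down and a tweet being deleted
    are treated as unrelated events).
    """
    for name, p in survival_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name}: probability must be in [0, 1], got {p}")
    return prod(survival_probs.values())


# Hypothetical per-URI survival probabilities (illustrative, not measured).
chain = {
    "tweet URI": 0.95,
    "shortened URI": 0.95,
    "shared resource URI": 0.90,
}
print(f"P(all three resolve) = {chain_survival(chain):.3f}")  # 0.812
```

Even when each link is fairly reliable on its own, the product shrinks quickly, which is the point of "3 chances to fail".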
26. User’s Temporal Intention
• Share time, implicit: the focus of our research
• Share time, explicit: instrumented shortener
• Click time, implicit: out of our scope (the purview of Facebook, Twitter, Google, etc.)
• Click time, explicit: instrumented web client
• Explicit intention is an engineering problem, solved by providing tools
27. Sometimes you want a previous version
The correct temporal intention:
CNN.com at the closest time to the tweet: 25th June 2009 ~ 7pm
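Selecting the version "at the closest time to the tweet" amounts to picking, from an archive's list of memento datetimes (for example, those listed in a Memento TimeMap), the one nearest the tweet's timestamp. A minimal sketch, assuming the memento timestamps are already available as 14-digit archive strings; the function names and the tweet time are hypothetical, while the two timestamps are taken from the web.archive.org URIs in the scenario slides:

```python
from datetime import datetime, timezone


def parse_memento_timestamp(ts: str) -> datetime:
    """Parse a 14-digit archive timestamp (YYYYMMDDhhmmss), treated as UTC."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def closest_memento(mementos: list[datetime], target: datetime) -> datetime:
    """Return the memento datetime with the smallest absolute distance to target."""
    if not mementos:
        raise ValueError("no mementos available")
    return min(mementos, key=lambda m: abs(m - target))


# Timestamps from the two web.archive.org URIs in the scenario slides.
mementos = [parse_memento_timestamp(t) for t in ("20090625232522", "20090726234411")]
tweet_time = datetime(2009, 6, 25, 23, 0, tzinfo=timezone.utc)  # hypothetical tweet time
print(closest_memento(mementos, tweet_time))  # 2009-06-25 23:25:22+00:00
```

Taking the minimum absolute distance ignores direction; a memento captured after the tweet may already reflect later changes, so a stricter policy might prefer the closest memento captured before the share time.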
28. Sometimes you want the current version
The correct temporal intention:
In this case, the current state of the press releases page
29. Research Question
Can we estimate the users’
intention at the time of posting
and reading to predict and
maintain temporal consistency?
30. Research Goals
• Detect the temporal intention of:
1. The author at sharing time
2. The reader at dereferencing time
• Model this intention as a function of time, the nature of the resource,
and its context.
• Predict how resources change with time, and the intention behind
sharing them, to minimize inconsistency.
• Implement the prediction model to automatically preserve
vulnerable social content that is prone to change or loss.
• Create an environment implementing this framework that
provides smooth temporal navigation of the social web.
31. Related Work
• User’s Web Search Intention
– A. Ashkan ECIR ’09
– C. Lee AINA ’05
– A. Loser IRSW ’08
– L. Azzopardi ECIR ’09
– R. Baeza-Yates SPIRE ’06
– N. Dai HT ’11
• URL Shortening
– D. Antoniades WWW ’11
• Commercial Intention
– Q. Guo SIGIR ’10
– A. Benczur AIRWeb ’07
• Sentiment Analysis
– G. Mishne AAAI ’06
– J. Bollen JCS ’11
• Persistence of Shared Resources
– M. Nelson D-Lib ’02
– R. Sanderson OR ’11
– F. McCown JCDL ’07
• Tweeting, Micro-blogging and Popularity
– S. Wu WWW ’11
– A. Java SNA-KDD ’07
– H. Kwak WWW ’10
• Social Networks Growth and Evolution
– B. Meeder WWW ’11
• Access to Archives
– H. Van de Sompel OR ’09
32. Dissertation Plan
BEGIN
Read Literature
Collect Datasets
Analyze Archives Coverage
Analyze Shortened URIs
Prototype Application
Analyze Shared Resources Persistence and Coverage
[Current State]
Analyze Contextual Intention
Create Intention-based dataset
Extract Intention Features
Train a Parametric Model to predict intention
Evaluate, test, cross-validate the model
Create a mockup application
Extend the model to induce preservation
Finish Writing the Dissertation
PhD Defense
34. Estimating Web Archiving Coverage
• Goal: Estimate how much of the public web is present in the public archives,
and how many copies are available.
• Action:
– Collect 4 datasets from 4 different sources:
• Search engine indices
• Bit.ly
• DMOZ
• Delicious
• Results: [Table courtesy of Ahmed AlSum, JCDL 2011]
• Publications:
– How much of the web is archived? JCDL '11
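The coverage question reduces to two statistics per sample source: the fraction of sampled URIs with at least one memento, and the mean number of mementos per archived URI. A minimal sketch, assuming the memento count for each sampled URI has already been retrieved (for example, from its TimeMap); the function name and the input counts are made up for illustration and are not the study's data:

```python
from statistics import mean


def coverage_stats(memento_counts: dict[str, int]) -> tuple[float, float]:
    """Return (fraction of URIs archived at least once, mean mementos per archived URI)."""
    if not memento_counts:
        raise ValueError("empty sample")
    archived = [c for c in memento_counts.values() if c > 0]
    fraction = len(archived) / len(memento_counts)
    copies = mean(archived) if archived else 0.0
    return fraction, copies


# Hypothetical memento counts per sampled URI, grouped by sample source.
samples = {
    "DMOZ": {"http://a.example/": 12, "http://b.example/": 0, "http://c.example/": 3},
    "Bit.ly": {"http://d.example/": 0, "http://e.example/": 1},
}
for source, counts in samples.items():
    frac, copies = coverage_stats(counts)
    print(f"{source}: {frac:.0%} archived, {copies:.1f} mementos per archived URI")
```

Reporting the two numbers separately keeps "is it archived at all?" distinct from "how well is it archived?", matching the two halves of the goal.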
36. Shortened URI analysis
• Goal: Better understand URI shortening and resolution: the effect of time
on this process, and the correlation between a page’s features and
characteristics and its resolution.
• Action:
– Collect freshly created Bit.ly URIs
– Get hourly click logs, rate of change, social network spread, and other
contextual information
– Conduct a longitudinal study
• Evaluation:
– Compare results with the frequency-of-change analysis of Cho and Garcia-Molina.
– Compare results with Antoniades et al., WWW 2011.
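For the comparison with Cho and Garcia-Molina: their frequency-of-change work estimates a page's change rate from regular polls that only reveal whether the page changed since the previous poll. A minimal sketch of the bias-reduced estimator from that work, assuming n polls at a fixed interval of which X detected a change, with r = -log((n - X + 0.5) / (n + 0.5)) expected changes per interval; the function name and the example counts are hypothetical:

```python
import math


def estimate_change_rate(n_polls: int, n_changed: int, interval_hours: float) -> float:
    """Estimate changes per hour from n_polls regular polls, n_changed of which saw a change.

    Uses r = -log((n - X + 0.5) / (n + 0.5)), the expected number of changes
    per polling interval. The naive X / n undercounts, because several
    changes between two polls look like a single change.
    """
    if not 0 <= n_changed <= n_polls:
        raise ValueError("need 0 <= n_changed <= n_polls")
    if n_polls == 0 or interval_hours <= 0:
        raise ValueError("need at least one poll and a positive interval")
    unchanged = n_polls - n_changed
    r = -math.log((unchanged + 0.5) / (n_polls + 0.5))
    return r / interval_hours


# Hypothetical: 48 hourly polls of a shortened URI's target, 6 of which detected a change.
rate = estimate_change_rate(n_polls=48, n_changed=6, interval_hours=1.0)
print(f"~{rate:.3f} changes/hour (naive estimate: {6 / 48:.3f})")
```

The hourly interval here is an assumption chosen to match the hourly click logs collected above; any fixed polling interval works the same way.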
38. Estimating Loss of Shared
Resources in Social Media
• Goal: Estimate how many resources shared on social media have been lost,
and how many are preserved in the public archives.
• Action:
– Sample from 6 public events
– Events spanning 3 years
– Check existence on the current web
– Check existence in the public archives
– Find the relation between loss and time
• Results:
– After the first year, ~11% of shared resources are lost
– After that, we continue to lose ~0.02% per day
• Publications:
– A year after the Egyptian revolution, 10% of the social media documentation is gone.
http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
– Losing my revolution: How Many Resources Shared on Social Media Have Been
Lost? TPDL '12
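The two results can be read as a piecewise-linear estimate of the share of resources lost t days after sharing. A minimal sketch, assuming the ~11% figure is reached at day 365 and the ~0.02% daily loss is measured in percentage points of the original set and accumulates linearly afterwards; the slide gives no curve for the first year, so the hypothetical function below refuses to extrapolate there:

```python
FIRST_YEAR_LOSS_PCT = 11.0  # ~11% lost after the first year (from the slide)
DAILY_LOSS_PCT = 0.02       # ~0.02% lost per day afterwards (from the slide)


def estimated_loss_percent(days_since_shared: float) -> float:
    """Percent of shared resources expected to be lost, valid for t >= 365 days.

    Assumes the daily loss is in percentage points of the original set
    and accumulates linearly after the first year.
    """
    if days_since_shared < 365:
        raise ValueError("the slide gives no estimate within the first year")
    return FIRST_YEAR_LOSS_PCT + DAILY_LOSS_PCT * (days_since_shared - 365)


for years in (1, 2, 3):
    print(f"after {years} year(s): ~{estimated_loss_percent(365 * years):.1f}% lost")
```

Under this reading, roughly another 7.3 percentage points are lost during each year after the first.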
39. Dissertation Plan
BEGIN
Read Literature
Collect Datasets
Analyze Archives Coverage
Analyze Shortened URIs
Prototype Application
Analyze Shared Resources Persistence and Coverage
User Intention Analysis
Create Intention-based dataset
Extract Intention Features
Train a Parametric Model to predict intention
Evaluate, test, cross-validate the model
Create a mockup application
Extend the model to induce preservation
Finish Writing the Dissertation
PhD Defense
40. User Intention Analysis
• Goal: Better understand user intention and the factors that affect it,
and create a new training and testing set.
• Action:
– Select a random sample of tweets
– Extract the URIs
– Get the closest Memento
– Download the snapshot and the current version
– Use Amazon’s Mechanical Turk to choose the best version
• Evaluation:
– Measure cross-rater agreement and confidence.
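The slide does not name the agreement measure; Fleiss' kappa is one common choice when every item (here, a tweet) is labeled by the same number of raters choosing among fixed categories (here, for instance, "past version" vs. "current version"). A minimal sketch, assuming every tweet receives the same number of Mechanical Turk ratings; the ratings below are hypothetical:

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for counts[i][j] = number of raters who put item i in category j.

    Every row must have the same categories and sum to the same number of raters n >= 2.
    """
    if not counts:
        raise ValueError("no items")
    n = sum(counts[0])
    n_items = len(counts)
    n_categories = len(counts[0])
    if n < 2 or any(sum(row) != n or len(row) != n_categories for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Share of all ratings that fell into each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n) for j in range(n_categories)]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("all ratings in one category; kappa is undefined")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical: 3 raters per tweet, columns = [past version, current version].
ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(f"Fleiss' kappa = {fleiss_kappa(ratings):.3f}")  # 0.333
```

Kappa corrects raw agreement for chance, which matters when one answer (e.g., "current version") dominates the ratings.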
41. Proposed Work
• Data Gathering
• Feature Extraction
• Modeling the intention engine
• Evaluation
• Application: Prediction and Preservation
43. Possible Solution for Jenny
The resource has changed since it was shared.
Do you wish to see the version the author intended or
the current version?
[Current Version] [Intended Version]
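Deciding whether to show this prompt needs a test of whether the resource changed between sharing time and reading time. A minimal sketch, assuming we already hold the archived copy closest to the share time and the current copy as text; the function names are hypothetical, and comparing content hashes is a simplification, since any change (even a rotated ad or a timestamp) triggers the prompt:

```python
import hashlib
from typing import Optional


def fingerprint(content: str) -> str:
    """SHA-256 digest of the content after collapsing runs of whitespace."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def change_prompt(archived_at_share_time: str, current: str) -> Optional[str]:
    """Return the prompt to show the reader, or None if the resource is unchanged."""
    if fingerprint(archived_at_share_time) == fingerprint(current):
        return None  # the link still shows what the author saw when sharing
    return ("The resource has changed since it was shared. Do you wish to see "
            "the version the author intended or the current version?")


shared_copy = "<h1>Front page at share time</h1>"   # hypothetical archived copy
current_copy = "<h1>Front page a month later</h1>"  # hypothetical current copy
print(change_prompt(shared_copy, current_copy) is not None)  # True
```

A production check would more likely compare extracted main content, so that boilerplate churn does not interrupt the reader.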
44. Proposed Framework
Feature Extraction → Classifier → Archived Version or Current Version
Example Features:
- Tweet Content
- Click Logs
- Other Tweets
- Shared Resource
- TimeMaps
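The framework maps features of a shared link to a choice between the archived and the current version. The slide does not specify the classifier; a minimal sketch of that final step, assuming a logistic model (one possible parametric classifier) with hand-set, hypothetical weights over made-up numeric encodings of a few example features. In the proposed work the weights would be learned from the intention-based dataset, not chosen by hand:

```python
import math
from dataclasses import dataclass


@dataclass
class LinkFeatures:
    # Hypothetical numeric encodings of a few of the example features.
    tweet_refers_to_event: float  # from the tweet content, 0 or 1
    resource_changed: float       # from the shared resource and its TimeMap, 0 or 1
    days_since_shared: float      # from the click logs


# Hypothetical weights and bias; a trained model would estimate these.
WEIGHTS = {"tweet_refers_to_event": 1.5, "resource_changed": 2.0, "days_since_shared": 0.05}
BIAS = -2.5


def p_archived_intended(f: LinkFeatures) -> float:
    """Logistic model: probability that the archived (share-time) version was intended."""
    z = BIAS + sum(w * getattr(f, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def pick_version(f: LinkFeatures, threshold: float = 0.5) -> str:
    return "archived" if p_archived_intended(f) >= threshold else "current"


jenny = LinkFeatures(tweet_refers_to_event=1, resource_changed=1, days_since_shared=31)
press_page = LinkFeatures(tweet_refers_to_event=0, resource_changed=1, days_since_shared=1)
print(pick_version(jenny), pick_version(press_page))  # archived current
```

The two example links loosely mirror the CNN and press-release slides: the event tweet favors the archived version, the press-release link favors the current one.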
50. My Publications
• S. G. Ainsworth, A. Alsum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. How much
of the web is archived? In Proceedings of the 11th annual international ACM/IEEE
joint conference on Digital libraries, JCDL '11, pages 133-136, 2011.
• H. SalahEldeen and M. L. Nelson. Losing my revolution: How many resources shared on
social media have been lost? Accepted at TPDL 2012.
• H. SalahEldeen and M. L. Nelson. Losing my revolution: A year after the Egyptian
revolution, 10% of the social media documentation is gone.
http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
51. References
• D. Antoniades, I. Polakis, G. Kontaxis, E. Athanasopoulos, S. Ioannidis, E. P. Markatos, and T. Karagiannis. we.b: the web of short
urls. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 715-724, New York, NY, USA,
2011. ACM.
• A. Ashkan, C. L. Clarke, E. Agichtein, and Q. Guo. Classifying and characterizing query intent. In Proceedings of the 31st
European Conference on IR Research on Advances in Information Retrieval, ECIR '09, pages 578-586, Berlin, Heidelberg, 2009.
Springer-Verlag.
• L. Azzopardi and M. de Rijke. Query intention acquisition: A case study on automatically inferring structured queries. In
Proceedings DIR-2006, 2006.
• R. Baeza-Yates, L. Calderon-Benavides, and C. Gonzalez-Caro. The intention behind web queries. In F. Crestani, P. Ferragina, and
M. Sanderson, editors, String Processing and Information Retrieval, volume 4209 of Lecture Notes in Computer Science, pages
98-109. Springer Berlin / Heidelberg, 2006. 10.1007/11880561_9.
• A. Benczur, I. Biro, K. Csalogany, and T. Sarlos. Web spam detection via commercial intent analysis. In Proceedings of the 3rd
international workshop on Adversarial information retrieval on the web, AIRWeb '07, pages 89-92, New York, NY, USA, 2007.
ACM.
• J. Bollen, H. Mao, and X.-J. Zeng. Twitter mood predicts the stock market. CoRR, abs/1010.3003, 2010.
• N. Dai, X. Qi, and B. D. Davison. Bridging link and query intent to enhance web search. In Proceedings of the 22nd ACM
conference on Hypertext and hypermedia, HT '11, pages 17-26, New York, NY, USA, 2011. ACM.
• N. Dai, X. Qi, and B. D. Davison. Enhancing web search with entity intent. In Proceedings of the 20th international conference
companion on World wide web, WWW '11, pages 29-30, New York, NY, USA, 2011. ACM.
• K. Durant and M. Smith. Predicting the political sentiment of web log posts using supervised machine learning techniques
coupled with feature selection. In O. Nasraoui, M. Spiliopoulou, J. Srivastava, B. Mobasher, and B. Masand, editors, Advances in
Web Mining and Web Usage Analysis, volume 4811 of Lecture Notes in Computer Science, pages 187-206. Springer Berlin /
Heidelberg, 2007. 10.1007/978-3-540-77485-3_11.
52. References
• Q. Guo and E. Agichtein. Ready to buy or just browsing?: detecting web searcher goals from interaction data. In Proceedings of the 33rd
international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 130-137, New York, NY, USA,
2010. ACM.
• A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th
WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, WebKDD/SNA-KDD '07, pages 56-65, New York, NY,
USA, 2007. ACM.
• H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international
conference on World wide web, WWW '10, pages 591-600, New York, NY, USA, 2010. ACM.
• C.-H. L. Lee and A. Liu. Modeling the query intention with goals. In Proceedings of the 19th International Conference on Advanced
Information Networking and Applications - Volume 2, AINA '05, pages 535-540, Washington, DC, USA, 2005. IEEE Computer Society.
• A. Loser, W. M. Barczynski, and F. Brauer. What's the intention behind your query? a few observations from a large developer community.
In IRSW, 2008.
• F. McCown, N. Diawara, and M. L. Nelson. Factors affecting website reconstruction from the web infrastructure. In JCDL '07: Proceedings of
the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 39-48, 2007.
• B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes. We know who you followed last summer: inferring social link creation times
in twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 517-526, New York, NY, USA, 2011.
ACM.
• G. Mishne. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing
Weblogs (AAAI-CAAW), 2006.
• M. L. Nelson and B. D. Allen. Object persistence and availability in digital libraries. D-Lib Magazine, 8(1), 2002.
• R. Sanderson, M. Phillips, and H. Van de Sompel. Analyzing the persistence of referenced web resources with memento. CoRR,
abs/1105.3459, 2011.
• H. Van de Sompel, M. L. Nelson, R. Sanderson, L. Balakireva, S. Ainsworth, and H. Shankar. Memento: Time travel for the web. CoRR,
abs/0911.1112, 2009.
• S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on twitter. In Proceedings of the 20th international
conference on World wide web, WWW '11, pages 705-714, New York, NY, USA, 2011. ACM.