The document discusses how big data from the internet has changed our understanding of the world. It defines big data as unprecedented in scale and scope relating to a given phenomenon, often accumulating large volumes of streaming data at high velocity. Examples are given of academic research projects at the Oxford Internet Institute that have analyzed large datasets from social media and search engines to gain new insights into political discussions, information seeking behavior, and social networks. Challenges and opportunities of analyzing big data are also outlined.
The Internet is Big Data: How internet research has changed our understanding of the world
1. The Internet is Big Data
How internet research has changed our
understanding of the world
Ralph Schroeder, Professor & MSc Programme Director
Eric T. Meyer, Research Fellow & DPhil Programme Director @etmeyer
http://www.slideshare.net/etmeyer/2012oiisvco
Silicon Valley Comes to Oxford 20:20 Talk
18 November 2012
3. Introduction
Big Data: Our definition
Big data are data that are unprecedented in scale and scope
in relation to a given phenomenon.
They are often streams of data (rather than fixed datasets),
accumulating large volumes, often at high velocity.
4. Business Value and Academic Value
Strategic Knowledge
• Generally time-limited (with exceptions)
• Value comes from knowing what your competitors don’t
• Often has high monetary value if it can be exploited
5. Business Value and Academic Value
Durable Knowledge
• Less time-limited (with exceptions)
• Value comes from adding to the world’s knowledge (the global
brain is cumulative/scientific)
• Rarely has direct monetary value, but has value in terms of
creating the possibility both of future knowledge and of future
exploitation and commercial uses
6. Mark Graham & Bernie
Hogan’s project
investigated inequalities
in the creation of
knowledge
The map shown here
reveals uneven spread of
geo-tagged Wikipedia
articles 2011-12
7. Sandra Gonzalez-Bailon et al.
USENET Political Discussions (1999-2005)
8
x 10000
6
4
2
0
09/1999 09/2000 09/2001 09/2002 09/2003 09/2004
gun white war war world war war
white
black
gun white news people
world gun white
news
white
news
time world
good time
news war news terrorist world time people
peace gun death
people people people
peoplegood
news time hate house dead
black terrorist party
hate dead
hatewhite
socialdead
hate world war house good f ree man truth lie
man house good man
party free
deathgood black death hate partydeath house party
mancrime time man dead black free black lie god death
truth gun torture fraud win free
house time house peace truth free
letter god terrorist gun black
money boy
abortion flag
party
world
good cut Showed the connection between emotional
death reactions and U.S. presidential approval ratings,
power
but also how emotional language underwent
0:1
0:1
0:1
0:1
0:1
hate man
fraud free long-term shifts after events such as 9/11
truthcrime
Source: González-Bailón, S., Banchs, R. E., & Kaltenbrunner, A. (2012). Emotions, Public Opinion, and U.S. Presidential Approval Rates: A 5-Year Analysis of
Online Political Discussions. Human Communication Research, 38(2), 121-143.
8. Twitter-bots
OII master’s students Alexander Furnas and Devin Gaffney saw a large spike in then-US
presidential candidate Mitt Romney’s Twitter followers, and decided to look at the new
followers:
Furnas, A. and Gaffney, D. (2012). ‘Statistical Probability That Mitt Romney's New Twitter Followers Are Just Normal Users: 0%’. The Atlantic, July 31,
http://www.theatlantic.com/technology/archive/2012/07/statistical-probability-that-mitt-romneys-new-twitter-followers-are-just-normal-users-0/260539/ (accessed August 31, 2012).
13. Hi
Idea Idea
? Idea !
!
Idea Idea
Idea
! Idea
? Idea
Idea
! ?
!
Idea
?
?
14. Hi
Idea Idea
? Idea !
Idea Suggestion Idea Comment Link Copy
!
Idea Idea
Idea
Share !Question Thought Information Idea
Feeling Action Fact Idea Question Comment
? Idea
Action Link Suggestion Share Thought Idea
Idea
Reply Retweet Support Cats Deny Reply !Link ?
Thought Question Fact Comment Idea
!
Suggestion Information Information Action
Comment Action Link Query Comment Idea
Suggestion Denial Support Retweet Thought
?
?
16. ?
?
?
?
?
?
?
?
?
Source: Waller, V. (2011). “Not Just Information:Who Searches for What on the Search Engine Google?” Journal of the American Society for Information Science &
Technology 62(4): 761-775.
17. ?
?
?
“Surprisingly, ?the distribution of
?
types of search query did not vary
?
significantly across the different
?
?
Lifestyle Groups (p>0.01).”
?
Source: Waller, V. (2011). “Not Just Information:Who Searches for What on the Search Engine Google?” Journal of the American Society for Information Science &
Technology 62(4): 761-775.
20. Big Data Analytics
• Cost of analytical tools
• Access to data
• Why should anyone share?
• Skills to use the tools
• How different skills and disciplines work together
• From Big Data to Big (Hi-res) Picture
• Marketing Tailoring
• Forecasting Prediction
• A/B and other experiments
• Complex Trends Linking datasets plus modelling
21. Fishing for Knowledge Data
Trawling the Sea of Bigin a Sea of Information
Image sources (All CC): http://www.flickr.com/photos/mikecogh/6610354927/; http://www.flickr.com/photos/ponyboy101/2278057689/; http://www.flickr.com/photos/buzzhoffman/4140232128/; http://www.flickr.com/photos/circulating/1785350080/
22. Additional readings and references
Bond, Robert et al. (2012). ‘A 61-million-person experiment in social influence and political mobilization’,
Nature 489: 295–298.
Bruns, A. and Liang, Y.E. (2012). ‘Tools and methods for capturing Twitter data during natural disasters’, First
Monday, 17 (4 – 2), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/3937/3193
Furnas, A. and Gaffney, D. (2012). ‘Statistical Probability That Mitt Romney's New Twitter Followers Are Just
Normal Users: 0%’. The Atlantic, July 31, http://www.theatlantic.com/technology/archive/2012/07/statistical-
probability-that-mitt-romneys-new-twitter-followers-are-just-normal-users-0/260539/ (accessed August 31,
2012).
Giles, J. (2012). ‘Making the Links: From E-mails to Social Networks, the Digital Traces left Life in the
Modern World are Transforming Social Science’, Nature, 488: 448-50.
Kwak, H. et al. (2010). ‘What is Twitter, a Social Network or a News Media?’ Proceedings of the 19th
International World Wide Web (WWW) Conference, April 26-30, 2010, Raleigh NC.
Manyika, J. et al. (2011). ‘Big data: the next frontier for innovation, competition and productivity’, McKinsey
Global Institute, available at: http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/
big_data_the_next_frontier_for_innovation (last accessed August 29, 2012).
Silver, Nate. (2012). The Signal and the Noise: The Art and Science of Prediction. London: Allen Lane.
Tancer, B. (2009). Click: What Millions of People are Doing Online and Why It Matters. New York: Harper
Collins, 2009.
Wu, S. , J.M. Hofman, W.A. Mason, and D.J. Watts, (2011). ‘Who says what to whom on twitter’, Proceedings
of the 20th international conference on World Wide Web. (on Duncan Watts webpage,
http://research.microsoft.com/en-us/people/duncan/, last accessed August 29, 2012).
23. Oxford Internet Institute
Ralph Schroeder Eric T. Meyer
ralph.schroeder@oii.ox.ac.uk eric.meyer@oii.ox.ac.uk
http://www.oii.ox.ac.uk/people/?id=26 http://www.oii.ox.ac.uk/people/?id=120
@etmeyer
http://www.slideshare.net/etmeyer/2012oiisvco
See http://www.oii.ox.ac.uk/research/projects/?id=98
With support from: