Drinking from the fire hose?
The pitfalls & potential of Big Data
Josh Cowls, Oxford Internet Institute
with contributions from Eric Meyer, Ralph Schroeder and
Linnet Taylor
t2i Lab, Chalmers, 27th March 2014
Overview
• Background
• Definitions
• Innovations and implications
• Learning to drink from the fire hose
The Oxford Internet Institute
• Department of the University of Oxford
• MO: ‘Understanding life online’
• Multi-disciplinary mix (social sciences plus physical and medical sciences,
and humanities)
• 45 researchers (and growing)
• 50 students (MSc in Social Science of the Internet; PhD programme)
• Generating big data on social, political and economic behaviour from
social media
www.oii.ox.ac.uk
Accessing and Using Big Data to Advance Social
Science Knowledge
• Funded by the Alfred P. Sloan Foundation
• 2012 – 2014
• Data sources:
• 120 interviews, mainly with social scientists but some
interviewees from business and government
• Reports, workshops, publications
• No representative sample, but some patterns of
disciplinary and skills background and career trajectory
NB where unattributed, quotes used in this presentation are excerpted from
interviews conducted as part of this project.
Big Data: our definition
Big data are data that are
unprecedented in scale and scope in
relation to a given phenomenon.
They are often streams of data (rather than fixed
datasets), accumulating large volumes, often at high
velocity.
Big Data: other definitions
• ‘Transactional’ (Margetts et al.)
• ‘Things that one can do at a large scale that
cannot be done at a smaller one’
(Mayer-Schönberger and Cukier)
• The ‘3 Vs’: volume, velocity, variety – but also
veracity, visualisability, viscosity? (Gartner)
... what Big Data isn’t
• A generalisable, quantifiable ‘amount’ of data
• A race to the top (Mutually Assured Distraction)
• The same for every discipline, field or sector
A ‘working’ definition
• The Big Data phenomenon might be less about
what the dataset is and more about how we
work with it
• (Even if this is indistinguishable in practice)
Shifts in mindset
From Mayer-Schönberger and Cukier:
• “The ability to analyse vast amounts of data
about a topic rather than be forced to settle for
smaller sets”
• “A willingness to embrace data’s real-world
messiness rather than privilege exactitude”
• “A growing respect for correlations rather than a
continuing quest for elusive causality”
Implications for research
Whither the sample?
“the sample survey[’s] glory years ... are in the past”
Savage and Burrows, 2007
Implications for research
Whither the sample?
“sampling is like an analog photographic print. It looks good
from a distance, but as you stare closer, zooming in on a
particular detail, it gets blurry ... Often, the really interesting
things in life are found in places that samples fail to fully
catch”
Mayer-Schönberger and Cukier 2012
Implications for research
More or mess?
“social media is really, really fascinating, and the reason is
because it ... falls into this category of there’s something
there but we don’t know what it is. So you can measure
public opinion on Twitter and clearly that’s indicative of
something, but we don’t quite know what it’s representative
of”
Brandon Stewart, Harvard University Department of
Government
Implications for research
More or mess?
“the problem with the hashtag stuff [is that] we have
wonderful case studies but we don’t know what they sit in
essentially, what the framework is, if that’s 1% or 10% or
100% of the current conversation in Australia or whatever”
Axel Bruns, Queensland University of Technology
Implications for research
More or mess?
“the big problem that we haven’t cracked is that if
someone tweets a sentiment it’s not necessarily what
they’re feeling, it can be for a variety of reasons, so it doesn’t
really reflect what they feel necessarily”
Mike Thelwall, University of Wolverhampton
Implications for research
Do we care about causes?
“Big Data is all about correlation; it’s not about causation,
which means that you don’t need to have a theory
beforehand. You just start looking for correlation … so you
don’t have any idea about the structure of the data, you just
find a funny correlation.”
Sara Esposti, Open University Business School
Implications for research
Do we care about causes?
“a central concern of social science is, we don’t just want to
find statistical associations, we actually want to uncover the
underlying causal processes by which social systems work ...
The data themselves don’t tell you about cause and effect,
there’s actually a very complex often, complex inferential
process you have to go through in order to extract from the
data the things that you really care about”
David Jensen, University of Massachusetts
Implications for research
Do we care about causes?
“I’ve been talking to some computer scientists who are
rising stars, they’re really doing well, and they acknowledge
that the way in which the field works, novelty is the key
issue. And so there’s always an incentive or a pressure to
keep on doing new stuff with new data, even though they
might have wanted to go into more depth into something.”
Sandra Gonzalez-Bailon, Annenberg School of
Communication, University of Pennsylvania
The challenge
How can we extract meaning from Big Data – learn
to drink from the fire hose?
Drinking from the fire hose
• Understanding the data
• Collaborating
• Mixing methods
Drinking from the fire hose: understanding the data
The rise of the information society has given us
myriad new forms of data and accompanying ways
of analysing it.
The challenging part is abstracting meaning about
society in general from data created and harvested
online.
Drinking from the fire hose: understanding the data
Example: it’s hard to predict elections using Twitter
“[Of] 14 different attempts to predict elections
based on Twitter data ... Only half of them were
successful ... All of this looks close to mere chance”
Gayo-Avello 2012
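To see why 7 successes out of 14 "looks close to mere chance", here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that each prediction is a two-way call that a coin flip would get right half the time; real elections often have more than two outcomes, so this baseline is an assumption and not part of Gayo-Avello's analysis. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of at least `successes` hits in `trials` independent tries,
    each succeeding with probability `p` (binomial upper tail)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumption: every prediction is a fair coin flip (p = 0.5).
chance = prob_at_least(7, 14)
print(f"P(at least 7 of 14 correct by guessing) = {chance:.3f}")  # prints 0.605
```

Under that assumption, a pure guesser matches or beats the observed record about 60% of the time, so the record alone gives no evidence of predictive skill.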
Drinking from the fire hose: understanding the data
Example: Facebook isn’t going anywhere, and
neither is Princeton
Cannarella and Spechler 2014; Develin 2014
Drinking from the fire hose: understanding the data
But it’s much simpler, conceptually speaking, to
analyse online phenomena on their own terms
Yasseri, Hale & Margetts 2013
Drinking from the fire hose: understanding the data
But it’s much simpler, conceptually speaking, to
analyse online phenomena on their own terms
Hale, Yasseri, Cowls, Meyer,
Schroeder & Margetts (submitted)
Drinking from the fire hose: understanding the data
Of course, online data can still provide insights into
offline life, but these must be well-grounded.
e.g. Seth Stephens-Davidowitz, ‘The Cost of Racial
Animus on a Black Candidate: Evidence Using
Google Data’
• Google accounts for >50% of search engine market (less
concern over representativeness)
• Google searches are private and anonymous (less
concern over reliability)
• This method uncovers a social phenomenon, racism,
which would be harder to detect in pre-Internet
approaches e.g. interviews or surveys
Drinking from the fire hose: understanding the data
Beware false prophets
XKCD
Drinking from the fire hose: understanding the data
Beware false prophets: analyses using thousands of
variables can generate millions or billions of
possible relationships – not all (or most) will be valid
or meaningful
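The jump from thousands of variables to millions or billions of candidate relationships is plain combinatorics. A minimal sketch, assuming a "relationship" means an unordered pair (or triple) of distinct variables; the function name `relationship_count` is hypothetical:

```python
from math import comb


def relationship_count(variables: int, order: int = 2) -> int:
    """Number of distinct unordered groups of `order` variables
    that could be tested against each other."""
    return comb(variables, order)


for n in (1_000, 5_000):
    pairs = relationship_count(n)
    triples = relationship_count(n, order=3)
    print(f"{n:>5} variables: {pairs:>12,} pairs, {triples:>16,} triples")

# 1,000 variables give 499,500 pairs and 166,167,000 triples;
# 5,000 variables give 12,497,500 pairs and 20,820,835,000 triples.
```

If none of the 499,500 pairs reflected a real relationship and each were tested at a 5% significance threshold, roughly 25,000 would still be expected to pass by chance alone.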
Drinking from the fire hose: understanding the data
Beware false prophets
“if you look at the data long enough you’ll find predictive
signals that are in fact completely spurious...for about, I think
a 20 or 25 year period, the US stock market was perfectly
correlated with the level of butter production in Bangladesh
… if you look at hundreds and hundreds of these indicators,
whether it’s the level of Bangladesh butter production or the
number of cars in New York City or whatever it is, eventually
you'll find something that just by pure chance matches what
you're looking for.”
Mike Cafarella, University of Michigan
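The effect Cafarella describes can be reproduced with purely synthetic data. The sketch below is illustrative only: it uses no real stock-market or butter-production figures, and the series lengths, counts and function names are all assumptions. It generates one random "target" series, then searches hundreds of unrelated random series for the one that happens to correlate with it best.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(rng, length):
    """A trending series built from independent Gaussian steps."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def best_spurious_match(n_indicators=500, years=25, seed=0):
    """Strongest absolute correlation between a random target series and
    `n_indicators` independent random series (all synthetic, all unrelated)."""
    rng = random.Random(seed)
    target = random_walk(rng, years)
    candidates = [random_walk(rng, years) for _ in range(n_indicators)]
    return max(abs(pearson(target, c)) for c in candidates)


print(f"Best match among 500 unrelated series: |r| = {best_spurious_match():.2f}")
```

Every candidate is independent of the target by construction, so whatever strong correlation the search turns up is spurious. Searching more indicators can only raise the best match found.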
Drinking from the fire hose: collaborating
Big data research often necessitates a wide variety
of skills and perspectives. Teamwork in academic
research has been growing for decades.
Drinking from the fire hose: collaborating
This trend is likely to persist as big data research
becomes more common
“the best research will often merge in collaboration
between computer scientists who will have access to the
tools and the background to further develop and apply
those, and with social scientists who will have, sort of, good
pressing social questions that we can get insight into with
the data that is now available.”
Scott Hale, Oxford Internet Institute
Drinking from the fire hose: collaborating
This trend is likely to persist as big data research
becomes more common
“I can find someone to optimise an algorithm, I can pay
someone to build a website but what I want is someone that
is going to be thinking the human side through every step of
the way, and when you build an algorithm and when you
write a line of code you ask, does this make sense in terms of
the phenomena that I am trying to model or trying to
interpret.”
Josh Introne, Michigan State University
Drinking from the fire hose: mixing methods
While Big Data is necessarily quantitative, it can be
used in conjunction with other methods.
“For me, I think if I only look at the numbers I don’t get the
whole picture … if we look at, for example, Twitter data, you
can see some tendencies, but if you want to answer the right
question then I think it’s necessary to do more qualitative
studies … So I’m doing interviews with political parties, I’m
also doing interviews with journalists, in order to talk about
how they are using social media as journalistic tools.”
Bente Kalsnes, University of Oslo
Drinking from the fire hose: mixing methods
This means correlations can point the way for
deeper causal explanatory research.
“So you start off with the patterns and then what you
should be doing is saying ‘Well, here’s some possible
reasons’, and then when you’ve found some relationships
which really deserve more study then you would go off and
do a more detailed qualitative assessment as to whether this
was true or not.”
Richard Webber, King’s College London
Conclusion: learning to drink from the fire hose
The major question around Big Data is less about what the
data looks like and more about what we do with it.
The Big Data approach seems to challenge basic tenets
of academic research, undermining precision, validity
and explanatory power.
However, with a greater understanding of the nature
of data, a collaborative approach and a willingness to
employ multiple methods, we’ll be better equipped to
drink from the Big Data fire hose.

More Related Content

What's hot

portfolio Mo and TIJUANA
portfolio Mo and TIJUANAportfolio Mo and TIJUANA
portfolio Mo and TIJUANA
Muhammad Carvan
 
press release final
press release finalpress release final
press release final
Jeff Maehre
 
AGE AND TECHNOLOGY REPORT
AGE AND TECHNOLOGY REPORTAGE AND TECHNOLOGY REPORT
AGE AND TECHNOLOGY REPORT
Kumiko Sasa
 

What's hot (17)

Social Media Analytics: Concepts, Models, Methods, & Tools - Ravi Vatrapu
Social Media Analytics: Concepts, Models, Methods, & Tools - Ravi VatrapuSocial Media Analytics: Concepts, Models, Methods, & Tools - Ravi Vatrapu
Social Media Analytics: Concepts, Models, Methods, & Tools - Ravi Vatrapu
 
portfolio Mo and TIJUANA
portfolio Mo and TIJUANAportfolio Mo and TIJUANA
portfolio Mo and TIJUANA
 
Explainable AI is not yet Understandable AI
Explainable AI is not yet Understandable AIExplainable AI is not yet Understandable AI
Explainable AI is not yet Understandable AI
 
Power Laws and Rich-Get-Richer Phenomena
Power Laws and Rich-Get-Richer PhenomenaPower Laws and Rich-Get-Richer Phenomena
Power Laws and Rich-Get-Richer Phenomena
 
Leveraging Human Factors for Effective Security Training, for ISSA 2013 CISO ...
Leveraging Human Factors for Effective Security Training, for ISSA 2013 CISO ...Leveraging Human Factors for Effective Security Training, for ISSA 2013 CISO ...
Leveraging Human Factors for Effective Security Training, for ISSA 2013 CISO ...
 
press release final
press release finalpress release final
press release final
 
Measuring the Success of Your Social Media Initiatives
Measuring the Success of Your Social Media InitiativesMeasuring the Success of Your Social Media Initiatives
Measuring the Success of Your Social Media Initiatives
 
Teaching Johnny Not to Fall for Phish, for ISSA 2011 in Pittsburgh on Feb2011
Teaching Johnny Not to Fall for Phish, for ISSA 2011 in Pittsburgh on Feb2011Teaching Johnny Not to Fall for Phish, for ISSA 2011 in Pittsburgh on Feb2011
Teaching Johnny Not to Fall for Phish, for ISSA 2011 in Pittsburgh on Feb2011
 
Research For Business Communication
Research For Business CommunicationResearch For Business Communication
Research For Business Communication
 
Improving Your Surveys and Questionnaires with Cognitive Interviewing
Improving Your Surveys and Questionnaires with Cognitive InterviewingImproving Your Surveys and Questionnaires with Cognitive Interviewing
Improving Your Surveys and Questionnaires with Cognitive Interviewing
 
Shuhanhui zhuang desma9_midterm
Shuhanhui zhuang desma9_midtermShuhanhui zhuang desma9_midterm
Shuhanhui zhuang desma9_midterm
 
Inspiration Architecture: Oregon Virtual Reference Summit 2014
Inspiration Architecture: Oregon Virtual Reference Summit 2014Inspiration Architecture: Oregon Virtual Reference Summit 2014
Inspiration Architecture: Oregon Virtual Reference Summit 2014
 
Our kids and the digital utilities
Our kids and the digital utilitiesOur kids and the digital utilities
Our kids and the digital utilities
 
Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview.
 
Data Ethics for Mathematicians
Data Ethics for MathematiciansData Ethics for Mathematicians
Data Ethics for Mathematicians
 
AGE AND TECHNOLOGY REPORT
AGE AND TECHNOLOGY REPORTAGE AND TECHNOLOGY REPORT
AGE AND TECHNOLOGY REPORT
 
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
 

Viewers also liked

Viewers also liked (6)

pemilih cerdas
pemilih cerdaspemilih cerdas
pemilih cerdas
 
Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...
Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...
Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...
 
Curso online legislacao especial para concursos
Curso online legislacao especial para concursosCurso online legislacao especial para concursos
Curso online legislacao especial para concursos
 
Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"
Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"
Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"
 
Python for Data Science
Python for Data SciencePython for Data Science
Python for Data Science
 
Búsqueda secuencial en tabla ordenada
Búsqueda secuencial  en tabla ordenadaBúsqueda secuencial  en tabla ordenada
Búsqueda secuencial en tabla ordenada
 

Similar to 'Drinking from the fire hose? The pitfalls and potential of Big Data'.

Charleston Conference Observatory: Are Social Media Impacting on Research?
Charleston Conference Observatory: Are Social Media Impacting on Research?Charleston Conference Observatory: Are Social Media Impacting on Research?
Charleston Conference Observatory: Are Social Media Impacting on Research?
Charleston Conference
 

Similar to 'Drinking from the fire hose? The pitfalls and potential of Big Data'. (20)

Data science and good questions eric kostello
Data science and good questions eric kostelloData science and good questions eric kostello
Data science and good questions eric kostello
 
Introduction to Computational Social Science
Introduction to Computational Social ScienceIntroduction to Computational Social Science
Introduction to Computational Social Science
 
Accessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeAccessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science Knowledge
 
BDW17 London - Totte Harinen, Uber - Why Big Data Didn’t End Causal Inference
BDW17 London - Totte Harinen, Uber - Why Big Data Didn’t End Causal InferenceBDW17 London - Totte Harinen, Uber - Why Big Data Didn’t End Causal Inference
BDW17 London - Totte Harinen, Uber - Why Big Data Didn’t End Causal Inference
 
Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...
 
Fundamentals of Data science Introduction Unit 1
Fundamentals of Data science Introduction Unit 1Fundamentals of Data science Introduction Unit 1
Fundamentals of Data science Introduction Unit 1
 
Voices from the Field
Voices from the FieldVoices from the Field
Voices from the Field
 
Bigdatahuman
BigdatahumanBigdatahuman
Bigdatahuman
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
The Case for Social Consumer Insights
The Case for Social Consumer InsightsThe Case for Social Consumer Insights
The Case for Social Consumer Insights
 
Taylor Ghost of Altmetrics Yet to Come
Taylor Ghost of Altmetrics Yet to ComeTaylor Ghost of Altmetrics Yet to Come
Taylor Ghost of Altmetrics Yet to Come
 
The Human Side of Data By Colin Strong
The Human Side of Data By Colin StrongThe Human Side of Data By Colin Strong
The Human Side of Data By Colin Strong
 
Survey Research in Design
Survey Research in DesignSurvey Research in Design
Survey Research in Design
 
Studying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & BiasStudying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & Bias
 
Online Course: Real Statistics: A Radical Approach
Online Course: Real Statistics: A Radical ApproachOnline Course: Real Statistics: A Radical Approach
Online Course: Real Statistics: A Radical Approach
 
Blink6 02 consumer_trackyourself
Blink6 02 consumer_trackyourselfBlink6 02 consumer_trackyourself
Blink6 02 consumer_trackyourself
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 
Finding the Story in the Data
Finding the Story in the DataFinding the Story in the Data
Finding the Story in the Data
 
Managing and publishing sensitive data in the social sciences - Webinar trans...
Managing and publishing sensitive data in the social sciences - Webinar trans...Managing and publishing sensitive data in the social sciences - Webinar trans...
Managing and publishing sensitive data in the social sciences - Webinar trans...
 
Charleston Conference Observatory: Are Social Media Impacting on Research?
Charleston Conference Observatory: Are Social Media Impacting on Research?Charleston Conference Observatory: Are Social Media Impacting on Research?
Charleston Conference Observatory: Are Social Media Impacting on Research?
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

'Drinking from the fire hose? The pitfalls and potential of Big Data'.

  • 1. Drinking from the fire hose? The pitfalls & potential of Big Data Josh Cowls, Oxford Internet Institute with contributions from Eric Meyer, Ralph Schroeder and Linnet Taylor t2i Lab, Chalmers, 27th March 2014
  • 2. Overview • Background • Definitions • Innovations and implications • Learning to drink from the fire hose
  • 3. The Oxford Internet Institute • Department of University of Oxford • MO: ‘Understanding life online’ • Multi-disciplinary mix (social sciences plus physical and medical sciences, and humanities) • 45 researchers (and growing) • 50 students (MSc Social Science of Internet; PhD programme) • Generating big data on social, political and economic behaviour from social media www.oii.ox.ac.uk
  • 4. • Funded by the Alfred P. Sloan Foundation • 2012 – 2014 • Data sources: • 120 interviews, mainly with social scientists but some interviewees from business, government • Reports, workshops, publications • No representative sample, but some patterns of disciplinary and skills background and career trajectory NB where unattributed, quotes used in this presentation are excerpted from interviews conducted as part of this project. Accessing and Using Big Data to Advance Social Science Knowledge
  • 5. Big Data: our definition Big data are data that are unprecedented in scale and scope in relation to a given phenomenon. They are often streams of data (rather than fixed datasets), accumulating large volumes, often at high velocity.
  • 6. Big Data: other definitions • ‘Transactional’ (Margetts et al) • ‘Things that one can do at a large scale that cannot be done at a smaller one’ (Mayer- Shonberger and Cukier) • The ‘3 Vs’: volume, velocity, variety – but also veracity, visualisability, viscosity? (Gartner)
  • 7. ... what Big Data isn’t • A generalisable, quantifiable ‘amount’ of data • A race to the top (Mutually Assured Distraction) • The same for every discipline, field or sector
  • 8. A ‘working’ definition • The Big Data phenomenon might be less about what the dataset is and more about how we work with it • (Even if this is indistinguishable in practice)
  • 9. Shifts in mindset From Mayer-Shonberger and Cukier: • “The ability to analyse vast amounts of data about a topic rather than be forced to settle for smaller sets” • “A willingness to embrace data’s real-world messiness rather than privilege exactitude” • “A growing respect for correlations rather than a continuing quest for elusive causality”
  • 10. Implications for research Whither the sample? “the sample survey[‘s] glory years ... are in the past” Savage and Burrows, 2007
  • 11. Implications for research Whither the sample? “sampling is like an analog photographic print. It looks good from a distance, but as you stare closer, zooming in on a particular detail, it gets blurry ... Often, the really interesting things in life are found in places that samples fail to fully catch” Mayer-Shonberger and Cukier 2012
  • 12. Implications for research More or mess? “social media is really, really fascinating, and the reason is because it ... falls into this category of there’s something there but we don’t know what it is. So you can measure public opinion on Twitter and clearly that’s indicative of something, but we don’t quite know what it’s representative of” Brandon Stewart, Harvard University Department of Government
  • 13. Implications for research More or mess? “the problem with the hashtag stuff [is that] we have wonderful case studies but we don’t know what they sit in essentially, what the framework is, if that’s 1% or 10% or 100% of the current conversation in Australia or whatever” Axel Bruns, Queensland University of Technology
  • 14. Implications for research More or mess? “the big problem that we haven’t cracked is that if someone tweets a sentiment it’s not necessarily what they’re feeling, it can be for a variety of reasons, so it doesn’t really reflect what they feel necessarily” Mike Thelwall, University of Wolverhampton
  • 15. Implications for research Do we care about causes? “Big Data is all about correlation; it’s not about causation, which means that you don’t need to have a theory beforehand. You just start looking for correlation … so you don’t have any idea about the structure of the data, you just find a funny correlation.” Sara Esposti, Open University Business School
  • 16. Implications for research Do we care about causes? “a central concern of social science is, we don’t just want to find statistical associations, we actually want to uncover the underlying causal processes by which social systems work ... The data themselves don’t tell you about cause and effect, there’s actually a very complex often, complex inferential process you have to go through in order to extract from the data the things that you really care about David Jensen, University of Massachusetts
  • 17. Implications for research Do we care about causes? “I’ve been talking to some computer scientists who are rising stars, they’re really doing well, and they acknowledge that the way in which the field works, novelty is the key issue. And so there’s always an incentive or a pressure to keep on doing new stuff with new data, even though they might have wanted to go into more depth into something. Sandra Gonzalez-Bailon, Annenberg School of Communication, University of Pennsylvania
  • 18. The challenge How can we extract meaning from Big Data – learn to drink from the fire hose?
  • 19. Drinking from the fire hose • Understanding the data • Collaborating • Mixing methods
  • 20. Drinking from the fire hose: understanding the data The rise of the information society has given us myriad new forms of data and accompanying ways of analysing it. The challenging part is abstracting meaning about society in general from data created and harvested online.
  • 21. Drinking from the fire hose: understanding the data Example: it’s hard to predict elections using Twitter “[Of] 14 different attempts to predict elections based on Twitter data ... Only half of them were successful ... All of this looks close to mere chance” Gayo-Avello 2012
  • 22. Drinking from the fire hose: understanding the data Example: Facebook isn’t going anywhere, and neither is Princeton (Cannarella and Spechler 2014; Develin 2014)
  • 23. Drinking from the fire hose: understanding the data But it’s much simpler, conceptually speaking, to analyse online phenomena on their own terms Yasseri, Hale & Margetts 2013
  • 24. Drinking from the fire hose: understanding the data But it’s much simpler, conceptually speaking, to analyse online phenomena on their own terms Hale, Yasseri, Cowls, Meyer, Schroeder & Margetts (submitted)
  • 25. Drinking from the fire hose: understanding the data Of course, online data can still provide insights into offline life, but these must be well-grounded. e.g. Seth Stephens-Davidowitz, ‘The Cost of Racial Animus on a Black Candidate: Evidence Using Google Data’ • Google accounts for >50% of search engine market (less concern over representativeness) • Google searches are private and anonymous (less concern over reliability) • This method uncovers a social phenomenon, racism, which would be harder to detect in pre-Internet approaches e.g. interviews or surveys
  • 26. Drinking from the fire hose: understanding the data Beware false prophets XKCD
  • 27. Drinking from the fire hose: understanding the data Beware false prophets: analyses using thousands of variables can generate millions or billions of possible relationships – not all (or most) will be valid or meaningful
  • 28. Drinking from the fire hose: understanding the data Beware false prophets “if you look at the data long enough you’ll find predictive signals that are in fact completely spurious ... for about, I think a 20 or 25 year period, the US stock market was perfectly correlated with the level of butter production in Bangladesh … if you look at hundreds and hundreds of these indicators, whether it’s the level of Bangladesh butter production or the number of cars in New York City or whatever it is, eventually you’ll find something that just by pure chance matches what you’re looking for.” Mike Cafarella, University of Michigan
  • 29. Drinking from the fire hose: collaborating Big data research often necessitates a wide variety of skills and perspectives. Team sizes in academic research have been growing for decades.
  • 30. Drinking from the fire hose: collaborating This trend is likely to persist as big data research becomes more common “the best research will often merge in collaboration between computer scientists who will have access to the tools and the background to further develop and apply those, and with social scientists who will have, sort of, good pressing social questions that we can get insight into with the data that is now available.” Scott Hale, Oxford Internet Institute
  • 31. Drinking from the fire hose: collaborating This trend is likely to persist as big data research becomes more common “I can find someone to optimise an algorithm, I can pay someone to build a website but what I want is someone that is going to be thinking the human side through every step of the way, and when you build an algorithm and when you write a line of code you ask, does this make sense in terms of the phenomena that I am trying to model or trying to interpret.” Josh Introne, Michigan State University
  • 32. Drinking from the fire hose: mixing methods While Big Data is necessarily quantitative, it can be used in conjunction with other methods. “For me, I think if I only look at the numbers I don’t get the whole picture … if we look at, for example, Twitter data, you can see some tendencies, but if you want to answer the right question then I think it’s necessary to do more qualitative studies … So I’m doing interviews with political parties, I’m also doing interviews with journalists, in order to talk about how they are using social media as journalistic tools.” Bente Kalsnes, University of Oslo
  • 33. Drinking from the fire hose: mixing methods This means correlations can point the way for deeper causal explanatory research. “So you start off with the patterns and then what you should be doing is saying ‘Well, here’s some possible reasons’, and then when you’ve found some relationships which really deserve more study then you would go off and do a more detailed qualitative assessment as to whether this was true or not.” Richard Webber, King’s College London
  • 34. Conclusion: learning to drink from the fire hose The major question around Big Data is less about what the data looks like and more about what we do with it. The Big Data approach seems to challenge basic tenets of academic research, undermining precision, validity and explanatory power. However, with a greater understanding of the nature of data, a collaborative approach and a willingness to employ multiple methods, we’ll be better equipped to drink from the Big Data fire hose.
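
The ‘Beware false prophets’ point (slides 27 and 28) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an analysis from the project: the series are pure Gaussian noise, the 25-point length loosely echoes the ‘20 or 25 year period’ in the quote, and the function names (`pearson`, `count_pairs`, `best_spurious_match`) are hypothetical. It counts how many pairwise relationships a set of variables can generate, and shows how the best chance correlation changes as more unrelated indicators are scanned.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_pairs(n_variables):
    """Number of distinct pairwise relationships among n variables."""
    return n_variables * (n_variables - 1) // 2


def best_spurious_match(n_indicators, n_years=25, seed=0):
    """Largest |correlation| between one random target series and
    n_indicators unrelated random series (all noise by construction)."""
    rng = random.Random(seed)
    # Assumption: the "target" (e.g. a market index) is modelled as noise.
    target = [rng.gauss(0, 1) for _ in range(n_years)]
    best = 0.0
    for _ in range(n_indicators):
        # Each indicator is generated independently of the target.
        indicator = [rng.gauss(0, 1) for _ in range(n_years)]
        best = max(best, abs(pearson(target, indicator)))
    return best


if __name__ == "__main__":
    # Pairwise relationships among 5,000 variables (already in the millions).
    print(count_pairs(5000))
    # Strongest chance correlation found when scanning more indicators.
    for n in (10, 100, 1000):
        print(n, round(best_spurious_match(n), 2))
```

Because the same seed is reused, each larger run scans a superset of the smaller run’s indicators, so the strongest match can only stay the same or grow, even though no indicator has any real relationship to the target.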