Sources of Big Data for the Social Sciences
Micah Altman
Director of Research
MIT Libraries
Prepared for
Program on Information Science Brown Bag Series
MIT
August 2015
Roadmap
• What the @#%&! is “big data”?
• Two examples of big data in social & health sciences
• Open questions
• Potential roles for libraries
Big Data Challenges: Acquisition, Retention, Analysis, Access
Credits & Disclaimers
DISCLAIMER
These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored, previously published work) my collaborators.
Secondary disclaimer:
“It’s tough to make predictions, especially about
the future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston
Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert
Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan
Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel,
Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
Collaborators & Co-Conspirators
• Workshop Series Co-Organizers – U.S. Census Bureau
  – Cavan Capps
  – Ron Prevost
• Research Support
  – Supported by the U.S. Census Bureau
Related Work
Main project:
• Census-MIT Big Data Workshop Series
  projects.informatics.mit.edu/bigdataworkshops
Related publications (reprints available from informatics.mit.edu):
• Altman, M., D. O’Brien, S. Vadhan, and A. Wood. 2014. “Big Data Study: Request for Information.”
• Altman, M., A. Wood, D. O’Brien, M. Gasser, and S. Vadhan. Forthcoming. “Towards a Modern Approach to Privacy-Aware Government Data Releases.” Berkeley Technology Law Journal.
• Altman, M., and M. P. McDonald. 2014. “Public Participation GIS: The Case of Redistricting.” Proceedings of the 47th Annual Hawaii International Conference on System Sciences.
Workshop Series: Big Data and Official Statistics
• Acquisition Challenges: Using New Forms of Information for Official Economic Statistics [August 3-4]
• Privacy Challenges: Location Confidentiality and Official Surveys [October 5-6]
• Inference Challenges: Transparency and Inference [December 7-8]
Expected outcomes:
• Workshop reports (September, October, December)
• Integrated white paper (February)
• Identifying new opportunities for statistical agencies
• Informing the Census Big Data Research Program
projects.informatics.mit.edu/bigdataworkshops
What the @#%&! is Big Data?
Small, Big, Massive & Ginormous
• Data characteristics: the k “V’s” of big data
  – Volume
  – Velocity
  – Variety
  – + Veracity
  – + Variability
  – + …
“Big” is in the use, not just the data
When do the challenges of “big” exceed the limits of well-selected traditional methods and practices?
• Data management – workflow & governance challenges
• Implementation – performance challenges
• Analysis methods – inferential challenges
Why pay attention now?
Trends and Challenges
• Trends
  – Increasingly data-driven economy
  – Individuals are increasingly mobile
  – Technology changes data uses
  – Stakeholder expectations are changing
  – Agency budgets and staffing remain flat
• The next generation of official statistics
  – Utilize broad sources of information
  – Increase granularity, detail, and timeliness
  – Reduce cost & burden
  – Maintain confidentiality and security
• Multi-disciplinary challenges:
  – Computation, statistics, informatics, social science, policy
Two examples
(Good Cop, Bad Cop?)
Strategies
(and U.S. Debate Strategies)
More Information
• Grimmer, Justin, and Gary King. 2011. “General Purpose Computer-Assisted Clustering and Conceptualization.” Proceedings of the National Academy of Sciences 108 (7): 2643-2650.
• King, Gary, Jennifer Pan, and Margaret E. Roberts. 2013. “How Censorship in China Allows Government Criticism but Silences Collective Expression.” American Political Science Review 107 (2): 1-18. Copy at http://j.mp/LdVXqN
“Posts with negative, even vitriolic, criticism of
the state, its leaders, and its policies are not
more likely to be censored… the censorship
program is aimed at curtailing collective
action by silencing comments that represent,
reinforce, or spur social mobilization, regardless
of content.”
Data Source: Social media messages
Data Structure: Network, unstructured text, structured metadata
Unit of Observation: Individuals; interactions
Collection Design: Pure observational
Desired Inferences:
- Causal inference: what censorship strategies cause the observed reaction
- Inference to population frame
Performance Challenges:
- High volume
- Complex network structure
- Scaling bespoke algorithms
- Sparsity
- Systematic and sparse metadata
Management Challenges:
- License
- Replication
- Revision control
Inferential Challenges:
- Measurement error: extracting topics from text
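Measurement error from automated topic extraction propagates directly into aggregate estimates such as "share of posts calling for collective action." As a minimal sketch, assuming a classifier whose sensitivity and specificity have been estimated on a hand-coded validation sample (the rates below are hypothetical, not taken from King, Pan, and Roberts), the standard Rogan-Gladen correction recovers the true topic share from the observed share:

```python
def observed_share(true_share: float, sensitivity: float, specificity: float) -> float:
    """Share of documents a classifier labels as on-topic, given the true share."""
    # On-topic posts correctly flagged plus off-topic posts falsely flagged.
    return true_share * sensitivity + (1.0 - true_share) * (1.0 - specificity)


def corrected_share(observed: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimator: invert observed_share to recover the true share."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("classifier must beat chance (sensitivity + specificity > 1)")
    estimate = (observed + specificity - 1.0) / denom
    # Sampling noise can push the estimate outside [0, 1]; clip to a valid share.
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical classifier: 80% sensitivity, 90% specificity; true share 30%.
    sens, spec = 0.8, 0.9
    naive = observed_share(0.30, sens, spec)
    print(f"naive share:     {naive:.2f}")                               # 0.31
    print(f"corrected share: {corrected_share(naive, sens, spec):.2f}")  # 0.30
```

The correction only helps if the error rates are themselves measured on data representative of the corpus; for found text, that validation step is often the expensive part.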
Using Google Searches to Forecast Disease Outbreaks
More Information
• Ginsberg, Jeremy, et al. 2009. “Detecting Influenza Epidemics Using Search Engine Query Data.” Nature 457 (7232): 1012-1014.
• Lazer, David, et al. 2014. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343 (6176): 1203-1205.
“Big data hubris” is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.
Data Source: Google search queries
Data Structure: Quasi-tabular; structured metadata and unstructured text
Unit of Observation: Interactions with a system
Collection Design: Pure observational
Desired Inferences:
- Predictive inference: where will flu clusters appear next
  - Short-term (nearcasting)
  - Small-area (fine spatial granularity)
- Inference to general population
Performance Challenges:
- Streaming algorithms
Management Challenges:
- Replication
- Transparency
- Variability
Inferential Challenges:
- External validity
- Measurement error: extracting topics from text
- Overfitting
- Sampling
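One mechanism behind the overfitting and external-validity problems is search over a very large pool of candidate predictors: with enough candidate query series, some will track the training-period flu signal purely by chance. The sketch below uses synthetic random series (not Google data, and not the Ginsberg et al. selection procedure); the function names and parameter values are illustrative assumptions. It picks the candidate most correlated with a "flu" series over the training weeks and then checks that candidate on held-out weeks.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def best_spurious_predictor(n_candidates=2000, n_train=20, n_test=20, seed=7):
    """Select the random 'query series' most correlated with a random 'flu series'
    during training weeks, then report its correlation on held-out weeks."""
    rng = random.Random(seed)
    weeks = n_train + n_test
    flu = [rng.gauss(0, 1) for _ in range(weeks)]
    best_r, best_series = -1.0, None
    for _ in range(n_candidates):
        series = [rng.gauss(0, 1) for _ in range(weeks)]
        r = pearson(series[:n_train], flu[:n_train])
        if r > best_r:
            best_r, best_series = r, series
    # Every candidate is pure noise, so any out-of-sample fit is also chance.
    test_r = pearson(best_series[n_train:], flu[n_train:])
    return best_r, test_r


if __name__ == "__main__":
    in_sample, out_of_sample = best_spurious_predictor()
    print(f"in-sample r = {in_sample:.2f}, out-of-sample r = {out_of_sample:.2f}")
```

Because every candidate is noise, the high in-sample correlation is entirely a product of selection; held-out evaluation (and, per Lazer et al., continued comparison against traditional surveillance data) is what exposes it.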
Comparing Cases: Chinese Censorship vs. Flu Prediction
Data Source
- Chinese censorship: Social media messages
- Flu prediction: Google search queries
Data Structure
- Chinese censorship: Network, unstructured text, structured metadata
- Flu prediction: Quasi-tabular; structured metadata and unstructured text
Unit of Observation
- Chinese censorship: Individuals; interactions
- Flu prediction: Interactions with a system
Collection Design
- Both: Pure observational
Desired Inferences
- Chinese censorship: Causal inference (what censorship strategies cause the observed reaction); inference to population frame
- Flu prediction: Predictive inference (where will flu clusters appear next; short-term nearcasting; small-area, fine spatial granularity); inference to general population
Performance Challenges
- Chinese censorship: High volume; complex network structure; scaling bespoke algorithms; sparsity; systematic and sparse metadata
- Flu prediction: Streaming algorithms
Management Challenges
- Chinese censorship: License; replication; revision control
- Flu prediction: Replication; transparency; variability
Inferential Challenges
- Chinese censorship: Measurement error (extracting topics from text)
- Flu prediction: External validity; measurement error (extracting topics from text); overfitting; sampling
Why is dealing with big data hard?
Big Data Challenges
• Acquisition: Sources, Incentives, Quality, Provenance
• Retention: Change Management, Integration, Security, Storage
• Analysis: Bias, Causation, Computation, Visualization
• Access: Transparency, Reproducibility, Durable Access (Preservation), Confidentiality
Challenges of Big Data
Acquisition Challenges: Quality, Provenance, Sources
Some Sources of Economic Information
• Smartphone sensors – GPS +
• Vehicle systems
• IoT – smart thermostats, fire alarms
• Transactions – online, internal
• Search behavior – search engine queries
• Social media – Twitter, Facebook, LinkedIn
• Imagery – satellite, thermal, video
• …
Source Characteristics
• Unit of observation
  – Location, virtual service, communication network, individual
• Context
  – Behavior, transaction, environment, statement
• Measure characteristics
  – Measure scale
  – Measure structure
  – Accuracy, precision
• Frame & sample characteristics
Analysis Challenges: Bias, Computation, Causation, Integration
Some Potential Sources of Analysis Error
[Figure: diagram relating Super Population, Target Population, Frame, and Selection, with Laws (structures) and Parameters (λ, β) that generate the data]
• Selection bias
• Frame uncertainty
• Measurement error
• Unknown measurement semantics
• Non-independence of measures
• Non-independence of samples
• Model uncertainty
• Unknown causal structure
• Shift in measurements, samples, frames
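Selection bias, the first error source listed, arises when inclusion in the frame depends on the quantity being measured. The sketch below uses a hypothetical, deterministic population and an assumed inclusion rule (for example, a frame of smartphone users in which higher-income people are more likely to appear); it computes the expected frame mean exactly rather than drawing a random sample. It also shows that inverse-probability weighting undoes the bias, but only because the inclusion probabilities are known here, which is rarely true for found data.

```python
def population_incomes(n=1000):
    """Hypothetical population: incomes evenly spaced from 20,000 to 119,900."""
    return [20_000 + 100 * i for i in range(n)]


def frame_inclusion_probability(income):
    """Assumed inclusion rule: chance of appearing in the frame rises with income."""
    return min(1.0, income / 120_000)


def expected_frame_mean(incomes):
    """Expected mean income among frame members (each person weighted by inclusion prob.)."""
    weights = [frame_inclusion_probability(x) for x in incomes]
    return sum(w * x for w, x in zip(weights, incomes)) / sum(weights)


def ipw_mean(incomes):
    """Inverse-probability-weighted mean: reweight each frame member by 1/p.
    Recovers the population mean only when the true inclusion probabilities are known."""
    ps = [frame_inclusion_probability(x) for x in incomes]
    total = sum(p * (x / p) for p, x in zip(ps, incomes))  # expected weighted total
    count = sum(p * (1 / p) for p in ps)                    # expected weighted count
    return total / count


if __name__ == "__main__":
    pop = population_incomes()
    true_mean = sum(pop) / len(pop)
    frame_mean = expected_frame_mean(pop)
    print(f"population mean: {true_mean:,.0f}")
    print(f"frame mean:      {frame_mean:,.0f}  (bias {frame_mean - true_mean:+,.0f})")
    print(f"IPW mean:        {ipw_mean(pop):,.0f}")
```

With frame uncertainty, the inclusion probabilities themselves must be modeled, and errors in that model become a new source of bias.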
Access Challenges: Data Repeatability, Transparency, Preservation
Many Initiatives to Improve Scientific Reliability
• Retraction monitoring
• Data citation
• Clinical trial preregistration
• Registered replication
• Open data
• Badges
Some Types of Reproducibility Issues
• Fraud
• Misconduct
• Negligence
• Bit rot
• Versioning problems
• Replication
• Reproduction
• Extension
• Result validation
• Fact checking
• Calibration, extension, reuse
• Underreporting
• Data dredging
• Multiple comparisons, p-hacking
• Sensitivity, robustness
• Reliability
• Generalizability
Ensuring Repeatability & Transparency
• Theory (rules, entities, concepts)
• Algorithms (protocol, operationalization)
• Implementations (software, coding rules, instrumentation design)
• Executions (deployment, house survey style, equipment setting, operating system, hardware, starting values, PRNG seeds)
• Structure, formats, versions/revisions, selections, integrations, instantiations (copies)
• Execution context (weather, compiler, operating system, system load)
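Several of these layers (the exact data copy, PRNG seeds, operating system, language version) can be captured automatically at run time. A minimal sketch using only the Python standard library; the manifest field names and the toy "analysis" are hypothetical, not a standard schema or the author's tooling:

```python
import hashlib
import json
import platform
import random
import sys


def fingerprint(data: bytes) -> str:
    """Content hash identifying the exact data instantiation (copy) that was analyzed."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(data: bytes, seed: int) -> dict:
    """Toy analysis (a seeded random subsample of bytes) plus a manifest of its context."""
    rng = random.Random(seed)  # explicit, recorded PRNG seed instead of global state
    sample = rng.sample(list(data), k=min(5, len(data)))
    manifest = {
        "data_sha256": fingerprint(data),
        "prng_seed": seed,
        "python_version": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
    }
    return {"result": sample, "manifest": manifest}


if __name__ == "__main__":
    data = b"example dataset, version 1"
    run1 = run_analysis(data, seed=42)
    run2 = run_analysis(data, seed=42)
    assert run1["result"] == run2["result"]  # same data + same seed -> same result
    print(json.dumps(run1["manifest"], indent=2))
```

A manifest like this covers only part of the execution context; compiler flags, library versions, and external conditions still have to be recorded by other means.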
Access Challenges: Data Confidentiality, Security
Durable, Long-Term Access
• Why durable access?
  – The rule of law requires maintaining authentic public records
  – Scientific advances rely on a cumulative, traceable evidence base
  – Art, history, and culture require durable access to national heritage information
  – Our nation needs durable access to a strategic information reserve
  – Humanity needs durable, long-term access to information in order to communicate to future generations
• Big data challenges to durability
  – Velocity: information is updated, sometimes overwritten
  – Many sources are commercial/private, and are not routinely archived or preserved
  – Modeling the future value of information
  – Maintaining privacy and confidentiality
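Velocity threatens durability because a source that updates in place silently destroys its earlier states. One common mitigation, sketched here as an assumption rather than as any particular repository's design, is an append-only log of content-addressed snapshots, so every captured version stays retrievable and citable by its hash. The class name and the sample "flu index" values are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


class SnapshotArchive:
    """Append-only archive: each capture is stored under its content hash, never overwritten."""

    def __init__(self):
        self._blobs = {}    # sha256 hex digest -> bytes
        self._history = []  # (capture time, digest), in capture order

    def capture(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)  # identical content is stored once
        self._history.append((datetime.now(timezone.utc), digest))
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def versions(self) -> list:
        """Digests in capture order; recapturing unchanged content repeats its digest."""
        return [d for _, d in self._history]


if __name__ == "__main__":
    archive = SnapshotArchive()
    v1 = archive.capture(b"flu index: 2.1")
    v2 = archive.capture(b"flu index: 2.4")  # the live source has since been overwritten
    print(archive.get(v1))                   # the earlier state is still retrievable
    print(len(archive.versions()))
```

Content addressing also gives a natural identifier for data citation; the open question the slide raises, which captures are worth keeping, is a policy choice this code does not answer.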
Big data challenges…
• Anonymization can completely destroy utility
  – The “Netflix Problem”: large, sparse datasets that overlap can be probabilistically linked [Narayanan and Shmatikov 2008]
• Observable behavior leaves unique “fingerprints”
  – The “GIS Problem”: fine geo-spatial-temporal data are impossible to mask when correlated with external data [Zimmerman 2008]
• Big data can be rich, messy & surprising
  – The “Facebook Problem”: it is possible to identify masked network data if only a few nodes are controlled [Backstrom et al. 2007]
  – The “Blog Problem”: pseudonymous communication can be linked through textual analysis [Novak et al. 2004]
Source: [Calabrese 2008; Real Time Rome Project 2007]
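The “Netflix Problem” rests on sparsity: in a large, sparse dataset, a handful of one person's attribute values are often unique to that person. The following is a simplified sketch in the spirit of that attack, not Narayanan and Shmatikov's actual scoring algorithm. It scores each de-identified record by how many (item, rating) pairs it shares with auxiliary data an adversary gathered elsewhere, and it declares a match only when the best score beats the runner-up by a margin. All records and the margin are hypothetical.

```python
# Hypothetical "anonymized" release: pseudonymous IDs mapped to sparse ratings.
RELEASED = {
    "user_0917": {"film_a": 5, "film_b": 1, "film_c": 4, "film_d": 2},
    "user_2203": {"film_a": 5, "film_c": 3, "film_e": 4},
    "user_3350": {"film_b": 1, "film_d": 5, "film_f": 3},
}

# Hypothetical ratings the same person posted publicly under their real name.
PUBLIC_PROFILE = {"film_a": 5, "film_b": 1, "film_c": 4}


def overlap_score(record: dict, auxiliary: dict) -> int:
    """Number of items on which both records show the same rating."""
    return sum(1 for item, rating in auxiliary.items() if record.get(item) == rating)


def link(deidentified: dict, auxiliary: dict, margin: int = 2):
    """Return the pseudonymous ID that best matches the auxiliary data, or None if
    the best score does not beat the runner-up by at least `margin` (needs >= 2 records)."""
    scored = sorted(
        ((overlap_score(rec, auxiliary), pid) for pid, rec in deidentified.items()),
        reverse=True,
    )
    (best, best_id), (second, _) = scored[0], scored[1]
    return best_id if best - second >= margin else None


if __name__ == "__main__":
    print(link(RELEASED, PUBLIC_PROFILE))  # user_0917
```

Removing names does nothing here: three matching ratings are enough, which is why the slide notes that anonymization strong enough to block such linkage can destroy utility.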
Little Data in a Big World
• The “Favorite Ice Cream” problem: public information that is not risky can help us learn information that is risky
• The “Doesn’t Stay in Vegas” problem: information shared locally can be found anywhere
• The “Unintended Algorithmic Discrimination” problem: algorithms are often not transparent, and can amplify human biases
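The “Favorite Ice Cream” problem can be made concrete with a differencing attack. Two aggregate counts that each look harmless are subtracted, and public, non-risky knowledge singles out one person's sensitive value. The records and the count-only query interface below are hypothetical.

```python
# Hypothetical microdata held by a data publisher.
RECORDS = [
    {"name": "Ana", "favorite_flavor": "vanilla", "has_condition": False},
    {"name": "Ben", "favorite_flavor": "mint", "has_condition": True},
    {"name": "Cai", "favorite_flavor": "vanilla", "has_condition": False},
    {"name": "Dee", "favorite_flavor": "pistachio", "has_condition": True},
]


def count_with_condition(predicate) -> int:
    """Aggregate query interface: only counts are released, never individual rows."""
    return sum(1 for r in RECORDS if predicate(r) and r["has_condition"])


if __name__ == "__main__":
    # Query 1: everyone. Query 2: everyone except pistachio fans.
    everyone = count_with_condition(lambda r: True)
    all_but_pistachio = count_with_condition(lambda r: r["favorite_flavor"] != "pistachio")
    # An adversary who knows from public posts that Dee is the only pistachio fan
    # learns Dee's sensitive attribute from the difference of two "safe" counts.
    print("Dee has condition:", everyone - all_but_pistachio == 1)  # True
```

Neither query names Dee, and favorite flavor is not sensitive, yet together they disclose a health attribute. This is the kind of leakage that query auditing or noise-adding methods such as differential privacy are designed to bound.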
Categorizing Challenges
• Implementation – Performance Challenges
  – Systems challenges
    · Exceed capacity of locally managed storage
    · Location and migration of data become critical for performance
    · Standard backup, recovery, and data integrity mechanisms are ineffective
    · Communication bandwidth
  – Algorithmic challenges
    · “In-core” vs. “out-of-core” implementations
    · O(n^2) vs. O(log n) complexity
    · Static vs. streaming algorithms
    · Serial vs. massively parallel
    · Distributed, shared-nothing algorithms
• Analysis Methods – Inferential Challenges
  – Sources: designed vs. “found” data
  – Model-based vs. data-based
  – Causal inference vs. descriptive/predictive (forecasting) inference
• Data Management & Workflow
  – Provenance
  – Data quality
  – Change management
  – Continuous integration
  – Accommodating variety – semantics, quality
  – Transparency and reproducibility
  – Privacy
  – Security
• Data Governance and Policy
  – Standards
  – Incentives
  – Certifications
  – Regulation
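The static vs. streaming (and in-core vs. out-of-core) distinction can be illustrated with basic summary statistics. A static computation needs the whole dataset in memory and makes two passes; a streaming computation keeps a constant-size state and updates it as each observation arrives. The sketch below uses Welford's online algorithm for the mean and variance, one well-known streaming method chosen here for illustration; the sample values are arbitrary.

```python
from typing import Iterable, List, Tuple


def static_mean_var(values: List[float]) -> Tuple[float, float]:
    """In-core computation: needs the full list in memory (two passes). Assumes len >= 2."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, var


def streaming_mean_var(stream: Iterable[float]) -> Tuple[float, float]:
    """Welford's algorithm: one pass, O(1) memory, numerically stable sample variance."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # second factor uses the updated mean
    if n < 2:
        raise ValueError("need at least two observations")
    return mean, m2 / (n - 1)


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(static_mean_var(data))
    # A generator stands in for a feed that is never materialized in memory.
    print(streaming_mean_var(x for x in data))
```

Both return the same answer, but only the streaming version works when the data arrive continuously or exceed local storage. Per-partition Welford states can also be merged, which is what makes this style of algorithm suit shared-nothing parallel systems.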
Some Open Questions About Data Sources
Preliminary Observations from First Workshop
Topic: Sources of Economic Big Data
Use case: Commodity Flow Survey
Observations:
• Different classes of decisions require different sources of data
  – E.g., much designed survey data contributes baseline data for decisions about infrastructure and strategic planning
  – Transaction-based big data could improve the frequency and granularity of estimates
• In big data, data sources are stakeholders
  – Businesses need to react quickly and predict the future, and need frequently updated, detailed data
  – Critical to provide a value proposition to business
  – Critical to develop a trust relationship
• Some potential sources
  – ERP and DRP operations data
  – EDI
  – Mobile phone
  – Traffic data
Some Non-Technical Questions About Sources
● Who are the key stakeholders in a big data source, and what are their key incentives?
○ What key decisions does this information support for stakeholders? What are the gaps in data from the stakeholder perspective?
● What are the barriers associated with new sources of information?
○ Legal barriers
○ Economic barriers
○ Social/trust barriers
Potential Roles for Libraries
Potential Roles – Infrastructure
• Dissemination
  – Catalog the range of new statistics/indicators and sources
  – Selection based on quality
  – Guide proper use
• Durability
  – Ensure long-term accessibility of big data
  – Manage provenance, versioning
  – Provide transparency of new indicators/statistics
• Security & Confidentiality
  – Libraries could be a trusted and accountable third party
  – Store and integrate data from multiple sources
  – Could develop expert implementation of privacy best practices
Potential Roles – Leadership
• Advocacy
  – Advocate for quality, transparency, replication, durable access
• Standardization
  – Develop new methods for big data management
  – Identify “best practices” for replication, transparency, long-term access
  – Standardize licenses for reuse, preservation
Additional References
• Einav, Liran, and Jonathan Levin. 2014. “Economics in the Age of Big Data.” Science 346 (6210): 1243089. http://www.sciencemag.org/content/346/6210/1243089.short
• Varian, Hal R. 2014. “Big Data: New Tricks for Econometrics.” The Journal of Economic Perspectives 28 (2): 3-27. http://people.ischool.berkeley.edu/~hal/Papers/2013/ml.pdf
• Reimsbach-Kounatze, C. 2015. “The Proliferation of ‘Big Data’ and Implications for Official Statistics and Statistical Agencies: A Preliminary Analysis.” OECD Digital Economy Papers, No. 245. OECD Publishing. http://dx.doi.org/10.1787/5js7t9wqzvg8-en
• Kriger, David S., et al. 2011. Freight Transportation Surveys. NCHRP Synthesis 410. Transportation Research Board. http://www.nap.edu/catalog/13627/nchrp-synthesis-410-freight-transportation-surveys
Questions?
E-mail: escience@mit.edu
Web: informatics.mit.edu
Creative Commons License
This work, Managing Confidential Information in Research, by Micah Altman (http://redistricting.info), is licensed under the Creative Commons Attribution-ShareAlike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Sources of Big Data for the Social Sciences

Weitere ähnliche Inhalte

Was ist angesagt?

State of the Art Informatics for Research Reproducibility, Reliability, and...
 State of the Art  Informatics for Research Reproducibility, Reliability, and... State of the Art  Informatics for Research Reproducibility, Reliability, and...
State of the Art Informatics for Research Reproducibility, Reliability, and...Micah Altman
 
Emerging Data Citation Infrastructure
Emerging Data Citation InfrastructureEmerging Data Citation Infrastructure
Emerging Data Citation InfrastructureMicah Altman
 
Managing confidential data
Managing confidential dataManaging confidential data
Managing confidential dataMicah Altman
 
Big Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPBig Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPMicah Altman
 
Comments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data PrivacyComments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data PrivacyMicah Altman
 
Privacy tool osha comments
Privacy tool osha commentsPrivacy tool osha comments
Privacy tool osha commentsMicah Altman
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
DataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy IssuesDataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy IssuesDataONE
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsRobert Grossman
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?Robert Grossman
 
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...Micah Altman
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research PaperAnita de Waard
 
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinaiDataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinaidatascienceiqss
 
Executing the Research Paper
Executing the Research PaperExecuting the Research Paper
Executing the Research PaperAnita de Waard
 
DataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...Micah Altman
 
Big Data Repository for Structural Biology: Challenges and Opportunities by P...
Big Data Repository for Structural Biology: Challenges and Opportunities by P...Big Data Repository for Structural Biology: Challenges and Opportunities by P...
Big Data Repository for Structural Biology: Challenges and Opportunities by P...datascienceiqss
 
Data Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim ClarkData Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim Clarkdatascienceiqss
 
DataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data SharingDataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data SharingDataONE
 

Was ist angesagt? (20)

State of the Art Informatics for Research Reproducibility, Reliability, and...
 State of the Art  Informatics for Research Reproducibility, Reliability, and... State of the Art  Informatics for Research Reproducibility, Reliability, and...
State of the Art Informatics for Research Reproducibility, Reliability, and...
 
Emerging Data Citation Infrastructure
Emerging Data Citation InfrastructureEmerging Data Citation Infrastructure
Emerging Data Citation Infrastructure
 
Managing confidential data
Managing confidential dataManaging confidential data
Managing confidential data
 
Big Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPBig Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTP
 
Comments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data PrivacyComments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data Privacy
 
Privacy tool osha comments
Privacy tool osha commentsPrivacy tool osha comments
Privacy tool osha comments
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
DataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy IssuesDataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy Issues
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data Platforms
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
 
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research Paper
 
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinaiDataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
 
Executing the Research Paper
Executing the Research PaperExecuting the Research Paper
Executing the Research Paper
 
DataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE Education Module 08: Data Citation
DataONE Education Module 08: Data Citation
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars...
 
Big Data Repository for Structural Biology: Challenges and Opportunities by P...
Big Data Repository for Structural Biology: Challenges and Opportunities by P...Big Data Repository for Structural Biology: Challenges and Opportunities by P...
Big Data Repository for Structural Biology: Challenges and Opportunities by P...
 
Data Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim ClarkData Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim Clark
 
DataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data SharingDataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data Sharing
 

Andere mochten auch

Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanMicah Altman
 
BROWN BAG TALK WITH CHAOQUN NI- TRANSFORMATIVE INTERACTIONS IN THE SCIENTIFIC...
BROWN BAG TALK WITH CHAOQUN NI- TRANSFORMATIVE INTERACTIONS IN THE SCIENTIFIC...BROWN BAG TALK WITH CHAOQUN NI- TRANSFORMATIVE INTERACTIONS IN THE SCIENTIFIC...
BROWN BAG TALK WITH CHAOQUN NI- TRANSFORMATIVE INTERACTIONS IN THE SCIENTIFIC...Micah Altman
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Micah Altman
 
Transaction processing system
Transaction processing systemTransaction processing system
Transaction processing systemuday sharma
 
Data collection presentation
Data collection presentationData collection presentation
Data collection presentationKanchan Agarwal
 
Source of Data in Research
Source of Data in ResearchSource of Data in Research
Source of Data in ResearchManu K M
 

Andere mochten auch (7)

Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental Scan
 
BROWN BAG TALK WITH CHAOQUN NI- TRANSFORMATIVE INTERACTIONS IN THE SCIENTIFIC...
BROWN BAG TALK WITH CHAOQUN NI- TRANSFORMATIVE INTERACTIONS IN THE SCIENTIFIC...BROWN BAG TALK WITH CHAOQUN NI- TRANSFORMATIVE INTERACTIONS IN THE SCIENTIFIC...
BROWN BAG TALK WITH CHAOQUN NI- TRANSFORMATIVE INTERACTIONS IN THE SCIENTIFIC...
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
Real-Time Event Processing
Real-Time Event ProcessingReal-Time Event Processing
Real-Time Event Processing
 
Transaction processing system
Transaction processing systemTransaction processing system
Transaction processing system
 
Data collection presentation
Data collection presentationData collection presentation
Data collection presentation
 
Source of Data in Research
Source of Data in ResearchSource of Data in Research
Source of Data in Research
 

Ähnlich wie BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES

Scientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics PerspectiveScientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics PerspectiveMicah Altman
 
“Big data” in human services organisations: Practical problems and ethical di...
“Big data” in human services organisations: Practical problems and ethical di...“Big data” in human services organisations: Practical problems and ethical di...
“Big data” in human services organisations: Practical problems and ethical di...husITa
 
Characterizing Data and Software for Social Science Research
Characterizing Data and Software for Social Science ResearchCharacterizing Data and Software for Social Science Research
Characterizing Data and Software for Social Science ResearchMicah Altman
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysisPoonam Kshirsagar
 
Accessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeAccessing and Using Big Data to Advance Social Science Knowledge

BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES

  • 1. Sources of Big Data for the Social Sciences. Micah Altman, Director of Research, MIT Libraries. Prepared for the Program on Information Science Brown Bag Series, MIT, August 2015.
  • 2. Roadmap
      – What the @#%&! is “big data”?
      – Two examples of big data in social & health sciences
      – Open questions
      – Potential roles for libraries
      – Big data challenges: acquisition, retention, analysis, access
  • 3. Credits & Disclaimers
  • 4. Disclaimer: These opinions are my own. They are not the opinions of MIT, Brookings, any of the project funders, or (with the exception of co-authored, previously published work) my collaborators.
      Secondary disclaimer: “It’s tough to make predictions, especially about the future!” -- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
  • 5. Collaborators & Co-Conspirators
      – Workshop series co-organizers (U.S. Census Bureau): Cavan Capps, Ron Prevost
      – Research support: supported by the U.S. Census Bureau
  • 6. Related Work
      – Main project: Census-MIT Big Data Workshop Series, projects.informatics.mit.edu/bigdataworkshops
      – Related publications (reprints available from informatics.mit.edu):
      – Altman, M., D. O’Brien, S. Vadhan, and A. Wood. 2014. “Big Data Study: Request for Information.”
      – Altman, M., A. Wood, D. O’Brien, U. Gasser, and S. Vadhan. Forthcoming. “Towards a Modern Approach to Privacy-Aware Government Data Releases.” Berkeley Technology Law Journal.
      – Altman, M., and M. P. McDonald. 2014. “Public Participation GIS: The Case of Redistricting.” Proceedings of the 47th Annual Hawaii International Conference on System Sciences.
  • 7. Workshop Series: Big Data and Official Statistics
      – Acquisition challenges: Using New Forms of Information for Official Economic Statistics [August 3-4]
      – Privacy challenges: Location Confidentiality and Official Surveys [October 5-6]
      – Inference challenges: Transparency and Inference [December 7-8]
      – Expected outcomes: workshop reports (September, October, December); an integrated white paper (February); identifying new opportunities for statistical agencies; informing the Census Big Data Research Program
      – projects.informatics.mit.edu/bigdataworkshops
  • 8. What the @#%&! is Big Data?
  • 9. Small, Big, Massive & Ginormous
      – Data characteristics: the k “V’s” of big data: volume, velocity, variety, + veracity, + variability, + …
  • 10. “Big” is in the use, not just the data
      – When do the challenges of “big” exceed the limits of well-selected traditional methods and practices?
      – Data management: workflow & governance challenges
      – Implementation: performance challenges
      – Analysis methods: inferential challenges
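Slide 10 separates performance challenges from management and inferential ones. One common response to data that are too large or too fast to store is a one-pass (streaming) algorithm. The sketch below uses Welford's method for a running mean and variance. It is an illustration chosen for this transcript, not a technique prescribed in the talk. `RunningStats` and the `stream_of_readings` feed are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass (streaming) mean and variance via Welford's algorithm.

    Memory use is constant no matter how many observations arrive.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Multiplying by the deviation from the *updated* mean keeps this stable.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN until two values are seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


def stream_of_readings():
    """Hypothetical stand-in for a sensor or transaction feed."""
    for value in [4.0, 7.0, 13.0, 16.0]:
        yield value


stats = RunningStats()
for reading in stream_of_readings():
    stats.update(reading)
print(stats.n, stats.mean, stats.variance)  # prints: 4 10.0 30.0
```

Each reading is seen once and then discarded. The same pattern applies to a feed that never ends.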
  • 11. Why pay attention now?
  • 12. Trends and Challenges
      – Trends: an increasingly data-driven economy; individuals are increasingly mobile; technology changes data uses; stakeholder expectations are changing; agency budgets and staffing remain flat
      – The next generation of official statistics: utilize broad sources of information; increase granularity, detail, and timeliness; reduce cost & burden; maintain confidentiality and security
      – Multi-disciplinary challenges: computation, statistics, informatics, social science, policy
  • 13. Two examples (Good Cop, Bad Cop?)
  • 14. Strategies (and U.S. Debate Strategies)
      – Quoted finding: “Posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored… the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content.”
      – Data source: social media messages
      – Data structure: network, unstructured text, structured metadata
      – Unit of observation: individuals; interactions
      – Collection design: purely observational
      – Desired inferences: causal inference (which censorship strategies cause the observed reaction); inference to the population frame
      – Performance challenges: high volume; complex network structure; scaling bespoke algorithms; sparsity; systematic and sparse metadata
      – Management challenges: license; replication; revision control
      – Inferential challenges: measurement error (extracting topics from text)
      – More information: Grimmer, Justin, and Gary King. “General purpose computer-assisted clustering and conceptualization.” Proceedings of the National Academy of Sciences 108.7 (2011): 2643-2650. King, Gary, Jennifer Pan, and Margaret E. Roberts. 2013. “How Censorship in China Allows Government Criticism but Silences Collective Expression.” American Political Science Review 107 (2) (May): 1-18. Copy at http://j.mp/LdVXqN
  • 15. Using Google Searches to Forecast Disease Outbreaks
      – Quoted warning: “Big data hubris” is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.
      – Data source: Google search queries
      – Data structure: quasi-tabular; structured metadata and unstructured text
      – Unit of observation: interactions with a system
      – Collection design: purely observational
      – Desired inferences: predictive inference (where will flu clusters appear next?), both short-term (nearcasting) and small-area (fine spatial granularity); inference to the general population
      – Performance challenges: streaming algorithms
      – Management challenges: replication; transparency; variability
      – Inferential challenges: external validity; measurement error (extracting topics from text); overfitting; sampling
      – More information: Ginsberg, Jeremy, et al. “Detecting influenza epidemics using search engine query data.” Nature 457.7232 (2009): 1012-1014. Lazer, David, et al. “The parable of Google Flu: traps in big data analysis.” Science 343.6176 (2014): 1203-1205.
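Slide 15 lists overfitting among the inferential challenges. The simulation below is loosely inspired by the idea of screening many search terms against a short outcome series. It is not Google Flu Trends' actual method. Every series is synthetic noise, so any correlation it finds is spurious. The term count, series lengths and seed are arbitrary assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def screen_terms(n_terms=1000, n_train=20, n_test=200, seed=2015):
    """Pick the 'search term' most correlated with a short outcome series.

    All series are independent Gaussian noise. Returns the absolute
    correlation of the chosen term in the training window and in a
    later hold-out window.
    """
    rng = random.Random(seed)
    n = n_train + n_test
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    terms = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_terms)]

    # Screening step: the selection looks only at the training window.
    best = max(terms, key=lambda t: abs(pearson(t[:n_train], outcome[:n_train])))
    in_sample = abs(pearson(best[:n_train], outcome[:n_train]))
    out_of_sample = abs(pearson(best[n_train:], outcome[n_train:]))
    return in_sample, out_of_sample


ins, oos = screen_terms()
print(f"in-sample |r| = {ins:.2f}, out-of-sample |r| = {oos:.2f}")
```

With 1,000 candidates and only 20 training points, the winner looks strongly predictive in-sample. The relationship then largely vanishes on new data. This illustrates why the slide pairs overfitting with sampling and external validity.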
  • 16. Comparing Cases: Chinese Censorship vs. Flu Prediction
      – Data source: [Censorship] social media messages; [Flu] Google search queries
      – Data structure: [Censorship] network, unstructured text, structured metadata; [Flu] quasi-tabular, structured metadata and unstructured text
      – Unit of observation: [Censorship] individuals, interactions; [Flu] interactions with a system
      – Collection design: [Censorship] purely observational; [Flu] purely observational
      – Desired inferences: [Censorship] causal inference (which censorship strategies cause the observed reaction), inference to the population frame; [Flu] predictive inference (where flu clusters will appear next; short-term nearcasting; small-area, fine spatial granularity), inference to the general population
      – Performance challenges: [Censorship] high volume, complex network structure, scaling bespoke algorithms, sparsity, systematic and sparse metadata; [Flu] streaming algorithms
      – Management challenges: [Censorship] license, replication, revision control; [Flu] replication, transparency, variability
      – Inferential challenges: [Censorship] measurement error (extracting topics from text); [Flu] external validity, measurement error (extracting topics from text), overfitting, sampling
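The comparison slide applies the same set of descriptors to both cases. A minimal sketch of how such profiles might be recorded as typed data, so that cases can be compared mechanically, is shown below. `CaseProfile` and `shared_challenges` are hypothetical names. The challenge labels are copied from slides 14 and 15, with measurement error shortened.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CaseProfile:
    """Hypothetical record of the descriptors used on the case slides."""
    name: str
    data_source: str
    collection_design: str
    performance: frozenset = field(default_factory=frozenset)
    management: frozenset = field(default_factory=frozenset)
    inferential: frozenset = field(default_factory=frozenset)


def shared_challenges(a: CaseProfile, b: CaseProfile) -> dict:
    """Challenges listed for both cases, grouped by category."""
    return {
        "performance": a.performance & b.performance,
        "management": a.management & b.management,
        "inferential": a.inferential & b.inferential,
    }


censorship = CaseProfile(
    name="Chinese Censorship",
    data_source="Social media messages",
    collection_design="Pure observational",
    performance=frozenset({"High volume", "Complex network structure",
                           "Scaling bespoke algorithms", "Sparsity",
                           "Systematic and sparse metadata"}),
    management=frozenset({"License", "Replication", "Revision control"}),
    inferential=frozenset({"Measurement error"}),
)
flu = CaseProfile(
    name="Flu Prediction",
    data_source="Google search queries",
    collection_design="Pure observational",
    performance=frozenset({"Streaming algorithms"}),
    management=frozenset({"Replication", "Transparency", "Variability"}),
    inferential=frozenset({"External validity", "Measurement error",
                           "Overfitting", "Sampling"}),
)

for category, items in shared_challenges(censorship, flu).items():
    print(category, sorted(items))
```

Despite very different data sources, the two cases overlap on replication and on measurement error.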
  • 17. Why is dealing with big data hard?
  • 19. Challenges of Big Data
    Acquisition Challenges: Quality, Provenance, Sources
  • 20. Some Sources of Economic Information
    - Smartphone sensors: GPS and others
    - Vehicle systems
    - IoT: smart thermostats, fire alarms
    - Transactions: online, internal
    - Search behavior: search engine queries
    - Social media: Twitter, Facebook, LinkedIn
    - Imagery: satellite, thermal, video
    - …
  • 21. Source Characteristics
    - Unit of observation: location, virtual service, communication network, individual
    - Context: behavior, transaction, environment, statement
    - Measure characteristics
      - Measure scale
      - Measure structure
      - Accuracy, precision
    - Frame and sample characteristics
  • 22. Challenges of Big Data
    Analysis Challenges: Bias, Computation, Causation, Integration
  • 23. Some Potential Sources of Analysis Error
    [Diagram linking super population, laws (structures), parameters (λ, β), target population, frame, and selection]
    - Selection bias
    - Frame uncertainty
    - Measurement error
    - Unknown measurement semantics
    - Non-independence of measures
    - Non-independence of samples
    - Model uncertainty
    - Unknown causal structure
    - Shifts in measurements, samples, frames
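    Selection bias is the first error source listed, and it is the typical risk of "found" data. The simulation below is a minimal sketch under hypothetical assumptions: a synthetic population, and an inclusion probability that rises with the outcome itself (e.g., heavier users leave more traces). It compares the naive sample mean with an inverse-probability-weighted (Hajek-style) estimate, which is only available here because the simulation knows the inclusion probabilities; in real found data they are usually unknown.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical target population: an outcome y (e.g., hours online per week).
N = 100_000
y = rng.gamma(shape=2.0, scale=5.0, size=N)   # population mean is about 10

# Hypothetical "found" data: inclusion probability rises with y,
# so high-y units are over-represented in the captured sample.
p_incl = np.clip(0.01 + 0.02 * y, 0.0, 1.0)
included = rng.random(N) < p_incl
sample = y[included]

pop_mean = y.mean()
naive_mean = sample.mean()   # biased upward by the selection mechanism

# If (and only if) inclusion probabilities were known, inverse-probability
# weighting (a Hajek-style ratio estimator) would remove this bias.
w = 1.0 / p_incl[included]
ipw_mean = np.sum(w * sample) / np.sum(w)

print(f"population {pop_mean:.2f}  naive {naive_mean:.2f}  weighted {ipw_mean:.2f}")
```

    A larger sample does not shrink this bias: adding more records drawn through the same mechanism only makes the wrong answer more precise.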
  • 24. Challenges of Big Data
    Access Challenge: Data Repeatability, Transparency, Preservation
  • 25. Many Initiatives to Improve Scientific Reliability
    - Retraction monitoring
    - Data citation
    - Clinical trial preregistration
    - Registered replication
    - Open data
    - Badges
  • 26. Some Types of Reproducibility Issues
    - Fraud
    - Misconduct
    - Negligence
    - Bit rot
    - Versioning problems
    - Replication
    - Reproduction
    - Extension
    - Result validation
    - Fact checking
    - Calibration, extension, reuse
    - Underreporting
    - Data dredging
    - Multiple comparisons, p-hacking
    - Sensitivity, robustness
    - Reliability
    - Generalizability
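    Data dredging and multiple comparisons are easy to demonstrate. The simulation below runs many two-group comparisons in which the null hypothesis is true by construction (both groups are standard normal noise) and counts how many look "significant" at α = 0.05, with and without a Bonferroni correction. The known-variance z-test, the seed, and the parameter values are illustrative assumptions, and Bonferroni is one standard (conservative) correction among several, not a method proposed in the slides.

```python
import math
import numpy as np

def two_sample_z_pvalue(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means when sigma is known."""
    z = (a.mean() - b.mean()) / (sigma * math.sqrt(1.0 / len(a) + 1.0 / len(b)))
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))

rng = np.random.default_rng(7)
m, n, alpha = 1000, 50, 0.05

# m comparisons with no true effect: both groups are drawn from N(0, 1).
pvals = np.array([
    two_sample_z_pvalue(rng.normal(size=n), rng.normal(size=n))
    for _ in range(m)
])

naive_hits = int(np.sum(pvals < alpha))           # about alpha * m false positives
bonferroni_hits = int(np.sum(pvals < alpha / m))  # family-wise error held at alpha
print(naive_hits, bonferroni_hits)
```

    With roughly 50 spurious "findings" out of 1,000 null comparisons, reporting only the hits (underreporting the rest) produces a literature that cannot be replicated.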
  • 27. Ensuring Repeatability & Transparency
    [Diagram: layers from theory through algorithm, implementation, and execution]
    - Theory (rules, entities, concepts)
    - Algorithms (protocol, operationalization)
    - Implementations (software, coding rules, instrumentation design)
    - Executions (deployment, house survey style, equipment settings, operating system, hardware, starting values, PRNG seeds)
    - Structure, formats, versions/revisions, selections, integrations, instantiations (copies)
    - Execution context (weather, compiler, operating system, system load)
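    Several execution-level items above (starting values, PRNG seeds, operating system, software versions) can be captured automatically at run time. The sketch below records a subset of them alongside a toy analysis result; the function name `run_with_provenance`, the record's field names, and the toy analysis are hypothetical, and a real workflow would also capture library versions and data revisions.

```python
import hashlib
import json
import platform
import random
from datetime import datetime, timezone

def sha256_bytes(data: bytes) -> str:
    """Content digest that identifies exactly which input was analyzed."""
    return hashlib.sha256(data).hexdigest()

def run_with_provenance(data: bytes, seed: int) -> dict:
    """Run a toy stochastic analysis; return its result plus a provenance record."""
    rng = random.Random(seed)                 # explicit, recorded PRNG seed
    values = [int(x) for x in data.decode("utf-8").split(",")]
    sample = rng.sample(values, k=3)          # stochastic step that depends on the seed
    result = sum(sample) / len(sample)
    return {
        "result": result,
        "provenance": {
            "input_sha256": sha256_bytes(data),
            "prng": "random.Random (Mersenne Twister)",
            "seed": seed,
            "python_version": platform.python_version(),
            "implementation": platform.python_implementation(),
            "os": platform.platform(),
            "run_at_utc": datetime.now(timezone.utc).isoformat(),
        },
    }

data = b"4,8,15,16,23,42"
first = run_with_provenance(data, seed=2015)
second = run_with_provenance(data, seed=2015)
assert first["result"] == second["result"]    # same input + same seed -> same result
print(json.dumps(first["provenance"], indent=2))
```

    Recording the interpreter version matters because a seed alone does not guarantee identical draws if the generator or sampling routine changes between releases.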
  • 28. Challenges of Big Data
    Access Challenge: Data Confidentiality, Security
  • 29. Durable, Long-Term Access
    - Why durable access?
      - The rule of law requires maintaining authentic public records
      - Scientific advances rely on a cumulative, traceable evidence base
      - Art, history, and culture require durable access to national heritage information
      - Our nation needs durable access to a strategic information reserve
      - Humanity needs durable, long-term access to information in order to communicate to future generations
    - Big data challenges to durability
      - Velocity: information is updated, sometimes overwritten
      - Many sources are commercial/private and are not routinely archived or preserved
      - Modeling the future value of information
      - Maintaining privacy and confidentiality
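    One common preservation practice for detecting silent change (overwrites, bit rot) is fixity checking: store a cryptographic digest of every file when a snapshot is archived, then re-hash and compare during later audits. The sketch below assumes a simple directory-based snapshot; the file names and the helpers `build_manifest` and `audit` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_file(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def audit(root: Path, manifest: dict) -> dict:
    """Compare current contents against a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "added": sorted(set(current) - set(manifest)),
        "changed": sorted(k for k in manifest.keys() & current.keys()
                          if manifest[k] != current[k]),
    }

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "counts.csv").write_text("week,queries\n1,120\n")
    (root / "README.txt").write_text("snapshot notes\n")
    manifest = build_manifest(root)

    # Simulate a "live" source overwriting a value in place.
    (root / "counts.csv").write_text("week,queries\n1,135\n")
    report = audit(root, manifest)
    print(report)   # {'missing': [], 'added': [], 'changed': ['counts.csv']}
```

    The manifest itself must be stored separately from the data it describes; otherwise the same overwrite can silently update both.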
  • 30. Big data challenges…
    - Anonymization can completely destroy utility
      - The "Netflix problem": large, sparse datasets that overlap can be probabilistically linked [Narayanan and Shmatikov 2008]
    - Observable behavior leaves unique "fingerprints"
      - The "GIS problem": fine geo-spatial-temporal data are impossible to mask when correlated with external data [Zimmerman 2008; ]
    - Big data can be rich, messy, and surprising
      - The "Facebook problem": masked network data can be re-identified if only a few nodes are controlled [Backstrom et al. 2007]
      - The "blog problem": pseudonymous communication can be linked through textual analysis [Novak et al. 2004]
    Source: [Calabrese 2008; Real Time Rome Project 2007]
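    The "Netflix problem" works because rare attributes act as fingerprints in sparse data. The toy sketch below scores a small piece of auxiliary knowledge against every record in a pseudonymized release, weighting each matching item by roughly 1/log(support) so that rare items count more, and only declares a match when the best score clearly beats the runner-up. This is a simplification in the spirit of Narayanan and Shmatikov's scoring, not their full algorithm; the release, the margin value, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical pseudonymized release: pseudonym -> set of items rated.
released = {
    "u01": {"m1", "m2", "m3", "m7"},
    "u02": {"m1", "m2", "m4"},
    "u03": {"m1", "m5", "m6", "m9"},
    "u04": {"m2", "m3", "m8"},
    "u05": {"m1", "m2", "m3"},
}

# Support of each item: how many released records contain it.
support = Counter(item for items in released.values() for item in items)

def weight(item: str) -> float:
    """Rarer items are more identifying. Uses 1/log(1 + support); only called
    for items present in some record, so support >= 1 and log(...) > 0."""
    return 1.0 / math.log(1 + support[item])

def link(aux: set, margin: float = 0.5):
    """Return the best-matching pseudonym, or None if the match is ambiguous."""
    scores = sorted(
        ((sum(weight(i) for i in aux & items), uid) for uid, items in released.items()),
        reverse=True,
    )
    (best, best_uid), (second, _) = scores[0], scores[1]
    return best_uid if best - second >= margin else None

# Auxiliary knowledge (e.g., a few public reviews) about one person.
print(link({"m3", "m7"}))   # 'u01': m7 appears in only one record
print(link({"m1", "m2"}))   # None: several records share these common items
```

    Two items were enough to single out one record here; in large real datasets, the same logic needs only a handful of rare attributes, which is why removing names alone does not anonymize.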
  • 31. Little Data in a Big World
    - The "Favorite Ice Cream" problem: public information that is not risky can help us learn information that is risky
    - The "Doesn't Stay in Vegas" problem: information shared locally can be found anywhere
    - The "Unintended Algorithmic Discrimination" problem: algorithms are often not transparent and can amplify human biases
  • 32. Categorizing Challenges
    - Implementation: performance challenges
      - Systems challenges
        - Data exceed the capacity of locally managed storage
        - Location and migration of data become critical for performance
        - Standard backup, recovery, and data integrity mechanisms are ineffective
        - Communication bandwidth
      - Algorithmic challenges
        - "In-core" vs. "out-of-core" implementations
        - O(n^2) vs. O(log n) complexity
        - Static vs. streaming algorithms
        - Serial vs. massively parallel
        - Distributed, shared-nothing algorithms
    - Analysis methods: inferential challenges
      - Sources: designed vs. "found" data
      - Model-based vs. data-based
      - Causal inference vs. descriptive/predictive (forecasting) inference
    - Data management and workflow
      - Provenance
      - Data quality
      - Change management
      - Continuous integration
      - Accommodating variety: semantics, quality
      - Transparency and reproducibility
      - Privacy
      - Security
    - Data governance and policy
      - Standards
      - Incentives
      - Certifications
      - Regulation
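    The algorithmic contrasts above (static vs. streaming, in-core vs. out-of-core, serial vs. shared-nothing parallel) can be illustrated with a textbook example: computing a mean and variance in one pass without holding the data in memory (Welford's algorithm), and merging partial summaries computed independently on separate shards (the pairwise combination formula of Chan et al.). The class name `RunningStats` and the two-worker split are illustrative assumptions, not an implementation from the slides.

```python
from dataclasses import dataclass
import random
import statistics

@dataclass
class RunningStats:
    """One-pass (streaming) mean and variance via Welford's algorithm."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0   # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)   # uses the updated mean

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two partial summaries so shards can be processed
        independently (shared-nothing) and merged afterwards."""
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator)."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

# Simulate a stream split across two workers that share nothing.
rng = random.Random(1)
data = [rng.gauss(100.0, 15.0) for _ in range(10_000)]
left, right = RunningStats(), RunningStats()
for x in data[:6_000]:
    left.update(x)
for x in data[6_000:]:
    right.update(x)
combined = left.merge(right)

print(combined.n, combined.mean, combined.variance)
```

    Each worker keeps only three numbers regardless of how much data it sees, so memory is O(1) per shard and the merge step costs O(1) per pair of summaries.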
  • 33. Some Open Questions About Data Sources
  • 34. Preliminary Observations from First Workshop
    Topic: Sources of Economic Big Data
    Use case: Commodity Flow Survey
    Observations:
    - Different classes of decisions require different sources of data
      - E.g., much designed survey data provides baseline data for decisions about infrastructure and strategic planning
      - Transaction-based big data could increase the frequency and granularity of estimates
    - In big data, data sources are stakeholders
      - Businesses need to react quickly and predict the future, and need frequently updated, detailed data
      - Critical to provide a value proposition to business
      - Critical to develop a trust relationship
    - Some potential sources
      - ERP and DRP operations data
      - EDI
      - Mobile phone
      - Traffic data
  • 35. Some Non-Technical Questions About Sources
    - Who are the key stakeholders in a big data source, and what are their key incentives?
      - What key decisions does this information support for stakeholders? What are the gaps in data from the stakeholder perspective?
      - What are the barriers associated with new sources of information?
        - Legal barriers
        - Economic barriers
        - Social/trust barriers
  • 36. Potential Roles for Libraries
  • 37. Potential Roles: Infrastructure
    - Dissemination
      - Catalog the range of new statistics/indicators and sources
      - Selection based on quality
      - Guide proper use
    - Durability
      - Ensure long-term accessibility of big data
      - Manage provenance, versioning
      - Provide transparency of new indicators/statistics
    - Security & confidentiality
      - Libraries could be a trusted and accountable third party
      - Store and integrate data from multiple sources
      - Could develop expert implementation of privacy best practices
  • 38. Potential Roles: Leadership
    - Advocacy
      - Advocate for quality, transparency, replication, durable access
    - Standardization
      - Develop new methods for big data management
      - Identify "best practices" for replication, transparency, long-term access
      - Standardize licenses for reuse, preservation
  • 39. Additional References
    - Einav, Liran, and Jonathan Levin. "Economics in the age of big data." Science 346.6210 (2014): 1243089. http://www.sciencemag.org/content/346/6210/1243089.short
    - Varian, Hal R. "Big data: New tricks for econometrics." The Journal of Economic Perspectives 28.2 (2014): 3-27. http://people.ischool.berkeley.edu/~hal/Papers/2013/ml.pdf
    - Reimsbach-Kounatze, C. "The Proliferation of 'Big Data' and Implications for Official Statistics and Statistical Agencies: A Preliminary Analysis." OECD Digital Economy Papers, No. 245. OECD Publishing, 2015. http://dx.doi.org/10.1787/5js7t9wqzvg8-en
    - Kriger, David S., et al. Freight Transportation Surveys. NCHRP Synthesis 410. Transportation Research Board, 2011. http://www.nap.edu/catalog/13627/nchrp-synthesis-410-freight-transportation-surveys
  • 41. Creative Commons License
    This work, Managing Confidential information in research, by Micah Altman (http://redistricting.info), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.