Data Sharing with ICPSR: Fueling the Cycle
of Science through Discovery, Access
Tools, and Long-Term Availability
Johanna Bleckman and Kaye Marz, ICPSR
IASSIST 2015
Minneapolis, MN
June 2, 2015
Learning objectives
You will become more familiar with:
• Federal data sharing requirements and
other good reasons to share data
• Options for sharing data
• Protection of confidentiality when
sharing data
• Data discovery tools
• Online data exploration tools from ICPSR
Objective I:
Federal data sharing requirements and
other good reasons to share data
Why share data?
• Federal data sharing requirements allow
effective use of federal agency resources
– Office of Science and Technology Policy
memorandum (2013)
– National Science Foundation (2008)
  ▪ Requirement for Data Management Plans in 2010
– National Institutes of Health (2003)
  ▪ Requirement for Data Sharing Plans included
– National Institute of Justice (1978)
Why share data?
• Benefits to the social science community (NIH, 2003)
– Reinforces open scientific inquiry
– Encourages diversity of analysis and opinion
– Promotes new research, testing of new hypotheses and
methods of analysis
– Supports studies on data collection methods and
measurement
– Facilitates education of new researchers
– Enables exploration of topics not envisioned by initial
investigators
– Permits creation of datasets combined from multiple sources
Why share data?
• Benefits to the researcher
– Data in the public domain generate new research which
cites the original research
– Data are preserved and can be obtained by the original
researchers if their copies are lost or destroyed
– Archiving data helps researchers meet requirements of
NIH and NSF data management plans and NIJ data
archiving special conditions requirements
– Information on research community’s interest in a dataset
can assist with the success of future grant funding
proposals
Where are the data now?
• Lost in a technical malfunction
• Destroyed in a flood in the department
• Files were on the university server but are long gone
• Data are kept for 10 years beyond the last time used
• Purged after retirement
• Institutional review boards required the data be destroyed after the project
Source: Pienta, Amy, Myron Gutmann, & Jared Lyle. 2009. “Research Data in The Social Sciences: How Much is Being Shared?” Research Conference on Research Integrity, Niagara Falls, NY.
Sharing data…more publications!
Table. Median Number of Publications by Data Sharing Status
                            Data Archived  Data Shared Informal  Data Not Shared  Total
                            (n=111)        (n=415)               (n=409)          (n=935)
Primary PI Publications     6              6                     3                4
Secondary Publications      8              6                     3                5
Publications with Students  4              3                     1                2
Source: Pienta, Amy M., George Alter, and Jared Lyle. 2010. “The Enduring Value of Social Science Research: The Use
and Reuse of Primary Research Data.” Presented at the BRICK, DIME, STRIKE Workshop, The Organisation,
Economics, and Policy of Scientific Research, Turin, Italy, April 23‐24, 2010 (http://hdl.handle.net/2027.42/78307)
Cycle of Science
Data deposited in archive → Discovered and accessed from archive → Analyzed and results published → New datasets or research ideas → New data collected → (back to) Data deposited in archive
Objective II:
Options for sharing data
Options for sharing data through ICPSR
How do ICPSR and openICPSR compare
to other data service providers?
• Have professional data curators to review deposited
materials who are experts in developing metadata (tags) for
the social and behavioral sciences
• Provide an immediate distribution network of over 760
institutions looking for research data, that has powerful
search tools, and a data catalog indexed by major search
engines
• Sustained by a respected organization with over 50 years of
experience in reliably protecting research data
• Prepared to accept and disseminate sensitive and/or
restricted-use data
How is openICPSR different from ICPSR?
openICPSR:
• Sustained by a deposit fee charged to individuals from non-member institutions ($600/project)
• Data freely available to the public (or for a nominal charge for restricted-use data)
• Most data available only in the raw (bit-level) form as deposited and described by the depositor; may be fully curated if the Professional Curation option was chosen and the quoted fee paid for ICPSR’s curation and/or disclosure review services
ICPSR:
• Sustained by institutional member fees; depositors are not charged to deposit data
• Data are freely available to individuals affiliated with member institutions; non-members pay an access fee; both may pay a fee to access restricted-use data in the VDE
• Data are fully curated, including professional processing, value-added documentation, and renderings in popular statistical programs and online analysis
Who might use openICPSR?
• Researchers required to share data freely with the
public to comply with grant/contract
requirements
• Authors required to share data for replication
purposes to comply with journal requirements
• Researchers required to share sensitive data or
data with disclosure risk with the public from a
secure digital environment
• Researchers, including students, who want to
share data publicly as good practice or for the
purposes of replication
Deposit agreements
The depositor:
• Has implicit or explicit copyright and has the right to make the data collection publicly available through ICPSR
• Permits ICPSR to redisseminate, promote, catalog,
reformat, store, and preserve the data collection
• Permits ICPSR to transform or enhance the collection to
protect confidentiality and for usability
• Has removed all direct identifiers and done due diligence
to prevent disclosure of subject identities
• Holds ICPSR and University of Michigan harmless from
liability for breaches of subject confidentiality or invasions
of privacy
openICPSR deposit terms include more
• Deposited work will be distributed under an Attribution
4.0 Creative Commons License
• Depositor has institutional approval to share the data
collection
• Data collection will be preserved as-is and available at no
cost to data users
• Research subjects have consented to sharing the data
and/or the depositor has institutional approval to share
the data
openICPSR: Public-access sharing
solution for Institutions and Journals
• Branded repository to represent and showcase their
research data using a fully-hosted (cloud) service
• Able to fulfill grant requirements that will pass an audit
• No need to have technical staff or equipment to manage
the repository
• Easy and clean interface for deposit and access
• Administrative (usage) reporting available 24/7
• Able to assure the Research Administration Office that
data are safe and secure for the long term
openICPSR: Examples of branded sites and services
Your logo. Your colors.
A unique URL.
On-demand deposit.
On-demand reports.
Should the data be restricted?
• Re-identification
–Can individuals be identified from information in this material?
–If the data were made public, could someone use a
combination of variables (e.g. age, sex, race, occupation,
geography) to find individuals in a publicly available
database?
• Harm
–Does this material include sensitive information?
–Would the release of individually identifiable information
create a risk of harm (e.g. psychological distress, social
embarrassment, financial loss) greater than the risks that
people experience in everyday life?
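The re-identification question above can be roughly screened by counting how many records share each combination of indirect identifiers. The sketch below is only an illustration under stated assumptions: the function name `risky_records`, the threshold `k`, and the toy records are all hypothetical, and a real disclosure review involves far more than this count.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, k=5):
    """Return records whose combination of quasi-identifier values
    is shared by fewer than k records (hypothetical screening rule)."""
    def key(rec):
        return tuple(rec[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]


# Hypothetical toy data; variable names are illustrative only.
sample = [
    {"age": 34, "sex": "F", "occupation": "teacher"},
    {"age": 34, "sex": "F", "occupation": "teacher"},
    {"age": 71, "sex": "M", "occupation": "judge"},
]

# With k=2, any record that is unique on (age, sex, occupation) is flagged.
flagged = risky_records(sample, ["age", "sex", "occupation"], k=2)
```

Records flagged this way are candidates for recoding or for restricted-use release; the count alone does not settle the question.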
Identifiers
• Direct
– Names
– Addresses, including ZIP codes and other postal codes
– Telephone and fax numbers, including area codes
– Social security numbers
– Other linkable numbers such as driver’s license numbers,
certification numbers, medical device numbers, etc.
• Indirect
– Detailed geographic information
– Organizations belonged to, offices or posts held by respondent
– Educational institutions attended, year of graduation
– Detailed occupational titles
– Place where respondent was born or grew up
– Exact dates of significant life events (birth, death, marriage, etc)
– Detailed income
Direct identifiers must be removed from
data before sharing! Data with indirect
identifiers may need to be restricted.
What sensitive topics could cause harm?
• Psychological well being or mental health
• Sexual attitudes, preferences, or practices
• Use of alcohol or drugs
• Illegal behavior
• Behavior that puts the respondent at risk of criminal or
civil liability
• Behavior damaging to an individual’s financial standing,
employability, or reputation
• Medical information that could lead to discrimination,
stigmatization, etc.
Let’s make a deposit with openICPSR
www.openicpsr.org
Objective III: Protection of confidentiality
when sharing data
Common Objection/Misperception:
“My data are too sensitive to share. . .”
• Many restricted data files can be processed for public
release
• ICPSR has been sharing restricted-use data for over a
decade via three methods:
– Secure Download
– Virtual Data Enclave
– Physical Enclave
• ICPSR stores & shares over 6,400 restricted-use datasets
associated with over 2,000 ‘active’ restricted-use data
contracts
Virtual Data Enclave
• Virtual machine launched from your desktop machine
• Functions just like your local machine, but with
restrictions on what can enter and exit the
environment
• Full suite of statistical packages and other software
Virtual Data Enclave (VDE) – user experience
VDE
My laptop’s task bar
The Visual
Examples of Disclosure Risk Concerns
• Tabular output including demographics or unique
characteristics/activities, resulting in small cell sizes
• Geographic information (zip codes, cities/towns, counties)
• Geocodes, GIS data, maps
• Longitudinal data – data from research subjects at multiple time
points
• Verbatim text – interviews, video transcriptions, short answer
responses
• Photos, videos, audio recordings
Common Rules for Tables
Each cell size is sufficiently large to prevent identifiability
• Establish minimum threshold (often 3, 5, or 10), dependent on
type and context of the data
• Rules for cells, rows, and columns:
– All cases in any row or column should not be in a single cell
– Cell percentage should not correspond to a cell size less
than a threshold number
– A cell should not be a high percent of all cases included in a
row or column (more than 60%)
• Combinations of tables should also meet guidelines
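The table rules above can be sketched as a simple check over a matrix of cell counts. This is a minimal illustration, not ICPSR's review procedure: the function name `check_table` and its defaults (a threshold of 5, a 60% cap) are assumptions drawn from the values the slide mentions, and treating empty cells as acceptable is also an assumption.

```python
def check_table(table, min_cell=5, max_share=0.60):
    """Check a table of counts (list of rows) against the rules above.
    Returns a list of human-readable problems; empty list means no flags."""
    problems = []
    n_cols = len(table[0]) if table else 0

    # Rule: every non-empty cell must reach the minimum threshold.
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if 0 < count < min_cell:
                problems.append(f"cell ({i},{j}) has count {count} below {min_cell}")

    # Collect every row and every column as a labelled list of cells.
    lines = [(f"row {i}", row) for i, row in enumerate(table)]
    lines += [(f"column {j}", [row[j] for row in table]) for j in range(n_cols)]

    for label, cells in lines:
        total = sum(cells)
        if total == 0:
            continue
        top = max(cells)
        if top == total and len(cells) > 1:
            # Rule: all cases in a row or column should not sit in one cell.
            problems.append(f"{label}: all cases fall in a single cell")
        elif top / total > max_share:
            # Rule: no cell should hold a high share of its row or column.
            problems.append(f"{label}: one cell holds {top / total:.0%} of cases")
    return problems


issues = check_table([[10, 12], [3, 40]])
```

The last rule on the slide (combinations of tables) is not covered here; it would require comparing tables against each other, for example by differencing overlapping tables.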
Disclosure Risk Protections (DRP)
• Release data from only a sample of the population
• Remove/mask obvious identifiers and high-risk variables
• Limit geographic detail
• Limit the number and detailed breakdown of categories
• Top or bottom coding
• Recode into intervals or round values
• Addition of noise
• Swap records
• Blank and impute
• Aggregate and replace with mean value (aka blurring)
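A few of the protections listed above are simple enough to sketch in code. The helper names below (`top_code`, `to_interval`, `add_noise`, `blur`) and the example cutoffs are hypothetical, chosen only for illustration; suitable cutoffs and noise levels depend on the data and should come from a disclosure review.

```python
import random


def top_code(value, cap):
    """Top coding: report any value above the cap as the cap itself."""
    return min(value, cap)


def to_interval(value, width):
    """Recode a non-negative integer into a labelled interval of given width."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


def add_noise(value, scale, rng):
    """Add uniform random noise in [-scale, +scale] to a value."""
    return value + rng.uniform(-scale, scale)


def blur(values):
    """Blurring: replace every value in a group with the group mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


ages = [23, 37, 58, 94]
top_coded = [top_code(a, 85) for a in ages]   # ages above 85 reported as 85
bands = [to_interval(a, 10) for a in ages]    # ten-year age bands

rng = random.Random(0)                        # seeded only for repeatability
noisy_income = add_noise(50000, 1000, rng)
blurred = blur([10, 20, 30])
```

Whatever transformations are applied, the syntax that performs them is worth keeping so the changes can be documented with the deposit.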
Disclosure Risk Remediation, Example 1
Disclosure Risk Remediation, Example 2
Documenting Disclosure Mitigation
• Maintain and deposit the syntax used to
modify the data
• Include either:
– a narrative description of changes in the
study-level documentation, or
– a series of item-specific narratives to be
included at the variable level (codebook/user
guide)
Objective IV:
Data discovery tools
What is ICPSR?
- Then and Now -
• One of the world’s oldest and largest social
science data archives, est. 1962
• Data distributed on punch cards, then reel-to-reel tape, now:
– Data available on demand
– Over 8,200 studies with over 68,700 data sets
• Membership organization among 21
universities, now:
– Currently over 760 members world-wide
– Federal funding of public collections
The Concept of “Data Curation”
• Curation, from the Latin "to care," is the process used to add value to
data, maximize access, and ensure long-term preservation
• Data curation is akin to work performed by an art or museum curator.
– Data are organized, described, cleaned, enhanced, and preserved for
public use, much like the work done on paintings or rare books to make
the works accessible to the public now and in the future
• Curation provides meaningful and enduring access to data
• Data curation is the foundation for effective, long-term data sharing
Assessing the data in the collection
• Searching for and downloading data
• Codebooks
• Full descriptives
• Variable search and compare
• Simple crosstabs and frequencies
• Online analysis functionality
Study Search Behaviors
In practice, we encounter three typical search
behaviors from our users:
• A user has a research question in mind.
• A user is looking for a dataset that contains
specific variables.
• A user is looking for a specific dataset and has
the study title or investigator name.
Natural Language Searching
• Does juvenile drug use lead to delinquency?
• juvenile “drug use” delinquency
• “juvenile drug use”
Searching by Variable/Concept
Variable Search Tips, 1
• Enter words or strings that are likely to appear
in a variable name, label, question, and value
labels:
• Presidential election will return variables
dealing with all presidential elections
• Presidential election Obama will return only
variables dealing with the 2008 and 2012
presidential elections
Variable Search Tips, 2
• Use quotes to search for specific phrases:
• "life satisfaction", "minority rights", "community
programs"
• The minus sign may be used to remove certain
types of results:
• "Presidential debate" -"Bill Clinton" will
eliminate the debates from 1992 and 1996 in
which Clinton participated
Variable Search Tips, 3
• A Boolean "and" is implied in the search.
• The search stems words automatically, so
there's no need to type an asterisk. It is also
case-insensitive.
• A fielded search is also available.
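The search rules in these tips (quoted phrases, a minus sign to exclude, an implied Boolean "and", case-insensitivity) can be illustrated with a toy matcher. This is not ICPSR's search engine: `parse_query` and `matches` are hypothetical names, matching is by plain substring, and stemming and fielded search are left out.

```python
import re

# A token is an optional minus sign followed by a quoted phrase or a bare word.
TOKEN = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query into (required terms, excluded terms), lowercased."""
    required, excluded = [], []
    for minus, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if not term:
            continue
        (excluded if minus else required).append(term)
    return required, excluded


def matches(text, query):
    """True if text contains every required term and no excluded term."""
    required, excluded = parse_query(query)
    haystack = text.lower()
    return (all(term in haystack for term in required)
            and not any(term in haystack for term in excluded))


query = '"Presidential debate" -"Bill Clinton"'
parsed = parse_query(query)
```

For instance, `matches("Watched the 2008 Presidential Debate", query)` holds because the phrase is present and the excluded name is absent, while a label mentioning Bill Clinton would be filtered out.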
Searching for a Specific Dataset
It’s really a searchable database . . .
containing over 65,000 citations of known published
and unpublished works resulting from analyses of
data archived at ICPSR
. . .that can generate study bibliographies
associating each study with the literature about it
. . . Included in the integrated search
on the ICPSR Web site
The Bibliography of Data-related Literature
Demonstrating the Impact of Research
Objective V: Online data exploration tools
Simple Crosstab
Compare Variables
Crosstab Creator
Crosstab Assignment Builder
Online Analysis
Assistance is available
• Our user support staff are available from 8:00
a.m. to 5:00 p.m., ET, Monday to Friday
– Phone: 734-647-2200
– netmail@icpsr.umich.edu
References
• Office of Science and Technology Policy memorandum (2013). Retrieved on
27/05/2015 from
https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_
access_memo_2013.pdf
• National Science Foundation, Social, Behavioral and Economic Sciences (SBE).
(2010). Data Management Plan Policy. Retrieved 27/05/2015 from
http://www.nsf.gov/sbe/SBE_DataMgmtPlanPolicy.pdf
• National Science Foundation, Social, Behavioral and Economic Sciences (SBE)
(2008). Data Archiving Policy. Retrieved 27/05/2015 from
http://www.nsf.gov/sbe/ses/common/archive.jsp .
• National Institutes of Health. (2003). NIH Data Sharing Policy and
Implementation Guidance (includes data sharing plan requirements). Retrieved
27/05/2015 from
grants2.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#finfre
References (cont.)
• National Institute of Justice (2014). About the Data Resources Program.
Retrieved on 27/05/2015 from http://www.nij.gov/funding/data-resources-
program/Pages/about.aspx
• National Institute of Justice (2012). Data Archiving Plans for NIJ Funding
Applicants. Retrieved 27/05/2015 from http://www.nij.gov/funding/data-
resources-program/applying/data-archiving-strategies.htm
• Pienta, Amy M., George Alter, and Jared Lyle. (2010). The enduring value of
social science research: The use and reuse of primary research data. Presented
at the BRICK, DIME, STRIKE Workshop, The Organisation, Economics, and Policy
of Scientific Research, Turin, Italy, April 23‐24, 2010
(http://hdl.handle.net/2027.42/78307)
References (cont.)
• Confidentiality and Data Access Committee (CDAC), Federal Committee on
Statistical Methodology. Statistical Policy Working Paper 22 (Second version,
2005), Report on Statistical Disclosure Limitation Methodology,
http://www.fcsm.gov/working-papers/spwp22.html.
• Other Requirements Relating to Uses and Disclosures of Protected Health
Information, 45 CFR 164.514 (2002),
http://edocket.access.gpo.gov/cfr_2002/octqtr/pdf/45cfr164.514.pdf
• U.S. Department of Health and Human Services, National Institutes of Health.
Research Repositories, Databases, and the HIPAA Privacy Rule (posted
January 12, 2014; revised July 2, 2004),
http://privacyruleandresearch.nih.gov/research_repositories.asp
• Sweeney, L. (2000). “Simple Demographics Often Identify People Uniquely.” Data
Privacy Working Paper 3, Carnegie Mellon University, Pittsburgh.
http://dataprivacylab.org/projects/identifiability/index.html
Resources
• Link to Guidelines for Effective Data Management Plans (PDF),
http://www.icpsr.umich.edu/files/datamanagement/DataManagementPlans-
All.pdf.
• Link to Guidelines and Resources for OSTP Data Access Plans (video),
https://www.youtube.com/watch?v=sWnMFEKmfnE
• Link to ICPSR Deposit Data (web page),
http://www.icpsr.umich.edu/icpsrweb/deposit/index.jsp
• Link to openICPSR (website and videos), https://www.openicpsr.org/
• Link to Disclosure Risk Training – For Public Use or Not For Public Use (video),
https://www.youtube.com/watch?v=9vdWseLay9g&feature=youtu.be&list=P
LqC9lrhW1VvaKgzk-S87WwrlSMHliHQo6%22
Thank you!
Q&A

ICPSR Secure Data Service: Broadening Access. Reducing Risk.ICPSR
 
Data in The Classroom: It's Not Just for Nerds Anymore!
Data in The Classroom:  It's Not Just for Nerds Anymore!Data in The Classroom:  It's Not Just for Nerds Anymore!
Data in The Classroom: It's Not Just for Nerds Anymore!ICPSR
 
Quantitative Literacy: Don't be afraid of data (in the classroom)!
Quantitative Literacy:  Don't be afraid of data (in the classroom)!Quantitative Literacy:  Don't be afraid of data (in the classroom)!
Quantitative Literacy: Don't be afraid of data (in the classroom)!ICPSR
 
ICPSR Data Services
ICPSR Data ServicesICPSR Data Services
ICPSR Data ServicesICPSR
 
TeachingWithData.org Outreach Presentation
TeachingWithData.org Outreach Presentation TeachingWithData.org Outreach Presentation
TeachingWithData.org Outreach Presentation ICPSR
 
ICPSR Data Managment
ICPSR Data ManagmentICPSR Data Managment
ICPSR Data ManagmentICPSR
 
ICPSR Data Sharing
ICPSR Data SharingICPSR Data Sharing
ICPSR Data SharingICPSR
 
Spice up your lecture with Inquiry-based Learning
Spice up your lecture with Inquiry-based LearningSpice up your lecture with Inquiry-based Learning
Spice up your lecture with Inquiry-based LearningICPSR
 
Guidance on Data Management Plans
Guidance on Data Management PlansGuidance on Data Management Plans
Guidance on Data Management PlansICPSR
 
TeachingWithData.org ASA Presentation 2010
TeachingWithData.org ASA Presentation 2010TeachingWithData.org ASA Presentation 2010
TeachingWithData.org ASA Presentation 2010ICPSR
 
TeachingWithData.org -- Faculty Presentation
TeachingWithData.org -- Faculty PresentationTeachingWithData.org -- Faculty Presentation
TeachingWithData.org -- Faculty PresentationICPSR
 
Bulletinspring2010final
Bulletinspring2010finalBulletinspring2010final
Bulletinspring2010finalICPSR
 
ICPSR: Resources for Use in Undergraduate Instruction
ICPSR: Resources for Use in Undergraduate InstructionICPSR: Resources for Use in Undergraduate Instruction
ICPSR: Resources for Use in Undergraduate InstructionICPSR
 
Using Quantitative Data in Teaching: ICPSR Resources
Using Quantitative Data in Teaching: ICPSR ResourcesUsing Quantitative Data in Teaching: ICPSR Resources
Using Quantitative Data in Teaching: ICPSR ResourcesICPSR
 
What Is A Virtual Meeting?
What Is A Virtual Meeting?What Is A Virtual Meeting?
What Is A Virtual Meeting?ICPSR
 

More from ICPSR (17)

Asa integrating data 2 19-2014 with cites
Asa integrating data 2 19-2014 with citesAsa integrating data 2 19-2014 with cites
Asa integrating data 2 19-2014 with cites
 
Data in the HS Classroom: When, Why, and How?
Data in the HS Classroom: When, Why, and How?Data in the HS Classroom: When, Why, and How?
Data in the HS Classroom: When, Why, and How?
 
ICPSR Secure Data Service: Broadening Access. Reducing Risk.
ICPSR Secure Data Service: Broadening Access. Reducing Risk.ICPSR Secure Data Service: Broadening Access. Reducing Risk.
ICPSR Secure Data Service: Broadening Access. Reducing Risk.
 
Data in The Classroom: It's Not Just for Nerds Anymore!
Data in The Classroom:  It's Not Just for Nerds Anymore!Data in The Classroom:  It's Not Just for Nerds Anymore!
Data in The Classroom: It's Not Just for Nerds Anymore!
 
Quantitative Literacy: Don't be afraid of data (in the classroom)!
Quantitative Literacy:  Don't be afraid of data (in the classroom)!Quantitative Literacy:  Don't be afraid of data (in the classroom)!
Quantitative Literacy: Don't be afraid of data (in the classroom)!
 
ICPSR Data Services
ICPSR Data ServicesICPSR Data Services
ICPSR Data Services
 
TeachingWithData.org Outreach Presentation
TeachingWithData.org Outreach Presentation TeachingWithData.org Outreach Presentation
TeachingWithData.org Outreach Presentation
 
ICPSR Data Managment
ICPSR Data ManagmentICPSR Data Managment
ICPSR Data Managment
 
ICPSR Data Sharing
ICPSR Data SharingICPSR Data Sharing
ICPSR Data Sharing
 
Spice up your lecture with Inquiry-based Learning
Spice up your lecture with Inquiry-based LearningSpice up your lecture with Inquiry-based Learning
Spice up your lecture with Inquiry-based Learning
 
Guidance on Data Management Plans
Guidance on Data Management PlansGuidance on Data Management Plans
Guidance on Data Management Plans
 
TeachingWithData.org ASA Presentation 2010
TeachingWithData.org ASA Presentation 2010TeachingWithData.org ASA Presentation 2010
TeachingWithData.org ASA Presentation 2010
 
TeachingWithData.org -- Faculty Presentation
TeachingWithData.org -- Faculty PresentationTeachingWithData.org -- Faculty Presentation
TeachingWithData.org -- Faculty Presentation
 
Bulletinspring2010final
Bulletinspring2010finalBulletinspring2010final
Bulletinspring2010final
 
ICPSR: Resources for Use in Undergraduate Instruction
ICPSR: Resources for Use in Undergraduate InstructionICPSR: Resources for Use in Undergraduate Instruction
ICPSR: Resources for Use in Undergraduate Instruction
 
Using Quantitative Data in Teaching: ICPSR Resources
Using Quantitative Data in Teaching: ICPSR ResourcesUsing Quantitative Data in Teaching: ICPSR Resources
Using Quantitative Data in Teaching: ICPSR Resources
 
What Is A Virtual Meeting?
What Is A Virtual Meeting?What Is A Virtual Meeting?
What Is A Virtual Meeting?
 

Recently uploaded

ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 

Recently uploaded (20)

ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 

Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Access Tools, and Long-Term Availability

  • 1. Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Access Tools, and Long-Term Availability Johanna Bleckman and Kaye Marz, ICPSR IASSIST 2015 Minneapolis, MN June 2, 2015
  • 2. Learning objectives You will become more familiar with: • Federal data sharing requirements and other good reasons to share data • Options for sharing data • Protection of confidentiality when sharing data • Data discovery tools • Online data exploration tools from ICPSR
  • 3. Objective I: Federal data sharing requirements and other good reasons to share data
  • 4. Why share data? • Federal data sharing requirements allow effective use of federal agency resources – Office of Science and Technology Policy memorandum (2013) – National Science Foundation (2008); requirement for Data Management Plans in 2010 – National Institutes of Health (2003); requirement for Data Sharing Plans included – National Institute of Justice (1978)
  • 5. Why share data? • Benefits to the social science community (NIH, 2003) – Reinforces open scientific inquiry – Encourages diversity of analysis and opinion – Promotes new research, testing of new hypotheses and methods of analysis – Supports studies on data collection methods and measurement – Facilitates education of new researchers – Enables exploration of topics not envisioned by initial investigators – Permits creation of datasets combined from multiple sources
  • 6. Why share data? • Benefits to the researcher – Data in the public domain generate new research which cites the original research – Data are preserved and can be obtained by the original researchers if their copies are lost or destroyed – Archiving data helps researchers meet requirements of NIH and NSF data management plans and NIJ data archiving special conditions requirements – Information on research community’s interest in a dataset can assist with the success of future grant funding proposals
  • 7. Where are the data now? • Lost in a technical malfunction • Destroyed in a flood in the department • Files were on the university server but are long gone • Data are kept for 10 years beyond the last time used • Purged after retirement • Institutional review boards required the data be destroyed after the project. Source: Pienta, Amy, Myron Gutmann, & Jared Lyle. 2009. “Research Data in The Social Sciences: How Much is Being Shared?” Research Conference on Research Integrity, Niagara Falls, NY.
  • 8. Sharing data…more publications!
    Table. Median Number of Publications by Data Sharing Status
                                  Data Archived   Data Shared Informal   Data Not Shared   Total
                                  (n=111)         (n=415)                (n=409)           (n=935)
    Primary PI Publications       6               6                      3                 4
    Secondary Publications        8               6                      3                 5
    Publications with Students    4               3                      1                 2
    Source: Pienta, Amy M., George Alter, and Jared Lyle. 2010. “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data.” Presented at the BRICK, DIME, STRIKE Workshop, The Organisation, Economics, and Policy of Scientific Research, Turin, Italy, April 23‐24, 2010 (http://hdl.handle.net/2027.42/78307)
  • 9–12. Cycle of Science: Data deposited in archive → Discovered and accessed from archive → Analyzed and results published → New datasets or research ideas → New data collected → Data deposited in archive
  • 14. Options for sharing data through ICPSR
  • 15. How do ICPSR and openICPSR compare to other data service providers? • Have professional data curators to review deposited materials who are experts in developing metadata (tags) for the social and behavioral sciences • Provide an immediate distribution network of over 760 institutions looking for research data, that has powerful search tools, and a data catalog indexed by major search engines • Sustained by a respected organization with over 50 years of experience in reliably protecting research data • Prepared to accept and disseminate sensitive and/or restricted-use data
  • 16. How is openICPSR different from ICPSR? openICPSR: • Sustained by deposit fees charged to individuals from non-member institutions ($600/project) • Data freely available to the public (or for a nominal charge for restricted-use data) • Most data available only in the raw (bit-level) form as deposited and described by the depositor; may be fully curated if the Professional Curation option was chosen and the quoted fee paid for ICPSR’s curation and/or disclosure review services. ICPSR: • Sustained by institutional member fees; depositors are not charged to deposit data • Data are freely available to individuals affiliated with member institutions; non-members pay an access fee; both may pay a fee to access restricted-use data in the VDE • Data are fully curated, including professional processing, value-added documentation, and renderings in popular statistical programs and online analysis
  • 17. Who might use openICPSR? • Researchers required to share data freely with the public to comply with grant/contract requirements • Authors required to share data for replication purposes to comply with journal requirements • Researchers required to share sensitive data or data with disclosure risk with the public from a secure digital environment • Researchers, including students, who want to share data publicly as good practice or for the purposes of replication
  • 18. Deposit agreements. The depositor: • Holds implicit or explicit copyright and has the right to make the data collection publicly available through ICPSR • Permits ICPSR to redisseminate, promote, catalog, reformat, store, and preserve the data collection • Permits ICPSR to transform or enhance the collection to protect confidentiality and for usability • Has removed all direct identifiers and done due diligence to prevent disclosure of subject identities • Holds ICPSR and University of Michigan harmless from liability for breaches of subject confidentiality or invasions of privacy
  • 19. openICPSR deposit terms include more • Deposited work will be distributed under an Attribution 4.0 Creative Commons License • Depositor has institutional approval to share the data collection • Data collection will be preserved as-is and available at no cost to data users • Research subjects have consented to sharing the data and/or the depositor has institutional approval to share the data
  • 20. openICPSR: Public-access sharing solution for Institutions and Journals • Branded repository to represent and showcase their research data using a fully-hosted (cloud) service • Able to fulfill grant requirements that will pass an audit • No need to have technical staff or equipment to manage the repository • Easy and clean interface for deposit and access • Administrative (usage) reporting available 24/7 • Able to assure the Research Administration Office that data are safe and secure for the long term
  • 21. openICPSR: Examples of branded sites and services Your logo. Your colors. A unique URL. On-demand deposit. On-demand reports.
  • 22. Should the data be restricted? • Re-identification –Can individuals be identified from information in this material? –If the data were made public, could someone use a combination of variables (e.g. age, sex, race, occupation, geography) to find individuals in a publicly available database? • Harm –Does this material include sensitive information? –Would the release of individually identifiable information create a risk of harm (e.g. psychological distress, social embarrassment, financial loss) greater than the risks that people experience in everyday life?
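
The re-identification question on slide 22 can be explored with a simple count. The following is a minimal Python sketch, not an ICPSR procedure: it flags records whose combination of indirect identifiers is shared by fewer than k records. The function name `risky_records`, the field names, the toy data, and the choice of k=3 are all assumptions made for illustration.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, k=3):
    """Return records whose quasi-identifier combination occurs fewer than k times."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]


# Hypothetical toy data: three similar respondents and one distinctive one.
respondents = [
    {"age": 34, "sex": "F", "occupation": "teacher"},
    {"age": 34, "sex": "F", "occupation": "teacher"},
    {"age": 34, "sex": "F", "occupation": "teacher"},
    {"age": 71, "sex": "M", "occupation": "judge"},
]

flagged = risky_records(respondents, ["age", "sex", "occupation"], k=3)
print(flagged)  # only the 71-year-old judge is flagged
```

A flagged record is not necessarily identifiable, but it marks where a combination of variables might single someone out and where restriction or recoding may be worth considering.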
  • 23. Identifiers • Direct – Names – Addresses, including ZIP codes and other postal codes – Telephone and fax numbers, including area codes – Social security numbers – Other linkable numbers such as driver’s license numbers, certification numbers, medical device numbers, etc. • Indirect – Detailed geographic information – Organizations belonged to, offices or posts held by respondent – Educational institutions attended, year of graduation – Detailed occupational titles – Place where respondent was born or grew up – Exact dates of significant life events (birth, death, marriage, etc) – Detailed income Direct identifiers must be removed from data before sharing! Data with indirect identifiers may need to be restricted.
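
As a rough illustration of the rule on slide 23, the sketch below drops direct identifiers from a record and reports which indirect identifiers remain so they can be reviewed. It is a hedged Python example: the column names, the two sets, and both function names are hypothetical, and a real study would build its own lists from its codebook.

```python
# Hypothetical column names; a real study would list its own.
DIRECT_IDENTIFIERS = {"name", "address", "zip_code", "phone", "ssn", "drivers_license"}
INDIRECT_IDENTIFIERS = {"birth_date", "occupation_title", "county", "income"}


def strip_direct_identifiers(record):
    """Return a copy of the record without any direct-identifier fields."""
    return {field: value for field, value in record.items()
            if field not in DIRECT_IDENTIFIERS}


def indirect_identifiers_present(record):
    """List the indirect-identifier fields still present, for manual review."""
    return sorted(field for field in record if field in INDIRECT_IDENTIFIERS)


raw = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "birth_date": "1980-02-29",
    "income": 51250,
    "q1_satisfaction": 4,
}

public = strip_direct_identifiers(raw)
print(public)                                # name and ssn are gone
print(indirect_identifiers_present(public))  # ['birth_date', 'income']
```

The second function only reports; deciding whether the remaining fields call for restricted-use access is a judgment the depositor and archive make together.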
  • 24. What sensitive topics could cause harm? • Psychological well being or mental health • Sexual attitudes, preferences, or practices • Use of alcohol or drugs • Illegal behavior • Behavior that puts the respondent at risk of criminal or civil liability • Behavior damaging to an individual’s financial standing, employability, or reputation • Medical information that could lead to discrimination, stigmatization, etc.
  • 25. Let’s make a deposit with openICPSR www.openicpsr.org
  • 26. Objective III: Protection of confidentiality when sharing data
  • 27. Common Objection/Misperception: “My data are too sensitive to share. . .” • Many restricted data files can be processed for public release • ICPSR has been sharing restricted-use data for over a decade via three methods: – Secure Download – Virtual Data Enclave – Physical Enclave • ICPSR stores & shares over 6,400 restricted-use datasets associated with over 2,000 ‘active’ restricted-use data contracts
  • 28. Virtual Data Enclave • Virtual machine launched from your desktop machine • Functions just like your local machine, but with restrictions on what can enter and exit the environment • Full suite of statistical packages and other software
  • 29. Virtual Data Enclave (VDE) – user experience [screenshot; labels: VDE, My laptop’s task bar]
  • 31. Examples of Disclosure Risk Concerns • Tabular output including demographics or unique characteristics/activities, resulting in small cell sizes • Geographic information (zip codes, cities/towns, counties) • Geocodes, GIS data, maps • Longitudinal data – data from research subjects at multiple time points • Verbatim text – interviews, video transcriptions, short answer responses • Photos, videos, audio recordings
  • 32. Common Rules for Tables Each cell size is sufficiently large to prevent identifiability • Establish minimum threshold (often 3, 5, or 10), dependent on type and context of the data • Rules for cells, rows, and columns: – All cases in any row or column should not be in a single cell – Cell percentage should not correspond to a cell size less than a threshold number – A cell should not be a high percent of all cases included in a row or column (more than 60%) • Combinations of tables should also meet guidelines
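
The table rules on slide 32 can be expressed as a small check. This is a sketch under stated assumptions rather than an official disclosure-review tool: the function name `check_table` and the message constants are invented here, the threshold of 5 and the 60% limit are taken from the ranges mentioned on the slide, and empty cells are skipped, which is a simplification.

```python
BELOW = "cell below minimum size"
SOLE = "all cases of a row or column fall in one cell"
SHARE = "cell exceeds share limit of a row or column"


def check_table(table, threshold=5, max_share_pct=60):
    """Return (row, column, problem) tuples for cells that break the rules."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    problems = []
    for i in range(n_rows):
        for j in range(n_cols):
            cell = table[i][j]
            if cell == 0:
                continue  # assumption: empty cells are not reviewed here
            if cell < threshold:
                problems.append((i, j, BELOW))
            if cell == row_totals[i] or cell == col_totals[j]:
                problems.append((i, j, SOLE))
            elif (cell * 100 > max_share_pct * row_totals[i]
                  or cell * 100 > max_share_pct * col_totals[j]):
                problems.append((i, j, SHARE))
    return problems


# Hypothetical 3x3 crosstab of counts.
crosstab = [
    [30, 4, 6],
    [9, 12, 10],
    [8, 11, 13],
]

for row, col, problem in check_table(crosstab):
    print(f"row {row}, column {col}: {problem}")
```

In this toy crosstab the first cell holds 30 of the 40 cases in its row (75%), and the neighbouring cell has only 4 cases, so both are reported. The slide's last rule, that combinations of tables should also meet the guidelines, is not covered by a single-table check like this one.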
  • 33. Disclosure Risk Protections (DRP) • Release data from only a sample of the population • Remove/mask obvious identifiers and high-risk variables • Limit geographic detail • Limit the number and detailed breakdown of categories • Top or bottom coding • Recode into intervals or round values • Addition of noise • Swap records • Blank and impute • Aggregate and replace with mean value (aka blurring)
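
Three of the protections listed on slide 33 (top coding, bottom coding, and recoding into intervals) are simple enough to sketch in a few lines of Python. The function names, the cap of 85, and the 10-year interval width are assumptions chosen for illustration; appropriate cut-points depend on the data and are not specified in the slides.

```python
def top_code(value, cap):
    """Replace any value above the cap with the cap itself."""
    return min(value, cap)


def bottom_code(value, floor):
    """Replace any value below the floor with the floor itself."""
    return max(value, floor)


def to_interval(value, width):
    """Recode a non-negative integer into a fixed-width interval label."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


ages = [23, 37, 41, 58, 93]  # hypothetical respondent ages

print([top_code(a, 85) for a in ages])     # [23, 37, 41, 58, 85]
print([to_interval(a, 10) for a in ages])  # ['20-29', '30-39', '40-49', '50-59', '90-99']
```

Keeping a script like this alongside the deposit also serves the documentation practice of retaining the syntax used to modify the data.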
  • 36. Documenting Disclosure Mitigation • Maintain and deposit the syntax used to modify the data • Include either: – a narrative description of changes in the study-level documentation, or – a series of item-specific narratives to be included at the variable level (codebook/user guide)
  • 38. What is ICPSR? Then and Now • One of the world’s oldest and largest social science data archives, est. 1962 • Data distributed on punch cards, then reel-to-reel tape, now: – Data available on demand – Over 8,200 studies with over 68,700 data sets • Membership organization among 21 universities, now: – Currently over 760 members world-wide – Federal funding of public collections
  • 39. The Concept of “Data Curation” • Curation, from the Latin "to care," is the process used to add value to data, maximize access, and ensure long-term preservation • Data curation is akin to work performed by an art or museum curator. – Data are organized, described, cleaned, enhanced, and preserved for public use, much like the work done on paintings or rare books to make the works accessible to the public now and in the future • Curation provides meaningful and enduring access to data • Data curation is the foundation for effective, long-term data sharing
  • 40. Assessing the data in the collection • Searching for and downloading data • Codebooks • Full descriptives • Variable search and compare • Simple crosstabs and frequencies • Online analysis functionality
  • 41. Study Search Behaviors In practice, we encounter three typical search behaviors from our users: • A user has a research question in mind. • A user is looking for a dataset that contains specific variables. • A user is looking for a specific dataset and has the study title or investigator name.
  • 42. Natural Language Searching • Does juvenile drug use lead to delinquency? • juvenile “drug use” delinquency • “juvenile drug use”
  • 44. Variable Search Tips, 1 • Enter words or strings that are likely to appear in a variable name, label, question, and value labels: • Presidential election will return variables dealing with all presidential elections • Presidential election Obama will return only variables dealing with the 2008 and 2012 presidential elections
  • 45. Variable Search Tips, 2 • Use quotes to search for specific phrases: • "life satisfaction", "minority rights", "community programs" • The minus sign may be used to remove certain types of results: • "Presidential debate" -"Bill Clinton" will eliminate the debates from 1992 and 1996 in which Clinton participated
  • 46. Variable Search Tips, 3 • A Boolean "and" is implied in the search. • The search stems words automatically, so there is no need to type an asterisk. It is also case-insensitive. • A fielded search is also available.
  • 47. Searching for a Specific Dataset
  • 48. The Bibliography of Data-related Literature • It’s really a searchable database containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR • Can generate study bibliographies associating each study with the literature about it • Included in the integrated search on the ICPSR Web site
  • 51. Objective V: Online data exploration tools
  • 69. Assistance is available • Our user support staff are available from 8:00 a.m. to 5:00 p.m., ET, Monday to Friday – Phone: 734-647-2200 – Email: netmail@icpsr.umich.edu
  • 70. References • Office of Science and Technology Policy memorandum (2013). Retrieved 27/05/2015 from https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf • National Science Foundation, Social, Behavioral and Economic Sciences (SBE). (2010). Data Management Plan Policy. Retrieved 27/05/2015 from http://www.nsf.gov/sbe/SBE_DataMgmtPlanPolicy.pdf • National Science Foundation, Social, Behavioral and Economic Sciences (SBE). (2008). Data Archiving Policy. Retrieved 27/05/2015 from http://www.nsf.gov/sbe/ses/common/archive.jsp • National Institutes of Health. (2003). NIH Data Sharing Policy and Implementation Guidance (includes data sharing plan requirements). Retrieved 27/05/2015 from grants2.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#finfre
  • 71. References (cont.) • National Institute of Justice. (2014). About the Data Resources Program. Retrieved 27/05/2015 from http://www.nij.gov/funding/data-resources-program/Pages/about.aspx • National Institute of Justice. (2012). Data Archiving Plans for NIJ Funding Applicants. Retrieved 27/05/2015 from http://www.nij.gov/funding/data-resources-program/applying/data-archiving-strategies.htm • Pienta, Amy M., George Alter, and Jared Lyle. (2010). The enduring value of social science research: The use and reuse of primary research data. Presented at the BRICK, DIME, STRIKE Workshop, The Organisation, Economics, and Policy of Scientific Research, Turin, Italy, April 23‐24, 2010 (http://hdl.handle.net/2027.42/78307)
  • 72. References (cont.) • Confidentiality and Data Access Committee (CDAC), Federal Committee on Statistical Methodology. Statistical Policy Working Paper 22 (Second version, 2005), Report on Statistical Disclosure Limitation Methodology, http://www.fcsm.gov/working-papers/spwp22.html • Other Requirements Relating to Uses and Disclosures of Protected Health Information, 45 CFR 164.514 (2002), http://edocket.access.gpo.gov/cfr_2002/octqtr/pdf/45cfr164.514.pdf • U.S. Department of Health and Human Services, National Institutes of Health. Research Repositories, Databases, and the HIPAA Privacy Rule (posted January 12, 2004; revised July 2, 2004), http://privacyruleandresearch.nih.gov/research_repositories.asp • Sweeney, L. (2000). “Simple Demographics Often Identify People Uniquely.” Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh. http://dataprivacylab.org/projects/identifiability/index.html
  • 73. Resources • Link to Guidelines for Effective Data Management Plans (PDF), http://www.icpsr.umich.edu/files/datamanagement/DataManagementPlans-All.pdf • Link to Guidelines and Resources for OSTP Data Access Plans (video), https://www.youtube.com/watch?v=sWnMFEKmfnE • Link to ICPSR Deposit Data (web page), http://www.icpsr.umich.edu/icpsrweb/deposit/index.jsp • Link to openICPSR (website and videos), https://www.openicpsr.org/ • Link to Disclosure Risk Training: For Public Use or Not For Public Use (video), https://www.youtube.com/watch?v=9vdWseLay9g&feature=youtu.be&list=PLqC9lrhW1VvaKgzk-S87WwrlSMHliHQo6

Editor's Notes

  1. Abstract: Federal data sharing requirements increase public access to federally-funded scientific data. For researchers, data sharing is a key resource in translating research into knowledge, policies, and practices. This workshop will assist participants in facilitating data sharing in the cycle of science that starts with deposited data, which, through additional use, leads to the sharing of knowledge that inspires new data collection. The workshop will cover several deposit options (to fully-curated archives and the public access archive, openICPSR), differences between sharing public-use and restricted-use data, and benefits to depositors through the ICPSR Website. A hands-on demonstration of making a deposit is planned. Finding data for the unique needs of a research project can be challenging, particularly in a world that values both the liberal use and protection of research data. The workshop will describe and demonstrate the array of discovery and exploration tools that leverage ICPSR’s vast data catalog, metadata, and online analysis options; discuss the discovery and use of, and publishing from, restricted-use data; and include group discussion of disclosure issues and hands-on time with ICPSR data tools. Participants will become more familiar with: federal data sharing requirements; options for sharing data; data discovery tools; and protection of confidentiality when sharing data. Presenters: Johanna Bleckman and Kaye Marz. Tuesday, June 2, 2015, 1:30-3:30 pm.
  2. Our learning objectives today are to help you become more familiar with federal (U.S.) data sharing requirements and other good reasons to share data, and with options for sharing data through ICPSR, including a hands-on walk-through of making a deposit. Johanna will cover the protection of confidentiality when sharing data and give specific examples of assessing and modifying the data. She will show you ICPSR’s data discovery tools and also demonstrate ICPSR’s online data exploration tools, which you can try on your own laptop if you have one with you.
  3. So let’s proceed on to our first objective.
  4. First, we’ll explain why individuals should share their research data. Number 1 is because the government says you have to if the research was federally funded. Data sharing makes effective use of the agency’s resources by avoiding unnecessary duplication of data collection, and it conserves research funds to support more investigators. In the U.S., our federal data sharing requirements include the following. In February 2013, the Executive Office of the President’s Office of Science and Technology Policy published a memo (entitled “Increasing Access to the Results of Federally Funded Scientific Research”) which directs funding agencies with an annual R&D budget over $100 million to develop a public access plan for disseminating the results of their research. (ICPSR DM&C site) The National Science Foundation and the National Institutes of Health, both major U.S. research funding agencies, issued data sharing requirements to their grantees. A couple of years later, NSF issued a requirement that data management plans (DMPs) be included in proposals submitted to NSF starting in January 2011; NIH’s data sharing plan was part of its initial requirement. DMPs provide a structure for grantees to plan for data sharing at the end of the award period. The National Institute of Justice in the U.S. Department of Justice established a Data Resources Program to share its grantee data as early as the late 1970s. More information about DMPs is available on the ICPSR YouTube channel.
  5. A second reason to share data is the benefits to the social science community. The National Institutes of Health (NIH) data sharing policy (2003) identifies the following benefits from sharing data: it reinforces open scientific inquiry and encourages diversity of analysis and opinion, both important to the scientific enterprise; promotes new research and the testing of new or alternative hypotheses and methods of analysis; supports studies on data collection methods and measurement; facilitates the education of new researchers; enables the exploration of topics not envisioned by the initial investigators; and permits the creation of new datasets when data from multiple sources are combined. Such data integration and harmonization, or other “data mash-ups,” are of high interest to funding agencies and researchers alike.
  6. And the benefit isn’t only to the greater good; individual researchers benefit as well. Data in the public domain generate new research which cites the original research. Data are preserved and can be obtained by the original researchers if their copies are lost or destroyed. Archiving data helps researchers meet federal agency requirements. And the information on the research community’s interest in a dataset, which ICPSR provides online as usage statistics, can assist researchers with the success of future grant funding proposals.
  7. Dr. Pienta and her colleagues at ICPSR conducted a study in 2008 and received the following responses about where researchers’ data are located. The answers are the data counterpart to the lost library book: lost to technical malfunctions or limitations, purged, or destroyed. More than 25% of the data were reported lost. Less than 15% reported archiving their data.
  8. Well, those researchers missed out on a real opportunity, because sharing data leads to more publications! The same study showed that the median number of publications was higher for data that were archived or shared, whether the publications were primary publications by the original research team, by secondary analysts, or by their students.
  9. So, here we have a graphic of the cycle of science, which this workshop seeks to fuel. In the upper left, new data are collected. These data are then deposited in an archive or repository, where they are discovered and accessed by other researchers, who analyze the data and publish their results. This secondary research uncovers new research ideas and/or creates new datasets from the data mash-ups mentioned earlier, which then leads to new research and more data being collected.
  10. So where do data librarians and archivists have a role in this cycle? Helping with data management plans and data archiving.
  11. So where do data librarians and archivists have a role in this cycle? Helping with data management plans and data archiving. Helping users find data in the repository.
  12. So where do data librarians and archivists have a role in this cycle? Helping with data management plans and data archiving. Helping users find data in the repository. Helping users find analyzed results in publications based on the data, which leads to more research and more data. So data librarians and archivists are crucial actors in this cycle of science.
  13. And now we will talk about options for sharing data.
  14. openICPSR is ICPSR’s public-access data collection. openICPSR assists researchers in meeting requirements for public access to federally funded research data. It ensures that data depositors fulfill public-access requirements of grant and contract RFPs. It is available to any researchers (from member and non-member institutions) who wish to make their data available to the public via a sustainable archive.
  15. Calendar Year 2013: Over 185,200 searches for data conducted on ICPSR’s data search tools
  16. ICPSR is an international consortium of member institutions. It uses a fee-for-access model: it is sustained by membership fees that are pooled to allow deposits to be made at no charge and members to access the data at no charge. Nonmembers of ICPSR can obtain member data for an access fee. Members and non-members may pay a fee to access restricted-use data in the VDE. These pooled funds also cover full data curation services, professional processing, value-added documentation, and data offered in the popular statistical programs, with many datasets also available for online analysis. openICPSR is a public access research data-sharing service that uses a fee-for-deposit model. Currently the fee of $600/project is charged to non-members and waived for researchers at ICPSR member institutions. The deposit fee enables the public to access research data without charge, or in the case of restricted-use data, for a nominal charge. It is important to note that fees for openICPSR are intended to be written into grant and contract proposals. Fees should be funded by the project, not outright by an institution or members of ICPSR. Most data in openICPSR are “as-is,” as deposited and described by the depositor, unless the depositor chose the Professional Curation option, which has a fee to cover use of ICPSR’s curation and/or disclosure review services.
  17. Why is ICPSR venturing into this space? Because we were hearing from agencies that ICPSR might not be ‘open enough’ for some of our research scientists.
  18. Now we’re going to wade through a bit of legalese. But just for a couple of slides. Both the ICPSR and openICPSR Deposit Agreements have the same terms shown here. ICPSR requests the right to transform or enhance the collection even with openICPSR, since ICPSR may elect to fully curate a very popular openICPSR collection for the membership.
  19. But the openICPSR deposit agreement terms have 4 more clauses.
  20. And just launched earlier this year: openICPSR is available as a branded and hosted site for institutions and journals. The organization pays an annual fee. Rates are graduated based upon the projected number of new deposits per year and a maximum total GB of storage. Pricing plans are available on the openICPSR website.
  21. Here are screenshots of openICPSR branded sites. On the left are site front pages with the organization’s logo and colors. The URL is also customized to be unique to that site. In the back on the right side, you can see the Browse Deposit list page with the site’s own colors. In the front, you can see the web traffic report for the other site.
  22. But before sharing data we need to consider the important concepts of re-identification and harm to research subjects. Should the data be restricted? Re-identification: Can individuals be identified from information in this material? If the data were made public, could someone use a combination of variables (e.g. age, sex, race, occupation, geography) to find individuals in a publicly available database? Harm: Does this material include sensitive information? Would the release of individually identifiable information create a risk of harm (e.g. psychological distress, social embarrassment, financial loss) greater than the risks that people experience in everyday life?
  23. Here is a brief list of the most obvious direct and indirect identifiers. Direct identifiers must be removed from the data before sharing with ICPSR. Data that retain indirect identifiers may need to be made available as a restricted-use collection. Note that ICPSR will remove a collection from openICPSR if it determines that the files pose risk of disclosure or harm to the individuals represented in the data. ICPSR will notify the investigator, the investigator’s institution, and funding agency that ICPSR has done so.
  24. The other concern is potential harm. The items listed here are a concern whenever there is any risk of re-identifying an individual in the data.
  25. Let’s make a deposit with openICPSR. We do not want to create a test deposit in the production version of openICPSR, so only for today, we’ll use the openICPSR sandbox to make a deposit using the scenario in your workshop folder.
  26. A common misperception that we encounter is the belief that an investigator’s primary data are too disclosive to share. On the contrary, many restricted data files can be processed for public release, either in downloadable form or via a Restricted Use Agreement, with the right kind and the right level of risk analysis and remediation. In fact, many of our studies start out too disclosive for public access, but can be modified such that they are still analytically valuable while the disclosure risk is low enough that the files can be downloaded under simple Terms of Use. ICPSR obviously shares an abundance of downloadable data, but we have also been sharing restricted-use data for over a decade. Restricted-use data applicants submit to us a Data Use Agreement signed by their institution, a data security plan (meaning how they will store the data securely), and documentation from their institution’s IRB. We disseminate the data in one of three ways. 1. Secure Download: most of the restricted data files curated and disseminated via ICPSR are distributed to approved users via a secure download link. 2. Virtual Data Enclave (I’ll talk more about this in a moment). 3. Physical Enclave: a room in the basement of ICPSR that allows access to the data on a stand-alone computer without access to the internet or ICPSR networks. We call this “monitored processing”: the analyst is accompanied by ICPSR staff, and all of their output is hand-reviewed before s/he can leave the building. ICPSR stores and shares over 6,400 restricted-use datasets associated with over 2,000 currently ‘active’ restricted-use data contracts. We’re currently adding about 50 new contracts each month.
  27. So what is it? A virtual machine launched from your desktop machine. It functions just like your local machine, but with restrictions on what can enter and exit the environment. It offers a full suite of statistical packages and other software.
  28. This is a screenshot of my machine’s screen when the VDE is active. You can see that the majority of the real estate here is the Virtual Environment, with my start menu, Windows Explorer accessing my files on the ICPSR servers… but then you can see that I have TWO task bars, one that is inside and relevant to the Virtual Environment, and one that is outside and relevant to my physical machine. I can toggle easily between the VDE and my email, the VDE and files on my local machine or on a server that I am connected to outside of the VDE
  29. This is the logic of the VDE, with me as ICPSR staff and you as an approved researcher, working on our local machines, logging in through this University of Michigan virtual space, and accessing restricted data on ICPSR servers.
  30. When we’re looking at data through the lens of disclosure, there are several things we target as likely sources of disclosure risk. Tabular output: Tables crossing variables, particularly demographic variables, are a typical source of disclosure risk. Geographic information: Geographic information is always a concern, and can be an issue even for large geographic regions like states when combined with other information. As geographic information becomes more precise and available, the risk of disclosure increases. Dr. L. Sweeney of Carnegie Mellon University, in her paper “Simple Demographics Often Identify People Uniquely,” reported on the percentage of the population in the United States in the 1990 census that had characteristics that likely made them unique based only on gender and date of birth, plus location information [discuss table]. Geocodes, GIS data, maps, etc., raise the same concern. Longitudinal or panel data: Studies that contain information across time for the same individuals of course increase the risk of reidentification, given that a history over time is much more likely to form a unique pattern of values than a single case from a one-time study. Verbatim responses or recordings: Cases that include the person’s personal history in his/her own words greatly increase the risk of identification or reidentification. Names and locations might be mentioned by the respondent, and events may be unique to the individual. The style of speaking, regionalisms, and accents (if it’s a recording) all provide more identifiable information than a multiple-choice survey does. Any released information must include only content that has been selected and/or anonymized sufficiently to prevent identification. One of the projects in my archive is a study of teaching effectiveness, and it includes thousands of hours of videos of classroom sessions. The videos show sports teams, school names or mascots, etc., in addition to faces and voices. Photographs or videos: A person may be recognized by an acquaintance, but reidentification could also occur through use of facial recognition software and other photos tagged with the person’s identity that are publicly available or available through social media.
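To make the point about demographic uniqueness concrete, here is a minimal Python sketch, not a reproduction of Sweeney's analysis. The five records are invented for illustration, and the helper names `share_unique` and `coarsen` are hypothetical. It counts how many records are the only ones with their combination of gender, date of birth, and ZIP code, first at full detail and then after reducing date of birth to year and ZIP to three digits.

```python
from collections import Counter

# Hypothetical records: (gender, date of birth, 5-digit ZIP). Not real data.
records = [
    ("F", "1961-03-14", "48104"),
    ("F", "1961-03-14", "48104"),
    ("M", "1975-11-02", "48104"),
    ("F", "1988-07-30", "48109"),
    ("M", "1975-11-02", "48109"),
]

def share_unique(rows):
    """Return the fraction of rows whose combination of values occurs exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)

def coarsen(row):
    """Keep gender, reduce date of birth to year, and ZIP to its first 3 digits."""
    gender, dob, zip_code = row
    return (gender, dob[:4], zip_code[:3])

full_detail = share_unique(records)
coarsened = share_unique([coarsen(r) for r in records])
print(f"unique at full detail: {full_detail:.0%}")
print(f"unique after coarsening: {coarsened:.0%}")
```

In this toy example, three of the five records are unique at full detail and only one remains unique after coarsening, which is the same logic behind limiting geographic and date detail.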
  31. Each cell size should be sufficiently large to prevent identifiability: an unweighted non-zero cell size should not be less than a small threshold number. Commonly used values are 3, 5, and 10, depending on the sensitivity of the data. There are also percentage- and count-based rules for cells, rows, and columns. All cases in any row or column should not be in a single cell. If percentages are displayed, no cell should be 100% for a row or a column. If a cell entry is a percentage, that percentage should not correspond to a cell size less than a threshold number. A table cell should not ‘dominate’: a cell should not be a high percent of all cases included in a row or column, e.g. more than 60%. Combinations of tables also need to meet the guidelines, meaning any table that could be constructed by combining a set of tables (e.g., by addition and subtraction of rows or columns). The values for ‘threshold’ numbers for cell size or value depend on the type and context of the data.
  32. So what do we do about it? Quite often Disclosure Risk Protections (or DRP) need to be incorporated to create uncertainty in the data in order to protect subjects. So the goal is “plausible deniability,” meaning one cannot say with certainty that this information pertains to a particular individual or organization. Listed on the screen right now are the DRP options. Release data from only a sample of the population. Remove or mask obvious identifiers and high-risk variables, i.e., delete them from the data or mask the original values with a generic code. Limit geographic detail, so you might recode up to a higher level of geography (such as county to state), or drop or further restrict access to geography variables. Limit the number and detailed breakdown of categories within variables, so in the criminal justice world, specific offense codes can be combined into property, violent, drug, and other. Top- or bottom-code to combine extreme values in a variable, e.g., particularly young or old ages, as appropriate for the data. Recode into intervals or round values, e.g., collapse individual age values into 5-year increments. Add noise by multiplying the original value by a random number or adding a random number to it. Swap or rank-swap records (aka switching), e.g., matching records on key variables (demographics and study-specific variables of interest) and swapping the records across states. Blank and impute: select records at random or purposefully, blank out selected variables, and replace the original values with imputed values. Aggregate across small groups of respondents and replace one individual’s reported value with the average (aka blurring).
  33. In a study of teaching effectiveness, 1,930 teachers were observed and videotaped during classes. The observations were coded, and several teaching practice and effectiveness measures were created. A secondary analyst plans to study teacher effectiveness measures for eighth grade English. She is interested in the effect of years of teaching experience and student-and-teacher interactions on scores related to communicating with students. She wants to determine if there are significant differences by ethnicity and gender. This is a table created to check unweighted sample sizes by years of teaching experience for the primary demographic categories she is interested in. In the top table here, sample sizes are listed by year for the first 5 years of experience, and then by 5-year groups from 10 to 40 years. As you can see, the shaded cells for ethnicity and gender don’t meet the minimum cell size requirement. In the recoded version below, you can see that I have collapsed Ethnicity into White, other, and Black or Hispanic. I also collapsed the 35 and 40 year values into a single 35+ value. Those actions together ameliorated this specific disclosure risk. We are always careful to balance the research utility against disclosure protections, and it may be the case that this is a reasonable analytic sacrifice within the context of this study. If not, we would pursue one or more of the other tactics described on the previous slide.
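The cell-size and dominance rules above can be sketched as a simple check on a table of unweighted counts. This is an illustrative Python sketch, not ICPSR's review procedure: the function name `check_table`, the default threshold of 5, and the example counts are all assumptions. The 60% share comes from the example in the note, and a cell holding 100% of a row or column is caught by the same dominance test. The rule about combinations of tables is not covered here.

```python
def check_table(table, min_cell=5, max_share=0.60):
    """Return a list of (row, col, reason) flags for an unweighted count table.

    table is a list of rows, each a list of non-negative integer counts.
    """
    flags = []
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n == 0:
                continue  # the size rule applies to non-zero cells only
            if n < min_cell:
                flags.append((i, j, "small cell"))
            # A cell dominates if it holds too large a share of its row or column.
            if n / row_totals[i] > max_share or n / col_totals[j] > max_share:
                flags.append((i, j, "dominant cell"))
    return flags

# Hypothetical 3x3 table of unweighted counts.
counts = [
    [12, 3, 20],
    [15, 9, 18],
    [40, 6, 2],
]
flags = check_table(counts)
for flag in flags:
    print(flag)
```

Any flagged cell is a candidate for the remediation options discussed next, such as collapsing categories.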
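Three of the recoding options above (top- and bottom-coding, recoding into intervals, and collapsing detailed categories) are simple enough to sketch in a few lines of Python. Everything here is illustrative: the function names, the cut-offs of 18 and 85, and the offense mapping are assumptions, and the right values depend on the data.

```python
def top_bottom_code(value, low, high):
    """Clamp value into [low, high] so extreme values share a single code."""
    return max(low, min(high, value))

def to_interval(value, width=5):
    """Recode a non-negative integer into a 'start-end' band of the given width."""
    start = (value // width) * width
    return f"{start}-{start + width - 1}"

# Hypothetical mapping from detailed offense codes to broad groups.
OFFENSE_GROUPS = {
    "burglary": "property",
    "larceny": "property",
    "robbery": "violent",
    "assault": "violent",
    "possession": "drug",
}

def collapse_offense(code):
    """Map a detailed offense code to its broad group, defaulting to 'other'."""
    return OFFENSE_GROUPS.get(code, "other")

ages = [16, 23, 47, 68, 91]
coded = [top_bottom_code(a, 18, 85) for a in ages]
bands = [to_interval(a) for a in ages]
groups = [collapse_offense(c) for c in ["larceny", "assault", "forgery"]]
print(coded)
print(bands)
print(groups)
```

Keeping recodes like these in a script also gives you the syntax record that documents what was changed.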
  34. As we were processing these data it became apparent to us that individual students were identifiable via a combination of variables, but age in months was the biggest issue. We wanted to preserve as much of that specificity as possible given what we know about the utility of these data (meaning that we did not want to coarsen the age variable – age in months matters for school-age children). What we chose instead was to add “noise” to the age variable – adding or subtracting up to two months from each child’s age. This table here shows the means and standard deviations pre-noise, so to speak, and post-noise, demonstrating that while we have changed the data, the effects are very small. The correlations are all well above .98, which was our minimum threshold.
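A noise step like this one can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure actually used on the study: the ages are simulated, the function names are hypothetical, the shift is a whole number of months drawn uniformly from -2 to 2, and the check shown is the correlation between the original and perturbed ages, compared against the .98 threshold mentioned above.

```python
import random
import statistics

def add_noise(ages_in_months, max_shift=2, seed=0):
    """Shift each age by a random whole number of months in [-max_shift, max_shift]."""
    rng = random.Random(seed)
    return [age + rng.randint(-max_shift, max_shift) for age in ages_in_months]

def pearson(x, y):
    """Pearson correlation computed from its definition."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Simulated ages in months for school-age children (roughly 5 to 14 years).
rng = random.Random(42)
original = [rng.randint(60, 168) for _ in range(500)]
noisy = add_noise(original)

print("mean:", statistics.fmean(original), statistics.fmean(noisy))
print("sd:  ", statistics.stdev(original), statistics.stdev(noisy))
print("r:   ", pearson(original, noisy))
```

Comparing the means, standard deviations, and correlation before and after is the same kind of check the table on this slide reports.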
  35. I encourage you to document any disclosure mitigation that you carry out on your files before you share them. Best practice is to maintain the syntax used to modify the data and to include either a narrative description of changes in the study-level documentation or, even better, a series of item-specific narratives at the variable level (codebook/user guide).
  36. First, a little bit of history so that you can get a sense of how far we’ve come. ICPSR is one of the world’s oldest and largest social science data archives, established in 1962. We first distributed data on, of course, punch cards, then on reel-to-reel tape, and now, thankfully, the data are available on demand; most datasets can be downloaded directly from our websites. We are a membership organization, meaning that we are sustained by membership dues paid by institutions who in return have free access to all of our data and other services. We began in 1962 with 22 member universities, and now we have over 750 member institutions from across the globe. As of January of this year, over 63,350 datasets (over 181,716 files) are available for download, associated with over 8,000 studies. In addition, just over 1,200 restricted-use studies (which include over 6,000 datasets) are available for analysis. The ICPSR catalog indexes over 9,000 catalog entries/studies (as of 1/2015). To give you a sense of the volume of downloads, total downloads for FY 2014 were over 683,200 datasets (428,900 studies) downloaded/accessed.
  37. What is Data Curation? In general, curation means to undergo a process to add value, maximize access, and ensure long-term preservation. As you would imagine, data curation is similar to the work done by museum or art gallery curators – so we organize, describe, clean, enhance, and preserve social science data so that the public, and more specifically the research community, can more easily and broadly access and make use of the data now and into the future. Similar to unique works of art, keeping research data only in our individual offices, or on our office computers, is a risky proposition if our goal is long-term preservation. It makes a lot more sense to turn it over to a professional curator to take the steps necessary to protect its integrity and make it findable and usable in perpetuity. So – I included this slide mostly because one of the folks who responded to our survey was hoping to leave this session with more information about how ICPSR curates data. The technical aspects of that are a bit out of scope for this workshop, but whoever you are – or anyone who is interested! – please do find me afterward if you have specific questions!
  38. We offer a powerful search engine that supports multiple effective strategies for discovering data. We create and support full documentation, including codebooks, quite often instruments, and other rich metadata. We have a variable search and compare tool that allows you to search our entire, enormous collection of variable-level metadata to identify items of interest, whether it’s to identify data you want to analyze, to compare how concepts are approached across various surveys or perhaps across disciplines, or to begin to think about creating your own survey. Our analysis tools include a simple crosstab or frequency tool and the full online analysis program called SDA, out of UC Berkeley.
  39. Our study search indexes the metadata records that ICPSR staff create, as well as the full text of the documentation files, including all the variable markup and question and answer text. In the study search, the metadata record is heavily weighted, especially the study title and the subject terms. This is as opposed to the variable search, which I’ll talk more about in a moment. In practice, we encounter three typical search behaviors from our users that dictate which search options best meet the user’s needs. 1. A user has a specific research question in mind and wants to find data that directly address that issue. 2. The second common search behavior that we see is users searching for specific variables, which may or may not be conceptually related to one another. 3. The third search behavior, which I’ll demo, is when a user is looking for a specific dataset and has the study title or investigator name. That’s a pretty easy one.
  40. For some users, finding the right dataset is as simple as typing in the research question. Unfortunately, less capable search engines are so common that users are more likely to provide one or two query terms, on the assumption that more query terms would narrow search results too far. The challenge then for us is educating our users and making sure our metadata is sufficient to the task given the capabilities of the search. To provide an example, let’s do a natural language search on the ICPSR website using a simple research question, like “Does juvenile drug use lead to delinquency?” The results are pretty good, and tend to be better than concise keyword searches, such as “juvenile ‘drug use’ delinquency,” [demo] which results in studies that only grab two of the three factors or focus on prediction of delinquency instead of delinquency itself. If I searched for “juvenile drug use” as a phrase, I’d actually get only one result. That’s one of the big limitations of phrase searching.
  41. One great feature of our website and the search tool is the variable relevance sort. So if you do a search and then sort by variable relevance, we weight the variable metadata more heavily and then display matching variables on the search results page, along with instructions that the user should separate different concepts with commas. This makes it relatively easy for a user to find a dataset that has a specific combination of variables. The variable display also includes checkboxes beside each variable and we provide a “compare variable” utility that allows users to view selected variables side by side, which I will demo in a few minutes.
  42. If a user is searching for a particular investigator, the search is relatively easy, unless the person’s name is very common. (E.g., Smith.) Our site offers a “Browse by Author” page to make the task somewhat easier, and to allow users to see variations on the name if they run into trouble. Site visitors can use quotation marks for phrase searching, but this can be problematic if the user is mistaken about the word order or gets a word in the title wrong. For example, “study” versus “survey.” In general, searching by title is very effective and we don’t need to provide the user with additional guidance.
  43. ICPSR’s Bibliography of Data-related Literature is an ongoing project to grow a searchable database that contains over 65,000 citations of known published and unpublished works resulting from analyses of data held in the ICPSR archive. Our bibliography was developed with support from the National Science Foundation, and it represents over 50 years of scholarship in the quantitative social sciences, extending from the inception of ICPSR in 1962 to the present. In the collection are: resources using data in the ICPSR holdings as the primary data source; resources using ICPSR data in a comparison with the primary dataset investigated; and resources “about” an ICPSR dataset or study series.
  44. The Bibliography of course facilitates literature searches to help folks identify research that has already been done, replicate analyses, avoid unintentionally duplicating analyses, and identify cross-disciplinary implications and uses for the data. Some unexpected benefits of the Bibliography are also listed here. I think of this as meta work: studying how data resources are used, conducting citation analyses, understanding the data life cycle, and learning about current methodological issues that folks are grappling with.
  45. We have several different use cases for our citation search. - Users who are looking for facts/tables, rather than raw data - Users who want to see how a particular dataset has been utilized - Researchers and funding agencies who want to gauge the impact of a particular study by seeing how much it has been cited One caveat is that the citation search is a bit limited in that it only searches the citation, not the full text. Due to relatively recent court rulings on the Google copyright case that favor indexing copyrighted content and providing snippets to users on the search results page, we are investigating the possibility of expanding this service to include full text of the publication, which would of course strengthen the database search tool. One of the current strengths of this utility, however, is that we provide links to full text whenever possible. We retain DOIs when we can locate them, and we provide dynamic links to WorldCat and Google Scholar if we don’t know the specific location of the article/report, making it that much easier for users to get to the full text.
  46. References
  47. References (continued)
  48. References (continued)
  49. Resources