This presentation was provided by Libbie Stephenson, UCLA Social Science Data Archive, during a NISO Virtual Conference on the topic of data curation, held on Wednesday, August 31, 2016.
Stephenson - Data Curation for Quantitative Social Science Research
1. LIBBIE STEPHENSON, DATA ARCHIVIST (RETIRED)
UCLA SOCIAL SCIENCE DATA ARCHIVE
LIBBIE@G.UCLA.EDU
HTTPS://DATAVERSE.HARVARD.EDU/DATAVERSE/SSDA_UCLA
Data Curation for Quantitative Social Science Research: A Case Study
NISO Virtual Conference: Data Curation – Cultivating Past Research Data for Future Consumption
August 31, 2016
2. DISCLAIMER
I am retired from UCLA so my
comments reflect my own experience
and expertise. They do not necessarily
reflect the ideas, opinions or practices
of anyone at UCLA.
These materials are free for you to
use, but please cite accordingly.
NISO - AUGUST 31, 2016
3. OVERVIEW
About the Archive
About the data we manage
What we are trying to do
What we actually do
Some illustrations
4. ABOUT THE ARCHIVE
Operating since 1964, before email, PCs, the Internet, laptops, and smartphones; manages survey/quantitative data stored on media from punch cards to the cloud
Staff have library science degrees, statistical and technical expertise, and quantitative social science backgrounds
Serves all UCLA quantitative researchers: provides reference, cataloging/metadata, and long-term archiving; supports data rescue, management, and security
https://dataverse.harvard.edu/dataverse/ssda_ucla
5. SURVEY/QUANTITATIVE RESEARCH
Carried out in the U.S. since the 1940s (post-World War II)
1960s-70s: ICPSR and academic archives
1970s: growth of data-oriented professional associations (IASSIST, APDU, IFDO, CESSDA)
Focused on society and social norms
Predict outcomes; test assumptions; study change over time; run experiments
Note: in any discipline we also need to understand the workflow of the research and the way individuals approach their work.
6. CURATION GOALS
Researcher-driven philosophy of open access, data sharing, reuse
Collaborative, multi-unit or multi-institutional
Ensure data conservation and long-term usability, as well as discovery and access
Processes and workflows support disaster planning
Use of best and trusted digital repository policies, models, practices, and workflows
Reflect values of accountability and integrity
7. POLICIES SUPPORT PRACTICE
Foundational and essential to a strong data curation infrastructure
Encompass what is acquired/collected and curation levels and scope; ensure long-term usability; drive processes and workflows
Social Science Data Archive policy
TOOL: Policy-making for Research Data in Repositories, by Ann Green, Stuart Macdonald, and Robin Rice
8. OUR STEPS IN CURATION
Initial contact
Data Quality Review and Appraisal
Ingest
Verification
Metadata
Physical storage
Access
Preservation
9. INITIAL CONTACT
Data Curation Profile
Data Management Plan
Guide to Social Science Data Preparation and Archiving
10. APPRAISAL
Archival Collection Policy
Also depends on:
Resources to process
Long term resources
Fitness, usefulness
Data Deposit Form signatures and completeness; commitment to share data; privacy and confidentiality
11. DATA QUALITY REVIEW
Tools: statistical packages, emulator, Adobe Pro, Excel, Colectica, text editor
Verify deposit package and checksums; run frequencies; compare data to documentation
Check completeness of codebook: question text, sampling, weighting, recodes, methods
Disclosure analysis: check for personal identifiers and assess privacy/confidentiality of respondents
Convert documentation to PDF/A
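
As a rough illustration of the "check for personal identifiers" step, the sketch below scans text rows for a few patterns that look like direct identifiers. The patterns, the function name scan_for_identifiers, and the sample rows are all assumptions made for this example; an actual disclosure analysis is broader and also weighs indirect identifiers and respondent context.

```python
import re

# Hypothetical patterns for direct identifiers; a real disclosure review
# would also consider names, addresses, dates, and indirect identifiers.
PATTERNS = {
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scan_for_identifiers(lines):
    """Return (line_number, pattern_name, matched_text) for each hit."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((number, name, match.group()))
    return hits


# Made-up sample rows, not taken from any real deposit.
sample = [
    "1001,34,F,no comment",
    "1002,51,M,contact me at jane.doe@example.org",
    "1003,29,F,SSN 123-45-6789 on file",
]
hits = scan_for_identifiers(sample)
for hit in hits:
    print(hit)
```

A scan like this only flags candidates for a person to review; it does not decide whether a file is safe to release.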
13. CODEBOOK DOCUMENTS THE COLUMNS
5002 01 01 302000 001 101 10004B121068965
Each item is called a variable. We refer to the numeric content of each item as a value.
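
To make the idea concrete, here is a minimal sketch of how a codebook's column locations turn a raw record into named variables. The record is the one shown on the slide, but the variable names and column positions in LAYOUT are hypothetical, since the actual codebook is not reproduced here.

```python
# Record copied from the slide; the column layout below is hypothetical.
RECORD = "5002 01 01 302000 001 101 10004B121068965"

# (variable name, first column, last column), 1-based and inclusive,
# the way a codebook usually lists column locations.
LAYOUT = [
    ("CASEID", 1, 4),
    ("CARD", 6, 7),
    ("V3", 9, 10),
    ("V4", 12, 17),
]


def parse_record(record, layout):
    """Slice one fixed-width record into a {variable: value} dict."""
    return {name: record[first - 1:last] for name, first, last in layout}


values = parse_record(RECORD, LAYOUT)
print(values)
```

Without the codebook, the record is just a string of digits; the layout is what gives each column range a name and a meaning.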
14. COMPARE FREQS TO CODEBOOK
[Figure callouts: VARIABLE, VALUES, VALUE LABELS]
15. RUN MARGINALS/FREQUENCIES
Sex of Respondent
Value | Frequency | Percent | Valid Percent | Cumulative Percent
MALE | 856 | 45.1 | 45.1 | 45.1
FEMALE | 1041 | 54.9 | 54.9 | 100.0
Total | 1897 | 100.0 | 100.0 |

What is your race - ethnicity
Value | Frequency | Percent | Valid Percent | Cumulative Percent
White | 618 | 32.6 | 32.6 | 32.6
Hispanic | 475 | 25.0 | 25.0 | 57.6
Black | 474 | 25.0 | 25.0 | 82.6
Asian or Pacific Islander | 282 | 14.9 | 14.9 | 97.5
Native American or Alaskan native | 17 | 0.9 | 0.9 | 98.4
Identifies more than one of the above groups | 20 | 1.1 | 1.1 | 99.4
DON'T KNOW | 2 | 0.1 | 0.1 | 99.5
REFUSED | 9 | 0.5 | 0.5 | 100.0
Total | 1897 | 100.0 | 100.0 |
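
A minimal sketch of running frequencies and comparing them to the codebook follows. It rebuilds a column with the same counts as the "Sex of Respondent" table and flags any code that the codebook does not document. The numeric codes 1 and 2 and the function name frequencies are assumptions for illustration; the slide shows only labels and counts.

```python
from collections import Counter

# Hypothetical codebook entry: the numeric codes 1 and 2 are assumed;
# only the labels and counts come from the frequency table.
CODEBOOK = {"1": "MALE", "2": "FEMALE"}

# Rebuild a column of raw values with the same counts as the table.
column = ["1"] * 856 + ["2"] * 1041


def frequencies(values, labels):
    """Return rows of (code, label, count, percent) plus undocumented codes."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = [
        (code, labels.get(code, "UNDOCUMENTED"), n, round(100 * n / total, 1))
        for code, n in sorted(counts.items())
    ]
    undocumented = sorted(code for code in counts if code not in labels)
    return rows, undocumented


rows, undocumented = frequencies(column, CODEBOOK)
for row in rows:
    print(row)
print("codes missing from codebook:", undocumented)
```

A code that appears in the data but not in the codebook, or a count that disagrees with the documentation, is the kind of discrepancy this review step is meant to surface.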
16. INGEST – PHYSICAL FORMATS
Virus check; run checksums; address versioning, fixity, and file naming conventions
Convert files to archival formats if required
Back up copies to external media
Copy datasets to Dataverse; Safe Archive tool
Use of secure file transfer client
SQL/PHP scripts for local holdings file
Compression software (7-Zip)
Address disaster plan and file access (public and local); security requirements; LOCKSS
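
The checksum and fixity steps on this slide can be sketched as a manifest that is recorded at ingest and re-checked later. This is only an illustration: SHA-256 is an assumed choice of algorithm (the slides do not name one), and the function names and the file name study001_data.txt are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's relative path to its SHA-256 checksum."""
    folder = Path(folder)
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify(folder, manifest):
    """List files that are missing or whose checksum no longer matches."""
    current = build_manifest(folder)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)


# Demonstration on a throwaway folder with a made-up file name.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "study001_data.txt"
    data_file.write_text("5002 01 01\n")
    manifest = build_manifest(tmp)
    unchanged = verify(tmp, manifest)      # nothing altered yet
    data_file.write_text("5002 01 02\n")   # simulate silent corruption
    changed = verify(tmp, manifest)
print(unchanged, changed)
```

Storing the manifest alongside each copy (local, external media, repository) lets the same check confirm that all copies still agree.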
17. INGEST – BIBLIOGRAPHIC METADATA
Bibliographic metadata enables search and discovery:
Establish bibliographic-level identity for unique items
Bibliographic record to WorldCat/Voyager
Add record to holdings database (SQL)
Create Dataverse record; assign persistent identifier
Produce and review with investigator
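
The "add record to holdings database (SQL)" step might look something like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database purely for illustration; the table name, column names, and the placeholder record are hypothetical and do not describe the archive's actual schema or its SQL/PHP scripts.

```python
import sqlite3

# In-memory database for illustration; the table and column names are
# hypothetical and do not describe the archive's actual holdings schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE holdings (
        study_id      TEXT PRIMARY KEY,
        title         TEXT NOT NULL,
        producer      TEXT,
        persistent_id TEXT UNIQUE
    )
    """
)

# Placeholder record; the identifier is a dummy value, not a real DOI.
conn.execute(
    "INSERT INTO holdings VALUES (?, ?, ?, ?)",
    ("S0001", "Example Survey of Households", "Example Institute",
     "doi:10.0000/EXAMPLE"),
)
conn.commit()


def find_by_title(connection, term):
    """Title search using a parameterized LIKE query."""
    cursor = connection.execute(
        "SELECT study_id, title FROM holdings "
        "WHERE title LIKE ? ORDER BY study_id",
        (f"%{term}%",),
    )
    return cursor.fetchall()


matches = find_by_title(conn, "survey")
print(matches)
```

Keeping the persistent identifier in the local record is what ties the holdings entry back to the catalog and repository records for the same study.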
18. WHAT ELSE DO WE NEED TO KNOW ABOUT THE DATA?
Description of the study
Citation
Funding source
Methodology
Sampling
Publications
19. EXAMPLE - DATAVERSE
Links to tools to manage collections
Navigate to and search for studies
Studies can be downloaded or analyzed online
20. VARIABLE LEVEL SEARCH CAPABILITIES
Enables searching across many studies at once.
Enables searching shared catalogs of multiple archives
TOOLS: Colectica Repository and NESSTAR
Requires local or remote hosting of software.
Can share the metadata files for repurposing.
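
To show why variable-level metadata makes cross-study search possible, here is a toy sketch of an index built from variable labels. It is not how Colectica Repository or NESSTAR work internally; the study names, variable names, and labels are invented for the example.

```python
from collections import defaultdict

# Made-up variable labels for two hypothetical studies.
STUDIES = {
    "study_a": {"V1": "Sex of Respondent", "V2": "Age of Respondent"},
    "study_b": {"Q5": "Respondent race - ethnicity", "Q6": "Household income"},
}


def build_index(studies):
    """Map each lowercase word to the (study, variable) pairs that use it."""
    index = defaultdict(set)
    for study, variables in studies.items():
        for variable, label in variables.items():
            for word in label.lower().split():
                word = word.strip("-,.")
                if word:  # skip tokens that were only punctuation
                    index[word].add((study, variable))
    return index


def search(index, term):
    """Return sorted (study, variable) pairs whose label contains the word."""
    return sorted(index.get(term.lower(), set()))


index = build_index(STUDIES)
results = search(index, "respondent")
print(results)
```

One query returns matching variables from both studies, which is the practical benefit of indexing at the variable level rather than only at the study level.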
21. DATA DOCUMENTATION INITIATIVE
Document, Discover, and Interoperate
“International standard for describing data
that result from observational methods in
the social, behavioral, economic, and health
sciences”
“Facilitates interpretation and understanding
-- both by humans and computers”
http://www.ddialliance.org/
22. INGEST – VARIABLE LEVEL METADATA
Detailed descriptive metadata about the data enables understanding and reuse:
Create variable-level metadata, using Colectica or NESSTAR to produce standardized metadata records
Create DDI record; full DDI codebook
Migrate DDI to Colectica Repository
Produce and review with investigator
NESSTAR
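
For a sense of what a variable-level record contains, the sketch below builds a small XML fragment loosely modeled on DDI Codebook element names (codeBook, dataDscr, var, labl, catgry, catValu). It is an assumption-laden illustration: namespaces are omitted, nothing is validated against the DDI schema, and the variable name SEX and codes 1 and 2 are made up. In practice, tools such as Colectica or NESSTAR generate these records.

```python
import xml.etree.ElementTree as ET


def variable_element(name, label, categories):
    """Build a <var> element with one <catgry> per coded value."""
    var = ET.Element("var", {"name": name})
    ET.SubElement(var, "labl").text = label
    for code, text in categories:
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = code
        ET.SubElement(catgry, "labl").text = text
    return var


code_book = ET.Element("codeBook")
data_dscr = ET.SubElement(code_book, "dataDscr")
# Variable name "SEX" and codes 1/2 are assumptions for illustration.
data_dscr.append(
    variable_element("SEX", "Sex of Respondent",
                     [("1", "MALE"), ("2", "FEMALE")])
)

xml_text = ET.tostring(code_book, encoding="unicode")
print(xml_text)
```

Because the variable, its label, and its value labels are captured as structured fields, the same record can be reused to produce a codebook, feed a search index, or be exchanged with another archive.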
23. EXAMPLE - IMPORTING DATA
Use the Data tab to import files from SPSS or STATA formats.
25. EXAMPLE DDI FROM COLECTICA
DDI fields are in red; used to create documentation; can be repurposed
26. PRESERVATION AND CURATION
Continuous monitoring of file formats; migrate to new formats when:
New operating system; new version of statistical software
New mode of file transfer; code change
Monitoring of database function; software updates or redesigns
Monitoring of servers and external media health; replace as needed
Data forensics; checksums; validation; authentication; version control; format migration; refresh media; record preservation metadata (DDI)
Review disaster plan and collection policy at regular intervals
Review new or revised regulations for intellectual property; security; data producers/distributors; funding agencies
Review with original depositor their data management plans and changes in access or user permissions
Focus is on functional-level preservation and long term
usability through use of DDI and continuous review.
27. UNCOMFORTABLE TRUTHS
Data management in institutions requires high-level administrative participation; new, sustained funding; and differently trained staff
Data management planning is not a static event but a continuous process to ensure long-term, independently understandable, informed reuse of research
There is an urgent need for standards, tools, and best practice models for many different file formats and disciplines
28. NEXT STEPS FOR PRACTITIONERS
“Crucial metadata about data are not always
being captured or created and linked to data in
repositories. Storage and persistence of data
submissions isn't enough. We need data
archivists and librarians to commit to partnering
with researchers to curate data -- to review
incoming data for usability, confidentiality, and
completeness of descriptive information.”
Ann Green (2016) Email communication
Used with permission
29. ANY QUESTIONS?
THANK YOU!
Social Science Data Archive, UCLA
Box 951484
Los Angeles, CA 90095-1484
310-825-0716
30. LINKS
Social Science Data Archive dataverse.harvard.edu/dataverse/ssda_ucla
Data Seal of Approval www.datasealofapproval.org/en/
National Digital Stewardship Alliance ndsa.org/activities/levels-of-digital-preservation/
Open Archival Information System www.oclc.org/research/publications/library/2000/lavoie-oais.html
Social Science Data Archive Policy data-archive.library.ucla.edu/SSDA_collectionAndArchivingPolicy.pdf?_ga=1.3255478.786669706.1378228281
Data Curation Profile datacurationprofiles.org/
Data Management Planning at ICPSR www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/index.html
ICPSR Guide to Data Preparation www.icpsr.umich.edu/icpsrweb/content/deposit/guide/
Colectica www.colectica.com/
NESSTAR www.nesstar.com/index.html
DDI www.ddialliance.org/
Dataverse dataverse.org/