Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle

ADA, DDI and the Data
Lifecycle
Dr. Steve McEachern
Director, ADA
Tech Talk
April 2017

ADA in Brief
• The Social Science Data Archive (now ADA) was set up
in 1981, housed in the Research School of Social
Sciences at ANU, with a mission to collect and preserve
Australian social science data on behalf of the social
science research community
• The Archive holds over 5000 datasets from around
1500 studies, including national election studies; public
opinion polls; social attitudes surveys, censuses,
aggregate statistics, administrative data and many
other sources.
• Data holdings are sourced from academic, government
and private sectors.

The Data Documentation
Initiative standard
http://www.ddialliance.org

About DDI
• A structured metadata specification of and for the
community
• Two major development lines – XML Schemas
– DDI Codebook
– DDI Lifecycle
• Additional specifications:
– Controlled vocabularies
– RDF vocabularies for use with Linked Data
• Model-based version is in development
– with serialisations in XML and RDF
– Includes support for provenance and process models
• Managed by the DDI Alliance
– http://www.ddialliance.org

DDI-Codebook
• XML-based, first published in 2000
• Four sections:
1. Document description: characteristics of the DDI XML
document itself
2. Study description: characteristics of the Study (project) that
the DDI is describing (including Related Materials:
documents associated with the project, such as
questionnaires, codebooks, etc.)
3. File description: characteristics of the physical data files
4. Variable description: characteristics of the variables in the
data file
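
The four sections above can be sketched as an empty XML skeleton. This is a minimal illustration only: the element names (codeBook, docDscr, stdyDscr, fileDscr, dataDscr) are the names used in DDI Codebook 2.x as best understood here, and namespaces, attributes and required child elements are omitted, so the output is not a schema-valid DDI document.

```python
import xml.etree.ElementTree as ET

# Assumed DDI Codebook 2.x element names; verify against the published schema.
SECTIONS = [
    ("docDscr", "Document description: the DDI XML document itself"),
    ("stdyDscr", "Study description: the study (project) being described"),
    ("fileDscr", "File description: the physical data files"),
    ("dataDscr", "Variable description: the variables in the data file"),
]


def build_codebook_skeleton():
    """Return a bare <codeBook> element holding the four empty sections."""
    root = ET.Element("codeBook")
    for tag, purpose in SECTIONS:
        root.append(ET.Comment(purpose))  # human-readable note for each section
        ET.SubElement(root, tag)
    return root


def section_tags(root):
    """List the section element names in order, skipping XML comments."""
    return [child.tag for child in root if isinstance(child.tag, str)]


skeleton = build_codebook_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
```

In practice the study and variable sections carry most of the content; the skeleton only shows how the four parts sit side by side under one root.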

DDI Lifecycle Model
Metadata Reuse

Why can DDI Lifecycle
do more?
• It is machine-actionable – not just documentary
• It’s more complex with a tighter structure
• It manages metadata objects through a structured
identification and reference system that allows
sharing between organizations
• It has greater support for related standards
• Reuse of metadata within the lifecycle of a study and
between studies
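
The identification and reference idea above can be sketched in a few lines. This is an illustrative model, not the DDI Lifecycle schema: the class names, the agency string "example.agency" and the simplified URN layout are all assumptions made for the example. The point is that a metadata object is stored once under an agency, identifier and version, and studies hold references to it rather than copies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ref:
    """Hypothetical identifier: agency + object id + version."""
    agency: str
    object_id: str
    version: str

    def urn(self):
        # Simplified URN layout, assumed for illustration only
        return f"urn:ddi:{self.agency}:{self.object_id}:{self.version}"


@dataclass
class Question:
    ref: Ref
    text: str


@dataclass
class Study:
    title: str
    question_refs: list = field(default_factory=list)


registry = {}  # single store of identified metadata objects


def register(question):
    """Store the question once and hand back its reference."""
    registry[question.ref] = question
    return question.ref


def questions_for(study):
    """Resolve a study's references back to question text."""
    return [registry[ref].text for ref in study.question_refs]


age_ref = register(Question(Ref("example.agency", "q-age", "1"), "What is your age?"))
wave1 = Study("Hypothetical survey, wave 1", [age_ref])
wave2 = Study("Hypothetical survey, wave 2", [age_ref])
```

Both hypothetical waves resolve to the same stored question, which is the kind of reuse within and between studies that the slide describes.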

Managing and Depositing Data:
ADA and DDI

Approach
• Core archive website:
– http://www.ada.edu.au
• Sub-archives focussed on specialised thematic or
methodological areas
– e.g. http://www.ada.edu.au/indigenous/home
• “Add-on” systems for complex analysis or
visualisation tasks:
– Nesstar
– GIS: http://gis-test.ada.edu.au
– Longitudinal visualisation: Panemalia
– Historical census data: http://hccda.ada.edu.au

Archival processing
Manual system with some automation tools
1. Deposit:
– Review of ADAPT submission
– Storage via ADAPT to file store
2. Data processing:
– File format conversion (usually to SPSS for processing)
– Privacy/confidentiality review
– Data cleaning (in consultation with depositor)
3. Metadata processing:
– DDI-C metadata creation in Nesstar Publisher
4. Publishing:
– Archival storage and access format creation
– Data publication to Nesstar server
– Metadata publication to Nesstar and ADA CMS
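
The four stages above could be tracked with a simple ordered checklist. This is a hypothetical sketch, not a description of ADA's tooling: the stage and task labels are taken from the slide, while the data structure and the next_task helper are assumptions for illustration.

```python
# Stages and tasks in slide order (dicts preserve insertion order).
WORKFLOW = {
    "Deposit": [
        "Review of ADAPT submission",
        "Storage via ADAPT to file store",
    ],
    "Data processing": [
        "File format conversion",
        "Privacy/confidentiality review",
        "Data cleaning",
    ],
    "Metadata processing": [
        "DDI-C metadata creation in Nesstar Publisher",
    ],
    "Publishing": [
        "Archival storage and access format creation",
        "Data publication to Nesstar server",
        "Metadata publication to Nesstar and ADA CMS",
    ],
}


def next_task(completed):
    """Return (stage, task) for the first task not yet completed, or None."""
    for stage, tasks in WORKFLOW.items():
        for task in tasks:
            if task not in completed:
                return stage, task
    return None


done = {"Review of ADAPT submission", "Storage via ADAPT to file store"}
print(next_task(done))
```

Because the process is largely manual, a checklist like this would mainly record where a deposit sits in the sequence rather than automate any step.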

The ADA study page
Study information is available through the tabs at the top of the
study page:
• Study: information including the investigators, abstract,
sample, data collection methods, and access requirements.
• Variables: a list of variables available in a quantitative dataset
• Related Materials: additional documentation, links and other
related studies (e.g. others in the series) that may interest you
The study page is also the access point for the ADA Nesstar
system, for:
• Analysis of quantitative data online,
• Download of data to your own computer.

Future plans: Dataverse
• http://dataverse.org/
• “Dataverse is an open source web application to share,
preserve, cite, explore, and analyze research data. It
facilitates making data available to others, and allows you
to replicate others' work more easily. Researchers, data
authors, publishers, data distributors, and affiliated
institutions all receive academic credit and web visibility.
• A Dataverse repository is the software installation, which
then hosts multiple dataverses. Each dataverse contains
datasets, and each dataset contains descriptive metadata
and data files (including documentation and code that
accompany the data). As an organizing method,
dataverses may also contain other dataverses.”
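
The containment described in the quotation can be modelled as a small tree. This is a sketch of the concept only, not the Dataverse software's own data model: the class names, fields and example titles are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    title: str
    metadata: dict = field(default_factory=dict)  # descriptive metadata
    files: list = field(default_factory=list)     # data, documentation, code


@dataclass
class Dataverse:
    name: str
    datasets: list = field(default_factory=list)
    children: list = field(default_factory=list)  # nested dataverses

    def all_datasets(self):
        """Collect datasets from this dataverse and every nested one."""
        found = list(self.datasets)
        for child in self.children:
            found.extend(child.all_datasets())
        return found


# Hypothetical example: a top-level archive with one thematic sub-dataverse
thematic = Dataverse("Example thematic collection",
                     datasets=[Dataset("Example survey B")])
archive = Dataverse("Example archive",
                    datasets=[Dataset("Example survey A")],
                    children=[thematic])
titles = [d.title for d in archive.all_datasets()]
```

Nesting dataverses in this way would map naturally onto an archive with thematic sub-archives.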

Features
• One installation, multiple logins
• Multiple hosting options: Bare metal, VMware, AWS,
OpenStack, …
• Login options: Native, ORCID, Shibboleth, …
• API and GUI access
• Client libraries: R, Python, Java
• OAI-PMH harvesting
• Open and Restricted data access
• New implications for data archiving, curation,
management and dissemination
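
As an illustration of the OAI-PMH harvesting feature, the sketch below builds a ListRecords request URL. The verb and the oai_dc metadata prefix come from the OAI-PMH protocol itself; the host name and the /oai endpoint path are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict harvesting to one set
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; substitute the real repository's OAI-PMH base URL
url = list_records_url("https://dataverse.example.org/oai")
print(url)
```

A harvester would fetch this URL, parse the XML response and follow any resumption token the server returns to page through the remaining records.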

Questions?
Steven McEachern
steven.mceachern@anu.edu.au
ada@anu.edu.au
