ADA, DDI and the Data Lifecycle
Dr. Steve McEachern
Director, ADA
Tech Talk
April 2017
ADA in Brief
• The Social Science Data Archive (now the Australian Data Archive, ADA) was set up in 1981, housed in the Research School of Social Sciences at the Australian National University (ANU), with a mission to collect and preserve Australian social science data on behalf of the social science research community
• The Archive holds over 5000 datasets from around 1500 studies, including national election studies, public opinion polls, social attitudes surveys, censuses, aggregate statistics, administrative data and many other sources.
• Data holdings are sourced from the academic, government and private sectors.
The Data Documentation Initiative standard
http://www.ddialliance.org
About DDI
• A structured metadata specification of and for the community
• Two major development lines, both XML Schemas:
– DDI Codebook
– DDI Lifecycle
• Additional specifications:
– Controlled vocabularies
– RDF vocabularies for use with Linked Data
• A model-based version is in development:
– Serialisations in XML and RDF
– Support for provenance and process models
• Managed by the DDI Alliance
– http://www.ddialliance.org
DDI-Codebook
• XML-based, first published in 2000
• Four sections:
1. Document description: characteristics of the DDI XML document itself
2. Study description: characteristics of the study (project) that the DDI describes, including Related Materials (documents associated with the project, such as questionnaires and codebooks)
3. File description: characteristics of the physical data files
4. Variable description: characteristics of the variables in the data file
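To make the four sections concrete, the sketch below parses a tiny DDI-Codebook instance with Python's standard library and reports which sections are present, the study title, the related materials and the variable labels. It assumes the DDI-Codebook 2.5 namespace (`ddi:codebook:2_5`) and its element names (`docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `var`, `labl`, with Related Materials as `othrStdyMat/relMat` inside the study description); the sample document, file name and variables are hypothetical and far smaller than a real codebook.

```python
import xml.etree.ElementTree as ET

# DDI-Codebook 2.5 namespace; other 2.x releases use different namespace strings.
NS = {"c": "ddi:codebook:2_5"}

# The four sections described above, keyed by their XML element names.
SECTIONS = {
    "docDscr": "Document description",
    "stdyDscr": "Study description",
    "fileDscr": "File description",
    "dataDscr": "Variable description",
}

# Hypothetical, heavily trimmed instance for illustration only.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5" version="2.5">
  <docDscr>
    <citation><titlStmt><titl>Codebook for an example survey</titl></titlStmt></citation>
  </docDscr>
  <stdyDscr>
    <citation><titlStmt><titl>Example Social Attitudes Survey</titl></titlStmt></citation>
    <othrStdyMat><relMat>Questionnaire (PDF)</relMat></othrStdyMat>
  </stdyDscr>
  <fileDscr ID="F1">
    <fileTxt><fileName>example_survey.sav</fileName></fileTxt>
  </fileDscr>
  <dataDscr>
    <var ID="V1" name="age" files="F1"><labl>Age of respondent</labl></var>
    <var ID="V2" name="voted" files="F1">
      <labl>Voted at last election</labl>
      <catgry><catValu>1</catValu><labl>Yes</labl></catgry>
      <catgry><catValu>2</catValu><labl>No</labl></catgry>
    </var>
  </dataDscr>
</codeBook>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def summarise(xml_text: str) -> dict:
    """Summarise the sections, study title, related materials and variables."""
    root = ET.fromstring(xml_text)
    sections = [SECTIONS[local_name(child.tag)]
                for child in root if local_name(child.tag) in SECTIONS]
    # Only the variable's own <labl> child is read, not the category labels.
    variables = {var.get("name"): var.findtext("c:labl", default="", namespaces=NS)
                 for var in root.iterfind("c:dataDscr/c:var", NS)}
    return {
        "sections": sections,
        "study_title": root.findtext("c:stdyDscr/c:citation/c:titlStmt/c:titl",
                                     default="", namespaces=NS),
        "related_materials": [m.text for m in
                              root.iterfind("c:stdyDscr/c:othrStdyMat/c:relMat", NS)],
        "variables": variables,
    }
```

For example, `summarise(SAMPLE)["variables"]` returns `{'age': 'Age of respondent', 'voted': 'Voted at last election'}`.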
DDI Lifecycle Model
[Figure: DDI Lifecycle Model diagram, “Metadata Reuse”]
Why can DDI Lifecycle do more?
• It is machine-actionable, not just documentary
• It is more complex, with a tighter structure
• It manages metadata objects through a structured identification and reference system that allows sharing between organisations
• It has greater support for related standards
• It supports reuse of metadata within the lifecycle of a study and between studies
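The identification and reference system is what makes that reuse possible: an object is defined once and other studies point at it rather than copying it. The sketch below assumes the DDI Lifecycle 3.2 convention that an identifiable object is named by an agency, an ID and a version, written as a URN of the form `urn:ddi:agency:id:version`. The agency `org.example`, the ID, the question text and the `Registry` class are hypothetical, and the parsing check is much looser than the specification's rules.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DDIUrn:
    """Agency + ID + version triple identifying a DDI Lifecycle object."""
    agency: str
    object_id: str
    version: str

    def __str__(self) -> str:
        return f"urn:ddi:{self.agency}:{self.object_id}:{self.version}"

    @classmethod
    def parse(cls, text: str) -> "DDIUrn":
        parts = text.split(":")
        # Expect exactly: urn, ddi, agency, id, version (loose check only).
        if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "ddi":
            raise ValueError(f"not a DDI URN: {text!r}")
        return cls(parts[2], parts[3], parts[4])


class Registry:
    """Hypothetical shared store: objects are registered once, resolved by URN."""

    def __init__(self) -> None:
        self._objects: Dict[DDIUrn, str] = {}

    def register(self, urn: DDIUrn, content: str) -> DDIUrn:
        # A published version is immutable; changes need a new version number.
        if urn in self._objects:
            raise ValueError(f"{urn} already registered; publish a new version instead")
        self._objects[urn] = content
        return urn

    def resolve(self, urn: DDIUrn) -> str:
        return self._objects[urn]


registry = Registry()
q1 = registry.register(DDIUrn("org.example", "Q-voted", "1"),
                       "Did you vote at the last election?")

# Two hypothetical studies reuse the same question by reference, not by copy.
study_2016 = [q1]
study_2019 = [DDIUrn.parse("urn:ddi:org.example:Q-voted:1")]
```

Because each study holds only the URN, revised wording would be registered as version 2, and the earlier study would keep pointing at version 1 unchanged.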
Managing and Depositing Data: ADA and DDI
Approach
• Core archive website:
– http://www.ada.edu.au
• Sub-archives focussed on specialised thematic or methodological areas
– eg. http://www.ada.edu.au/indigenous/home
• “Add-on” systems for complex analysis or visualisation tasks:
– Nesstar
– GIS: http://gis-test.ada.edu.au
– Longitudinal visualisation: Panemalia
– Historical census data: http://hccda.ada.edu.au
OAIS architecture
[Figure: OAIS architecture diagram]
Data deposit: ADAPT
[Figure: ADAPT data deposit]
Archival processing
A manual system, supported by some automation tools
1. Deposit:
– Review of ADAPT submission
– Storage via ADAPT to file store
2. Data processing:
– File format conversion (usually to SPSS for processing)
– Privacy/confidentiality review
– Data cleaning (in consultation with depositor)
3. Metadata processing:
– DDI Codebook (DDI-C) metadata creation in Nesstar Publisher
4. Publishing:
– Archival storage and access format creation
– Data publication to Nesstar server
– Metadata publication to Nesstar and ADA CMS
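Because the workflow above is mostly manual and strictly ordered, it can be tracked as a checklist in which a stage is only signed off once its tasks are done and the previous stage is complete. A minimal Python sketch, assuming a curator records each finished task by name; the task strings paraphrase the list, `StudyRecord` and the study ID are hypothetical, and the mapping of stages to OAIS packages (a SIP at deposit, an AIP and a DIP at publishing) is an illustrative reading of the OAIS architecture slide rather than something the slides spell out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Stage(Enum):
    DEPOSIT = 1
    DATA_PROCESSING = 2
    METADATA_PROCESSING = 3
    PUBLISHING = 4


# Tasks per stage, paraphrased from the processing list.
CHECKLIST = {
    Stage.DEPOSIT: ["review ADAPT submission", "store via ADAPT to file store"],
    Stage.DATA_PROCESSING: ["file format conversion", "privacy/confidentiality review",
                            "data cleaning with depositor"],
    Stage.METADATA_PROCESSING: ["create DDI-C metadata in Nesstar Publisher"],
    Stage.PUBLISHING: ["create archival storage and access formats",
                       "publish data to Nesstar server",
                       "publish metadata to Nesstar and ADA CMS"],
}

# Assumed OAIS reading (not stated on the slides): deposit yields a Submission
# Information Package; publishing yields Archival and Dissemination packages.
OAIS_OUTPUT = {Stage.DEPOSIT: ["SIP"], Stage.PUBLISHING: ["AIP", "DIP"]}


@dataclass
class StudyRecord:
    study_id: str
    completed: List[Stage] = field(default_factory=list)
    packages: List[str] = field(default_factory=list)

    def complete(self, stage: Stage, done: Set[str]) -> None:
        """Sign off a stage, enforcing stage order and a finished checklist."""
        if len(self.completed) == len(Stage):
            raise ValueError("all stages already complete")
        expected = list(Stage)[len(self.completed)]
        if stage is not expected:
            raise ValueError(f"expected {expected.name}, got {stage.name}")
        missing = [task for task in CHECKLIST[stage] if task not in done]
        if missing:
            raise ValueError(f"{stage.name} incomplete: {missing}")
        self.completed.append(stage)
        self.packages.extend(OAIS_OUTPUT.get(stage, []))
```

Raising on out-of-order or incomplete stages keeps the record honest without automating the curation work itself, which matches a manual system with some tool support.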
The ADA study page
Study information is available through the tabs at the top of the study page:
• Study: information including the investigators, abstract, sample, data collection methods, and access requirements.
• Variables: a list of variables available in a quantitative dataset
• Related Materials: additional documentation, links and other related studies (eg. others in the series) that may interest you
The study page is also the access point for the ADA Nesstar system, for:
• Online analysis of quantitative data
• Downloading data to your own computer
The ADA Study Page
[Figure: example ADA study page]
Future plans: Dataverse
• http://dataverse.org/
• “Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others' work more easily. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility.
• A Dataverse repository is the software installation, which then hosts multiple dataverses. Each dataverse contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data). As an organizing method, dataverses may also contain other dataverses.”
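The containment rules in that description form a simple tree: an installation hosts dataverses, a dataverse holds datasets and possibly further dataverses, and a dataset holds descriptive metadata and files. A minimal Python sketch of that structure; the class names, aliases, titles and file names are hypothetical and are not the Dataverse software's own data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Dataset:
    title: str
    metadata: Dict[str, str] = field(default_factory=dict)  # descriptive metadata
    files: List[str] = field(default_factory=list)          # data, documentation, code


@dataclass
class Dataverse:
    alias: str
    datasets: List[Dataset] = field(default_factory=list)
    children: List["Dataverse"] = field(default_factory=list)  # nested dataverses

    def walk(self, prefix: str = "") -> Iterator[Tuple[str, Dataset]]:
        """Yield (path, dataset) for this dataverse, then for each nested one."""
        path = f"{prefix}/{self.alias}"
        for dataset in self.datasets:
            yield path, dataset
        for child in self.children:
            yield from child.walk(path)


@dataclass
class Repository:
    """The software installation, which hosts many dataverses."""
    name: str
    dataverses: List[Dataverse] = field(default_factory=list)

    def all_datasets(self) -> List[Tuple[str, Dataset]]:
        return [pair for dv in self.dataverses for pair in dv.walk()]


# Hypothetical content for illustration.
repo = Repository("example-installation", [
    Dataverse(
        "social-science",
        datasets=[Dataset("Example Attitudes Survey", metadata={"subject": "Social Sciences"})],
        children=[Dataverse("elections", datasets=[
            Dataset("Example Election Study", files=["data.tab", "codebook.pdf", "weights.R"]),
        ])],
    ),
])
```

Listing `(path, dataset.title)` for `repo.all_datasets()` gives the parent dataverse's datasets first and then those of `/social-science/elections`.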
Harvard Dataverse
[Figure: Harvard Dataverse]
Features
• One installation, multiple logins
• Multiple hosting options: Bare metal, VMware, AWS, OpenStack, …
• Login options: Native, ORCID, Shibboleth, …
• API and GUI access
• Client libraries: R, Python, Java
• OAI-PMH harvesting
• Open and restricted data access
• New implications for data archiving, curation, management and dissemination
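The API access and OAI-PMH harvesting features are where Dataverse connects back to DDI: the native API can export a dataset's metadata as DDI, and the OAI-PMH endpoint can serve DDI records to harvesters. The sketch below only builds request URLs and follows OAI-PMH resumption tokens; the network call is left to a caller-supplied `fetch` function so the logic can be tried offline. The base URL and DOIs are hypothetical, and the paths (`/api/datasets/export`, `/oai`) and the format names `ddi` and `oai_ddi` are assumed from the Dataverse guides and should be checked against the installed version.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Union
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def export_url(base: str, persistent_id: str, exporter: str = "ddi") -> str:
    """Native-API URL that exports one dataset's metadata in the given format."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base.rstrip('/')}/api/datasets/export?{query}"


def list_records_url(base: str, token: Optional[str] = None,
                     metadata_prefix: str = "oai_ddi") -> str:
    """OAI-PMH ListRecords URL; a resumed request carries only verb and token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base.rstrip('/')}/oai?{urlencode(params)}"


def harvest_identifiers(base: str,
                        fetch: Callable[[str], Union[str, bytes]]) -> Iterator[str]:
    """Yield record identifiers page by page until no resumption token remains.

    `fetch` takes a URL and returns the response body; OAI-PMH error
    responses such as noRecordsMatch are not handled in this sketch.
    """
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base, token)))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            yield header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        # An empty or missing resumptionToken marks the final page.
        token = root.findtext(".//oai:resumptionToken", default="",
                              namespaces=OAI_NS).strip()
        if not token:
            return
```

Injecting `fetch` keeps the harvesting loop independent of any HTTP client; in practice it could wrap `urllib.request.urlopen` or a client library.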
Questions?
Steven McEachern
steven.mceachern@anu.edu.au
ada@anu.edu.au
