CNI FALL MEETING: December 10-11, 2012, Washington, DC




       The Service Family for
Research Data at Oxford University
    Wolfram Horstmann & Neil Jefferies
          Contributors: Paul Jeffreys, Sally Rumsey, Neil
          Jefferies, David Shotton, Glenn Swafford,
          James Wilson, Wolfram Horstmann, and more
The Research Data Family




          http://www.flickr.com/photos/barbourians/6152005267/


  Simple – Helpful – Multi Agency – Reference-based
Funders’ policies & Institutions




                    http://www.flickr.com/photos/larry1732/4773431202/




  RCUK – EPSRC – Wellcome – EC / Horizon 2020 – University of Oxford
Research Data vs. Open Access




                         http://www.flickr.com/photos/dyle/7531848910




Different Animals: Scientific exploitation – Privacy – Security – but related…
Research Data Management – Light
        We found a DataCite DOI for your publication!
          doi:10.1594/WDCC/CLM_C20_3_D3             Validate   Change




                                   http://ora.ox.ac.uk/




     You have a publication? Show me where the data are.
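
The "Validate" step above can be illustrated with a minimal sketch that turns the DOI string shown on the slide into a link a researcher could follow to check it. This is not the ORA implementation; the function name `doi_to_url` is hypothetical, and the public resolver at doi.org is assumed as the target.

```python
from urllib.parse import quote

# Assumption: the public DOI resolver is used to check a suggested DOI.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Turn a string such as 'doi:10.1594/...' into a resolvable URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Every DOI starts with the '10.' directory indicator and has a suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError("not a DOI: {!r}".format(doi))
    # Keep the slashes; percent-escape anything unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1594/WDCC/CLM_C20_3_D3"))
```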
Research Data Management Services


                                       DataPlan

                DataFinder

            DataBank            Training,
                               Advice and
                                Support

            ORDS

              DataStage


                      http://www.admin.ox.ac.uk/rdm/

      5 Data Primitives: Inform, Plan, Work, Archive, Find
Research Data Systems




      http://www.flickr.com/photos/natalielucier




               Over to Neil!
RDM - Oxford History
•   2008 Computing Services internal scoping study into data management
    requirements
•   2008 Libraries set up DataBank adjunct to ORA
•   2009-10 EIDCSR (Embedding Institutional Data Curation Services in Research)
      •    OUCS, OULS, OeRC, Research Services, Computational Biology, Cardiac
           Mechano-Electric Feedback Group (JISC Funded)
      •    Policy, processes, requirements
      •    JISC/HEFCE (Universities Modernisation Fund) Projects
•   2010-12 Sudamih/ViDaaS – Prototype/productionise Database-as-a-Service
    Computing Services
      •    ORDS (Oxford Research Data Service)
•   2010-12 Admiral/DataFlow – Prototype/productionise DataStage/DataBank
    Libraries, Computing Services, OeRC, IBRG, UKOLN, Canonical, Lightweight
    data management/archiving
•   DaMaRO (Data Management Rollout at Oxford)
      Integration, Training, Policy (JISC Funded)
       DataFinder data catalogue
EIDCSR

•   Draft University Research Data
    Management Policy
•   RDM Portal
•   ‘Work Bench’ 3D Image visualisation
    software
•   Initial core RDM metadata schema (being
    revised)
•   Digital curation workflow module, with
    metadata and archiving client
      • DataFlow progenitor
ORDS – Expunging MS Access
DataStage

•   “Sheer Curation”
      • Minimal metadata required
      • Enhancement supported
•   Lightweight, low-impact data
    management
•   Network drive & Web UI
      • Simple permissions:
         Personal/group/world
•   Designed for local or cloud
    deployment
      • Leverage existing infrastructure
      • Debian packages/OVF
•   SWORD2 deposit into DataBank (or
    anything else!)
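
As a rough illustration of the SWORD2 deposit step, the sketch below packages files as a zip and prepares (but does not send) the HTTP request. It is not DataStage's code: the collection URL and helper names are hypothetical. The header names and the SimpleZip packaging URI follow the SWORD 2.0 profile as generally documented. Authentication and checksums are left out.

```python
import io
import urllib.request
import zipfile

# Packaging identifier for a plain zip, as defined by the SWORD 2.0 profile.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def make_package(files):
    """Zip a {name: bytes} mapping in memory and return the archive bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_deposit(collection_url, package, filename, in_progress=False):
    """Prepare a binary deposit request; the caller decides when to send it."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": "attachment; filename={}".format(filename),
        "Packaging": SIMPLE_ZIP,
        "In-Progress": "true" if in_progress else "false",
    }
    return urllib.request.Request(
        collection_url, data=package, headers=headers, method="POST"
    )


package = make_package({"readme.txt": b"example dataset"})
# Hypothetical collection URL, for illustration only.
request = build_deposit("http://databank.example.org/swordv2/collection/demo",
                        package, "dataset.zip")
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` with suitable credentials; any SWORD2-capable repository could sit at the receiving end, which is the point of the "or anything else!" remark.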
DataBank

      •   Bodleian Data Repository (in dev
          since 2008) parallels ORA
      •   “Data” currently defined as
          “Research outputs that don't fit in
          ORA”
      •   File and metadata format agnostic
            • supports packages (zip & tar)
            • component subaddressing
      •   Built on “FEDORA-Lite” object
          model
      •   Assigns DataCite DOIs
      •   Manages embargoes
            • Secure, dark archive is
                segregated
      •   Manual and SWORD2 deposit
      •   REST API
      •   Debian Packages or OVF
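
To make "component subaddressing" concrete, here is a minimal sketch of addressing a single file inside a deposited zip package without unpacking the whole archive. It only illustrates the idea; DataBank's actual addressing scheme is not shown on the slide, and the function names and sample paths are hypothetical.

```python
import io
import zipfile


def list_components(package):
    """Return the sorted member paths inside a zip package (given as bytes)."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return sorted(zf.namelist())


def read_component(package, member):
    """Return the bytes of one member; raises KeyError if it is absent."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(member)


# Build a small example package in memory (illustrative content only).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/readings.csv", b"t,value\n0,1.5\n")
    zf.writestr("README.txt", b"example package")
package = buf.getvalue()

print(list_components(package))
print(read_component(package, "data/readings.csv").decode())
```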
DataPlan

•   Based on DCC DMPOnline tool
•   Create, save, submit and use
    data management plans
      • To accompany research
         grant applications
      • 20 questions guide the
         management and
         publication of data
•   Develop a simple DataCite-
    and CERIF-compliant Data
    Management Ontology
•   DMPs archived in Oxford
    DMPBank instance of the
    DataBank software
•   Captures metadata in advance
    of data deposit
The DaMaRO Project
Diversity is the Key Challenge
•   Data management practice differs between disciplines
      • Some don't consider their material to be data
      • Training and education to bridge the gap
•   Data is not and will never be located in the same place
      • DataBank, Subject repositories, Grid, offline, non-digital
      • Cataloguing & discovery but also acquisition, accession and forensics may be needed
•   Metadata standards development and adoption vary widely
      • Bioinformatics boasts 200+ standards for describing experiments
      • Tools like Elastic Search are essential
      • Support domain specific applications built over archives
      • Standards development and promotion at the other end of the spectrum
•   Data retention and metadata requirements vary
      • Funders' mandates vs unfunded research
      • Legal requirements (IPR vs FOI)
      • Citation requirements (DataCite)
•   Interoperability
      • Research Information Management (CERIF)
      • Research communities (Linked Open Data)
      • Libraries and Archives (OAI-XXX, SWORD2)
Training and Support
DataFinder
      •   Catalogue/registry of research data
            • Wherever and whatever it is!
            • OAI-PMH harvesting of external
               data stores
            • Manual record entry for non-
               electronic or non-harvestable data
      •   Search/browse interface
      •   DataReporter module
            • CERIF compatible
            • Analytics as well as content
               statistics
      •   Core Metadata schema based on
          DataCite
      •   Interfaces with many systems
            • “Hub” of RDM activity
      •   Hierarchical architecture
            • Local, subject-specific or
               inter-institutional catalogues
               possible
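
The OAI-PMH harvesting mentioned above can be sketched as follows: build a `ListRecords` request URL, then pull identifiers and titles out of the response. This is an illustration rather than DataFinder's harvester. The base URL and the sample response are made up; the verb, the `oai_dc` metadata prefix and the XML namespaces are those of OAI-PMH 2.0 and Dublin Core.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request, optionally limited to recent changes."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier and title from each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    return records


# Made-up response fragment standing in for an external data store.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
print(parse_records(SAMPLE))
```

A real harvester would also follow `resumptionToken` elements to page through large result sets; that is omitted here for brevity.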
It lives!
Metadata (again)
•   Citation
      • DataCite kernel: Creator, Title, Date, Publisher*, ID*
•   Discovery
      • The more the merrier. Domain specific metadata is great (if not very tractable)
•   Funder requirements
      • EPSRC: “Sufficient metadata should be recorded and made openly available to
          enable other researchers to understand the potential for further research and re-
          use of the data”
      • Meh!
•   Assessment of usefulness/value
•   Preservation
      • Some can be autogenerated
      • File format diversity can be a challenge
•   Reporting and Business Intelligence
      • Different standards like CERIF require crosswalks/mappings
•   Manual entry generally disliked
      • Import from existing systems (other repositories/research platforms)
      • Acquire from researcher interactions with other systems (DMP, DataStage, ORDS)
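
As an illustration of why the citation kernel is so small, the sketch below checks that the five kernel elements are present and formats them in the "Creator (PublicationYear): Title. Publisher. Identifier" pattern that DataCite documentation has recommended. The field names, the mapping of the slide's "Date" to a publication year, and the sample record are assumptions made for this example.

```python
# Kernel elements from the slide; "Date" is taken here as publication year.
KERNEL = ("creators", "title", "publication_year", "publisher", "identifier")


def format_citation(record):
    """Return a citation string, or raise if a kernel element is missing."""
    missing = [name for name in KERNEL if not record.get(name)]
    if missing:
        raise ValueError("missing kernel fields: " + ", ".join(missing))
    return "{} ({}): {}. {}. {}".format(
        "; ".join(record["creators"]),
        record["publication_year"],
        record["title"],
        record["publisher"],
        record["identifier"],
    )


# Illustrative record only; 10.5072 is a prefix reserved for testing.
example = {
    "creators": ["Researcher, A.", "Researcher, B."],
    "title": "Example dataset",
    "publication_year": 2012,
    "publisher": "University of Oxford",
    "identifier": "doi:10.5072/example",
}
print(format_citation(example))
```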
Minimum Core Data (WIP!)

Element | Details | Auto Gen | DataCite | Note
--- | --- | --- | --- | ---
Record/digital object ID | | UUID | M |
Location of dataset | URL/DOI | DataBank auto | | If no URL: contact details
[Medium] | Default: digital (+ non-digital) | | | To enable indication of non-digital data. Check box + options. On/offline
Creator (if not depositor) | Repeatable | WebAuth/OxDMP | M | If depositor, draw from WebAuth (see optional)
Creator affiliation (if not depositor) | Repeatable (see optional) | WebAuth/OxDMP | | If depositor, draw from WebAuth; CUD; imply subject
Title | | | M |
Publisher of data | Default: University of Oxford | Default | M |
Publication year | Default: current | Default | M | If an embargo period has been in effect, use the date when the embargo period ends
Access terms & conditions | Default + options | | |
Data owner | Default: Department | WebAuth/OxDMP | | For curation; ALT Name (Person or role) + Data owner contact. + Qu 'Do you own the rights for this data?' Need policy
Access date to data | Default: current | | | To set embargo
Rights for metadata | Default: CC0? ODC? | | |
[Subject] | FAST + options | | | Import where possible using available data. Encourage input. + K/w option. See Optional
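
The defaults in this table lend themselves to a small sketch: a record is created with as much filled in automatically as possible, and the publication year follows the embargo rule from the Note column. Field names and the function `build_core_record` are hypothetical. Only the defaults (UUID, University of Oxford, current year, digital medium) come from the table.

```python
import uuid
from datetime import date


def build_core_record(title, location, embargo_end=None, today=None):
    """Create a minimum core record, applying the table's defaults."""
    today = today or date.today()
    # Rule from the table: if an embargo applied, use the year it ends.
    year = embargo_end.year if embargo_end else today.year
    return {
        "id": str(uuid.uuid4()),            # auto-generated UUID
        "location": location,               # URL or DOI
        "medium": "digital",                # default medium
        "title": title,                     # must be supplied by the depositor
        "publisher": "University of Oxford",
        "publication_year": year,
        "access_date": embargo_end or today,
    }


record = build_core_record(
    "Example dataset",
    "doi:10.5072/example",                  # illustrative identifier
    embargo_end=date(2014, 6, 1),
    today=date(2012, 12, 10),
)
print(record["publisher"], record["publication_year"])
```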
Context Dependent Mandatory Metadata (WIP!)

Element | Details | Auto Gen | DataCite | EPSRC
--- | --- | --- | --- | ---
Funding agency | Multiple | OxDMP | | M
Grant number | Multiple | OxDMP | | M
Project information | Link to project web page/blog | | |
Last access request date | | Automatically determined | | M
Source | If imported record | Automatically determined | |
Source URL | If imported record | Automatically determined | |
Data generation process | Text or link to paper/document | | | M
Why the data was generated/Abstract/Brief description | Might be link to project page | | | M
Date | Repeatable; e.g. date (range) of data collection; format described in W3CDTF | | O | M
Reason for embargo | Repeatable; list options | | | [M]
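
"Context dependent" can be read as: which fields are mandatory depends on who funded the work. A minimal sketch of that check follows, using the EPSRC column above. The field names are hypothetical. Treating the bracketed [M] as "mandatory only when the data are embargoed" is an assumption, not something the slide states.

```python
# Fields marked M in the EPSRC column of the table.
EPSRC_MANDATORY = (
    "funding_agency",
    "grant_number",
    "last_access_request_date",
    "data_generation_process",
    "description",
    "date",
)


def missing_fields(record, funder=None):
    """List the mandatory fields that are empty for the given funder."""
    required = []
    if funder == "EPSRC":
        required.extend(EPSRC_MANDATORY)
        # Assumption: [M] means mandatory only if an embargo is in place.
        if record.get("embargoed"):
            required.append("reason_for_embargo")
    return [name for name in required if not record.get(name)]


# Illustrative record with only two fields filled in.
draft = {"funding_agency": "EPSRC", "grant_number": "EP/EXAMPLE/1"}
print(missing_fields(draft, funder="EPSRC"))
```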
Where Next?
•   Oxford DAMASC (Databank Archiving and Manuscript Submission Combined)
      • Bodleian and OUP: Data deposit into institutional data archive alongside publisher
        paper submission workflow with cross citation
•   Author identification project
      • Identity management across Libraries, CRIS, Publishers etc.
      • Based on sameas service – there will never be a single standard!
      • Privacy concerns
•   ViDaaS, DataBank and DataStage generating interest at a number of institutions
      • Transition to a more managed Open Source project arrangement
      • Sustainability model needs to be defined
      • Interoperability with wider spectrum of systems
•   DataBank/DataFinder Roadmap
      • Large file handling – just pass download details at the point of submission
          •  File can be acquired asynchronously in the background
      • Group management for DataFinder/DataBank - delegation and group administration
          •  Balance simplicity with requirements – challenge of mapping Oxford's org
             structure
•   Methodological publications (e.g. MyExperiment)
      • Bridge data and papers
      • Cover case where recreation cheaper than storage
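
The large-file idea on the roadmap (hand over download details at submission, fetch the file later in the background) can be sketched with a thread pool. This is a toy illustration under stated assumptions, not the planned DataBank design: the class `Acquirer`, its methods and the stand-in fetch function are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class Acquirer:
    """Accept download details immediately; fetch content in the background."""

    def __init__(self, fetch, workers=2):
        self._fetch = fetch                      # callable: url -> bytes
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}

    def submit(self, record_id, url):
        """Queue the download and return a receipt without waiting for it."""
        self._jobs[record_id] = self._pool.submit(self._fetch, url)
        return {"id": record_id, "source_url": url, "status": "pending"}

    def result(self, record_id, timeout=None):
        """Block until the content for this record has been acquired."""
        return self._jobs[record_id].result(timeout=timeout)

    def close(self):
        self._pool.shutdown(wait=True)


def fake_fetch(url):
    # Stand-in for a real HTTP download, so the sketch runs offline.
    return ("content of " + url).encode()


acquirer = Acquirer(fake_fetch)
receipt = acquirer.submit("rec-1", "http://files.example.org/big.dat")
print(receipt["status"])
print(acquirer.result("rec-1", timeout=5))
acquirer.close()
```

The receipt returned by `submit` is what a depositor would see at submission time; the archive can complete the record once the background fetch finishes.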

CNI Fall Meeting Coverage

  • 1. CNI FALL MEETING: December 10-11, 2012, Washington, DC The Service Family for Research Data at Oxford University Wolfram Horstmann & Neil Jefferies Contributors: Paul Jeffreys, Sally Rumsey, Neil Jefferies, David Shotton, Glenn Swafford, James Wilson, Wolfram Horstmann, and more
  • 2. The Research Data Family http://www.flickr.com/photos/barbourians/6152005267/ Simple – Helpful – Multi Agency – Reference-based
  • 3. Funders’ policies & Institutions http://www.flickr.com/photos/larry1732/4773431202/ RCUK – EPSRC – Wellcome – EC / Horizon 2020 – University Of Oxford
  • 4. Research Data vs. Open Access http://www.flickr.com/photos/dyle/7531848910 Different Animals: Scientific exploitation – Privacy – Security – but related…
  • 5. Research Data Management – Light We found a DataCite DOI for your publication! doi:10.1594/WDCC/CLM_C20_3_D3 Validate Change http://ora.ox.ac.uk/ You have a publication? Show me where the data are.
  • 6. Research Data Management – Light We found a DataCite DOI for your publication! doi:10.1594/WDCC/CLM_C20_3_D3 Validate Change http://ora.ox.ac.uk/ You have a publication? Show me where the data are.
  • 7. Research Data Management Services DataPlan DataFinder DataBank Training, Advice and Support ORDS DataStage http://www.admin.ox.ac.uk/rdm/ 5 Data Primitives: Inform, Plan, Work, Archive, Find
  • 8. Research Data Systems http://www.flickr.com/photos/natalielucier Over to Neil!
  • 9. RDM - Oxford History • 2008 Computing Services internal scoping study into data management requirements • 2008 Libraries set up DataBank adjunct to ORA • 2009-10 EIDCSR (Embedding Institutional Data Curation Services in Research) • OUCS, OULS, OeRC, Research Services, Computational Biology, Cardiac Mechano-Electric Feedback Group (JISC Funded) • Policy, processes, requirements • JISC/HEFCE (Universities Modernisation Fund) Projects • 2010-12 Sudamih/ViDaaS – Prototype/productionise Database-as-a- ServicesComputing Services • ORDS (Oxford Research Data Service) • 2010-12 Admiral/DataFlow – Prototype/productionise DataStage/DataBank Libraries, Computing Services, OeRC, IBRG, UKOLN, Canonical, Lightweight data management/archiving • DaMaRO (Data Management Rollout at Oxford) Integration, Training, Policy (JISC Funded) DataFinder data catalogue
  • 10. EIDCSR • Draft University Research Data Management Policy • RDM Portal • ‘Work Bench’ 3D Image visualisation software • Initial core RDM metadata schema (being revised) • Digital curation workflow module, with metadata and archiving client • DataFlow progenitor
  • 11. ORDS – Expunging MSAccess
  • 12. DataStage • “Sheer Curation” • Minimal metadata required • Enhancement supported • Lightweight, low-impact data management • Network drive & Web UI • Simple perrmissions: Personal/group/world • Designed for local or cloud deployment • Leverage existing infrastructure • Debian packages/OVF • SWORD2 deposit into DataBank (or anything else!)
  • 13. DataBank • Bodleian Data Repository (in dev since 2008) parallels ORA • “Data” currently defined as “Research outputs that don't fit in ORA” • File and metadata format agnostic • supports packages (zip & tar) • component subaddressing • Built on “FEDORA-Lite” object model • Assigns DataCite DOI's • Manages embargos • Secure, dark archive is segregated • Manual and SWORD2 deposit • REST API • Debian Packages or OVF
  • 14. DataPlan • Based on DCC DMPOnline tool • Create, save, submit and use data management plans • To accompany research grant applications • 20 questions guide the management and publication of data • Develop a simple DataCite- and CERIF-compliant Data Management Ontology • DMPs archived in Oxford DMPBank instance of the DataBank software • Captures metadata in advance of data deposit
  • 16. Diversity is the Key Challenge • Data management practice differs between disciplines • Some don't consider their material to be data • Training and education to bridge the gap • Data is not, and will never be, located in one place • DataBank, Subject repositories, Grid, offline, non-digital • Cataloguing & discovery but also acquisition, accession and forensics may be needed • Metadata standards development and adoption varies widely • Bioinformatics boasts 200+ standards for describing experiments • Tools like Elastic Search are essential • Support domain specific applications built over archives • Standards development and promotion at the other end of the spectrum • Data retention and metadata requirements vary • Funders' mandates vs unfunded research • Legal requirements (IPR vs FOI) • Citation requirements (DataCite) • Interoperability • Research Information Management (CERIF) • Research communities (Linked Open Data) • Libraries and Archives (OAI-XXX, SWORD2)
  • 18. DataFinder • Catalogue/registry of research data • Wherever and whatever it is! • OAI-PMH harvesting of external data stores • Manual record entry for non-electronic or non-harvestable data • Search/browse interface • DataReporter module • CERIF compatible • Analytics as well as content statistics • Core Metadata schema based on DataCite • Interfaces with many systems • “Hub” of RDM activity • Hierarchical architecture • Local, subject-specific or inter-institutional catalogues possible
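OAI-PMH harvesting of an external data store amounts to issuing a `ListRecords` request and reading the returned XML. The sketch below shows that shape using only the Python standard library. It is not DataFinder's harvester: the base URL, the sample response and the function names are made up for illustration, and resumption tokens and error responses are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract identifier, title and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        if header.get("status") == "deleted":
            continue  # deleted records carry no metadata
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
            "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
        })
    return records


# Hand-written sample response (hypothetical repository and records).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:creator>Researcher, A.</dc:creator>
          <dc:creator>Researcher, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:dataset-2</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>
"""

harvested = parse_records(SAMPLE_RESPONSE)
```

Records that cannot be harvested this way (non-electronic or non-harvestable data) would still need the manual entry route listed on the slide.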
  • 21. Metadata (again) • Citation • DataCite kernel: Creator, Title, Date, Publisher*, ID* • Discovery • The more the merrier. Domain specific metadata is great (if not very tractable) • Funder requirements • EPSRC: “Sufficient metadata should be recorded and made openly available to enable other researchers to understand the potential for further research and re-use of the data” • Meh! • Assessment of usefulness/value • Preservation • Some can be autogenerated • File format diversity can be a challenge • Reporting and Business Intelligence • Different standards like CERIF require crosswalks/mappings • Manual entry generally disliked • Import from existing systems (other repositories/research platforms) • Acquire from researcher interactions with other systems (DMP, DataStage, ORDS)
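The citation kernel and the need for crosswalks can be made concrete with a small sketch. The five kernel fields are the ones named on the slide; everything else is an assumption for illustration. In particular, the target field names in `EXAMPLE_CROSSWALK` are invented placeholders, not real CERIF element names.

```python
# The five citation fields named on the slide (DataCite kernel).
DATACITE_KERNEL = ("identifier", "creator", "title", "publisher", "date")

# Hypothetical crosswalk: the right-hand names are placeholders,
# NOT actual CERIF elements. A real mapping would come from the schema.
EXAMPLE_CROSSWALK = {
    "identifier": "product_id",
    "creator": "product_person",
    "title": "product_name",
    "publisher": "product_org_unit",
    "date": "product_date",
}


def missing_kernel_fields(record):
    """Return the kernel fields that are absent or empty in a record."""
    return [field for field in DATACITE_KERNEL if not record.get(field)]


def apply_crosswalk(record, mapping):
    """Rename the fields of a record according to a source-to-target mapping."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


record = {
    "identifier": "doi:10.1594/WDCC/CLM_C20_3_D3",
    "creator": "Researcher, A.",
    "title": "Example dataset",
    "date": "2012",
}
gaps = missing_kernel_fields(record)
mapped = apply_crosswalk(record, EXAMPLE_CROSSWALK)
```

A check like `missing_kernel_fields` is one way to show a depositor only the fields that could not be imported or autogenerated, which keeps manual entry to a minimum.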
  • 22. Minimum Core Data (WIP!) • Columns: Element; Auto Gen; DataCite; Note • Record/digital object ID: auto gen UUID; DataCite M • Location of dataset: URL/DOI, DataBank auto; if no URL: contact details • [Medium]: default digital (+ non-digital), on/offline; to enable indication of non-digital data; check box + options • Creator: WebAuth/OxDMP; DataCite M; if depositor, draw from WebAuth; (if not depositor) repeatable (see optional) • Creator affiliation: WebAuth/OxDMP; if depositor, draw from WebAuth, CUD; (if not depositor) repeatable (see optional); imply subject • Title: DataCite M • Publisher of data: default University of Oxford; DataCite M • Publication year: default current; DataCite M; if an embargo period has been in effect, use the date when the embargo period ends • Access terms & conditions: default + options • Data owner: default WebAuth/OxDMP; name (person or role) + department; for curation; ALT contact; + question 'Do you own the rights for this data?'; need policy • Access date to data: default current; to set embargo • Rights for metadata: default CC0? ODC? • [Subject]: FAST + options; import where possible using available data; encourage input; + keyword option; see Optional
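The defaults in the minimum core set lend themselves to automatic completion. The sketch below applies the ones stated on the slide: a UUID as record ID, University of Oxford as publisher, digital as the default medium, and a publication year taken from the embargo end date when there is an embargo and from the current date otherwise. The field and function names are assumptions for illustration, not DataFinder's schema.

```python
import uuid
from datetime import date


def apply_core_defaults(record, today=None, embargo_end=None):
    """Fill in defaulted core metadata without overwriting supplied values."""
    today = today or date.today()
    completed = dict(record)
    completed.setdefault("record_id", str(uuid.uuid4()))
    completed.setdefault("publisher", "University of Oxford")
    completed.setdefault("medium", "digital")
    # With an embargo, both the access date and the publication year
    # follow the date on which the embargo ends.
    available_from = embargo_end or today
    completed.setdefault("access_date", available_from.isoformat())
    completed.setdefault("publication_year", available_from.year)
    return completed


open_record = apply_core_defaults({"title": "Example dataset"}, today=date(2012, 12, 10))
embargoed = apply_core_defaults(
    {"title": "Example dataset"},
    today=date(2012, 12, 10),
    embargo_end=date(2014, 6, 1),
)
```

Using `setdefault` means values imported from WebAuth, OxDMP or another system always win over the defaults.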
  • 23. Context Dependent Mandatory Metadata (WIP!) • Columns: Element; Auto Gen; DataCite; EPSRC • Funding agency: OxDMP; multiple; M • Grant number: OxDMP; multiple; M • Project information: link to project web page/blog • Last access request date: automatically determined; M • Source: automatically determined; if imported record • Source URL: automatically determined; if imported record • Data generation process: text or link to paper/document; M • Brief description: why the data was generated/abstract; might be link to project page; M • Date: repeatable, e.g. date (range) of data collection; format described in W3CDTF; DataCite O, EPSRC M • Reason for embargo: repeatable; list options; [M]
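Since the Date element is specified as W3CDTF, a deposit form could check values before accepting them. The following is a minimal sketch of such a check. The regular expression covers the W3CDTF granularities (year, year-month, full date, and date-time with a mandatory time zone) but does not verify days per month, and treating `/` as the separator for a date range is an assumption borrowed from common DataCite practice rather than something the slide states.

```python
import re

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or a full date followed by
# Thh:mm[:ss[.s]] and a time zone designator (Z or +hh:mm / -hh:mm).
W3CDTF = re.compile(
    r"^\d{4}"
    r"(?:-(?:0[1-9]|1[0-2])"
    r"(?:-(?:0[1-9]|[12]\d|3[01])"
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d(?:\.\d+)?)?"
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d))?"
    r")?)?$"
)


def is_w3cdtf(value):
    """True if value is syntactically a W3CDTF date or date-time."""
    return bool(W3CDTF.match(value))


def is_w3cdtf_range(value):
    """True for a single W3CDTF value or a 'start/end' pair (assumed convention)."""
    parts = value.split("/")
    return len(parts) in (1, 2) and all(is_w3cdtf(part) for part in parts)
```

For example, `is_w3cdtf_range("2011-03/2012-09")` accepts a collection period given to month precision, while `is_w3cdtf("10/12/2012")` is rejected.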
  • 24. Where Next? • Oxford DAMASC (Databank Archiving and Manuscript Submission Combined) • Bodleian and OUP: Data deposit into institutional data archive alongside publisher paper submission workflow with cross citation • Author identification project • Identity management across Libraries, CRIS, Publishers etc. • Based on a sameAs service – there will never be a single standard! • Privacy concerns • ViDaaS, DataBank and DataStage generating interest at a number of institutions • Transition to a more managed Open Source project arrangement • Sustainability model needs to be defined • Interoperability with a wider spectrum of systems • DataBank/DataFinder Roadmap • Large file handling – just pass download details at the point of submission • File can be acquired asynchronously in the background • Group management for DataFinder/DataBank - delegation and group administration • Balance simplicity with requirements – challenge of mapping Oxford's org structure • Methodological publications (e.g. MyExperiment) • Bridge data and papers • Cover the case where recreation is cheaper than storage
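The large-file idea on the roadmap (hand over download details at submission, fetch the file later in the background) can be sketched with a thread pool. This is only an illustration of the pattern under stated assumptions: `DepositQueue`, its method names and the status strings are hypothetical, and the fetch function is injected so that the example runs without any network access.

```python
from concurrent.futures import ThreadPoolExecutor


class DepositQueue:
    """Accept deposits immediately and fetch their files in the background."""

    def __init__(self, fetch, max_workers=2):
        self._fetch = fetch  # callable taking a source URL, returning bytes
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, record_id, source_url):
        """Register the download details and return a receipt without waiting."""
        self._jobs[record_id] = self._pool.submit(self._fetch, source_url)
        return {"record_id": record_id, "status": "accepted"}

    def status(self, record_id):
        job = self._jobs[record_id]
        if not job.done():
            return "pending"
        return "failed" if job.exception() else "stored"

    def wait(self, record_id, timeout=None):
        """Block until the background fetch finishes (or times out)."""
        try:
            self._jobs[record_id].result(timeout=timeout)
        except Exception:
            pass  # the outcome is reported through status()
        return self.status(record_id)

    def close(self):
        self._pool.shutdown(wait=True)


def fake_fetch(source_url):
    """Stand-in for a real downloader, used only for this example."""
    if source_url.startswith("fail:"):
        raise IOError(f"cannot fetch {source_url}")
    return b"file contents"
```

The depositor gets a receipt straight away from `submit`, and the archive can report `pending`, `stored` or `failed` later, which is the behaviour the roadmap bullet describes.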