This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
SSHOC
Setting up Dataverse repository for
research data
Slava Tykhonov, Senior Information Scientist
DANS-KNAW, The Royal Netherlands Academy of
Arts and Sciences
LIBSENSE webinar
10 March 2021
About me: DANS-KNAW projects (2016-2021)
● CLARIAH+ (ongoing)
● EOSC Synergy (ongoing)
● SSHOC Dataverse (ongoing)
● CESSDA DataverseEU 2018
● Time Machine Europe Supervisor at DANS-KNAW
● PARTHENOS Horizon 2020
● CESSDA PID (Persistent Identifiers) Horizon 2020
● CLARIAH
● RDA (Research Data Alliance) PITTS Horizon 2020
● CESSDA SaW H2020-EU.1.4.1.1 Horizon 2020
Source: LinkedIn
About DANS-KNAW
DANS is the Dutch national centre of expertise and repository for research data.
We help researchers make their data available for reuse. This allows researchers
to use the data for new research and makes published research verifiable and
reproducible. With more than 150,000 datasets and a staff of 60, DANS is one of
the leading repositories in Europe.
Three pillars of the DANS 2021-2025 programme “Focus on FAIR”:
•Centre of expertise for FAIR research data
•Versatile data repository: DANS data stations
•Active collaborator
What is Dataverse?
● Open-source project developed by IQSS at Harvard University and published
on GitHub
● Mature product with a long history (since 2006)
● Dynamic and experienced development team working in an Agile
environment (community call every two weeks)
● Clear vision and understanding of research communities' requirements,
public roadmap
● A strong community behind Dataverse helps to improve the basic
functionality and develop it further
● Dataverse has been selected as data repository infrastructure by countries
from all continents
● Well-developed architecture with rich API endpoints for building application
layers around Dataverse
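As a hedged illustration of building an application layer on those API endpoints, the sketch below prepares a query for the Dataverse Search API (`/api/search`) and extracts results from its JSON response. The helper names are hypothetical, and the response layout (`data.items` with `name` and `global_id`) is an assumption that should be checked against the API guide of the installed version.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url, query, item_type="dataset", per_page=10):
    """Build a Search API URL for the given query string."""
    params = urllib.parse.urlencode(
        {"q": query, "type": item_type, "per_page": per_page}
    )
    return f"{base_url.rstrip('/')}/api/search?{params}"


def extract_items(payload):
    """Return (name, global_id) pairs from a decoded Search API response."""
    items = payload.get("data", {}).get("items", [])
    return [(item.get("name"), item.get("global_id")) for item in items]


def search(base_url, query):
    """Run the query over HTTP; needs network access to a Dataverse server."""
    with urllib.request.urlopen(build_search_url(base_url, query)) as response:
        return extract_items(json.load(response))
```

Calling `search("https://demo.dataverse.org", "survey")` against a reachable installation would return a list of dataset names with their persistent identifiers; the demo server address is only an example.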
Federated Dataverse data repositories worldwide
Source: Merce Crosas, Harvard Data Commons
DANS Dataverse 3.x migration (2016)
Basic DataverseNL services:
• Federated login for Netherlands
institutions
• Persistent Identifier Services (DOI and
handle)
• Integration with archival systems
Applications:
• Modern and historical world map
visualisations
• Data API and Geo API services for
projects with data
• Panel datasets constructor
• Time series plot
• Treemaps
• Pie and chart visualizations
• Descriptive statistics tools
DataverseNL collaborative data network
Source: https://dataverse.nl
DataverseNL partners
Cooperative Model DataverseNL
•DANS provides and manages the system & storage, organizes the meetings
(back-office)
•Research institutions run their own RDM support for their researchers / end users
(front-office)
•Every partner institute is responsible for their own data
•Shared costs (service membership + storage)
•Advisory Board consisting of partner representatives decides on the general
policy
•Administrators committee to discuss technical and functional issues
•Cooperation agreement, Service Level Agreement, Processor agreement
(GDPR), General Terms of use (end users)
Source: Marion Wittenberg, DataverseNL
DANS Data Stations - Future Data Services
Dataverse is API based and a key framework for Open Innovation!
Major challenges to provide services for researchers
● Maintenance concerns - who will be in charge after the project is finished?
● Infrastructure problems - how to install and run tools for researchers?
● Various interoperability issues - how to enable data exchange between
different systems and services?
Software updates and bug fixing, licences, technical staff training, legal aspects
and so on...
Dataverse Installation Manual
https://guides.dataverse.org/en/latest/installation/index.html
Dataverse Docker module (CESSDA Dataverse, 2018)
Source: https://github.com/IQSS/dataverse-docker
Dataverse Kubernetes
Project maintained by Oliver Bertuch (FZ Jülich) and available on the Global
Dataverse Community Consortium (GDCC) GitHub
Google Cloud, Amazon AWS, Microsoft Azure platforms supported
Open Source, community pull requests are welcome
http://github.com/IQSS/dataverse-kubernetes
SSHOC project task 5.2 Hosting and sharing data repositories
• Makes use of Dataverse software
• 4 ERICs: DARIAH, CLARIN, EHRIS and CESSDA
• Building mature infrastructure based on requirements of involved communities
• Developing external applications integrated with Dataverse (Dataverse Store)
• Investigating sustainable governance models
• Training Service Providers and institutes how to use Dataverse as a service
Data Commons is essential for integrations
Source: Merce Crosas, “Harvard Data Commons”
FAIR Dataverse
Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
Our goals to increase Dataverse interoperability
Provide a custom FAIR metadata schema for European research communities:
● CESSDA metadata (Consortium of European Social Science Data Archives)
● Component MetaData Infrastructure (CMDI) metadata from CLARIN
linguistics community
Connect metadata to ontologies and CVs:
● link metadata fields to common ontologies (Dublin Core, DCAT)
● define semantic relationships between (new) metadata fields (SKOS)
● select available external controlled vocabularies for the specific fields
● provide multilingual access to controlled vocabularies
SKOSMOS framework to discover ontologies
● SKOSMOS is developed in Europe by the National Library of Finland (NLF)
● active global user community
● search and browsing interface for SKOS concepts
● multilingual vocabulary support
● used for different use cases (publishing vocabularies, building
discovery systems, vocabulary visualization)
External CV support: a metadata field can be linked to many ontologies
Switching the language in Dataverse also switches the language of the vocabulary terms!
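The language switch can be pictured with a small sketch: a concept from an external controlled vocabulary carries one label per language, and the interface picks the label that matches the selected language. The concept URI, the labels and the function name below are purely illustrative assumptions, not the actual Dataverse implementation.

```python
# Hypothetical concept with multilingual labels, as a SKOS vocabulary
# might provide them (URI and labels are illustrative only).
CONCEPT = {
    "uri": "https://example.org/vocab/concept/1",
    "labels": {"en": "Demography", "nl": "Demografie", "fr": "Démographie"},
}


def pick_label(labels, language, fallback="en"):
    """Return the label for the requested language, with graceful fallback."""
    if language in labels:
        return labels[language]
    if fallback in labels:
        return labels[fallback]
    # Last resort: any available label, or None when there is none at all.
    return next(iter(labels.values()), None)
```

With this sketch, `pick_label(CONCEPT["labels"], "nl")` yields the Dutch term, while a language without a label falls back to English; the stored value in the metadata field would stay the concept URI in either case.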
Dataverse App Store
Let’s build different services out of tools!
Data preview: DDI Explorer, Spreadsheet/CSV, PDF, Text files, HTML,
Images, video render, audio, JSON, GeoJSON/Shapefiles/Map, XML
Interoperability: external controlled vocabularies (CESSDA CV Manager)
Data processing: NESSTAR DDI migration tool
Linked Data: RDF compliance (FAIR Data Point)
Federated login as a service (OAuth/Shibboleth in the same installation)
Dataverse Spreadsheet Previewer
Dataverse and CLARIN tools integration
Make Data Count metrics
Make Data Count is part of a broader Research Data Alliance (RDA) Data Usage Metrics
Working Group which helped to produce a specification called the COUNTER Code of
Practice for Research Data.
The following metrics can be downloaded directly from the DataCite hub for datasets hosted
by Dataverse installations:
● Total Views for a Dataset
● Unique Views for a Dataset
● Total Downloads for a Dataset
● Unique Downloads for a Dataset
● Citations for a Dataset (via Crossref)
The Dataverse Metrics API is a useful data source for BI tools used to monitor
the data landscape.
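A minimal sketch of how a monitoring script might address these per-dataset metrics is given below. It assumes the Make Data Count endpoint pattern `/api/datasets/:persistentId/makeDataCount/<metric>` and the metric names listed in `METRICS`; both should be verified against the API guide of the installed Dataverse version, and the function names are hypothetical.

```python
import urllib.parse

# Assumed metric names of the Dataverse Make Data Count API.
METRICS = (
    "viewsTotal",
    "viewsUnique",
    "downloadsTotal",
    "downloadsUnique",
    "citations",
)


def metric_url(base_url, persistent_id, metric):
    """Build the URL that reports one metric for one dataset."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric}")
    query = urllib.parse.urlencode({"persistentId": persistent_id})
    return (
        f"{base_url.rstrip('/')}/api/datasets/:persistentId"
        f"/makeDataCount/{metric}?{query}"
    )


def all_metric_urls(base_url, persistent_id):
    """Map every known metric name to its URL for the given dataset."""
    return {m: metric_url(base_url, persistent_id, m) for m in METRICS}
```

A BI tool could fetch each URL on a schedule and store the returned counts as a time series per dataset.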
Dataverse Metrics from 30+ repositories
Source: Metrics
Multilingual support
Dataverse SSHOC will run Weblate as a service for the user interface,
metadata schema and Solr translation.
We’ve developed an experimental but adjustable pipeline for multilingual
support that downloads and synchronizes all translations available in the
Dataverse Consortium GitHub and gives translators easy access to
keep all properties up to date.
Dataverse localization with Weblate
● service to connect files to Weblate in order to
translate them in a structured way
● several options for project visibility: accept
translations by the crowd, or only give access
to a select group of translators.
● Weblate indicates untranslated strings,
strings with failing checks, and strings that
need approval.
● when new strings are added with an upgrade
of Dataverse, Weblate can indicate which
strings are new and untranslated.
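The check for new and untranslated strings can be sketched outside Weblate as a comparison of two `.properties` bundles. This is a simplified illustration under stated assumptions: the parser below handles only single-line `key=value` entries (no line continuations or `:` separators), and the function names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def untranslated_keys(source_text, translation_text):
    """Return source keys that are missing or empty in the translation."""
    source = parse_properties(source_text)
    translation = parse_properties(translation_text)
    return sorted(key for key in source if not translation.get(key))
```

Running `untranslated_keys` on the English bundle of a new Dataverse release and an existing translation would list the keys a translator still has to handle after the upgrade.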
GUI translation with Weblate as a service
Source: SSHOC Weblate
Services in European Open Science Cloud (EOSC)
● EOSC requires at least maturity level 8
● we need the highest quality of software to be accepted as a service
● clear and transparent evaluation of services is essential
● evidence of technical maturity is the key to success
● a limited warranty makes it possible to stop out-of-warranty services
Applications maturity level
Every software package should follow the same CESSDA Maturity Model to
be accepted as a service.
Must have: k8s infrastructure with upstream Docker images, warranty
statement, documentation, unit tests, Selenium tests, Jenkins pipeline
A running demonstration service will allow you to create a connection to your
own Dataverse.
CI/CD pipeline with SQAaaS (S)
Diagram: git push → webhook → Jenkins pipeline (Jenkinsfile): git clone, run SQA (S), create Docker image, push to GCP container registry, Kubernetes deployment
1. Developer pushes code to GitHub
2. Jenkins receives a notification (build trigger)
3. Jenkins clones the workspace
4. (S) Runs SQA tests and a FAIRness check
5. (S) Issues a digital badge according to the results
6. (S) SQAaaS API triggers the appropriate workflow
7. Creates a Docker image on success
8. Pushes the new Docker image to the container registry
9. Updates the Kubernetes deployment
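The gating logic of these steps can be sketched as a small simulation: the image is built and deployed only when the quality checks succeed. This is an illustrative model of the flow, not the actual Jenkinsfile or the SQAaaS API; the function name and step labels are assumptions.

```python
def run_pipeline(sqa_passed):
    """Return the ordered list of steps executed for one push."""
    steps = [
        "receive webhook notification",
        "clone workspace",
        "run SQA tests and FAIRness check",
        # The badge reflects the results, so it is issued in both cases.
        f"issue digital badge ({'pass' if sqa_passed else 'fail'})",
        "trigger workflow via SQAaaS API",
    ]
    if not sqa_passed:
        # Quality gate: nothing is built or deployed after a failed check.
        steps.append("stop: quality gate failed")
        return steps
    steps += [
        "build docker image",
        "push image to container registry",
        "update kubernetes deployment",
    ]
    return steps
```

The design point is that deployment is unreachable without a passing quality check, which is what gives the digital badge its meaning as evidence of technical maturity.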
Source: EOSC Synergy project
Who is going to benefit from SSHOC Dataverse project?
• (SSH) institutes and researchers will be offered a Dataverse installation
on the cloud
• (SSH) institutes will be offered a Dataverse archive in a box solution for
their own purposes
• Many of the features to be developed in SSHOC will also benefit other
Dataverse installations / communities
All developments will be available for Dataverse community members!
Questions?
Slava Tykhonov (DANS-KNAW)
Senior Information Scientist
vyacheslav.tykhonov@dans.knaw.nl
Co-Chair:
Dataverse Working Group (WG) on Controlled Vocabularies and Ontologies
Dataverse WG on Registries
dataverse.org

Weitere ähnliche Inhalte

Was ist angesagt?

LOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: 3rd relase of the StackLOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: 3rd relase of the Stack
LOD2 Creating Knowledge out of Interlinked Data
 

Was ist angesagt? (20)

Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataverse
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataverse
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Project
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC,  Service QA and DataverseIntegration of WORSICA’s thematic service in EOSC,  Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloud
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloud
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
LOD2 Webinar: SIREn
LOD2 Webinar: SIREnLOD2 Webinar: SIREn
LOD2 Webinar: SIREn
 
LOD2 Webinar Series: CubeViz
LOD2 Webinar Series: CubeViz LOD2 Webinar Series: CubeViz
LOD2 Webinar Series: CubeViz
 
LOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: 3rd relase of the StackLOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: 3rd relase of the Stack
 
LOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and SparqlifyLOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and Sparqlify
 
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORELOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
 

Ähnlich wie Setting up Dataverse repository for research data

Data_Spaces_inputs_for_FIWARE_summit_Clara_Pezuela.pdf
Data_Spaces_inputs_for_FIWARE_summit_Clara_Pezuela.pdfData_Spaces_inputs_for_FIWARE_summit_Clara_Pezuela.pdf
Data_Spaces_inputs_for_FIWARE_summit_Clara_Pezuela.pdf
FIWARE
 
EU Data Spaces_Clara Pezuela.pptx
 EU Data Spaces_Clara Pezuela.pptx EU Data Spaces_Clara Pezuela.pptx
EU Data Spaces_Clara Pezuela.pptx
FIWARE
 

Ähnlich wie Setting up Dataverse repository for research data (20)

Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museum
 
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
 
SSHOC at EOSC-hub Week - SSH & eInfra Projects - Daan Broeder
SSHOC at EOSC-hub Week - SSH & eInfra Projects - Daan BroederSSHOC at EOSC-hub Week - SSH & eInfra Projects - Daan Broeder
SSHOC at EOSC-hub Week - SSH & eInfra Projects - Daan Broeder
 
SSH & eInfra projects
SSH & eInfra projectsSSH & eInfra projects
SSH & eInfra projects
 
2. EOSC-hub (Daan Broeder, CLARIN ERIC)
2. EOSC-hub (Daan Broeder, CLARIN ERIC)2. EOSC-hub (Daan Broeder, CLARIN ERIC)
2. EOSC-hub (Daan Broeder, CLARIN ERIC)
 
SSHOC General Presentation
SSHOC General PresentationSSHOC General Presentation
SSHOC General Presentation
 
Sshoc kick off meeting - Work package 3 Pitch - Daan Broeder - KNAW HuC/CLARI...
Sshoc kick off meeting - Work package 3 Pitch - Daan Broeder - KNAW HuC/CLARI...Sshoc kick off meeting - Work package 3 Pitch - Daan Broeder - KNAW HuC/CLARI...
Sshoc kick off meeting - Work package 3 Pitch - Daan Broeder - KNAW HuC/CLARI...
 
5. RDA community - Expectations from 14th plenary (Marieke Willems, Trust-IT)
5. RDA community - Expectations from 14th plenary (Marieke Willems, Trust-IT)5. RDA community - Expectations from 14th plenary (Marieke Willems, Trust-IT)
5. RDA community - Expectations from 14th plenary (Marieke Willems, Trust-IT)
 
PaNOSC and Research Data Management / Battery2030+ Initiative Workshop / 12 M...
PaNOSC and Research Data Management / Battery2030+ Initiative Workshop / 12 M...PaNOSC and Research Data Management / Battery2030+ Initiative Workshop / 12 M...
PaNOSC and Research Data Management / Battery2030+ Initiative Workshop / 12 M...
 
PaNOSC and ExPaNDS commitment to Open Science
PaNOSC and ExPaNDS commitment to Open SciencePaNOSC and ExPaNDS commitment to Open Science
PaNOSC and ExPaNDS commitment to Open Science
 
Data_Spaces_inputs_for_FIWARE_summit_Clara_Pezuela.pdf
Data_Spaces_inputs_for_FIWARE_summit_Clara_Pezuela.pdfData_Spaces_inputs_for_FIWARE_summit_Clara_Pezuela.pdf
Data_Spaces_inputs_for_FIWARE_summit_Clara_Pezuela.pdf
 
Overview of the Sustainability Plans of the ICT-29b) Projects
Overview of the Sustainability Plans of the ICT-29b) ProjectsOverview of the Sustainability Plans of the ICT-29b) Projects
Overview of the Sustainability Plans of the ICT-29b) Projects
 
Webinar@AIMS: How to practically support Open Access: Guidelines for Data Pro...
Webinar@AIMS: How to practically support Open Access: Guidelines for Data Pro...Webinar@AIMS: How to practically support Open Access: Guidelines for Data Pro...
Webinar@AIMS: How to practically support Open Access: Guidelines for Data Pro...
 
How to practically support Open Access: Guidelines for Data Providers of the ...
How to practically support Open Access: Guidelines for Data Providers of the ...How to practically support Open Access: Guidelines for Data Providers of the ...
How to practically support Open Access: Guidelines for Data Providers of the ...
 
SFScon22 - Roberto Monsorno - ERMES - EnviRonmental pollution Micromobility s...
SFScon22 - Roberto Monsorno - ERMES - EnviRonmental pollution Micromobility s...SFScon22 - Roberto Monsorno - ERMES - EnviRonmental pollution Micromobility s...
SFScon22 - Roberto Monsorno - ERMES - EnviRonmental pollution Micromobility s...
 
Exposing EO Linked (meta-)Data from OpenSearch Catalogue
Exposing EO Linked (meta-)Data from OpenSearch CatalogueExposing EO Linked (meta-)Data from OpenSearch Catalogue
Exposing EO Linked (meta-)Data from OpenSearch Catalogue
 
DataverseEU as multilingual repository
DataverseEU as multilingual repositoryDataverseEU as multilingual repository
DataverseEU as multilingual repository
 
DataverseEU: Building Multilingual infrastructure for the Social Sciences in...
DataverseEU: Building Multilingual infrastructure  for the Social Sciences in...DataverseEU: Building Multilingual infrastructure  for the Social Sciences in...
DataverseEU: Building Multilingual infrastructure for the Social Sciences in...
 
DANS Data Trail Data Management Tools for Archaeologists
DANS Data Trail Data Management Tools for ArchaeologistsDANS Data Trail Data Management Tools for Archaeologists
DANS Data Trail Data Management Tools for Archaeologists
 
EU Data Spaces_Clara Pezuela.pptx
 EU Data Spaces_Clara Pezuela.pptx EU Data Spaces_Clara Pezuela.pptx
EU Data Spaces_Clara Pezuela.pptx
 

Mehr von vty

Mehr von vty (7)

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
Development in Dataverse SSHOC project
Development in Dataverse SSHOC projectDevelopment in Dataverse SSHOC project
Development in Dataverse SSHOC project
 

KĂźrzlich hochgeladen

Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
SĂŠrgio Sacani
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
levieagacer
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 

KĂźrzlich hochgeladen (20)

Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to Viruses
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai YoungDubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
chemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdfchemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdf
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 

Setting up Dataverse repository for research data

  • 1. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 SSHOC Setting up Dataverse repository for research data Slava Tykhonov, Senior Information Scientist DANS-KNAW, The Royal Netherlands Academy of Arts and Sciences LIBSENSE webinar 10 March 2021
  • 2. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 About me: DANS-KNAW projects (2016-2021) ● CLARIAH+ (ongoing) ● EOSC Synergy (ongoing) ● SSHOC Dataverse (ongoing) ● CESSDA DataverseEU 2018 ● Time Machine Europe Supervisor at DANS-KNAW ● PARTHENOS Horizon 2020 ● CESSDA PID (Personal Identifiers) Horizon 2020 ● CLARIAH ● RDA (Research Data Alliance) PITTS Horizon 2020 ● CESSDA SaW H2020-EU.1.4.1.1 Horizon 2020 2 Source: LinkedIn
  • 3. About DANS-KNAW DANS is the Dutch national centre of expertise and repository for research data. We help researchers make their data available for reuse. This allows researchers to use the data for new research and makes published research verifiable and reproducible. With more than 150,000 datasets and a staff of 60, DANS is one of the leading repositories in Europe. Three pillars of the DANS 2021-2025 programme ‘Focus on FAIR’: • Centre of expertise for FAIR research data • Versatile data repository: DANS data stations • Active collaborator
  • 4. What is Dataverse? ● Open source project developed by IQSS at Harvard University and published on GitHub ● Great product with a very long history (since 2006) ● Very dynamic and experienced development team working in an Agile environment (community call scheduled every two weeks) ● Clear vision and understanding of research communities' requirements, public roadmap ● A strong community behind Dataverse helps to improve the basic functionality and develop it further ● Dataverse has been selected as a data repository infrastructure by countries from all continents ● Well developed architecture with rich API endpoints to build application layers around Dataverse
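Slide 4 mentions rich API endpoints for building application layers around Dataverse. As a minimal sketch of what that can look like, the snippet below builds a request URL for the Dataverse Search API (`/api/search`). The server `https://demo.dataverse.org` and the query string are assumptions chosen for illustration, and the helper name `build_search_url` is hypothetical; the snippet only constructs the URL and does not contact any server.

```python
from urllib.parse import urlencode


def build_search_url(server, query, item_type="dataset", per_page=10):
    """Return a Dataverse Search API URL for the given query.

    `server` is the base URL of an installation (assumed here, not taken
    from the slides); `item_type` restricts results to one object type.
    """
    params = {"q": query, "type": item_type, "per_page": per_page}
    return f"{server.rstrip('/')}/api/search?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative call against the public demo installation (assumption).
    print(build_search_url("https://demo.dataverse.org", "climate change"))
```

An application layer would typically fetch this URL with an HTTP client and read the JSON response; that part is left out so the sketch stays runnable offline.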
  • 5. Federated Dataverse data repositories worldwide Source: Merce Crosas, Harvard Data Commons
  • 6. DANS Dataverse 3.x migration (2016) Basic DataverseNL services: • Federated login for Netherlands institutions • Persistent Identifier Services (DOI and Handle) • Integration with archival systems Applications: • Modern and historical world maps visualisations • Data API and Geo API services for projects with data • Panel datasets constructor • Time series plot • Treemaps • Pie and chart visualizations • Descriptive statistics tools
  • 7. DataverseNL collaborative data network Source: https://dataverse.nl
  • 8. DataverseNL partners
  • 9. Cooperative Model DataverseNL • DANS provides and manages the system & storage, organizes the meetings (back-office) • Research institutions run their own RDM support for their researchers / end users (front-office) • Every partner institute is responsible for its own data • Shared costs (service membership + storage) • Advisory Board consisting of partner representatives decides on the general policy • Administrators committee to discuss technical and functional issues • Cooperation agreement, Service Level Agreement, Processor agreement (GDPR), General Terms of use (end users) Source: Marion Wittenberg, DataverseNL
  • 10. DANS Data Stations - Future Data Services Dataverse is API based and a key framework for Open Innovation!
  • 11. Major challenges to provide services for researchers ● Maintenance concerns - who will be in charge after the project is finished? ● Infrastructure problems - how to install and run tools for researchers? ● Various interoperability issues - how to leverage data exchange between different systems and services? Software updates and bug fixing, licences, technical staff training, legal aspects and so on...
  • 12. Dataverse Installation Manual https://guides.dataverse.org/en/latest/installation/index.html
  • 13. Dataverse Docker module (CESSDA Dataverse, 2018) Source: https://github.com/IQSS/dataverse-docker
  • 14. Dataverse Kubernetes Project maintained by Oliver Bertuch (FZ Jülich) and available on the Global Dataverse Community Consortium (GDCC) GitHub. Google Cloud, Amazon AWS and Microsoft Azure platforms supported. Open source, community pull requests are welcome. http://github.com/IQSS/dataverse-kubernetes
  • 16. SSHOC project task 5.2 Hosting and sharing data repositories • Makes use of Dataverse software • 4 ERICs: DARIAH, CLARIN, EHRIS and CESSDA • Building mature infrastructure based on requirements of involved communities • Developing external applications integrated with Dataverse (Dataverse Store) • Investigating sustainable governance models • Training Service Providers and institutes how to use Dataverse as a service
  • 17. Data Commons is essential for integrations Source: Merce Crosas, “Harvard Data Commons”
  • 18. FAIR Dataverse Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
  • 19. Our goals to increase Dataverse interoperability Provide a custom FAIR metadata schema for European research communities: ● CESSDA metadata (Consortium of European Social Science Data Archives) ● Component MetaData Infrastructure (CMDI) metadata from the CLARIN linguistics community Connect metadata to ontologies and CVs: ● link metadata fields to common ontologies (Dublin Core, DCAT) ● define semantic relationships between (new) metadata fields (SKOS) ● select available external controlled vocabularies for the specific fields ● provide multilingual access to controlled vocabularies
  • 20. SKOSMOS framework to discover ontologies ● SKOSMOS is developed in Europe by the National Library of Finland (NLF) ● active global user community ● search and browsing interface for SKOS concepts ● multilingual vocabularies support ● used for different use cases (publish vocabularies, build discovery systems, vocabulary visualization)
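Slide 20 describes SKOSMOS as a search interface for SKOS concepts. The sketch below shows how a client might query its REST search endpoint (`/rest/v1/search`) and pull concept URIs and labels out of the JSON reply. The host `https://skosmos.example.org`, the sample payload and both helper names are hypothetical; the response shape (a `results` list with `uri` and `prefLabel`) is an assumption based on the Skosmos REST API and should be checked against the instance in use.

```python
import json
from urllib.parse import urlencode


def skosmos_search_url(base, query, lang="en", vocab=None):
    """Build a Skosmos REST search URL; `vocab` optionally limits the vocabulary."""
    params = {"query": query, "lang": lang}
    if vocab:
        params["vocab"] = vocab
    return f"{base.rstrip('/')}/rest/v1/search?{urlencode(params)}"


def extract_concepts(response_text):
    """Return (uri, prefLabel) pairs from a search response body."""
    payload = json.loads(response_text)
    return [(r["uri"], r["prefLabel"]) for r in payload.get("results", [])]


# Hand-written sample reply (assumption), standing in for a real HTTP response.
SAMPLE_RESPONSE = json.dumps(
    {"results": [{"uri": "http://example.org/c/1", "prefLabel": "migration", "lang": "en"}]}
)

if __name__ == "__main__":
    print(skosmos_search_url("https://skosmos.example.org", "migration"))
    print(extract_concepts(SAMPLE_RESPONSE))
```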
  • 21. External CV support: a metadata field could be linked to many ontologies. Switching the language in Dataverse changes the language of the terms!
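The language switch on slide 21 can be pictured as a lookup on a concept's multilingual labels. This is a minimal sketch, assuming a concept is held as a dictionary with one SKOS-style `prefLabel` per language code; the concept URI, the labels and the function `label_for` are made up for illustration and are not Dataverse's internal data model.

```python
# Hypothetical concept with one preferred label per language code.
CONCEPT = {
    "uri": "http://example.org/vocab/topic/42",
    "prefLabel": {"en": "Migration", "nl": "Migratie", "fi": "Muuttoliike"},
}


def label_for(concept, lang, fallback="en"):
    """Pick the label for `lang`, then the fallback language, then the URI."""
    labels = concept["prefLabel"]
    return labels.get(lang) or labels.get(fallback) or concept["uri"]


if __name__ == "__main__":
    for ui_language in ("en", "nl", "fr"):
        print(ui_language, "->", label_for(CONCEPT, ui_language))
```

Storing the concept URI in the metadata and resolving the label at display time is what lets the same record show different terms per interface language.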
  • 22. Dataverse App Store Let’s build different services out of tools! Data preview: DDI Explorer, Spreadsheet/CSV, PDF, Text files, HTML, Images, video render, audio, JSON, GeoJSON/Shapefiles/Map, XML Interoperability: external controlled vocabularies (CESSDA CV Manager) Data processing: NESSTAR DDI migration tool Linked Data: RDF compliance (FAIR Data Point) Federated login as a service (OAuth/Shibboleth in the same installation)
  • 23. Dataverse Spreadsheet Previewer
  • 24. Dataverse and CLARIN tools integration
  • 25. Make Data Count metrics Make Data Count is part of a broader Research Data Alliance (RDA) Data Usage Metrics Working Group which helped to produce a specification called the COUNTER Code of Practice for Research Data. The following metrics can be downloaded directly from the DataCite hub for datasets hosted by Dataverse installations: ● Total Views for a Dataset ● Unique Views for a Dataset ● Total Downloads for a Dataset ● Unique Downloads for a Dataset ● Citations for a Dataset (via Crossref) The Dataverse Metrics API is a powerful source for BI tools used for Data Landscape monitoring.
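The per-dataset metrics listed on slide 25 are also exposed, to the best of our knowledge, through Dataverse's native API under `/api/datasets/:persistentId/makeDataCount/<metric>`. The sketch below builds those URLs for one dataset. The server, the DOI `doi:10.5072/FK2/ABCDEF` and the helper `mdc_url` are placeholders for illustration; the metric names are taken from the Dataverse API guide as we recall it and should be verified against the installed version.

```python
from urllib.parse import urlencode

# Metric names as we understand them from the Dataverse native API guide.
MDC_METRICS = ("viewsTotal", "viewsUnique", "downloadsTotal", "downloadsUnique", "citations")


def mdc_url(server, doi, metric):
    """Build the Make Data Count URL for one metric of one dataset."""
    if metric not in MDC_METRICS:
        raise ValueError(f"unknown metric: {metric}")
    query = urlencode({"persistentId": doi})
    return f"{server.rstrip('/')}/api/datasets/:persistentId/makeDataCount/{metric}?{query}"


if __name__ == "__main__":
    # Placeholder server and DOI (assumptions, not a real dataset).
    for name in MDC_METRICS:
        print(mdc_url("https://demo.dataverse.org", "doi:10.5072/FK2/ABCDEF", name))
```

A BI dashboard could poll these URLs on a schedule and store the returned counts as a time series.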
  • 26. Dataverse Metrics from 30+ repositories Source: Metrics
  • 27. Multilingual support Dataverse SSHOC will run Weblate as a service for the user interface, metadata schema and Solr translation. We’ve developed an experimental but adjustable pipeline for multilingual support that downloads and synchronizes all translations available on the Dataverse Consortium GitHub and gives translators easy access to keep all properties up to date.
  • 28. Dataverse localization with Weblate ● service to connect files to Weblate in order to translate them in a structured way ● several options for project visibility: accept translations by the crowd, or only give access to a select group of translators ● Weblate indicates untranslated strings, strings with failing checks, and strings that need approval ● when new strings are added with an upgrade of Dataverse, Weblate can indicate which strings are new and untranslated
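Slide 28 notes that Weblate flags strings that are new and untranslated after a Dataverse upgrade. The underlying comparison can be sketched in a few lines: parse the source and the translated `.properties` content and list the keys that have no translation yet. This is a simplified illustration, not Weblate's implementation; the parser ignores line continuations and escapes, and the keys and strings in the samples are invented.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def untranslated_keys(source, translation):
    """Return source keys that are missing or empty in the translation."""
    return sorted(key for key in source if not translation.get(key))


# Invented sample bundles: the English source gained two strings in an upgrade.
SOURCE_EN = "# after upgrade\ndataset.title=Title\ndataset.share=Share\ndataset.cite=Cite\n"
TRANSLATION_NL = "dataset.title=Titel\ndataset.share=\n"

if __name__ == "__main__":
    missing = untranslated_keys(parse_properties(SOURCE_EN), parse_properties(TRANSLATION_NL))
    print(missing)
```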
  • 29. GUI translation with Weblate as a service Source: SSHOC Weblate
  • 30. Services in European Open Science Cloud (EOSC) ● EOSC requires at least maturity level 8 ● we need the highest quality of software to be accepted as a service ● clear and transparent evaluation of services is essential ● the evidence of technical maturity is the key to success ● the limited warranty will allow out-of-warranty services to be stopped
  • 31. Applications maturity level Every software package should follow the same CESSDA Maturity Model to be accepted as a service. Must have: k8s infrastructure with upstream Docker images, warranty statement, documentation, unit tests, Selenium tests, Jenkins pipeline. A running demonstration service will allow you to connect it to your own Dataverse.
  • 32. CI/CD pipeline with SQAaaS (S) [Diagram: git push, webhook, git clone, Jenkins pipeline (Jenkinsfile), Run SQA, create Docker image, push to GCP container registry, Kubernetes deployment] 1. Developer pushes code to GitHub 2. Jenkins receives notification (build trigger) 3. Jenkins clones the workspace 4. (S) Runs SQA tests and does FAIRness check 5. (S) Issues digital badge according to the results 6. (S) SQAaaS API triggers appropriate workflow 7. Creates Docker image on success 8. Pushes new Docker image to container registry 9. Updates the Kubernetes deployment Source: EOSC Synergy project
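The nine steps on slide 32 amount to a gated pipeline: the SQA stage always runs, and the build and deployment stages only follow when it succeeds. The sketch below models that control flow as plain Python so the ordering is easy to see. It is an illustration under that reading of the slide, not the project's Jenkinsfile; the function `run_pipeline` and the step wording are assumptions.

```python
def run_pipeline(sqa_passed):
    """Return the ordered list of steps executed for one push.

    Steps 1-6 always run; steps 7-9 (image build, registry push,
    deployment update) run only when the SQA stage succeeded.
    """
    steps = [
        "developer pushes code to GitHub",
        "Jenkins receives webhook notification",
        "Jenkins clones the workspace",
        "SQAaaS runs SQA tests and FAIRness check",
        "SQAaaS issues digital badge",
        "SQAaaS API triggers workflow",
    ]
    if sqa_passed:
        steps += [
            "create Docker image",
            "push Docker image to container registry",
            "update Kubernetes deployment",
        ]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(run_pipeline(sqa_passed=True), start=1):
        print(f"{number}. {step}")
```

Keeping the quality gate in front of the image build means a failing SQA run never reaches the container registry or the cluster.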
  • 33. Who is going to benefit from SSHOC Dataverse project? • (SSH) institutes and researchers will be offered a Dataverse installation in the cloud • (SSH) institutes will be offered a Dataverse archive-in-a-box solution for their own purposes • Many of the features to be developed in SSHOC will also benefit other Dataverse installations / communities All developments will be available for Dataverse community members!
  • 34. Questions? Slava Tykhonov (DANS-KNAW) Senior Information Scientist vyacheslav.tykhonov@dans.knaw.nl Co-Chair: Dataverse Working Group (WG) on Controlled Vocabularies and Ontologies Dataverse WG on Registries dataverse.org