SlideShare ist ein Scribd-Unternehmen logo
1 von 22
www.eosc-synergy.euwww.eosc-synergy.eu
Technical Integration of Data
Repositories: status and
challenges
28/01/2020
Isabel Bernal (DIGITAL.CSIC)
Slava Tykhonov (DANS-KNAW)
Fernando Aguilar (CSIC)
www.eosc-synergy.eu
Outline
• Repositories
• Previous Integration initiatives
• Technical barriers and challenges
• Potential integration
www.eosc-synergy.eu
Repositories: DIGITAL.CSIC enabling Open
Science
● CSIC Green Open Access/Open Data Mandate
since April 2019
● Repository’s Data Policy since 2014: nearly
12,000 datasets, training to CSIC researchers
and librarians, support to comply with funders
and journals data sharing policies
● Involvement in several EOSC / FAIR Data
initiatives
● DOI assignation to datasets, software,
notebooks, preprints and other outputs via
DataCite
● Member of DataCite Metadata Working Group
● Current aggregation with OpenAIRE, CORE,
SHARE, DataCite Search, BASE…
● Registered in Re3Data and Repository Finder
www.eosc-synergy.eu
DSpace-CRIS state of art
• Publications and datasets (organized in communities and collections) supported in standard
DSpace
• DSpace-CRIS extends this functionality and involves the other entities that are part of the
research landscape:
• Researchers
• Projects
• Organization Units (Groups, Departments)
• Second Level Dynamic Objects
● DSpace 5.x has useful software integrations like CKAN and ORCID and data integrations
(Archivematica) and can be deployed from Docker images
www.eosc-synergy.eu
SSHOC Dataverse
Makes use of Dataverse software developed by Harvard IQSS
4 ERICs: DARIAH, CLARIN, EHRIS and CESSDA
Building mature infrastructure based on requirements of involved
EOSC communities (Docker and Kubernetes)
Investigating sustainable governance models
Training Service Providers and institutes how to use Dataverse as a
service
www.eosc-synergy.eu
Different levels of repositories integration
Metadata integration
● aggregation by OAI-PMH and ResourceSync
Software integration
● data repository provides functionality delivered by another services
(DSPACE-CRIS with CKAN and ORCID support)
Data integration
● controlled vocabularies support (COAR, OpenAire, LOC, FundRef)
Data archiving
● Datasets with files can be moved from one repository to another by
SWORD protocol (for example, from DSpace-CRIS to the long term
archive like Archivematica)
www.eosc-synergy.eu
Previous Integration initiatives
• OAI-PMH, Resource sync (THIS ONE IS EMERGING, TRYING
TO REPLACE OAI)
• Aggregators
• OpenAIRE
• ...
www.eosc-synergy.eu
Previous Integration initiatives - Metadata
OAI-PMH
Open Archives Initiative Protocol for Metadata Harvesting
6 Actions. Any metadata schema (pre-defined). Simple but
enough.
Used by aggregators. Information about the resource, but not the
resource itself. (there is a further development which is OAI-
ORE)
www.eosc-synergy.eu
Previous Integration initiatives - Metadata
ResourceSync
ResourceSync supports synchronization of both Resources and
Metadata about Resources with the relationships clearly
indicated.
Tracks the status of a resource (updated, deleted).
Access to the resource.
www.eosc-synergy.eu
Aggregators - Integration of repos
OpenAIRE explorer
CORE
Google Dataset Search
DataCite Search
SHARE
www.eosc-synergy.eu
OpenAIRE
OpenAIRE has grown through a series of project phases funded
by the European Commission: from the DRIVER projects to link
Europe’s repository infrastructure, to the first OpenAIRE project
aimed to assist the EC in implementing its initial pilot for Open
Access (OA) to publications.
Services:
• Explorer (aggregator)
• Monitor (compliance with EC Open Access/Open Data
Mandate)
• Scholix - Scholexplorer (links papers-datasets)
• AMNESIA
• DMPs
www.eosc-synergy.eu
WP3.3 FAIR data integration
● Thematic services usually keeping data inside of its own database, not storing in some data
repository with persistence identifiers. Data derivatives often located in the temporary
storage, aren’t sustainable and can be cleaned (Findable)
● Metadata schema varies in the different scientific communities, should be supported by all
data repositories, however citation block should be common (Accessible)
● controlled vocabularies (CV)/ontologies are different, should be standardized and preferably
provided by common CV services maintained by EOSC (Interoperable)
● most of metadata contain descriptions in the native language of researcher, should be
translated to English and linked to common CV (Interoperable)
● most of Thematic services don’t have provenance information exposed as PROV-O or other
standard required to be stored in datasets together with metadata and data files (Reusable)
● different types of licenses for data access should be supported by all EOSC data
repositories, sensitive data should be handled differently (Reusable)
www.eosc-synergy.eu
WP3 Software integration challenges
● The maintenance of the distributed applications with external services
is very difficult and expensive
● requires the highest level of service maturity
● increasing the code coverage does not necessarily lead to more
functionality coverage
● writing integration tests even more important than adding more unit
tests
● it’s almost not possible to run distributed services without help from
community
We need Docker to increase the maturity of the infrastructure
www.eosc-synergy.eu
Docker advantages
● Faster development and deployments
● Isolation of running containers allows to scale up apps
● Portability saves time to run the same image on the local
computer or in the cloud
● Snapshotting allows to archive Docker images state
● Resource limitation can be adjusted
● Increasing reproducibility
SQA Service maturity requires Docker on Kubernetes!
www.eosc-synergy.eu
Still, there are some technical barriers...
● The maturity of software and services is different, common
baseline in the development state (WP3)
● Maintenance of the ecosystem consisting of variety of tools
and services is complicated and consumes a lot of human
resources (upgrade of servers, bug fixing, security updates)
● fast technological development requires continuous training
and knowledge transfer
www.eosc-synergy.eu
Service for Quality Assurance
Selenium IDE allows
to create and replay
all UI tests in your
browser
Shared tests can be
reused by
community to
increase
reproducibility
SQA for the service maturity = unit tests + integration tests
www.eosc-synergy.eu
WP3.3 FAIR baseline on the technical integration
Integration of data repositories in EOSC means:
● Findable: data have to be findable in a standard way by the users and
other core and thematic services
● Accessible: data accessible via standard interfaces supported in EOSC
● Interoperable: data combinable with other data in EOSC repositories
● Reusable: data exploitable by the EOSC services enabling data
analysis to be performed using resources and services from the EOSC
infrastructure providers, keeping datasets versions and provenance.
(from EOSC-Synergy proposal)
www.eosc-synergy.eu
WP3.3 FAIR adoption
Evaluation of FAIRness of data:
● check if PID exists (F)
● metadata accessible via standard
protocols (A)
● shared vocabularies used (I)
● machine readable data contain
provenance information (R)
Outcome: “EOSC-ready” badge
should be released by SQA service
for all other services suitable to be
delivered under EOSC
www.eosc-synergy.eu
WP3.1 Maturity of thematic services
Thematic services should:
● follow common SQA baseline for Software and Services
● increase the maturity by adding unit and integration tests,
preferably by community
● improving Cloud infrastructure both horizontally and
vertically requires a good testing strategy
www.eosc-synergy.eu
Potential Integration
Integration of repositories metadata - data
Enrich repositories services
Focused on:
• FAIRness
• Metadata and data formats flexibility
• Connection to software
• Connection to computing
www.eosc-synergy.eu
Examples - FAIRness evaluation
WP3 - WP4
Automatic or semi-automatic FAIRness evaluation
Based on recommendations (e.g. FAIRsFAIR)
Some solutions available: #FAIRevaluator
www.eosc-synergy.eu
Example - Jupyter integration
• Search and retrieve data/software from repositories.
• Ensure reproducibility.
• Workflows
• Publication of results.
• Etc.

Weitere ähnliche Inhalte

Was ist angesagt?

Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataversevty
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...vty
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7vty
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataversevty
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes vty
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...vty
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC,  Service QA and DataverseIntegration of WORSICA’s thematic service in EOSC,  Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataversevty
 
CLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesCLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesVyacheslav Tykhonov
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Projectvty
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligencevty
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...Vyacheslav Tykhonov
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)vty
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2vty
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...vty
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloudvty
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloudvty
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunitiesvty
 

Was ist angesagt? (20)

Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataverse
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataverse
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC,  Service QA and DataverseIntegration of WORSICA’s thematic service in EOSC,  Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
 
CLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesCLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemes
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Project
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloud
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloud
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
LOD2 Webinar: SIREn
LOD2 Webinar: SIREnLOD2 Webinar: SIREn
LOD2 Webinar: SIREn
 
LOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: 3rd relase of the StackLOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: 3rd relase of the Stack
 
LOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and SparqlifyLOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and Sparqlify
 

Ähnlich wie Technical integration of data repositories status and challenges

Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud ComputingDavid Wallom
 
EOSC-hub service portfolio
EOSC-hub service portfolioEOSC-hub service portfolio
EOSC-hub service portfolioEOSC-hub project
 
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubCloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubBjörn Backeberg
 
NextGEOSS Webinar - Cloud APIs
NextGEOSS Webinar - Cloud APIsNextGEOSS Webinar - Cloud APIs
NextGEOSS Webinar - Cloud APIsterradue
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...David Wallom
 
EUDAT Collaborative Data Infrastructure: Data Access and Re-use Service Area
EUDAT Collaborative Data Infrastructure: Data Access and Re-use Service AreaEUDAT Collaborative Data Infrastructure: Data Access and Re-use Service Area
EUDAT Collaborative Data Infrastructure: Data Access and Re-use Service AreaEUDAT
 
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Blue BRIDGE
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesASIS&T
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair" OpenAIRE
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryCarole Goble
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overviewBigData_Europe
 
Mobility and federation of Cloud computing
Mobility and federation of Cloud computingMobility and federation of Cloud computing
Mobility and federation of Cloud computingDavid Wallom
 
{code} and Containers - Open Source Infrastructure within Dell Technologies
{code} and Containers - Open Source Infrastructure within Dell Technologies{code} and Containers - Open Source Infrastructure within Dell Technologies
{code} and Containers - Open Source Infrastructure within Dell TechnologiesThe {code} Team
 
The EOSC Compute Platform with the EGI-ACE project
The EOSC Compute Platform with the EGI-ACE project The EOSC Compute Platform with the EGI-ACE project
The EOSC Compute Platform with the EGI-ACE project EGI Federation
 
Using a Widely Distributed Federated Cloud System to Support Multiple Dispara...
Using a Widely Distributed Federated Cloud System to Support Multiple Dispara...Using a Widely Distributed Federated Cloud System to Support Multiple Dispara...
Using a Widely Distributed Federated Cloud System to Support Multiple Dispara...David Wallom
 
Introduction to LoCloud
Introduction to LoCloud Introduction to LoCloud
Introduction to LoCloud locloud
 
OCCIware & Linked Data prototype OW2Con@POSS
OCCIware & Linked Data prototype OW2Con@POSSOCCIware & Linked Data prototype OW2Con@POSS
OCCIware & Linked Data prototype OW2Con@POSSMarc Dutoo
 

Ähnlich wie Technical integration of data repositories status and challenges (20)

Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud Computing
 
Service Integration to Enhance RDM
Service Integration to Enhance RDMService Integration to Enhance RDM
Service Integration to Enhance RDM
 
RDM@Edinburgh_interoperation_IDCC2015
RDM@Edinburgh_interoperation_IDCC2015RDM@Edinburgh_interoperation_IDCC2015
RDM@Edinburgh_interoperation_IDCC2015
 
EOSC-hub service portfolio
EOSC-hub service portfolioEOSC-hub service portfolio
EOSC-hub service portfolio
 
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubCloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
 
NextGEOSS Webinar - Cloud APIs
NextGEOSS Webinar - Cloud APIsNextGEOSS Webinar - Cloud APIs
NextGEOSS Webinar - Cloud APIs
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...
 
EUDAT Collaborative Data Infrastructure: Data Access and Re-use Service Area
EUDAT Collaborative Data Infrastructure: Data Access and Re-use Service AreaEUDAT Collaborative Data Infrastructure: Data Access and Re-use Service Area
EUDAT Collaborative Data Infrastructure: Data Access and Re-use Service Area
 
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
 
{code} and containers
{code} and containers{code} and containers
{code} and containers
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
 
Mobility and federation of Cloud computing
Mobility and federation of Cloud computingMobility and federation of Cloud computing
Mobility and federation of Cloud computing
 
{code} and Containers - Open Source Infrastructure within Dell Technologies
{code} and Containers - Open Source Infrastructure within Dell Technologies{code} and Containers - Open Source Infrastructure within Dell Technologies
{code} and Containers - Open Source Infrastructure within Dell Technologies
 
The EOSC Compute Platform with the EGI-ACE project
The EOSC Compute Platform with the EGI-ACE project The EOSC Compute Platform with the EGI-ACE project
The EOSC Compute Platform with the EGI-ACE project
 
Using a Widely Distributed Federated Cloud System to Support Multiple Dispara...
Using a Widely Distributed Federated Cloud System to Support Multiple Dispara...Using a Widely Distributed Federated Cloud System to Support Multiple Dispara...
Using a Widely Distributed Federated Cloud System to Support Multiple Dispara...
 
Introduction to LoCloud
Introduction to LoCloud Introduction to LoCloud
Introduction to LoCloud
 
OCCIware & Linked Data prototype OW2Con@POSS
OCCIware & Linked Data prototype OW2Con@POSSOCCIware & Linked Data prototype OW2Con@POSS
OCCIware & Linked Data prototype OW2Con@POSS
 

Mehr von vty

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs vty
 
Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs vty
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure vty
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museumvty
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesvty
 
Development in Dataverse SSHOC project
Development in Dataverse SSHOC projectDevelopment in Dataverse SSHOC project
Development in Dataverse SSHOC projectvty
 
DataverseEU as multilingual repository
DataverseEU as multilingual repositoryDataverseEU as multilingual repository
DataverseEU as multilingual repositoryvty
 

Mehr von vty (8)

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museum
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
Development in Dataverse SSHOC project
Development in Dataverse SSHOC projectDevelopment in Dataverse SSHOC project
Development in Dataverse SSHOC project
 
DataverseEU as multilingual repository
DataverseEU as multilingual repositoryDataverseEU as multilingual repository
DataverseEU as multilingual repository
 

Kürzlich hochgeladen

Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 

Kürzlich hochgeladen (20)

Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 

Technical integration of data repositories status and challenges

  • 1. www.eosc-synergy.euwww.eosc-synergy.eu Technical Integration of Data Repositories: status and challenges 28/01/2020 Isabel Bernal (DIGITAL.CSIC) Slava Tykhonov (DANS-KNAW) Fernando Aguilar (CSIC)
  • 2. www.eosc-synergy.eu Outline • Repositories • Previous Integration initiatives • Technical barriers and challenges • Potential integration
  • 3. www.eosc-synergy.eu Repositories: DIGITAL.CSIC enabling Open Science ● CSIC Green Open Access/Open Data Mandate since April 2019 ● Repository’s Data Policy since 2014: nearly 12,000 datasets, training to CSIC researchers and librarians, support to comply with funders and journals data sharing policies ● Involvement in several EOSC / FAIR Data initiatives ● DOI assignation to datasets, software, notebooks, preprints and other outputs via DataCite ● Member of DataCite Metadata Working Group ● Current aggregation with OpenAIRE, CORE, SHARE, DataCite Search, BASE… ● Registered in Re3Data and Repository Finder
  • 4. www.eosc-synergy.eu DSpace-CRIS state of art • Publications and datasets (organized in communities and collections) supported in standard DSpace • DSpace-CRIS extends this functionality and involves the other entities that are part of the research landscape: • Researchers • Projects • Organization Units (Groups, Departments) • Second Level Dynamic Objects ● DSpace 5.x has useful software integrations like CKAN and ORCID and data integrations (Archivematica) and can be deployed from Docker images
  • 5. www.eosc-synergy.eu SSHOC Dataverse Makes use of Dataverse software developed by Harvard IQSS 4 ERICs: DARIAH, CLARIN, EHRIS and CESSDA Building mature infrastructure based on requirements of involved EOSC communities (Docker and Kubernetes) Investigating sustainable governance models Training Service Providers and institutes how to use Dataverse as a service
  • 6. www.eosc-synergy.eu Different levels of repositories integration Metadata integration ● aggregation by OAI-PMH and ResourceSync Software integration ● data repository provides functionality delivered by another services (DSPACE-CRIS with CKAN and ORCID support) Data integration ● controlled vocabularies support (COAR, OpenAire, LOC, FundRef) Data archiving ● Datasets with files can be moved from one repository to another by SWORD protocol (for example, from DSpace-CRIS to the long term archive like Archivematica)
  • 7. www.eosc-synergy.eu Previous Integration initiatives • OAI-PMH, Resource sync (THIS ONE IS EMERGING, TRYING TO REPLACE OAI) • Aggregators • OpenAIRE • ...
  • 8. www.eosc-synergy.eu Previous Integration initiatives - Metadata OAI-PMH Open Archives Initiative Protocol for Metadata Harvesting 6 Actions. Any metadata schema (pre-defined). Simple but enough. Used by aggregators. Information about the resource, but not the resource itself. (there is a further development which is OAI- ORE)
  • 9. www.eosc-synergy.eu Previous Integration initiatives - Metadata ResourceSync ResourceSync supports synchronization of both Resources and Metadata about Resources with the relationships clearly indicated. Tracks the status of a resource (updated, deleted). Access to the resource.
  • 10. www.eosc-synergy.eu Aggregators - Integration of repos OpenAIRE explorer CORE Google Dataset Search DataCite Search SHARE
  • 11. www.eosc-synergy.eu OpenAIRE OpenAIRE has grown through a series of project phases funded by the European Commission: from the DRIVER projects to link Europe’s repository infrastructure, to the first OpenAIRE project aimed to assist the EC in implementing its initial pilot for Open Access (OA) to publications. Services: • Explorer (aggregator) • Monitor (compliance with EC Open Access/Open Data Mandate) • Scholix - Scholexplorer (links papers-datasets) • AMNESIA • DMPs
  • 12. www.eosc-synergy.eu WP3.3 FAIR data integration ● Thematic services usually keeping data inside of its own database, not storing in some data repository with persistence identifiers. Data derivatives often located in the temporary storage, aren’t sustainable and can be cleaned (Findable) ● Metadata schema varies in the different scientific communities, should be supported by all data repositories, however citation block should be common (Accessible) ● controlled vocabularies (CV)/ontologies are different, should be standardized and preferably provided by common CV services maintained by EOSC (Interoperable) ● most of metadata contain descriptions in the native language of researcher, should be translated to English and linked to common CV (Interoperable) ● most of Thematic services don’t have provenance information exposed as PROV-O or other standard required to be stored in datasets together with metadata and data files (Reusable) ● different types of licenses for data access should be supported by all EOSC data repositories, sensitive data should be handled differently (Reusable)
  • 13. www.eosc-synergy.eu WP3 Software integration challenges ● The maintenance of the distributed applications with external services is very difficult and expensive ● requires the highest level of service maturity ● increasing the code coverage does not necessarily lead to more functionality coverage ● writing integration tests even more important than adding more unit tests ● it’s almost not possible to run distributed services without help from community We need Docker to increase the maturity of the infrastructure
  • 14. www.eosc-synergy.eu Docker advantages ● Faster development and deployments ● Isolation of running containers allows to scale up apps ● Portability saves time to run the same image on the local computer or in the cloud ● Snapshotting allows to archive Docker images state ● Resource limitation can be adjusted ● Increasing reproducibility SQA Service maturity requires Docker on Kubernetes!
  • 15. www.eosc-synergy.eu Still, there are some technical barriers... ● The maturity of software and services is different, common baseline in the development state (WP3) ● Maintenance of the ecosystem consisting of variety of tools and services is complicated and consumes a lot of human resources (upgrade of servers, bug fixing, security updates) ● fast technological development requires continuous training and knowledge transfer
  • 16. www.eosc-synergy.eu Service for Quality Assurance Selenium IDE allows to create and replay all UI tests in your browser Shared tests can be reused by community to increase reproducibility SQA for the service maturity = unit tests + integration tests
  • 17. www.eosc-synergy.eu WP3.3 FAIR baseline on the technical integration Integration of data repositories in EOSC means: ● Findable: data have to be findable in a standard way by the users and other core and thematic services ● Accessible: data accessible via standard interfaces supported in EOSC ● Interoperable: data combinable with other data in EOSC repositories ● Reusable: data exploitable by the EOSC services enabling data analysis to be performed using resources and services from the EOSC infrastructure providers, keeping datasets versions and provenance. (from EOSC-Synergy proposal)
  • 18. www.eosc-synergy.eu WP3.3 FAIR adoption Evaluation of FAIRness of data: ● check if PID exists (F) ● metadata accessible via standard protocols (A) ● shared vocabularies used (I) ● machine readable data contain provenance information (R) Outcome: “EOSC-ready” badge should be released by SQA service for all other services suitable to be delivered under EOSC
  • 19. www.eosc-synergy.eu WP3.1 Maturity of thematic services Thematic services should: ● follow common SQA baseline for Software and Services ● increase the maturity by adding unit and integration tests, preferably by community ● improving Cloud infrastructure both horizontally and vertically requires a good testing strategy
  • 20. www.eosc-synergy.eu Potential Integration Integration of repositories metadata - data Enrich repositories services Focused on: • FAIRness • Metadata and data formats flexibility • Connection to software • Connection to computing
  • 21. www.eosc-synergy.eu Examples - FAIRness evaluation WP3 - WP4 Automatic or semi-automatic FAIRness evaluation Based on recommendations (e.g. FAIRsFAIR) Some solutions available: #FAIRevaluator
  • 22. www.eosc-synergy.eu Example - Jupyter integration • Search and retrieve data/software from repositories. • Ensure reproducibility. • Workflows • Publication of results. • Etc.