SlideShare a Scribd company logo
1 of 22
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and
DANS Research Data Repositories
Slava Tykhonov; Jerry de Vries; Eko Indarto; Andrea Scharnhorst; Femmy Admiraal; Mike
Priddy (DANS-KNAW)
Presentation at ISKO Knowledge Organisation Research Observatory
24 Nov 2021
RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES
AND DOMAIN NEEDS
Content
- Introduction
- Who we are
- CMDI in the ‘wild’ - CLARIN data collections in EASY
- Creating FAIR metadata and semantic services - case of CMDI Pipeline
- Extract-Transform-Load (CMDI) metadata into Dataverse
- Workflow for linking external concepts to (CMDI) metadata values to make metadata FAIR
- Lessons learned and pointers
Introduction
DANS - Royal Netherlands Academy of Arts and
Sciences - research data expertise center - long-term
preservation - archive (www.easy.dans.knaw;
Dataverse.nl; Narcis.nl)
CLARIAH.nl Large Scale Infrastructure Project for
Humanities (CLARIN+DARIAH)
DARIAH: Digital Research Infrastructure for the Arts and
Humanities
CLARIN: Common Language Resources and
Technology Infrastructure
CMDI: Component MetaData Infrastructure: a
framework to describe and reuse metadata blueprints
SKOSMOS: Open source web-based SKOS browser
and publishing tool
Challenge
How to make the datasets from the CLARIN
community in the long-term archive discoverable in
the CLARIN infrastructure?
How to search ‘in data’ ? = How to achieve richer
indexing of metadata?
The problem - on a generic level
Vision: Semantic interoperability on the infrastructure level
We envision a situation where thousands of Dataverse instances (due to EOSC) on the
web can be simultaneously search for data.
The old dream of Federated search/Universal catalogue can only be realised if:
(1) Cross -walks; mapping across different metadata schemes are implemented
(2) In metadata schemes we seek for ways to enrich indexes with values from controlled
vocabularies
Standard response = standardisation and harmonisation = repository software, certain
metadata standards, or certain controlled vocabularies
New response = explore agile solutions (Proof of Concept) which can be implemented by
different communities (even smaller ones), so we keep variety and still enable integration.
The problem - on a concrete level
CMDI ‘in the wild’
Content
- Introduction
- Who we are
- CMDI in the ‘wild’ - CLARIN data collections in EASY
- Creating FAIR metadata and semantic services - case of CMDI Pipeline
- Extract-Transform-Load (CMDI) metadata into Dataverse
- Workflow for linking external concepts to (CMDI) metadata values to make metadata FAIR
- Lessons learned and pointers
Conceptual approach: Semantic interoperability on the
infrastructure level - building common solutions for everyone
Dataverse Semantic API in release 5.6: https://github.com/IQSS/dataverse/releases/tag/v5.6
“Dataset metadata can be retrieved, set, and updated using a new, flatter JSON-LD format -
following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of
metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and field
storage architecture). This new API also allows for the update of terms metadata“.
External controlled vocabularies support is being developed by DANS in SSHOC project and
already integrated in Dataverse core in release 5.7.
Proposal: https://docs.google.com/document/d/1txdcFuxskRx_tLsDQ7KKLFTMR_r9IBhorDu3V_r445w/
Interfaces: http://github.com/gdcc/dataverse-external-vocab-support
Integrations: Wikidata, ORCID, MeSH, Skosmos vocabularies
CMDI Pipeline
- Backbone of our pipeline: Extract-Transform-Load (CMDI) metadata into Dataverse
- One block relevant for semantic services: Mapping across metadata standards
- Another block: Look-up for values in controlled vocabulary registers - enrich indexing
SEMAF: A Proposal for a Flexible Semantic Mapping Framework
Proposal: https://zenodo.org/record/4651421#.YT9lyC8RpZI
POC: https://github.com/Dans-labs/semaf-poc
Coming close to the implementation
1. Use Data Catalog Vocabulary (DCAT) mappings for CMDI metadata fields
2. Simple Knowledge Organization System (SKOS) to model a thesauri-like
resources with simple skos:broader, skos:narrower and skos:related
properties
3. Load CMDI properties and attributes and build a Knowledge Graph out of all
elements
4. Enrich the Knowledge Graph with concept URIs from various controlled
vocabularies like Skosmos hosted or Wikidata
5. Use different format data-serialization formats suitable for the integration with
different systems. For example, json-ld suitable for Dataverse, turtle for Jena
Fuseki, RDF for LoD frameworks
Complexity of CMDI is unfolding
After the implementation
● Complexity in CMDI becomes more
visible
● Identify core concepts which can be
mapped to standard bibliographic
schemes as DCAT (red box)
● Possibility to match values of CMDI
concepts to other controlled
vocabularies (green box)
How does it look when implemented in Dataverse?
Every field can be linked to the appropriate controlled vocabularies in FAIR way!
Greater vision:
Dataverse metadata schemas ingested into a Knowledge Graph
Compound keyword field with SKOS
We use SKOS relationships to keep the
hierarchy and relationships between
metadata fields
Other Dataverse schemas: https://github.com/Dans-labs/semaf-client/tree/cmdi/schema
Once in a Knowledge graph: what can we do?
Pipeline managed to establish some relationships
to Wikidata concepts and automatically updated the
dataset with new conceptURIs!
The example of automatic enrichment with Wikidata
Content
- Introduction
- Who we are
- CMDI in the ‘wild’ - CLARIN data collections in EASY
- Creating FAIR metadata and semantic services - case of CMDI Pipeline
- Extract-Transform-Load (CMDI) metadata into Dataverse
- Workflow for linking external concepts to (CMDI) metadata values to make metadata FAIR
- Lessons learned and pointers
Lessons learned (I)
Scientific communities and archives have different perspectives on standardisation, and semantic services.
In research formalisation (including KOS, ontologies, any ‘model’) is a heuristic device, agile to new research
questions, and so intrinsically ‘not interoperable’. In other words, there is a difference between research needs
and information needs.
Lessons learned (II)
- We provided a solution for our CMDI problem - by creating a CLARIN
compatible Dataverse solution, which via an API can be harvested by the
CLARIN search service; we also created another perspective on the CDMI
‘challenge’
- We used Dataverse is a platform due to an open active community;
- The examples we showed you some of are results of a ‘Vision Lab’ - proof of
concepts - funded in projects as SSHOC, CLARIAH, EOSC
- The results are envisioned be implemented locally.
- But, in principle the solutions are platform agnostic.
Lessons learned (III)
- In the future, repositories might become nodes in a large searchable
knowledge graph and semantic links might enable pathways for
contextual/semantic search.
- Part of this future will be automatically supported semantic enrichment at the
local instantiations (automatic indexing in a net instead of in an index)
- Problem: keep provenance, authority (trust) - governance between those
(micro-service) providers need to be organised. What can we learn from
history?
References - pointers
CMDI exploration tool DANS CMDI converter github
CMDI properties frequency VLO top profiles
CMDI core metadata proposal Core metadata components design for use cases
DANS CMDI metadata generator CMDI metadata model published as TSV files
Convertor can extract and show the hierarchy of all fields
CLARIAH compliant Dataverse Docker module Dataverse Docker with CMDI metadata schema
Core metadata components design guidelines Guidelines link
Semantic Gateway as plugin app Dataverse gateway Semantic Gateway API
Dataverse metadata schema ingested into
Graph
https://github.com/Dans-labs/semaf-
client/tree/cmdi/schema
References - pointers
Dataverse 5.7 https://github.com/IQSS/dataverse/releases/tag/v5.7
Semantic Gateway: https://github.com/Dans-labs/semantic-gateway
SSHOC task 5.2 http://github.com/SSHOC
SEMAF client https://github.com/Dans-labs/semaf-client
CMDI data model and namespaces: M. Windhouwer, E. Indarto, D. Broeder. CMD2RDF: Building a Bridge from
CLARIN to Linked Open Data
Flexible Metadata Schemes for Research Data Repositories. / de Vries, Jerry; Tykhonov, Vyacheslav;
Scharnhorst, Andrea; Admiraal, Femmy; Indarto, Eko; Priddy, Mike.
2021. Abstract from CLARIN Annual Conference 2021. https://www.clarin.eu/content/programme-clarin-annual-
conference-2021
Questions?
Slava Tykhonov <vyacheslav.tykhonov@dans.knaw.nl>
Andrea Scharnhorst <andrea.scharnhorst@dans.knaw.nl>

More Related Content

What's hot

CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse vty
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challengesvty
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyvty
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research datavty
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataversevty
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataversevty
 
CLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesCLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesVyacheslav Tykhonov
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes vty
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Projectvty
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2vty
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...Vyacheslav Tykhonov
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligencevty
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC,  Service QA and DataverseIntegration of WORSICA’s thematic service in EOSC,  Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataversevty
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunitiesvty
 
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge ScientistEthics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge ScientistStratos Kontopoulos
 
Cs 1023 lec 13 web (week 4)
Cs 1023 lec 13 web (week 4)Cs 1023 lec 13 web (week 4)
Cs 1023 lec 13 web (week 4)stanbridge
 
Library Web Services for Discovery and Delivery of Scientific Information
Library Web Services for Discovery and Delivery of Scientific InformationLibrary Web Services for Discovery and Delivery of Scientific Information
Library Web Services for Discovery and Delivery of Scientific InformationRichard Akerman
 
Virtuoso -- The Prometheus of RDF
Virtuoso -- The Prometheus of RDFVirtuoso -- The Prometheus of RDF
Virtuoso -- The Prometheus of RDFOpenLink Software
 
DCMI Keynote: Bridging the Semantic Gaps and Interoperability
DCMI Keynote: Bridging the Semantic Gaps and InteroperabilityDCMI Keynote: Bridging the Semantic Gaps and Interoperability
DCMI Keynote: Bridging the Semantic Gaps and InteroperabilityMike Bergman
 

What's hot (20)

CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challenges
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhy
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research data
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataverse
 
CLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesCLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemes
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Project
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC,  Service QA and DataverseIntegration of WORSICA’s thematic service in EOSC,  Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
Linked Data to Improve the OER Experience
Linked Data to Improve the OER ExperienceLinked Data to Improve the OER Experience
Linked Data to Improve the OER Experience
 
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge ScientistEthics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
 
Cs 1023 lec 13 web (week 4)
Cs 1023 lec 13 web (week 4)Cs 1023 lec 13 web (week 4)
Cs 1023 lec 13 web (week 4)
 
Library Web Services for Discovery and Delivery of Scientific Information
Library Web Services for Discovery and Delivery of Scientific InformationLibrary Web Services for Discovery and Delivery of Scientific Information
Library Web Services for Discovery and Delivery of Scientific Information
 
Virtuoso -- The Prometheus of RDF
Virtuoso -- The Prometheus of RDFVirtuoso -- The Prometheus of RDF
Virtuoso -- The Prometheus of RDF
 
DCMI Keynote: Bridging the Semantic Gaps and Interoperability
DCMI Keynote: Bridging the Semantic Gaps and InteroperabilityDCMI Keynote: Bridging the Semantic Gaps and Interoperability
DCMI Keynote: Bridging the Semantic Gaps and Interoperability
 

Similar to Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DANS Research Data Repositories

Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
 
DataverseEU as multilingual repository
DataverseEU as multilingual repositoryDataverseEU as multilingual repository
DataverseEU as multilingual repositoryvty
 
Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs vty
 
The NIH Data Commons - BD2K All Hands Meeting 2015
The NIH Data Commons -  BD2K All Hands Meeting 2015The NIH Data Commons -  BD2K All Hands Meeting 2015
The NIH Data Commons - BD2K All Hands Meeting 2015Vivien Bonazzi
 
Semantic Mapping in CLARIN Component Metadata.
Semantic Mapping in CLARIN Component Metadata.Semantic Mapping in CLARIN Component Metadata.
Semantic Mapping in CLARIN Component Metadata.Menzo Windhouwer
 
DICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyDICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyCloudify Community
 
Peter Tiernan - Preservation and Trust at DRI - OR2015
Peter Tiernan - Preservation and Trust at DRI - OR2015Peter Tiernan - Preservation and Trust at DRI - OR2015
Peter Tiernan - Preservation and Trust at DRI - OR2015dri_ireland
 
Semantic MediaWiki as Knowledge Graph Interface
Semantic MediaWiki as Knowledge Graph InterfaceSemantic MediaWiki as Knowledge Graph Interface
Semantic MediaWiki as Knowledge Graph InterfaceBernhard Krabina
 
Plannen Code Jam OpenSocial gadgets
Plannen Code Jam OpenSocial gadgets Plannen Code Jam OpenSocial gadgets
Plannen Code Jam OpenSocial gadgets kirstenveelo
 
Buildvoc Introduction to linked data digital construction week 2018
Buildvoc Introduction to linked data digital construction week 2018Buildvoc Introduction to linked data digital construction week 2018
Buildvoc Introduction to linked data digital construction week 2018Phil Stacey ICIOB
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataMarin Dimitrov
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataEUCLID project
 
Knowledge Discovery in Remote Access Databases
Knowledge Discovery in Remote Access Databases Knowledge Discovery in Remote Access Databases
Knowledge Discovery in Remote Access Databases Zakaria Zubi
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederOpenAIRE
 
8. (Semantic Interoperability in the CLARIN infrastructure. Menzo Windhouwer....
8. (Semantic Interoperability in the CLARIN infrastructure. Menzo Windhouwer....8. (Semantic Interoperability in the CLARIN infrastructure. Menzo Windhouwer....
8. (Semantic Interoperability in the CLARIN infrastructure. Menzo Windhouwer....IMPACT Centre of Competence
 
(R17A0528) BIG DATA ANALYTICS.pdf
(R17A0528) BIG DATA ANALYTICS.pdf(R17A0528) BIG DATA ANALYTICS.pdf
(R17A0528) BIG DATA ANALYTICS.pdfPoornimaShetty27
 

Similar to Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DANS Research Data Repositories (20)

Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
DataverseEU as multilingual repository
DataverseEU as multilingual repositoryDataverseEU as multilingual repository
DataverseEU as multilingual repository
 
20230525_mmc_seminar.pdf
20230525_mmc_seminar.pdf20230525_mmc_seminar.pdf
20230525_mmc_seminar.pdf
 
Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
The NIH Data Commons - BD2K All Hands Meeting 2015
The NIH Data Commons -  BD2K All Hands Meeting 2015The NIH Data Commons -  BD2K All Hands Meeting 2015
The NIH Data Commons - BD2K All Hands Meeting 2015
 
Semantic Mapping in CLARIN Component Metadata.
Semantic Mapping in CLARIN Component Metadata.Semantic Mapping in CLARIN Component Metadata.
Semantic Mapping in CLARIN Component Metadata.
 
What is SDMX-RDF?
What is SDMX-RDF?What is SDMX-RDF?
What is SDMX-RDF?
 
DICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyDICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made Easy
 
CDMI For Swift
CDMI For SwiftCDMI For Swift
CDMI For Swift
 
Peter Tiernan - Preservation and Trust at DRI - OR2015
Peter Tiernan - Preservation and Trust at DRI - OR2015Peter Tiernan - Preservation and Trust at DRI - OR2015
Peter Tiernan - Preservation and Trust at DRI - OR2015
 
Semantic MediaWiki as Knowledge Graph Interface
Semantic MediaWiki as Knowledge Graph InterfaceSemantic MediaWiki as Knowledge Graph Interface
Semantic MediaWiki as Knowledge Graph Interface
 
Plannen Code Jam OpenSocial gadgets
Plannen Code Jam OpenSocial gadgets Plannen Code Jam OpenSocial gadgets
Plannen Code Jam OpenSocial gadgets
 
Buildvoc Introduction to linked data digital construction week 2018
Buildvoc Introduction to linked data digital construction week 2018Buildvoc Introduction to linked data digital construction week 2018
Buildvoc Introduction to linked data digital construction week 2018
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
NISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector AdministrationNISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector Administration
 
Knowledge Discovery in Remote Access Databases
Knowledge Discovery in Remote Access Databases Knowledge Discovery in Remote Access Databases
Knowledge Discovery in Remote Access Databases
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan Broeder
 
8. (Semantic Interoperability in the CLARIN infrastructure. Menzo Windhouwer....
8. (Semantic Interoperability in the CLARIN infrastructure. Menzo Windhouwer....8. (Semantic Interoperability in the CLARIN infrastructure. Menzo Windhouwer....
8. (Semantic Interoperability in the CLARIN infrastructure. Menzo Windhouwer....
 
(R17A0528) BIG DATA ANALYTICS.pdf
(R17A0528) BIG DATA ANALYTICS.pdf(R17A0528) BIG DATA ANALYTICS.pdf
(R17A0528) BIG DATA ANALYTICS.pdf
 

More from vty

Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs vty
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure vty
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museumvty
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloudvty
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)vty
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...vty
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloudvty
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesvty
 
Development in Dataverse SSHOC project
Development in Dataverse SSHOC projectDevelopment in Dataverse SSHOC project
Development in Dataverse SSHOC projectvty
 

More from vty (9)

Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museum
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloud
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloud
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
Development in Dataverse SSHOC project
Development in Dataverse SSHOC projectDevelopment in Dataverse SSHOC project
Development in Dataverse SSHOC project
 

Recently uploaded

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 

Recently uploaded (20)

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 

Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DANS Research Data Repositories

  • 1. Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DANS Research Data Repositories Slava Tykhonov; Jerry de Vries; Eko Indarto; Andrea Scharnhorst; Femmy Admiraal; Mike Priddy (DANS-KNAW) Presentation at ISKO Knowledge Organisation Research Observatory 24 Nov 2021 RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
  • 2. Content - Introduction - Who we are - CMDI in the ‘wild’ - CLARIN data collections in EASY - Creating FAIR metadata and semantic services - case of CMDI Pipeline - Extract-Transform-Load (CMDI) metadata into Dataverse - Workflow for linking external concepts to (CMDI) metadata values to make metadata FAIR - Lessons learned and pointers
  • 3. Introduction DANS - Royal Netherlands Academy of Arts and Sciences - research data expertise center - long-term preservation - archive (www.easy.dans.knaw; Dataverse.nl; Narcis.nl) CLARIAH.nl Large Scale Infrastructure Project for Humanities (CLARIN+DARIAH) DARIAH: Digital Research Infrastructure for the Arts and Humanities CLARIN: Common Language Resources and Technology Infrastructure CMDI: Component MetaData Infrastructure: a framework to describe and reuse metadata blueprints SKOSMOS: Open source web-based SKOS browser and publishing tool Challenge How to make the datasets from the CLARIN community in the long-term archive discoverable in the CLARIN infrastructure? How to search ‘in data’ ? = How to achieve richer indexing of metadata?
  • 4. The problem - on a generic level
  • 5. Vision: Semantic interoperability on the infrastructure level We envision a situation where thousands of Dataverse instances (due to EOSC) on the web can be simultaneously search for data. The old dream of Federated search/Universal catalogue can only be realised if: (1) Cross -walks; mapping across different metadata schemes are implemented (2) In metadata schemes we seek for ways to enrich indexes with values from controlled vocabularies Standard response = standardisation and harmonisation = repository software, certain metadata standards, or certain controlled vocabularies New response = explore agile solutions (Proof of Concept) which can be implemented by different communities (even smaller ones), so we keep variety and still enable integration.
  • 6. The problem - on a concrete level CMDI ‘in the wild’
  • 7. Content - Introduction - Who we are - CMDI in the ‘wild’ - CLARIN data collections in EASY - Creating FAIR metadata and semantic services - case of CMDI Pipeline - Extract-Transform-Load (CMDI) metadata into Dataverse - Workflow for linking external concepts to (CMDI) metadata values to make metadata FAIR - Lessons learned and pointers
  • 8. Conceptual approach: Semantic interoperability on the infrastructure level - building common solutions for everyone Dataverse Semantic API in release 5.6: https://github.com/IQSS/dataverse/releases/tag/v5.6 “Dataset metadata can be retrieved, set, and updated using a new, flatter JSON-LD format - following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and field storage architecture). This new API also allows for the update of terms metadata“. External controlled vocabularies support is being developed by DANS in SSHOC project and already integrated in Dataverse core in release 5.7. Proposal: https://docs.google.com/document/d/1txdcFuxskRx_tLsDQ7KKLFTMR_r9IBhorDu3V_r445w/ Interfaces: http://github.com/gdcc/dataverse-external-vocab-support Integrations: Wikidata, ORCID, MeSH, Skosmos vocabularies
  • 9. CMDI Pipeline - Backbone of our pipeline: Extract-Transform-Load (CMDI) metadata into Dataverse - One block relevant for semantic services: Mapping across metadata standards - Another block: Look-up for values in controlled vocabulary registers - enrich indexing
  • 10. SEMAF: A Proposal for a Flexible Semantic Mapping Framework Proposal: https://zenodo.org/record/4651421#.YT9lyC8RpZI POC: https://github.com/Dans-labs/semaf-poc
  • 11. Coming close to the implementation 1. Use Data Catalog Vocabulary (DCAT) mappings for CMDI metadata fields 2. Simple Knowledge Organization System (SKOS) to model a thesauri-like resources with simple skos:broader, skos:narrower and skos:related properties 3. Load CMDI properties and attributes and build a Knowledge Graph out of all elements 4. Enrich the Knowledge Graph with concept URIs from various controlled vocabularies like Skosmos hosted or Wikidata 5. Use different format data-serialization formats suitable for the integration with different systems. For example, json-ld suitable for Dataverse, turtle for Jena Fuseki, RDF for LoD frameworks
  • 12. Complexity of CMDI is unfolding After the implementation ● Complexity in CMDI becomes more visible ● Identify core concepts which can be mapped to standard bibliographic schemes as DCAT (red box) ● Possibility to match values of CMDI concepts to other controlled vocabularies (green box)
  • 13. How does it look when implemented in Dataverse? Every field can be linked to the appropriate controlled vocabularies in FAIR way!
  • 14. Greater vision: Dataverse metadata schemas ingested into a Knowledge Graph Compound keyword field with SKOS We use SKOS relationships to keep the hierarchy and relationships between metadata fields Other Dataverse schemas: https://github.com/Dans-labs/semaf-client/tree/cmdi/schema
  • 15. Once in a Knowledge graph: what can we do? Pipeline managed to establish some relationships to Wikidata concepts and automatically updated the dataset with new conceptURIs! The example of automatic enrichment with Wikidata
  • 16. Content - Introduction - Who we are - CMDI in the ‘wild’ - CLARIN data collections in EASY - Creating FAIR metadata and semantic services - case of CMDI Pipeline - Extract-Transform-Load (CMDI) metadata into Dataverse - Workflow for linking external concepts to (CMDI) metadata values to make metadata FAIR - Lessons learned and pointers
  • 17. Lessons learned (I) Scientific communities and archives have different perspectives on standardisation, and semantic services. In research formalisation (including KOS, ontologies, any ‘model’) is a heuristic device, agile to new research questions, and so intrinsically ‘not interoperable’. In other words, there is a difference between research needs and information needs.
  • 18. Lessons learned (II) - We provided a solution for our CMDI problem - by creating a CLARIN compatible Dataverse solution, which via an API can be harvested by the CLARIN search service; we also created another perspective on the CDMI ‘challenge’ - We used Dataverse is a platform due to an open active community; - The examples we showed you some of are results of a ‘Vision Lab’ - proof of concepts - funded in projects as SSHOC, CLARIAH, EOSC - The results are envisioned be implemented locally. - But, in principle the solutions are platform agnostic.
  • 19. Lessons learned (III) - In the future, repositories might become nodes in a large searchable knowledge graph and semantic links might enable pathways for contextual/semantic search. - Part of this future will be automatically supported semantic enrichment at the local instantiations (automatic indexing in a net instead of in an index) - Problem: keep provenance, authority (trust) - governance between those (micro-service) providers need to be organised. What can we learn from history?
  • 20. References - pointers CMDI exploration tool DANS CMDI converter github CMDI properties frequency VLO top profiles CMDI core metadata proposal Core metadata components design for use cases DANS CMDI metadata generator CMDI metadata model published as TSV files Convertor can extract and show the hierarchy of all fields CLARIAH compliant Dataverse Docker module Dataverse Docker with CMDI metadata schema Core metadata components design guidelines Guidelines link Semantic Gateway as plugin app Dataverse gateway Semantic Gateway API Dataverse metadata schema ingested into Graph https://github.com/Dans-labs/semaf- client/tree/cmdi/schema
  • 21. References - pointers Dataverse 5.7 https://github.com/IQSS/dataverse/releases/tag/v5.7 Semantic Gateway: https://github.com/Dans-labs/semantic-gateway SSHOC task 5.2 http://github.com/SSHOC SEMAF client https://github.com/Dans-labs/semaf-client CMDI data model and namespaces: M. Windhouwer, E. Indarto, D. Broeder. CMD2RDF: Building a Bridge from CLARIN to Linked Open Data Flexible Metadata Schemes for Research Data Repositories. / de Vries, Jerry; Tykhonov, Vyacheslav; Scharnhorst, Andrea; Admiraal, Femmy; Indarto, Eko; Priddy, Mike. 2021. Abstract from CLARIN Annual Conference 2021. https://www.clarin.eu/content/programme-clarin-annual- conference-2021
  • 22. Questions? Slava Tykhonov <vyacheslav.tykhonov@dans.knaw.nl> Andrea Scharnhorst <andrea.scharnhorst@dans.knaw.nl>

Editor's Notes

  1. We have far more than 29 datasets with CMDI information in them, but those additional metadata are added as files not indexed, and so not searchable; we use Dublin Core like metadata, and the CMDI metadata are often in XML form
  2. If there is a lot to be said about SEMAF slide 9 can be a bullet on another slide - 10? Mike
  3. And the clue is also to give power to the communities to create their own specific scheme, they only need to find the connector to the higher level