HDL
Towards a Harmonized Dataset Model for Open Data Portals
Ahmad Assaf, Raphaël Troncy and Aline Senart
@ahmadaassaf
PROFILES'15 – 2nd International Workshop on Dataset PROFIling & fEderated Search for Linked Data, 1 June 2015
Open Data/Linked Open Data
• Open Data (OD) is data that can be easily discovered, accessed, reused and redistributed by anyone [Davies et al. 2014].
• Open Data should be placed in the public domain or under liberal terms of use, and be available in electronic formats that are non-proprietary and machine-readable.
• Linked Open Data (LOD) refers to open data that is semantically rich, linked and machine-readable.
• Open Data has major benefits for citizens, businesses, societies and governments.
Metadata
Metadata is structured information that describes, explains, locates or otherwise makes it easier to retrieve, use or manage information resources. It supports:
• Data discovery, exploration and reuse
• Organization & identification
• Archiving & preservation
Data Portals/Data Management Systems
• Data Portals (Catalogs) are the entry points for discovering published datasets.
• Data Portals are curated collections of dataset metadata that provide a set of discovery and integration services.
• Data Portals can be public, like datahub.io or publicdata.eu, or private, like enigma.io or quandl.com.
• Portals are built on top of Data Management Systems (DMS) such as CKAN, DKAN and Socrata.
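Portals built on CKAN expose their metadata through CKAN's Action API under `/api/3/action/`. The sketch below, an illustration rather than part of HDL, builds request URLs for actions such as `package_search` and `package_show` and fetches the JSON `result` with the standard library only. The base URL `PORTAL` is an assumed placeholder, and `action_url`/`fetch_json` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed placeholder: any CKAN-based portal serves the Action API under /api/3/action/.
PORTAL = "https://demo.ckan.org"

def action_url(base, action, **params):
    """Build a CKAN Action API URL, e.g. .../api/3/action/package_show?id=..."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{base.rstrip('/')}/api/3/action/{action}{query}"

def fetch_json(url, timeout=30):
    """Fetch an Action API response and return its 'result' payload (needs network)."""
    with urlopen(url, timeout=timeout) as response:
        body = json.load(response)
    # CKAN wraps responses as {"success": ..., "result": ...}.
    if not body.get("success"):
        raise RuntimeError(f"CKAN call failed: {body.get('error')}")
    return body["result"]
```

For example, `fetch_json(action_url(PORTAL, "package_search", q="health", rows=10))` would return the first ten matching dataset records; the same two helpers work against any CKAN instance by changing the base URL.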
Why a Harmonized Model?
• Exploring/discovering datasets for (re)use
• Defining a “minimal” set of information needed to build a “profile”
• Building tools that automatically generate/validate metadata models
Dataset Models - DCAT
• The Data Catalog Vocabulary (DCAT)✝ is a W3C recommendation to facilitate interoperability between data catalogs on the web.
• DCAT is an RDF vocabulary with three main classes: dcat:Catalog, dcat:Dataset and dcat:Distribution.
• DCAT Profiles [extensions built upon DCAT]:
  • DCAT-AP✝✝ defines a minimal set of properties that should be included in a dataset's profile by specifying mandatory and optional properties.
  • The Asset Description Metadata Schema (ADMS)✝✝✝ is used to semantically describe assets (code lists, taxonomies, vocabularies).
✝ http://w3.org/TR/vocab-dcat/
✝✝ https://joinup.ec.europa.eu/asset/dcat_application_profile/description
✝✝✝ http://www.w3.org/TR/vocab-adms/
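To make the three DCAT classes concrete, the sketch below emits a minimal Turtle description that links a dcat:Catalog to a dcat:Dataset and that dataset to a dcat:Distribution. It uses plain string building instead of an RDF library so that it stays self-contained. The function names and all example.org URIs are hypothetical, and the chosen properties (dct:title, dcat:downloadURL, dcat:mediaType) are only a small subset of what DCAT or DCAT-AP allow.

```python
def _literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def dcat_turtle(catalog, dataset, title, distribution, download_url, media_type):
    """Return a minimal DCAT description: Catalog -> Dataset -> Distribution."""
    return "\n".join([
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{catalog}> a dcat:Catalog ;",
        f"    dcat:dataset <{dataset}> .",
        "",
        f"<{dataset}> a dcat:Dataset ;",
        f"    dct:title {_literal(title)} ;",
        f"    dcat:distribution <{distribution}> .",
        "",
        f"<{distribution}> a dcat:Distribution ;",
        f"    dcat:downloadURL <{download_url}> ;",
        f"    dcat:mediaType {_literal(media_type)} .",
    ])

print(dcat_turtle("http://example.org/catalog", "http://example.org/dataset/1",
                  "Air quality 2014", "http://example.org/dataset/1/csv",
                  "http://example.org/files/air.csv", "text/csv"))
```

A profile such as DCAT-AP would then constrain which of these properties are mandatory and which are optional for each class.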
Dataset Models - VoID✝
• VoID is an RDF vocabulary for describing interlinked datasets.
• In addition to describing datasets, VoID describes the links between them.
• VoID's main constructs are the classes void:Dataset and void:Linkset and the property void:subset.
• A void:Linkset is a subclass of void:Dataset, used for storing triples that express the interlinking relationships between datasets.
✝ http://www.w3.org/TR/void/
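A linkset groups the triples whose subject lies in one dataset and whose object lies in another, usually per link predicate. The sketch below assumes each dataset is identified by a URI prefix (in the spirit of void:uriSpace), counts such cross-dataset triples, and renders each group as a void:Linkset with void:subjectsTarget, void:objectsTarget, void:linkPredicate and void:triples. The function names, dataset labels and URIs are hypothetical.

```python
from collections import Counter

def derive_linksets(triples, uri_spaces):
    """Count cross-dataset links keyed by (subjectsTarget, objectsTarget, linkPredicate).

    triples: iterable of (subject, predicate, object) URI strings.
    uri_spaces: dict mapping a dataset label to its URI prefix (assumed, cf. void:uriSpace).
    """
    def owner(uri):
        for name, prefix in uri_spaces.items():
            if uri.startswith(prefix):
                return name
        return None

    linksets = Counter()
    for s, p, o in triples:
        src, dst = owner(s), owner(o)
        # Only triples connecting two different known datasets count as links.
        if src and dst and src != dst:
            linksets[(src, dst, p)] += 1
    return linksets

def to_turtle(linksets):
    """Render each linkset as a void:Linkset with its triple count."""
    lines = ["@prefix void: <http://rdfs.org/ns/void#> ."]
    for i, ((src, dst, pred), n) in enumerate(sorted(linksets.items())):
        lines += [
            f"<#linkset{i}> a void:Linkset ;",
            f"    void:subjectsTarget <#{src}> ;",
            f"    void:objectsTarget <#{dst}> ;",
            f"    void:linkPredicate <{pred}> ;",
            f"    void:triples {n} .",
        ]
    return "\n".join(lines)
```

Grouping by predicate mirrors the usual VoID practice of describing owl:sameAs links separately from, say, rdfs:seeAlso links between the same two datasets.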
Dataset Models – CKAN✝/DKAN✝✝
• The data model describes a set of entities (dataset, resource, group, tag).
• Additional information can be added via arbitrary key/value “extra” fields.
• The core metadata is represented as a JSON document.
• CKAN supports Linked Data and RDF by providing a complete and functional mapping of its model to Linked Data formats.
• CKAN supports descriptions of vocabularies.
• DKAN is a Drupal-based DMS.
✝ http://ckan.org/
✝✝ http://demo.getdkan.com/
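In CKAN's JSON representation, the entities appear as nested lists (tags, groups, resources) and the “extra” fields as a list of key/value objects. The sketch below uses a hypothetical, heavily trimmed record shaped like a CKAN package to show how the extras can be flattened into a plain dictionary and separated from the core fields. The record contents and helper names are assumptions for illustration.

```python
import json

# Hypothetical record shaped like a CKAN package (heavily trimmed).
SAMPLE = json.loads("""
{
  "name": "water-points",
  "title": "Water points",
  "maintainer_email": "data@example.org",
  "tags": [{"name": "water"}, {"name": "infrastructure"}],
  "groups": [{"name": "environment"}],
  "resources": [{"url": "http://example.org/water.csv", "format": "CSV"}],
  "extras": [
    {"key": "spatial-reference-system", "value": "EPSG:4326"},
    {"key": "frequency-of-update", "value": "annual"}
  ]
}
""")

def flatten_extras(package):
    """Turn CKAN's list of {"key", "value"} objects into a plain dict."""
    return {item["key"]: item["value"] for item in package.get("extras", [])}

def core_fields(package):
    """Return the top-level fields other than the nested entity lists and extras."""
    nested = {"tags", "groups", "resources", "extras", "organization"}
    return {k: v for k, v in package.items() if k not in nested}

print(flatten_extras(SAMPLE))
print(core_fields(SAMPLE))
```

Because extras are free-form, their keys vary from portal to portal, which is what makes them hard to harmonize.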
Dataset Models - Continued
Project Open Data (POD)✝✝✝
• Online collection of best practices and case studies to help data publishers
• The POD data model is based on DCAT
• Similarly to DCAT-AP, POD defines three types of metadata elements: Required, Required-If and Expanded (optional)
• Metadata can be extended using elements from the “Expanded” fields
Socrata✝
• Commercial platform to streamline data publishing, management, analysis and reuse
• The model is designed specifically to represent tabular data
• The model covers a basic set of metadata properties and has good support for geospatial data
Schema.org✝✝
• A collection of schemas used to mark up HTML pages with structured data
• Covers many domains. We are interested in the Dataset schema, although we also use various properties from schemas like organizations, authors, etc.
✝ http://socrata.com/
✝✝ http://schema.org/
✝✝✝ https://project-open-data.cio.gov/
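POD's Required and Required-If levels lend themselves to automatic validation. The sketch below checks one data.json-style dataset entry. The REQUIRED set is an illustrative subset of POD v1.1 fields and should be checked against the official schema. The single Required-If rule shown (a “rights” statement when accessLevel is “restricted public” or “non-public”) is given only as an example of that kind of conditional check. `validate_pod` is a hypothetical name.

```python
# Illustrative subset of POD v1.1 "Required" fields (verify against the official schema).
REQUIRED = {"title", "description", "keyword", "modified", "publisher",
            "contactPoint", "identifier", "accessLevel"}
# Access levels for which this sketch expects a "rights" statement (a Required-If rule).
RESTRICTED = {"restricted public", "non-public"}

def validate_pod(record):
    """Return a list of problems found in a single data.json dataset entry."""
    problems = [f"missing required field: {name}"
                for name in sorted(REQUIRED) if name not in record]
    if record.get("accessLevel") in RESTRICTED and "rights" not in record:
        problems.append("missing required-if field: rights")
    return problems
```

Required fields are checked unconditionally. Required-If fields are checked only when their triggering condition holds. Expanded fields are never flagged as missing.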
Ballmer effect anyone?
https://xkcd.com/323/
Metadata Classification – Information Groups
• Organization: clustering or curation based solely on associations with specific administration parties
• Resource: the actual raw data that can be downloaded or accessed directly, e.g. JSON, CSV, a SPARQL endpoint
• Tag: descriptive knowledge about the dataset contents and structure. This can range from simple textual tags to semantically rich controlled terms
• Group: organizational units that share common semantics. They can be seen as a cluster or curation based on shared themes/categories
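One way to read the four information groups is as the top-level slots of a harmonized record. The sketch below defines a hypothetical `HarmonizedDataset` container and fills it from a CKAN-style package, whose `organization`, `resources`, `tags` and `groups` keys map directly onto the groups. Both the class and the `from_ckan` mapping are assumptions for illustration, not the HDL model itself.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class HarmonizedDataset:
    """Hypothetical container for the four information groups of a dataset."""
    organization: Optional[str] = None
    resources: list = field(default_factory=list)  # download/access URLs
    tags: list = field(default_factory=list)       # descriptive terms
    groups: list = field(default_factory=list)     # thematic clusters

def from_ckan(package):
    """Map a CKAN-style package dict onto the four information groups."""
    org = package.get("organization") or {}  # CKAN may return null here
    return HarmonizedDataset(
        organization=org.get("name"),
        resources=[r.get("url") for r in package.get("resources", [])],
        tags=[t["name"] for t in package.get("tags", [])],
        groups=[g["name"] for g in package.get("groups", [])],
    )
```

Similar `from_*` functions for other models would fill the same four slots, which is the point of mapping the information groups first.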
Metadata Classification – Information Types
Dataset metadata is divided into the following information types (with example fields):
• General Information: title, description, id
• Ownership Information: author, maintainer_email
• Provenance Information: version, creation_date, update_date
• Access Information: URL, license_title, license_id
• Geospatial Information: bbox, layers
• Temporal Information: coverage_from, coverage_to
• Statistical Information: max_value, uniques, average
• Quality Information: rating, availability, freshness
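The information types can be used to bucket the fields of a flat metadata record. The sketch below uses only the example fields listed on the slide; real models have many more, so unknown keys fall into an “unclassified” bucket. The dictionary and function names are hypothetical.

```python
# Example fields per information type, taken from the slide (not exhaustive).
INFO_TYPES = {
    "general": {"title", "description", "id"},
    "ownership": {"author", "maintainer_email"},
    "provenance": {"version", "creation_date", "update_date"},
    "access": {"url", "license_title", "license_id"},
    "geospatial": {"bbox", "layers"},
    "temporal": {"coverage_from", "coverage_to"},
    "statistical": {"max_value", "uniques", "average"},
    "quality": {"rating", "availability", "freshness"},
}

def classify_fields(record):
    """Group a flat record's keys by information type; unknown keys go to 'unclassified'."""
    result = {}
    for key in record:
        info_type = next((t for t, names in INFO_TYPES.items() if key.lower() in names),
                         "unclassified")
        result.setdefault(info_type, []).append(key)
    return result
```

The size of the “unclassified” bucket gives a rough indication of how much of a model the mapping does not yet cover.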
Harmonization Process
• Examine the model or vocabulary specification and documentation
• Examine existing datasets that use these models
• Examine the source code of the DMS
1. Map the information groups [resource, tag, group, organization]
2. Map the information types [general, ownership, provenance, etc.]
Mapping Information Types
Model        Field / path
CKAN         maintainer_email
DKAN         maintainer_email
POD          ContactPoint -> hasEmail
Schema.org   CreativeWork:producer -> Person:email
VoID         void:Dataset -> dct:creator -> foaf:Person:givenName
DCAT         dcat:Dataset -> dct:creator -> foaf:Person:givenName
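A mapping like the one above can drive a generic lookup: each model gets a key path, and one resolver walks that path through the nested JSON record. The sketch below covers only the JSON-based rows. It assumes the key spelling used in POD data.json (`contactPoint`, whose `hasEmail` value carries a `mailto:` prefix) and a Schema.org record whose `producer` is an embedded Person. The VoID and DCAT rows would need an RDF graph query instead and are left out. `MAPPING`, `resolve` and `harmonized_email` are hypothetical names.

```python
# Key paths per model, following the mapping table (JSON-based models only).
MAPPING = {
    "ckan": ["maintainer_email"],
    "dkan": ["maintainer_email"],
    "pod": ["contactPoint", "hasEmail"],
    "schema.org": ["producer", "email"],
}

def resolve(record, path):
    """Follow a key path through nested dicts; return None if any step is missing."""
    value = record
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def harmonized_email(model, record):
    """Return the contact e-mail of a record in the given model, normalized."""
    email = resolve(record, MAPPING[model])
    if isinstance(email, str) and email.startswith("mailto:"):
        email = email[len("mailto:"):]  # POD stores e-mails as mailto: URIs
    return email
```

Adding a model to this sketch then means adding one entry to the mapping rather than writing new lookup code.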
Extra Information
• Examining the models, we noticed that a large amount of information is stored in “extras” fields.
• Using Roomba, we generated aggregation reports to inspect those extras on the LOD Cloud✝ and OpenAfrica✝✝ portals:
  1. extras>value:extras>name (extra field names and values)
  2. resources>resource_type:resources>name (types describing resources)
• 53% of the datasets in OpenAfrica have additional geospatial information attached (spatial-reference-system, spatial harvester, bbox-east-long, bbox-north-long, bbox-south-long, bbox-west-long)
• 16% of the datasets have additional provenance and ownership information (frequency-of-update, dataset-reference-date)
✝ http://datahub.io/group/lodcloud
✝✝ http://africaopendata.org/
https://github.com/ahmadassaf/opendata-checker/tree/master/model
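The kind of aggregation behind these percentages can be approximated in a few lines. The sketch below is not Roomba. It is a standard-library reimplementation of the idea that counts how many datasets carry each extra key and computes the share of datasets that have at least one key from a given set. GEO_KEYS reuses the hyphenated geospatial keys listed above, and the function names and sample records are hypothetical.

```python
from collections import Counter

# Geospatial extra keys as listed on the slide.
GEO_KEYS = {"spatial-reference-system", "bbox-east-long", "bbox-north-long",
            "bbox-south-long", "bbox-west-long"}

def _extra_keys(package):
    return {e["key"] for e in package.get("extras", [])}

def extras_report(packages):
    """Count, for each extra key, how many datasets use it (once per dataset)."""
    counts = Counter()
    for pkg in packages:
        counts.update(_extra_keys(pkg))
    return counts

def share_with_any(packages, keys):
    """Percentage of datasets that have at least one of the given extra keys."""
    if not packages:
        return 0.0
    hits = sum(1 for pkg in packages if keys & _extra_keys(pkg))
    return 100.0 * hits / len(packages)
```

Running `share_with_any` over all packages harvested from a portal yields figures of the same kind as the 53% reported for OpenAfrica.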
https://xkcd.com/927/
Questions?
Ahmad Assaf
http://ahmadassaf.com/
@ahmadaassaf
http://github.com/ahmadassaf

Weitere ähnliche Inhalte

Was ist angesagt?

Advantages of metadata
Advantages of metadataAdvantages of metadata
Advantages of metadataAzeem Sultan
 
Metadata harvesting
Metadata harvestingMetadata harvesting
Metadata harvestingAndrewLIS688
 
Introduction to Metadata
Introduction to MetadataIntroduction to Metadata
Introduction to MetadataJenn Riley
 
Linked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesLinked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesOpen Data Support
 
Applying Digital Library Metadata Standards
Applying Digital Library Metadata StandardsApplying Digital Library Metadata Standards
Applying Digital Library Metadata StandardsJenn Riley
 
Metadata an overview
Metadata an overviewMetadata an overview
Metadata an overviewrobin fay
 
PRELIDA Project Draft Roadmap
PRELIDA Project Draft RoadmapPRELIDA Project Draft Roadmap
PRELIDA Project Draft RoadmapPRELIDA Project
 
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...EUDAT
 
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014Robert Meusel
 
Interaction with Linked Data
Interaction with Linked DataInteraction with Linked Data
Interaction with Linked DataEUCLID project
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...Vyacheslav Tykhonov
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapubeswcsummerschool
 

Was ist angesagt? (20)

Metadata harvesting Tools
Metadata harvesting ToolsMetadata harvesting Tools
Metadata harvesting Tools
 
Advantages of metadata
Advantages of metadataAdvantages of metadata
Advantages of metadata
 
Metadata harvesting
Metadata harvestingMetadata harvesting
Metadata harvesting
 
Introduction to Metadata
Introduction to MetadataIntroduction to Metadata
Introduction to Metadata
 
Linked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesLinked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and Examples
 
Gap Analysis
Gap AnalysisGap Analysis
Gap Analysis
 
Applying Digital Library Metadata Standards
Applying Digital Library Metadata StandardsApplying Digital Library Metadata Standards
Applying Digital Library Metadata Standards
 
Metadata an overview
Metadata an overviewMetadata an overview
Metadata an overview
 
Metadata
MetadataMetadata
Metadata
 
PRELIDA Project Draft Roadmap
PRELIDA Project Draft RoadmapPRELIDA Project Draft Roadmap
PRELIDA Project Draft Roadmap
 
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
 
General concepts: DDI
General concepts: DDIGeneral concepts: DDI
General concepts: DDI
 
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
 
FAIR Data ecosystem
FAIR Data ecosystemFAIR Data ecosystem
FAIR Data ecosystem
 
Interaction with Linked Data
Interaction with Linked DataInteraction with Linked Data
Interaction with Linked Data
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
 
Meta data
Meta dataMeta data
Meta data
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
 
Providing Linked Data
Providing Linked DataProviding Linked Data
Providing Linked Data
 
Hadoop
HadoopHadoop
Hadoop
 

Andere mochten auch

HOTEL EXPO 2016
HOTEL EXPO 2016HOTEL EXPO 2016
HOTEL EXPO 2016HOTEL EXPO
 
LEY DE COMPAÑÍAS
LEY DE COMPAÑÍASLEY DE COMPAÑÍAS
LEY DE COMPAÑÍASjeankrs9
 
Joseph S Stump Resume
Joseph S Stump ResumeJoseph S Stump Resume
Joseph S Stump ResumeJoseph Stump
 
Heroku cloud platform
Heroku cloud platformHeroku cloud platform
Heroku cloud platformHasan Khatib
 
Nascenia: Road to Software Industry
Nascenia: Road to Software IndustryNascenia: Road to Software Industry
Nascenia: Road to Software IndustryNascenia IT
 
боги древних славян
боги древних славянбоги древних славян
боги древних славянШкола№3
 
Bachillerato de humanidades ok
Bachillerato de humanidades okBachillerato de humanidades ok
Bachillerato de humanidades okTECHNO_CISNEROS
 
Reunión de padres y tutores 2016. IES Joaquín Turina
Reunión de padres y tutores 2016. IES Joaquín TurinaReunión de padres y tutores 2016. IES Joaquín Turina
Reunión de padres y tutores 2016. IES Joaquín Turinajesusman
 
Decálogo antibulling
Decálogo antibullingDecálogo antibulling
Decálogo antibullingjesusman
 
Vagrant vs Docker
Vagrant vs DockerVagrant vs Docker
Vagrant vs Dockerjchase50
 
Snyder_Susan-Resume 2016
Snyder_Susan-Resume 2016Snyder_Susan-Resume 2016
Snyder_Susan-Resume 2016Susan Snyder
 
Formatos de manejo de almacén
Formatos de manejo de almacénFormatos de manejo de almacén
Formatos de manejo de almacénValeriaEH888
 
1 q 2016-us-tile-industry-update
1 q 2016-us-tile-industry-update1 q 2016-us-tile-industry-update
1 q 2016-us-tile-industry-updateCristiano Canotti
 

Andere mochten auch (20)

HOTEL EXPO 2016
HOTEL EXPO 2016HOTEL EXPO 2016
HOTEL EXPO 2016
 
FPGAs libres
FPGAs libresFPGAs libres
FPGAs libres
 
LEY DE COMPAÑÍAS
LEY DE COMPAÑÍASLEY DE COMPAÑÍAS
LEY DE COMPAÑÍAS
 
2016/10/28: Reset ETSII UPM
2016/10/28: Reset ETSII UPM2016/10/28: Reset ETSII UPM
2016/10/28: Reset ETSII UPM
 
Timeplan
TimeplanTimeplan
Timeplan
 
Joseph S Stump Resume
Joseph S Stump ResumeJoseph S Stump Resume
Joseph S Stump Resume
 
Heroku cloud platform
Heroku cloud platformHeroku cloud platform
Heroku cloud platform
 
Resume 1.4
Resume 1.4Resume 1.4
Resume 1.4
 
Nascenia: Road to Software Industry
Nascenia: Road to Software IndustryNascenia: Road to Software Industry
Nascenia: Road to Software Industry
 
Dina_Condon_Resume_2016
Dina_Condon_Resume_2016Dina_Condon_Resume_2016
Dina_Condon_Resume_2016
 
боги древних славян
боги древних славянбоги древних славян
боги древних славян
 
Useful C++ Features You Should be Using
Useful C++ Features You Should be UsingUseful C++ Features You Should be Using
Useful C++ Features You Should be Using
 
Bachillerato de humanidades ok
Bachillerato de humanidades okBachillerato de humanidades ok
Bachillerato de humanidades ok
 
Inspección atún en lata
Inspección atún en lataInspección atún en lata
Inspección atún en lata
 
Reunión de padres y tutores 2016. IES Joaquín Turina
Reunión de padres y tutores 2016. IES Joaquín TurinaReunión de padres y tutores 2016. IES Joaquín Turina
Reunión de padres y tutores 2016. IES Joaquín Turina
 
Decálogo antibulling
Decálogo antibullingDecálogo antibulling
Decálogo antibulling
 
Vagrant vs Docker
Vagrant vs DockerVagrant vs Docker
Vagrant vs Docker
 
Snyder_Susan-Resume 2016
Snyder_Susan-Resume 2016Snyder_Susan-Resume 2016
Snyder_Susan-Resume 2016
 
Formatos de manejo de almacén
Formatos de manejo de almacénFormatos de manejo de almacén
Formatos de manejo de almacén
 
1 q 2016-us-tile-industry-update
1 q 2016-us-tile-industry-update1 q 2016-us-tile-industry-update
1 q 2016-us-tile-industry-update
 

Ähnlich wie HDL Model for Harmonizing Open Data Metadata

Dataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataDataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataTom Plasterer
 
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...Ahmad Assaf
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse vty
 
Llinked open data training for EU institutions
Llinked open data training for EU institutionsLlinked open data training for EU institutions
Llinked open data training for EU institutionsOpen Data Support
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonMOHITKUMAR1379
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederOpenAIRE
 
IRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET Journal
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataversevty
 
Linked Data Planet Key Note
Linked Data Planet Key NoteLinked Data Planet Key Note
Linked Data Planet Key Noterumito
 
Metadata lecture(9 17-14)
Metadata lecture(9 17-14)Metadata lecture(9 17-14)
Metadata lecture(9 17-14)mhb120
 
Let's downscale the semantic web !
Let's downscale the semantic web !Let's downscale the semantic web !
Let's downscale the semantic web !Christophe Guéret
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data TutorialSören Auer
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataStuart Chalk
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes vty
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap IT Strategy Group
 

Ähnlich wie HDL Model for Harmonizing Open Data Metadata (20)

How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria PesceHow to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
 
Linked Data In Action
Linked Data In ActionLinked Data In Action
Linked Data In Action
 
Dataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataDataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* Data
 
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse
 
Llinked open data training for EU institutions
Llinked open data training for EU institutionsLlinked open data training for EU institutions
Llinked open data training for EU institutions
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using Python
 
Linked data life cycles
Linked data life cyclesLinked data life cycles
Linked data life cycles
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan Broeder
 
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
 
IRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description Framework
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataverse
 
Linked Data Planet Key Note
Linked Data Planet Key NoteLinked Data Planet Key Note
Linked Data Planet Key Note
 
Metadata lecture(9 17-14)
Metadata lecture(9 17-14)Metadata lecture(9 17-14)
Metadata lecture(9 17-14)
 
Let's downscale the semantic web !
Let's downscale the semantic web !Let's downscale the semantic web !
Let's downscale the semantic web !
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 

Kürzlich hochgeladen

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Developer Data Modeling Mistakes: From Postgres to NoSQL

HDL Model for Harmonizing Open Data Metadata

  • 1. HDL Towards a Harmonized Dataset Model for Open Data Portals Ahmad Assaf, Raphaël Troncy And Aline Senart @ahmadaassaf PROFILES 15 – 2nd International Workshop on Dataset PROFIling & fEderated Search for Linked Data 1st June 2015
  • 2. Open Data/Linked Open Data
    - Open Data (OD) is data that can be easily discovered, accessed, reused and redistributed by anyone [Davies et al. 2014].
    - Open Data should be placed in the public domain under liberal terms of use and made available in non-proprietary, machine-readable electronic formats.
    - Linked Open Data (LOD) refers to semantically rich, linked, machine-readable open data.
    - Open Data has major benefits for citizens, businesses, societies and governments.
  • 3. Metadata
    Metadata is structured information that describes, explains, locates or otherwise makes it easier to retrieve, use or manage information resources. It supports:
    - Data discovery, exploration and reuse
    - Organization and identification
    - Archiving and preservation
  • 4. Data Portals/Data Management Systems
    - Data portals (catalogs) are the entry points for discovering published datasets.
    - Data portals are curated collections of dataset metadata that provide a set of discovery and integration services.
    - Data portals can be public, like datahub.io and publicdata.eu, or private, like enigma.io and quandl.com.
    - Portals are built on top of Data Management Systems (DMS) such as CKAN, DKAN and Socrata.
  • 5. Why a Harmonized Model?
    - Exploring/discovering datasets for (re)use
    - Defining a “minimal” set of information needed to build a “profile”
    - Building tools that will automatically generate/validate metadata models
  • 6. Dataset Models - DCAT
    - The Data Catalog Vocabulary (DCAT)✝ is a W3C recommendation to facilitate interoperability between data catalogs on the web.
    - DCAT is an RDF vocabulary with three main classes: dcat:Catalog, dcat:Dataset and dcat:Distribution.
    - DCAT profiles (extensions built upon DCAT):
      - DCAT-AP✝✝ defines a minimal set of properties that should be included in a dataset profile by specifying mandatory and optional properties.
      - The Asset Description Metadata Schema (ADMS)✝✝✝ is used to semantically describe assets (code lists, taxonomies, vocabularies).
    ✝ http://w3.org/TR/vocab-dcat/
    ✝✝ https://joinup.ec.europa.eu/asset/dcat_application_profile/description
    ✝✝✝ http://www.w3.org/TR/vocab-adms/
  • 7. Dataset Models - VoID✝
    - An RDF vocabulary for interlinked datasets.
    - In addition to describing datasets, VoID describes the links between datasets.
    - VoID's core terms are the classes void:Dataset and void:Linkset and the property void:subset, which relates a dataset to its parts.
    - A linkset in VoID is a subclass of a dataset, used for storing triples that express the interlinking relationship between datasets.
    ✝ http://www.w3.org/TR/void/
  • 8. Dataset Models – CKAN✝/DKAN✝✝
    - The data model describes a set of entities (dataset, resource, group, tag).
    - Additional information can be added via arbitrary key/value "extras" fields.
    - The core metadata is structured as JSON.
    - CKAN supports Linked Data and RDF by providing a complete and functional mapping of its model to Linked Data formats.
    - CKAN supports descriptions of vocabularies.
    - DKAN is a Drupal-based DMS.
    ✝ http://ckan.org/
    ✝✝ http://demo.getdkan.com/
  • 9. Dataset Models - Continued
    - Project Open Data (POD)✝✝✝:
      - An online collection of best practices and case studies to help data publishers.
      - The POD data model is based on DCAT.
      - Similarly to DCAT-AP, POD defines three types of metadata elements: Required, Required-If and Expanded (optional).
      - Metadata can be extended using elements from the “Expanded” fields.
    - Socrata✝:
      - A commercial platform to streamline data publishing, management, analysis and reuse.
      - The model is designed specifically to represent tabular data.
      - The model covers a basic set of metadata properties and has good support for geospatial data.
    - Schema.org✝✝:
      - A collection of schemas used to mark up HTML pages with structured data.
      - Covers many domains. We are interested in the Dataset schema, although we also use various properties from other schemas such as organizations, authors, etc.
    ✝ http://socrata.com/
    ✝✝ http://schema.org/
    ✝✝✝ https://project-open-data.cio.gov/
  • 10. Ballmer effect anyone? https://xkcd.com/323/
  • 11. Metadata Classification – Information Groups
    - Organization: clustering or curation based solely on associations with specific administrative parties.
    - Resource: the actual raw data that can be downloaded or accessed directly, e.g. JSON, CSV or a SPARQL endpoint.
    - Tag: descriptive knowledge about the dataset contents and structure, ranging from simple textual tags to semantically rich controlled terms.
    - Group: organizational units that share common semantics; they can be seen as clusters or curations based on shared themes/categories.
  • 12. Metadata Classification – Information Types
    Dataset metadata is classified into the following information types (with example fields):
    - General information: title, description, id
    - Ownership information: author, maintainer_email
    - Provenance information: version, creation_date, update_date
    - Access information: URL, license_title, license_id
    - Geospatial information: bbox, layers
    - Temporal information: coverage_from, coverage_to
    - Statistical information: max_value, uniques, average
    - Quality information: rating, availability, freshness
  • 13. Harmonization Process
    - Examine the model or vocabulary specification and documentation.
    - Examine existing datasets that use these models.
    - Examine the source code of the DMS.
    Then:
    1. Map the information groups [resource, tag, group, organization].
    2. Map the information types [general, ownership, provenance, etc.].
  • 14. Mapping Information Types
    Example: the ownership contact in each model
    - CKAN: maintainer_email
    - DKAN: maintainer_email
    - POD: ContactPoint -> hasEmail
    - Schema.org: CreativeWork:producer -> Person:email
    - VoID: void:Dataset -> dct:creator -> foaf:Person:givenName
    - DCAT: dcat:Dataset -> dct:creator -> foaf:Person:givenName
  • 15. Extra Information
    - Examining the models, we noticed an abundance of information filled in "extras" fields.
    - Using Roomba, we generated aggregation reports to inspect those extras on the LOD Cloud✝ and OpenAfrica✝✝:
      1. extras>value:extras>name: extra field names and values
      2. resources>resource_type:resources>name: types describing resources
    - 53% of the datasets in OpenAfrica have additional geospatial information attached (spatial-reference-system, spatial harvester, bbox-east-long, bbox-north-long, bbox-south-long, bbox-west-long).
    - 16% of the datasets have additional provenance and ownership information (frequency-of-update, dataset-reference-date).
    ✝ http://datahub.io/group/lodcloud
    ✝✝ http://africaopendata.org/
    https://github.com/ahmadassaf/opendata-checker/tree/master/model
  • 16. https://xkcd.com/927/
  • 17. Questions? Ahmad Assaf http://ahmadassaf.com/ @ahmadaassaf http://github.com/ahmadassaf
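The field mapping on slide 14 can be made concrete with a small lookup table. The following Python is a minimal sketch, not the authors' implementation: it assumes each record has already been parsed into nested dictionaries whose keys mirror the slide's notation (for example a POD record as {"contactPoint": {"hasEmail": ...}}), and the names CONTACT_PATHS, get_path and harmonized_contact are hypothetical. Following the slide, the VoID and DCAT entries resolve to the creator's given name rather than to an email address.

```python
from typing import Any, Optional

# Hypothetical key paths to the ownership contact in each model, mirroring the
# mapping on slide 14. Real serializations (JSON-LD contexts, RDF graphs) differ;
# these keys are assumptions for illustration only.
CONTACT_PATHS: dict[str, list[str]] = {
    "ckan": ["maintainer_email"],
    "dkan": ["maintainer_email"],
    "pod": ["contactPoint", "hasEmail"],
    "schema.org": ["producer", "email"],
    "void": ["dct:creator", "foaf:givenName"],
    "dcat": ["dct:creator", "foaf:givenName"],
}


def get_path(record: dict[str, Any], path: list[str]) -> Optional[Any]:
    """Follow a chain of keys through nested dicts; return None if any step is missing."""
    current: Any = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def harmonized_contact(model: str, record: dict[str, Any]) -> Optional[Any]:
    """Return the ownership contact of a dataset record, whichever model it uses."""
    path = CONTACT_PATHS.get(model.lower())
    if path is None:
        raise ValueError(f"no mapping defined for model {model!r}")
    return get_path(record, path)


# Usage with made-up records in two different models.
ckan_record = {"name": "example-dataset", "maintainer_email": "data@example.org"}
pod_record = {"title": "Example dataset",
              "contactPoint": {"hasEmail": "mailto:data@example.org"}}
print(harmonized_contact("ckan", ckan_record))  # data@example.org
print(harmonized_contact("POD", pod_record))    # mailto:data@example.org
```

Keeping the model-specific knowledge in one table means that covering another model only requires a new entry, while code consuming the harmonized field stays unchanged.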

Editor's Notes

  1. An asset is something that can be opened and read using familiar desktop software, as opposed to raw data, which needs to be processed first.
  2. The interlinking is modelled by a linkset (void:Linkset). A linkset in VoID is a subclass of a dataset, used for storing triples that express the interlinking relationship between datasets. In each interlinking triple, the subject is a resource hosted in one dataset and the object is a resource hosted in another dataset. This modelling enables a flexible and powerful way to describe in great detail the interlinking between two datasets, such as how many links exist, which kinds of links (e.g. owl:sameAs or foaf:knows) are present, or who makes these claims.
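The linkset modelling described in the note above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the talk: triples are plain string tuples rather than an RDF library's graph, each dataset is identified by a hypothetical URI prefix (http://a.example.org/ and http://b.example.org/), and build_linkset and describe_linkset are hypothetical names. It keeps the triples whose subject is hosted in one dataset and whose object is hosted in the other, then reports how many links there are and of which kinds, roughly the information a VoID description records with void:triples and void:linkPredicate.

```python
from collections import Counter
from typing import NamedTuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def build_linkset(triples: list[Triple], source: str, target: str) -> list[Triple]:
    """Keep the interlinking triples: subject hosted in `source`, object hosted in `target`.

    Identifying a dataset by a URI prefix is a simplifying assumption.
    """
    return [t for t in triples
            if t.subject.startswith(source) and t.obj.startswith(target)]


def describe_linkset(linkset: list[Triple]) -> dict:
    """Summarize a linkset: total number of links and the count per link predicate."""
    return {
        "triples": len(linkset),
        "link_predicates": dict(Counter(t.predicate for t in linkset)),
    }


A = "http://a.example.org/"  # hypothetical namespace of dataset A
B = "http://b.example.org/"  # hypothetical namespace of dataset B

data = [
    Triple(A + "alice", OWL_SAME_AS, B + "person/42"),  # link from A to B
    Triple(A + "alice", FOAF_KNOWS, B + "person/7"),    # link from A to B
    Triple(A + "alice", FOAF_NAME, "Alice"),            # literal object: not a link
    Triple(B + "person/7", OWL_SAME_AS, A + "bob"),     # link in the other direction
]

linkset = build_linkset(data, source=A, target=B)
print(describe_linkset(linkset))
```

Because a linkset's subjects come from one dataset and its objects from another, links from B to A form a separate linkset, obtained here by swapping the source and target arguments.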