Validating JSON, XML and CSV data with SHACL-like constraints
Péter Király, GWDG (Göttingen)
pkiraly@gwdg.de
Deutsche Initiative für Netzwerkinformation e.V.
Kompetenzzentrum Interoperable Metadaten (KIM) Workshop
2022-05-02
https://github.com/pkiraly/metadata-qa-api
Shapes Constraint Language (SHACL)
a language for validating RDF graphs against a set of conditions (expressed as RDF graphs)
ex:PersonShape
  a sh:NodeShape ;
  sh:targetClass ex:Person ;    # checks persons
  sh:property [
    sh:path ex:ssn ;            # checks social security nr.
    sh:maxCount 1 ;
    sh:datatype xsd:string ;
    sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
  ] .
Metadata Quality Assessment Framework (MQAF) API
★ open source software for metadata quality assessment
★ quality dimensions: completeness, multilinguality, uniqueness, etc.
★ extensions: Europeana, MARC, Deutsche Digitale Bibliothek
★ Java API + command line interface (in progress)
★ reads XML, JSON, CSV, MARC
★ highly configurable
★ adaptable to different metadata schemas
RDF-agnostic SHACL tests*
Cardinality: minCount <number>, maxCount <number>
Value range: minExclusive <number>, minInclusive <number>, maxExclusive <number>, maxInclusive <number>
String: minLength <number>, maxLength <number>, hasValue <String>, in [String1, ..., StringN], pattern <regular expression>, minWords <number>, maxWords <number>
Comparison of properties: equals <field label>, disjoint <field label>, lessThan <field label>, lessThanOrEquals <field label>
Logical operators: and [<rule1>, ..., <ruleN>], or [<rule1>, ..., <ruleN>], not [<rule1>, ..., <ruleN>]
* a subset of SHACL
MQAF API's SHACL tests
Cardinality: minCount <number>, maxCount <number>
Value range: minExclusive <number>, minInclusive <number>, maxExclusive <number>, maxInclusive <number>
String: minLength <number>, maxLength <number>, hasValue <String>, in [String1, ..., StringN], pattern <regular expression>, minWords <number>, maxWords <number>
Comparison of properties: equals <field label>, disjoint <field label>, lessThan <field label>, lessThanOrEquals <field label>
Logical operators: and [<rule1>, ..., <ruleN>], or [<rule1>, ..., <ruleN>], not [<rule1>, ..., <ruleN>]
Extras: contentType [type1, ..., typeN], unique <boolean>, dependencies [id1, id2, ..., idN], dimension [criteria...] (min/max + Width/Height/Shortside/Longside)
Properties: id, description, failureScore, successScore, hidden, skip
abstracting the address of a data element
XML, JSON, CSV and MARC21 all have addressable data elements (branches). Each format has its own addressing language:
XML: XPath
JSON: JSONPath
CSV: column names
MARC21: MARCSpec
abstracting data element retrieval
[Diagram: a data element selector reads XML, JSON, CSV or MARC21 input and returns a uniform data structure. The selector asks "May I get the title?"; the schema definition answers "Title's address is //head/title".]
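The selector idea can be sketched in Java as follows. This is an illustration under stated assumptions, not the MQAF API: the `Selector` interface and the `forCsvRow` and `forXml` helpers are hypothetical names, and only the JDK's built-in XPath support is used (JSONPath and MARCSpec would need additional libraries).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SelectorSketch {

  /** Hypothetical interface: returns every instance found at a format-specific address. */
  interface Selector {
    List<String> get(String address);
  }

  /** CSV: the address is a column name; a row holds at most one instance per column. */
  static Selector forCsvRow(Map<String, String> row) {
    return address -> {
      List<String> values = new ArrayList<>();
      String value = row.get(address);
      if (value != null && !value.isEmpty()) {
        values.add(value);
      }
      return values;
    };
  }

  /** XML: the address is an XPath expression evaluated against the parsed record. */
  static Selector forXml(String xml) {
    return address -> {
      try {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(address, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
          values.add(nodes.item(i).getTextContent());
        }
        return values;
      } catch (Exception e) {
        throw new IllegalArgumentException("Cannot evaluate address: " + address, e);
      }
    };
  }

  public static void main(String[] args) {
    Selector xml = forXml("<html><head><title>Faust</title></head></html>");
    Selector csv = forCsvRow(Map.of("title", "Faust"));
    // Both calls return the same uniform structure: a list of string values.
    System.out.println(xml.get("//head/title")); // [Faust]
    System.out.println(csv.get("title"));        // [Faust]
  }
}
```

Because every selector returns a plain list of values, the constraint checks never need to know which format the record came from.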
schema definition

Java API

Schema schema = new BaseSchema()
  .setFormat(Format.CSV)
  .addField(
    new JsonBranch("title", "title")
      .setRule(
        new Rule()
          .withDisjoint("description")))
  .addField(
    new JsonBranch("url", "url")
      .setExtractable(true)
      .setRule(
        new Rule()
          .withMinCount(1)
          .withMaxCount(1)
          .withPattern("^https?://.*$")));

YAML configuration file

format: csv
fields:
- name: title
  rules:
    disjoint: description
- name: url
  extractable: true
  rules:
    minCount: 1
    maxCount: 1
    pattern: ^https?://.*$

JSON configuration file

{
  "format": "csv",
  "fields": [
    {
      "name": "title",
      "rules": [
        {"disjoint": "description"}
      ]
    },
    {
      "name": "url",
      "extractable": true,
      "rules": [
        {
          "minCount": 1,
          "maxCount": 1,
          "pattern": "^https?://.*$"
        }
      ]
    }
  ]
}
one and only one data element instance
- name: about
  path: $.['about']
  rules:
  - minCount: 1
  - maxCount: 1
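What the two cardinality rules test can be sketched in a few lines of Java. The class and method names are hypothetical and not part of the MQAF API; the input is assumed to be the list of instances a selector returned for the `about` address.

```java
import java.util.List;

public class CardinalityCheck {

  /** True when the number of instances lies within [minCount, maxCount]. */
  static boolean hasCount(List<String> instances, int minCount, int maxCount) {
    int n = instances.size();
    return n >= minCount && n <= maxCount;
  }

  public static void main(String[] args) {
    // minCount: 1 and maxCount: 1 together mean "exactly one instance".
    System.out.println(hasCount(List.of("http://example.org/1"), 1, 1)); // true
    System.out.println(hasCount(List.of(), 1, 1));                       // false
    System.out.println(hasCount(List.of("a", "b"), 1, 1));               // false
  }
}
```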
numeric value constraints
- name: price
  path: $.['price']
  rules:
  - and:
    - minInclusive: 1.0
    - maxInclusive: 2.0
1.0 <= price <= 2.0

- name: price
  path: $.['price']
  rules:
  - and:
    - minExclusive: 1.0
    - maxExclusive: 2.0
1.0 < price < 2.0
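The difference between the inclusive and exclusive variants only shows at the boundaries. A minimal Java sketch, with hypothetical method names that are not taken from the MQAF API:

```java
public class RangeCheck {

  /** minInclusive / maxInclusive: the boundary values themselves pass. */
  static boolean inInclusiveRange(double value, double min, double max) {
    return min <= value && value <= max;
  }

  /** minExclusive / maxExclusive: the boundary values themselves fail. */
  static boolean inExclusiveRange(double value, double min, double max) {
    return min < value && value < max;
  }

  public static void main(String[] args) {
    System.out.println(inInclusiveRange(1.0, 1.0, 2.0)); // true
    System.out.println(inExclusiveRange(1.0, 1.0, 2.0)); // false
    System.out.println(inExclusiveRange(1.5, 1.0, 2.0)); // true
  }
}
```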
string constraints / length
- name: about
  path: $.['about']
  rules:
  - minLength: 1
length(about) >= 1

- name: about
  path: $.['about']
  rules:
  - and:
    - minLength: 3
    - maxLength: 5
3 <= length(about) <= 5
string constraints / fixed values
- name: status
  path: $.['status']
  rules:
  - hasValue: published
status == "published"

- name: type
  path: $.['type']
  rules:
  - in: [dataverse, dataset, file]
type == "dataverse" or type == "dataset" or type == "file"
string constraints / pattern
- name: thumbnail
  path: oai:record/dc:identifier[@type='binary']
  rules:
  - pattern: ^https?://.*\.(jpe?g|pdf|png|tiff?|gif)$
thumbnail is an image or PDF file
string constraints / number of words
- name: about
  path: $.['about']
  rules:
  - minWords: 2
nr_words(about) >= 2
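The string constraints can be approximated with plain JDK calls. The sketch below is an assumption about the semantics rather than the MQAF implementation: all names are hypothetical, length is counted in UTF-16 code units, the pattern is searched without implicit anchoring (as SHACL's `sh:pattern` does), and words are split on whitespace.

```java
import java.util.List;
import java.util.regex.Pattern;

public class StringChecks {

  /** minLength / maxLength, counted in UTF-16 code units. */
  static boolean lengthBetween(String value, int min, int max) {
    int n = value.length();
    return n >= min && n <= max;
  }

  /** hasValue: the value equals one fixed string. */
  static boolean hasValue(String value, String expected) {
    return value.equals(expected);
  }

  /** in: the value is one of the allowed strings. */
  static boolean in(String value, List<String> allowed) {
    return allowed.contains(value);
  }

  /** pattern: the regular expression is found in the value. */
  static boolean matchesPattern(String value, String regex) {
    return Pattern.compile(regex).matcher(value).find();
  }

  /** minWords / maxWords build on a whitespace-based word count. */
  static int countWords(String value) {
    String trimmed = value.trim();
    return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
  }

  public static void main(String[] args) {
    System.out.println(lengthBetween("abcd", 3, 5));                         // true
    System.out.println(hasValue("published", "published"));                  // true
    System.out.println(in("dataset", List.of("dataverse", "dataset", "file"))); // true
    System.out.println(matchesPattern("https://example.org/a.jpg",
        "^https?://.*\\.(jpe?g|png|gif)$"));                                 // true
    System.out.println(countWords("open data") >= 2);                        // true
  }
}
```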
comparisons of data elements
fields:
- name: id
  path: $.['id']
  rules:
  - equals: isbn
- name: isbn
  path: $.['isbn']
id == isbn

fields:
- name: title
  path: $.['title']
  rules:
  - disjoint: description
- name: description
  path: $.['description']
title != description

- name: startingPage
  path: startingPage
  rules:
  - lessThanOrEquals: endingPage
startingPage <= endingPage
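Since a data element may have several instances, comparing two fields means comparing two lists of values. The sketch below follows the set-based reading used by SHACL (`sh:equals`, `sh:disjoint`, `sh:lessThanOrEquals`); whether MQAF compares in exactly this way is an assumption, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

public class PairChecks {

  /** equals: both fields hold the same set of values. */
  static boolean equalsField(List<String> a, List<String> b) {
    return new HashSet<>(a).equals(new HashSet<>(b));
  }

  /** disjoint: the two fields share no value. */
  static boolean disjoint(List<String> a, List<String> b) {
    return Collections.disjoint(a, b);
  }

  /** lessThanOrEquals: every value of the first field is <= every value of the second. */
  static boolean lessThanOrEquals(List<Double> a, List<Double> b) {
    for (double x : a) {
      for (double y : b) {
        if (x > y) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(equalsField(List.of("978-3-16"), List.of("978-3-16")));  // true
    System.out.println(disjoint(List.of("Faust"), List.of("A tragedy")));       // true
    System.out.println(lessThanOrEquals(List.of(10.0), List.of(25.0)));         // true
  }
}
```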
logical operations
- name: id
  path: oai:record/dc:identifier
  rules:
  - and:
    - minCount: 1
    - maxCount: 1
    - minLength: 1
and

- name: thumbnail
  path: oai:record/dc:identifier
  rules:
  - or:
    - pattern: ^.*\.(jpe?g|png)$
    - contentType:
      - image/jpeg
      - image/png
or

- name: title
  path: $.['title']
  rules:
  - not:
    - equals: description
not
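Logical operators combine simpler rules into one. A way to picture this in Java is predicate composition; the sketch below uses `java.util.function.Predicate` and hypothetical names. Reading `not` as "none of the listed rules passes" is an assumption, not something the slides state.

```java
import java.util.List;
import java.util.function.Predicate;

public class LogicalRules {

  // Two illustrative leaf rules (hypothetical, not MQAF classes).
  static final Predicate<String> NON_EMPTY = s -> !s.isEmpty();
  static final Predicate<String> IMAGE_NAME = s -> s.matches("^.*\\.(jpe?g|png)$");

  /** and: passes only if every listed rule passes. */
  static <T> Predicate<T> and(List<Predicate<T>> rules) {
    return value -> rules.stream().allMatch(rule -> rule.test(value));
  }

  /** or: passes if at least one listed rule passes. */
  static <T> Predicate<T> or(List<Predicate<T>> rules) {
    return value -> rules.stream().anyMatch(rule -> rule.test(value));
  }

  /** not: passes if none of the listed rules passes (assumed semantics). */
  static <T> Predicate<T> not(List<Predicate<T>> rules) {
    return value -> rules.stream().noneMatch(rule -> rule.test(value));
  }

  public static void main(String[] args) {
    Predicate<String> both = and(List.of(NON_EMPTY, IMAGE_NAME));
    System.out.println(both.test("cover.png"));                    // true
    System.out.println(both.test("cover.txt"));                    // false
    System.out.println(not(List.of(IMAGE_NAME)).test("cover.txt")); // true
  }
}
```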
extras
- name: thumbnail
  path: oai:record/dc:identifier[@type='binary']
  rules:
  - contentType: [image/jpeg, image/png, …]
content type

- name: id
  path: oai:record/dc:identifier
  rules:
  - unique: true
unique value

- name: url
  path: oai:record/dc:identifier[@type='URL']
  rules:
  - id: Q-4.4
    description: Both a media file and a link to an object are referenced in context.
    dependencies: [Q-3.0, Q-4.0]
run only if the other tests have passed

- name: thumbnail
  path: oai:record/dc:identifier[@type='binary']
  rules:
  - id: 3.1
    dimension:
      minWidth: 200
      minHeight: 200
image dimensions
other properties
id: identifier, used in the output and in internal references
description: explains what the rule checks
failureScore: a numerical score assigned if the test fails
successScore: a numerical score assigned if the test passes
hidden: run the test, but hide it from the output
skip: do not run the test for now (for debugging)
raw output
★ for each test:
○ status: PASSED, FAILED, NA (if the data element is not available)
○ score: the value of successScore (if passed), failureScore (if failed) or 0
★ total score
The output can be CSV, JSON or Java objects (configurable)
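How status and score relate can be shown with a small Java sketch. The `Result` class, the integer scores and the rule ids in `main` are illustrative assumptions, not the MQAF output classes; the total is assumed to be the sum of the per-rule scores.

```java
import java.util.List;

public class RuleOutput {

  enum Status { PASSED, FAILED, NA }

  /** Outcome of one rule; the scores are the values configured on that rule. */
  static final class Result {
    final String ruleId;
    final Status status;
    final int successScore;
    final int failureScore;

    Result(String ruleId, Status status, int successScore, int failureScore) {
      this.ruleId = ruleId;
      this.status = status;
      this.successScore = successScore;
      this.failureScore = failureScore;
    }

    /** successScore if passed, failureScore if failed, 0 if not applicable. */
    int score() {
      if (status == Status.PASSED) {
        return successScore;
      }
      if (status == Status.FAILED) {
        return failureScore;
      }
      return 0;
    }
  }

  /** Total score, assumed here to be the sum of all rule scores. */
  static int totalScore(List<Result> results) {
    return results.stream().mapToInt(Result::score).sum();
  }

  public static void main(String[] args) {
    List<Result> results = List.of(
        new Result("Q-3.0", Status.PASSED, 2, -1),
        new Result("Q-4.0", Status.FAILED, 2, -1),
        new Result("Q-4.4", Status.NA, 2, -1));
    for (Result r : results) {
      System.out.println(r.ruleId + "," + r.status + "," + r.score());
    }
    System.out.println("total," + totalScore(results)); // total,1
  }
}
```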
visualization for metadata managers / single record
aggregation
status and scores
workflow
1. ingest
2. measure records
3. aggregate
4. report
5. evaluate with experts
[Diagram labels: catalogue, quality assessment tool, improve records]
research partners
early adopters and contributors
★ Miel Vander Sande (meemoo, Belgium)
★ Richard Palmer (Victoria and Albert Museum, Great Britain)
Deutsche Digitale Bibliothek
★ Francesca Schulze
★ Cosmina Berta
★ Stefanie Rühle
★ Claudia Effenberger
★ Letitia-Venetia Mölck
special thanks
★ Juliane Stiller
Validating
JSON,
XML
and
CSV
data
with
SHACL-like
constraints
https://github.com/pkiraly/metadata-qa-api

Weitere ähnliche Inhalte

Ähnlich wie Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)

Peggy elasticsearch應用
Peggy elasticsearch應用Peggy elasticsearch應用
Peggy elasticsearch應用LearningTech
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"George Stathis
 
Document Conversion & Retrieve and Rank 一問一答
Document Conversion & Retrieve and Rank 一問一答Document Conversion & Retrieve and Rank 一問一答
Document Conversion & Retrieve and Rank 一問一答Hisashi Komine
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaGuido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...confluent
 
Tk2323 lecture 9 api json
Tk2323 lecture 9   api jsonTk2323 lecture 9   api json
Tk2323 lecture 9 api jsonMengChun Lam
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command lineSharat Chikkerur
 
DataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
DataFrame: Spark's new abstraction for data science by Reynold Xin of DatabricksDataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
DataFrame: Spark's new abstraction for data science by Reynold Xin of DatabricksData Con LA
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLDatabricks
 
REST API に疲れたあなたへ贈る GraphQL 入門
REST API に疲れたあなたへ贈る GraphQL 入門REST API に疲れたあなたへ贈る GraphQL 入門
REST API に疲れたあなたへ贈る GraphQL 入門Keisuke Tsukagoshi
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R StudioRupak Roy
 
Leveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHPLeveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHPJeremy Kendall
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQLjeykottalam
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout source{d}
 

Ähnlich wie Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022) (20)

Peggy elasticsearch應用
Peggy elasticsearch應用Peggy elasticsearch應用
Peggy elasticsearch應用
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
 
Document Conversion & Retrieve and Rank 一問一答
Document Conversion & Retrieve and Rank 一問一答Document Conversion & Retrieve and Rank 一問一答
Document Conversion & Retrieve and Rank 一問一答
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
 
Tk2323 lecture 9 api json
Tk2323 lecture 9   api jsonTk2323 lecture 9   api json
Tk2323 lecture 9 api json
 
CouchDB-Lucene
CouchDB-LuceneCouchDB-Lucene
CouchDB-Lucene
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command line
 
DataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
DataFrame: Spark's new abstraction for data science by Reynold Xin of DatabricksDataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
DataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
 
Azure ARM Templates 101
Azure ARM Templates 101Azure ARM Templates 101
Azure ARM Templates 101
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETL
 
REST API に疲れたあなたへ贈る GraphQL 入門
REST API に疲れたあなたへ贈る GraphQL 入門REST API に疲れたあなたへ贈る GraphQL 入門
REST API に疲れたあなたへ贈る GraphQL 入門
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R Studio
 
SHACL by example
SHACL by exampleSHACL by example
SHACL by example
 
Leveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHPLeveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHP
 
Ams adapters
Ams adaptersAms adapters
Ams adapters
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout
 

Mehr von Péter Király

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Péter Király
 
Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Péter Király
 
Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Péter Király
 
Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Péter Király
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)Péter Király
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Péter Király
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Péter Király
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Péter Király
 
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Péter Király
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Péter Király
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)Péter Király
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)Péter Király
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Péter Király
 
Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Péter Király
 
Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Péter Király
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Péter Király
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Péter Király
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Péter Király
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Péter Király
 
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)Péter Király
 

Mehr von Péter Király (20)

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
 
Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)
 
Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)
 
Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
 
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...
 
Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)
 
Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
 
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
 


Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)

  • 1. Validating JSON, XML and CSV data with SHACL-like constraints
     Péter Király, GWDG (Göttingen), pkiraly@gwdg.de
     Deutsche Initiative für Netzwerkinformation e.V., Kompetenzzentrum Interoperable Metadaten (KIM) Workshop, 2022-05-02
     https://github.com/pkiraly/metadata-qa-api
  • 2. Shapes Constraint Language (SHACL)
     A language for validating RDF graphs against a set of conditions (themselves expressed as RDF graphs).
       ex:PersonShape
           a sh:NodeShape ;
           sh:targetClass ex:Person ;        # checks persons
           sh:property [
               sh:path ex:ssn ;              # checks social security nr.
               sh:maxCount 1 ;
               sh:datatype xsd:string ;
               sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
           ] .
  • 3. Metadata Quality Assessment Framework (MQAF) API
     ★ open source software for metadata quality assessment
     ★ quality dimensions: completeness, multilinguality, uniqueness, etc.
     ★ extensions: Europeana, MARC, Deutsche Digitale Bibliothek
     ★ Java API + command line interface (in progress)
     ★ reads XML, JSON, CSV, MARC
     ★ highly configurable
     ★ adaptable to different metadata schemas
  • 4. RDF agnostic SHACL tests*
     Cardinality: minCount <number>, maxCount <number>
     Value range: minExclusive <number>, minInclusive <number>, maxExclusive <number>, maxInclusive <number>
     String: minLength <number>, maxLength <number>, hasValue <String>, in [String1, ..., StringN], pattern <regular expression>, minWords <number>, maxWords <number>
     Comparison of properties: equals <field label>, disjoint <field label>, lessThan <field label>, lessThanOrEquals <field label>
     Logical operators: and [<rule1>, ..., <ruleN>], or [<rule1>, ..., <ruleN>], not [<rule1>, ..., <ruleN>]
     * a subset of SHACL
  • 5. MQAF API’s SHACL tests
     Cardinality: minCount <number>, maxCount <number>
     Value range: minExclusive <number>, minInclusive <number>, maxExclusive <number>, maxInclusive <number>
     String: minLength <number>, maxLength <number>, hasValue <String>, in [String1, ..., StringN], pattern <regular expression>, minWords <number>, maxWords <number>
     Comparison of properties: equals <field label>, disjoint <field label>, lessThan <field label>, lessThanOrEquals <field label>
     Logical operators: and [<rule1>, ..., <ruleN>], or [<rule1>, ..., <ruleN>], not [<rule1>, ..., <ruleN>]
     Extras: contentType [type1, ..., typeN], unique <boolean>, dependencies [id1, id2, ..., idN], dimension [criteria...] (min/max + Width/Height/Shortside/Longside)
     Properties: id, description, failureScore, successScore, hidden, skip
  • 6. abstracting the address of data element
     XML, JSON, CSV and MARC21 all have addressable data elements (branches), each with its own addressing language:
       Format    Addressing language
       XML       XPath
       JSON      JSONPath
       CSV       column names
       MARC21    MARCSpec
  • 7. abstracting data element retrieval
     Diagram labels: schema definition; XML, JSON, CSV, MARC21; data element selector; uniform data structure.
     Example exchange in the diagram: "May I get the title?" / "Title’s address is //head/title"
  • 8. schema definition
     Java API:
       Schema schema = new BaseSchema()
         .setFormat(Format.CSV)
         .addField(
           new JsonBranch("title", "title")
             .setRule(
               new Rule()
                 .withDisjoint("description")))
         .addField(
           new JsonBranch("url", "url")
             .setExtractable(true)
             .setRule(
               new Rule()
                 .withMinCount(1)
                 .withMaxCount(1)
                 .withPattern("^https?://.*$")));
     YAML configuration file:
       format: csv
       fields:
         - name: title
           rules:
             disjoint: description
         - name: url
           extractable: true
           rules:
             minCount: 1
             maxCount: 1
             pattern: ^https?://.*$
     JSON configuration file:
       {
         "format": "csv",
         "fields": [
           {
             "name": "title",
             "rules": [ {"disjoint": "description"} ]
           },
           {
             "name": "url",
             "extractable": true,
             "rules": [ {"minCount": 1, "maxCount": 1, "pattern": "^https?://.*$"} ]
           }
         ]
       }
  • 9. one and only one data element instance
       - name: about
         path: $.['about']
         rules:
           - minCount: 1
           - maxCount: 1
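The cardinality rule on this slide can be read as a bound on how many instances of a data element were extracted from a record. The following is a minimal Java sketch of that reading; the class and method names are hypothetical and this is not MQAF's own implementation.

```java
import java.util.List;

// Hypothetical illustration of minCount/maxCount semantics; not part of the MQAF API.
class CardinalitySketch {
    /** True when the number of extracted instances lies within [minCount, maxCount]. */
    public static boolean hasCardinality(List<String> instances, int minCount, int maxCount) {
        int n = instances.size();
        return n >= minCount && n <= maxCount;
    }

    public static void main(String[] args) {
        // "one and only one" corresponds to minCount = 1 and maxCount = 1
        System.out.println(hasCardinality(List.of("http://example.org/1"), 1, 1));
        System.out.println(hasCardinality(List.of(), 1, 1));
        System.out.println(hasCardinality(List.of("a", "b"), 1, 1));
    }
}
```

With both bounds set to 1, an empty list and a two-element list are both rejected, which is the "one and only one" behaviour named in the slide title.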
  • 10. numeric value constraints
       - name: price
         path: $.['price']
         rules:
           - and:
             - minInclusive: 1.0
             - maxInclusive: 2.0
       1.0 <= price <= 2.0

       - name: price
         path: $.['price']
         rules:
           - and:
             - minExclusive: 1.0
             - maxExclusive: 2.0
       1.0 < price < 2.0
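The difference between the inclusive and exclusive variants only shows at the boundary values. A small Java sketch of the two comparisons follows; the names are illustrative and are not taken from MQAF.

```java
// Hypothetical illustration of inclusive vs. exclusive range checks; not part of the MQAF API.
class RangeSketch {
    /** minInclusive / maxInclusive: min <= value <= max */
    public static boolean inInclusiveRange(double value, double min, double max) {
        return value >= min && value <= max;
    }

    /** minExclusive / maxExclusive: min < value < max */
    public static boolean inExclusiveRange(double value, double min, double max) {
        return value > min && value < max;
    }

    public static void main(String[] args) {
        double price = 1.0; // a boundary value
        System.out.println(inInclusiveRange(price, 1.0, 2.0)); // true
        System.out.println(inExclusiveRange(price, 1.0, 2.0)); // false
    }
}
```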
  • 11. string constraints / length
       - name: about
         path: $.['about']
         rules:
           - minLength: 1
       length(about) >= 1

       - name: about
         path: $.['about']
         rules:
           - and:
             - minLength: 3
             - maxLength: 5
       5 >= length(about) >= 3
  • 12. string constraints / fixed values
       - name: status
         path: $.['status']
         rules:
           - hasValue: published
       status == "published"

       - name: type
         path: $.['type']
         rules:
           - in: [dataverse, dataset, file]
       type == "dataverse" or type == "dataset" or type == "file"
  • 13. string constraints / pattern
       - name: thumbnail
         path: oai:record/dc:identifier[@type='binary']
         rules:
           - pattern: ^https?://.*\.(jpe?g|pdf|png|tiff?|gif)$
       thumbnail is an image or PDF file
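In a file-extension pattern such as the thumbnail rule, the dot before the extension has to be escaped, otherwise it matches any character. The sketch below shows the difference with `java.util.regex`; the two patterns are simplified illustrations, not copied from an MQAF configuration, and whether MQAF matches the whole value or searches within it is not stated on the slide (whole-value matching is assumed here).

```java
import java.util.regex.Pattern;

// Hypothetical illustration of why the extension dot must be escaped; not part of the MQAF API.
class PatternSketch {
    public static final Pattern UNESCAPED = Pattern.compile("^https?://.*.(jpe?g|png|gif)$");
    public static final Pattern ESCAPED = Pattern.compile("^https?://.*\\.(jpe?g|png|gif)$");

    /** Assumption: the rule passes when the whole value matches the pattern. */
    public static boolean matches(Pattern pattern, String value) {
        return pattern.matcher(value).matches();
    }

    public static void main(String[] args) {
        String noExtension = "https://example.org/imagejpg";
        System.out.println(matches(UNESCAPED, noExtension)); // true: "." matched the letter "e"
        System.out.println(matches(ESCAPED, noExtension));   // false: no literal dot before "jpg"
        System.out.println(matches(ESCAPED, "https://example.org/image.jpg")); // true
    }
}
```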
  • 14. string constraints / number of words
       - name: about
         path: $.['about']
         rules:
           - minWords: 2
       nr_words(about) >= 2
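A word-count rule needs some definition of a word. The sketch below assumes that words are runs of characters separated by whitespace; the slide does not say how MQAF tokenizes, so the splitting rule and all names here are assumptions.

```java
// Hypothetical illustration of a minWords check; the whitespace tokenization is an assumption.
class WordCountSketch {
    /** Counts whitespace-separated tokens; an empty or blank string has zero words. */
    public static int countWords(String value) {
        String trimmed = value.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static boolean hasMinWords(String value, int minWords) {
        return countWords(value) >= minWords;
    }

    public static void main(String[] args) {
        System.out.println(hasMinWords("Metadata quality", 2)); // true
        System.out.println(hasMinWords("Metadata", 2));         // false
    }
}
```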
  • 15. comparisons of data elements
       fields:
         - name: id
           path: $.['id']
           rules:
             - equals: isbn
         - name: isbn
           path: $.['isbn']
       id == isbn

       fields:
         - name: title
           path: $.['title']
           rules:
             - disjoint: description
         - name: description
           path: $.['description']
       title != description

       - name: startingPage
         path: startingPage
         rules:
           - lessThanOrEquals: endingPage
       startingPage <= endingPage
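Because a field may hold several values, the comparison rules are easiest to picture over collections of values. The sketch below follows the set-based reading used by SHACL (equal value sets, no shared values, every value of one field not greater than every value of the other); treating MQAF's rules the same way is an assumption, and the names are hypothetical.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of field-to-field comparisons; not part of the MQAF API.
class ComparisonSketch {
    /** equals: both fields carry exactly the same set of values. */
    public static boolean equalsField(Set<String> a, Set<String> b) {
        return a.equals(b);
    }

    /** disjoint: the two fields share no value. */
    public static boolean disjoint(Collection<String> a, Collection<String> b) {
        return Collections.disjoint(a, b);
    }

    /** lessThanOrEquals: every value of a is <= every value of b. */
    public static boolean lessThanOrEquals(List<Integer> a, List<Integer> b) {
        return a.stream().allMatch(x -> b.stream().allMatch(y -> x <= y));
    }

    public static void main(String[] args) {
        System.out.println(equalsField(Set.of("978-3-16"), Set.of("978-3-16"))); // true
        System.out.println(disjoint(List.of("A title"), List.of("A title")));    // false
        System.out.println(lessThanOrEquals(List.of(10), List.of(25)));          // true
    }
}
```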
  • 18. logical operations
     and:
       - name: id
         path: oai:record/dc:identifier
         rules:
           - and:
             - minCount: 1
             - maxCount: 1
             - minLength: 1
     or:
       - name: thumbnail
         path: oai:record/dc:identifier
         rules:
           - or:
             - pattern: ^.*\.(jpe?g|png)$
             - contentType:
               - image/jpeg
               - image/png
     not:
       - name: title
         path: $.['title']
         rules:
           - not:
             - equals: description
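The logical operators combine nested rules into one result. A minimal Java sketch using `java.util.function.Predicate` follows. It assumes that `and` passes when all nested rules pass, `or` when at least one passes, and `not` when none passes; the last reading in particular is an assumption, since the slide does not define `not` over a list of several rules.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of and / or / not over nested rules; not part of the MQAF API.
class LogicSketch {
    public static <T> Predicate<T> and(List<Predicate<T>> rules) {
        return value -> rules.stream().allMatch(rule -> rule.test(value));
    }

    public static <T> Predicate<T> or(List<Predicate<T>> rules) {
        return value -> rules.stream().anyMatch(rule -> rule.test(value));
    }

    /** Assumption: passes only when none of the nested rules passes. */
    public static <T> Predicate<T> not(List<Predicate<T>> rules) {
        return value -> rules.stream().noneMatch(rule -> rule.test(value));
    }

    public static void main(String[] args) {
        Predicate<String> minLength3 = s -> s.length() >= 3;
        Predicate<String> maxLength5 = s -> s.length() <= 5;
        Predicate<String> between3And5 = and(List.of(minLength3, maxLength5));
        System.out.println(between3And5.test("abcd"));    // true
        System.out.println(between3And5.test("abcdefg")); // false
    }
}
```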
  • 21. extras
     content type:
       - name: thumbnail
         path: oai:record/dc:identifier[@type='binary']
         rules:
           - contentType: [image/jpeg, image/png, …]
     unique value:
       - name: id
         path: oai:record/dc:identifier
         rules:
           - unique: true
     only if other test has been passed:
       - name: url
         path: oai:record/dc:identifier[@type='URL']
         rules:
           - id: Q-4.4
             description: Both a media file and a link to an object are referenced in context.
             dependencies: [Q-3.0, Q-4.0]
     image dimensions:
       - name: thumbnail
         path: oai:record/dc:identifier[@type='binary']
         rules:
           - id: 3.1
             dimension:
               minWidth: 200
               minHeight: 200
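The `dependencies` property makes a rule conditional on the outcome of other rules, referenced by their ids. A minimal Java sketch of that gate follows. The status names come from the raw output slide; what MQAF reports for a rule whose dependencies did not pass is not stated in the slides, so the sketch only computes whether the gate is open, and all names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a dependency gate between rules; not part of the MQAF API.
class DependencySketch {
    enum Status { PASSED, FAILED, NA }

    /** True when every rule id listed as a dependency has already PASSED. */
    public static boolean dependenciesPassed(List<String> dependencies, Map<String, Status> results) {
        return dependencies.stream().allMatch(id -> results.get(id) == Status.PASSED);
    }

    public static void main(String[] args) {
        Map<String, Status> results = Map.of("Q-3.0", Status.PASSED, "Q-4.0", Status.FAILED);
        // Q-4.4 depends on Q-3.0 and Q-4.0, so its gate stays closed here
        System.out.println(dependenciesPassed(List.of("Q-3.0", "Q-4.0"), results)); // false
        System.out.println(dependenciesPassed(List.of("Q-3.0"), results));          // true
    }
}
```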
  • 25. other properties
       id            identifier, used in the output and in internal references
       description   explains what the rule checks
       failureScore  a numerical score assigned if the test fails
       successScore  a numerical score assigned if the test passes
       hidden        run the test, but hide it from the output
       skip          do not run the test for now (for debugging)
  • 26. raw output
     ★ for each test:
       ○ status: PASSED, FAILED, NA (if the data element is not available)
       ○ score: the value of successScore (if passed), failureScore (if failed) or 0
     ★ total score
     The output can be CSV, JSON or Java objects (configurable).
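The scoring described on this slide maps each test's status to a number and sums the results. The sketch below assumes that the score of 0 applies to the NA status and that the total is a plain sum; both are readings of the slide rather than confirmed behaviour, and the names are hypothetical.

```java
// Hypothetical illustration of per-test scores and a total; not part of the MQAF API.
class ScoreSketch {
    enum Status { PASSED, FAILED, NA }

    /** successScore if passed, failureScore if failed, otherwise 0 (assumed for NA). */
    public static int score(Status status, int successScore, int failureScore) {
        switch (status) {
            case PASSED: return successScore;
            case FAILED: return failureScore;
            default:     return 0;
        }
    }

    public static void main(String[] args) {
        // three tests with the same scores configured: successScore = 2, failureScore = -1
        int total = score(Status.PASSED, 2, -1)
                  + score(Status.FAILED, 2, -1)
                  + score(Status.NA, 2, -1);
        System.out.println(total); // 1
    }
}
```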
  • 27. visualization for metadata managers / single record
  • 30. workflow
       1. ingest
       2. measure records
       3. aggregate
       4. report
       5. evaluate with experts
     Diagram labels: catalogue, quality assessment tool, improve records
  • 31. research partners
     early adopters and contributors
       ★ Miel Vander Sande (meemoo, Belgium)
       ★ Richard Palmer (Victoria and Albert Museum, Great Britain)
     Deutsche Digitale Bibliothek
       ★ Francesca Schulze
       ★ Cosmina Berta
       ★ Stefanie Rühle
       ★ Claudia Effenberger
       ★ Letitia-Venetia Mölck
     special thanks
       ★ Juliane Stiller