Shaping the Big Ball of Data Mud
W3C's Shapes Constraint Language (SHACL)
Richard Cyganiak
Lotico Berlin Semantic Web Meetup, 17 November 2016
Semantic Web
RDF
SPARQL
OWL
RDFS
Strengths
• Flexible can-say-anything data model
• Merging data is trivial
• Shared, explicit meaning thanks to URIs
• Mixing and matching of schemas; partial understanding
• Painstakingly developed vocabularies
• “Neutral ground” for modelling
• SPARQL
Weaknesses
• Overgeneralisation: works for anything, but great at nothing
• “RDF tax”
• Logic foundations and web foundations can be baggage
• Maps poorly to common programming language data structures
• Schemaless nature makes optimisation difficult
• Not good at semi-structured
Application Areas
• Knowledge graphs
• Publishing
• Life sciences
• Fraud detection & identity management
• Data integration & analysis
The V’s of Big Data: Volume, Velocity, Variety
https://www.w3.org/blog/2010/05/linked-data-its-is-not-like-th/
Validation?
Constraint checking?
RDF is supposedly self-describing.
Schema.org
Simple Knowledge Organization System (SKOS)
Dublin Core
Data Cube Vocabulary
R2RML
Linked Data Platform (LDP)
Why is RDFS not enough?
• RDF “Schema” — and schemas are for validation, right?
• It’s a misnomer; should be “RDF Vocabulary Definition Language”
• Very limited expressivity
• Not the right semantics for validation
• ex:capital rdfs:range ex:City. ex:Berlin ex:capital ex:Germany => …?
=> ex:Germany a ex:City
• Invalid data -> infer more invalid data
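The range example above can be replayed in a few lines of Python. This is only a sketch: plain tuples stand in for RDF triples, and the helper name rdfs_range_inference is made up for illustration, not part of any RDF library.

```python
# Hypothetical mini-graph: triples are plain (subject, predicate, object) tuples.
RANGE = "rdfs:range"
TYPE = "rdf:type"


def rdfs_range_inference(triples):
    """Return the rdf:type triples entailed by rdfs:range declarations."""
    ranges = {(s, o) for s, p, o in triples if p == RANGE}
    inferred = set()
    for prop, cls in ranges:
        for s, p, o in triples:
            if p == prop:
                # RDFS does not reject the triple; it types the object instead.
                inferred.add((o, TYPE, cls))
    return inferred


data = {
    ("ex:capital", RANGE, "ex:City"),
    # Subject and object are swapped by mistake:
    ("ex:Berlin", "ex:capital", "ex:Germany"),
}

inferred = rdfs_range_inference(data)
print(inferred)
```

The swapped triple is never flagged; the only effect is one more (wrong) statement about ex:Germany.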
Why is OWL not enough?
• De facto a constraint language: logical contradiction => invalid
• Very expressive
• But targeted at logic modelling, not validity constraints
• Not the right semantics for validation
• ex:Dublin ex:inCountry ex:Ireland, ex:USA => …?
• Open world assumption
• No unique name assumption
=> ex:Ireland owl:sameAs ex:USA (assuming ex:inCountry is declared functional)
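A rough Python sketch of the difference, under two assumptions that are not on the slide: ex:inCountry is treated as a functional property, and triples are plain tuples. Both function names are hypothetical.

```python
def functional_property_inference(triples, functional_props):
    """OWL-style reading: two values of a functional property must be the same thing."""
    values = {}
    for s, p, o in triples:
        if p in functional_props:
            values.setdefault((s, p), set()).add(o)
    inferred = set()
    for objs in values.values():
        ordered = sorted(objs)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                # No unique name assumption: different names may denote one resource.
                inferred.add((ordered[i], "owl:sameAs", ordered[j]))
    return inferred


def max_one_value_violations(triples, props):
    """Validation-style reading: more than one distinct value is reported as an error."""
    values = {}
    for s, p, o in triples:
        if p in props:
            values.setdefault((s, p), set()).add(o)
    return sorted(key for key, objs in values.items() if len(objs) > 1)


data = [
    ("ex:Dublin", "ex:inCountry", "ex:Ireland"),
    ("ex:Dublin", "ex:inCountry", "ex:USA"),
]

same_as = functional_property_inference(data, {"ex:inCountry"})
violations = max_one_value_violations(data, {"ex:inCountry"})
print(same_as)
print(violations)
```

The first function mirrors what a reasoner concludes; the second is what most people expect a validator to do with the same data.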
ICV: OWL closed-world semantics in Stardog
Why is SPARQL not enough?
http://spinrdf.org/
• SPARQL ASK seems ideal for constraint validation
• Very expressive
• Efficient implementations
• But writing even simple constraints can be tedious
Other proposals
ShEx — Shape Expressions
http://shex.io/
So, something new?
SHACL
Shapes Constraint Language
SHACL Overview
• A language for “checking RDF graphs against conditions”
• Produced by W3C Data Shapes Working Group
• Work in progress, some features at risk
• 4th Working Draft: August 2016
• Should be done by June 2017
• Like RDFS and OWL, SHACL constraints are themselves written in RDF
• SPARQL underneath (for evaluation semantics and extensibility)
# Vocabulary as in the 2016 Working Draft; the final W3C Recommendation
# uses sh:NodeShape and sh:path in place of sh:Shape and sh:predicate.
ex:PersonShape
    a sh:Shape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:predicate ex:ssn ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
        sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
    ] ;
    sh:property [
        sh:predicate ex:child ;
        sh:class ex:Person ;
        sh:nodeKind sh:IRI ;
    ] ;
    sh:property [
        sh:path [ sh:inversePath ex:child ] ;
        sh:name "parent" ;
        sh:maxCount 2 ;
    ] .
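To make the shape more concrete, here is a hand-written Python approximation of what a validator would check for ex:PersonShape. It is a sketch, not a SHACL engine: triples are plain tuples, literals and IRIs are both plain strings (so the datatype and node kind checks are left out), and validate_person is a hypothetical helper.

```python
import re

TYPE = "rdf:type"
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def values(graph, node, predicate):
    """Objects reachable from node via predicate (outgoing property)."""
    return [o for s, p, o in graph if s == node and p == predicate]


def inverse_values(graph, node, predicate):
    """Subjects pointing at node via predicate (inverse path)."""
    return [s for s, p, o in graph if o == node and p == predicate]


def validate_person(graph, node):
    """Check one focus node against the three property constraints of the shape."""
    problems = []
    ssns = values(graph, node, "ex:ssn")
    if len(ssns) > 1:  # maxCount 1
        problems.append("ex:ssn: more than 1 value")
    for ssn in ssns:  # pattern
        if not SSN_PATTERN.search(ssn):
            problems.append(f"ex:ssn: {ssn!r} does not match pattern")
    for child in values(graph, node, "ex:child"):  # class ex:Person
        if (child, TYPE, "ex:Person") not in graph:
            problems.append(f"ex:child: {child} is not an ex:Person")
    if len(inverse_values(graph, node, "ex:child")) > 2:  # inverse path, maxCount 2
        problems.append("parent: more than 2 values")
    return problems


graph = {
    ("ex:Alice", TYPE, "ex:Person"),
    ("ex:Alice", "ex:ssn", "987-65-432A"),
    ("ex:Bob", TYPE, "ex:Person"),
    ("ex:Bob", "ex:ssn", "123-45-6789"),
    ("ex:Bob", "ex:child", "ex:Alice"),
}

# targetClass ex:Person: every instance of the class becomes a focus node.
focus_nodes = sorted(s for s, p, o in graph if p == TYPE and o == "ex:Person")
report = {node: validate_person(graph, node) for node in focus_nodes}
print(report)
```

Note that nothing is inferred or added to the graph; the function only reports which focus nodes fail which constraint.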
How a Shape works
Diagram: Dimitris Kontokostas
Targets: Initial selection of focus nodes
• Node target
• Class instance target
• Subjects-of target
• Objects-of target
• SPARQL-based selection (advanced)
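The first four target kinds can be pictured as simple selections over the data graph. The functions below are an illustrative sketch with made-up names; triples are plain tuples, and the class target here ignores subclasses, which a real SHACL processor would also consider.

```python
TYPE = "rdf:type"


def node_target(node):
    """Node target: one explicitly named node."""
    return {node}


def class_target(graph, cls):
    """Class instance target: every node typed with the class (subclasses ignored here)."""
    return {s for s, p, o in graph if p == TYPE and o == cls}


def subjects_of_target(graph, predicate):
    """Subjects-of target: every node that has the property."""
    return {s for s, p, o in graph if p == predicate}


def objects_of_target(graph, predicate):
    """Objects-of target: every node that appears as a value of the property."""
    return {o for s, p, o in graph if p == predicate}


graph = [
    ("ex:Bob", TYPE, "ex:Person"),
    ("ex:Alice", TYPE, "ex:Person"),
    ("ex:Bob", "ex:child", "ex:Alice"),
    ("ex:Bob", "ex:child", "ex:Carol"),
]

print(node_target("ex:Bob"))
print(class_target(graph, "ex:Person"))
print(subjects_of_target(graph, "ex:child"))
print(objects_of_target(graph, "ex:child"))
```

ex:Carol is selected by the objects-of target even though the graph never types her as ex:Person, which is why the choice of target matters.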
Node constraints
Constraints about the focus node itself:
• Node kind (IRI, blank, literal)
• IRI stem (namespace)
• IRI regex
• SPARQL query constraint (advanced)
Property constraints
Constraints about a certain outgoing or incoming property of the focus node(s):
• Cardinality
• Class
• Datatype
• Node kind (IRI, blank node, literal)
• String min/max length, string regex
• Numeric min/max
• Value must match another shape
• Value must not match another shape
Other features
• Combine constraints with logical OR/any (default: AND/all)
• Property-pair comparison (=, <, >)
• Severities (Violation, Warning, Info)
• Annotations (name, description, grouping, order)
• Define additional types of constraints based on SPARQL (advanced)
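Two of these features, logical combination and property-pair comparison, can be sketched in Python as follows. All names and the sample data are assumptions made for illustration; the sketch treats any string starting with "ex:" as an IRI and uses plain integers as values.

```python
def all_of(*checks):
    """Default behaviour: every constraint must hold (AND)."""
    return lambda value: all(check(value) for check in checks)


def any_of(*checks):
    """OR combination: at least one constraint must hold."""
    return lambda value: any(check(value) for check in checks)


def is_iri(value):
    # Assumption of this sketch: IRIs are written as prefixed names such as "ex:Berlin".
    return isinstance(value, str) and value.startswith("ex:")


def is_short_string(value):
    return isinstance(value, str) and len(value) <= 10


def less_than(graph, node, lower, upper):
    """Property-pair comparison: each value of `lower` must be < each value of `upper`."""
    lows = [o for s, p, o in graph if s == node and p == lower]
    highs = [o for s, p, o in graph if s == node and p == upper]
    return all(a < b for a in lows for b in highs)


graph = [
    ("ex:Event1", "ex:start", 2015),
    ("ex:Event1", "ex:end", 2016),
    ("ex:Event2", "ex:start", 2017),
    ("ex:Event2", "ex:end", 2016),
]

iri_or_short = any_of(is_iri, is_short_string)
iri_and_short = all_of(is_iri, is_short_string)

print(iri_or_short("ex:AVeryLongLocalName"))
print(iri_and_short("ex:AVeryLongLocalName"))
print(less_than(graph, "ex:Event1", "ex:start", "ex:end"))
print(less_than(graph, "ex:Event2", "ex:start", "ex:end"))
```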
Violation reports can be produced in RDF
ex:ExampleConstraintViolation
    a sh:ValidationResult ;
    sh:severity sh:Violation ;
    sh:focusNode ex:Bob ;
    sh:path ex:age ;
    sh:value "twenty two" ;
    sh:message "ex:age must be literal of datatype xsd:integer." ;
    sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
    sh:sourceShape ex:PersonShape .
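As a rough illustration of how such a result might come about, the Python sketch below builds a dictionary with the same fields for each offending value. It is not how any particular SHACL engine works: literals are plain strings here, so the datatype check is approximated by trying to parse the lexical form, and check_integer_datatype is a hypothetical name.

```python
def is_integer_literal(lexical):
    """Approximate an xsd:integer datatype check by parsing the lexical form."""
    try:
        int(lexical)
        return True
    except ValueError:
        return False


def check_integer_datatype(graph, focus_node, path, source_shape):
    """Return one result record per value of `path` that is not an integer."""
    results = []
    for s, p, o in graph:
        if s == focus_node and p == path and not is_integer_literal(o):
            results.append({
                "type": "sh:ValidationResult",
                "severity": "sh:Violation",
                "focusNode": focus_node,
                "path": path,
                "value": o,
                "message": f"{path} must be literal of datatype xsd:integer.",
                "sourceConstraintComponent": "sh:DatatypeConstraintComponent",
                "sourceShape": source_shape,
            })
    return results


graph = [
    ("ex:Bob", "ex:age", "twenty two"),
    ("ex:Alice", "ex:age", "23"),
]

bob_results = check_integer_datatype(graph, "ex:Bob", "ex:age", "ex:PersonShape")
alice_results = check_integer_datatype(graph, "ex:Alice", "ex:age", "ex:PersonShape")
print(bob_results)
print(alice_results)
```

Because each result names the focus node, path, value and source shape, a consumer can act on individual violations instead of a single pass/fail flag.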
Relationship to Rules
• Rules: “If someone says this, then I say that.”
• SHACL can’t do this.
• Does not replace SWRL, Jena Rules, RIF, SPIN Rules
Uses and implementations
SHACL in TopBraid Composer:
Shapes + Constraints
SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/
SHACL in TopBraid Composer: SPARQL-based constraints
SHACL in TopQuadrant’s web products (EVN, EDG)
SHACL Protégé Plugin
http://me-at-big.blogspot.de/2015/07/shacl4p-shapes-constraint-language.html
Repairing SKOS taxonomies with SHACL
Validation of SKOS with SHACL, and extension of SHACL with specification of repair strategies.
Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf
Validating the “bag of crisps”…
• Validation is often not about correct/incorrect or valid/invalid
• Constraints-first (e.g., SQL)
• Well-formed vs valid (e.g., XML Schema)
• Validation is often about completeness and correctness for a specific purpose: “This is what I produce”; “This is what I understand”
• Assumption is that there may be other statements
• Different consumers may apply different constraints
• SHACL should work well in this flexible, multi-source, multi-consumer world.
“Anyone can say anything about anything”
RDF
SPARQL
OWL
RDFS
Statements: What is being said?
What words do we have?
What makes logical sense to say?
What did you say about XYZ?
OWL SHACL
Is that word used correctly?
What do you need to know from me?
You can't say that here!
I’d never say that!
richard@topquadrant.com
Backup slides
SHACL: Shaping the Big Ball of Data Mud

More Related Content

What's hot

FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout Carole Goble
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFSNilesh Wagmare
 
RDB2RDF, an overview of R2RML and Direct Mapping
RDB2RDF, an overview of R2RML and Direct MappingRDB2RDF, an overview of R2RML and Direct Mapping
RDB2RDF, an overview of R2RML and Direct MappingBoris Villazón-Terrazas
 
SPARQL in a nutshell
SPARQL in a nutshellSPARQL in a nutshell
SPARQL in a nutshellFabien Gandon
 
Ontology In A Nutshell (version 2)
Ontology In A Nutshell (version 2)Ontology In A Nutshell (version 2)
Ontology In A Nutshell (version 2)Fabien Gandon
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQLOpen Data Support
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQLOlaf Hartig
 
Mapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping LanguageMapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping Languageandimou
 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsJulien Le Dem
 
The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)Myungjin Lee
 
Validating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesValidating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesJose Emilio Labra Gayo
 
Querying Linked Data with SPARQL
Querying Linked Data with SPARQLQuerying Linked Data with SPARQL
Querying Linked Data with SPARQLOlaf Hartig
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDFNarni Rajesh
 

What's hot (20)

JSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge GraphsJSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge Graphs
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
RDF and OWL
RDF and OWLRDF and OWL
RDF and OWL
 
SPARQL Cheat Sheet
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
 
ShEx by Example
ShEx by ExampleShEx by Example
ShEx by Example
 
RDB2RDF, an overview of R2RML and Direct Mapping
RDB2RDF, an overview of R2RML and Direct MappingRDB2RDF, an overview of R2RML and Direct Mapping
RDB2RDF, an overview of R2RML and Direct Mapping
 
SPARQL in a nutshell
SPARQL in a nutshellSPARQL in a nutshell
SPARQL in a nutshell
 
Ontology In A Nutshell (version 2)
Ontology In A Nutshell (version 2)Ontology In A Nutshell (version 2)
Ontology In A Nutshell (version 2)
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQL
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQL
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
Mapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping LanguageMapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping Language
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
 
RDF data validation 2017 SHACL
RDF data validation 2017 SHACLRDF data validation 2017 SHACL
RDF data validation 2017 SHACL
 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analytics
 
The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)
 
Validating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesValidating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectives
 
Querying Linked Data with SPARQL
Querying Linked Data with SPARQLQuerying Linked Data with SPARQL
Querying Linked Data with SPARQL
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 

Similar to SHACL: Shaping the Big Ball of Data Mud

RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesKurt Cagle
 
A Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebA Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebShamod Lacoul
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...LDBC council
 
A hands on overview of the semantic web
A hands on overview of the semantic webA hands on overview of the semantic web
A hands on overview of the semantic webMarakana Inc.
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudOntotext
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsRinke Hoekstra
 
A year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CA year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CIvan Herman
 
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...Dr.-Ing. Thomas Hartmann
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Dr.-Ing. Thomas Hartmann
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)Dan Brickley
 
Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaJeen Broekstra
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web workPaul Houle
 

Similar to SHACL: Shaping the Big Ball of Data Mud (20)

RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data Frames
 
A Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebA Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic Web
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
 
KIT Graduiertenkolloquium 11.05.2016
KIT Graduiertenkolloquium 11.05.2016KIT Graduiertenkolloquium 11.05.2016
KIT Graduiertenkolloquium 11.05.2016
 
A hands on overview of the semantic web
A hands on overview of the semantic webA hands on overview of the semantic web
A hands on overview of the semantic web
 
What's New in RDF 1.1?
What's New in RDF 1.1?What's New in RDF 1.1?
What's New in RDF 1.1?
 
Linked services
Linked servicesLinked services
Linked services
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n Bolts
 
A year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CA year on the Semantic Web @ W3C
A year on the Semantic Web @ W3C
 
SPIN and Shapes
SPIN and ShapesSPIN and Shapes
SPIN and Shapes
 
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
 
SPIN in Five Slides
SPIN in Five SlidesSPIN in Five Slides
SPIN in Five Slides
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Converting GHO to RDF
Converting GHO to RDFConverting GHO to RDF
Converting GHO to RDF
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)
 
Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in Java
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web work
 

More from Richard Cyganiak

EDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsEDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsRichard Cyganiak
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsRichard Cyganiak
 
Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Richard Cyganiak
 
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationSigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationRichard Cyganiak
 
Investigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyInvestigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyRichard Cyganiak
 
How to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfHow to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfRichard Cyganiak
 
Self-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksSelf-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksRichard Cyganiak
 
The State of Linked Government Data
The State of Linked Government DataThe State of Linked Government Data
The State of Linked Government DataRichard Cyganiak
 
dcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesdcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesRichard Cyganiak
 

More from Richard Cyganiak (11)

EDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsEDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five Stars
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF Datasets
 
Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)
 
How to Publish Open Data
How to Publish Open DataHow to Publish Open Data
How to Publish Open Data
 
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationSigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
 
Investigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyInvestigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations Ontology
 
How to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfHow to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdf
 
Self-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksSelf-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and Gridworks
 
The State of Linked Government Data
The State of Linked Government DataThe State of Linked Government Data
The State of Linked Government Data
 
What is SDMX-RDF?
What is SDMX-RDF?What is SDMX-RDF?
What is SDMX-RDF?
 
dcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesdcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data catalogues
 

Recently uploaded

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Recently uploaded (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

SHACL: Shaping the Big Ball of Data Mud

  • 1. Shaping the Big Ball of Data Mud W3C's Shapes Constraint Language (SHACL) Richard Cyganiak Lotico Berlin Semantic Web Meetup, 17 November 2016
  • 4. Strengths: • Flexible can-say-anything data model • Merging data is trivial • Shared, explicit meaning thanks to URIs • Mixing and matching of schemas; partial understanding • Painstakingly developed vocabularies • “Neutral ground” for modelling • SPARQL. Weaknesses: • Overgeneralisation: works for anything, but great at nothing • “RDF tax” • Logic foundations and web foundations can be baggage • Maps poorly to common programming language data structures • Schemaless nature makes optimisation difficult • Not good at semi-structured
  • 5. Application Areas • Knowledge graphs • Publishing • Life sciences • Fraud detection & identity management • Data integration & analysis The V’s of Big Data: Volume, Velocity, Variety
  • 9. RDF is supposedly self-describing. RDF
  • 14. R2RML
  • 16. Why is RDFS not enough? RDF SPARQL OWL RDFS
  • 17. Why is RDFS not enough? • RDF “Schema”, and schemas are for validation, right? • It’s a misnomer; should be “RDF Vocabulary Definition Language” • Very limited expressivity • Not the right semantics for validation • ex:capital range ex:City. ex:Berlin ex:capital ex:Germany => …? => ex:Germany a ex:City • Invalid data -> infer more invalid data
  • 18. Why is OWL not enough? RDF SPARQL OWL RDFS
  • 19. Why is OWL not enough? • De facto a constraint language: logical contradiction => invalid • Very expressive • But targeted at logic modelling, not validity constraints • Not the right semantics for validation • ex:Dublin ex:inCountry ex:Ireland, ex:USA => …? => ex:Ireland owl:sameAs ex:USA • Open world assumption • No unique name assumption
  • 20. ICV: OWL closed-world semantics in Stardog
  • 21. Why is SPARQL not enough? RDF SPARQL OWL RDFS
  • 22. Why is SPARQL not enough? SPARQL
  • 24. Why is SPARQL not enough? • SPARQL ASK seems ideal for constraint validation • Very expressive • Efficient implementations • But writing even simple constraints can be tedious SPARQL
  • 26. ShEx — Shape Expressions http://shex.io/
  • 29. SHACL Overview • A language for “checking RDF graphs against conditions” • Produced by W3C Data Shapes Working Group • Work in progress, some features at risk • 4th Working Draft: August 2016 • Should be done by June 2017 • Like RDFS and OWL, SHACL constraints are themselves written in RDF • SPARQL underneath (for evaluation semantics and extensibility)
  • 30.
        ex:PersonShape
            a sh:Shape ;
            sh:targetClass ex:Person ;
            sh:property [
                sh:predicate ex:ssn ;
                sh:maxCount 1 ;
                sh:datatype xsd:string ;
                sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
            ] ;
            sh:property [
                sh:predicate ex:child ;
                sh:class ex:Person ;
                sh:nodeKind sh:IRI ;
            ] ;
            sh:property [
                sh:path [ sh:inversePath ex:child ] ;
                sh:name "parent" ;
                sh:maxCount 2 ;
            ] .
  • 31. How a Shape works Diagram: Dimitris Kontokostas
  • 32. Targets: Initial selection of focus nodes • Node target • Class instance target • Subjects-of target • Objects-of target • SPARQL-based selection (advanced)
  • 33. Node constraints Constraints about the focus node itself: • Node kind (IRI, blank, literal) • IRI stem (namespace) • IRI regex • SPARQL query constraint (advanced)
  • 34. Property constraints Constraints about a certain outgoing or incoming property of the focus node(s): • Cardinality • Class • Datatype • Node kind (IRI, blank node, literal) • String min/max length, string regex • Numeric min/max • Value must match another shape • Value must not match another shape
  • 35. Other features • Combine constraints with logical OR/any (default: AND/all) • Property-pair comparison (=, <, >) • Severities (Violation, Warning, Info) • Annotations (name, description, grouping, order) • Define additional types of constraints based on SPARQL (advanced)
  • 36. Violation reports can be produced in RDF
        ex:ExampleConstraintViolation
            a sh:ValidationResult ;
            sh:severity sh:Violation ;
            sh:focusNode ex:Bob ;
            sh:path ex:age ;
            sh:value "twenty two" ;
            sh:message "ex:age must be literal of datatype xsd:integer." ;
            sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
            sh:sourceShape ex:PersonShape .
  • 37. Relationship to Rules • Rules: “If someone says this, then I say that.” • SHACL can’t do this. • Does not replace SWRL, Jena Rules, RIF, SPIN Rules
  • 39. SHACL in TopBraid Composer: Shapes + Constraints SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/
  • 40. SHACL in TopBraid Composer: SPARQL-based constraints
  • 41. SHACL in TopQuadrant’s web products (EVN, EDG)
  • 44. Repairing SKOS taxonomies with SHACL Validation of SKOS with SHACL, and extension of SHACL with specification of repair strategies. Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf
  • 46. Validating the “bag of crisps”… • Validation is often not about correct/incorrect or valid/invalid • Constraints-first (e.g., SQL) • Well-formed vs valid (e.g., XML Schema) • Validation is often about completeness and correctness for a specific purpose: “This is what I produce”; “This is what I understand” • Assumption is that there may be other statements • Different consumers may apply different constraints • SHACL should work well in this flexible, multi-source, multi-consumer world.
  • 47. “Anyone can say anything about anything” RDF SPARQL OWL RDFS Statements: What is being said? What words do we have? What makes logical sense to say? What did you say about XYZ? OWL SHACL Is that word used correctly? What do you need to know from me? You can't say that here! I’d never say that!
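
The features listed on slides 32 to 35 (targets, cardinality, logical OR, property-pair comparison, severities) can be combined in a single shape. The following is a minimal sketch and is not taken from the talk. It assumes the term names of the final SHACL Recommendation (sh:NodeShape, sh:path), which differ from the August 2016 draft terms used on slide 30 (sh:Shape, sh:predicate). Every ex: name is hypothetical.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/ns#> .

ex:EventShape
    a sh:NodeShape ;
    # Class instance target: every instance of ex:Event becomes a focus node.
    sh:targetClass ex:Event ;

    # Cardinality plus property-pair comparison:
    # exactly one start date, and it must be less than every ex:endDate value.
    sh:property [
        sh:path ex:startDate ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:lessThan ex:endDate ;
    ] ;

    # Logical OR: each organizer must be a person or an organization.
    sh:property [
        sh:path ex:organizer ;
        sh:or (
            [ sh:class ex:Person ]
            [ sh:class ex:Organization ]
        ) ;
    ] ;

    # Severity: a missing label is reported as a warning, not a violation.
    sh:property [
        sh:path ex:label ;
        sh:minCount 1 ;
        sh:severity sh:Warning ;
        sh:message "An event should have a label." ;
    ] .
```

Constraints inside one shape are combined with AND by default, so an event has to satisfy all three property constraints; only the organizer constraint uses an explicit OR.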

Editor's Notes

  1. It’s amazing how many people have done incredible work. Massive effort shown in this pic. But there is some hype. Quite a few datasets are a sloppy conversion script, results thrown into a SPARQL store, with some haphazard links to DBpedia. Run a handful of SPARQL queries as sanity checks. But no in-depth quality control at all. Lots of data quality issues. Querying within a dataset can be hard enough, across datasets often impossible. If one dataset (e.g., DBpedia) changes, links break and often are never fixed.
  2. Talk will be about validation and SHACL, but I’d like to start by setting the scene. Where is the Semantic Web on the hype cycle? Arguably, it went over the bump twice already: with focus on logic/AI around 2000, and focus on Linked Data around 2010. I helped to fan the flames of the second hype. The base standards: today it's no longer that exciting. Overblown expectations have cooled off. It’s no longer expected to change the world. Getting stable and mature. Specific applications can be elsewhere on the cycle. See “Enterprise Taxonomy and Ontology Management”. That’s actually what TQ does.
  3. If you work with these technologies, life is pretty good these days, and still getting better. Maturing standards and tool support. And today we really understand what the technologies are good at, and what not.
  4. “Maps poorly to programming languages”: property names are not simple identifiers; every property can be multivalued; need navigability along incoming and outgoing arcs; ordering is difficult. Semi-structured is important in big data.
  5. We know where it works and where it doesn’t. It’s productive in a number of niches. RDF is good at dealing with Variety. (But not good enough: contextual validation, fuzzy/statistical matching for the semi-structured stuff.) Variety tends to make logic approaches difficult: no single global truth; less OWL, more SPARQL.
  6. Tim Berners-Lee deconstructing a bag of crisps. Perfect metaphor for the strengths of the SW. Different information co-exists on the packaging: the plain English “potato chips”; the nutrition information on the back, standardized by the U.S. Food and Drug Administration; some allergy information that many people don’t pay any attention to, but those with allergies read very carefully; the UPC code that can be read by any retail checkout machine in the world; some numbers on the bottom edge of the package that make no sense to him whatsoever. Mixing and matching of different vocabularies, standardised by different organisations, intended for different consumers. Partial understanding. Once you have agreed on an identifier for a thing *and a location for data about it*, different data producers and consumers can use it without stepping on each other's toes.
  7. The two main open source implementations of the technology stack, Jena and Sesame, are now at the Apache Foundation and at the Eclipse Foundation—big, established, mature, enterprisey organisations.
  8. So life is pretty good. Maturing technology stack, clearly understood strengths and weaknesses, productive niches, improving tools. But… We never solved validation. That’s kind of surprising. After all, each of these technologies has aspects that address these needs. Review them one by one.
  9. Every class and property has a URI. The URI references an ontology that defines the term. So each triple describes itself, right? One of the major strengths, right? No. Actually, most of the meaning is just not given in the ontology. Too much of the meaning is implicit, or just written down in text somewhere and cannot be automatically checked. Let me give examples.
  10. Arguably the most important ontology in existence. Examples of things they want to validate in a tool for webmasters; these came out of the workshop that kicked off the Data Shapes WG. See https://www.w3.org/2001/sw/wiki/images/0/00/SimpleApplication-SpecificConstraintsforRDFModels.pdf and https://www.w3.org/TR/shacl-ucr/#uc23-schema.org-constraints
  11. DC is widely used. It’s easy enough to agree on calling a title “dc:title” and an author “dc:creator”, but different orgs have widely differing views on what constitutes a complete metadata record. DC Application profiles as a response. DC developed its own way to represent those. Not a standard, not used apart from the DC community.
  12. I’ve been involved. We wrote constraints in prose, and added SPARQL queries to make them more formal/explicit. And yes, people can copy-paste them. But still no way of just running all of them automatically against a published dataset! And no error reporting, just true/false.
  13. I’ve been involved. Mapping files are written in RDF; goal was to be very clear about what constitutes a valid mapping file. This is semi-formal. Surely this should be representable in some standard machine-readable way?
  14. Read/write Linked Data. Applications want to put constraints on the kind of data they can receive. Address book application wants to say that there should be an address in the RDF you PUT/POST. But completely punted on saying how to achieve it. “machine-readable ones facilitate better client interaction”—no shit!
  15. So, lots of initiatives that are serious about using SW in an interoperable and robust way end up just putting constraints in prose text, where it should really be in a machine-processable form. Same problem everywhere! But we have RDF Schema. SCHEMA!
  16. RDF Schema in analogy to XML Schema, but they really do very different things.
  17. So RDFS is just not powerful enough. But OWL surely gets us there?
  18. Clark & Parsia. Use OWL syntax, but switch to a semantics based on the closed-world assumption (CW) and the unique name assumption (UNA). This works pretty well! But it can be a bit confusing: if you find some OWL, what semantics is intended? And OWL, while expressive, lacks some things that one would like to have in validation.
  19. We saw the Data Cube example where SPARQL was used to query the graph to see if it’s complete. Isn’t that enough to solve all validation issues?
  20. SPIN is a technology introduced by TQ. A bunch of things (rules written in SPARQL, templated SPARQL queries, defining custom SPARQL functions, etc.) We have used this for years and it actually works very well.
  21. Custom syntax. Somewhere between SPARQL, regular expressions, and grammar parsing. “regex for graphs.” Pretty cool. Concise. Needs new parsers.
  22. So, several good solutions around—but none has enough mindshare to take over. Meet at W3C, make a standard with the best aspects of each. (Or with the worst aspects of each—fingers crossed.)
  23. Some features and aspects are still highly controversial.
  24. When a violation occurs, the result is not just “false”. It’s a structure with info. Can process it in various ways. Just display it? Attach it to the right form field based on sh:path? Just count the violations per type in a large dataset? Different behaviour for different severities?
  25. It’s still early days. Mostly individuals and organisations that are active in the working group.
  26. SHACL is getting really important to our products. We have made a major contribution. TQ’s Holger Knublauch is one of the editors of the spec. TBC is an SW IDE, a workbench for SW professionals. At its heart is a schema/ontology editor. It supports editing of SHACL constraints through a nice UI.
  27. EVN is a taxonomy and ontology management platform. EDG is a data governance solution. SHACL allows our customers to add custom constraints over their own data models. Very powerful.
  28. Note the suggestions for fixing the problem. Goes beyond standard SHACL but an obvious addition and very cool.
  29. Not sure how well maintained and if it’s following the spec.
  30. Semantic Web Company. Nice application that shows using SHACL for bulk validation. Automated repair—somewhat similar to our suggestions extension.
  31. Dimitris Kontokostas (SHACL spec editor) and team at Uni Leipzig. Organising entire test suites, expressed originally in SPARQL but now with SHACL support, for data quality of large data sets. Used in context of DBpedia.
  32. So, how do the parts of the stack fit together? High-level view. Let’s run with the metaphor that anyone can say anything about anything. First we should note: Just because you can say anything about anything doesn’t mean you should! RDF is triples. But we also call them RDF statements. Each triple is a statement of some fact.
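
The notes above on RDF Schema say that it does something quite different from validation. A small sketch of the slide 17 example may make the contrast concrete. The two data statements come from the slide (which abbreviates rdfs:range as "range"). The shape is an assumption about how the same intent could be written as a constraint, using term names from the final SHACL Recommendation; ex:CapitalValueShape is a hypothetical name.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:   <http://example.org/ns#> .

# Vocabulary definition and mistaken data from slide 17:
# the subject and object of ex:capital are the wrong way round.
ex:capital rdfs:range ex:City .
ex:Berlin ex:capital ex:Germany .

# An RDFS reasoner does not reject this. It infers:
#   ex:Germany a ex:City .

# The same intent stated as a constraint:
# every object of ex:capital must be an instance of ex:City.
ex:CapitalValueShape
    a sh:NodeShape ;
    sh:targetObjectsOf ex:capital ;
    sh:class ex:City .
```

Assuming the data is validated without RDFS inference applied first, ex:Germany has no ex:City type in the data graph, so a SHACL processor would be expected to report a violation for it rather than derive a new triple.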