Validating RDF Data Quality
using Constraints
to Direct the Development of Constraint Languages
Thomas Hartmann
Benjamin Z...
XML Validation
<!ELEMENT library (book+, author*)>
<!ELEMENT book (isbn, title, author-ref+)>
<!ATTLIST book
id ID #REQUIRED
>
<!ELEMENT ...
RDF Validation
Workshop
Working Groups on
RDF Validation
W3C Data Shapes Working Group
DCMI RDF Application Profiles Task Group
http://purl.org/net/rdf-validation
81 Types of Constraints
on RDF Data
Constraint
Languages
SPARQL Query Language for RDF
SELECT ?concept
WHERE {
?concept a [ rdfs:subClassOf* skos:Concept ] .
FILTER NOT EXISTS {
?...
SPARQL Inferencing Notation (SPIN)
# FILTER NOT EXISTS { ?book author ?person }
[ a sp:Filter ;
sp:expression [
a sp:notEx...
Web Ontology Language (OWL)
:Publication rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty :author ;
owl:allValuesFrom ...
Shape Expressions (ShEx)
:Publication {
( :isbn xsd:string, :title xsd:string )
|
( :issn xsd:string, :title xsd:string )}
Resource Shapes (ReSh)
:Computer-Science-Book
a oslc:ResourceShape ;
oslc:property [
oslc:propertyDefinition :subject ;
os...
[ a dsp:DescriptionTemplate ;
dsp:resourceClass :Science-Fiction-Book ;
dsp:statementTemplate [
dsp:property :subject ;
ds...
Shapes Constraint Language
(SHACL)
:BookShape
a sh:Shape ;
sh:scopeClass :Book ;
sh:property [
sh:predicate :author ;
sh:v...
http://purl.org/net/rdfval-demo
RDF Validation
Environment
Constraint Types
Classification
1. RDFS/OWL Based
2. Constraint Language Based
3. SPARQL Based
RDFS/OWL Based
:Publication rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty :author ;
owl:allValuesFrom :Person ] .
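Read as a validation constraint rather than an inference rule, the restriction above requires every :author value of a :Publication to be a :Person. The following is a minimal Python sketch of that check, assuming a closed-world reading (a value counts as a :Person only if it is explicitly typed as one) and a toy in-memory triple set. The instance names :book1, :alice and :acme, and the class :Organization, are hypothetical and do not come from the slides.

```python
# Toy graph as a set of (subject, predicate, object) triples.
# All instance names (:book1, :alice, :acme) are made up for illustration.
TRIPLES = {
    (":book1", "rdf:type", ":Publication"),
    (":book1", ":author", ":alice"),
    (":book1", ":author", ":acme"),
    (":alice", "rdf:type", ":Person"),
    (":acme", "rdf:type", ":Organization"),
}


def all_values_from_violations(triples, cls, prop, value_cls):
    """Return (subject, value) pairs where a `prop` value of a `cls`
    instance is not explicitly typed as `value_cls` (closed-world check)."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    typed = {s for s, p, o in triples if p == "rdf:type" and o == value_cls}
    return sorted(
        (s, o)
        for s, p, o in triples
        if s in instances and p == prop and o not in typed
    )


violations = all_values_from_violations(
    TRIPLES, ":Publication", ":author", ":Person"
)
print(violations)  # [(':book1', ':acme')]
```

Under OWL's open-world semantics a reasoner would instead infer that :acme is a :Person; the sketch deliberately reports it as a violation.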
Constraint Language Based
:Publication {
( :isbn xsd:string, :title xsd:string )
|
( :issn xsd:string, :title xsd:string )}
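The shape above accepts a publication that carries either an :isbn and a :title, or an :issn and a :title. The following Python sketch illustrates that disjunction under simplifying assumptions: each listed property must occur exactly once with a string value, extra properties are ignored, and the two alternatives are combined with an inclusive "or". This is not a full implementation of ShEx semantics, and the example nodes are hypothetical.

```python
def matches_group(node, required):
    """True if `node` (dict: property -> list of values) has exactly one
    string value for every property in `required`."""
    return all(
        len(node.get(prop, [])) == 1 and isinstance(node[prop][0], str)
        for prop in required
    )


def matches_publication(node):
    """Simplified reading of the shape: (isbn, title) | (issn, title)."""
    return matches_group(node, [":isbn", ":title"]) or matches_group(
        node, [":issn", ":title"]
    )


# Hypothetical example nodes.
book = {":isbn": ["978-3-16-148410-0"], ":title": ["A Book"]}
journal = {":issn": ["1234-5678"], ":title": ["A Journal"]}
untitled = {":isbn": ["978-3-16-148410-0"]}

print(matches_publication(book))      # True
print(matches_publication(journal))   # True
print(matches_publication(untitled))  # False
```

A node carrying both identifiers passes in this sketch; whether it should pass depends on how strictly the alternatives are interpreted.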
SPARQL Based
SELECT ?concept
WHERE {
?concept a [ rdfs:subClassOf* skos:Concept ] .
FILTER NOT EXISTS {
?concept ?p ?o .
F...
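The SPARQL-based constraint on this slide selects every resource typed as skos:Concept (or as one of its subclasses) that has no outgoing statement using one of a listed set of relation properties. The logic can be sketched in plain Python over an in-memory triple set. This is an illustration only: the slide's property list is elided, so RELATION_PROPS below contains just the three properties that are named, and the class :Topic and the resources :c0, :c1, :c2 are hypothetical.

```python
# Only the properties named on the slide; the original list continues.
RELATION_PROPS = {"skos:related", "skos:relatedMatch", "skos:broader"}

# Hypothetical toy data.
TRIPLES = {
    (":Topic", "rdfs:subClassOf", "skos:Concept"),
    (":c0", "rdf:type", "skos:Concept"),
    (":c1", "rdf:type", "skos:Concept"),
    (":c1", "skos:broader", ":c0"),
    (":c2", "rdf:type", ":Topic"),
}


def subclasses_of(triples, root):
    """Return `root` plus every class reachable via rdfs:subClassOf*."""
    found, frontier = {root}, [root]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def unrelated_concepts(triples, relation_props):
    """Concepts with no outgoing statement whose predicate is in
    `relation_props` (the FILTER NOT EXISTS part of the query)."""
    classes = subclasses_of(triples, "skos:Concept")
    concepts = {s for s, p, o in triples if p == "rdf:type" and o in classes}
    related = {s for s, p, o in triples if p in relation_props}
    return sorted(concepts - related)


result = unrelated_concepts(TRIPLES, RELATION_PROPS)
print(result)  # [':c0', ':c2']
```

Note that :c0 is reported even though :c1 points to it, because the query only looks at outgoing statements of each concept.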
Constraints
Classification
1. Informational
2. Warning
3. Error
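One way to work with the three severity levels is to treat them as an ordered scale, tally the share of violations per level, and single out the violations at the highest level. The sketch below assumes that ordering (Informational < Warning < Error); the constraint names and the sample violations are hypothetical and are not taken from the evaluation.

```python
from collections import Counter
from enum import IntEnum


class Severity(IntEnum):
    INFORMATIONAL = 1
    WARNING = 2
    ERROR = 3


# Hypothetical violations: (constraint name, severity).
violations = [
    ("missing-label", Severity.INFORMATIONAL),
    ("missing-label", Severity.INFORMATIONAL),
    ("orphan-concept", Severity.WARNING),
    ("invalid-datatype", Severity.ERROR),
]

# Percentage of violations per severity level.
counts = Counter(sev for _, sev in violations)
shares = {sev.name: 100 * counts[sev] / len(violations) for sev in Severity}

# Violations at the highest severity level.
blocking = [name for name, sev in violations if sev >= Severity.ERROR]

print(shares)    # {'INFORMATIONAL': 50.0, 'WARNING': 25.0, 'ERROR': 25.0}
print(blocking)  # ['invalid-datatype']
```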
Evaluation Setup
• 115 constraints from vocabularies and
experts
• constraints classified and implemented
• on 3 vocabular...
Validated Data Sets
Vocabulary Data Sets Triples
QB 9,990 3,775,983,610
SKOS 4,178 477,737,281
DDI-RDF 1,526 9,673,055
Tot...
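The per-vocabulary counts in this table can be cross-checked against the totals stated in the abstract (15,694 data sets, 4.26 billion triples). The short Python check below only uses the figures from the table; the variable names are illustrative.

```python
# Figures copied from the "Validated Data Sets" table.
DATA_SETS = {"QB": 9_990, "SKOS": 4_178, "DDI-RDF": 1_526}
TRIPLES = {"QB": 3_775_983_610, "SKOS": 477_737_281, "DDI-RDF": 9_673_055}

total_data_sets = sum(DATA_SETS.values())
total_triples = sum(TRIPLES.values())

print(f"{total_data_sets:,} data sets")
print(f"{total_triples:,} triples (about {total_triples / 1e9:.2f} billion)")
# 15,694 data sets
# 4,263,393,946 triples (about 4.26 billion)
```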
Finding 1
C [%] CV [%]
SPARQL 63.2 78.2
CL 34.7 21.8
RDFS/OWL 35.6 21.8
C (constraints), CV (constraint violations)
Finding 2
C [%] CV [%]
SPARQL 63.2 78.2
CL 34.7 21.8
RDFS/OWL 35.6 21.8
C (constraints), CV (constraint violations)
Finding 3
C [%] CV [%]
Info 42.3 31.3
Warning 18.7 62.7
Error 39.0 6.1
C (constraints), CV (constraint violations)
Limitations
> 3 Vocabularies
> 1 Domain
2016.02 - Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages (ICSC 2016)

For research institutes, data libraries, and data
archives, validating RDF data against predefined constraints
is a much sought-after feature, particularly because such
validation is taken for granted in the XML world. Based on our
work in the DCMI RDF Application Profiles Task Group and in
cooperation with the W3C Data Shapes Working Group, we have to
date identified and published 81 types of constraints that
various stakeholders require for data applications. In this paper,
in collaboration with several domain experts, we formulate 115
constraints on three different vocabularies (DDI-RDF, QB, and
SKOS) and classify them according to (1) the severity of a
violation and (2) the complexity of expressing the constraint
in common constraint languages. We evaluate the
data quality of 15,694 data sets (4.26 billion triples) of research
data for the social, behavioral, and economic sciences obtained
from 33 SPARQL endpoints. Based on the results, we formulate
several findings to direct the further development of constraint
languages.


  1. Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages. Thomas Hartmann, Benjamin Zapilko, Joachim Wackerow, Kai Eckert. IEEE International Conference on Semantic Computing (ICSC 2016)
  2. XML Validation
  3. <!ELEMENT library (book+, author*)> <!ELEMENT book (isbn, title, author-ref+)> <!ATTLIST book id ID #REQUIRED > <!ELEMENT author-ref EMPTY> <!ATTLIST author-ref id IDREF #REQUIRED > <!ELEMENT author (name)> <!ATTLIST author id ID #REQUIRED > <!ELEMENT isbn (#PCDATA)> <!ELEMENT title (#PCDATA)> <!ELEMENT name (#PCDATA)>
  4. RDF Validation Workshop
  5. Working Groups on RDF Validation: W3C Data Shapes Working Group, DCMI RDF Application Profiles Task Group
  6. http://purl.org/net/rdf-validation 81 Types of Constraints on RDF Data
  7. Constraint Languages
  8. SPARQL Query Language for RDF: SELECT ?concept WHERE { ?concept a [ rdfs:subClassOf* skos:Concept ] . FILTER NOT EXISTS { ?concept ?p ?o . FILTER ( ?p IN ( skos:related, skos:relatedMatch, skos:broader, ... ) ) . } }
  9. SPARQL Inferencing Notation (SPIN): # FILTER NOT EXISTS { ?book author ?person } [ a sp:Filter ; sp:expression [ a sp:notExists ; sp:elements ( [ sp:subject [ sp:varName "book" ] ; sp:predicate author ; sp:object [ sp:varName "person" ] ] ) ] ]
  10. Web Ontology Language (OWL): :Publication rdfs:subClassOf [ a owl:Restriction ; owl:onProperty :author ; owl:allValuesFrom :Person ] .
  11. Shape Expressions (ShEx): :Publication { ( :isbn xsd:string, :title xsd:string ) | ( :issn xsd:string, :title xsd:string ) }
  12. Resource Shapes (ReSh): :Computer-Science-Book a oslc:ResourceShape ; oslc:property [ oslc:propertyDefinition :subject ; oslc:allowedValues [ oslc:allowedValue "Computer Science" , "Informatics" , "Information Technology" ] ] .
  13. Description Set Profiles (DSP): [ a dsp:DescriptionTemplate ; dsp:resourceClass :Science-Fiction-Book ; dsp:statementTemplate [ dsp:property :subject ; dsp:nonLiteralConstraint [ dsp:valueClass skos:Concept ; dsp:valueURI :Science-Fiction, :Sci-Fi, :SF ; dsp:vocabularyEncodingScheme :Science-Fiction-Book-Subjects ; ] ] ] .
  14. Shapes Constraint Language (SHACL): :BookShape a sh:Shape ; sh:scopeClass :Book ; sh:property [ sh:predicate :author ; sh:valueShape :PersonShape ; sh:minCount 1 ; ] .
  15. http://purl.org/net/rdfval-demo RDF Validation Environment
  16. Constraint Types Classification: (1) RDFS/OWL Based, (2) Constraint Language Based, (3) SPARQL Based
  17. RDFS/OWL Based: :Publication rdfs:subClassOf [ a owl:Restriction ; owl:onProperty :author ; owl:allValuesFrom :Person ] .
  18. Constraint Language Based: :Publication { ( :isbn xsd:string, :title xsd:string ) | ( :issn xsd:string, :title xsd:string ) }
  19. SPARQL Based: SELECT ?concept WHERE { ?concept a [ rdfs:subClassOf* skos:Concept ] . FILTER NOT EXISTS { ?concept ?p ?o . FILTER ( ?p IN ( skos:related, skos:relatedMatch, skos:broader, ... ) ) . } }
  20. Constraints Classification: (1) Informational, (2) Warning, (3) Error
  21. Evaluation Setup • 115 constraints from vocabularies and experts • constraints classified and implemented • on 3 vocabularies in the SBE sciences – well-established vocabularies (QB, SKOS) – vocabulary under development (DDI-RDF)
  22. Validated Data Sets (Vocabulary: Data Sets | Triples): QB: 9,990 | 3,775,983,610; SKOS: 4,178 | 477,737,281; DDI-RDF: 1,526 | 9,673,055; Total: 15,694 | 4.26 billion. 33 SPARQL Endpoints
  23. Finding 1 (C [%] | CV [%]): SPARQL: 63.2 | 78.2; CL: 34.7 | 21.8; RDFS/OWL: 35.6 | 21.8. C (constraints), CV (constraint violations)
  24. Finding 2 (C [%] | CV [%]): SPARQL: 63.2 | 78.2; CL: 34.7 | 21.8; RDFS/OWL: 35.6 | 21.8. C (constraints), CV (constraint violations)
  25. Finding 3 (C [%] | CV [%]): Info: 42.3 | 31.3; Warning: 18.7 | 62.7; Error: 39.0 | 6.1. C (constraints), CV (constraint violations)
  26. Limitations > 3 Vocabularies > 1 Domain
