Legislative data portals and linked data quality
Jose Emilio Labra Gayo
WESO
WEb Semantics Oviedo
University of Oviedo, Spain
About me…
In 2004, founded WESO (Web Semantics Oviedo) research group
Goal: Practical applications of semantic technologies
Several domains: e-Government, e-Health,...
2 books:
"Web semántica" (in Spanish), 2012
"Validating RDF data", 2017
…and software:
SHaclEX (Scala library, implements ShEx & SHACL)
RDFShape (RDF playground)
WikiShape (Wikidata playground)
HTML version: http://book.validatingrdf.com
Examples: https://github.com/labra/validatingRDFBookExamples
About this talk...
It is divided into 2 parts
1st part: Legislative linked data portals
Chilean National Library of Congress
Presented at ESWC'19*
2nd part: Linked data quality
RDF description and validation
ShEx & SHACL overview
Legislative linked data portals
Processing the History of the Law at Chile
Francisco Cifuentes Silva
Library of Congress, Chile
PhD Student
WESO research group
Jose Emilio Labra Gayo
WESO research group
University of Oviedo, Spain
More information:
Cifuentes-Silva F., Labra Gayo J.E. (2019) Legislative Document Content Extraction Based on Semantic Web Technologies. In:
Hitzler P. et al. (eds) The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham
Chilean Library of Congress
In Spanish: BCN (Biblioteca del Congreso Nacional de Chile)
Political powers: Executive, Judiciary, Legislative
Independent body inside the Legislative power
Advises the parliament and provides services to citizens
http://www.bcn.cl
2 projects at library of congress (BCN)
History of the Law
Parliamentary work
History of the Law (LeyChile)
Collect all documents generated during a law's legislative process
Phases:
An initiative starts life as a draft bill
Subject to debates
Validity period (once it is published)
Modifications, additions,...
Derogation
Goal:
Capture the spirit of the law
Traceability
https://www.bcn.cl/historiadelaley
Parliamentary work
Collect all legislative activity by each Member of Parliament
Retrieve all interventions made
Parliamentary motion
Session journal
Commission report
Ordered and categorised
https://www.bcn.cl/laborparlamentaria/
Both projects adopted semantic technologies
Some initial reasons:
Semantic technologies considered one pillar of strategic plan (in 2014)
Innovative action to generate new products
Improve interoperability mechanisms
Sem. Web aligned well with open & public data
Which semantic technologies?
Text mining and content enrichment
Entity extraction
Topic identification
Automatic markup
Classification
Machine readable info
XML & URIs
RDF
Ontologies
Linked Open Data
Workflow pipelines
3 main steps
Automatic XML Marker
RDF & Linked data generation
Content delivery
Linked Open Data
Query DB
Workflow overview
National library
Legislative documents
• Some docs in paper (requires OCR)
• Text documents
Automatic XML marker
SVN repository
Akoma-Ntoso XML editor & tools
Publishing (RDF extraction from Akoma-Ntoso)
Services layer
Content portals
Automatic XML marker
Source: Text. Target: XML following Akoma-Ntoso
4 phases:
[Diagram: Text → Named Entity Recognizer → Entity + Type → Mediator (backed by the Legal Knowledge Base) → Entity + Type + URI → Structural marker → Internal XML representation → Converter → AKN XML]
1. Named Entity Recognizer
Detection of entities & types of entities
Web service implementing the Stanford NER with a CRF classifier
Evaluation in production: detects 97% of entities
Type         | Some examples                                         | # of entities
Person       | Salvador Allende, Sebastián Piñera                    | 5,139
Organization | Ministerio de Salud, SERNATUR                         | 2,848
Location     | Valparaíso, Santiago de Chile                         | 1,251
Document     | Ley 20.000, Diario de sesión nº 12                    | 732,497
Role         | Senador, Diputado, Alcalde                            | 428
Events       | Nacimiento de Eduardo Frei, Sesión Nº 23              | 14,389
Law          | Boletín 11536-04, Prohíbe fumar en espacios cerrados  | 12,737
Dates        | 27 de febrero de 2010, el próximo año, ...            | 20,632
[Diagram: Text → Named Entity Recognizer → Text + (Entity, Type)]
2. Mediator
Entity linking and disambiguation
Text similarity algorithms
Based on Apache Lucene
In-house development
- Use of context information to narrow
list of candidates
- Custom filters and association
heuristics
- Specialized web services
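The entity linking step can be illustrated with a minimal sketch. It is not the in-house Lucene-based mediator: it swaps in Python's standard `difflib` for text similarity, and the knowledge base entries, URIs, function names and threshold are all hypothetical. It only shows the idea of narrowing candidates by entity type (a very simple form of context) and then ranking them by similarity.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base: (URI, label, entity type). The URIs are made up.
KNOWLEDGE_BASE = [
    ("http://example.org/persona/1", "Salvador Allende Gossens", "Person"),
    ("http://example.org/persona/2", "Sebastián Piñera Echenique", "Person"),
    ("http://example.org/organismo/1", "Ministerio de Salud", "Organization"),
    ("http://example.org/organismo/2", "Ministerio de Hacienda", "Organization"),
]


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_entity(mention, entity_type, threshold=0.5):
    """Return the URI of the best candidate of the given type, or None."""
    # Context filter: keep only candidates whose type matches the NER output.
    candidates = [(uri, label) for uri, label, t in KNOWLEDGE_BASE if t == entity_type]
    scored = [(similarity(mention, label), uri) for uri, label in candidates]
    if not scored:
        return None
    best_score, best_uri = max(scored)
    # Reject weak matches instead of linking to the wrong entity.
    return best_uri if best_score >= threshold else None
```

For example, `link_entity("Salvador Allende", "Person")` would resolve the short mention to the full-name entry, while a mention whose type has no candidates stays unlinked.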
[Diagram: Text + (Entity, Type) → Mediator (using the Legal Knowledge Base) → Text + (Entity, Type, URI)]
3. Structural marker
Detect structures in the text
Titles, subtitles, paragraphs, sections,...
Special structure for debates: participation
Regular expressions + custom rules
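A rough sketch of how regular expressions plus a custom rule might mark structure is shown here. The patterns (Spanish headings such as "TÍTULO I" and "Artículo 1", and a speaker line such as "El señor PEREZ.-") are assumptions made for illustration, not the rules used in production.

```python
import re

# Hypothetical heading rules: (structure kind, pattern).
HEADING_RULES = [
    ("title", re.compile(r"^T[ÍI]TULO\s+[IVXLC]+\b", re.IGNORECASE)),
    ("article", re.compile(r"^Art[íi]culo\s+\d+", re.IGNORECASE)),
]

# Hypothetical custom rule for debates: a line that opens with a speaker name.
SPEECH_RULE = re.compile(r"^El se[ñn]or\s+(?P<speaker>[A-ZÁÉÍÓÚÑ]+)\s*\.-\s*(?P<text>.+)$")


def mark_structure(text):
    """Label each non-empty line as title, article, participation or paragraph."""
    marked = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        speech = SPEECH_RULE.match(line)
        if speech:
            # Debates get a special structure: who participated.
            marked.append(("participation", speech.group("speaker")))
            continue
        for kind, pattern in HEADING_RULES:
            if pattern.match(line):
                marked.append((kind, line))
                break
        else:
            # No rule matched: treat the line as a plain paragraph.
            marked.append(("paragraph", line))
    return marked
```

Keeping the rules in a list makes it easy to add new structure types without touching the marking loop.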
[Diagram: Text + (Entity, Type, URI) → Structural marker → Internal XML representation]
4. XML converter to Akoma-Ntoso
Programmatic approach
Internal XML representation similar to DOM
Each node converted to text in AKN-XML
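The programmatic approach can be sketched as a recursive walk over a DOM-like tree in which each node emits its own XML text. The `Node` class, the kind-to-tag mapping and the output layout are assumptions for illustration; the result is deliberately simplified and is not schema-valid Akoma Ntoso (no namespace, no `act`/`body` wrappers).

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Node:
    """Hypothetical internal representation, similar to a DOM node."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


# Hypothetical mapping from internal node kinds to element names.
TAGS = {"document": "akomaNtoso", "article": "article", "heading": "heading", "paragraph": "p"}


def to_akn(node, indent=0):
    """Convert one node (and its subtree) to XML text."""
    tag = TAGS[node.kind]
    pad = "  " * indent
    if not node.children:
        # Leaf node: escape the text so the output stays well-formed.
        return f"{pad}<{tag}>{escape(node.text)}</{tag}>"
    inner = "\n".join(to_akn(child, indent + 1) for child in node.children)
    return f"{pad}<{tag}>\n{inner}\n{pad}</{tag}>"
```

A document node with one article containing a heading and a paragraph would come out as nested, indented elements with special characters escaped.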
[Diagram: Internal XML representation → Converter → AKN XML]
Human edition of AKN-Documents
Quality assurance by human analysts
They review the generated XML documents
2 editors:
Ad-hoc XML editor
Commercial editor: LegisPro (Xcential)
Linked data generation
The pilot project (2011) carefully defined a stable URI model
URIs have been maintained since then
URIs = IDs in the whole system
URIs are dereferenceable
Content negotiation
Custom linked data browser
Documentation (in Spanish)
http://datos.bcn.cl/es/documentacion
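Content negotiation for dereferenceable URIs can be sketched as a function that reads the HTTP `Accept` header and picks a representation. This is a minimal sketch, not the portal's implementation: the media type table and the suffix scheme (`.html`, `.rdf`, `.ttl`) are assumptions; only the `.xml` suffix appears in this talk.

```python
# Hypothetical table: media type -> suffix appended to the resource URI.
REPRESENTATIONS = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "text/turtle": "ttl",
    "application/xml": "xml",
}


def parse_accept(header):
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0]
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable, so equal q values keep the order of the header.
    return sorted(prefs, key=lambda mq: -mq[1])


def negotiate(resource_uri, accept_header, default="text/html"):
    """Return (media type, URI of the representation to serve)."""
    for media, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media in REPRESENTATIONS:
            return media, f"{resource_uri}.{REPRESENTATIONS[media]}"
        if media == "*/*":
            break
    return default, f"{resource_uri}.{REPRESENTATIONS[default]}"
```

A browser asking for `text/html` and an RDF client asking for `text/turtle` can then use the same URI as identifier and still receive different documents.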
AKN2RDF
RDF extraction from Akoma-Ntoso XML
● Custom-made converter (XSL discarded for perceived complexity)
● Each XML tag implemented in one Class
● Extracted data saved into multiple databases (Relational and RDF)
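The "one class per XML tag" design can be sketched as a handler registry keyed by tag name. Everything specific here is hypothetical: the vocabulary URIs, the handler classes, and the simplified input (no namespaces, an `id` attribute on articles). It is an illustration of the dispatch idea, not the AKN2RDF converter itself.

```python
import xml.etree.ElementTree as ET

VOCAB = "http://example.org/ontology#"  # hypothetical vocabulary


class TagHandler:
    """Base class: each subclass knows how to turn one tag into triples."""
    tag = None

    def triples(self, element, doc_uri):
        raise NotImplementedError


class ArticleHandler(TagHandler):
    tag = "article"

    def triples(self, element, doc_uri):
        article_uri = f"{doc_uri}/article/{element.get('id')}"
        yield (doc_uri, VOCAB + "hasArticle", article_uri)


class PersonHandler(TagHandler):
    tag = "person"

    def triples(self, element, doc_uri):
        # The entity URI was attached earlier by the marking pipeline.
        yield (doc_uri, VOCAB + "mentions", element.get("refersTo"))


HANDLERS = {cls.tag: cls() for cls in (ArticleHandler, PersonHandler)}


def extract_triples(xml_text, doc_uri):
    """Walk the document and let the handler of each tag emit its triples."""
    triples = []
    for element in ET.fromstring(xml_text).iter():
        handler = HANDLERS.get(element.tag)
        if handler is not None:
            triples.extend(handler.triples(element, doc_uri))
    return triples
```

Supporting a new tag then means adding one class and registering it, without changing the traversal.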
Linked data generation
Source: AKN XML documents
Linked data browser (WESO-DESH)
Target: RDF data
http://datos.bcn.cl/recurso/cl/documento/579095/
http://datos.bcn.cl/recurso/cl/documento/579095.xml
SPARQL endpoint
RDF triples are published as a public SPARQL endpoint
Number of norms by municipality
Content delivery
Web portals using Open Source Technologies
CMS (Typo3)
Python/Java
Varnish
Apache Lucene
REST Web service layers which connect to RDF triplestore and DB
Data exports to PDF, Doc and XML formats
URIs of parliamentary profiles = URIs in triplestore
History of the Law portal
https://www.bcn.cl/historiadelaley
Links to Members of Parliament
Each article has a link
Different versions of a law
Compare different versions
Parliamentary Work
https://www.bcn.cl/laborparlamentaria
Show participation of each Member of Parliament
Some experimental visualizations
Relationships between laws
Historical Parliament
Parliamentary genealogy (family relationships)
Regions mentioned in laws (legislative hackathon)
Links between laws
Historical parliament
http://datos.bcn.cl/visualizaciones/genealogia-parlamentaria/
Parliamentary genealogy
http://datos.bcn.cl/visualizaciones/genealogia-parlamentaria/consulta.jsp
Regions mentioned by law
Result of a legislative hackathon
http://datos.bcn.cl/global-legislative-hackathon-2016/Hackaton/www/html/master.html
In 2010 there was an earthquake in the Biobío region
Statistics
24,368 documents (Nov. 2018)
Number of RDF triples: 28 million
According to Google Analytics
Average browsing time: 2 min 26 s
Visits received: 331,481 (Nov. 2016-2017), 476,241 (Nov. 2017-2018)
...and some findings...
[Charts: Session attendance by year; RDF triples generated by year]
Question: why are there some valleys?
Dictatorship time
Lessons learnt
RDF granularity & inference trade-off
RDF statements + inference (high running times...queries that didn't terminate)
A priori inferred triples added to triple store (high response times for large docs)
Small subset of RDF triples (structural parts of docs and metadata)
Performance problems in the XML editor when browsing long docs (> 1000 pages)
Low SPARQL endpoint usage by external apps
If we could start again, I would recommend ShEx
Personal note: This kind of project led to my interest in ShEx
Conclusions & future projects
Well designed URIs can act as a perfect glue for interoperability
Automatic workflow pipelines help long-term survival of LD-based projects
SPARQL endpoint since 2011
Future projects on top of existing ones
National Budget as Linked data
Diana Project: Members of Parliament linked to social network analysis
New portal: User customization & recommender systems
2nd part
Linked data quality
and Shapes
RDF, the good parts...
RDF as an integration language
RDF as a lingua franca for semantic web and linked data
RDF flexibility & integration
Data can be adapted to multiple environments
Open and reusable data by default
RDF for knowledge representation
RDF data stores & SPARQL
RDF, the other parts
Consuming & producing RDF
Multiple syntaxes: Turtle, RDF/XML, JSON-LD, ...
Embedding RDF in HTML
Describing and validating RDF content
[Diagram: Producer, Consumer]
Why describe & validate RDF?
For producers
Developers can understand the contents they are going to produce
Ensure they produce the expected structure
Advertise and document the structure
Generate interfaces
For consumers
Understand the contents
Verify the structure before processing it
Query generation & optimization
[Diagram: Producer, Consumer, Shapes]
Similar technologies
Technology           | Schema
Relational databases | DDL
XML                  | DTD, XML Schema, RelaxNG, Schematron
JSON                 | JSON Schema
RDF                  | ?
Fill that gap: ShEx and SHACL
2013 RDF Validation Workshop
Conclusions of the workshop:
There is a need for a higher-level, concise language for RDF validation
ShEx initially proposed (v 1.0)
2014 W3C Data Shapes WG chartered
2017 SHACL accepted as W3C Recommendation
2017 ShEx 2.0 released as Community Group draft
2019 ShEx adopted by Wikidata
Short intro to ShEx
ShEx (Shape Expressions Language)
Concise and human-readable language for RDF validation & description
Syntax similar to SPARQL, Turtle
Semantics inspired by regular expressions & RelaxNG
2 syntaxes: Compact and RDF/JSON-LD
Official info: http://shex.io
Semantics: http://shex.io/shex-semantics/, primer: http://shex.io/shex-primer
Simple example
Nodes conforming to <User> shape must:
• Be IRIs
• Have exactly one schema:name with a value of type xsd:string
• Have zero or more schema:knows whose values conform to <User>
Prefix declarations as in Turtle/SPARQL:
prefix schema: <http://schema.org/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
<User> IRI {
  schema:name xsd:string ;
  schema:knows @<User> *
}
RDF Validation using ShEx
Schema:
<User> IRI {
  schema:name xsd:string ;
  schema:knows @<User> *
}
Data:
:alice schema:name "Alice" ;
       schema:knows :alice .
:bob   schema:knows :alice ;
       schema:name "Robert" .
:carol schema:name "Carol", "Carole" .
:dave  schema:name 234 .
:emily foaf:name "Emily" .
:frank schema:name "Frank" ;
       schema:email <mailto:frank@example.org> ;
       schema:knows :alice, :bob .
:grace schema:name "Grace" ;
       schema:knows :alice, _:1 .
_:1    schema:name "Unknown" .
Shape map:
:alice@<User>, :bob@<User>, :carol@<User>, :dave@<User>, :emily@<User>, :frank@<User>, :grace@<User>
Result: :alice, :bob and :frank conform to <User>; :carol (two names), :dave (name is not a string), :emily (no schema:name) and :grace (knows a blank node) do not
Try it (RDFShape): https://goo.gl/97bYdv
Try it (ShExDemo): https://goo.gl/Y8hBsW
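To make the three rules of the <User> shape concrete, here is a minimal hand-written checker in Python. It is not a ShEx implementation: the tuple encoding of RDF terms, the prefixed names kept as plain strings, and the way recursion is handled (a node already being checked is assumed to conform) are simplifying assumptions that happen to be enough for this example.

```python
def iri(value):
    return ("iri", value)


def lit(value, datatype="xsd:string"):
    return ("literal", value, datatype)


def bnode(label):
    return ("bnode", label)


# The example data, encoded as node -> list of (predicate, object).
GRAPH = {
    iri(":alice"): [("schema:name", lit("Alice")), ("schema:knows", iri(":alice"))],
    iri(":bob"): [("schema:knows", iri(":alice")), ("schema:name", lit("Robert"))],
    iri(":carol"): [("schema:name", lit("Carol")), ("schema:name", lit("Carole"))],
    iri(":dave"): [("schema:name", lit(234, "xsd:integer"))],
    iri(":emily"): [("foaf:name", lit("Emily"))],
    iri(":frank"): [("schema:name", lit("Frank")),
                    ("schema:email", iri("mailto:frank@example.org")),
                    ("schema:knows", iri(":alice")), ("schema:knows", iri(":bob"))],
    iri(":grace"): [("schema:name", lit("Grace")),
                    ("schema:knows", iri(":alice")), ("schema:knows", bnode("1"))],
    bnode("1"): [("schema:name", lit("Unknown"))],
}


def conforms_to_user(node, graph, assumed=frozenset()):
    """Check the three rules of the <User> shape for one node."""
    if node in assumed:
        # Recursive reference to a node we are already checking.
        return True
    if node[0] != "iri":
        return False  # rule 1: must be an IRI
    arcs = graph.get(node, [])
    names = [o for p, o in arcs if p == "schema:name"]
    if len(names) != 1:
        return False  # rule 2: exactly one schema:name ...
    if names[0][0] != "literal" or names[0][2] != "xsd:string":
        return False  # ... whose value is an xsd:string
    assumed = assumed | {node}
    # rule 3: every schema:knows value conforms to <User>; other predicates
    # (such as schema:email) are ignored because the shape is not closed.
    return all(conforms_to_user(o, graph, assumed)
               for p, o in arcs if p == "schema:knows")


def validate(names, graph=GRAPH):
    """Apply the shape to each named node, like a fixed shape map."""
    return {name: conforms_to_user(iri(name), graph) for name in names}
```

Calling `validate([":alice", ":grace"])` returns a dictionary of node names to booleans, which plays the role of a result shape map.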







Validation process
Input: RDF data, ShEx schema, Shape map
Output: Result shape map
ShEx Schema:
:User {
  schema:name xsd:string ;
  schema:knows @:User *
}
RDF data:
:alice schema:name "Alice" ;
       schema:knows :alice .
:bob   schema:knows :alice ;
       schema:name "Robert" .
:carol schema:name "Carol", "Carole" .
Shape map:
:alice@:User, :bob@:User, :carol@:User
ShEx Validator
Result shape map:
:alice@:User, :bob@:User, :carol@!:User
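The input and output shape maps are simple enough to handle with a few lines of Python. The sketch below covers only the fixed `node@shape` form used on this slide, with `!` marking non-conformance in the result; the function names are hypothetical, and nodes whose text itself contains `@` (for instance language-tagged literals) are out of scope.

```python
def parse_shape_map(text):
    """Parse 'node@shape, node@shape' into a list of (node, shape) pairs."""
    pairs = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        node, shape = item.split("@", 1)
        pairs.append((node.strip(), shape.strip()))
    return pairs


def format_result_shape_map(results):
    """Format (node, shape, conforms) triples; '!' marks non-conformance."""
    return ", ".join(
        f"{node}@{'' if conforms else '!'}{shape}"
        for node, shape, conforms in results
    )
```

A validator would sit between the two functions: it takes each parsed pair, checks the node against the shape, and hands the outcome to `format_result_shape_map`.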
Example with more ShEx features
:AdultPerson EXTRA rdf:type {
  rdf:type [ schema:Person ] ;
  :name xsd:string ;
  :age MinInclusive 18 ;
  :gender [:Male :Female] OR xsd:string ;
  :address @:Address ? ;
  :worksFor @:Company + ;
}
:Address CLOSED {
  :addressLine xsd:string {1,3} ;
  :postalCode /[0-9]{5}/ ;
  :state @:State ;
  :city xsd:string
}
:Company {
  :name xsd:string ;
  :state @:State ;
  :employee @:AdultPerson * ;
}
:State /[A-Z]{2}/

:alice rdf:type :Student, schema:Person ;
  :name "Alice" ;
  :age 20 ;
  :gender :Male ;
  :address [
    :addressLine "Bancroft Way" ;
    :city "Berkeley" ;
    :postalCode "55123" ;
    :state "CA"
  ] ;
  :worksFor [
    :name "Company" ;
    :state "CA" ;
    :employee :alice
  ] .
Try it: https://tinyurl.com/yd5hp9z4
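The :Address shape combines three features: CLOSED (no other predicates allowed), cardinalities such as {1,3}, and regular expression facets. The Python sketch below mimics them on a plain dictionary. It is an illustration under assumptions, not a ShEx engine: the rule table and function name are hypothetical, and patterns are applied with `re.search`, on the assumption that an unanchored pattern may match anywhere in the value.

```python
import re


def is_string(value):
    return isinstance(value, str)


def matches(pattern):
    """Build a check that the value is a string containing the pattern."""
    return lambda value: is_string(value) and re.search(pattern, value) is not None


# Hypothetical encoding of the :Address shape: predicate -> cardinality + check.
ADDRESS_SHAPE = {
    ":addressLine": {"min": 1, "max": 3, "check": is_string},
    ":postalCode": {"min": 1, "max": 1, "check": matches(r"[0-9]{5}")},
    ":state": {"min": 1, "max": 1, "check": matches(r"[A-Z]{2}")},
    ":city": {"min": 1, "max": 1, "check": is_string},
}


def validate_closed(node, shape):
    """Return a list of error messages; an empty list means the node conforms."""
    errors = []
    # CLOSED: any predicate not mentioned in the shape is an error.
    for predicate in node:
        if predicate not in shape:
            errors.append(f"unexpected predicate {predicate}")
    for predicate, rule in shape.items():
        values = node.get(predicate, [])
        if not rule["min"] <= len(values) <= rule["max"]:
            errors.append(
                f"{predicate}: expected {rule['min']}..{rule['max']} values, found {len(values)}"
            )
        for value in values:
            if not rule["check"](value):
                errors.append(f"{predicate}: invalid value {value!r}")
    return errors
```

Applied to Alice's address from the example data the function returns an empty list; adding an extra predicate or shortening the postal code produces one error each.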
SHACL
SHACL (Shapes Constraint Language)
W3C recommendation:
https://www.w3.org/TR/shacl/ (July 2017)
RDF vocabulary
2 parts: SHACL-Core, SHACL-SPARQL
Simple example
Try it (RDFShape): https://goo.gl/ukY5vq
Shapes graph:
prefix : <http://example.org/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix schema: <http://schema.org/>
:UserShape a sh:NodeShape ;
  sh:targetNode :alice, :bob, :carol ;
  sh:nodeKind sh:IRI ;
  sh:property [
    sh:path schema:name ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:datatype xsd:string ;
  ] ;
  sh:property [
    sh:path schema:email ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:nodeKind sh:IRI ;
  ] .
Data graph:
:alice schema:name "Alice Cooper" ;
       schema:email <mailto:alice@mail.org> .
:bob   schema:firstName "Bob" ;
       schema:email <mailto:bob@mail.org> .
:carol schema:name "Carol" ;
       schema:email "carol@mail.org" .
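The constraints of :UserShape can be read as a checklist. The sketch below applies that checklist by hand in Python, without any SHACL library: the `IRI` marker class, the dictionary encoding of the data graph and the violation messages are assumptions for illustration, and a real SHACL processor would return a full validation report instead.

```python
class IRI(str):
    """Marker type: an IRI, as opposed to a plain string literal."""


# The data graph, encoded as node -> predicate -> list of values.
DATA = {
    IRI(":alice"): {"schema:name": ["Alice Cooper"],
                    "schema:email": [IRI("mailto:alice@mail.org")]},
    IRI(":bob"): {"schema:firstName": ["Bob"],
                  "schema:email": [IRI("mailto:bob@mail.org")]},
    IRI(":carol"): {"schema:name": ["Carol"],
                    "schema:email": ["carol@mail.org"]},
}


def validate_user(node, data):
    """Return the list of violated constraints for one target node."""
    violations = []
    if not isinstance(node, IRI):
        violations.append("sh:nodeKind sh:IRI")
    properties = data.get(node, {})

    names = properties.get("schema:name", [])
    if len(names) != 1:
        violations.append("schema:name: sh:minCount 1 / sh:maxCount 1")
    if any(isinstance(n, IRI) or not isinstance(n, str) for n in names):
        violations.append("schema:name: sh:datatype xsd:string")

    emails = properties.get("schema:email", [])
    if len(emails) != 1:
        violations.append("schema:email: sh:minCount 1 / sh:maxCount 1")
    if any(not isinstance(e, IRI) for e in emails):
        violations.append("schema:email: sh:nodeKind sh:IRI")
    return violations
```

Running it over the three target nodes reports no violations for :alice, a missing schema:name for :bob, and a non-IRI schema:email for :carol.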


Longer example
In ShEx:
:AdultPerson EXTRA a {
  a [ schema:Person ] ;
  :name xsd:string ;
  :age MinInclusive 18 ;
  :gender [:Male :Female] OR xsd:string ;
  :address @:Address ? ;
  :worksFor @:Company + ;
}
:Address CLOSED {
  :addressLine xsd:string {1,3} ;
  :postalCode /[0-9]{5}/ ;
  :state @:State ;
  :city xsd:string
}
:Company {
  :name xsd:string ;
  :state @:State ;
  :employee @:AdultPerson * ;
}
:State /[A-Z]{2}/
In SHACL:
:AdultPerson a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:qualifiedValueShape [
      sh:hasValue schema:Person
    ] ;
    sh:qualifiedMinCount 1 ;
    sh:qualifiedMaxCount 1 ;
  ] ;
  sh:targetNode :alice ;
  sh:property [ sh:path :name ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:datatype xsd:string ;
  ] ;
  sh:property [ sh:path :gender ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:in (:Male :Female) ;
  ] ;
  sh:property [ sh:path :age ;
    sh:maxCount 1 ;
    sh:minInclusive 18
  ] ;
  sh:property [ sh:path :address ;
    sh:node :Address ;
    sh:maxCount 1
  ] ;
  sh:property [ sh:path :worksFor ;
    sh:node :Company ;
    sh:minCount 1
  ] .
:Address a sh:NodeShape ;
  sh:closed true ;
  sh:property [ sh:path :addressLine ;
    sh:datatype xsd:string ;
    sh:minCount 1 ; sh:maxCount 3
  ] ;
  sh:property [ sh:path :postalCode ;
    sh:pattern "[0-9]{5}" ;
    sh:minCount 1 ; sh:maxCount 1
  ] ;
  sh:property [ sh:path :city ;
    sh:datatype xsd:string ;
    sh:minCount 1 ; sh:maxCount 1
  ] ;
  sh:property [ sh:path :state ;
    sh:node :State ;
  ] .
:Company a sh:NodeShape ;
  sh:property [ sh:path :name ;
    sh:datatype xsd:string
  ] ;
  sh:property [ sh:path :state ;
    sh:node :State
  ] ;
  sh:property [ sh:path :employee ;
    sh:node :AdultPerson ;
  ] .
:State a sh:NodeShape ;
  sh:pattern "[A-Z]{2}" .
It's recursive!!! (:Company refers back to :AdultPerson; recursion is not well defined in SHACL)
Implementation dependent feature
Try it: https://tinyurl.com/ycl3mkzr
Some challenges and perspectives
Theoretical foundations of ShEx/SHACL
Generating shapes from data
Validation Usability
RDF Stream validation
Schema ecosystems
Wikidata
Solid
Theoretical foundations of ShEx/SHACL
Conversion between ShEx and SHACL
SHaclEX library converts subsets of both
Challenges
Recursion and negation
Performance and algorithmic complexity
Detect useful subsets of the languages
Convert to SPARQL
Schema/data mapping
[Diagram: ShEx shapes ↔ SHACL shapes]
Generating Shapes from Data
Useful use case in practice
Knowledge Graph summarization
Some prototypes:
ShExer, RDFShape, ShapeArchitect
Try it with RDFShape:
https://tinyurl.com/y8pjcbyf
Shape Expression generated for wd:Q51613194
[Diagram: RDF data → infer → Shapes]
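Shape inference can be sketched in a few lines: look at a set of example nodes, collect the predicates they use, and derive a datatype and a cardinality for each. This is a toy illustration and not the algorithm of ShExer, RDFShape or ShapeArchitect; the input encoding (one dictionary per node), the Python-to-XSD type mapping and the function names are assumptions.

```python
def datatype_of(value):
    """Map a Python value to an XSD datatype name (assumed mapping)."""
    if isinstance(value, bool):
        return "xsd:boolean"
    if isinstance(value, int):
        return "xsd:integer"
    if isinstance(value, float):
        return "xsd:double"
    return "xsd:string"


def infer_shape(instances):
    """Infer predicate -> (value constraint, cardinality) from example nodes."""
    predicates = sorted({p for inst in instances for p in inst})
    shape = {}
    for p in predicates:
        counts = [len(inst.get(p, [])) for inst in instances]
        types = {datatype_of(v) for inst in instances for v in inst.get(p, [])}
        # A single observed datatype becomes the constraint; otherwise use
        # the ShEx wildcard '.'.
        constraint = types.pop() if len(types) == 1 else "."
        lo, hi = min(counts), max(counts)
        if lo >= 1 and hi == 1:
            cardinality = ""      # exactly one
        elif lo == 0 and hi <= 1:
            cardinality = "?"     # optional
        elif lo >= 1:
            cardinality = "+"     # one or more
        else:
            cardinality = "*"     # zero or more
        shape[p] = (constraint, cardinality)
    return shape


def to_shexc(name, shape):
    """Render the inferred shape in a ShExC-like compact syntax."""
    body = " ;\n".join(
        f"  {p} {constraint} {cardinality}".rstrip()
        for p, (constraint, cardinality) in shape.items()
    )
    return f"{name} {{\n{body}\n}}"
```

The inferred shape is only as good as the sample: a predicate that happens to appear once in every example is reported as mandatory, so a human review step remains useful.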
Validation usability
Learning from users
Early adopters: WebIndex, HL7 FHIR, Eclipse Lyo, GenWiki,…
Improve error information/visualization/navigation/repairing
Authoring/visualization tools
Propose annotation sets
UI generation
Error reporting/suggestion (SHOULD/MUST/…)
RDF Stream validation
Validation of RDF streams
Challenges:
Incremental validation
Named graphs
Addition/removal of triples
[Diagram: Sensor → stream of RDF data → Validator (using Shapes) → stream of validation results]
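Incremental validation means updating the verdict for a node when a triple arrives or is retracted, instead of re-validating the whole graph. The sketch below does this for one very small constraint, "exactly one schema:name per node". The class name and the choice of constraint are assumptions for illustration, and named graphs are not covered.

```python
from collections import Counter


class IncrementalNameValidator:
    """Tracks, triple by triple, which nodes have exactly one value for a predicate."""

    def __init__(self, predicate="schema:name"):
        self.predicate = predicate
        self.triples = set()
        self.counts = Counter()

    def add(self, s, p, o):
        """Process an added triple and return the new verdict for its subject."""
        if (s, p, o) not in self.triples:
            self.triples.add((s, p, o))
            if p == self.predicate:
                self.counts[s] += 1
        return self.conforms(s)

    def remove(self, s, p, o):
        """Process a removed triple and return the new verdict for its subject."""
        if (s, p, o) in self.triples:
            self.triples.discard((s, p, o))
            if p == self.predicate:
                self.counts[s] -= 1
        return self.conforms(s)

    def conforms(self, s):
        # Only the counter of the affected subject is consulted, so each
        # update costs constant time regardless of the size of the stream.
        return self.counts[s] == 1
```

Constraints that follow links between nodes are harder: a change to one node can alter the verdict of every node that refers to it, which is part of what makes stream validation a challenge.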
Schema ecosystems: Wikidata
In May, 2019, Wikidata announced ShEx adoption
New namespace for schemas
Example: https://www.wikidata.org/wiki/EntitySchema:E2
It opens lots of opportunities/challenges
Schema evolution and comparison
Schema ecosystems: Solid project
SOLID (SOcial Linked Data): Promoted by Tim Berners-Lee
Goal: Re-decentralize the Web
Separate data from apps
Give users more control over their data
Internally using linked data & RDF
Shapes needed for interoperability
[Diagram: a data pod described by Shapes A, B and C, used by Apps 1, 2 and 3]
"...I just can’t stop thinking about shapes.", Ruben Verborgh
https://ruben.verborgh.org/blog/2019/06/17/shaping-linked-data-apps/
Conclusions
Explicit schemas (shapes) can improve linked data quality
2 languages proposed: ShEx/SHACL
Towards an ecosystem of shapes for data portals
New challenges and opportunities
More info
About ShEx
ShEx by Example (slides):
https://figshare.com/articles/ShExByExample_pptx/6291464
ShEx chapter from Validating RDF data book:
http://book.validatingrdf.com/bookHtml010.html
About SHACL
SHACL by example (slides):
https://figshare.com/articles/SHACL_by_example/6449645
SHACL chapter from Validating RDF data book:
http://book.validatingrdf.com/bookHtml011.html
Comparing ShEx and SHACL
http://book.validatingrdf.com/bookHtml013.html
End of presentation
Acknowledgements:
David Vilches, Eridan Otto, Christian Sifaqui

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 

Legislative data portals and linked data quality

  • 1. Legislative data portals and linked data quality Jose Emilio Labra Gayo WESO WEb Semantics Oviedo University of Oviedo, Spain
  • 2. About me… In 2004, founded WESO (Web Semantics Oviedo) research group Goal: Practical applications of semantic technologies Several domains: e-Government, e-Health,... 2 books: "Web semántica" (in Spanish), 2012 "Validating RDF data", 2017 …and software: SHaclEX (Scala library, implements ShEx & SHACL) RDFShape (RDF playground) WikiShape (Wikidata playground) HTML version: http://book.validatingrdf.com Examples: https://github.com/labra/validatingRDFBookExamples
  • 3. About this talk... Legislative linked data portals Chilean National library of Congress Presented at ESWC'19* Linked data quality RDF description and validation ShEx & SHACL overview 1st part 2nd part It will be divided in 2 parts
  • 4. Legislative linked data portals Processing the History of the Law at Chile Francisco Cifuentes Silva Library of Congress, Chile PhD Student WESO research group Jose Emilio Labra Gayo WESO research group University of Oviedo, Spain More information: Cifuentes-Silva F., Labra Gayo J.E. (2019) Legislative Document Content Extraction Based on Semantic Web Technologies. In: Hitzler P. et al. (eds) The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham
  • 5. Chilean Library of Congress In Spanish: BCN (Biblioteca del Congreso Nacional de Chile) Political powers: Executive, Judiciary, Legislative Independent body inside the Legislative power Advises the parliament and gives services to citizens http://www.bcn.cl
  • 6. 2 projects at library of congress (BCN) History of the Law Parliamentary work
  • 7. History of the Law (LeyChile) Collect all documents generated during a law legislative process Phases: An initiative sees life as a draft bill Subject to debates Validity time (it is published) Modifications, additions,... Derogation Goal: Capture the spirit of the law Traceability https://www.bcn.cl/historiadelaley
  • 8. Parliamentary work Collect all legislative activity by each Member of Parliament Retrieve all interventions made Parliamentary motion Session journal Commission report Ordered and categorised https://www.bcn.cl/laborparlamentaria/
  • 9. Both projects adopted semantic technologies Some initial reasons: Semantic technologies considered one pillar of strategic plan (in 2014) Innovative action to generate new products Improve interoperability mechanisms Sem. Web aligned well with open & public data
  • 10. Which semantic technologies? Text mining and content enrichment Entity extraction Topic identification Automatic markup Classification Machine readable info XML & URIs RDF Ontologies Linked Open Data
  • 11. Workflow pipelines 3 main steps Automatic XML Marker RDF & Linked data generation Content delivery
  • 12. Linked Open Data Query DB Workflow overview National library Legislative documents • Some docs in paper (requires OCR) • Text documents Automatic XML marker SVN repository Akoma-Ntoso XML editor & tools Publishing (RDF extraction From Akoma-Ntoso) Services layer Content portals
  • 13. Automatic XML marker Source: Text Target: XML following Akoma-Ntoso
  • 14. Automatic XML marker Text Entity Type MediatorLegal Knowledge Base Entity Type URI Structural marker Internal XML representation Converter XML AKN Text Text Named Entity Recognizer 4 phases
  • 15. 1. Named Entity Recognizer Detection of entities & types of entities Web service implementing the Stanford NER with a CRF classifier Evaluation in production: detects 97% of entities
      Type | Some examples | # of entities
      Person | Salvador Allende, Sebastián Piñera | 5.139
      Organization | Ministerio de Salud, SERNATUR | 2.848
      Location | Valparaíso, Santiago de Chile | 1.251
      Document | Ley 20.000, Diario de sesión nº 12 | 732.497
      Role | Senador, Diputado, Alcalde | 428
      Events | Nacimiento de Eduardo Frei, Sesión Nº 23 | 14.389
      Law | Boletín 11536-04, Prohíbe fumar en espacios cerrados | 12.737
      Dates | 27 de febrero de 2010, el próximo año, ... | 20.632
      Diagram labels: Text, Named Entity Recognizer, Text, Entity, Type
  • 16. 2. Mediator Entity linking and disambiguation Text similarity algorithms Based on Apache Lucene In-house development - Use of context information to narrow list of candidates - Custom filters and association heuristics - Specialized web services Entity Type Mediator Legal Knowledge Base Entity Type URI Text Text
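The entity linking step described on slide 16 can be sketched roughly as follows. This is only an illustration under stated assumptions: the production mediator is based on Apache Lucene with in-house heuristics, whereas this sketch swaps in Python's `difflib` string similarity. The knowledge base entries, the `example.org` URIs, the `link_entity` helper, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base rows: (label, entity type, URI).
# These are illustrative values, not real BCN identifiers.
KNOWLEDGE_BASE = [
    ("Salvador Allende", "Person", "http://example.org/persona/salvador-allende"),
    ("Sebastián Piñera", "Person", "http://example.org/persona/sebastian-pinera"),
    ("Ministerio de Salud", "Organization", "http://example.org/organismo/ministerio-de-salud"),
    ("Valparaíso", "Location", "http://example.org/lugar/valparaiso"),
]


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_entity(mention, entity_type, threshold=0.8):
    """Return the URI of the best matching entry, or None if nothing is close enough."""
    # The entity type found by the recognizer is used as context
    # to narrow the list of candidates before comparing labels.
    candidates = [(label, uri) for label, etype, uri in KNOWLEDGE_BASE
                  if etype == entity_type]
    best_uri, best_score = None, 0.0
    for label, uri in candidates:
        score = similarity(mention, label)
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri if best_score >= threshold else None
```

Filtering by type before scoring mirrors the "use of context information to narrow list of candidates" point; a mention of a place name is never compared against person labels.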
  • 17. 3. Structural marker Detect structures in the text Titles, subtitles, paragraphs, sections,... Special structure for debates: participation Regular expressions + custom rules Entity Type URI Structural marker Internal XML representation Text
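The "regular expressions + custom rules" approach of the structural marker might look like the sketch below. The patterns, the labels, and the `mark_structure` helper are hypothetical, since the slides do not show the production rules; the sample text is invented for illustration.

```python
import re

# Hypothetical rules, tried in order; the first matching pattern wins.
RULES = [
    ("title", re.compile(r"^T[ÍI]TULO\s+[IVXLC]+\b", re.IGNORECASE)),
    ("article", re.compile(r"^Art[íi]culo\s+\d+", re.IGNORECASE)),
    # Debates: a line that opens with a speaker marks a participation.
    ("participation", re.compile(r"^(El señor|La señora)\s+[A-ZÁÉÍÓÚÑ]{2,}")),
]


def mark_structure(text):
    """Label each non-empty line with the first rule it matches, else 'paragraph'."""
    marked = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        kind = next((name for name, pattern in RULES if pattern.match(line)),
                    "paragraph")
        marked.append((kind, line))
    return marked


SAMPLE = """TÍTULO I
Artículo 1º.- Prohíbese fumar.
El señor ALLENDE (Presidente).- Tiene la palabra.
Texto libre."""
```

Calling `mark_structure(SAMPLE)` labels the four lines as title, article, participation and paragraph, which is the kind of intermediate representation a later converter could turn into XML.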
  • 18. 4. XML converter to Akoma-Ntoso Programmatic approach Internal XML representation similar to DOM Each node converted to text in AKN-XML Internal XML representation Converter XML AKN
  • 19. Human edition of AKN-Documents Quality assurance by human analysts They review the generated XML documents 2 editors: Ad-hoc XML editor Commercial editor: LegisPro (Xcential)
  • 20. Linked data generation The pilot project (2011) carefully defined a stable URI model URIs have been maintained since then URIs = IDs in the whole system URIs are dereferenceable Content negotiation Custom linked data browser Documentation (in Spanish) http://datos.bcn.cl/es/documentacion
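Content negotiation over dereferenceable URIs, as mentioned on slide 20, can be sketched as choosing a representation from the HTTP `Accept` header. This is a minimal sketch under assumptions: the media types, the file suffixes, and the `negotiate` helper are hypothetical and do not describe how the BCN server is actually configured.

```python
# Hypothetical mapping from media type to representation suffix.
REPRESENTATIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
}


def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def negotiate(resource_uri, accept_header, default="text/html"):
    """Pick the URL of the representation a server could redirect the client to."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return resource_uri + REPRESENTATIONS[media_type]
        if media_type == "*/*":
            break
    return resource_uri + REPRESENTATIONS[default]
```

The point of the design is that the resource URI itself never changes: a browser asking for HTML and a crawler asking for Turtle both start from the same identifier.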
  • 21. AKN2RDF RDF extraction from Akoma-Ntoso XML ● Custom-made converter (XSL discarded for perceived complexity) ● Each XML tag implemented in one Class ● Extracted data saved into multiple databases (Relational and RDF)
  • 22. Linked data generation Source: AKN XML documents Linked data browser (WESO-DESH) Target: RDF data http://datos.bcn.cl/recurso/cl/documento/579095/ http://datos.bcn.cl/recurso/cl/documento/579095.xml
  • 23. SPARQL endpoint RDF triples are published as a public SPARQL endpoint Number of norms by municipality
  • 24. Content delivery Web portals using Open Source Technologies CMS (Typo3) Python/Java Varnish Apache Lucene REST Web service layers which connect to RDF triplestore and DB Data exports to PDF, Doc and XML formats URIs of parliamentary profiles = URIs in triplestore
  • 25. History of the Law portal https://www.bcn.cl/historiadelaley Links to Members of Parliament Each article has a link Different versions of a law
  • 26. History of the Law portal https://www.bcn.cl/historiadelaley Compare different versions
  • 28. Some experimental visualizations Relationships between laws Historical Parliament Parliamentary genealogy (family relationships) Regions mentioned in laws (legislative hackathon)
  • 32. Regions mentioned by law Result of a legislative hackathon http://datos.bcn.cl/global-legislative-hackathon-2016/Hackaton/www/html/master.html In 2010 there was an earthquake in the BioBio region
  • 33. Statistics 24,368 documents (Nov. 2018) Number of RDF triples: 28 million According to Google Analytics Average browsing time: 2min 26s Visits received: 331,481 (Nov. 2016-2017) → 476,241 (Nov. 2017-2018)
  • 34. ...and some findings... Question: why are there some valleys? Dictatorship time Session attendance by year RDF triples generated by year
  • 35. Lessons learnt RDF granularity & inference trade-off RDF statements + inference (high running times...queries that didn't terminate) A priori inferred triples added to triple store (high response times for large docs) Small subset of RDF triples (structural parts of docs and metadata) Performance problems in XML editor browsing long docs (>1000 pages) Low SPARQL endpoint usage by external apps If we could start again, I would recommend ShEx Personal note: This kind of project led to my interest in ShEx
  • 36. Conclusions & future projects Well designed URIs can act as a perfect glue for interoperability Automatic workflow pipelines help long-term survival of LD-based projects SPARQL endpoint since 2011 Future projects on top of existing ones National Budget as Linked data Diana Project: Members of Parliament linked to social network analysis New portal: User customization & recommender systems
  • 37. 2nd part Linked data quality and Shapes
  • 38. RDF, the good parts... RDF as an integration language RDF as a lingua franca for semantic web and linked data RDF flexibility & integration Data can be adapted to multiple environments Open and reusable data by default RDF for knowledge representation RDF data stores & SPARQL
  • 39. RDF, the other parts Consuming & producing RDF Multiple syntaxes: Turtle, RDF/XML, JSON-LD, ... Embedding RDF in HTML Describing and validating RDF content Producer Consumer
  • 40. Why describe & validate RDF? For producers Developers can understand the contents they are going to produce Ensure they produce the expected structure Advertise and document the structure Generate interfaces For consumers Understand the contents Verify the structure before processing it Query generation & optimization Producer Consumer Shapes
  • 41. Similar technologies Technology Schema Relational Databases DDL XML DTD, XML Schema, RelaxNG, Schematron Json Json Schema RDF ? Fill that gap
  • 42. ShEx and SHACL 2013 RDF Validation Workshop Conclusions of the workshop: There is a need for a higher-level, concise language for RDF validation ShEx initially proposed (v 1.0) 2014 W3C Data Shapes WG chartered 2017 SHACL accepted as W3C recommendation 2017 ShEx 2.0 released as Community group draft 2019 ShEx adopted by Wikidata
  • 43. Short intro to ShEx ShEx (Shape Expressions Language) Concise and human-readable language for RDF validation & description Syntax similar to SPARQL, Turtle Semantics inspired by regular expressions & RelaxNG 2 syntaxes: Compact and RDF/JSON-LD Official info: http://shex.io Semantics: http://shex.io/shex-semantics/, primer: http://shex.io/shex-primer
  • 44. Simple example Nodes conforming to <User> shape must: • Be IRIs • Have exactly one schema:name with a value of type xsd:string • Have zero or more schema:knows whose values conform to <User> prefix schema: <http://schema.org/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> <User> IRI { schema:name xsd:string ; schema:knows @<User> * } Prefix declarations as in Turtle/SPARQL
  • 45. RDF Validation using ShEx :alice schema:name "Alice" ; schema:knows :alice . :bob schema:knows :alice ; schema:name "Robert". :carol schema:name "Carol", "Carole" . :dave schema:name 234 . :emily foaf:name "Emily" . :frank schema:name "Frank" ; schema:email <mailto:frank@example.org> ; schema:knows :alice, :bob . :grace schema:name "Grace" ; schema:knows :alice, _:1 . _:1 schema:name "Unknown" . Try it (RDFShape): https://goo.gl/97bYdv Try it (ShExDemo): https://goo.gl/Y8hBsW Schema Data <User> IRI { schema:name xsd:string ; schema:knows @<User> * } :alice@<User>, :bob @<User>, :carol@<User>, :dave @<User>, :emily@<User>, :frank@<User>, :grace@<User> Shape map
  • 46. Validation process :alice@:User, :bob@:User, :carol@:User ShEx Validator Result shape map :User { schema:name xsd:string ; schema:knows @:User * } ShEx Schema :alice schema:name "Alice" ; schema:knows :alice . :bob schema:knows :alice ; schema:name "Robert". :carol schema:name "Carol", "Carole" . RDF data Shape map :alice@:User, :bob@:User, :carol@!:User Input: RDF data, ShEx schema, Shape map Output: Result shape map
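The validation process of slides 45 and 46 (RDF data plus a schema plus a shape map in, a result shape map out) can be imitated with a small hand-written checker for the `<User>` shape. This is a sketch only, not a ShEx implementation: the triples are plain Python tuples, the `Literal` type and both helper functions are hypothetical, and the single shape is hard-coded instead of being parsed from a schema.

```python
from collections import namedtuple

# IRIs and blank nodes are plain strings; literals carry their datatype.
Literal = namedtuple("Literal", ["value", "datatype"])
XSD_STRING = "xsd:string"

# The data of slide 45, written as (subject, predicate, object) tuples.
TRIPLES = [
    (":alice", "schema:name", Literal("Alice", XSD_STRING)),
    (":alice", "schema:knows", ":alice"),
    (":bob", "schema:knows", ":alice"),
    (":bob", "schema:name", Literal("Robert", XSD_STRING)),
    (":carol", "schema:name", Literal("Carol", XSD_STRING)),
    (":carol", "schema:name", Literal("Carole", XSD_STRING)),
    (":dave", "schema:name", Literal("234", "xsd:integer")),
    (":emily", "foaf:name", Literal("Emily", XSD_STRING)),
    (":frank", "schema:name", Literal("Frank", XSD_STRING)),
    (":frank", "schema:email", "mailto:frank@example.org"),
    (":frank", "schema:knows", ":alice"),
    (":frank", "schema:knows", ":bob"),
    (":grace", "schema:name", Literal("Grace", XSD_STRING)),
    (":grace", "schema:knows", ":alice"),
    (":grace", "schema:knows", "_:1"),
    ("_:1", "schema:name", Literal("Unknown", XSD_STRING)),
]


def objects(node, predicate):
    return [o for s, p, o in TRIPLES if s == node and p == predicate]


def conforms_to_user(node, assumed=frozenset()):
    """Check <User> IRI { schema:name xsd:string ; schema:knows @<User> * }."""
    if node in assumed:
        # Recursive reference to a node already being checked:
        # assume it conforms while the outer check is still running.
        return True
    if isinstance(node, Literal) or node.startswith("_:"):
        return False  # the focus node must be an IRI
    names = objects(node, "schema:name")
    if len(names) != 1:
        return False  # exactly one schema:name
    if not isinstance(names[0], Literal) or names[0].datatype != XSD_STRING:
        return False  # the name must be an xsd:string literal
    # Zero or more schema:knows, every value must itself conform to <User>.
    return all(conforms_to_user(friend, assumed | {node})
               for friend in objects(node, "schema:knows"))


def validate(shape_map):
    """Turn a list of nodes to check into a result shape map (node -> bool)."""
    return {node: conforms_to_user(node) for node in shape_map}
```

Under these assumptions `:alice`, `:bob` and `:frank` conform, while `:carol` (two names), `:dave` (integer name), `:emily` (no `schema:name`) and `:grace` (knows a blank node) do not. Other predicates such as `schema:email` are ignored because the shape is not closed.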
  • 47. Example with more ShEx features :AdultPerson EXTRA rdf:type { rdf:type [ schema:Person ] ; :name xsd:string ; :age MinInclusive 18 ; :gender [:Male :Female] OR xsd:string ; :address @:Address ? ; :worksFor @:Company + ; } :Address CLOSED { :addressLine xsd:string {1,3} ; :postalCode /[0-9]{5}/ ; :state @:State ; :city xsd:string } :Company { :name xsd:string ; :state @:State ; :employee @:AdultPerson * ; } :State /[A-Z]{2}/ :alice rdf:type :Student, schema:Person ; :name "Alice" ; :age 20 ; :gender :Male ; :address [ :addressLine "Bancroft Way" ; :city "Berkeley" ; :postalCode "55123" ; :state "CA" ] ; :worksFor [ :name "Company" ; :state "CA" ; :employee :alice ] . Try it: https://tinyurl.com/yd5hp9z4
  • 48. SHACL SHACL (Shapes Constraint Language) W3C recommendation: https://www.w3.org/TR/shacl/ (July 2017) RDF vocabulary 2 parts: SHACL-Core, SHACL-SPARQL
  • 49. Simple example Try it. RDFShape https://goo.gl/ukY5vq prefix : <http://example.org/> prefix sh: <http://www.w3.org/ns/shacl#> prefix xsd: <http://www.w3.org/2001/XMLSchema#> prefix schema: <http://schema.org/> :UserShape a sh:NodeShape ; sh:targetNode :alice, :bob, :carol ; sh:nodeKind sh:IRI ; sh:property [ sh:path schema:name ; sh:minCount 1; sh:maxCount 1; sh:datatype xsd:string ; ] ; sh:property [ sh:path schema:email ; sh:minCount 1; sh:maxCount 1; sh:nodeKind sh:IRI ; ] . :alice schema:name "Alice Cooper" ; schema:email <mailto:alice@mail.org> . :bob schema:firstName "Bob" ; schema:email <mailto:bob@mail.org> . :carol schema:name "Carol" ; schema:email "carol@mail.org" . Shapes graph Data graph
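To make the effect of the constraints on slide 49 concrete, the sketch below re-expresses `:UserShape` as a Python dictionary and checks the three target nodes. It is not a SHACL processor: only `sh:minCount`, `sh:maxCount`, `sh:datatype` and `sh:nodeKind sh:IRI` are imitated, and the dictionary layout, the `Literal` type and the `validate` helper are hypothetical.

```python
from collections import namedtuple

# IRIs are plain strings, blank nodes start with "_:", literals carry a datatype.
Literal = namedtuple("Literal", ["value", "datatype"])

USER_SHAPE = {
    "targets": [":alice", ":bob", ":carol"],
    "nodeKind": "IRI",
    "properties": [
        {"path": "schema:name", "minCount": 1, "maxCount": 1, "datatype": "xsd:string"},
        {"path": "schema:email", "minCount": 1, "maxCount": 1, "nodeKind": "IRI"},
    ],
}

DATA = [
    (":alice", "schema:name", Literal("Alice Cooper", "xsd:string")),
    (":alice", "schema:email", "mailto:alice@mail.org"),
    (":bob", "schema:firstName", Literal("Bob", "xsd:string")),
    (":bob", "schema:email", "mailto:bob@mail.org"),
    (":carol", "schema:name", Literal("Carol", "xsd:string")),
    (":carol", "schema:email", Literal("carol@mail.org", "xsd:string")),
]


def is_iri(term):
    return isinstance(term, str) and not term.startswith("_:")


def validate(shape, triples):
    """Return a list of (focus node, path, violated constraint) tuples."""
    violations = []
    for focus in shape["targets"]:
        if shape.get("nodeKind") == "IRI" and not is_iri(focus):
            violations.append((focus, None, "nodeKind"))
        for prop in shape["properties"]:
            path = prop["path"]
            values = [o for s, p, o in triples if s == focus and p == path]
            if len(values) < prop.get("minCount", 0):
                violations.append((focus, path, "minCount"))
            if "maxCount" in prop and len(values) > prop["maxCount"]:
                violations.append((focus, path, "maxCount"))
            for value in values:
                if "datatype" in prop and not (
                        isinstance(value, Literal)
                        and value.datatype == prop["datatype"]):
                    violations.append((focus, path, "datatype"))
                if prop.get("nodeKind") == "IRI" and not is_iri(value):
                    violations.append((focus, path, "nodeKind"))
    return violations
```

With the data of the slide, `:alice` produces no violations, `:bob` fails `minCount` on `schema:name` (his `schema:firstName` is simply ignored), and `:carol` fails `nodeKind` on `schema:email` because the value is a literal.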
  • 50. Longer example :AdultPerson EXTRA a { a [ schema:Person ] ; :name xsd:string ; :age MinInclusive 18 ; :gender [:Male :Female] OR xsd:string ; :address @:Address ? ; :worksFor @:Company + ; } :Address CLOSED { :addressLine xsd:string {1,3} ; :postalCode /[0-9]{5}/ ; :state @:State ; :city xsd:string } :Company { :name xsd:string ; :state @:State ; :employee @:AdultPerson * ; } :State /[A-Z]{2}/ :AdultPerson a sh:NodeShape ; sh:property [ sh:path rdf:type ; sh:qualifiedValueShape [ sh:hasValue schema:Person ]; sh:qualifiedMinCount 1 ; sh:qualifiedMaxCount 1 ; ] ; sh:targetNode :alice ; sh:property [ sh:path :name ; sh:minCount 1; sh:maxCount 1; sh:datatype xsd:string ; ] ; sh:property [ sh:path :gender ; sh:minCount 1; sh:maxCount 1; sh:in (:Male :Female); ] ; sh:property [ sh:path :age ; sh:maxCount 1; sh:minInclusive 18 ] ; sh:property [ sh:path :address ; sh:node :Address ; sh:minCount 1 ; sh:maxCount 1 ] ; sh:property [ sh:path :worksFor ; sh:node :Company ; sh:minCount 1 ; sh:maxCount 1 ]. :Address a sh:NodeShape ; sh:closed true ; sh:property [ sh:path :addressLine; sh:datatype xsd:string ; sh:minCount 1 ; sh:maxCount 3 ] ; sh:property [ sh:path :postalCode ; sh:pattern "[0-9]{5}" ; sh:minCount 1 ; sh:maxCount 3 ] ; sh:property [ sh:path :city ; sh:datatype xsd:string ; sh:minCount 1 ; sh:maxCount 1 ] ; sh:property [ sh:path :state ; sh:node :State ; ] . :Company a sh:NodeShape ; sh:property [ sh:path :name ; sh:datatype xsd:string ] ; sh:property [ sh:path :state ; sh:node :State ] ; sh:property [ sh:path :employee ; sh:node :AdultPerson ; ] ;. :State a sh:NodeShape ; sh:pattern "[A-Z]{2}" . In ShEx In SHACL It's recursive! (recursion is not well defined in SHACL) Implementation-dependent feature Try it: https://tinyurl.com/ycl3mkzr
  • 51. Some challenges and perspectives Theoretical foundations of ShEx/SHACL Generating shapes from data Validation Usability RDF Stream validation Schema ecosystems Wikidata Solid
  • 52. Theoretical foundations of ShEx/SHACL Conversion between ShEx and SHACL SHaclEX library converts subsets of both Challenges Recursion and negation Performance and algorithmic complexity Detect useful subsets of the languages Convert to SPARQL Schema/data mapping Shapes Shapes ShEx SHACL
  • 53. Generating Shapes from Data Useful use case in practice Knowledge Graph summarization Some prototypes: ShExer, RDFShape, ShapeArchitect Try it with RDFShape: https://tinyurl.com/y8pjcbyf Shape Expression generated for wd:Q51613194 Shapes RDF data infer
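One naive way to infer a shape from example nodes is to collect the predicates they use, count occurrences per node, and derive a cardinality and a value kind for each predicate. The sketch below does this for three of the nodes used earlier in the talk. It is a hypothetical illustration and does not describe how ShExer, RDFShape or ShapeArchitect work; the `Literal` type and the `infer_shape` helper are assumptions.

```python
from collections import namedtuple

Literal = namedtuple("Literal", ["value", "datatype"])

TRIPLES = [
    (":alice", "schema:name", Literal("Alice", "xsd:string")),
    (":alice", "schema:knows", ":alice"),
    (":bob", "schema:name", Literal("Robert", "xsd:string")),
    (":bob", "schema:knows", ":alice"),
    (":frank", "schema:name", Literal("Frank", "xsd:string")),
    (":frank", "schema:email", "mailto:frank@example.org"),
    (":frank", "schema:knows", ":alice"),
    (":frank", "schema:knows", ":bob"),
]


def cardinality(lowest, highest):
    """Map observed min/max counts to a ShEx cardinality marker."""
    if lowest >= 1:
        return "" if highest == 1 else "+"
    return "?" if highest <= 1 else "*"


def value_kind(value):
    return value.datatype if isinstance(value, Literal) else "IRI"


def infer_shape(triples, nodes, shape_name="<User>"):
    """Build a ShExC-like shape that every example node would satisfy."""
    predicates = sorted({p for s, p, _ in triples if s in nodes})
    lines = []
    for predicate in predicates:
        counts = [sum(1 for s, p, _ in triples if s == node and p == predicate)
                  for node in nodes]
        kinds = {value_kind(o) for s, p, o in triples
                 if s in nodes and p == predicate}
        # "." means any value when the examples disagree on the kind.
        constraint = kinds.pop() if len(kinds) == 1 else "."
        marker = cardinality(min(counts), max(counts))
        lines.append(f"  {predicate} {constraint} {marker}".rstrip())
    return shape_name + " {\n" + " ;\n".join(lines) + "\n}"
```

For the three nodes above, the sketch yields an optional `schema:email`, one or more `schema:knows` and exactly one `schema:name`. The result is only as general as the examples: it would reject a user who knows nobody, which is why inferred shapes usually need human review.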
  • 54. Validation usability Learning from users Early adopters: WebIndex, HL7 FHIR, Eclipse Lyo, GenWiki,… Improve error information/visualization/navigation/repairing Authoring/visualization tools Propose annotation sets UI generation Error reporting/suggestion (SHOULD/MUST/…) Shapes
  • 55. RDF Stream validation Validation of RDF streams Challenges: Incremental validation Named graphs Addition/removal of triples Sensor Stream RDF data Shapes Validator Stream Validation results
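The incremental validation challenge can be illustrated with a single cardinality constraint that is re-checked only for the node touched by each added or removed triple. This is a toy sketch under assumptions: the class name and its interface are hypothetical, and real shapes involve many constraints, references between shapes, and named graphs that the sketch ignores.

```python
from collections import defaultdict


class IncrementalCardinalityValidator:
    """Track one constraint, min_count <= values of predicate <= max_count, per node."""

    def __init__(self, predicate, min_count, max_count):
        self.predicate = predicate
        self.min_count = min_count
        self.max_count = max_count
        self.values = defaultdict(set)  # node -> set of objects seen so far
        self.invalid = set()            # nodes currently violating the constraint

    def _recheck(self, node):
        # Only the node affected by the change is validated again.
        ok = self.min_count <= len(self.values[node]) <= self.max_count
        if ok:
            self.invalid.discard(node)
        else:
            self.invalid.add(node)
        return ok

    def add(self, subject, predicate, obj):
        """Process a triple arriving on the stream; return the node's new status."""
        if predicate == self.predicate:
            self.values[subject].add(obj)
        return self._recheck(subject)

    def remove(self, subject, predicate, obj):
        """Process a triple retracted from the stream; return the node's new status."""
        if predicate == self.predicate:
            self.values[subject].discard(obj)
        return self._recheck(subject)
```

For example, with `IncrementalCardinalityValidator("schema:name", 1, 1)`, adding `"Carol"` for `:carol` leaves her valid, adding `"Carole"` makes her invalid, and removing `"Carole"` makes her valid again, without revisiting any other node.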
  • 56. Schema ecosystems: Wikidata In May, 2019, Wikidata announced ShEx adoption New namespace for schemas Example: https://www.wikidata.org/wiki/EntitySchema:E2 It opens lots of opportunities/challenges Schema evolution and comparison
  • 57. Schema ecosystems: Solid project SOLID (SOcial Linked Data): Promoted by Tim Berners-Lee Goal: Re-decentralize the Web Separate data from apps Give users more control about their data Internally using linked data & RDF Shapes needed for interoperability Data pod Shape A Shape C Shape B App 1 App 2 App 3 "...I just can’t stop thinking about shapes.", Ruben Verborgh https://ruben.verborgh.org/blog/2019/06/17/shaping-linked-data-apps/
  • 58. Conclusions Explicit schemas (shapes) can improve linked data quality 2 languages proposed: ShEx/SHACL Towards an ecosystem of shapes for data portals New challenges and opportunities
  • 59. More info About ShEx ShEx by Example (slides): https://figshare.com/articles/ShExByExample_pptx/6291464 ShEx chapter from Validating RDF data book: http://book.validatingrdf.com/bookHtml010.html About SHACL SHACL by example (slides): https://figshare.com/articles/SHACL_by_example/6449645 SHACL chapter at Validating RDF data book http://book.validatingrdf.com/bookHtml011.html Comparing ShEx and SHACL http://book.validatingrdf.com/bookHtml013.html
  • 60. End of presentation Acknowledgements: David Vilches, Eridan Otto, Christian Sifaqui

Editor's notes

  1. ShEx: 20 lines of code; SHACL: 60 lines of code