Inference and Serialization
of Latent Graph Schemata
Using ShEx
Speaker: Daniel Fernández-Álvarez
Category: Idea
Daniel Fernández-Álvarez* Jose Emilio Labra-Gayo* Herminio García-González*
danifdezalvarez@gmail.com labra@uniovi.es herminiogg@gmail.com
*Department of Computer Science
WESO Research Group
University of Oviedo
Oviedo, Spain
Motivational example
Motivation: Torimbia Beach
• Country: Spain
• Region: Asturias
• Council/city: Llanes
• Lat/long: 43.44, -4.85
• Length: 500 m
• Width: 100 m
• Naturist: True
Motivation: Torimbia Beach
[Table: Region, Lat/long and Width for 6 different random but relevant beaches in DBpedia*; five cells are marked X]
The same happens with country, council/city, length and naturist.
*Batu Ferringhi, Horseshoe Bay, Manly Beach, Marina Beach, Playa Arcadia, Red Beach
Motivation
I would like to…
• check the concept of beach, not the instances
• discover usual schemata with a single query/click
• get results that are correct, coherent and exhaustive
Idea
Proposal
• Analysis of the neighborhood of nodes that satisfy a certain condition, to induce usual schemata:
  • Typical condition: rdf:type
• Serialization of inferred schemata with ShEx (Shape Expressions):
  • Association to a type (class)
  • Management of trustworthiness
• Handy for:
  • Documentation
  • Verification of quality
  • Discovering “hidden” entities
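The neighborhood analysis described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the triples are invented toy data, the `ex:` names and the helper `property_profile` are hypothetical, and only outgoing properties are considered. It selects nodes through the rdf:type condition and reports, for each property, the fraction of selected nodes that use it.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Toy triples (subject, predicate, object); invented for illustration only.
TRIPLES = [
    ("ex:Torimbia", RDF_TYPE, "dbo:Beach"),
    ("ex:Torimbia", "geo:lat", "43.44"),
    ("ex:Torimbia", "geo:long", "-4.85"),
    ("ex:Torimbia", "dbp:length", "500"),
    ("ex:Manly", RDF_TYPE, "dbo:Beach"),
    ("ex:Manly", "geo:lat", "-33.79"),
    ("ex:Manly", "geo:long", "151.29"),
    ("ex:Oviedo", RDF_TYPE, "dbo:City"),
    ("ex:Oviedo", "geo:lat", "43.36"),
]


def property_profile(triples, target_class):
    """Return {property: fraction of instances of target_class that use it}."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    if not instances:
        return {}
    # Keep each (instance, property) pair once, so a property with several
    # values on the same node is counted a single time for that node.
    used = {(s, p) for s, p, o in triples if s in instances and p != RDF_TYPE}
    counts = Counter(p for _, p in used)
    return {p: n / len(instances) for p, n in counts.items()}


profile = property_profile(TRIPLES, "dbo:Beach")
for prop, ratio in sorted(profile.items()):
    print(f"{prop}: {ratio:.0%}")
```

Ratios of this kind are one possible input for the trustworthiness management mentioned in the proposal: a property used by most instances of a class is a stronger candidate for the inferred schema than one used by a few.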
How?
Workflow
Source graph (DBpedia, Wikidata…) → Inference → Abstract schemata representation → Serialization → Textual schemata representation with ShEx (e.g. <Person> { })
Schemata Inference: current approaches
• Ontology integration to find shared core elements [Zhao,13]
  • Association rule mining (Apriori)
  • Rule-based classification (Decision Tables)
• Logical axioms at ontology level [Völker,11]
  • Association rule mining (Apriori)
  • Axioms represented with OWL 2 EL
• Graph schemata at class level [Christodoulou,15]
  • Clusters of similar individuals (ideally, cluster = class)
  • Results in an ad-hoc syntax
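Association rule mining, used by two of the approaches above, scores rules such as "instances that have geo:lat also have geo:long" by their support and confidence. The sketch below only illustrates those two measures; it does not implement Apriori's candidate generation, and the data and helper names (`support`, `confidence`) are invented for the example.

```python
# Each instance is described by the set of properties it uses.
# Toy data, invented for illustration only.
INSTANCES = {
    "ex:beach1": {"geo:lat", "geo:long", "dbp:length"},
    "ex:beach2": {"geo:lat", "geo:long"},
    "ex:beach3": {"geo:lat", "geo:long", "dbo:country"},
    "ex:beach4": {"dbo:country"},
}


def support(itemset, instances):
    """Fraction of instances whose property set contains every item."""
    itemset = set(itemset)
    hits = sum(1 for props in instances.values() if itemset <= props)
    return hits / len(instances)


def confidence(antecedent, consequent, instances):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(antecedent, instances)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), instances) / base


rule_support = support({"geo:lat", "geo:long"}, INSTANCES)
rule_confidence = confidence({"geo:lat"}, {"geo:long"}, INSTANCES)
print(rule_support, rule_confidence)
```

In this toy data the rule geo:lat ⇒ geo:long holds for every instance that has geo:lat, while dbo:country ⇒ geo:lat holds for only half of the instances that have dbo:country.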
Schemata Inference: our current status
Some promising ideas:
Instance clustering
Association rule mining
Some issues linked to the target graph:
Noise management
Adaptation to data model
Graph size & complexity
Completeness and coherence
Schemata Serialization I
Need: a standard syntax to express constraints in RDF graphs at class level, as already exists for other data models:
• XML: RelaxNG, DTD, XML Schema
• Relational databases: DDL
• JSON: JSON Schema
RDF candidates:
• ShEx
  • Grammar-oriented
  • Recursion
  • Human-friendly syntax
• SHACL
  • Constraint-oriented
  • No recursion (for now)
  • RDF syntax (for now)
Schemata Serialization II
Pure ShEx
<Beach> {
  dbp:width xsd:integer,
  dbp:length xsd:integer,
  geo:lat xsd:float,
  geo:long xsd:float,
  dbo:isPartOf @<Place>*
}
Annotated ShEx
<Beach> {
  dbp:width xsd:integer,     # 19%
  dbp:length xsd:integer,    # 59%
  geo:lat xsd:float,         # 83%
  geo:long xsd:float,        # 83%
  geo:geometry @<Point>,     # 87%
  dbo:isPartOf @<Place>*,    # 69%
  dbo:country @<Country>     # 32%
}
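One way to produce such plain and annotated variants is to attach a score to each inferred constraint and let a threshold decide what gets written out. The sketch below is an illustration under stated assumptions, not the authors' serializer: the `Constraint` type, the `to_shex` function, the scores and the 0.5 threshold are all hypothetical, and the output separates constraints with `;` as in current ShExC rather than with the commas used on the slide.

```python
from dataclasses import dataclass


@dataclass
class Constraint:
    predicate: str    # e.g. "geo:lat"
    value: str        # datatype or shape reference, e.g. "xsd:float" or "@<Place>"
    cardinality: str  # "" (exactly one), "?", "*" or "+"
    score: float      # hypothetical trust score in [0, 1]


def to_shex(shape_name, constraints, threshold=0.0, annotate=False):
    """Serialize the constraints whose score reaches the threshold as one shape."""
    kept = [c for c in constraints if c.score >= threshold]
    lines = []
    for i, c in enumerate(kept):
        # Every constraint except the last one is followed by a separator.
        sep = " ;" if i < len(kept) - 1 else ""
        text = f"  {c.predicate} {c.value}{c.cardinality}{sep}"
        if annotate:
            text += f"  # {c.score:.0%}"
        lines.append(text)
    return "\n".join([f"<{shape_name}> {{", *lines, "}"])


# Invented scores, for illustration only.
BEACH = [
    Constraint("geo:lat", "xsd:float", "", 0.9),
    Constraint("geo:long", "xsd:float", "", 0.9),
    Constraint("dbo:isPartOf", "@<Place>", "*", 0.6),
    Constraint("dbp:width", "xsd:integer", "", 0.2),
]

print(to_shex("Beach", BEACH, threshold=0.5))
print(to_shex("Beach", BEACH, annotate=True))
```

The first call drops the low-scoring dbp:width constraint; the second keeps every constraint and appends its score as a comment, leaving the decision to the reader.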
Use cases?
Context: Types of graphs
Specific purpose
Automatically built
Managed by a single agent
General purpose
Manually built
Managed by community
Reality
Context: Collaborative graphs
Key points:
• Schemata are not planned, they just emerge
• Schemata change over time
Possibilities:
• Schemata inference on user demand
• Describing what is associated with a type, rather than how a type should be
• Freedom: ShEx as guide, not dogma
To summarize…
Conclusions and Future Work
What we have done:
Idea
Inference of Latent Graph Schemata
Serialization through ShEx syntax
What we want to do:
Prototype
Selection of techniques
Selection of target source/s
Tests
Usefulness in different domains
Feasibility: achieved trustworthiness
User acceptance
References
• [1] Zhao, L., & Ichise, R. (2013, May). Instance-based ontological knowledge acquisition. In Extended Semantic Web Conference (pp. 155-169). Springer Berlin Heidelberg.
• [2] Völker, J., & Niepert, M. (2011, May). Statistical schema induction. In Extended Semantic Web Conference (pp. 124-138). Springer Berlin Heidelberg.
• [3] Christodoulou, K., Paton, N. W., & Fernandes, A. A. (2015). Structure inference for linked data sources using clustering. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX (pp. 1-25). Springer Berlin Heidelberg.
Extra information for Torimbia example I
Beach | Lat/long* | Naturist
Batu Ferringhi | dbp:latd, dbp:longd, georss:point, geo:geometry, geo:lat, geo:long | X
Horseshoe Bay | geo:geometry, geo:lat, geo:long | X
Manly Beach | georss:point, geo:geometry, geo:lat, geo:long | X
Marina Beach | georss:point, geo:geometry, geo:lat, geo:long | X
Playa Arcadia | georss:point, geo:geometry, geo:lat, geo:long | X
Red Beach | dbp:latDeg, dbp:longDeg, georss:point, geo:geometry, geo:lat, geo:long | X
*Some lat/long properties have been omitted. Some of them work together to give a precise coordinate (total degrees + orientation N/S, E/W).
Extra information for Torimbia example II
Beach | Length | Width | Council | Region | Country
Batu Ferringhi | X | X | shared entity | dbo:isPartOf | dbo:country
Horseshoe Bay | X | X | description | description | rdf:type (BeachesOfBermuda)
Manly Beach | X | X | description | dct:subject dbc:Beaches_of_New_South_Wales | description
Marina Beach | dbp:height | description | dct:subject | dct:subject
Playa Arcadia | X | X | dct:subject | X | dct:subject
Red Beach | X | dbp:width | dbp:city | is dbp:south of | description
Wikimedia Strategy: Templates and Mappings
• Mappings
  • Designed to automatically import data from Wikipedia’s infoboxes and tables into DBpedia.
  • Wikipedia templates define the expected properties for certain types. Mappings define which property should be used to create a triple when an occurrence of an expected property is found.
PROS
• Preserves Wikipedia’s quality.
• Handy as a guide for content represented in Wikipedia.
• It may enrich both Wikipedia and DBpedia.
• Templates can evolve, guided by the community.
CONS
• Depends on Wikipedia’s quality.
• It can only manage content represented in Wikipedia.
• Not transferable to standalone RDF graph projects.
• It assumes that the community follows the templates, so it may not reflect the real graph.
ShEx vs SHACL
ShEx
<UserShape> {
  rdfs:label xsd:string,
  ex:role ( ex:User ) ?
}
SHACL
:UserShape
  a sh:Shape ;
  sh:property [
    sh:predicate rdfs:label ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:predicate ex:role ;
    sh:hasValue ex:User ;
    sh:filterShape [
      sh:property [
        sh:predicate ex:role ;
        sh:minCount 1 ;
      ]
    ] ;
    sh:maxCount 1 ;
  ] .

Slides SEMAPRO 2016 University of Oviedo
