Information sources such as spreadsheets and databases contain a vast amount of structured data. Understanding the semantics of this information is essential to automate searching and integrating it. Semantic models capture the intended meaning of data sources by mapping them to the concepts and relationships defined by a domain ontology. Most of the effort to automatically build semantic models is focused on labeling the data fields with ontology classes and/or properties, e.g., annotating the first column of a table with dbpedia:Person and the second one with dbpedia:Film. However, a precise semantic model needs to explicitly represent the relationships too, e.g., stating that dbpedia:director is the relation between the first and second columns. In this paper, we present a novel approach that leverages the small graph patterns occurring in Linked Open Data (LOD) to automatically infer the semantic relations within a given data source, assuming that the source attributes are already annotated with semantic labels. We evaluated our approach on a dataset of museum sources using the linked data published by the Smithsonian American Art Museum as background knowledge. Mining only patterns of length one and two, our method achieves an average precision of 78% and recall of 70% in inferring the relationships included in the semantic models associated with data sources.
Leveraging Linked Data to Infer Semantic Relations within Structured Sources
1. Leveraging Linked Data to Infer Semantic Relations within Structured Sources
Mohsen Taheriyan
Craig A. Knoblock
Pedro Szekely
Jose Luis Ambite
Yinyi Chen
3. Semantic Model
Map the source to the classes & properties in an ontology
Source:
    title                date  name
1   The Island           2009  Walton Ford
2   Excavation at Night  1908  George Wesley Bellows
3   Rose Garden          1901  Maria Oakey Dewing
Domain ontology: CIDOC-CRM (85 classes, 297 properties)
4. Semantic Types
• title: E35_Title, rdfs:label
• date: E52_Time-Span, P82_at_some_time_within
• name: E82_Actor_Appellation, rdfs:label
5. Relationships
• E22_Man-Made_Object --P102_has_title--> E35_Title
• E22_Man-Made_Object --P108_was_produced_by--> E12_Production
• E12_Production --P4_has_time-span--> E52_Time-Span
• E12_Production --P14_carried_out_by--> E21_Person
• E21_Person --P131_is_identified_by--> E82_Actor_Appellation
• E35_Title --rdfs:label--> title
• E52_Time-Span --P82_at_some_time_within--> date
• E82_Actor_Appellation --rdfs:label--> name
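One way to picture the semantic model on this slide is as a column-to-class mapping plus a list of (source, property, target) links. The sketch below is only illustrative: the Python names `SEMANTIC_TYPES`, `RELATIONSHIPS`, and `classes_reachable_from` are assumptions made for this example and are not part of the authors' system.

```python
# Column -> (ontology class, data property), as on the Semantic Types slide.
SEMANTIC_TYPES = {
    "title": ("E35_Title", "rdfs:label"),
    "date": ("E52_Time-Span", "P82_at_some_time_within"),
    "name": ("E82_Actor_Appellation", "rdfs:label"),
}

# Links between class nodes, as on the Relationships slide.
RELATIONSHIPS = [
    ("E22_Man-Made_Object", "P102_has_title", "E35_Title"),
    ("E22_Man-Made_Object", "P108_was_produced_by", "E12_Production"),
    ("E12_Production", "P4_has_time-span", "E52_Time-Span"),
    ("E12_Production", "P14_carried_out_by", "E21_Person"),
    ("E21_Person", "P131_is_identified_by", "E82_Actor_Appellation"),
]


def classes_reachable_from(root, links):
    """Return every class reachable from `root` by following links forward."""
    reached, frontier = {root}, [root]
    while frontier:
        node = frontier.pop()
        for source, _prop, target in links:
            if source == node and target not in reached:
                reached.add(target)
                frontier.append(target)
    return reached


# In a complete model, every column's class is connected to the root class.
reachable = classes_reachable_from("E22_Man-Made_Object", RELATIONSHIPS)
all_columns_connected = all(
    cls in reachable for cls, _prop in SEMANTIC_TYPES.values()
)
```

The semantic types alone give three disconnected class nodes; it is the five links in `RELATIONSHIPS` that tie them into a single tree rooted at E22_Man-Made_Object.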
6. Idea
• There is a huge amount of linked data available in many domains (RDF format)
• Use LOD as the background knowledge
• Exploit the relationships between instances
7. Approach
Input
• Target source (S)
• Domain Ontologies (O)
• Semantic labels of S
• Linked Data (in the same domain)
(1) Extract graph patterns from the linked data
(2) Construct a graph from LOD patterns and the ontology
(3) Generate and rank semantic models
Output
• A ranked set of semantic models for S
11. Pattern Templates
• Many possible templates for patterns
– Example: patterns for classes C1, C2, C3
• Consider only tree patterns
• Limit the length of the patterns
12. Extracting Patterns
• Use SPARQL to query RDF data
• Example: patterns with length 1
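PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Count how often each (class, property, class) pattern occurs.
SELECT DISTINCT ?c1 ?p ?c2 (COUNT(*) AS ?count)
WHERE {
  ?x ?p ?y .
  ?x rdf:type ?c1 .
  ?y rdf:type ?c2 .
  FILTER (?x != ?y)
}
GROUP BY ?c1 ?p ?c2
ORDER BY DESC(?count)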
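For readers without a SPARQL endpoint at hand, the same length-1 counting can be sketched in plain Python over an in-memory list of triples. This is a hedged illustration, not the authors' code: the function name `length_one_patterns` and the toy triples are made up for the example, and unlike the query it simply skips `rdf:type` statements when looking for links.

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def length_one_patterns(triples):
    """Count (class, property, class) patterns over instance-level triples."""
    # Collect the classes of every typed instance.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    counts = Counter()
    for s, p, o in triples:
        # Skip type statements (a simplification) and self-loops (the FILTER).
        if p == RDF_TYPE or s == o:
            continue
        for c1 in types.get(s, ()):
            for c2 in types.get(o, ()):
                counts[(c1, p, c2)] += 1
    # Most frequent patterns first, like ORDER BY DESC(?count).
    return counts.most_common()


# Toy data invented for this example.
toy_triples = [
    ("obj1", RDF_TYPE, "E22_Man-Made_Object"),
    ("obj2", RDF_TYPE, "E22_Man-Made_Object"),
    ("prod1", RDF_TYPE, "E12_Production"),
    ("prod2", RDF_TYPE, "E12_Production"),
    ("person1", RDF_TYPE, "E21_Person"),
    ("obj1", "P108i_was_produced_by", "prod1"),
    ("obj2", "P108i_was_produced_by", "prod2"),
    ("prod1", "P14_carried_out_by", "person1"),
]
patterns = length_one_patterns(toy_triples)
```

Patterns of length 2 would follow the same idea with a second link joined on a shared instance, at the cost of a larger join.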
14. Merge the Patterns into a Graph
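• E22_Man-Made_Object --P102_has_title--> E35_Title
• E22_Man-Made_Object --P108i_was_produced_by--> E12_Production
• E12_Production --P4_has_time-span--> E52_Time-Span
• E12_Production --P14_carried_out_by--> E21_Person
• E21_Person --P131_is_identified_by--> E82_Actor_Appellation
• E21_Person --P98i_was_born--> E67_Birth
• E67_Birth --P4_has_time-span--> E52_Time-Span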
Links are weighted: less weight for more frequent links
Links have tags: the identifier of the patterns containing the link
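The merge step can be sketched as follows. The slide only states that more frequent links get less weight and that each link is tagged with the patterns containing it; the inverse-frequency formula, the function name `merge_patterns`, and the pattern counts below are assumptions chosen for illustration.

```python
def merge_patterns(patterns):
    """Merge patterns into one graph keyed by (source, property, target).

    `patterns` maps a pattern id to (links, count), where `links` is a list
    of (source_class, property, target_class) tuples and `count` is how
    often the pattern occurs in the linked data.
    """
    graph = {}
    for pattern_id, (links, count) in patterns.items():
        for link in links:
            edge = graph.setdefault(link, {"count": 0, "tags": set()})
            # Keep the highest frequency seen for this link.
            edge["count"] = max(edge["count"], count)
            # Tag the link with every pattern that contains it.
            edge["tags"].add(pattern_id)
    for edge in graph.values():
        # Assumed weighting: inverse frequency, so frequent links are cheaper.
        edge["weight"] = 1.0 / edge["count"]
    return graph


produced = ("E22_Man-Made_Object", "P108i_was_produced_by", "E12_Production")
carried_out = ("E12_Production", "P14_carried_out_by", "E21_Person")

# Hypothetical patterns with made-up counts.
toy_patterns = {
    "m1": ([produced], 50),
    "m2": ([produced, carried_out], 20),
}
merged = merge_patterns(toy_patterns)
```

Here the `produced` link ends up with tags `{"m1", "m2"}` and a lower weight than `carried_out`, which appears only in the rarer pattern.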
19. Generate and Rank Semantic Models
• Compute Steiner tree for the mapping
– A minimal tree connecting nodes of mapping
– A customization of BANKS algorithm [Bhalotia et al., 2002]
• Our algorithm considers both coherence and popularity
• Each tree is a candidate model
• Rank the models based on coherence and cost
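To make the Steiner-tree idea concrete, here is a minimal sketch of a greedy shortest-path approximation on an undirected weighted graph. It is not the customized BANKS algorithm from the slide and it ignores coherence entirely; it only shows what "a minimal tree connecting the mapped nodes" means. The function name `approx_steiner_tree` and the edge weights are assumptions for this example.

```python
import heapq


def approx_steiner_tree(edges, terminals):
    """Greedy Steiner-tree approximation.

    `edges` maps (u, v) to a positive weight (treated as undirected).
    Starting from the first terminal, repeatedly attach the nearest
    unconnected terminal via its cheapest path to the current tree.
    Returns the chosen edges and their total cost.
    """
    adjacency = {}
    for (u, v), w in edges.items():
        adjacency.setdefault(u, []).append((v, w, (u, v)))
        adjacency.setdefault(v, []).append((u, w, (u, v)))

    terminals = list(terminals)
    in_tree = {terminals[0]}
    remaining = set(terminals[1:])
    chosen = set()

    while remaining:
        # Multi-source Dijkstra from every node already in the tree.
        dist = {n: 0.0 for n in in_tree}
        back = {}
        heap = [(0.0, n) for n in in_tree]
        heapq.heapify(heap)
        target = None
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist.get(node, float("inf")):
                continue  # stale heap entry
            if node in remaining:
                target = node
                break
            for nxt, w, edge in adjacency.get(node, []):
                nd = d + w
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    back[nxt] = (node, edge)
                    heapq.heappush(heap, (nd, nxt))
        if target is None:
            raise ValueError("terminals are not connected")
        # Walk back to the tree, adding the path's nodes and edges.
        node = target
        while node not in in_tree:
            prev, edge = back[node]
            chosen.add(edge)
            in_tree.add(node)
            node = prev
        remaining -= in_tree
    return chosen, sum(edges[e] for e in chosen)


# Class graph from the merge slide, with made-up weights.
toy_edges = {
    ("E22_Man-Made_Object", "E35_Title"): 1.0,
    ("E22_Man-Made_Object", "E12_Production"): 1.0,
    ("E12_Production", "E52_Time-Span"): 1.0,
    ("E12_Production", "E21_Person"): 1.0,
    ("E21_Person", "E82_Actor_Appellation"): 1.0,
    ("E21_Person", "E67_Birth"): 2.0,
    ("E67_Birth", "E52_Time-Span"): 2.0,
}
toy_terminals = ["E35_Title", "E52_Time-Span", "E82_Actor_Appellation"]
tree_edges, tree_cost = approx_steiner_tree(toy_edges, toy_terminals)
```

With these weights the tree connects the three mapped classes through E22_Man-Made_Object, E12_Production, and E21_Person, and leaves out the more expensive route through E67_Birth. A real implementation would also need to handle several properties between the same pair of classes and return the top-k trees rather than one.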
25. Evaluation
• Correct semantic types given
• Linked data: 3,398,350 triples published by Smithsonian American Art Museum
• Extracted patterns of length 1 and 2
• Compute precision and recall between learned links and correct links
Evaluation Dataset
# sources 29
# classes in the ontologies 147
# properties in the ontologies 409
# nodes in the gold standard models 812
# links in the gold standard models 785
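The precision and recall used in the evaluation compare the set of learned links with the set of links in the gold-standard model. A minimal sketch, assuming links are represented as (source, property, target) tuples; the function name `precision_recall` and the example "learned" model are made up for illustration and are not results from the paper.

```python
def precision_recall(learned_links, correct_links):
    """Precision and recall of learned links against gold-standard links."""
    learned, correct = set(learned_links), set(correct_links)
    true_positives = len(learned & correct)
    # Precision: share of learned links that are correct.
    precision = true_positives / len(learned) if learned else 0.0
    # Recall: share of correct links that were learned.
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall


gold = [
    ("E22_Man-Made_Object", "P102_has_title", "E35_Title"),
    ("E22_Man-Made_Object", "P108_was_produced_by", "E12_Production"),
    ("E12_Production", "P4_has_time-span", "E52_Time-Span"),
    ("E12_Production", "P14_carried_out_by", "E21_Person"),
    ("E21_Person", "P131_is_identified_by", "E82_Actor_Appellation"),
]

# Hypothetical learned model: one link missing, one with the wrong property.
learned = [
    ("E22_Man-Made_Object", "P102_has_title", "E35_Title"),
    ("E22_Man-Made_Object", "P108_was_produced_by", "E12_Production"),
    ("E12_Production", "P14_carried_out_by", "E21_Person"),
    ("E21_Person", "P1_is_identified_by", "E82_Actor_Appellation"),
]
precision, recall = precision_recall(learned, gold)
```

In this toy case three of the four learned links are correct and three of the five gold links are found.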
29. Results
background knowledge                          precision  recall  time (s)
domain ontology                               0.07       0.05    0.17
domain ontology + patterns of length 1        0.65       0.55    0.75
domain ontology + patterns of length 1 and 2  0.78       0.70    0.46
30. Related Work
• Mapping databases and spreadsheets to ontologies
– Mapping languages and tools (D2R, R2RML)
– String similarity between column names and ontology terms
• Understand semantics of Web tables
– Use column headers and cell values to find the labels and relations from a database of labels and relations populated from the Web
• Exploit Linked Open Data (LOD)
– Link the values to the entities in LOD to find the types of the values and their relationships
• Learn semantic models of structured data sources from previously modeled sources
– Learn from the popular and coherent patterns in known semantic models
31. Discussion & Future Work
• Automatically infer semantic relations from LOD
• Help to publish consistent RDF data
• Extract longer patterns from LOD
Editor's Notes
Key ingredient to automate: Source discovery, Data integration, Publish knowledge graphs
P131_is_identified_by is more popular than P1_is_identified_by for connecting instances of E82_Actor_Appellation and instances of E21_Person. The property P1_is_identified_by describes the naming or identification of any real-world item by a name or any other identifier. P131_is_identified_by is a specialization of P1_is_identified_by that identifies a name used specifically to identify an instance of E39_Actor (a superclass of E21_Person).
Keyword Searching and Browsing in Databases using BANKS
#data sources 29
#classes in the domain ontologies 147
#properties in the domain ontologies 409
#nodes in the gold-standard models 812
#data nodes in the gold-standard models 418
#class nodes in the gold-standard models 394
#links in the gold-standard models 785
#internal links in the gold-standard models 367
68 distinct patterns with length one (two nodes and one link)
634 distinct patterns with length two (three nodes and two links)
precision: How many of the learned relationships are correct?
recall: How many of the correct relationships are learned?
Indiana Museum of Art
Dallas Museum of Art
Mapping databases and spreadsheets to ontologies
Mapping languages: D2R [Bizer, 2003], D2RQ [Bizer and Seaborne, 2004], R2RML [Das et al., 2012]
Tools: RDOTE [Vavliakis et al., 2010], RDF123 [Han et al., 2008], XLWrap [Langegger and Wöß, 2009]
String similarity between column names and ontology terms [Polfliet and Ichise, 2010]
Understand semantics of Web tables
Use column headers and cell values to find the labels and relations from a database of labels and relations populated from the Web [Wang et al., 2012] [Limaye et al., 2010] [Venetis et al., 2011]
Exploit Linked Open Data (LOD)
Link the values to the entities in LOD to find the types of the values and their relationships [Muñoz et al., 2013] [Mulwad et al., 2013]
Semantic annotation of Web services
Languages: SAWSDL [Farrell and Lausen, 2007]
Tools: SWEET [Maleshkova et al., 2009]
Annotate input and output parameters [Heß et al., 2003] [Lerman et al., 2006] [Saquicela et al., 2011]
Learn semantic definitions of online information sources [Carman and Knoblock, 2007]
Learns LAV rules from known sources
Only learns descriptions that are conjunctive combinations of known descriptions
Learn semantic models of structured data sources [Taheriyan et al., 2013, 2014, 2015]
Learns complete semantic models from previously modeled sources
Publish linked data
Transform the data to a common vocabulary
Linking entities across different datasets