4. Semantic Description
Describing the source in terms of the concepts and relationships
defined by the domain ontology
Domain Ontology
bornIn
nearby
birthdate
isPartOf
livesIn
Place
name
Person
organizer
name
ceo
worksFor
Event
Organization
phone
postalCode
location
City
state
State
object property
data property
subClassOf
title
startDate
name
Source
3
Column 1
Bill Gates
Mark Zuckerberg
Larry Page
Column 2
Oct 1955
May 1984
Mar 1973
Column 3
Microsoft
Facebook
Google
Column 4
Seattle
White Plains
East Lansing
Column 5
WA
NY
MI
10. Refining The Model
Refined Model
• Previous work does not learn the changes done by the
user in relationships
• User has to go through the refinement process each time
9
12. Key Idea
• Sources in the same domain often have
similar data
• Exploit knowledge of existing source
models
• Leverage relationships in known source
models to hypothesize relationships for
new sources
11
14. Example
Domain Ontology
Known Source Models
worksFor
Organization
City
Person
name
state
State
state
name
birthdate
name| birthdate|
name
name
city|
state|
S1 = personalInfo
workplace
State
name
isPartOf
City
ceo
City
name
state |
city
S2 = getCities
Person
name
State
name
name
company| ceo| city|
New Source
S4 = postalCodeLookup(zipcode, city, state)
13
location
Organization
bornIn
name
state
S3 = businessInfo
15. Build a Graph from Known Models
• Create a component in G for each known source model
– Only add if the model is not subgraph of an existing
component
• Annotate links with list of supporting models
S1 = personalInfo
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
Person
name
{s1}
{s1}
{s1}
birthdate
Person.name
{s1} name
City.name
Person.birthdate
Component 1
14
State
{s1}
name
State.name
16. Build a Graph from Known Models
• Create a component in G for each known source model
– Only add if the model is not subgraph of an existing
component
• Annotate links with list of supporting models
S1 = personalInfo
S2 = getCities
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
Person.name
City.name
Person.birthdate
Component 1
15
{s1,s2}
name
State.name
17. Build a Graph from Known Models
• Create a component in G for each known source model
– Only add if the model is not subgraph of an existing
component
• Annotate links with list of supporting models
S1 = personalInfo
S2 = getCities
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
Person.name
City.name
Person.birthdate
Component 1
16
{s1,s2}
name
State.name
S3 = businessInfo
Organization
location
{s3}
name
{s3}
City
ceo
{s3}
isPartOf
State
{s3}
Person
{s3} name
{s3}
name
Org.name
name
{s3}
State.name
City.name
Person.name
Component 2
18. Build a Graph from Known Models
• Connect graph components using all paths inferred from the
ontology
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
worksFor
ceo
name
State.name
organizer
Organization
Person.birthdate
location
name
{s3}
Person
Event
isPartOf
location
{s3} name
isPartOf
{s3}
State
{s3}
name
Org.name
isPartOf
location
Place
name
{s3}
Person.name
isPartOf
isPartOf
17
City
ceo
{s3}
organizer
location
{s3}
State.name
City.name
19. Build a Graph from Known Models
•
•
Assign low weight = ε to links within a component (black links)
Weight other links according to their (green links)
M
= known source models
Wmax = number of links in M (>= |EG|) = 18
c1(e) = number of links in M whose <label,source, target>
match e
c2(e) = number of links in M whose <label> match e
we
= Min(Wmax - c1 , Wmax - c2/Wmax)
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
name
17 worksFor
State.name
17
ceo
Organization
Person.birthdate
organizer
18
17.94
location
name
organizer 18
isPartOf
location
{s3} name
isPartOf
{s3}
State
{s3}
name
Org.name
isPartOf
location
17.94
Place
name
{s3}
Person.name
17.94
18
{s3}
Person
Event
17.94
City
ceo
{s3}
17.94
location
{s3}
17.94
isPartOf
isPartOf
State.name
City.name
20. Learn Semantic Types (Previous Work)
• A CRF-based model to assign a Semantic Type to each
column from its data
• Semantic Type
– Ontology Class
– Data Property + Domain
Domain Ontology
Place.postalCode
S4 = postalCodeLookup
19
City.name
(zipcode ,
city ,
State.name
state)
21. Generate Candidate Models
• Map learned semantic types to nodes in graph G
– There might be multiple mappings
• Compute Steiner tree (minimal tree) for each mapping
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
Place.postalCode
S4 = postalCodeLookup
name
ceo
Organization
17.94
name
City
{s3}
Person
Event
state)
isPartOf
location
isPartOf
location
17.94
name
{s3}
Person.name
17.94
17.94
{s3} name
isPartOf
{s3}
State
{s3}
name
Org.name
Place
20
location
{s3}
ceo
{s3}
17.94
organizer 18
17.94
city,
State.name
17
location
(zipcode,
State.name
17 worksFor
Person.birthdate
organizer
18
City.name
isPartOf
isPartOf
State.name
City.name
22. Generate Candidate Models
• Map learned semantic types to nodes in graph G
– There might be multiple mappings
• Compute Steiner tree (minimal tree) for each mapping
Mapping 1
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
Place.postalCode
S4 = postalCodeLookup
name
ceo
Organization
17.94
name
21
Place.postalCode
City
{s3}
Person
Event
isPartOf
location
postalCode
location
{s3}
ceo
{s3}
17.94
organizer 18
17.94
city,
state)
State.name
17
location
(zipcode,
State.name
17 worksFor
Person.birthdate
organizer
18
City.name
isPartOf
location
Place
name
{s3}
Person.name
17.94
{s3}
State
{s3}
name
Org.name
17.94
17.94
{s3} name
isPartOf
isPartOf
isPartOf
State.name
City.name
23. Generate Candidate Models
• Map learned semantic types to nodes in graph G
– There might be multiple mappings
• Compute Steiner tree (minimal tree) for each mapping
Mapping 1
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
Place.postalCode
S4 = postalCodeLookup
name
ceo
Organization
17.94
name
22
Place.postalCode
City
{s3}
Person
Event
isPartOf
location
postalCode
location
{s3}
ceo
{s3}
17.94
organizer 18
17.94
city,
state)
State.name
17
location
(zipcode,
State.name
17 worksFor
Person.birthdate
organizer
18
City.name
isPartOf
location
Place
name
{s3}
Person.name
17.94
{s3}
State
{s3}
name
Org.name
17.94
17.94
{s3} name
isPartOf
isPartOf
isPartOf
State.name
City.name
24. Generate Candidate Models
• Map learned semantic types to nodes in graph G
– There might be multiple mappings
• Compute Steiner tree (minimal tree) for each mapping
Mapping 2
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
Place.postalCode
S4 = postalCodeLookup
name
ceo
Organization
17.94
name
23
Place.postalCode
City
{s3}
Person
Event
isPartOf
location
postalCode
location
{s3}
ceo
{s3}
17.94
organizer 18
17.94
city,
state)
State.name
17
location
(zipcode,
State.name
17 worksFor
Person.birthdate
organizer
18
City.name
isPartOf
location
Place
name
{s3}
Person.name
17.94
{s3}
State
{s3}
name
Org.name
17.94
17.94
{s3} name
isPartOf
isPartOf
isPartOf
State.name
City.name
25. Generate Candidate Models
• Map learned semantic types to nodes in graph G
– There might be multiple mappings
• Compute Steiner tree (minimal tree) for each mapping
Mapping 2
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
Place.postalCode
S4 = postalCodeLookup
name
ceo
Organization
17.94
name
24
Place.postalCode
City
{s3}
Person
Event
isPartOf
location
postalCode
location
{s3}
ceo
{s3}
17.94
organizer 18
17.94
city,
state)
State.name
17
location
(zipcode,
State.name
17 worksFor
Person.birthdate
organizer
18
City.name
isPartOf
location
Place
name
{s3}
Person.name
17.94
{s3}
State
{s3}
name
Org.name
17.94
17.94
{s3} name
isPartOf
isPartOf
isPartOf
State.name
City.name
26. Generate Candidate Models
• Map learned semantic types to nodes in graph G
– There might be multiple mappings
• Compute Steiner tree (minimal tree) for each mapping
Mapping 3
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
Place.postalCode
S4 = postalCodeLookup
name
ceo
Organization
17.94
name
25
Place.postalCode
state)
isPartOf
location
isPartOf
location
Place
State
{s3}
name
{s3}
Person.name
17.94
{s3}
{s3} name
{s3}
name
Org.name
17.94
17.94
isPartOf
City
Person
Event
postalCode
location
{s3}
ceo
{s3}
17.94
organizer 18
17.94
city,
State.name
17
location
(zipcode,
State.name
17 worksFor
Person.birthdate
organizer
18
City.name
isPartOf
isPartOf
State.name
City.name
27. Generate Candidate Models
• Map learned semantic types to nodes in graph G
– There might be multiple mappings
• Compute Steiner tree (minimal tree) for each mapping
Mapping 3
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
Place.postalCode
S4 = postalCodeLookup
name
ceo
Organization
17.94
name
Event
isPartOf
postalCode
26
Place.postalCode
state)
Place
State
{s3}
name
{s3}
Person.name
17.94
{s3}
name
{s3}
{s3}
name
Org.name
17.94
17.94
isPartOf
City
Person
isPartOf
location
location
location
{s3}
ceo
{s3}
17.94
organizer 18
17.94
city,
State.name
17
location
(zipcode,
State.name
17 worksFor
Person.birthdate
organizer
18
City.name
isPartOf
isPartOf
State.name
City.name
28. Generate Candidate Models
• Map learned semantic types to nodes in graph G
– There might be multiple mappings
• Compute Steiner tree (minimal tree) for each mapping
Mapping 4
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
Place.postalCode
S4 = postalCodeLookup
name
ceo
Organization
17.94
name
27
Place.postalCode
state)
isPartOf
location
isPartOf
location
17.94
Place
17.94
{s3}
State
{s3}
{s3} name
{s3}
name
Org.name
name
{s3}
Person.name
17.94
isPartOf
City
Person
Event
postalCode
location
{s3}
ceo
{s3}
17.94
organizer 18
17.94
city,
State.name
17
location
(zipcode,
State.name
17 worksFor
Person.birthdate
organizer
18
City.name
isPartOf
isPartOf
State.name
City.name
29. Generate Candidate Models
• Map learned semantic types to nodes in graph G
– There might be multiple mappings
• Compute Steiner tree (minimal tree) for each mapping
Mapping 4
name
Organization
worksFor
{s1}
{s1}
bornIn
Org.name
state
{s1}
City
State
{s1,s2}
Person
name
{s1}
{s1,s2} name
{s1}
birthdate
City.name
Person.name
{s1,s2}
Place.postalCode
S4 = postalCodeLookup
name
ceo
Organization
17.94
name
28
Place.postalCode
state)
isPartOf
location
isPartOf
17.94
Place
name
{s3}
Person.name
17.94
17.94
{s3}
State
{s3}
Org.name
location
isPartOf
City
Person
Event
postalCode
location
{s3}
ceo
{s3}
17.94
organizer 18
17.94
city,
State.name
17
location
(zipcode,
State.name
17 worksFor
Person.birthdate
organizer
18
City.name
isPartOf
isPartOf
name
{s3}
{s3}
name
State.name
City.name
30. Rank Source Models
• Rank the candidates based on:
– Cost: sum of the weights
– Coherence: prefer the models with higher number of supporting models
Rank 1: Candidate 1
isPartOf
City
state
State
Rank 2: Candidate 4
isPartOf
City
{s1,s2}
Place
{s1,s2} name
{s1,s2} name
postalCode
State.name
Place.postalCode
isPartOf
{s3} name
postalCode
postalCode
{s1,s2}
name
State
{s3} name
State.name
Place.postalCode
City.name
29
State
{s3} name
State.name
Place.postalCode
City.name
City
Place
{s3}
Place
City.name
isPartOf
isPartOf
Rank 3: Candidate 2
isPartOf
isPartOf
State
City
Place
name {s1,s2} name
{s3}
postalCode
State.name
Place.postalCode
City.name
Rank 3: Candidate 3
31. Evaluation
• Dataset 1
– 17 data sources containing overlapping data
– Semantic descriptions created manually using DBPedia, FOAF,
GeoNames, and WGS84 ontologies
• Dataset 2
– 6 museum sources
– Semantic descriptions created by domain experts using EDM,
SKOS, and FOAF ontologies
• Learned a source model assuming other models as input
• Computed the Graph Edit Distance (GED) between the
learned model and the correct one
– Operations: node insertion, node deletion, edge insertion, edge
deletion, edge relabeling
• Compared the results with our previous work in Karma
30
34. Related Work
• Writing semantic descriptions by hand
– R2RML, SWRL
– Tedious and time-consuming task
– Requires expertise in SW technologies
• Semantic annotation of Web services and Web tables
– Very limited in learning the relationships
• Learning Semantic Definitions of Online Information
Sources [Carman, Knoblock, 2007]
– Learns LAV rules from known sources
– Can only learn descriptions that overlap known sources
33
35. Discussion
• Automatically build rich semantic descriptions of
data sources
• Exploit the background knowledge from (i) the
domain ontology, and (ii) the known source
models
• Semantic descriptions are the key ingredients to
automate many tasks, e.g.,
– Source Discovery
– Data Integration
– Service Composition
34
36. Future Work
• Investigate how to create a more compact graph
– Consolidate the overlapping segments of the known semantic models
• Relax the problem by removing the constraint that the correct
semantic type of each attribute is known
– CRF part returns a set of candidate semantic types along with their
confidence values
• Use the data available in Linked Open Data (LOD) cloud to learn
more accurate models
• Put the user in the loop
– Integrate the new approach into Karma
– The user refines one of the suggested models
– The new model will be added to the graph as a new pattern
35
Editor's Notes
M = set of known source modelsk=Total number of links in M (>= |EG|)c1(e)=number of links in M whose <label,source, target> match ec2(e)=number of links in M whose <label,source> match ec3(e)=number of links in M whose <label,target> match ec4(e)=number of links in M whose <label> match ewe=Min( k - ε/ [k-c1], k - ε/ [k * (k-c2)],k - ε/ [k * (k-c3)], k - ε/ [k * k * (k-c4)])
If number of mappings from semantic types to the nodes in the graph is too large, only keep top k promising mappings: score the mappings based on their coherence (the nodes in the same pattern)
If there is large number of mappings, we sort them based on their coherency: how many number of each mapping belong to the same pattern?