Schema Matching and Integration
IIS 651 (S 2022)
Outline
➢ Schema and Schema Matching
➢ Schema Heterogeneity & Data Interoperability
➢ Large Scale Scenarios concerning Schema Matching and Integration
➢ Related Work
➢ Our approach to handle Large Scale Scenario
➢ PORSCHE (Performance Oriented Schema Mediation)
➢ Future Research Directions
Schema
Origin: Greek, meaning "shape" or "plan"
From a computer science perspective:
• a description of the relationships among data/information in some structured way, or
• a set of rules defining those relationships, or
• a model to represent the data
For example:
• Relational Schema
• XML Schema
• Class Diagram ...
Relational Database Schema
books (database)
  book(book_id, title)
  author(author_id, name)
  publisher(pub_id, name)
  detail(book_id, author_id, pub_id)
XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="time">
    <xs:complexType/>
  </xs:element>
  <xs:element name="day">
    <xs:complexType/>
  </xs:element>
  <xs:element name="courseCode">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="time"/>
        <xs:element ref="day"/>
        <xs:element ref="Instructor"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="arizonaCourses">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="courseCode"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Instructor">
    <xs:complexType/>
  </xs:element>
</xs:schema>
Web Interface Form Schema
From city or airport*        To city or airport*
If you are unsure of the spelling of a city or airport, enter the first 3 or more letters followed by an asterisk (*).
Departure date: Wednesday, 23 Jul 2008        Departure time: Any Time
Return date: Thursday, 24 Jul 2008            Return time: Any Time
Traveler types: Adults (12-64 yrs): 1; Children (2-11 yrs): 0; Seniors (65+ yrs): 0; Infants (0-23 months): 0
Cabin type: Coach
Direct or Non-Stop flights only
More search options
Schema Matching
• Takes two schemas/ontologies as input and produces a mapping between elements of the two schemas that correspond semantically to each other [Halevy05]

Source A: Books
price   book-title            author-name
26,60   Harry Potter          J. K. Rowling
11,50   Marie Des Intrigues   Juliette Benzoni
16,50   Nous Les Dieux        Bernard Werber
24      Pompei                Robert Harris

Source B: Books
listed-price   title   a-fname   a-lname

Match types shown between Source A and Source B: 1-1 match, complex match
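The two match types can be illustrated with a minimal Python sketch. The correspondences used here (price to listed-price, book-title to title, author-name to a-fname plus a-lname) are assumptions read off the element names, and `translate` and `split_author` are hypothetical helpers, not part of any matching tool.

```python
# Illustrative only: the correspondences below are assumed from the element names.
source_a = {"price": "26,60", "book-title": "Harry Potter", "author-name": "J. K. Rowling"}

# 1-1 matches: one element of Source A corresponds to one element of Source B.
one_to_one = {"price": "listed-price", "book-title": "title"}

def split_author(full_name):
    """Complex match: author-name corresponds to the pair (a-fname, a-lname)."""
    first, _, last = full_name.rpartition(" ")
    return {"a-fname": first, "a-lname": last}

def translate(record):
    """Rewrite a Source A record into the Source B schema."""
    out = {one_to_one[k]: v for k, v in record.items() if k in one_to_one}
    out.update(split_author(record["author-name"]))
    return out

record_b = translate(source_a)
print(record_b)
```

A 1-1 match needs only a rename, while the complex match needs a transformation function, which is why complex matches are harder to discover automatically.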
Applications of Schema Matching
• Data Interoperability
• Data Integration
• Data Warehousing
• Catalogue Integration
• Web Services Discovery and Composition
• Query over the Web
• ...
• Data Exchange
• E-commerce
• Agents Communication
• ...
Static: contributing schema set not evolving >> matching and mapping is a one-time process
Dynamic: contributing schema set evolving >> matching and mapping also evolve
PROBLEM?
Schema Heterogeneity & Data Interoperability
• A key roadblock for information integration!
• Different data sources speak their own schema
[Figure: a Consumer (Hotel, Restaurant, AdventureSports, HistoricalSites) and three Data Sources (Hotels, Youth Centers; Lodges, Restaurants; Beaches, Volcanoes)]
SOLUTION!
Schema Mediation
Schema Integration and Mediation
• All concerned data source schemas are merged into one schema without any concept redundancy, i.e. similar concepts are represented by one concept
• All the input data source schemas are mapped to this integrated schema, also called the mediated schema
[Figure: the same Consumer and Data Sources, now connected through Mediation]
Schema Mapping is key to any data sharing architecture
[Figure: a User Query is posed against the Mediated Schema and split into sub-queries, which reach Source 1, Source 2, ..., Source n through mappings and wrappers] [Tomasic et al. IEEE TKDE 1998]
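The mediator architecture can be sketched in a few lines of Python. This is a toy under stated assumptions: the two sources reuse the book attributes from the Schema Matching slide, each "wrapper" is just an in-memory list, and `SOURCES`, `MAPPINGS`, `sub_query` and `answer` are hypothetical names.

```python
# Hypothetical sources; each "wrapper" is just an in-memory list of records.
SOURCES = {
    "source_a": [{"price": "26,60", "book-title": "Harry Potter"}],
    "source_b": [{"listed-price": "24", "title": "Pompei"}],
}

# Mappings: mediated attribute -> source attribute, one table per source.
MAPPINGS = {
    "source_a": {"title": "book-title", "price": "price"},
    "source_b": {"title": "title", "price": "listed-price"},
}

def sub_query(source, attributes):
    """Rewrite a query over the mediated schema into one source's vocabulary."""
    return [MAPPINGS[source][a] for a in attributes]

def answer(attributes):
    """Send a sub-query to every source and return the results in mediated terms."""
    results = []
    for source, records in SOURCES.items():
        local = sub_query(source, attributes)
        for rec in records:
            results.append({a: rec[name] for a, name in zip(attributes, local)})
    return results

rows = answer(["title", "price"])
print(rows)
```

The user only ever sees the mediated attribute names; the mappings carry all source-specific knowledge, so adding a source means adding one mapping table.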
Schema Matching, Mapping, Integration & Mediation
• Matching: finding similarities between schemas
• Mapping: final correspondences between elements of two schemas
• Merging/Integration: based upon schema mappings, merging schemas into one schema
• Mediation: mappings from source schemas to the integrated schema for data interoperability
[Figure: source schemas S1 (B, C) and S2 (B1, C1, C2) are matched, mapped and merged into the integrated schema Si (B, C, C1)]
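The four stages can be traced on the S1/S2 example with a small Python sketch. The similarity scores, the threshold, and the correspondences B with B1 and C with C2 are assumptions made for illustration; they are chosen so that the result agrees with the integrated schema Si (B, C, C1).

```python
s1 = ["B", "C"]
s2 = ["B1", "C1", "C2"]

# Matching: candidate similarities (scores are made up for illustration).
similarities = {("B", "B1"): 0.9, ("C", "C2"): 0.8, ("C", "C1"): 0.3}

# Mapping: keep only correspondences above a threshold (S2 element -> S1 element).
THRESHOLD = 0.5
mapping = {b: a for (a, b), score in similarities.items() if score >= THRESHOLD}

# Merging/Integration: one concept per group of corresponding elements.
integrated = sorted(list(s1) + [e for e in s2 if e not in mapping])

# Mediation: every source element points to its concept in the integrated schema.
mediation = {
    "S1": {e: e for e in s1},
    "S2": {e: mapping.get(e, e) for e in s2},
}

print(integrated)
print(mediation["S2"])
```

C1 has no counterpart above the threshold, so it enters the integrated schema as a concept of its own, while B1 and C2 are folded into B and C.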
Different Research Domains - Mediation
Mediation
• Distributed Databases
• Data Warehousing
• Data Mining
• Information Retrieval
• Knowledge Extraction
• ...
LARGE SCALE PROBLEM!
Large Scale Scenario
• Creating a mediated schema from two large schemas (with thousands of nodes)
  • For example, Open Applications Group Integration Specification (OAGIS) [1] XML schema instances with thousands of elements
• Creating a mediated schema from a large set of schemas (hundreds of schemas with thousands of nodes)
  • For example, creating a mediated web interface input form (schema) from hundreds of web interface forms (schemas) in the travel domain [2]
[1] http://www.openapplications.org/
[2] http://metaquerier.cs.uiuc.edu
Large scale schema matching and integration requires an automated approach
Related Work
Tuning approach
• Pre-Match: eTuner [Lee&Doan 07]
• Amid-Match: SCIA [Wang et al 07]
• Post-Match: COMA++ [Do et al 07, Manakanatas06]
Large Scale Schema Matching and Integration Approaches
• Incremental, Holistic
• Fragmentation; Clustering (Element Level, Schema Level); Mining (Data-mining, Tree-mining)
• COMA++ [Do&Rahm07], BellFlower [Smiljanic06], DCM [He et al 04], xClust [Lee et al 02], PORSCHE [Saleem et al 08]
An approach to handle Large Scale Scenario
➢ Handle Schemas as Trees
➢ Apply the Clustering Method
➢ Use Tree Mining
➢ Devise Hybrid Approach
Result: Automated Approach having Good Time Performance with Approximate Match Quality
Schemas as trees - Web Interface Forms
absTravel
  From
    D_City
  To
    A_City
  Departure
    Date
      D_Month
      D_Day
    D_Time
  Return
    Date
      R_Month
      R_Day
    R_Time
  CabinType
  TravelerTypes
    Adults
    Children
    Seniors
    Infants
[Figure: a second schema tree rooted at absTravel, built from another travel form with a similar set of nodes]
[He et al. KDD 2004]
Schemas as trees - Relational Database
books
  book
    book_id
    title
  author
    author_id
    name
  publisher
    pub_id
    name
  detail
    book_id
    author_id
    pub_id
[Lee et al. CIKM 2006]
Schemas as trees - XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="time">
    <xs:complexType/>
  </xs:element>
  <xs:element name="day">
    <xs:complexType/>
  </xs:element>
  <xs:element name="courseCode">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="time"/>
        <xs:element ref="day"/>
        <xs:element ref="Instructor"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="arizonaCourses">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="courseCode"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Instructor">
    <xs:complexType/>
  </xs:element>
</xs:schema>
arizonaCourses
  courseCode
    day
    time
    place
    instructor
[Figure: a speculatively rooted tree for rRNA genes]
Schema Tree Benefit
• A tree structure for a data model inherently supports the contextual meaning of the descendant nodes.
[Figure: schema trees S1 (nodes A, B, C, D) and S2 (nodes A1, B1, C1, C11 and two D nodes), drawn in two layouts]
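A small Python sketch makes the point about context concrete. It assumes the books schema is modelled as a nested dict (the exact tree shape is an assumption) and uses a hypothetical `paths` helper to list root-to-node paths.

```python
# Nested dict as a schema tree; the shape follows the books example and is an assumption.
books = {
    "books": {
        "book": {"book_id": {}, "title": {}},
        "author": {"author_id": {}, "name": {}},
        "publisher": {"pub_id": {}, "name": {}},
    }
}

def paths(tree, prefix=()):
    """Yield the root-to-node path of every node in the tree."""
    for label, children in tree.items():
        path = prefix + (label,)
        yield path
        yield from paths(children, path)

# Group paths by node label: the same label can carry different contexts.
contexts = {}
for p in paths(books):
    contexts.setdefault(p[-1], []).append("/".join(p))

print(contexts["name"])
```

The label `name` alone is ambiguous, but its ancestors tell an author's name apart from a publisher's name, which is the contextual information a flat list of labels would lose.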
Element Level Clustering
• Clustering helps optimize (reduce) the target search space
• Schema elements are clustered based on label similarity
[Figure: schemas S1, S2, S3, ..., Si containing nodes A, B, C, D, A1, B1 and C1 to C5]
Node label similarity: C ≈ C1 ≈ C2 ≈ C3 ≈ C4 ≈ C5
[Figure: typical matching scenario over elements t1 ... tn, s1 ... sm and a1 ... aq]
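Label-based clustering can be sketched as follows. The similarity measure (Jaccard overlap of label tokens), the 0.5 threshold and the single-link grouping are assumptions chosen to keep the example small; they are not claimed to be the measure used by any of the cited systems.

```python
import re

def tokens(label):
    """Split a label such as 'listed-price' or 'book_title' into lowercase tokens."""
    return set(t for t in re.split(r"[-_\s]+", label.lower()) if t)

def similarity(a, b):
    """Jaccard similarity between the token sets of two labels."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def cluster(labels, threshold=0.5):
    """Single-link clustering: labels joined by a chain of similar pairs share a cluster."""
    parent = {label: label for label in labels}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    return sorted(sorted(g) for g in groups.values())

labels = ["price", "listed-price", "title", "book-title", "author", "author-name"]
clusters = cluster(labels)
print(clusters)
```

Once elements are grouped this way, a matcher only needs to compare an element with the members of its own cluster rather than with every element of every schema.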
Tree Mining Aspect
• Tree mining finds frequent sub-trees in a given set of trees
  • This is similar to schema matching, which finds similar concepts among a set of schemas
• Data structures that support tree mining algorithms can be used for schema matching
  • Helps in handling Large Scale Scenario
  • Supports the context of nodes
[Figure: example trees "computers" (Desktop, notebook) and "Software" (Desktop, notepad)]
Tree mining example
• Element Level Matching (sub-tree size 1)
• Structure Level Matching (sub-tree size > 1)
[Figure: a set of example trees over node labels such as a, b, n, p and t, from which frequent sub-trees are mined]
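The two sub-tree sizes can be illustrated with a deliberately simplified Python sketch. It counts only sub-trees of size 1 (node labels) and size 2 (parent-child pairs) and keeps those that occur in at least `min_support` trees; real tree mining algorithms handle larger sub-trees. The three input trees and the names `patterns` and `frequent` are made up for illustration.

```python
from collections import Counter

# Trees as (label, [children]); the three trees are made up for illustration.
t1 = ("book", [("title", []), ("author", [("name", [])])])
t2 = ("book", [("title", []), ("publisher", [("name", [])])])
t3 = ("article", [("title", []), ("author", [("name", [])])])

def patterns(tree):
    """Return the size-1 sub-trees (labels) and size-2 sub-trees (parent, child) of a tree."""
    label, children = tree
    nodes, edges = {label}, set()
    for child in children:
        edges.add((label, child[0]))
        child_nodes, child_edges = patterns(child)
        nodes |= child_nodes
        edges |= child_edges
    return nodes, edges

def frequent(trees, min_support):
    """Keep the patterns that occur in at least min_support trees."""
    node_count, edge_count = Counter(), Counter()
    for tree in trees:
        nodes, edges = patterns(tree)
        node_count.update(nodes)
        edge_count.update(edges)

    def keep(counter):
        return {p for p, n in counter.items() if n >= min_support}

    return keep(node_count), keep(edge_count)

freq_nodes, freq_edges = frequent([t1, t2, t3], min_support=2)
print(sorted(freq_nodes))
print(sorted(freq_edges))
```

Frequent labels correspond to element level matches, while frequent parent-child pairs are the smallest case of structure level matches, since they tie a label to its context.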
Hybrid Approach
• Tasks: Matching, Mapping, Integration, Mediation
• Techniques: Schema Trees, Clustering, Tree Mining
Database Research Advances Reports
• https://dsf.berkeley.edu/claremont/claremontreport08.pdf
• https://beckman.cs.wisc.edu/beckman-report2013.pdf
• https://link.springer.com/article/10.1007/s10796-017-9819-2
• https://sigmodrecord.org/publications/sigmodRecord/1912/pdfs/07_Reports_Abadi.pdf (last one, 2018 ...)
• https://www.sciencedirect.com/science/article/pii/S030643790800015X
• https://vldb.org/2021/?papers-research

Lecture 05-SchemaMatching.ppt

  • 1. Schema Matching and Integration IIS 651 (S 2022) 1
  • 2. Outline ๏ƒ˜ Schema and Schema Matching ๏ƒ˜ Schema Heterogeneity & Data Interoperability ๏ƒ˜ Large Scale Scenarios concerning Schema Matching and Integration ๏ƒ˜ Related Work ๏ƒ˜ Our approach to handle Large Scale Scenario ๏ƒ˜ PORSCHE (Performance Oriented Schema Mediation) ๏ƒ˜ Future Research Directions 2
  • 3. Schema origin in Greek, meaning "shapeโ€œ or "plan" From computer science perspective โ€“ โ€ข description of the relationship of data/ information in some structured way or โ€ข a set of rules defining the relationship or โ€ข a model to represent the data For example โ€ข Relational Schema โ€ข XML Schema โ€ข Class Diagram โ€ฆ. 3
  • 4. Relational Database Schema 4 book_id book title author_id author name pub_id publisher name book_id detail author_id pub_id books
  • 5. XML Schema 5 <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="time"> <xs:complexType/> </xs:element> <xs:element name="day"> <xs:complexType/> </xs:element> <xs:element name="courseCode"> <xs:complexType mixed="true"> <xs:sequence> <xs:element ref="time"/> <xs:element ref="day"/> <xs:element ref="Instructor"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="arizonaCourses"> <xs:complexType> <xs:sequence> <xs:element ref="courseCode"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Instructor"> <xs:complexType/> </xs:element> </xs:schema>
  • 6. Web Interface Form Schema (flight search form). Fields: From city or airport*; To city or airport* (hint: "If you are unsure of the spelling of a city or airport, enter the first 3 or more letters followed by an asterisk (*)."); Departure date (Jul 2008, 23, Wednesday) and Departure time (Any Time); Return date (Jul 2008, 24, Thursday) and Return time (Any Time); Traveler types: Adults (12-64 yrs) 1, Children (2-11 yrs) 0, Seniors (65+ yrs) 0, Infants (0-23 months) 0; Cabin type: Coach; Direct or Non-Stop flights only; More search options
  • 7. Schema Matching • Takes two schemas/ontologies as input and produces a mapping between elements of the two schemas that correspond semantically to each other [Halevy05]. Example: Books Source A has columns price, book-title, author-name, with rows (26,60; Harry Potter; J. K. Rowling), (11,50; Marie Des Intrigues; Juliette Benzoni), (16,50; Nous Les Dieux; Bernard Werber), (24; Pompei; Robert Harris). Books Source B has columns listed-price, title, a-fname, a-lname. The figure marks two kinds of correspondence between the sources: 1-1 match and complex match.
  • 8. Applications of Schema Matching • Data Interoperability • Data Integration • Data Warehousing • Catalogue Integration • Web Services Discovery and Composition • Query over the Web • ... • Data Exchange • E-commerce • Agents Communication • ... Static: the contributing schema set is not evolving, so matching and mapping is a one-time process. Dynamic: the contributing schema set is evolving, so matching and mapping also evolve.
  • 10. Schema Heterogeneity & Data Interoperability • A key roadblock for information integration! • Different data sources each speak their own schema. Diagram labels: Consumer; three Data Sources (Hotels, Youth Centers; Lodges, Restaurants; Beaches, Volcanoes); Hotel, Restaurant, AdventureSports, HistoricalSites
  • 12. Schema Integration and Mediation • The schemas of all concerned data sources are merged into one schema without any concept redundancy, i.e. similar concepts are represented by a single concept • All the input data source schemas are mapped to this integrated schema, also called the mediated schema. Diagram labels: Consumer; Mediation; three Data Sources (Hotels, Youth Centers; Lodges, Restaurants; Beaches, Volcanoes); Hotel, Restaurant, AdventureSports, HistoricalSites
  • 13. Mediation: schema mapping is key to any data sharing architecture [Tomasic et al. IEEE TKDE 1998]. Diagram labels: User Query; Mediated Schema; mappings; sub-query (×3); wrapper (×3); Source 1, Source 2, ..., Source n
  • 14. Schema Matching, Mapping, Integration & Mediation. Matching: finding similarities between schemas. Mapping: final correspondences between elements of two schemas. Merging/Integration: based upon schema mappings, merging schemas into one schema. Mediation: mappings from source schemas to the integrated schema for data interoperability. Diagram: source schemas S1 (B, C) and S2 (B1, C1, C2), integrated schema Si (B, C, C1)
  • 15. Different Research Domains - Mediation: Distributed Databases, Data Warehousing, Data Mining, Information Retrieval, Knowledge Extraction, …
  • 17. Large Scale Scenario • Creating a mediated schema from two large schemas (with thousands of nodes) • For example, Open Applications Group Integration Specification (OAGIS) [1] XML schema instances, whose elements number in the thousands • Creating a mediated schema from a large set of schemas (hundreds of schemas and thousands of nodes) • For example, creating a mediated web interface input form (schema) from the hundreds of web interface forms (schemas) related to the travel domain [2]. Large scale schema matching and integration requires an automated approach. [1] http://www.openapplications.org/ [2] http://metaquerier.cs.uiuc.edu
  • 18. Related Work. Tuning approaches: Pre-Match, eTuner [Lee&Doan 07]; Amid-Match, SCIA [Wang et al 07]; Post-Match, COMA++ [Do et al 07, Manakanatas06]. Large Scale Schema Matching and Integration Approaches (taxonomy figure; category labels): Incremental, Holistic, Fragmentation, Clustering, Mining, Data-mining, Element Level, Schema Level, Tree-mining. Systems shown: COMA++ [Do&Rahm07], BellFlower [Smiljanic06], DCM [He et al 04], xClust [Lee et al 02], PORSCHE [Saleem et al 08]
  • 19. An approach to handle Large Scale Scenario ➢ Handle Schemas as Trees ➢ Apply the Clustering Method ➢ Use Tree Mining ➢ Devise Hybrid Approach. Result: an automated approach with good time performance and approximate match quality
  • 20. Schemas as trees – Web Interface Forms. The flight search form (same fields as on slide 6) is modelled as a tree rooted at absTravel with children From (D_City), To (A_City), Departure Date (D_Month, D_Day, D_Time), Return Date (R_Month, R_Day, R_Time), CabinType and TravelerTypes (Adults, Children, Seniors, Infants) [He et al. KDD 2004]
  • 21. Schemas as trees – Relational Database. The relational schema (same tables as on slide 4) is modelled as a tree rooted at books with children book (book_id, title), author (author_id, name), publisher (pub_id, name) and detail (book_id, author_id, pub_id) [Lee et al. CIKM 2006]
  • 22. Schemas as trees – XML Schema: <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="time"> <xs:complexType/> </xs:element> <xs:element name="day"> <xs:complexType/> </xs:element> <xs:element name="courseCode"> <xs:complexType mixed="true"> <xs:sequence> <xs:element ref="time"/> <xs:element ref="day"/> <xs:element ref="Instructor"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="arizonaCourses"> <xs:complexType> <xs:sequence> <xs:element ref="courseCode"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Instructor"> <xs:complexType/> </xs:element> </xs:schema> Tree node labels: arizonaCourses, courseCode, day, time, place, instructor
  • 23. A speculatively rooted tree for rRNA genes
  • 24. Schema Tree Benefit • A tree structure for a data model inherently supports the contextual meaning of the descendant nodes. Diagram: example schema trees S1 (nodes A, B, C, D) and S2 (nodes A1, B1, C1, C11, D)
  • 25. Element Level Clustering • Clustering helps optimize (reduce) the target search space • Schema elements are clustered based on label similarity. Diagram: schemas S1, S2, S3, ..., Si with nodes such as A, B, C, D, A1, B1, C1 ... C5. Node label similarity: C ≈ C1 ≈ C2 ≈ C3 ≈ C4 ≈ C5. Typical matching scenario: t1 t2 t3 t4 … tn; s1 s2 s3 s4 … sm; a1 a2 a3 a4 … aq
  • 26. Tree Mining Aspect • Tree mining finds frequent sub-trees in a given set of trees • This is similar to schema matching, which finds similar concepts among a set of schemas • Data structures that support tree mining algorithms can therefore be used for schema matching • Helps in handling the Large Scale Scenario • Supports the context of nodes. Diagram labels: computers, Desktop, notebook; Software, Desktop, notepad
  • 27. Tree mining example • Element Level Matching (sub-tree size 1) • Structure Level Matching (sub-tree size > 1). Diagram: example trees with single-letter node labels (a, b, d, f, h, i, n, p, r, t) …
  • 29. Database Research Advances Reports • https://dsf.berkeley.edu/claremont/claremontreport08.pdf • https://beckman.cs.wisc.edu/beckman-report2013.pdf • https://link.springer.com/article/10.1007/s10796-017-9819-2 • https://sigmodrecord.org/publications/sigmodRecord/1912/pdfs/07_Reports_Abadi.pdf (last one, 2018) … • https://www.sciencedirect.com/science/article/pii/S030643790800015X • https://vldb.org/2021/?papers-research
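
The element-level clustering idea from slide 25 (grouping schema elements by label similarity) can be illustrated with a minimal Python sketch, applied here to the column labels of the two book sources from slide 7. The similarity measure (difflib.SequenceMatcher from the standard library), the 0.5 threshold, the greedy grouping strategy and the names label_similarity, cluster_labels and SOURCES are all assumptions made for illustration; the slides do not say how label similarity is computed, and this is not the PORSCHE implementation.

```python
from difflib import SequenceMatcher

# Column labels of the two sources shown on slide 7.
SOURCES = {
    "A": ["price", "book-title", "author-name"],
    "B": ["listed-price", "title", "a-fname", "a-lname"],
}


def label_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real linguistic matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_labels(schemas, threshold=0.5):
    """Greedily group (schema, label) pairs by similarity to each cluster's first label."""
    clusters = []
    for schema_name, labels in schemas.items():
        for label in labels:
            for cluster in clusters:
                representative = cluster[0][1]
                if label_similarity(label, representative) >= threshold:
                    cluster.append((schema_name, label))
                    break
            else:
                # No existing cluster is similar enough: start a new one.
                clusters.append([(schema_name, label)])
    return clusters


for cluster in cluster_labels(SOURCES):
    print([f"{schema}.{label}" for schema, label in cluster])
```

Under these assumptions each cluster is a candidate match: a cluster with one label per source suggests a 1-1 match, and a cluster with several labels from one source suggests a complex match. A later matching step would only need to compare elements inside the same cluster, which is how clustering reduces the search space.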