Distributed DBMS © M. T. Özsu & P. Valduriez Ch.4
Outline
• Introduction
• Background
• Distributed Database Design
• Database Integration
➡ Schema Matching
➡ Schema Mapping
• Semantic Data Control
• Distributed Query Processing
• Multimedia Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
Problem Definition
• Given existing databases with their Local Conceptual Schemas (LCSs), how to integrate the LCSs into a Global Conceptual Schema (GCS)
➡ GCS is also called mediated schema
• Bottom-up design process
Integration Alternatives
• Physical integration
➡ Source databases are integrated and the integrated database is materialized
➡ Data warehouses
• Logical integration
➡ Global conceptual schema is virtual and not materialized
➡ Enterprise Information Integration (EII)
Data Warehouse Approach
Bottom-up Design
• GCS (also called mediated schema) is defined first
➡ Map LCSs to this schema
➡ As in data warehouses
• GCS is defined as an integration of parts of LCSs
➡ Generate GCS and map LCSs to this GCS
GCS/LCS Relationship
• Local-as-view
➡ The GCS definition is assumed to exist, and each LCS is treated as a view definition over it
• Global-as-view
➡ The GCS is defined as a set of views over the LCSs
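As a rough illustration of the global-as-view idea, the sketch below defines one global relation as a view over two local sources. The source names (`db1_emp`, `db2_worker`), their attributes, and the rows are hypothetical and chosen only for illustration; they are not part of the slides.

```python
# Hypothetical local sources (LCSs); names, attributes and rows are illustrative only.
db1_emp = [{"ENO": "E1", "ENAME": "J. Doe"}, {"ENO": "E2", "ENAME": "M. Smith"}]
db2_worker = [{"NUMBER": "E3", "NAME": "A. Lee"}]


def global_employee():
    """GAV: the global relation is defined as a view (here, a union) over the sources."""
    rows = [{"id": r["ENO"], "name": r["ENAME"]} for r in db1_emp]
    rows += [{"id": r["NUMBER"], "name": r["NAME"]} for r in db2_worker]
    return rows


def query_names():
    """A query posed on the global schema is answered by unfolding the view."""
    return sorted(r["name"] for r in global_employee())
```

Under local-as-view the direction is reversed: each source would be described as a view over an already existing global relation, so answering a query requires rewriting it in terms of those views rather than simply unfolding it.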
Database Integration Process
Recall Access Architecture
Database Integration Issues
• Schema translation
➡ Component database schemas translated to a common intermediate canonical representation
• Schema generation
➡ Intermediate schemas are used to create a global conceptual schema
Schema Translation
• What is the canonical data model?
➡ Relational
➡ Entity-relationship
✦ DIKE
➡ Object-oriented
✦ ARTEMIS
➡ Graph-oriented
✦ DIPE, TranScm, COMA, Cupid
✦ Preferable with emergence of XML
✦ No common graph formalism
• Mapping algorithms
➡ These are well-known
Schema Generation
• Schema matching
➡ Finding the correspondences between multiple schemas
• Schema integration
➡ Creation of the GCS (or mediated schema) using the correspondences
• Schema mapping
➡ How to map data from local databases to the GCS
• Important: sometimes the GCS is defined first, and schema matching and schema mapping are done against this target GCS
Running Example
Relational
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET, LOC, CNAME)
ASG(ENO, PNO, RESP, DUR)
PAY(TITLE, SAL)
E-R Model (figure not reproduced)
Schema Matching
• Schema heterogeneity
➡ Structural heterogeneity
✦ Type conflicts
✦ Dependency conflicts
✦ Key conflicts
✦ Behavioral conflicts
➡ Semantic heterogeneity
✦ More important and harder to deal with
✦ Synonyms, homonyms, hypernyms
✦ Different ontology
✦ Imprecise wording
Schema Matching (cont’d)
• Other complications
➡ Insufficient schema and instance information
➡ Unavailability of schema documentation
➡ Subjectivity of matching
• Issues that affect schema matching
➡ Schema versus instance matching
➡ Element versus structure level matching
➡ Matching cardinality
Schema Matching Approaches
Linguistic Schema Matching
• Use element names and other textual information (textual descriptions, annotations)
• May use external sources (e.g., thesauri)
• 〈SC1.element-1 ≈ SC2.element-2, p, s〉
➡ Element-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p holds with a similarity value of s
• Schema level
➡ Deal with names of schema elements
➡ Handle cases such as synonyms, homonyms, hypernyms, data type similarities
• Instance level
➡ Focus on information retrieval techniques (e.g., word frequencies, key terms)
➡ “Deduce” similarities from these
Linguistic Matchers
• Use a set of linguistic (terminological) rules
• Basic rules can be hand-crafted or may be discovered from outside sources (e.g., WordNet)
• Predicate p and similarity value s
➡ hand-crafted ⇒ specified,
➡ discovered ⇒ may be computed or specified by an expert after discovery
• Examples
➡ 〈uppercase names ≈ lower case names, true, 1.0〉
➡ 〈uppercase names ≈ capitalized names, true, 1.0〉
➡ 〈capitalized names ≈ lower case names, true, 1.0〉
➡ 〈DB1.ASG ≈ DB2.WORKS_IN, true, 0.8〉
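The rule examples above can be sketched as a small rule-based matcher. This is only an illustration under stated assumptions: the `Correspondence` type, the `SYNONYMS` table, and the `match_names` function are hypothetical names, and the case rules are collapsed into a single case-insensitive comparison.

```python
from typing import NamedTuple, Optional


class Correspondence(NamedTuple):
    """One rule result: left element, right element, predicate p, similarity s."""
    left: str
    right: str
    predicate: bool
    similarity: float


# Hand-crafted synonym rule taken from the example; the table itself is an assumption.
SYNONYMS = {frozenset({"ASG", "WORKS_IN"}): 0.8}


def match_names(a: str, b: str) -> Optional[Correspondence]:
    # Case rules: names that differ only in letter case are fully similar.
    if a.lower() == b.lower():
        return Correspondence(a, b, True, 1.0)
    # Hand-crafted rule: look the pair up in the synonym table.
    s = SYNONYMS.get(frozenset({a.upper(), b.upper()}))
    if s is not None:
        return Correspondence(a, b, True, s)
    return None  # no rule applies
```

For example, `match_names("ASG", "WORKS_IN")` yields a correspondence with similarity 0.8, while a pair that no rule covers yields `None`.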
Automatic Discovery of Name Similarities
• Affixes
➡ Common prefixes and suffixes between two element name strings
• N-grams
➡ Comparing how many substrings of length n are common between the two name strings
• Edit distance
➡ Number of character modifications (insertions, deletions, substitutions) that need to be performed to convert one string into the other
• Soundex code
➡ Phonetic similarity between names based on their soundex codes
• Also look at data types
➡ Data type similarity may suggest a stronger relationship than the similarity computed by these methods, or may help differentiate between multiple strings with the same value
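A minimal sketch of the affix idea follows. The slides do not say how a common affix is turned into a score, so normalizing the longest shared prefix or suffix by the longer name is an assumption made here, and the function names are hypothetical.

```python
import os


def common_prefix(a, b):
    """Longest common leading substring of two names."""
    return os.path.commonprefix([a, b])


def common_suffix(a, b):
    """Longest common trailing substring, found by reversing both names."""
    return os.path.commonprefix([a[::-1], b[::-1]])[::-1]


def affix_similarity(a, b):
    # Assumption: score = longest shared affix / length of the longer name.
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    longest = max(len(common_prefix(a, b)), len(common_suffix(a, b)))
    return longest / max(len(a), len(b))
```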
N-gram Example
• 3-grams of string “Responsibility” are the following:
➡ Res, esp, spo, pon, ons, nsi
➡ sib, ibi, bil, ili, lit, ity
• 3-grams of string “Resp” are
➡ Res
➡ esp
• 3-gram similarity: 2/12 = 0.17
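The 3-gram computation above can be reproduced with a short sketch. Two assumptions are made: names are compared case-insensitively, and repeated n-grams within a name are counted once (a set is used). The score divides the number of shared n-grams by the size of the larger n-gram set, which matches the 2/12 in the example.

```python
def ngrams(s, n=3):
    """Set of all substrings of length n (case-insensitive; duplicates collapse)."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Shared n-grams divided by the size of the larger n-gram set."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    larger = max(len(ga), len(gb))
    return len(ga & gb) / larger if larger else 0.0
```

With these definitions, `ngrams("Responsibility")` has 12 elements, `ngrams("Resp")` has 2, and both of the latter occur in the former.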
Edit Distance Example
• Again consider “Responsibility” and “Resp”
• To convert “Responsibility” to “Resp”
➡ Delete characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
• To convert “Resp” to “Responsibility”
➡ Add characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
• The number of edit operations required is 10
• Similarity is 1 − (10/14) = 0.29
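The edit distance example can be checked with a standard dynamic-programming sketch. It uses the usual Levenshtein operations (insertion, deletion, substitution), which is an assumption about the exact variant intended, and it normalizes by the length of the longer string as in the 10/14 above.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row to keep memory small."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1 - edit_distance(a, b) / longest if longest else 1.0
```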
Constraint-based Matchers
• Data always have constraints: use them
➡ Data type information
➡ Value ranges
➡ …
• Examples
➡ RESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity = 0.29 (low)
➡ If they come from the same domain, this may increase their similarity value
➡ ENO in relational, WORKER.NUMBER and PROJECT.NUMBER in E-R
➡ ENO and WORKER.NUMBER may have type INTEGER while PROJECT.NUMBER may have STRING
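One way to picture how a type constraint could adjust a name-based score is sketched below. The slides give no formula, so the additive `boost` and `penalty` values, the default `name_sim`, and the function names are all hypothetical.

```python
def constrained_similarity(name_sim, type_a, type_b, boost=0.2, penalty=0.2):
    """Raise a name-based score when data types agree, lower it when they differ.

    boost and penalty are illustrative values, not taken from the slides.
    """
    if type_a == type_b:
        return min(1.0, name_sim + boost)
    return max(0.0, name_sim - penalty)


def best_match(source_type, candidates, name_sim=0.5):
    """Pick the candidate whose type-adjusted score is highest.

    candidates maps element name -> data type; every candidate starts from the
    same (assumed) name similarity, so only the type constraint separates them.
    """
    scored = {name: constrained_similarity(name_sim, source_type, t)
              for name, t in candidates.items()}
    return max(scored, key=scored.get)
```

For ENO of type INTEGER, `best_match("INTEGER", {"WORKER.NUMBER": "INTEGER", "PROJECT.NUMBER": "STRING"})` prefers WORKER.NUMBER, which is the distinction the last two examples describe.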
Constraint-based Structural Matching
• If two schema elements are structurally similar, then there is a higher likelihood that they represent the same concept
• Structural similarity:
➡ Same properties (attributes)
➡ “Neighborhood” similarity
✦ Using graph representation
✦ The set of nodes that can be reached within a particular path length from a node are the neighbors of that node
✦ If two concepts (nodes) have similar sets of neighbors, they are likely to represent the same concept
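The neighborhood idea can be sketched with a breadth-first search bounded by path length. Comparing neighbor sets with the Jaccard coefficient, and comparing neighbors by exact label, are assumptions made for this illustration; a real matcher would compare neighbors using correspondences already found.

```python
from collections import deque


def neighbors(graph, start, max_len):
    """Nodes reachable from start within max_len edges (start itself excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_len:
            continue  # do not expand beyond the allowed path length
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen


def neighborhood_similarity(g1, n1, g2, n2, max_len=1):
    """Jaccard overlap of the two neighbor sets (an assumed similarity measure)."""
    a, b = neighbors(g1, n1, max_len), neighbors(g2, n2, max_len)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Each graph is an adjacency mapping from a node label to the labels it points to, for example `{"EMP": ["ENO", "ENAME", "TITLE"]}`.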
Learning-based Schema Matching
• Use machine learning techniques to determine schema matches
• Classification problem: classify concepts from various schemas into classes according to their similarity. Those that fall into the same class represent similar concepts
• Similarity is defined according to features of data instances
• Classification is “learned” from a training set
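To make the classification view concrete, here is a toy sketch that learns one centroid per class from labelled columns and assigns a new column to the nearest centroid. The choice of a nearest-centroid classifier, the two instance features, and all names and data are assumptions for illustration; the slides do not prescribe a particular learner.

```python
def features(values):
    """Two simple instance features of a non-empty column:
    mean value length and fraction of digit characters."""
    text = "".join(values)
    mean_len = sum(len(v) for v in values) / len(values)
    digit_frac = sum(c.isdigit() for c in text) / len(text) if text else 0.0
    return (mean_len, digit_frac)


def train(labelled_columns):
    """labelled_columns: {class_label: [column_values, ...]} -> centroid per class."""
    centroids = {}
    for label, columns in labelled_columns.items():
        feats = [features(col) for col in columns]
        centroids[label] = tuple(sum(f[i] for f in feats) / len(feats)
                                 for i in range(2))
    return centroids


def classify(centroids, values):
    """Assign a column to the class whose centroid is closest (squared distance)."""
    f = features(values)
    return min(centroids,
               key=lambda lab: sum((x - y) ** 2 for x, y in zip(f, centroids[lab])))
```

Columns from different schemas that land in the same class would then be proposed as matches.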
Learning-based Schema Matching
Combined Schema Matching Approaches
• Use multiple matchers
➡ Each matcher focuses on one area (name, etc.)
• Meta-matcher integrates these into one prediction
• Integration may be simple (take average of similarity values) or more complex (see Fagin’s work)
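The simple integration strategy (averaging similarity values) can be sketched as follows. The two toy matchers and the optional weights are hypothetical; any function that takes two element names and returns a score between 0 and 1 could be plugged in.

```python
def name_matcher(a, b):
    """Toy matcher: 1.0 if the names agree ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def length_matcher(a, b):
    """Toy matcher: ratio of the shorter name length to the longer one."""
    return min(len(a), len(b)) / max(len(a), len(b), 1)


def meta_match(a, b, matchers, weights=None):
    """Combine individual matcher scores into one prediction by (weighted) average."""
    scores = [m(a, b) for m in matchers]
    if weights is None:
        weights = [1.0] * len(scores)  # plain average
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

Passing `weights` lets one matcher count more than another, which is a small step toward the more complex aggregation schemes mentioned above.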
Schema Integration
• Use the correspondences to create a GCS
• Mainly a manual process, although rules can help
Binary Integration Methods
N-ary Integration Methods
Schema Mapping
• Mapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target.
• Data warehouses ⇒ actual translation
• Data integration systems ⇒ discover mappings that can be used in the query processing phase
• Mapping creation
• Mapping maintenance
Mapping Creation
Given
➡ A source LCS, with source relations S
➡ A target GCS, with target relations Tk
➡ A set of value correspondences Vk for each Tk, discovered during the schema matching phase
Produce a set of queries that, when executed, will create GCS data instances from the source data.
For each Tk, we are looking for a query Qk, defined on a (possibly proper) subset of the relations in S, that, when executed, will generate data for Tk from the source relations.
Mapping Creation Algorithm
General idea:
• Consider each Tk in turn. Divide Vk into subsets such that each subset specifies one possible way that values of Tk can be computed.
• Each subset can be mapped to a query that, when executed, would generate some of Tk’s data.
• The union of these queries gives Qk.
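The general idea can be sketched with queries modeled as plain functions over the source relations. Everything concrete here is hypothetical: the source relations `S1` and `S2`, their attributes and rows, and the two partial queries `q1` and `q2`, each standing for one way of computing the target relation's values.

```python
# Hypothetical source relations; names, attributes and rows are illustrative only.
S = {
    "S1": [{"ENO": "E1", "ENAME": "J. Doe"}],
    "S2": [{"NUMBER": "E2", "NAME": "M. Smith"},
           {"NUMBER": "E1", "NAME": "J. Doe"}],
}


def q1(src):
    """One way to compute target tuples: take them from S1."""
    return {(r["ENO"], r["ENAME"]) for r in src["S1"]}


def q2(src):
    """Another way to compute target tuples: take them from S2."""
    return {(r["NUMBER"], r["NAME"]) for r in src["S2"]}


def Q(src, partial_queries):
    """The mapping query for one target relation: the union of its partial queries."""
    result = set()
    for q in partial_queries:
        result |= q(src)
    return result
```

Because the result is a set, a tuple produced by more than one partial query appears only once in the target relation.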

Developing An App To Navigate The Roads of Brazil
Ā 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
Ā 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
Ā 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
Ā 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
Ā 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
Ā 
šŸ¬ The future of MySQL is Postgres šŸ˜
šŸ¬  The future of MySQL is Postgres   šŸ˜šŸ¬  The future of MySQL is Postgres   šŸ˜
šŸ¬ The future of MySQL is Postgres šŸ˜
Ā 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Ā 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
Ā 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
Ā 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
Ā 
WhatsApp 9892124323 āœ“Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 āœ“Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 āœ“Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 āœ“Call Girls In Kalyan ( Mumbai ) secure service
Ā 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Ā 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
Ā 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
Ā 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
Ā 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Ā 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Ā 

Database, 4 Data Integration

  • 1. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.4/1
    Outline
    • Introduction
    • Background
    • Distributed Database Design
    • Database Integration
      ➡ Schema Matching
      ➡ Schema Mapping
    • Semantic Data Control
    • Distributed Query Processing
    • Multimedia Query Processing
    • Distributed Transaction Management
    • Data Replication
    • Parallel Database Systems
    • Distributed Object DBMS
    • Peer-to-Peer Data Management
    • Web Data Management
    • Current Issues
  • 2. Problem Definition
    • Given existing databases with their Local Conceptual Schemas (LCSs), how to integrate the LCSs into a Global Conceptual Schema (GCS)
      ➡ GCS is also called mediated schema
    • Bottom-up design process
  • 3. Integration Alternatives
    • Physical integration
      ➡ Source databases integrated and the integrated database is materialized
      ➡ Data warehouses
    • Logical integration
      ➡ Global conceptual schema is virtual and not materialized
      ➡ Enterprise Information Integration (EII)
  • 4. Data Warehouse Approach
  • 5. Bottom-up Design
    • GCS (also called mediated schema) is defined first
      ➡ Map LCSs to this schema
      ➡ As in data warehouses
    • GCS is defined as an integration of parts of LCSs
      ➡ Generate GCS and map LCSs to this GCS
  • 6. GCS/LCS Relationship
    • Local-as-view
      ➡ The GCS definition is assumed to exist, and each LCS is treated as a view definition over it
    • Global-as-view
      ➡ The GCS is defined as a set of views over the LCSs
  • 7. Database Integration Process
  • 8. Recall Access Architecture
  • 9. Database Integration Issues
    • Schema translation
      ➡ Component database schemas translated to a common intermediate canonical representation
    • Schema generation
      ➡ Intermediate schemas are used to create a global conceptual schema
  • 10. Schema Translation
    • What is the canonical data model?
      ➡ Relational
      ➡ Entity-relationship
        ✦ DIKE
      ➡ Object-oriented
        ✦ ARTEMIS
      ➡ Graph-oriented
        ✦ DIPE, TranScm, COMA, Cupid
        ✦ Preferable with emergence of XML
        ✦ No common graph formalism
    • Mapping algorithms
      ➡ These are well-known
  • 11. Schema Generation
    • Schema matching
      ➡ Finding the correspondences between multiple schemas
    • Schema integration
      ➡ Creation of the GCS (or mediated schema) using the correspondences
    • Schema mapping
      ➡ How to map data from local databases to the GCS
    • Important: sometimes the GCS is defined first, and schema matching and schema mapping are done against this target GCS
  • 12. Running Example
    Relational
      EMP(ENO, ENAME, TITLE)
      PROJ(PNO, PNAME, BUDGET, LOC, CNAME)
      ASG(ENO, PNO, RESP, DUR)
      PAY(TITLE, SAL)
    E-R Model (figure)
  • 13. Schema Matching
    • Schema heterogeneity
      ➡ Structural heterogeneity
        ✦ Type conflicts
        ✦ Dependency conflicts
        ✦ Key conflicts
        ✦ Behavioral conflicts
      ➡ Semantic heterogeneity
        ✦ More important and harder to deal with
        ✦ Synonyms, homonyms, hypernyms
        ✦ Different ontology
        ✦ Imprecise wording
  • 14. Schema Matching (cont’d)
    • Other complications
      ➡ Insufficient schema and instance information
      ➡ Unavailability of schema documentation
      ➡ Subjectivity of matching
    • Issues that affect schema matching
      ➡ Schema versus instance matching
      ➡ Element versus structure level matching
      ➡ Matching cardinality
  • 15. Schema Matching Approaches
  • 16. Linguistic Schema Matching
    • Use element names and other textual information (textual descriptions, annotations)
    • May use external sources (e.g., thesauri)
    • ⟨SC1.element-1 ≈ SC2.element-2, p, s⟩
      ➡ Element-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p holds with a similarity value of s
    • Schema level
      ➡ Deal with names of schema elements
      ➡ Handle cases such as synonyms, homonyms, hypernyms, data type similarities
    • Instance level
      ➡ Focus on information retrieval techniques (e.g., word frequencies, key terms)
      ➡ “Deduce” similarities from these
  • 17. Linguistic Matchers
    • Use a set of linguistic (terminological) rules
    • Basic rules can be hand-crafted or may be discovered from outside sources (e.g., WordNet)
    • Predicate p and similarity value s
      ➡ hand-crafted ⇒ specified
      ➡ discovered ⇒ may be computed or specified by an expert after discovery
    • Examples
      ➡ ⟨uppercase names ≈ lower case names, true, 1.0⟩
      ➡ ⟨uppercase names ≈ capitalized names, true, 1.0⟩
      ➡ ⟨capitalized names ≈ lower case names, true, 1.0⟩
      ➡ ⟨DB1.ASG ≈ DB2.WORKS_IN, true, 0.8⟩
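The rule tuples on this slide can be sketched as a small lookup. This is only an illustration under stated assumptions: the names `SYNONYM_RULES` and `linguistic_similarity` are hypothetical, the three case rules are collapsed into one case-insensitive comparison, and the only synonym rule is the ASG/WORKS_IN one from the slide.

```python
# Hypothetical rule table: unordered name pair -> similarity value s.
# Only the ASG / WORKS_IN rule (s = 0.8) comes from the slide.
SYNONYM_RULES = {
    frozenset({"ASG", "WORKS_IN"}): 0.8,
}


def linguistic_similarity(name_a: str, name_b: str) -> float:
    """Return a similarity value for two element names using simple rules."""
    # Case rules: uppercase, lowercase and capitalized forms of the same
    # name are treated as identical (similarity 1.0).
    if name_a.lower() == name_b.lower():
        return 1.0
    # Synonym rules: look the pair up regardless of order and case.
    key = frozenset({name_a.upper(), name_b.upper()})
    return SYNONYM_RULES.get(key, 0.0)


print(linguistic_similarity("ENAME", "Ename"))   # case rule applies
print(linguistic_similarity("ASG", "works_in"))  # synonym rule applies
```

In a real matcher the rule table would likely be filled from a thesaurus or reviewed by an expert, as the slide notes.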
  • 18. Automatic Discovery of Name Similarities
    • Affixes
      ➡ Common prefixes and suffixes between two element name strings
    • N-grams
      ➡ Comparing how many substrings of length n are common between the two name strings
    • Edit distance
      ➡ Number of character modifications (insertions, deletions, substitutions) that need to be performed to convert one string into the other
    • Soundex code
      ➡ Phonetic similarity between names based on their Soundex codes
    • Also look at data types
      ➡ Data type similarity may suggest a stronger relationship than the similarity computed by these methods, or may help differentiate between multiple strings with the same value
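As an illustration of the Soundex bullet, here is a minimal sketch of the common American Soundex variant. The slides do not say which variant is meant, so that choice is an assumption, and the sample names are standard textbook illustrations rather than names from the running example.

```python
# Digit groups of American Soundex; vowels, h, w and y carry no digit.
GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        # Emit a digit only when it differs from the previous letter's digit.
        if code and code != prev:
            result += code
        # h and w do not separate equal digits; vowels do.
        if c not in "hw":
            prev = code
    # Pad with zeros and truncate to letter + three digits.
    return (result + "000")[:4]


# Two names are phonetically similar under this measure if their codes match.
print(soundex("Robert") == soundex("Rupert"))
```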
  • 19. N-gram Example
    • 3-grams of string “Responsibility” are the following:
      Res, esp, spo, pon, ons, nsi, sib, ibi, bil, ili, lit, ity
    • 3-grams of string “Resp” are
      ➡ Res
      ➡ esp
    • 3-gram similarity: 2/12 = 0.17
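The calculation on this slide can be reproduced with a short sketch. The function names are hypothetical, and the normalization (shared n-grams divided by the size of the larger n-gram set) is an assumption chosen because it matches the slide's 2/12; other normalizations are also in use.

```python
def ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of length-n substrings of text (case-insensitive)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Shared n-grams divided by the size of the larger n-gram set."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / max(len(grams_a), len(grams_b))


# "Responsibility" has 12 distinct 3-grams, "Resp" has 2, and both are shared.
print(round(ngram_similarity("Responsibility", "Resp"), 2))
```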
  • 20. Edit Distance Example
    • Again consider “Responsibility” and “Resp”
    • To convert “Responsibility” to “Resp”
      ➡ Delete characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
    • To convert “Resp” to “Responsibility”
      ➡ Add characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
    • The number of edit operations required is 10
    • Similarity is 1 − (10/14) = 0.29
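A minimal sketch of this computation, assuming the standard Levenshtein distance (insertions, deletions and substitutions, each at cost 1) and normalizing by the length of the longer string as the slide does. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        cur = [i]
        for j, char_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                       # delete char_a
                cur[j - 1] + 1,                    # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 minus the edit distance divided by the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest


print(edit_distance("Responsibility", "Resp"))
print(round(edit_similarity("Responsibility", "Resp"), 2))
```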
  • 21. Constraint-based Matchers
    • Data always have constraints – use them
      ➡ Data type information
      ➡ Value ranges
      ➡ …
    • Examples
      ➡ RESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity = 0.29 (low)
      ➡ If they come from the same domain, this may increase their similarity value
      ➡ ENO in relational, WORKER.NUMBER and PROJECT.NUMBER in E-R
      ➡ ENO and WORKER.NUMBER may have type INTEGER while PROJECT.NUMBER may have STRING
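One way to picture how a constraint can raise or lower a name-based score is the sketch below. The adjustment rule, the function name `adjust_for_types` and the 0.2 boost and penalty are all assumptions made for illustration; the slides only say that a shared domain "may increase" the similarity value.

```python
def adjust_for_types(name_similarity: float, type_a: str, type_b: str,
                     boost: float = 0.2, penalty: float = 0.2) -> float:
    """Raise the score when data types agree, lower it when they differ.

    The boost and penalty values are arbitrary illustration values.
    The result is clamped to the range [0.0, 1.0].
    """
    if type_a.upper() == type_b.upper():
        return min(1.0, name_similarity + boost)
    return max(0.0, name_similarity - penalty)


# RESP vs RESPONSIBILITY: low n-gram score, but assumed to share a type.
print(adjust_for_types(0.17, "STRING", "STRING"))
# ENO vs PROJECT.NUMBER: INTEGER vs STRING pushes the score down.
print(adjust_for_types(0.17, "INTEGER", "STRING"))
```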
  • 22. Constraint-based Structural Matching
    • If two schema elements are structurally similar, then there is a higher likelihood that they represent the same concept
    • Structural similarity:
      ➡ Same properties (attributes)
      ➡ “Neighborhood” similarity
        ✦ Using graph representation
        ✦ The set of nodes that can be reached within a particular path length from a node are the neighbors of that node
        ✦ If two concepts (nodes) have similar sets of neighbors, they are likely to represent the same concept
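The neighborhood idea can be sketched with a breadth-first search and a set comparison. Several things here are assumptions: schemas are given as adjacency dictionaries, neighbor labels have already been normalized to common names by an earlier matching step, and Jaccard overlap is used as the comparison. The example graphs are hypothetical.

```python
from collections import deque


def neighbors(graph: dict, start: str, max_len: int) -> set:
    """Nodes reachable from start within max_len edges (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_len:
            continue  # do not expand beyond the path-length limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen


def neighborhood_similarity(g1, n1, g2, n2, max_len=1) -> float:
    """Jaccard overlap of the two neighbor sets (an assumed measure)."""
    a, b = neighbors(g1, n1, max_len), neighbors(g2, n2, max_len)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical graphs: a relation or entity points to its attributes.
relational = {"EMP": ["ENO", "ENAME", "TITLE"]}
er_model = {"WORKER": ["ENO", "ENAME", "TITLE", "LOCATION"]}
print(neighborhood_similarity(relational, "EMP", er_model, "WORKER"))
```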
  • 23. Learning-based Schema Matching
    • Use machine learning techniques to determine schema matches
    • Classification problem: classify concepts from various schemas into classes according to their similarity. Those that fall into the same class represent similar concepts
    • Similarity is defined according to features of data instances
    • Classification is “learned” from a training set
  • 24. Learning-based Schema Matching
  • 25. Combined Schema Matching Approaches
    • Use multiple matchers
      ➡ Each matcher focuses on one area (name, etc.)
    • Meta-matcher integrates these into one prediction
    • Integration may be simple (take average of similarity values) or more complex (see Fagin’s work)
  • 26. Schema Integration
    • Use the correspondences to create a GCS
    • Mainly a manual process, although rules can help
  • 27. Binary Integration Methods
  • 28. N-ary Integration Methods
  • 29. Schema Mapping
    • Mapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target.
    • Data warehouses ⇒ actual translation
    • Data integration systems ⇒ discover mappings that can be used in the query processing phase
    • Mapping creation
    • Mapping maintenance
  • 30. Mapping Creation
    Given
      ➡ A source LCS
      ➡ A target GCS
      ➡ A set of value correspondences discovered during schema matching phase
    Produce a set of queries that, when executed, will create GCS data instances from the source data.
    We are looking, for each Tk, for a query Qk that is defined on a (possibly proper) subset of the relations in S such that, when executed, it will generate data for Tk from the source relations
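To make the idea of a query Qk concrete, the sketch below builds one target relation from two source relations of the running example using Python's built-in `sqlite3` module. The target relation T(ENO, ENAME, SAL), the sample rows and the function name are hypothetical, and the query is written by hand rather than derived from value correspondences.

```python
import sqlite3


def generate_target_rows() -> list:
    """Run a hand-written mapping query Qk over sample source relations."""
    conn = sqlite3.connect(":memory:")
    # Source relations EMP and PAY follow the running example; rows are made up.
    conn.executescript("""
        CREATE TABLE EMP(ENO TEXT, ENAME TEXT, TITLE TEXT);
        CREATE TABLE PAY(TITLE TEXT, SAL INTEGER);
        INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Engineer'),
                               ('E2', 'M. Smith', 'Analyst');
        INSERT INTO PAY VALUES ('Engineer', 40000), ('Analyst', 34000);
    """)
    # Qk for a hypothetical target T(ENO, ENAME, SAL): it uses only the
    # subset {EMP, PAY} of the source relations and joins them on TITLE.
    q_k = """
        SELECT EMP.ENO, EMP.ENAME, PAY.SAL
        FROM EMP JOIN PAY ON EMP.TITLE = PAY.TITLE
        ORDER BY EMP.ENO
    """
    rows = conn.execute(q_k).fetchall()
    conn.close()
    return rows


for row in generate_target_rows():
    print(row)
```

In a data warehouse the result of such a query would be materialized, while a data integration system would keep the query as a mapping to use during query processing.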
  • 31. Mapping Creation Algorithm
    General idea:
    • Consider each Tk in turn. Divide Vk into subsets such that each specifies one possible way that values of Tk can be computed.
    • Each can be mapped to a query that, when executed, would generate some of Tk’s data.
    • Union of these queries gives