Mustafa Jarrar
Lecture Notes, Web Data Management (MCOM7348)
University of Birzeit, Palestine
1st Semester, 2013

Data Schema Integration

Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
Jarrar © 2013

Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2013/11/web-data-management.html

Data Schema Integration: A simple example
In ORM:
[Figure: ORM diagrams of three source schemas and the integrated schema.
Schema 1: Employee, Organization; role /WorksIn.
Schema 2: Worker, City, Region; roles bornIn/, locatedIn/.
Schema 3: Organization, Municipality; role locatedIn/.
Integrated schema: Employee, City, Region, Organization; roles bornIn/, locatedIn/, /WorksIn.]

Data Schema Integration: A simple example
Source: Carlo Batini

In ER:
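[Figure: ER diagrams of the same three source schemas and the integrated schema.
Schema 1: Employee, Organization; relationship works.
Schema 2: Employee, City, Region; relationships born, in.
Schema 3: Municipality, Organization; relationship in.
Integrated schema: Employee, City, Organization, Region; relationships born, works, in.]
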
Challenges of Data Schema Integration
Source: Carlo Batini

Schema integration has two major challenges:
1.  Identifying all portions of the schemas that pertain to the
same concept, so that these different representations can be
unified in the global schema.
2.  Identifying, analyzing and resolving the different types of
conflicts (heterogeneities) among the schemas.

Framework for Schema Integration
Local Schemas
→ Schema Transformation (using Transformation Rules)
→ Schema Matching (using Matching Rules)
→ Schema Integration (using Integration Rules)
→ Integrated Schema and mappings

Source: Advances in Object-Oriented Data Modeling, M. P. Papazoglou, S. Spaccapietra, Z. Tari (Eds.), The MIT Press, 2000

Framework for Schema Integration
0. Define the integration strategy
If the number of local schemas to be integrated is large, the order of
schema integration becomes important. Several strategies can be
adopted.
Input: n source schemas
Output: n source schemas + integration strategies
Method used: heuristics
[Figure: integration trees for the three strategies. Source schemas S1, S2, S3, S4 are combined, through intermediate schemas IS1 and IS2 where applicable, into the integrated schema IS.]

One shot strategy
- Efficient integration process.
- Many correspondences between concepts have to be considered together.

Pair at a time strategy
- Priority to most relevant and stable schemas.
- The integration process is more efficient.

Balanced strategy
- e.g.: Production, Marketing, Sales.
- To be preferred when the cohesion among schemas is high.

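The strategies differ mainly in the order in which schemas are combined. A minimal sketch in Python, assuming a hypothetical `integrate` function that simply unions sets of concept names (real integration also has to resolve conflicts) and a non-empty list of source schemas:

```python
from functools import reduce

def integrate(*schemas):
    """Hypothetical stand-in for integration: union of concept names."""
    return frozenset().union(*schemas)

def one_shot(schemas):
    # All n schemas are integrated in a single step.
    return integrate(*schemas)

def pair_at_a_time(schemas):
    # Fold the next schema into the intermediate result (IS1, IS2, ...).
    # The caller lists the most relevant and stable schemas first.
    return reduce(integrate, schemas)

def balanced(schemas):
    # Integrate neighbouring pairs, then pairs of intermediate results.
    level = list(schemas)
    while len(level) > 1:
        level = [integrate(*level[i:i + 2]) for i in range(0, len(level), 2)]
    return level[0]

# Concept names taken from the simple example; the set encoding is an assumption.
s1 = {"Employee", "Organization"}
s2 = {"Worker", "City", "Region"}
s3 = {"Organization", "Municipality"}
sources = [s1, s2, s3]
```

With a plain union all three orders give the same result; in practice the order matters because each step involves conflict-resolution decisions that later steps inherit.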
Framework for Schema Integration
Source: Stefano Spaccapietra

1. Schema transformation (or Pre-integration)
Input: n source schemas
Output: n homogenized source schemas
Methods used: model and design homogenization

Reduce model heterogeneities as much as possible to make
the sources more suitable for integration.
Goal: use a single, common data model and format.
[Figure: Source DBs → Transformation → Homogenized DBs → Integration → DW]

Schema Transformation
Schema Transformation involves:
•  Data model homogenization
–  All data sources are described using the same data model.

•  Design homogenization
–  Enforce standard design rules to reduce the number of structural
conflicts (e.g., normalization: one fact in one place).

•  Reverse engineering
–  Reverse engineer the schema from existing data (such as COBOL
files, spreadsheets, legacy relational databases, legacy object-oriented
databases).

Example of Design Homogenization (Normalization)

One table:
R1 (#Student, Name, LastName, #Course, CourseName,
Grade, Date)
Dependencies:
–  #Student → Name, LastName
–  #Course → CourseName
–  #Student, #Course → Grade, Date

Normalized into 3 tables (one fact in one place):
R11 (#Student, Name, LastName)
R12 (#Course, CourseName)
R13 (#Student, #Course, Grade, Date)
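
The decomposition can be reproduced by projecting R1 onto each attribute group and dropping duplicate tuples. A minimal sketch in Python; the sample rows and the `project` helper are illustrative assumptions, not part of the lecture:

```python
# Rows of the unnormalized relation R1 (sample values are made up).
R1 = [
    {"Student": 1, "Name": "Ali", "LastName": "Saleh", "Course": "C1",
     "CourseName": "Databases", "Grade": 85, "Date": "2013-01-10"},
    {"Student": 1, "Name": "Ali", "LastName": "Saleh", "Course": "C2",
     "CourseName": "Web Data", "Grade": 90, "Date": "2013-01-12"},
    {"Student": 2, "Name": "Rana", "LastName": "Odeh", "Course": "C1",
     "CourseName": "Databases", "Grade": 78, "Date": "2013-01-10"},
]

def project(rows, attrs):
    """Project onto attrs and drop duplicate tuples, keeping first-seen order."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        seen.setdefault(key, dict(zip(attrs, key)))
    return list(seen.values())

# One table per functional dependency: one fact in one place.
R11 = project(R1, ["Student", "Name", "LastName"])
R12 = project(R1, ["Course", "CourseName"])
R13 = project(R1, ["Student", "Course", "Grade", "Date"])
```

Each student name and each course name is now stored once, while R13 keeps one row per (student, course) pair.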

Example of Reverse Engineering
Source: Stefano Spaccapietra

Schema Matching
2. Schema matching (Correspondences investigation)
Input: n source schemas
Output: n source schemas + correspondences
Method used: techniques to discover correspondences

Correspondences relate (schema) elements which describe
the same phenomena of the real world.
–  This step aims at finding and describing all semantic links between
elements of the input schemas and the corresponding data.
–  In doing so, the schemas to be integrated are matched against each other.
–  This step identifies the conflicts among the schemas; they are resolved in the integration step.

Semantics of Correspondences
Source: Stefano Spaccapietra

Correspondences relate (schema) elements which describe
the same phenomena of the real world.

Asserting Correspondences
Source: Stefano Spaccapietra

Correspondences are asserted using a rich language for expressing
correspondences (matchings).

Example:

S1.Person ≡ S2.Person,
With Corresponding Identifiers: Pin,
With Corresponding Property: name
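
One way to hold such an assertion in a program is a small record type. A minimal sketch in Python; the `Correspondence` class, its field names and its `describe` method are hypothetical and only mirror the parts of the example above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Correspondence:
    left: str                # element of the first schema, e.g. "S1.Person"
    right: str               # element of the second schema, e.g. "S2.Person"
    relation: str = "≡"      # equivalence; other relations would be possible
    identifiers: tuple = ()  # corresponding identifiers
    properties: tuple = ()   # corresponding properties

    def describe(self):
        return (f"{self.left} {self.relation} {self.right}, "
                f"identifiers: {', '.join(self.identifiers)}, "
                f"properties: {', '.join(self.properties)}")

person = Correspondence("S1.Person", "S2.Person",
                        identifiers=("Pin",), properties=("name",))
```

Keeping the assertion as data rather than free text lets later steps (conflict detection, mapping generation) consume it directly.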
Automated Matching
•  Fully automated matching is impossible, as a computer process can
hardly make ultimate decisions about the semantics of data.
•  But even partial assistance in discovering correspondences (to be
confirmed or guided by humans) is beneficial, due to the complexity of
the task.
•  All proposed methods rely on some similarity measures that try to
evaluate the semantic distance between two descriptions.

Some state-of-the-art matching systems:
Cupid (Microsoft Research, USA)
FOAM/QOM (University of Karlsruhe, Germany)
OLA (INRIA Rhône-Alpes, France / University of Montreal, Canada)
S-Match (University of Trento, Italy)
… many others
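
To make the idea of a similarity measure concrete, here is a minimal sketch in Python that proposes candidate correspondences from element names alone. It uses the standard library's `difflib.SequenceMatcher`; the function names and the 0.8 threshold are assumptions, and this purely lexical measure is not how the systems listed above work.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Lexical similarity in [0, 1] between two element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_correspondences(schema_a, schema_b, threshold=0.8):
    """Pairs of element names whose similarity reaches the threshold."""
    return [(a, b, name_similarity(a, b))
            for a in schema_a for b in schema_b
            if name_similarity(a, b) >= threshold]

# Element names from Schema 1 and Schema 3 of the simple example.
schema_1 = ["Employee", "Organization"]
schema_3 = ["Organization", "Municipality"]
candidates = candidate_correspondences(schema_1, schema_3)
```

Such a measure finds Organization in both schemas but scores Employee and Worker as dissimilar, which is one reason the candidates have to be confirmed or completed by a human.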
Examples of Correspondences
Source: Stefano Spaccapietra

Examples of Correspondences
[Figure: correspondences drawn between the three source schemas of the previous example. Schema 1: Employee /WorksIn Organization. Schema 2: Worker bornIn/ City locatedIn/ Region. Schema 3: Municipality, Organization; role locatedIn/.]

Examples of Correspondences
Source: Stefano Spaccapietra

Schema Integration & Mapping Generation
Source: Carlo Batini

3. Schema integration and mapping generation
Input: n source schemas + correspondences
Output: integrated schema + mapping rules between the integrated
schema and the input source schemas
Method used: new classification of conflicts + conflict resolution
transformations
Goal: create an Integrated Schema (IS) and the mappings to the
local databases.

GAV and LAV Integration
Research has identified two methods to set up mappings between the
integrated schema and the input schemas:
(1)  GAV (Global As View): proposes to define the integrated schema
as a view over input schemas.
GAV is usually considered simpler and more efficient for processing
queries on the integrated database, but is weaker in supporting
evolution of the global system through addition of new sources.
(2)  LAV (Local As View): proposes to define the local schemas as
views over the integrated schema.
LAV generates issues of incomplete information, which adds
complexity in handling global queries, but it better supports dynamic
addition and removal of sources.
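
The GAV idea can be illustrated with an ordinary SQL view. A minimal sketch in Python using the standard `sqlite3` module; the table names, column names and sample rows are illustrative assumptions, and only GAV is shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical source relations (names and rows are made up).
    CREATE TABLE s1_employee (name TEXT, organization TEXT);
    CREATE TABLE s2_worker   (name TEXT, city TEXT);
    INSERT INTO s1_employee VALUES ('Sami', 'Birzeit University');
    INSERT INTO s2_worker   VALUES ('Sami', 'Ramallah');

    -- GAV: the global relation is defined as a view over the sources.
    CREATE VIEW global_employee AS
        SELECT e.name, e.organization, w.city AS born_in
        FROM s1_employee AS e
        LEFT JOIN s2_worker AS w ON w.name = e.name;
""")

# A global query is answered by simply expanding the view definition.
rows = conn.execute(
    "SELECT name, organization, born_in FROM global_employee"
).fetchall()
```

Query processing is straightforward because the view is just unfolded, but adding a new source means rewriting the view definition, which is the evolution weakness noted above.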
Integration Process
Having identified the correspondences (in the previous
step), we now resolve the conflicts.
Four types of conflicts can be distinguished:
–  Structural conflicts
–  Classification conflicts
–  Descriptive conflicts
–  Fragmentation conflicts

Examples of conflicts among related object types
–  different classifications (sets of instances)
–  different sets of properties
–  different structures
–  different coding schemes
–  …
Integration Rules
Rules defining the strategy to resolve conflicts.
Example rules:
–  If a class corresponds to an attribute, keep the class.
–  If the population of a class is included in the population of another
class, build an is-a hierarchy.

Integration rules depend on how you want the integrated
schema to look.
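
The second rule can be sketched as a check on populations. A minimal example in Python, assuming populations are available as sets of instance identifiers; the `is_a_links` function and the identifiers p1..p3 are hypothetical, and the class names are only illustrative:

```python
def is_a_links(populations):
    """Return (subclass, superclass) pairs for strictly included populations."""
    links = []
    for sub, sub_pop in populations.items():
        for sup, sup_pop in populations.items():
            # "<" on sets tests for a proper subset.
            if sub != sup and sub_pop < sup_pop:
                links.append((sub, sup))
    return links

populations = {
    "Faculty": {"p1", "p2", "p3"},
    "PhD-advisor": {"p1", "p2"},
}
links = is_a_links(populations)
```

In practice the inclusion is usually asserted by the designer as a correspondence rather than computed from current data, since the instances present at one moment may not reflect the intended semantics.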

Structural Conflicts
Source: Stefano Spaccapietra

Different schema element types, e.g.: class, attribute, relationship

Library example:
–  S1 : Book is a class
–  S2 : books is an attribute of Author

Conflict resolution:
Choose the less constraining structure
–  Integrated Schema: Book is a class

Classification Conflicts
•  Corresponding elements describe different sets of real world objects
–  S1.Faculty CONTAINS S2.PhD-advisor

•  Conflict Resolution:
–  Generalization / Specialization hierarchy
–  Merging

[Figure: S1.Faculty and S2.PhD-advisor integrated either as a hierarchy with PhD-advisor under Faculty, or merged into a single Faculty class.]

Descriptive Conflicts
Corresponding types have different properties, or corresponding
properties are described in different ways
Object / Entity / Relationship type:
–  Naming conflicts:
•  synonyms: Node, Extremity
•  homonyms: Highway (EU), Highway (USA)

–  Composition conflicts: different attributes and methods
•  Employee (E#, name, address)
•  Employee (E#, position, salary, department)
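
One possible way to resolve a composition conflict is to give the integrated type the union of the attributes. A minimal sketch in Python; taking the union is an assumption made here for illustration (the slide does not prescribe a resolution), and `merge_attributes` is a hypothetical helper:

```python
def merge_attributes(*attribute_lists):
    """Union of attribute lists, keeping first-seen order."""
    merged = []
    for attrs in attribute_lists:
        for attr in attrs:
            if attr not in merged:
                merged.append(attr)
    return merged

employee_s1 = ["E#", "name", "address"]
employee_s2 = ["E#", "position", "salary", "department"]
employee_is = merge_attributes(employee_s1, employee_s2)
```

The shared identifier E# appears once in the result; attributes known to only one source will be missing for instances that come from the other.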

Integration Methods: Manual
Source: Stefano Spaccapietra

First method: manual integration
“do it yourself”

[Figure: the DBA takes the schemas and, using a language, writes the mapping rules and the integrated schema.]

Easy to implement, flexible,
BUT time consuming for the DBA

Integration Methods: Semi-Automatic
Second method: semi-automatic integration
“tell me about the problem, I will try to fix it”

[Figure: the DBA supplies the schemas and the correspondences to a TOOL, which produces the mapping rules and the integrated schema.]

Opens the way to visual CASE tools and integration servers,
BUT knowledge acquisition can be painful

References and Acknowledgement
•  Carlo Batini: Course on Data Integration. BZU IT Summer School
2011.
•  Stefano Spaccapietra: Information Integration. Presentation at the IFIP
Academy. Porto Alegre. 2005.
•  Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI
International, Artificial Intelligence Center. Menlo Park, USA. 2009.

Thanks to Anton Deik for helping me prepare this lecture.


Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
 
Data modeling star schema
Data modeling star schemaData modeling star schema
Data modeling star schema
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Gestão Ágil de Dados com Enterprise Data Fabric
Gestão Ágil de Dados com Enterprise Data FabricGestão Ágil de Dados com Enterprise Data Fabric
Gestão Ágil de Dados com Enterprise Data Fabric
 
Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...
Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...
Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...
 
Togaf notes
Togaf notesTogaf notes
Togaf notes
 
Approaching Data Quality
Approaching Data QualityApproaching Data Quality
Approaching Data Quality
 
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
 
Data governance
Data governanceData governance
Data governance
 
Enterprise architecture
Enterprise architectureEnterprise architecture
Enterprise architecture
 
1.1 Data Modelling - Part I (Understand Data Model).pdf
1.1 Data Modelling - Part I (Understand Data Model).pdf1.1 Data Modelling - Part I (Understand Data Model).pdf
1.1 Data Modelling - Part I (Understand Data Model).pdf
 
Gartner: Master Data Management Functionality
Gartner: Master Data Management FunctionalityGartner: Master Data Management Functionality
Gartner: Master Data Management Functionality
 
Communicating Effectively with Data Visualization
Communicating Effectively with Data VisualizationCommunicating Effectively with Data Visualization
Communicating Effectively with Data Visualization
 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success Stories
 
Master Data Management methodology
Master Data Management methodologyMaster Data Management methodology
Master Data Management methodology
 
DMBOK and Data Governance
DMBOK and Data GovernanceDMBOK and Data Governance
DMBOK and Data Governance
 
Family tree of data – provenance and neo4j
Family tree of data – provenance and neo4jFamily tree of data – provenance and neo4j
Family tree of data – provenance and neo4j
 
Data Visualization with Tableau - by Knowledgebee Trainings
Data Visualization with Tableau - by Knowledgebee TrainingsData Visualization with Tableau - by Knowledgebee Trainings
Data Visualization with Tableau - by Knowledgebee Trainings
 
Diabetes Data Science
Diabetes Data ScienceDiabetes Data Science
Diabetes Data Science
 

Andere mochten auch

Database , 4 Data Integration
Database , 4 Data IntegrationDatabase , 4 Data Integration
Database , 4 Data Integration
Ali Usman
 
Pal gov.tutorial2.session13 1.data schema integration
Pal gov.tutorial2.session13 1.data schema integrationPal gov.tutorial2.session13 1.data schema integration
Pal gov.tutorial2.session13 1.data schema integration
Mustafa Jarrar
 
Pal gov.tutorial2.session15 1.linkeddata
Pal gov.tutorial2.session15 1.linkeddataPal gov.tutorial2.session15 1.linkeddata
Pal gov.tutorial2.session15 1.linkeddata
Mustafa Jarrar
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql Project
Mustafa Jarrar
 
Jarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsJarrar: Web 2 Data Mashups
Jarrar: Web 2 Data Mashups
Mustafa Jarrar
 
Jarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFJarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDF
Mustafa Jarrar
 

Andere mochten auch (20)

Database , 4 Data Integration
Database , 4 Data IntegrationDatabase , 4 Data Integration
Database , 4 Data Integration
 
Data integration
Data integrationData integration
Data integration
 
Introduction to ETL and Data Integration
Introduction to ETL and Data IntegrationIntroduction to ETL and Data Integration
Introduction to ETL and Data Integration
 
Data Integration (ETL)
Data Integration (ETL)Data Integration (ETL)
Data Integration (ETL)
 
Pay-as-you-go Reconciliation in Schema Matching Networks
Pay-as-you-go Reconciliation in Schema Matching NetworksPay-as-you-go Reconciliation in Schema Matching Networks
Pay-as-you-go Reconciliation in Schema Matching Networks
 
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
On Leveraging Crowdsourcing Techniques for Schema Matching NetworksOn Leveraging Crowdsourcing Techniques for Schema Matching Networks
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
 
Pal gov.tutorial2.session13 1.data schema integration
Pal gov.tutorial2.session13 1.data schema integrationPal gov.tutorial2.session13 1.data schema integration
Pal gov.tutorial2.session13 1.data schema integration
 
Pal gov.tutorial2.session15 1.linkeddata
Pal gov.tutorial2.session15 1.linkeddataPal gov.tutorial2.session15 1.linkeddata
Pal gov.tutorial2.session15 1.linkeddata
 
Jarrar: Zinnar
Jarrar: ZinnarJarrar: Zinnar
Jarrar: Zinnar
 
Jarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageJarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query Language
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql Project
 
Jarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsJarrar: Web 2 Data Mashups
Jarrar: Web 2 Data Mashups
 
Jarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and ConstraintsJarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and Constraints
 
Jarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFJarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDF
 
Jarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data IntegrationJarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data Integration
 
Jarrar: Linked Data
Jarrar: Linked DataJarrar: Linked Data
Jarrar: Linked Data
 
Jarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course OutlineJarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course Outline
 
Jarrar: Introduction to Data Integration
Jarrar: Introduction to Data IntegrationJarrar: Introduction to Data Integration
Jarrar: Introduction to Data Integration
 
Jarrar: RDF Stores: Challenges and Solutions
Jarrar: RDF Stores: Challenges and SolutionsJarrar: RDF Stores: Challenges and Solutions
Jarrar: RDF Stores: Challenges and Solutions
 
Jarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF SchemaJarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF Schema
 

Ähnlich wie Jarrar: Data Schema Integration

Mapping objects to_relational_databases
Mapping objects to_relational_databasesMapping objects to_relational_databases
Mapping objects to_relational_databases
Ivan Paredes
 
Improved Presentation and Facade Layer Operations for Software Engineering Pr...
Improved Presentation and Facade Layer Operations for Software Engineering Pr...Improved Presentation and Facade Layer Operations for Software Engineering Pr...
Improved Presentation and Facade Layer Operations for Software Engineering Pr...
Dr. Amarjeet Singh
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 
Object relationship mapping and hibernate
Object relationship mapping and hibernateObject relationship mapping and hibernate
Object relationship mapping and hibernate
Joe Jacob
 

Ähnlich wie Jarrar: Data Schema Integration (20)

Jarrar: Data Schema Integration
Jarrar: Data Schema Integration Jarrar: Data Schema Integration
Jarrar: Data Schema Integration
 
Data Modeling.docx
Data Modeling.docxData Modeling.docx
Data Modeling.docx
 
Data models
Data modelsData models
Data models
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Mapping objects to_relational_databases
Mapping objects to_relational_databasesMapping objects to_relational_databases
Mapping objects to_relational_databases
 
Database Management System, Lecture-1
Database Management System, Lecture-1Database Management System, Lecture-1
Database Management System, Lecture-1
 
Unified modeling language
Unified modeling languageUnified modeling language
Unified modeling language
 
Comparison of Relational Database and Object Oriented Database
Comparison of Relational Database and Object Oriented DatabaseComparison of Relational Database and Object Oriented Database
Comparison of Relational Database and Object Oriented Database
 
Improved Presentation and Facade Layer Operations for Software Engineering Pr...
Improved Presentation and Facade Layer Operations for Software Engineering Pr...Improved Presentation and Facade Layer Operations for Software Engineering Pr...
Improved Presentation and Facade Layer Operations for Software Engineering Pr...
 
OOM Unit I - III.pdf
OOM Unit I - III.pdfOOM Unit I - III.pdf
OOM Unit I - III.pdf
 
ITB - UNIT 3.pdf
ITB - UNIT 3.pdfITB - UNIT 3.pdf
ITB - UNIT 3.pdf
 
DBMS-Unit-1.pptx
DBMS-Unit-1.pptxDBMS-Unit-1.pptx
DBMS-Unit-1.pptx
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
MAPPING COMMON ERRORS IN ENTITY RELATIONSHIP DIAGRAM DESIGN OF NOVICE DESIGNERS
MAPPING COMMON ERRORS IN ENTITY RELATIONSHIP DIAGRAM DESIGN OF NOVICE DESIGNERSMAPPING COMMON ERRORS IN ENTITY RELATIONSHIP DIAGRAM DESIGN OF NOVICE DESIGNERS
MAPPING COMMON ERRORS IN ENTITY RELATIONSHIP DIAGRAM DESIGN OF NOVICE DESIGNERS
 
Object oriented analysis and design unit- iv
Object oriented analysis and design unit- ivObject oriented analysis and design unit- iv
Object oriented analysis and design unit- iv
 
Object relationship mapping and hibernate
Object relationship mapping and hibernateObject relationship mapping and hibernate
Object relationship mapping and hibernate
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
 
chapter 2-DATABASE SYSTEM CONCEPTS AND architecture [Autosaved].pdf
chapter 2-DATABASE SYSTEM CONCEPTS AND architecture [Autosaved].pdfchapter 2-DATABASE SYSTEM CONCEPTS AND architecture [Autosaved].pdf
chapter 2-DATABASE SYSTEM CONCEPTS AND architecture [Autosaved].pdf
 
Chapter – 2 Data Models.pdf
Chapter – 2 Data Models.pdfChapter – 2 Data Models.pdf
Chapter – 2 Data Models.pdf
 
DBMS-2.pptx
DBMS-2.pptxDBMS-2.pptx
DBMS-2.pptx
 

Kürzlich hochgeladen

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Kürzlich hochgeladen (20)

Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 

Jarrar: Data Schema Integration

  • 1. Mustafa Jarrar Lecture Notes, Web Data Management (MCOM7348) University of Birzeit, Palestine 1st Semester, 2013 Data Schema Integration Dr. Mustafa Jarrar University of Birzeit mjarrar@birzeit.edu www.jarrar.info Jarrar © 2013 1
  • 2. Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2013/11/web-data-management.html Jarrar © 2013 2
  • 3. Data Schema Integration: A simple example In ORM: bornIn/ locatedIn/ /WorksIn Employee City Region Integrated schema locatedIn/ Organization /WorksIn Employee Organization Municipality locatedIn/ bornIn/ Worker City Schema 2 Region Organization locatedIn / Schema 3 Schema 1 Jarrar © 2013 3
  • 4. Data Schema Integration: A simple example Source: Carlo Batini In ER: Employee born City in works Organization Region Integrated schema in Employee works Organization Empoloyee born City in Schema 2 Schema 1 Region Municipality Organization in Schema 3 Jarrar © 2013 4
  • 5. Challenges of Data Schema Integration Source: Carlo Batini Schema Integration has two major challenges: 1.  Identification of all portions of schemas that pertain to the same concept, in such a way to unify such different representations in the global schema. 2.  Identification, analysis and resolution of the different types of conflicts (heterogeneities) in different schemas. Jarrar © 2013 5
  • 6. Framework for Schema Integration Local Schemas Schemas Transformation Transformation Rules Schemas Matching Matching Rules Schemas Integration Integration Rules Integrated Schema and mappings Source: Advances in Object-Oriented Data Modeling, M. P. Papazoglou, S. Spaccapietra, Z. Tari (Eds.), The MIT Press, 2000 Jarrar © 2013 6
  • 7. Framework for Schema Integration 0. Define the integration strategy If the number of local schemas to be integrated is large, the order of schema integration becomes important. Several strategies can be adopted. Input: n source schemas Output: n source schemas + integration strategies Method used: heuristics S1 S2 S3 S1 S2 S3 S4 One shot strategy - Efficient integration process S3 S4 S2 IS1 IS1 IS S1 IS2 IS2 IS … IS Pair at a time strategy - Priority to most relevant and stable schemas. - The integration process is - Many correspondences more efficient between concepts have to be considered together. Jarrar © 2013 Balanced Strategy -e.g.: Production, Marketing, Sales. -To be preferred when the cohesion among schemas is high. 7
  • 8. Framework for Schema Integration Source: Stefano Spaccapietra 1. Schema transformation (or Pre-integration) Input: n source schemas Output: n source schemas homogeneized Methods used: Model and Design Homogeneization Reduce model heterogeneities as much as possible to make the sources more suitable for integration. Goal: use a single, common data model and format. Transformation Source DBs Integration Homogeneized DBs Jarrar © 2013 DW 8
  • 9. Schema Transformation Schema Transformation involves: •  Data model homogeneization –  Where all data sources are described using the same data model. •  Design homogeneization –  Enforce standard design rules to reduce the number of structural conflicts (e.g., Normalization: one fact in one place) •  Reverse Engineering –  Reverse engineer the schema from existing data (such as COBOL files, spreadsheets, legacy relational databases, legacy objectoriented databases). Jarrar © 2013 9
  • 10. Example of Design homogeneization (Normalization) ONE TABLE: R1 (#Student, Name, LastName, #Course, CourseName, Grade, Date) Dependencies: –  #Student à Name, LastName –  #Course à CourseName –  #Student #Course à Grade, Date) Normalized Into 3 Tables: One Fact In One Place: R11 (#Student, Name, LastName) R12 (#Course, CourseName) R13 (#Student, #Course, Grade, Date) Jarrar © 2013 10
  • 11. Example of Reverse Engineering Source: Stefano Spaccapietra Jarrar © 2013 11
  • 12. Schema Matching 2. Schema matching (Correspondences investigation) Input: n source schemas Output: n source schemas + correspondences Method used: techniques to discover correspondences Correspondences relate (schema) elements which describe the same phenomena of the real world. –  This step aims at finding and describing all semantic links between elements of the input schemas and the corresponding data. –  By doing so, one matches between the schemas to be integrated. –  This step fixes the conflicts found in the schema. Jarrar © 2013 12
  • 13. Semantics of Correspondences Source: Stefano Spaccapietra Correspondences relate (schema) elements which describe the same phenomena of the real world. Jarrar © 2013 13
  • 14. Asserting Correspondences Source: Stefano Spaccapietra Finding matching correspondences is done through the use of a rich language for expressing correspondences (matchings). Example: S1.Person ≡ S2.Person, With Corresponding Identifiers: Pin, With Corresponding Property: name Jarrar © 2013 14
  • 15. Automated Matching •  Fully automated matching is impossible, as a computer process can hardly make ultimate decisions about the semantics of data. •  But even partial assistance in discovering of correspondences (to be confirmed or guided by humans) is beneficial, due to the complexity of the task. •  All proposed methods rely on some similarity measures that try to evaluate the semantic distance between two descriptions. Some state of the art matching systems Cupid (Microsoft Research, USA) FOAM/QOM (University of Karlsruhe, Germany) OLA (INRIA Rhône-Alpes, France / University of Montreal, Canada) S-Match (University of Trento, Italy) … many others Jarrar © 2013 15
  • 16. Examples of Correspondences (figure). Source: Stefano Spaccapietra
  • 17. Examples of Correspondences Previous example (figure): Schema 1: Employee /WorksIn Organization. Schema 2: Worker bornIn/ City, City locatedIn/ Region. Schema 3: Organization locatedIn/ Municipality.
  • 18. Examples of Correspondences (figure). Source: Stefano Spaccapietra
  • 19. Schema Integration & Mapping Generation Source: Carlo Batini 3. Schema integration and mapping generation Input: n source schemas + correspondences. Output: integrated schema + mapping rules between the integrated schema and the input source schemas. Method used: new classification of conflicts + conflict resolution transformations. Goal: creating an Integrated Schema (IS) and the mappings to the local databases.
  • 20. GAV and LAV Integration Research has identified two methods to set up mappings between the integrated schema and the input schemas: (1)  GAV (Global As View): defines the integrated schema as a view over the input schemas. GAV is usually considered simpler and more efficient for processing queries on the integrated database, but is weaker in supporting evolution of the global system through the addition of new sources. (2)  LAV (Local As View): defines the local schemas as views over the integrated schema. LAV generates issues of incomplete information, which adds complexity in handling global queries, but it better supports dynamic addition and removal of sources.
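The difference in direction between the two mappings can be sketched in a few lines. Everything below is hypothetical: the source contents, the global relation Employee(name, organization, birth_city), and the function names are invented to illustrate the idea, not taken from the lecture.

```python
# Hypothetical source contents, loosely following the running example.
source1_works_in = [("Ali", "OrgA"), ("Sara", "OrgB")]  # Schema 1: Employee WorksIn Organization
source2_born_in = [("Ali", "Ramallah")]                 # Schema 2: Worker bornIn City


# GAV: the global relation is defined as a query (view) over the sources.
def global_employee():
    """Employee(name, organization, birth_city) computed from the sources."""
    born = dict(source2_born_in)
    return [(name, org, born.get(name)) for name, org in source1_works_in]


# LAV: each source is described as a view over the global relation.
lav_descriptions = {
    "source1_works_in": lambda rows: [(n, o) for n, o, _ in rows],
    "source2_born_in": lambda rows: [(n, c) for n, _, c in rows if c is not None],
}

# Under GAV, a global query is answered by evaluating (unfolding) the view.
global_rows = global_employee()

# Under LAV, each description says what a source holds in terms of the global schema.
described = {name: view(global_rows) for name, view in lav_descriptions.items()}
```

In the GAV style, adding a third source means rewriting `global_employee`. In the LAV style it means adding one entry to `lav_descriptions`, but answering a global query then requires reasoning over incomplete sources (note the missing birth city for one employee).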
  • 21. Integration Process Having identified the correspondences in the previous step, we now resolve the conflicts. One can distinguish four types of conflicts: –  Structural conflicts –  Classification conflicts –  Descriptive conflicts –  Fragmentation conflicts. Examples of conflicts among related object types: –  different classifications (sets of instances) –  different sets of properties –  different structures –  different coding schemes –  …
  • 22. Integration Rules Rules defining the strategy to solve conflicts. Example rules: –  If a class corresponds to an attribute, keep the class –  If the population of a class is included in the population of another class, build an is-a hierarchy. Integration rules depend on how you want the integrated schema to look.
  • 23. Structural Conflicts Source: Stefano Spaccapietra Different schema element types, e.g.: class, attribute, relationship. Library example: –  S1: Book is a class –  S2: books is an attribute of Author. Conflict resolution: choose the less constraining structure –  Integrated schema: Book is a class
  • 24. Classification Conflicts •  Corresponding elements describe different sets of real-world objects –  S1.Faculty CONTAINS S2.PhD-advisor •  Conflict resolution: –  Generalization / Specialization hierarchy (Faculty with subtype PhD-advisor) –  Merging (a single class Faculty)
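The rule "if the population of a class is included in the population of another class, build an is-a hierarchy" can be sketched as a set comparison. This is a minimal sketch with invented instance identifiers; the function name and the "unresolved" outcome are assumptions, and in practice the inclusion is usually asserted at the schema level rather than tested on the data.

```python
def resolve_classification(name_a, pop_a, name_b, pop_b):
    """Suggest a resolution for two corresponding classes from their populations."""
    a, b = set(pop_a), set(pop_b)
    if a == b:
        return ("merge", name_a, name_b)   # same population: one merged class
    if b < a:
        return ("is-a", name_b, name_a)    # B is a specialization of A
    if a < b:
        return ("is-a", name_a, name_b)    # A is a specialization of B
    return ("unresolved", name_a, name_b)  # overlap or disjoint: left to the designer


# Hypothetical instance identifiers.
faculty = {"f1", "f2", "f3"}
phd_advisors = {"f1", "f3"}

decision = resolve_classification("S1.Faculty", faculty,
                                  "S2.PhD-advisor", phd_advisors)
```

With these populations, the sketch suggests making S2.PhD-advisor a subtype of S1.Faculty, which corresponds to the generalization / specialization resolution on slide 24.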
  • 25. Descriptive Conflicts Corresponding types have different properties, or corresponding properties are described in different ways. Object / Entity / Relationship type: –  Naming conflicts: •  synonyms: Node, Extremity •  homonyms: Highway (EU), Highway (USA) –  Composition conflicts: different attributes and methods •  Employee (E#, name, address) •  Employee (E#, position, salary, department)
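As a sketch of how two of these descriptive conflicts might be handled, the code below renames synonyms to one preferred name and merges the attribute lists of the two Employee descriptions. Taking the union of attributes and choosing Node as the preferred name are assumptions made for illustration; the slide does not prescribe a resolution.

```python
# Hypothetical synonym table: each name maps to the preferred name.
synonyms = {"Extremity": "Node"}


def canonical(name):
    """Return the preferred name for a type, resolving known synonyms."""
    return synonyms.get(name, name)


def merge_attributes(*attr_lists):
    """Union of attribute lists, keeping first-seen order and no duplicates."""
    merged = []
    for attrs in attr_lists:
        for attr in attrs:
            if attr not in merged:
                merged.append(attr)
    return merged


# The two Employee descriptions from the slide.
s1_employee = ["E#", "name", "address"]
s2_employee = ["E#", "position", "salary", "department"]

integrated_employee = merge_attributes(s1_employee, s2_employee)
preferred = canonical("Extremity")
```

Homonyms such as Highway (EU) and Highway (USA) need the opposite treatment: the same name denotes different concepts, so the types must be kept apart, for example by qualifying the names.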
  • 26. Integration Methods: Manual Source: Stefano Spaccapietra First method: manual integration, “do it yourself”. (Figure: the DBA takes the schemas and, using a language, writes the integrated schema and the mapping rules.) Easy to implement and flexible, BUT time consuming for the DBA.
  • 27. Integration Methods: Semi-Automatic Second method: semi-automatic integration, “tell me about the problem, I will try to fix it”. (Figure: the DBA gives the schemas and the correspondences to a TOOL, which produces the integrated schema and the mapping rules.) Opens to visual CASE tools and integration servers, BUT knowledge acquisition can be painful.
  • 28. References and Acknowledgement •  Carlo Batini: Course on Data Integration. BZU IT Summer School 2011. •  Stefano Spaccapietra: Information Integration. Presentation at the IFIP Academy. Porto Alegre. 2005. •  Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI International, Artificial Intelligence Center. Menlo Park, USA. 2009. Thanks to Anton Deik for helping me prepare this lecture.