Clio: Schema Mapping Creation and Data Exchange

Presented by Leila Jalali




Information Systems Group                  Candidacy Exam, Jan. 2010
The Clio project

[Figure: a schema mapping relates source schema S to target schema T. The data conforms to S, and data exchange transforms it so that it conforms to T. A query Q is posed against T.]

The party posing Q:
• Wants data from S
• Understands T
• May not understand S

Clio addresses two main problems: how to generate schema mappings, and how to use them for data exchange.
Outline
 The Motivating Example
 1. Schema Mapping Generation
        Mapping generation algorithm
 2. Data Exchange
        Query generation algorithm
 Conclusions
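A Motivating Example

Schema S:
    Companies: Set of Rcd
        Name
        Address
        Year
    Grants: Set of Rcd
        Gid
        Recipient
        Amount
        Supervisor
        Manager
    Contacts: Set of Rcd
        Cid
        Email
        Phone

Schema T:
    Organizations: Set of Rcd
        Code
        Year
        Fundings: Set of Rcd
            FId
            FinId
    Finances: Set of Rcd
        FinId
        Budget
        Phone

Constraints (arrows f1 to f4 in the figure):
    f1: Grants.Recipient references Companies.Name
    f2: Grants.Supervisor references Contacts.Cid
    f3: Grants.Manager references Contacts.Cid
    f4: Fundings.FinId references Finances.FinId

Correspondences (arrows v1 to v4, given by a "schema matcher" or a "user"):
    v1: Companies.Name to Organizations.Code
    v2: Grants.Gid to Fundings.FId
    v3: Grants.Amount to Finances.Budget
    v4: Contacts.Phone to Finances.Phone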
Correspondences

Using a tuple-generating dependency (tgd):

    v1: ∀n,d,y Companies(n,d,y) → ∃y',F Organizations(n,y',F)

The same correspondence in foreach/exists/with form:

    foreach c in companies
    exists o in organizations
    with o.code = c.name
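To make the reading of v1 concrete, here is a minimal Python sketch of what applying the tgd to source data could look like. It is not Clio's implementation: the function name `apply_v1`, the sample rows, the labeled nulls `N1`, `N2`, ... standing for the existential y', and the choice of an empty set for F are all assumptions made for illustration.

```python
from itertools import count

def apply_v1(companies):
    """Apply v1: Companies(n, d, y) -> exists y', F . Organizations(n, y', F)."""
    fresh = count(1)  # generator of fresh labeled nulls
    organizations = []
    for name, _address, _year in companies:
        organizations.append({
            "code": name,               # n is copied from Companies.Name
            "year": f"N{next(fresh)}",  # labeled null standing for y'
            "fundings": [],             # assumption: the empty set is chosen for F
        })
    return organizations

# Hypothetical sample rows: (name, address, year)
companies = [("Acme", "1 Main St", 1990), ("Globex", "2 High St", 2001)]
organizations = apply_v1(companies)
print(organizations)
```

Note that the source Year is not copied: v1 only relates Name to Code, so the target Year stays an unknown value.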
More complex mappings

    ∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) →
        ∃y',F,f,p Organizations(n,y',F), F(g,f), Finances(f,a,p)

    foreach c in companies, g in grants
        where c.name = g.recipient
    exists o in organizations, f in o.fundings, i in finances
        where f.finId = i.finId
    with o.code = c.name
        and f.fId = g.gId
        and i.budget = g.amount
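The foreach/exists/with mapping above can be read operationally. The following Python sketch evaluates it over in-memory data under several assumptions that are not in the slides: records are plain dicts, organizations are grouped by code, unknown values (y' and p) are represented as `None`, and the shared finId is a made-up Skolem-style string built from the company name and grant id.

```python
def exchange(companies, grants):
    """Evaluate the foreach/exists/with mapping over in-memory source data."""
    organizations = {}  # keyed by code so each company yields one organization (assumption)
    finances = []
    for c in companies:
        for g in grants:
            if c["name"] != g["recipient"]:  # where c.name = g.recipient
                continue
            o = organizations.setdefault(
                c["name"], {"code": c["name"], "year": None, "fundings": []})
            # Invented value shared by f.finId and i.finId (where f.finId = i.finId)
            fin_id = f"Sk({c['name']},{g['gid']})"
            o["fundings"].append({"fId": g["gid"], "finId": fin_id})   # f.fId = g.gId
            finances.append({"finId": fin_id, "budget": g["amount"],   # i.budget = g.amount
                             "phone": None})
    return list(organizations.values()), finances

# Hypothetical sample data
companies = [{"name": "Acme", "address": "1 Main St", "year": 1990}]
grants = [
    {"gid": "g1", "recipient": "Acme", "amount": 500, "supervisor": "c1", "manager": "c2"},
    {"gid": "g2", "recipient": "Acme", "amount": 700, "supervisor": "c1", "manager": "c3"},
    {"gid": "g3", "recipient": "Other", "amount": 900, "supervisor": "c4", "manager": "c4"},
]
orgs, fins = exchange(companies, grants)
print(orgs)
print(fins)
```

The grant whose recipient has no matching company contributes nothing, because the foreach clause is a join of the two source sets.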
More complex mappings (cont.)

    foreach c in companies, g in grants                          (query on the source: QS)
        where c.name = g.recipient
    exists o in organizations, f in o.fundings, i in finances    (query on the target: QT)
        where f.finId = i.finId
    with o.code = c.name                                         (correspondences relating QS and QT)
        and f.fId = g.gId
        and i.budget = g.amount
Outline
 The Motivating Example
 1. Schema Mapping Generation
        Mapping generation algorithm
 2. Data Exchange
        Query generation algorithm
 Conclusions
Mapping Generation

Structural associations:
    Source schema: generate all possible associations within the source
    Target schema: generate all possible associations within the target
Mapping Generation (cont.)

Examples of structural associations:
    Source:  from p in companies
             from g in grants
    Target:  from o in organizations
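As a rough illustration of how structural associations could be enumerated, the sketch below walks a nested schema and emits one from-clause per set-valued element. The dictionary encoding of the schema (set name mapped to its nested sets), the generated variable names `v0`, `v1`, ..., and the function name are assumptions for this example only.

```python
def structural_associations(schema):
    """Return one from-clause, as a list of (variable, set expression) pairs, per set in the schema."""
    results = []

    def walk(sets, parent_var, generators):
        for set_name, nested in sets.items():
            var = f"v{len(generators)}"
            # Top-level sets are named directly; nested sets are reached through the parent variable
            expr = set_name if parent_var is None else f"{parent_var}.{set_name}"
            clause = generators + [(var, expr)]
            results.append(clause)
            walk(nested, var, clause)

    walk(schema, None, [])
    return results

# Target schema T from the motivating example: Fundings is nested inside Organizations
target = {"organizations": {"fundings": {}}, "finances": {}}
associations = structural_associations(target)
for clause in associations:
    print("from " + ", ".join(f"{var} in {expr}" for var, expr in clause))
```

A nested set always brings its ancestors along, which is why the association for Fundings also ranges over Organizations.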
Mapping Generation (cont.)

Logical associations:
    Build larger associations in the source (AS) and the target (AT)
Mapping Generation (cont.)

Logical associations are built by starting with a structural association and "chasing" constraints (f1, f2, f3 over Companies, Grants, and Contacts in the source).

[Figure: the resulting source association AS]
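The chase step can be sketched in a few lines of Python. This is a simplified illustration, not the full chase: it assumes flat sets, constraints given as hypothetical `(source set, source attribute, target set, target attribute)` tuples, and an acyclic set of constraints so the loop terminates.

```python
def chase(root_set, foreign_keys):
    """Extend a structural association by following foreign keys from each generator."""
    generators = [("x0", root_set)]
    conditions = []
    i = 0
    while i < len(generators):
        var, set_name = generators[i]
        for src_set, src_attr, dst_set, dst_attr in foreign_keys:
            if src_set == set_name:
                # Each foreign key adds a generator over the referenced set plus a join condition
                new_var = f"x{len(generators)}"
                generators.append((new_var, dst_set))
                conditions.append(f"{var}.{src_attr} = {new_var}.{dst_attr}")
        i += 1
    return generators, conditions

# f1, f2, f3 from the motivating example
foreign_keys = [
    ("grants", "recipient", "companies", "name"),
    ("grants", "supervisor", "contacts", "cid"),
    ("grants", "manager", "contacts", "cid"),
]
generators, conditions = chase("grants", foreign_keys)
print(generators)
print(conditions)
```

Starting from `from g in grants`, the result ranges over grants, companies, and two copies of contacts, one for the supervisor and one for the manager.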
Mapping Generation (cont.)

Use a pair <AS, AT> and the correspondences covered by <AS, AT> to generate a Clio mapping:

    foreach AS exists AT with W

W is the conjunction of equalities h(eS) = h'(eT) (captured from the correspondences).
Clio mapping, example

Generate a Clio mapping: foreach AS exists AT with W, where W is the conjunction of equalities h(eS) = h'(eT).

    AS: from g in grants, c in companies, s in contacts, m in contacts
        where g.recipient = c.name
          and g.supervisor = s.cid
          and g.manager = m.cid

    AT: from o in organizations, f in o.fundings, i in finances
        where f.finId = i.finId

v1, v2, v3 are covered, which gives the mapping:

    foreach g in grants, c in companies, s in contacts, m in contacts
        where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
    exists o in organizations, f in o.fundings, i in finances
        where f.finId = i.finId
    with c.name = o.code and g.gId = f.fId and g.amount = i.budget
Dominance

A2 dominates A1 (A1 ≤ A2) if the from and where clauses of A1 are subsets of those of A2 (after suitable renaming).

    A2: from g in grants, c in companies, s in contacts, m in contacts
        where g.recipient = c.name and
              g.supervisor = s.cid and
              g.manager = m.cid

    A1: from g in grants, c in companies
        where g.recipient = c.name
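A dominance check can be sketched as a search for a suitable renaming. In the Python example below, an association is encoded (as an assumption for this illustration) as a pair of a dict from variable to set name and a set of equalities; the brute-force search over renamings is chosen for clarity, not efficiency.

```python
from itertools import permutations

def eq(left, right):
    """Encode an equality such as g.recipient = c.name as an unordered pair of (variable, attribute)."""
    return frozenset((tuple(left.split(".")), tuple(right.split("."))))

def dominates(a2, a1):
    """True if a1 <= a2: some renaming of a1's variables embeds its from and where clauses in a2."""
    gens1, conds1 = a1
    gens2, conds2 = a2
    vars1 = list(gens1)
    for image in permutations(list(gens2), len(vars1)):
        h = dict(zip(vars1, image))  # candidate renaming
        same_sets = all(gens1[v] == gens2[h[v]] for v in vars1)
        renamed = {frozenset((h[v], attr) for v, attr in cond) for cond in conds1}
        if same_sets and renamed <= conds2:
            return True
    return False

A2 = ({"g": "grants", "c": "companies", "s": "contacts", "m": "contacts"},
      {eq("g.recipient", "c.name"), eq("g.supervisor", "s.cid"), eq("g.manager", "m.cid")})
# Same association as A1 on the slide, written with different variable names
A1 = ({"x": "grants", "y": "companies"}, {eq("x.recipient", "y.name")})

print(dominates(A2, A1))
print(dominates(A1, A2))
```

Using different variable names for A1 shows why the definition says "after suitable renaming": the check succeeds by mapping x to g and y to c.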
Coverage of a correspondence

A correspondence v: foreach PS exists PT with eS = eT
is covered by a pair of associations <AS, AT> if PS ≤ AS and PT ≤ AT
with some renaming h, h'.

Example:
    AS: from c in companies
    AT: from o in organizations
    v:  foreach c in companies
        exists o in organizations
        with c.name = o.code
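The coverage test, together with the renamed equality h(eS) = h'(eT) that it contributes to W, can be sketched as follows. The encoding (a dict from variable to set name plus a set of equalities, each equality an unordered pair of `(variable, attribute)` tuples) and the function names are assumptions made for this example.

```python
from itertools import permutations

def embedding(p, a):
    """Return a renaming h such that p <= a under h, or None if there is none."""
    (gens_p, conds_p), (gens_a, conds_a) = p, a
    names = list(gens_p)
    for image in permutations(list(gens_a), len(names)):
        h = dict(zip(names, image))
        sets_match = all(gens_p[v] == gens_a[h[v]] for v in names)
        renamed = {frozenset((h[v], f) for v, f in c) for c in conds_p}
        if sets_match and renamed <= conds_a:
            return h
    return None

def cover(correspondence, a_s, a_t):
    """Return the renamed equality (h(eS), h'(eT)) if <a_s, a_t> covers the correspondence, else None."""
    p_s, p_t, (e_s, e_t) = correspondence
    h, h_prime = embedding(p_s, a_s), embedding(p_t, a_t)
    if h is None or h_prime is None:
        return None
    return (h[e_s[0]], e_s[1]), (h_prime[e_t[0]], e_t[1])

# v1: foreach c in companies exists o in organizations with c.name = o.code
v1 = (({"c": "companies"}, set()), ({"o": "organizations"}, set()),
      (("c", "name"), ("o", "code")))
# A larger source association, with variables named differently from v1
a_s = ({"g": "grants", "k": "companies"},
       {frozenset({("g", "recipient"), ("k", "name")})})
a_t = ({"o2": "organizations"}, set())

print(cover(v1, a_s, a_t))
```

A pair whose target association does not range over organizations would return `None`, meaning v1 is not covered by that pair.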
Mapping Generation (cont.)

Add each generated Clio mapping to the set of mappings.
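The overall loop (pair up associations, collect the covered correspondences, emit a mapping) can be sketched as below. This is a deliberately simplified illustration: associations are reduced to tuples of set names, coverage is approximated by set membership without renaming, and no pruning of redundant mappings is attempted. None of these simplifications come from the slides.

```python
def generate_mappings(source_assocs, target_assocs, correspondences):
    """Pair every source association with every target association and keep pairs that cover something."""
    mappings = []
    for a_s in source_assocs:
        for a_t in target_assocs:
            # W: the correspondences covered by the pair <a_s, a_t> (simplified coverage test)
            w = [(s, s_attr, t, t_attr)
                 for (s, s_attr, t, t_attr) in correspondences
                 if s in a_s and t in a_t]
            if w:
                mappings.append((a_s, a_t, w))
    return mappings

source_assocs = [("companies",), ("grants", "companies", "contacts")]
target_assocs = [("organizations",), ("organizations", "fundings", "finances")]
correspondences = [
    ("companies", "name", "organizations", "code"),  # v1
    ("grants", "gid", "fundings", "fId"),            # v2
    ("grants", "amount", "finances", "budget"),      # v3
]
mappings = generate_mappings(source_assocs, target_assocs, correspondences)
for a_s, a_t, w in mappings:
    print(a_s, "->", a_t, "covers", len(w), "correspondence(s)")
```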
Logical associations are meaningful combinations of correspondences

• Find maximal sets of correspondences that can be interpreted together
• Discard the "larger" mapping
• Generate a Clio mapping
Outline
  The Motivating Example
 1. Schema Mapping Generation
           Mapping generation algorithm
 2. Data Exchange
              Query generation algorithm
  Conclusions




Query generation for data exchange

[Figure: mapping generation and query generation, both between the source schema and the target schema]
Overview of Query Generation

Input: a Clio mapping

1. Construct the query graph, which represents the key portions of the query.
2. Annotate the graph to generate Skolem terms.
3. Traverse the graph and produce the query.

Output: the data exchange query (in SQL, XQuery, or XSLT)
1. Constructing the Query Graph

Add a node for each variable in the exists clause:

    y0 (organizations)    y1 (fundings)    y2 (finances)
1. Constructing the Query Graph (cont.)

Add nodes for all the atomic-type elements reachable from these nodes via record projection:

    y0 (organizations):  y0.code, y0.year
    y1 (fundings):       y1.fid, y1.finId
    y2 (finances):       y2.finId, y2.budget, y2.phone
1. Constructing the Query Graph (cont.)

Add structural edges to reflect the relationships between nodes.

[Figure: the query graph with structural edges added]
1. Constructing the Query Graph (cont.)

Add the source nodes for all source expressions in the with clause:

    x0.name    x1.gid    x1.amount    x2.phone
1. Constructing the Query Graph (cont.)

Attach the source nodes to the target nodes to which they are "equal":

    x0.name    to  y0.code
    x1.gid     to  y1.fid
    x1.amount  to  y2.budget
    x2.phone   to  y2.phone
1. Constructing the Query Graph (cont.)

Use the equalities in the where clause to add edges between target nodes:

    y1.finId = y2.finId
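The construction steps can be collected into one small Python function. This is a sketch under assumptions: the graph is a plain dict of node and edge collections, the parameter names are invented for the example, and the nesting edge from y0 to y1 is supplied by hand because Fundings is nested inside Organizations.

```python
def build_query_graph(exists_vars, fields, nesting, with_eqs, where_eqs):
    """Build the query graph: variable nodes, atomic nodes, source nodes, and three kinds of edges."""
    nodes = set(exists_vars)                 # one node per variable in the exists clause
    structural = list(nesting)               # parent-to-child edges between variables
    for var in exists_vars:
        for field in fields[var]:
            nodes.add(f"{var}.{field}")      # atomic elements reached by record projection
            structural.append((var, f"{var}.{field}"))
    attached = []
    for source_expr, target_node in with_eqs:
        nodes.add(source_expr)               # source nodes from the with clause
        attached.append((source_expr, target_node))
    equality = list(where_eqs)               # target-to-target edges from the where clause
    return {"nodes": nodes, "structural": structural,
            "attached": attached, "equality": equality}

graph = build_query_graph(
    exists_vars=["y0", "y1", "y2"],
    fields={"y0": ["code", "year"], "y1": ["fid", "finId"], "y2": ["finId", "budget", "phone"]},
    nesting=[("y0", "y1")],
    with_eqs=[("x0.name", "y0.code"), ("x1.gid", "y1.fid"),
              ("x1.amount", "y2.budget"), ("x2.phone", "y2.phone")],
    where_eqs=[("y1.finId", "y2.finId")],
)
print(sorted(graph["nodes"]))
```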
2. Annotating the Graph

Each node is annotated with a set of source expressions.

Upward propagation: every expression that a node acquires is propagated to its parent node, unless the (acquiring) node is a variable.

[Figure: query graph after upward propagation]
2. Annotating the Graph (cont.)

Downward propagation: every expression that a node acquires is propagated to its children.

[Figure: query graph after downward propagation]
2. Annotating the Graph (cont.)

Equality propagation: every expression that a node acquires is propagated to the nodes related to it through equality edges.

[Figure: query graph after equality propagation]
2. Annotating the Graph (cont.)

Apply the rules until no more rules can be applied.

[Figure: fully annotated query graph]
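The three propagation rules can be run to a fixpoint with a simple loop. The Python sketch below follows the rules as stated on the slides; the graph encoding (a child-to-parent dict, with y1 placed under y0 because Fundings is nested in Organizations), the seed annotations taken from the attached source nodes, and the `Sk_...` naming of the Skolem term are assumptions for illustration.

```python
def annotate(parent, variables, seeds, eq_edges):
    """Propagate source expressions upward, downward, and across equality edges until nothing changes."""
    children, neighbours = {}, {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    for a, b in eq_edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    nodes = set(parent) | set(parent.values()) | set(variables)
    ann = {n: set(seeds.get(n, ())) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            targets = children.get(n, []) + neighbours.get(n, [])  # downward and equality rules
            if n not in variables and n in parent:                 # upward rule, skipped for variables
                targets = targets + [parent[n]]
            for t in targets:
                if not ann[n] <= ann[t]:
                    ann[t] |= ann[n]
                    changed = True
    return ann

def skolem_term(node, ann):
    """Hypothetical rendering of a Skolem term whose arguments are the node's annotations."""
    return f"Sk_{node.replace('.', '_')}({', '.join(sorted(ann[node]))})"

parent = {"y0.code": "y0", "y0.year": "y0", "y1": "y0",
          "y1.fid": "y1", "y1.finId": "y1",
          "y2.finId": "y2", "y2.budget": "y2", "y2.phone": "y2"}
seeds = {"y0.code": {"x0.name"}, "y1.fid": {"x1.gid"},
         "y2.budget": {"x1.amount"}, "y2.phone": {"x2.phone"}}
ann = annotate(parent, {"y0", "y1", "y2"}, seeds, [("y1.finId", "y2.finId")])
print(sorted(ann["y0"]))
print(skolem_term("y1.finId", ann))
```

In this run y0 keeps only x0.name, because y1 is a variable and does not propagate upward, while y1.finId and y2.finId end up with all four source expressions through the equality edge.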
3. Generation of Transformation Queries

Generate the query fragment:




The for each clause is converted to a query fragment:




3. Generation of Transformation Queries (cont.)

Perform a depth-first traversal of the graph.

[Figure: the annotated query graph being traversed]
Finally we have the Query:




Clio: Conclusion
 Providing tools that help in automating and managing the
  problem of Data Conversion
 The key contributions of Clio:
     Schema mapping generation
       Mapping as a query discovery problem
       Capable of mapping between relational and nested schemas
     Query generation for data exchange
         SQL, XQuery, XSLT, generating Skolems,...




Thanks




Backups
• Clio requirements
• Complex mappings: using associations
• Definitions:
    • Mapping language
    • Paths
    • Schema and types
    • Dominance
• Query generation challenges; the problem of recursion in XML schema
• Nested Referential Integrity (NRI) constraints
• The chase



the Clio project: overview of the requirements
[Figure: a schema mapping relates source schema S to target schema T; the data conforms to S, and a query Q is posed against T]
• No assumptions about the schemas
• A general mapping language
• Mapping at different levels of granularity
• Incremental mapping algorithms
• Capable of mapping between relational schemas and nested schemas




Formalize correspondences
[Figure: source schema (Companies, Grants, Contacts) and target schema (Organizations with nested Fundings, Finances), connected by correspondences v1-v4 and foreign keys f1-f4]
Using tuple-generating dependencies (tgds):
    v1: ∀n,d,y Companies(n,d,y) → ∃y',F Organizations(n,y',F)
    v2: ∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f Organizations(n,y',F), F(g,f)
    v3: ∀g,r,a,s,m Grants(g,r,a,s,m) → ∃f,p Finances(f,a,p)
    v4: ∀c,e,p Contacts(c,e,p) → ∃f,b Finances(f,b,p)

Correspondences alone are not enough
• How should individual data values be connected in the target?
[Figure: source and target schemas with correspondences v1-v4 and foreign keys f1-f4, as in the motivating example]
Source instance:
    Companies
        Name    Address   Year
        MS      SA        1976
        AT&T    TX        1980
        IBM     NY        1955
    Grants
        GId     Rec.t     Amt
        301     MS        30
        302     MS        40
        303     IBM       30
Target instance (organization codes and funding ids are produced, but not connected to each other):
    Organizations
        Code    Year      Fundings
        MS
        AT&T
        IBM
    Fundings
        FId     FinId
        301
        302

More complex mappings are needed
• The "association" between companies and grants in the source is suggested by f1 (a foreign key)
    ∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f Organizations(n,y',F), F(g,f)
[Figure: source and target schemas with correspondences v1-v4 and foreign keys f1-f4, as in the motivating example]
Source instance:
    Companies
        Name    Address   Year
        MS      SA        1976
        AT&T    TX        1980
        IBM     NY        1955
    Grants
        GId     Rec.t     Amt
        301     MS        30
        302     MS        40
        303     IBM       30
Target instance (fundings nested under their organization):
    Organizations
        Code    Year      Fundings (FId)
        MS                301, 302
        AT&T
        IBM               303
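The effect of using the f1 association can be sketched in a few lines of Python. This is only an illustration on the slide's instance, not Clio's generated query: the dictionary layout, the function name `exchange`, and the use of `None` for the existentially quantified values (y' and f) are assumptions made for the example.

```python
# Source instance from the slide.
companies = [
    {"name": "MS", "address": "SA", "year": 1976},
    {"name": "AT&T", "address": "TX", "year": 1980},
    {"name": "IBM", "address": "NY", "year": 1955},
]
grants = [
    {"gid": 301, "recipient": "MS", "amount": 30},
    {"gid": 302, "recipient": "MS", "amount": 40},
    {"gid": 303, "recipient": "IBM", "amount": 30},
]


def exchange(companies, grants):
    """Apply v1 and the f1-based version of v2 (hypothetical helper)."""
    # v1: every company yields an organization whose code is the company name.
    # The target year is existentially quantified (y'), so it stays unknown.
    organizations = {
        c["name"]: {"code": c["name"], "year": None, "fundings": []}
        for c in companies
    }
    # v2: join companies and grants on f1 (grants.recipient = companies.name)
    # and nest each grant as a funding of the matching organization.
    for c in companies:
        for g in grants:
            if g["recipient"] == c["name"]:
                organizations[c["name"]]["fundings"].append(
                    {"fid": g["gid"], "finId": None}  # f is left unknown here
                )
    return organizations


target = exchange(companies, grants)
fundings_by_org = {
    code: [f["fid"] for f in org["fundings"]] for code, org in target.items()
}
print(fundings_by_org)
```

Grants 301 and 302 end up under MS and grant 303 under IBM, while AT&T keeps an empty Fundings set.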

Yet more complex...
    v3: ∀g,r,a,s,m Grants(g,r,a,s,m) → ∃f,p Finances(f,a,p)
Covering v1, v2 and v3 together:
    ∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f,p Organizations(n,y',F), F(g,f), Finances(f,a,p)
[Figure: source and target schemas with correspondences v1-v4 and foreign keys f1-f4, as in the motivating example]
• Three tuples are generated for each pair of related companies and grants
• The mapping specifies that there exists an f, appearing in two places, without saying what its value must be
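One way to give the shared f a concrete value is a Skolem term, in the spirit of the "generating Skolems" item in the conclusions. The sketch below is an assumption-laden illustration: the helper `skolem`, the term name `Sk_f`, and the choice of arguments (company name, grant id, amount) are hypothetical and not taken from the slides.

```python
companies = [{"name": "MS"}, {"name": "AT&T"}, {"name": "IBM"}]
grants = [
    {"gid": 301, "recipient": "MS", "amount": 30},
    {"gid": 302, "recipient": "MS", "amount": 40},
    {"gid": 303, "recipient": "IBM", "amount": 30},
]


def skolem(name, *args):
    """Build a Skolem term: equal arguments always give the same value."""
    return f"{name}({', '.join(map(str, args))})"


def exchange(companies, grants):
    """Evaluate the combined mapping for v1, v2 and v3 (hypothetical helper)."""
    organizations, finances = {}, []
    for c in companies:
        for g in grants:
            if g["recipient"] != c["name"]:
                continue
            # One value for the existential f, determined by the source values.
            fin_id = skolem("Sk_f", c["name"], g["gid"], g["amount"])
            org = organizations.setdefault(
                c["name"], {"code": c["name"], "fundings": []}
            )
            # The same fin_id appears in both places required by the mapping.
            org["fundings"].append({"fid": g["gid"], "finId": fin_id})
            finances.append({"finId": fin_id, "budget": g["amount"], "phone": None})
    return organizations, finances


organizations, finances = exchange(companies, grants)
for row in finances:
    print(row)
```

Because the term is a function of the source values, the funding and the finances tuple created for the same company and grant agree on finId, which keeps the target consistent with f4.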




Yet more complex...
    v4: ∀c,e,p Contacts(c,e,p) → ∃f,b Finances(f,b,p)
[Figure: source and target schemas with correspondences v1-v4 and foreign keys f1-f4, as in the motivating example]
• How do we obtain the phone to be put in finances?
    • Is it the supervisor's or the manager's?
• The foreign keys suggest either (or even both)
• Human intervention is needed to choose




The Mapping Language: Syntax
    foreach x1 in g1, . . . , xn in gn
        where B1
    exists y1 in g'1, . . . , ym in g'm
        where B2
    with e1 = e'1 and . . . and ek = e'k

• xi in gi is a generator: xi is a variable and gi is a set (either the root or a set nested within it)
• B1 is a conjunction of equalities over the xi variables
• B2 is a conjunction of equalities over the yi variables
• e1 = e'1, ... are equalities between a source expression and a target expression

The example:
    foreach c in companies, g in grants
        where c.name = g.recipient
    exists o in organizations, f in o.fundings, i in finances
        where f.finId = i.finId
    with o.code = c.name
        and f.fId = g.gId
        and i.budget = g.amount

Primary and Relative paths
• Primary path (given a schema root R, that is, a first-level element in the schema):
    x1 in g1, x2 in g2, …, xn in gn
        where g1 is an expression on R (just R?) and each gi (for i ≥ 2) is an expression on xi-1
    Examples:
        c in companies
        o in organizations, f in o.fundings
• Relative path with respect to a variable x:
    x1 in g1, x2 in g2, …, xn in gn
        where g1 is an expression on x and each gi (for i ≥ 2) is an expression on xi-1
    Example:
        f in o.fundings
Schema and types
• A schema: a sequence of labels (roots), each with an associated type, defined by a grammar with:
    • Atomic types
    • A set type (repeated elements)
    • Complex types (all and choice model-groups)
• An instance associates a value with each schema root:
    • A value for atomic types
    • setID
    • An unordered tuple of pairs
    • A pair

Correspondences




the data exchange problem




Query generation challenges
1. Creation of New Values in the Target
• Optional: null
    [Figure: target record with elements name, salary, spouse, dateofbirth]
• Not nullable: one-to-one Skolem function
    (the case if the element is an emp ID)




Query generation challenges
1. Creation of New Values in the Target

Referential constraints




Query generation challenges

2. Grouping Nested elements




Query generation challenges
3. Value Creation interacts with Grouping




Recursion in XML schema




the Chase
• Given an association, repeatedly apply a chase rule to the "current" association (initialized as the input one):
    • If there is an NRI constraint
          foreach X exists Y where B
      such that the "current" association contains X and does not contain a Y that satisfies B,
      then add Y to the generators and B to the where clause
• Example. If we start with
      from g in grants
  then we have to add various components and obtain
      from g in grants, c in companies,
           s in contacts, m in contacts
      where g.recipient = c.name and
            g.supervisor = s.cid and
            g.manager = m.cid
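The chase rule can be sketched in Python for the flat source schema of the example. This is a simplified illustration, not Clio's implementation: the `NRI` and `Association` classes, the `var_hint` field used to name new variables, and the restriction to constraints with a single generator on each side are assumptions. The loop is only guaranteed to stop when the constraints are not cyclic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NRI:
    """Simplified NRI: foreach x in src exists y in dst where x.src_attr = y.dst_attr."""
    src: str
    src_attr: str
    dst: str
    dst_attr: str
    var_hint: str  # preferred name for the variable added by the chase


@dataclass
class Association:
    generators: list                                # [(variable, set name)]
    conditions: list = field(default_factory=list)  # [(lhs, rhs)] equalities


def fresh(hint, used):
    """Return hint, or hint2, hint3, ... if the name is already taken."""
    name, n = hint, 1
    while name in used:
        n += 1
        name = f"{hint}{n}"
    return name


def chase(assoc, constraints):
    """Apply the chase rule until no constraint adds anything new."""
    changed = True
    while changed:
        changed = False
        for nri in constraints:
            for var, set_name in list(assoc.generators):
                if set_name != nri.src:
                    continue
                lhs = f"{var}.{nri.src_attr}"
                # Is there already a Y in the association that satisfies B?
                satisfied = any(
                    l == lhs and r == f"{y}.{nri.dst_attr}"
                    for l, r in assoc.conditions
                    for y, y_set in assoc.generators
                    if y_set == nri.dst
                )
                if not satisfied:
                    y = fresh(nri.var_hint, {v for v, _ in assoc.generators})
                    assoc.generators.append((y, nri.dst))
                    assoc.conditions.append((lhs, f"{y}.{nri.dst_attr}"))
                    changed = True
    return assoc


# Foreign keys f1, f2, f3 of the source schema.
SOURCE_NRIS = [
    NRI("grants", "recipient", "companies", "name", "c"),
    NRI("grants", "supervisor", "contacts", "cid", "s"),
    NRI("grants", "manager", "contacts", "cid", "m"),
]

result = chase(Association([("g", "grants")]), SOURCE_NRIS)
print(result.generators)
print(result.conditions)
```

Starting from "from g in grants", the sketch adds c, s and m with the three join conditions, which is the association shown in the example.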
Clio: Analysis and Conclusion
• Termination and complexity of the chase:
    • The chase with general dependencies may not terminate
        • Cyclic dependencies
    • NRIs: if they form a weakly acyclic set, the chase terminates
    • The number of chase steps is then polynomial
• Conclusion




Clio mapping
• A Clio mapping: foreach AS exists AT with E
    • AS, AT: logical associations (on source and target, respectively)
    • E: a conjunction of equalities:
        for each correspondence v in C covered by <AS, AT>, E includes the equality h(eS) = h'(eT) that results from the coverage, for one of the coverages




Structural Association
• Structural association:
    • from P (with P a primary path)
    • Starts from the root of the schema
[Figure: source schema (Companies, Grants, Contacts) and target schema (Organizations with nested Fundings, Finances), as in the motivating example]
Nested Referential Integrity (NRI) constraints
• The basis for the discovery of associations: NRIs capture relational foreign key and referential constraints as well as XML keyref constraints:
      foreach P1 exists P2 where B
    • P1 is a primary path (e.g. o in organizations, f in o.fundings)
    • P2 is a primary path or a relative path with respect to a variable in P1 (e.g. f in o.fundings)
    • B is a conjunction of equalities between an expression on a variable of P1 and an expression on a variable of P2
• Example (f4, from Fundings.FinId to Finances.FinId in the target schema):
      foreach o in organizations, f in o.fundings
      exists i in finances
          where f.finId = i.finId
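Read as a check on a target instance, the f4 constraint says that every funding must find a finances tuple with the same finId. A small Python sketch, assuming a nested-dictionary encoding of the target and made-up finId values (`k1`, `k2`, `k9`) that do not come from the slides:

```python
def violations_f4(organizations, finances):
    """Return (code, fId) for fundings with no matching finances tuple."""
    fin_ids = {i["finId"] for i in finances}
    return [
        (o["code"], f["fId"])
        for o in organizations       # foreach o in organizations,
        for f in o["fundings"]       #         f in o.fundings
        if f["finId"] not in fin_ids  # no i in finances with f.finId = i.finId
    ]


# Hypothetical target instance; the finId values are invented for the example.
organizations = [
    {"code": "MS", "fundings": [{"fId": 301, "finId": "k1"},
                                {"fId": 302, "finId": "k2"}]},
    {"code": "IBM", "fundings": [{"fId": 303, "finId": "k9"}]},
]
finances = [{"finId": "k1", "budget": 30}, {"finId": "k2", "budget": 40}]

print(violations_f4(organizations, finances))
```

Here funding 303 of IBM is reported, because no finances tuple carries its finId.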

Logical Association
• Logical association: semantic relationships between schema elements
    • Obtained by starting with a structural association and "chasing" NRI constraints




Logical Association: the Chase
• Start with a structural association
[Figure: source and target schemas with correspondences v1-v4 and foreign keys f1-f4, as in the motivating example]




Logical Association Relationships
• A2 dominates A1 (A1 ≤ A2) if the from and where clauses of A1 are subsets of those of A2 (after suitable renaming)

    A2: from g in grants, c in companies, s in contacts, m in contacts
        where g.recipient = c.name and
              g.supervisor = s.cid and
              g.manager = m.cid

    A1: from g in grants, c in companies
        where g.recipient = c.name
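The dominance test can be illustrated with a brute-force search for the renaming. In this Python sketch an association is a pair (generators, conditions); that encoding, the helper names, and the assumption that the renaming is one-to-one are choices made for the example, not details given on the slide.

```python
from itertools import permutations


def rename(expr, h):
    """Rename the variable of an expression such as 'g.recipient'."""
    var, attr = expr.split(".", 1)
    return f"{h[var]}.{attr}"


def dominates(a2, a1):
    """True if a1 ≤ a2: some renaming maps a1's from and where clauses into a2's."""
    gens1, conds1 = a1
    gens2, conds2 = a2
    set_of1, set_of2 = dict(gens1), dict(gens2)
    vars1 = list(set_of1)
    # Equalities are symmetric, so compare them as unordered pairs.
    conds2 = {frozenset(c) for c in conds2}
    for image in permutations(set_of2, len(vars1)):
        h = dict(zip(vars1, image))
        # Renamed generators must range over the same sets.
        if any(set_of1[v] != set_of2[h[v]] for v in vars1):
            continue
        renamed = {frozenset((rename(l, h), rename(r, h))) for l, r in conds1}
        if renamed <= conds2:
            return True
    return False


A2 = (
    [("g", "grants"), ("c", "companies"), ("s", "contacts"), ("m", "contacts")],
    [("g.recipient", "c.name"), ("g.supervisor", "s.cid"), ("g.manager", "m.cid")],
)
A1 = ([("g", "grants"), ("c", "companies")], [("g.recipient", "c.name")])

print(dominates(A2, A1), dominates(A1, A2))
```

For the two associations on the slide, A2 dominates A1 but not the other way round; using different variable names in A1 does not change the answer.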




Mapping Generation Algorithm
Inputs: S, T, correspondences
1. Generate all logical associations AS (source) and AT (target)
       Logical associations are meaningful combinations of correspondences
       Example: AS: from c in companies
                AT: from o in organizations
2. For each suitable pair <AS, AT>: find the correspondences covered by the pair, with some renaming <h, h'>, and check for dominance
       Which correspondences can be interpreted together?
3. Generate the Clio mapping: foreach AS exists AT with W
       W is the conjunction of the equalities h(eS) = h'(eT)
4. Add the Clio mapping to the set of mappings
Output: the set of schema mappings
       Example: M: foreach c in companies
                   exists o in organizations
                   with c.name = o.code
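The pairing step can be sketched in Python under strong simplifications. The associations are written by hand rather than produced by a chase, a correspondence counts as covered when both associations contain a generator over the set it mentions, and the dominance check is replaced by a crude stand-in that keeps only the first pair for each set of covered correspondences (so associations must be listed smallest first). All names and data layouts are assumptions for illustration.

```python
from itertools import product

# Hand-written logical associations, smallest first.
source_assocs = [
    {"from": [("c", "companies")], "where": []},
    {"from": [("g", "grants"), ("c", "companies")],
     "where": ["g.recipient = c.name"]},
]
target_assocs = [
    {"from": [("o", "organizations")], "where": []},
    {"from": [("o", "organizations"), ("f", "o.fundings"), ("i", "finances")],
     "where": ["f.finId = i.finId"]},
]
# (source set, source attribute, target set, target attribute)
correspondences = [
    ("companies", "name", "organizations", "code"),  # v1
    ("grants", "gid", "o.fundings", "fId"),          # v2
    ("grants", "amount", "finances", "budget"),      # v3
]


def var_for(assoc, set_name):
    """First variable of the association ranging over set_name, or None."""
    return next((v for v, s in assoc["from"] if s == set_name), None)


def generate_mappings(source_assocs, target_assocs, correspondences):
    mappings, seen = [], set()
    for a_s, a_t in product(source_assocs, target_assocs):
        equalities = []
        for s_set, s_attr, t_set, t_attr in correspondences:
            x, y = var_for(a_s, s_set), var_for(a_t, t_set)
            if x is not None and y is not None:  # correspondence is covered
                equalities.append(f"{x}.{s_attr} = {y}.{t_attr}")
        key = tuple(equalities)
        # Stand-in for the dominance check: skip pairs that cover nothing new.
        if equalities and key not in seen:
            seen.add(key)
            mappings.append({"foreach": a_s, "exists": a_t, "with": equalities})
    return mappings


def render(m):
    """Print a mapping in the foreach / exists / with syntax."""
    def clause(keyword, assoc):
        text = f"{keyword} " + ", ".join(f"{v} in {s}" for v, s in assoc["from"])
        if assoc["where"]:
            text += " where " + " and ".join(assoc["where"])
        return text
    return (clause("foreach", m["foreach"]) + " " + clause("exists", m["exists"])
            + " with " + " and ".join(m["with"]))


mappings = generate_mappings(source_assocs, target_assocs, correspondences)
for m in mappings:
    print(render(m))
```

On this input the sketch yields two mappings: the simple mapping M from the slide, and the larger mapping that covers v1, v2 and v3 together.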




Data Integration

  • 1. Clio: Schema Mapping Creation and Data Exchange Presented by Leila Jalali Information Systems Group Candidacy Exam, Jan. 2010
  • 2. the Clio project •Wants data from S •Understands T •May not understand S Q Source Schema Mapping Target schema T schema S “conforms to” “conforms to” Data Exchange data to transform data Clio addresses two main problems: How to generate schema mappings and how to use them for data exchange? exchange Information Systems Group Leila Jalali, Candidacy Exam
  • 3. Outline  The Motivating Example 2. Schema Mapping Generation  Mapping generation algorithm 2. Data Exchange  Query generation algorithm  Conclusions Information Systems Group Leila Jalali, Candidacy Exam
  • 4. A Motivating Example Schema S: Companies: Set of Rcd Schema T: Name v1 Organizations: Set of Rcd Address Code Year Year f1 Fundings: Set of Rcd Grants : Set of Rcd v2 FId Gid FinId Recipient f4 Amount Finances: Set of Rcd v3 Supervisor FinId f2 Manager Budget f3 Phone Contacts : Set of Rcd v4 Correspondences Cid (given by a "schema matcher“ or Email a“user”) Phone Information Systems Group Leila Jalali, Candidacy Exam
  • 5. Correspondences, expressed as tuple-generating dependencies (tgds). v1: ∀n,d,y Companies(n,d,y) → ∃y',F Organizations(n,y',F). In the mapping language: foreach c in companies exists o in organizations with o.code = c.name
  • 6. More complex mappings. ∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f,p Organizations(n,y',F), F(g,f), Finances(f,a,p). In the mapping language: foreach c in companies, g in grants where c.name = g.recipient exists o in organizations, f in o.fundings, i in finances where f.finId = i.finId with o.code = c.name and f.fId = g.gId and i.budget = g.amount
  • 7. More complex mappings (cont.). In the same mapping, the foreach ... where part is a query on the source (QS), the exists ... where part is a query on the target (QT), and the with clause carries the correspondences. The mapping is read as the containment QS ⊆ QT: the result of QT must contain the result of QS, both projected as in the with clause.
  • 8. Outline: The Motivating Example; 1. Schema Mapping Generation (mapping generation algorithm); 2. Data Exchange (query generation algorithm); Conclusions
  • 9. Mapping Generation. Structural associations: generate all possible associations within the source schema and within the target schema.
  • 10. Mapping Generation (cont.). Structural associations in the example: from p in companies and from g in grants (source); from o in organizations (target).
  • 11. Mapping Generation (cont.). Logical associations: build larger associations in the source (AS) and the target (AT).
  • 12. Mapping Generation (cont.). A logical association AS is built by starting with a structural association and "chasing" constraints (in the example, f1, f2 and f3 over Companies, Grants and Contacts).
  • 13. Mapping Generation (cont.). Use a pair <AS, AT> and the correspondences covered by <AS, AT> to generate a Clio mapping: foreach AS exists AT with W, where W is the conjunction of equalities h(eS) = h'(eT) captured from the correspondences.
  • 14. Clio mapping, example. AS: from g in grants, c in companies, s in contacts, m in contacts where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid. AT: from o in organizations, f in o.fundings, i in finances where f.finId = i.finId. v1, v2 and v3 are covered. Generated Clio mapping (foreach AS exists AT with W): foreach g in grants, c in companies, s in contacts, m in contacts where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid exists o in organizations, f in o.fundings, i in finances where f.finId = i.finId with c.name = o.code and g.gId = f.fId and g.amount = i.budget
  • 15. Dominance. A2 dominates A1 (A1 ≤ A2) if the from and where clauses of A1 are subsets of those of A2 (after suitable renaming). Example: A2: from g in grants, c in companies, s in contacts, m in contacts where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid; A1: from g in grants, c in companies where g.recipient = c.name
  • 16. Coverage of a correspondence. A correspondence v: foreach PS exists PT with eS = eT is covered by a pair of associations <AS, AT> if PS ≤ AS and PT ≤ AT with some renaming h, h'. Example: AS: from c in companies; AT: from o in organizations; v: foreach c in companies exists o in organizations with c.name = o.code
  • 17. Mapping Generation (cont.). Use a pair <AS, AT> and the correspondences covered by <AS, AT> to generate a Clio mapping: foreach AS exists AT with W, where W is the conjunction of equalities h(eS) = h'(eT) captured from the correspondences.
  • 18. Mapping Generation (cont.). Add the generated Clio mapping to the set of mappings.
  • 19. Logical associations are meaningful combinations of correspondences. Find maximal sets of correspondences that can be interpreted together. Discard the "larger" mapping. Generate a Clio mapping.
  • 20. Outline: The Motivating Example; 1. Schema Mapping Generation (mapping generation algorithm); 2. Data Exchange (query generation algorithm); Conclusions
  • 21. Query generation for data exchange. Diagram: mapping generation relates the source schema to the target schema; query generation follows.
  • 22. Overview of Query Generation. Input: a Clio mapping. 1. A query graph is constructed, which represents the key portions of the query. 2. The graph is annotated to generate Skolem terms. 3. The graph is traversed to produce the query. Output: the data exchange query (in SQL, XQuery, or XSLT). (Figure: annotated query graph over y0 (organizations) and y1 (fundings).)
  • 23. 1. Constructing the Query Graph. Add a node for each variable in the exists clause: y0 (organizations), y1 (fundings), y2 (finances).
  • 24. 1. Constructing the Query Graph (cont.). Add nodes for all the atomic-type elements reachable from these nodes via record projection: y0.code, y0.year, y1.fid, y1.finId, y2.finId, y2.budget, y2.phone.
  • 25. 1. Constructing the Query Graph (cont.). Add structural edges to reflect the relationships between nodes.
  • 26. 1. Constructing the Query Graph (cont.). Add the source nodes for all source expressions in the with clause: x0.name, x1.gid, x1.amount, x2.phone.
  • 27. 1. Constructing the Query Graph (cont.). Attach the source nodes to the target nodes to which they are "equal".
  • 28. 1. Constructing the Query Graph (cont.). Use the equalities in the where clause to add edges between target nodes.
  • 29. 2. Annotating the Graph. Each node is annotated with a set of source expressions. Upward propagation: every expression that a node acquires is propagated to its parent node, unless the (acquiring) node is a variable. (Figure: the query graph with annotations.)
  • 30. 2. Annotating the Graph (cont.). Downward propagation: every expression that a node acquires is propagated to its children. (Figure: the query graph with annotations.)
  • 31. 2. Annotating the Graph (cont.). Equality propagation: every expression that a node acquires is propagated to the nodes related to it through equality edges. (Figure: the query graph with annotations.)
  • 32. 2. Annotating the Graph (cont.). Apply the rules until no more rules can be applied. (Figure: the fully annotated query graph.)
  • 33. 3. Generation of Transformation Queries. Generate the query fragment: the foreach clause is converted to a query fragment. (Figure: the query fragment.)
  • 34. 3. Generation of Transformation Queries (cont.). Perform a depth-first traversal of the graph. (Figure: the annotated query graph.)
  • 35. 3. Generation of Transformation Queries (cont.). (Figure: the annotated query graph during the traversal.)
  • 36. Finally we have the query. (Figure: the generated query.)
  • 37. Clio: Conclusion. Clio provides tools that help automate and manage the problem of data conversion. Key contributions: schema mapping generation (mapping as a query discovery problem; capable of mapping between relational and nested schemas) and query generation for data exchange (SQL, XQuery, XSLT, generating Skolems, ...).
  • 38. Thanks Information Systems Group Candidacy Exam, Jan. 2010
  • 39. Backups: Clio requirements; complex mappings using associations; definitions (mapping language, paths, schema and types, dominance); query generation challenges and the problem of recursion in XML schemas; Nested Referential Integrity (NRI) constraints; the chase.
  • 40. The Clio project: overview of the requirements. No assumptions about the schemas; a general mapping language; mapping at different levels of granularity; incremental mapping algorithms; capable of mapping between relational schemas and nested schemas. (Diagram: query Q, source schema S and target schema T linked by a schema mapping.)
  • 41. Formalize correspondences, using tuple-generating dependencies (tgds). v1: ∀n,d,y Companies(n,d,y) → ∃y',F Organizations(n,y',F). v2: ∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f Organizations(n,y',F), F(g,f). v3: ∀g,r,a,s,m Grants(g,r,a,s,m) → ∃f,p Finances(f,a,p). v4: ∀c,e,p Contacts(c,e,p) → ∃f,b Finances(f,b,p).
  • 42. Correspondences alone are not enough: how should individual data values be connected in the target? Sample source instance: Companies (Name, Address, Year) = (MS, SA, 1976), (AT&T, TX, 1980), (IBM, NY, 1955); Grants (GId, Recipient, Amount) = (301, MS, 30), (302, MS, 40), (303, IBM, 30). Target: Organizations with codes MS, AT&T and IBM, and Fundings records (FId, FinId) that are not connected to any organization.
  • 43. More complex mappings are needed. The "association" between companies and grants in the source is suggested by f1 (a foreign key): ∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f Organizations(n,y',F), F(g,f). On the sample instance, fundings 301 and 302 are nested under organization MS and funding 303 under IBM; AT&T has no fundings.
  • 44. Yet more complex... v3: ∀g,r,a,s,m Grants(g,r,a,s,m) → ∃f,p Finances(f,a,p). Using f4: ∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f,p Organizations(n,y',F), F(g,f), Finances(f,a,p). Three tuples are generated for each pair of related companies and grants. The mapping specifies that there exists an f, appearing in two places, without saying what its value must be.
  • 45. Yet more complex... v4: ∀c,e,p Contacts(c,e,p) → ∃f,b Finances(f,b,p). How do we obtain the phone to be put in finances? Is it the supervisor's or the manager's? The foreign keys suggest either (or even both). Human intervention is needed to choose.
  • 46. The Mapping Language: Syntax. foreach x1 in g1, ..., xn in gn where B1 exists y1 in g'1, ..., ym in g'm where B2 with e1 = e'1 and ... and ek = e'k. Each xi in gi is a generator: xi is a variable and gi is a set (either the root or a set nested within it). B1 is a conjunction of equalities over the xi variables. Each ei = e'i is an equality between a source expression and a target expression. Example: foreach c in companies, g in grants where c.name = g.recipient exists o in organizations, f in o.fundings, i in finances where f.finId = i.finId with o.code = c.name and f.fId = g.gId and i.budget = g.amount
  • 47. Primary and relative paths. Primary path (given a schema root R, that is, a first-level element in the schema): x1 in g1, x2 in g2, ..., xn in gn, where g1 is an expression on R (just R?) and gi (for i ≥ 2) is an expression on xi-1. Examples: c in companies; o in organizations, f in o.fundings. Relative path with respect to a variable x: x1 in g1, x2 in g2, ..., xn in gn, where g1 is an expression on x and gi (for i ≥ 2) is an expression on xi-1. Example: f in o.fundings
  • 48. Schema and types. A schema is a sequence of labels (roots), each with an associated type defined by a grammar: complex types, atomic types, set types, all and choice model groups, repeated elements. (Figure: the type grammar.) An instance associates a value with each schema root: a value for atomic types, a setID, an unordered tuple of pairs, a pair.
  • 49. Correspondences
  • 50. The data exchange problem
  • 51. Query generation challenges. 1. Creation of new values in the target. Optional elements: null (example elements: name, salary, spouse, dateofbirth). Not nullable, for example an employee ID: one-to-one Skolem function.
  • 52. Query generation challenges. 1. Creation of new values in the target: referential constraints
  • 53. Query generation challenges. 2. Grouping nested elements
  • 54. Query generation challenges. 3. Value creation interacts with grouping
  • 55. Recursion in XML schema
  • 56. The Chase. Given an association, repeatedly apply a chase rule to the "current" association (initialized as the input one). If there is an NRI constraint foreach X exists Y where B such that the "current" association contains X and does not contain a Y that satisfies B, then add Y to the generators and B to the where clause. Example: if we start with from g in grants, then we have to add various components and obtain from g in grants, c in companies, s in contacts, m in contacts where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
  • 57. Clio: Analysis and Conclusion. Termination and complexity of the chase: the chase with general dependencies may not terminate (cyclic dependencies). NRIs: for a weakly acyclic set, the number of chase steps is polynomial. Conclusion
  • 58. Clio mapping. A Clio mapping: foreach AS exists AT with E. AS, AT: logical associations (on source and target, respectively). E is a conjunction of equalities: for each correspondence v in C covered by <AS, AT>, E includes the equality h(eS) = h'(eT) that results from the coverage, for one of the coverages.
  • 59. Structural Association. A structural association has the form from P, with P a primary path; it starts from the root of the schema. (Figure: the example schemas.)
  • 60. Nested Referential Integrity (NRI) constraints. The basis for the discovery of associations: NRIs capture relational foreign key and referential constraints as well as XML keyref constraints. Form: foreach P1 exists P2 where B, where P1 is a primary path (for example o in organizations, f in o.fundings), P2 is a primary path or a relative path with respect to a variable in P1 (for example f in o.fundings), and B is a conjunction of equalities between an expression on a variable of P1 and an expression on a variable of P2. Example (f4): foreach o in organizations, f in o.fundings exists i in finances where f.finId = i.finId
  • 61. Logical Association. A logical association captures semantic relationships between schema elements. It is obtained by starting with a structural association and "chasing" NRI constraints.
  • 62. Logical Association: the Chase. Start with a structural association. (Figure: the example schemas with constraints f1 to f4 and correspondences v1 to v4.)
  • 63. Logical Association Relationships. A2 dominates A1 (A1 ≤ A2) if the from and where clauses of A1 are subsets of those of A2 (after suitable renaming). Example: A2: from g in grants, c in companies, s in contacts, m in contacts where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid; A1: from g in grants, c in companies where g.recipient = c.name
  • 64. Mapping Generation Algorithm. Inputs: S, T, correspondences. Generate all logical associations AS, AT (logical associations are meaningful combinations of correspondences: which correspondences can be interpreted together?). For each suitable pair <AS, AT>: find the correspondences covered by the pair with some renaming <h, h'>; check for dominance; generate the Clio mapping foreach AS exists AT with W, where W is the conjunction of equalities h(eS) = h'(eT); add the Clio mapping to the set of mappings. Output: the set of schema mappings. Example: AS: from c in companies; AT: from o in organizations; M: foreach c in companies exists o in organizations with c.name = o.code
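
The chase and dominance steps described in the slides can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions, not Clio's implementation: constraints are restricted to flat foreign keys (no nested paths), variable renaming is ignored in the dominance check, and the names `Constraint`, `Association`, `chase`, `dominates` and the `fresh_var` field are all hypothetical.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Constraint(NamedTuple):
    """Flat NRI (hypothetical encoding): src_set.src_attr references tgt_set.tgt_attr."""
    src_set: str
    src_attr: str
    tgt_set: str
    tgt_attr: str
    fresh_var: str  # preferred name for the variable the chase introduces


@dataclass(frozen=True)
class Association:
    generators: frozenset  # of (variable, set name) pairs: the from clause
    where: frozenset       # of (lhs, rhs) equalities: the where clause


def dominates(a2, a1):
    """A1 <= A2: from/where clauses of A1 are subsets of those of A2.

    Assumption: both associations already use the same variable names,
    so the "suitable renaming" from the slides is skipped.
    """
    return a1.generators <= a2.generators and a1.where <= a2.where


def chase(assoc, constraints):
    """Apply constraints until none adds a new generator."""
    gens = dict(assoc.generators)
    where = set(assoc.where)
    changed = True
    while changed:
        changed = False
        for var, set_name in list(gens.items()):
            for c in constraints:
                lhs = f"{var}.{c.src_attr}"
                # Skip constraints on other sets, or ones already satisfied.
                if c.src_set != set_name or any(l == lhs for l, _ in where):
                    continue
                new_var, n = c.fresh_var, 1
                while new_var in gens:  # keep variable names unique
                    new_var, n = f"{c.fresh_var}{n}", n + 1
                gens[new_var] = c.tgt_set
                where.add((lhs, f"{new_var}.{c.tgt_attr}"))
                changed = True
    return Association(frozenset(gens.items()), frozenset(where))


# f1, f2, f3 from the motivating example.
f1 = Constraint("grants", "recipient", "companies", "name", "c")
f2 = Constraint("grants", "supervisor", "contacts", "cid", "s")
f3 = Constraint("grants", "manager", "contacts", "cid", "m")

start = Association(frozenset({("g", "grants")}), frozenset())
logical = chase(start, [f1, f2, f3])
smaller = Association(
    frozenset({("g", "grants"), ("c", "companies")}),
    frozenset({("g.recipient", "c.name")}),
)
print(sorted(logical.generators), sorted(logical.where))
print(dominates(logical, smaller))
```

Starting from `from g in grants`, the sketch reproduces the association from the chase slide, and `dominates(logical, smaller)` corresponds to the A1 ≤ A2 example. The loop terminates here because companies and contacts have no outgoing constraints; a cyclic constraint set would need the weak-acyclicity condition mentioned in the analysis slide.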

Editor's notes

  1. Clio provides tools that help automate and manage the problem of data conversion through the use of schema mappings (specifications that describe the relationship between data in two different schemas), in order to transform data between two different representations. Schema mappings are used to generate either a view that reformulates queries (data integration) or code that transforms data (data exchange).
  2. Contributions of the paper
  3. Information about companies and grants. A nested relational representation can present both relational and XML schemas. Schema S is a relational schema with three tables: companies, grants and contacts. A grant has a grant identifier, a recipient (the name of the company that receives the grant) and an amount. The green lines are referential constraints: foreign keys or dependencies. The target is an XML schema: the fundings that an organization receives are nested within the organization record. The dashed arrows are correspondences, the relationships between the schemas; they may be given by a schema matcher, or we can ask the user to draw these lines. v1: the company name in the first schema corresponds to the organization code in the second schema. Why is there no line between the year elements? They are two different concepts: the year the company was founded versus the year of its initial public offering. The approach does not depend on how these correspondences are created, but it takes into account that matchings may be incomplete and sometimes incorrect. For simplicity, assume these four correspondences are correct.
  4. A correspondence can be formally expressed as a tuple-generating dependency (tgd) using shared variables: for each company there must be an organization whose code is the same as the company's name. All the shared variables are underlined.
  5. foreach xi in gi (generator): xi is a variable and gi is a set (either the root or a set nested within it). where B1: a conjunction of equalities over the xi variables. with e1 = e'1 ...: equalities between a source expression and a target expression. The mapping is a source-to-target constraint: "the result of QT (over the target, projected as in the with clause) must contain the result of QS (over the source, projected as in the with clause)".
  6. foreach xi in gi (generator): xi is a variable and gi is a set (either the root or a set nested within it). where B1: a conjunction of equalities over the xi variables. with e1 = e'1 ...: equalities between a source expression and a target expression. The mapping is a source-to-target constraint: "the result of QT (over the target, projected as in the with clause) must contain the result of QS (over the source, projected as in the with clause)".
  7. Contributions of the paper
  8. Logical association: an association obtained by "chasing" constraints (starting with a structural or a user association). Logical associations are meaningful combinations of correspondences. A set of correspondences can be interpreted together if there are two logical associations (one in the source and one in the target) that cover them.
  9. Logical association: an association obtained by "chasing" constraints (starting with a structural or a user association). Logical associations are meaningful combinations of correspondences. A set of correspondences can be interpreted together if there are two logical associations (one in the source and one in the target) that cover them.
  10. Logical association: an association obtained by "chasing" constraints (starting with a structural or a user association). Logical associations are meaningful combinations of correspondences. A set of correspondences can be interpreted together if there are two logical associations (one in the source and one in the target) that cover them.
  11. Logical association: an association obtained by "chasing" constraints (starting with a structural or a user association). Logical associations are meaningful combinations of correspondences. A set of correspondences can be interpreted together if there are two logical associations (one in the source and one in the target) that cover them.
  12. Logical association: an association obtained by "chasing" constraints (starting with a structural or a user association). Logical associations are meaningful combinations of correspondences. A set of correspondences can be interpreted together if there are two logical associations (one in the source and one in the target) that cover them.
  13. ∀n,d,y Companies(n,d,y) → ∃y',F Organizations(n,y',F). ∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f Organizations(n,y',F), F(g,f). ∀g,r,a,s,m Grants(g,r,a,s,m) → ∃f,p Finances(f,a,p). ∀c,e,p Contacts(c,e,p) → ∃f,b Finances(f,b,p).
  14. Logical association: an association obtained by "chasing" constraints (starting with a structural or a user association). Logical associations are meaningful combinations of correspondences. A set of correspondences can be interpreted together if there are two logical associations (one in the source and one in the target) that cover them.
  15. Logical association: an association obtained by "chasing" constraints (starting with a structural or a user association). Logical associations are meaningful combinations of correspondences. A set of correspondences can be interpreted together if there are two logical associations (one in the source and one in the target) that cover them.
  16. A set of correspondences can be interpreted together if there are two logical associations (one in the source and one in the target) that cover them
  17. Contributions of the paper
  18. The schema mapping specifies how the data of the two schemas relate to each other. For data exchange, an instance of the source schema must be transformed into an instance of the target schema. Note that the schema mapping might not specify all the target values, and may not specify the grouping or nesting semantics for the target data.
  19. When one schema is XML, Clio can generate a data exchange query in XQuery or XSLT. The paper describes how to generate XQuery; SQL generation is similar, without nested elements.
  20. Obvious relationships
  21. Obvious relationships
  22. Obvious relationships
  23. finally
  24. The annotation facilitates the generation of Skolem functions. These source elements will be the arguments of the potential Skolem functions.
  25. Every expression that a node acquires is propagated to its children if they do not already have it and if they are not equal to any of the source nodes. The annotation facilitates the generation of Skolem functions. These source elements will be the arguments of the potential Skolem functions.
  26. The annotation facilitates the generation of Skolem functions. These source elements will be the arguments of the potential Skolem functions.
  27. The annotation facilitates the generation of Skolem functions. These source elements will be the arguments of the potential Skolem functions.
  28. This step is straightforward: Clio binds one variable to each term and adds the conditions in the where clause. The fragment is denoted Q S M1. It is not the complete query, because it does not produce a result yet. It will be used repeatedly in the larger query.
  29. The fragment will be used repeatedly in the larger query. The traversal starts at the target schema root in the query graph and proceeds depth first. If a node is a complex-type element (like y1, fundings), the element is generated by visiting its children. If the node is of atomic type and is linked to a source node (like y1.fid), a simple element is created with a value equal to the source. Otherwise, if it is an optional element, nothing is generated; if it is a nullable element, a null value is generated; else (like y1.finId) a value is generated using a new Skolem function whose arguments are all the expressions that annotate the node (taking care that all the nodes equal to this node receive the same Skolem function name). If the node is a variable, a for-where-return query is produced: copy Q S M1 (the query fragment), rename all its variables, and compare the node's annotation with that of its parent variable; for each common expression, the subquery is correlated with the parent on that expression.
  30. If the node is a variable, a for-where-return query is produced: copy Q S M1 (the query fragment), rename all its variables, and compare the node's annotation with that of its parent variable; for each common expression, the subquery is correlated with the parent on that expression.
  31. The fragment will be used repeatedly in the larger query. The traversal starts at the target schema root in the query graph and proceeds depth first.
  32. The path in an NRI requires matchings, to determine the variables in the path. Matching is exponential in the size of the path, which is often small, and some matchings are not possible because of schema restrictions. A chase step can therefore take exponential time in the worst case, since there can be multiple ways of matching a variable in a path.
  33. Providing tools that help in automating and managing the problem of Data Conversion Makes no assumption about the schemas, their relationships or how they were created The mapping language is more general than TSIMMIS, Information Manifold Able to map between relational schemas and nested schemas Mapping at different levels of granularities: fine grained mappings such as translating the salary in francs to dollars, boarder concept (documents from one schema to the other schema) Incremental mapping algorithms: sometimes the complete mapping is not the goal (we want a single concept to be mapped) or we have partial knowledge of the schemas so we want to support incomplete mappings as well
  34. Correspondence can be formally expressed using tuple generating dependency(tgd) Using shared variables: for each company there must be an organization whose code is the same as companies.name All the shared variables are underlined
  35. Correspondences alone do not specify how individual values should be connected in the target. For example, fundings is nested inside organizations, which means there is a semantic association between them. To know about the association in the target, we should look for an association between organization information and funding information in the source. One such association is f1: each grant is associated with a company. Thus, in the target we can associate a set of fundings with each organization. The algorithm uses logical inference to find all associations represented by referential constraints and by a schema's relational and nesting structure.
  36. F is a set identifier: the set of fundings that an organizations tuple has. This mapping tells us what must be true in the target if a pattern occurs in the source data: if we join a grant and a company, there must be an organization with the company's name as its code and, inside it, a funding whose fid equals the grant's gid.
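One way to read this mapping operationally is as a small transformation that builds a target instance satisfying it. The sketch below is an assumption-laden illustration (the `exchange` function and the sample data are hypothetical), not the query Clio generates.

```python
from collections import defaultdict

# Hypothetical source instance
companies = [{"name": "Acme"}, {"name": "Globex"}]
grants = [
    {"gid": "g1", "recipient": "Acme", "amount": 100},
    {"gid": "g2", "recipient": "Acme", "amount": 200},
]


def exchange(companies, grants):
    """Build organizations with nested fundings from the grant/company join."""
    names = {c["name"] for c in companies}
    fundings = defaultdict(list)              # plays the role of the set F, one per code
    for g in grants:
        if g["recipient"] in names:           # join condition g.recipient = c.name
            fundings[g["recipient"]].append({"fid": g["gid"]})
    return [{"code": code, "fundings": fs} for code, fs in sorted(fundings.items())]


target = exchange(companies, grants)
print(target)
```

Only companies that join with at least one grant produce an organization here, because this mapping speaks only about the joined pattern.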
  37. v3 does not recognize that grant amounts are associated with specific gids. Using f4, a better mapping would be this one.
  38. To complete our example, consider v4. There are two ways to associate the grant amount (budget) with the phone: using f2 (the supervisor's phone) or f3 (the manager's phone).
  39. Consider this simple mapping. An employee record in the source has atomic elements A, B, and C; the employee record in the target has A’, B’, C’, and an extra element E’. A and B are mapped to A’ and B’, but C’ and E’ are left unmapped. What should the values of C’ and E’ be? (1) When neither is used in a schema constraint, creating a null value is sufficient. (2) If E’ is a key in the target, it is neither nullable nor optional (like an employee id), so its values are created using a one-to-one Skolem function; E’ depends only on A and B, not on C.
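The two cases can be sketched as follows. The helper names `sk_e` and `map_employee` are hypothetical, and encoding the Skolem value as a term string is just one simple way to obtain a deterministic, one-to-one function of (A, B).

```python
def sk_e(a, b):
    """Hypothetical one-to-one Skolem function: a term built from A and B only."""
    return f"Sk_E({a!r}, {b!r})"


def map_employee(src, e_is_key):
    """Map a source employee record to the target record."""
    tgt = {"A'": src["A"], "B'": src["B"], "C'": None}   # C' is unmapped: null
    # Case 1: E' is unconstrained, so null suffices.
    # Case 2: E' is a key, so a value is invented from A and B.
    tgt["E'"] = sk_e(src["A"], src["B"]) if e_is_key else None
    return tgt


first = map_employee({"A": 1, "B": 2, "C": 3}, e_is_key=True)
second = map_employee({"A": 1, "B": 2, "C": 9}, e_is_key=True)
plain = map_employee({"A": 1, "B": 2, "C": 3}, e_is_key=False)
print(first["E'"], second["E'"], plain["E'"])
```

Because C is not an argument, two source records that agree on A and B receive the same E’ value.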
  40. E’: see the reference, page 224.
  41. The target schema contains two levels.
  42. One reason for using XSLT is that there is no efficient, robust implementation of XQuery today. I give the size of the largest schemas and some idea of compilation/interpretation times.
  44. Primary path (given a schema root R, that is, a first-level element in the schema): x_1 in g_1, x_2 in g_2, …, x_n in g_n, where g_1 is an expression on R (just R?) and g_i (for i ≥ 2) is an expression on x_{i-1}. Examples: c in companies; o in organizations, f in o.fundings. Relative path with respect to a variable x: x_1 in g_1, x_2 in g_2, …, x_n in g_n, where g_1 is an expression on x and g_i (for i ≥ 2) is an expression on x_{i-1}. Example: f in o.fundings. Given an association, the chase rule is applied repeatedly to the "current" association (initialized to the input one): if there is an NRI constraint foreach X exists Y where B such that the "current" association contains X and does not contain a Y that satisfies B, then add Y to the generators and B to the where clause. Example: if we start with from g in grants, we have to add various components and obtain from g in grants, c in companies, s in contacts, m in contacts where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid.
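The chase step described above can be sketched for the flat (non-nested) case. This is a simplified illustration under assumptions: constraints are encoded as hypothetical tuples (source set, attribute, referenced set, attribute), fresh variable names are generated mechanically, and the constraints are assumed to be acyclic so the loop terminates.

```python
def chase(generators, conditions, constraints):
    """Extend an association until every applicable constraint is satisfied."""
    gens = dict(generators)    # variable -> set it ranges over
    conds = list(conditions)   # (var, attr, var2, attr2) equalities in the where clause
    fresh = 0
    changed = True
    while changed:
        changed = False
        for var, rel in list(gens.items()):
            for src_rel, src_attr, dst_rel, dst_attr in constraints:
                if rel != src_rel:
                    continue
                satisfied = any(
                    c[0] == var and c[1] == src_attr
                    and gens.get(c[2]) == dst_rel and c[3] == dst_attr
                    for c in conds
                )
                if not satisfied:
                    fresh += 1
                    new_var = f"{dst_rel[0]}{fresh}"     # e.g. c1, c2, c3
                    gens[new_var] = dst_rel              # add Y to the generators
                    conds.append((var, src_attr, new_var, dst_attr))  # add B
                    changed = True
    return gens, conds


# Constraints assumed from the example: grants reference companies and contacts.
constraints = [
    ("grants", "recipient", "companies", "name"),
    ("grants", "supervisor", "contacts", "cid"),
    ("grants", "manager", "contacts", "cid"),
]
gens, conds = chase({"g": "grants"}, [], constraints)
print(gens)
print(conds)
```

Starting from the single generator g in grants, the sketch adds one companies variable and two contacts variables, mirroring the association in the example.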
  45. NRIs capture relational foreign key and referential constraints as well as XML keyref constraints. Referential integrity is essential in this approach as the basis for the discovery of "associations". Given the nested model, they need a rather complex definition. Primary path (given a schema root R, that is, a first-level element in the schema): x_1 in g_1, x_2 in g_2, …, x_n in g_n, where g_1 is an expression on R (just R?) and g_i (for i ≥ 2) is an expression on x_{i-1}. Examples: c in companies; o in organizations, f in o.fundings. Relative path with respect to a variable x: x_1 in g_1, x_2 in g_2, …, x_n in g_n, where g_1 is an expression on x and g_i (for i ≥ 2) is an expression on x_{i-1}. Example: f in o.fundings.
  47. Logical association: an association obtained by "chasing" constraints (starting with a structural or a user association). Logical associations are meaningful combinations of correspondences. A set of correspondences can be interpreted together if there are two logical associations (one in the source and one in the target) that cover them.
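The covering test can be sketched as a simple membership check. The encoding below is an assumption for illustration: each association is reduced to the set of schema elements it ranges over, and each correspondence to the pair of sets its endpoints belong to; the correspondence names are hypothetical.

```python
def covered(correspondences, source_assoc, target_assoc):
    """Names of correspondences whose two endpoints lie in the given pair."""
    return [
        name
        for name, (src_set, tgt_set) in correspondences.items()
        if src_set in source_assoc and tgt_set in target_assoc
    ]


correspondences = {
    "name_to_code": ("companies", "organizations"),
    "gid_to_fid": ("grants", "organizations.fundings"),
}
small = covered(correspondences, {"companies"}, {"organizations"})
large = covered(
    correspondences,
    {"grants", "companies"},
    {"organizations", "organizations.fundings"},
)
print(small, large)
```

The larger pair of associations covers both correspondences, so they can be interpreted together in a single mapping.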