ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

Perm
Processing Provenance and Data on the Same Data Model through Query Rewriting
Boris Glavic
Database Technology Group
Department of Informatics
University of Zurich
glavic@ifi.uzh.ch
Gustavo Alonso
Systems Group
Department of Computer Science
ETH Zurich
alonso@inf.ethz.ch

Overview
1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Experimental Results
6. Conclusion

1. Introduction
- Relational provenance
  - Transformation: query
  - Data items: base relations (input) and result relation (output)

1. Introduction
- Which input data item(s) influenced which output data item(s)?
- Granularity
  - Tuple
  - Attribute value
  - ...
- Contribution semantics
  - Influence (Why)
  - Copy (Where)
  - ...

1. Introduction
- The problem of computing this type of provenance has been solved before
  - See e.g. [Cui, Widom ICDE '00]
- but...
  - Non-relational representation of provenance data
  - Separation of provenance and "normal" data
  - Non-relational computation of provenance data

1. Introduction
- Perm
  - Provenance Extension of the Relational Model
  - Provenance Management System
- "Pure" relational representation of provenance
  - Query result tuples and provenance tuples are represented as a single relation

1. Introduction
- Benefits: Provenance can be...
  - ... stored in a standard DBMS
  - ... queried using SQL
  - ... directly interpreted by a user
- Direct association between provenance and "normal" data

1. Introduction
- Provenance computation
  - -> Use query rewriting
  - Given a query q
  - Generate a query q+
  - q+ computes the provenance of all result tuples of q

1. Introduction
- Benefits:
  - The rewritten query is expressed in relational algebra
  - It can be optimized and executed by an RDBMS
  - E.g. it can be stored as a view
  - ... or used as a subquery


2. The Perm Approach
sales
sName   itemId
Migros  1
Migros  2
Migros  2
Coop    3
Coop    3

items
id  price
1   100
2   10
3   25

- Compute the sum of sales for each shop
SELECT sName, sum(price)
FROM sales, items
WHERE itemId = id
GROUP BY sName;
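
The query can be tried out on the example relations. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a full DBMS (an assumption made only for this illustration; Perm itself is not involved here).

```python
import sqlite3

# In-memory database holding the two example relations from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sName TEXT, itemId INTEGER);
    CREATE TABLE items (id INTEGER, price INTEGER);
    INSERT INTO sales VALUES ('Migros', 1), ('Migros', 2), ('Migros', 2),
                             ('Coop', 3), ('Coop', 3);
    INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# The query from the slide: sum of sales for each shop.
QUERY = """
    SELECT sName, sum(price)
    FROM sales, items
    WHERE itemId = id
    GROUP BY sName
"""

# Collect the (shop, sum) pairs into a dictionary for easy inspection.
totals = dict(conn.execute(QUERY).fetchall())
print(totals)
```

The dictionary maps each shop name to its total, which corresponds to the result relation of the query.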

result
name    sum(price)
Migros  120
Coop    50


- Desired result format:
  Original attributes | Relation 1 attributes | ... | Relation n attributes

Original result     | sales               | items
name    sum(price)  | P(sName)  P(itemId) | P(id)  P(price)
Migros  120         | Migros    1         | 1      100
Migros  120         | Migros    2         | 2      10
Migros  120         | Migros    2         | 2      10
Coop    50          | Coop      3         | 3      25
Coop    50          | Coop      3         | 3      25
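
A result in this format can be produced with plain SQL by joining the aggregation result back to the tuples it was computed from. The sketch below is a hand-written illustration, not output of Perm: it runs on SQLite through Python's `sqlite3` module, and the column names `p_sName`, `p_itemId`, `p_id`, `p_price` as well as the aliases `agg`, `prov`, and `g` are hypothetical stand-ins for the `P(...)` attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sName TEXT, itemId INTEGER);
    CREATE TABLE items (id INTEGER, price INTEGER);
    INSERT INTO sales VALUES ('Migros', 1), ('Migros', 2), ('Migros', 2),
                             ('Coop', 3), ('Coop', 3);
    INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# Hand-written provenance query: the original aggregation (agg) is joined
# with the joined input tuples (prov) on the grouping attribute, so every
# result tuple is paired with each sales/items tuple pair it was built from.
PROVENANCE_QUERY = """
    SELECT agg.name, agg.total,
           prov.p_sName, prov.p_itemId, prov.p_id, prov.p_price
    FROM (SELECT sName AS name, sum(price) AS total
          FROM sales, items
          WHERE itemId = id
          GROUP BY sName) AS agg
    LEFT OUTER JOIN
         (SELECT sName AS g,
                 sName AS p_sName, itemId AS p_itemId,
                 id AS p_id, price AS p_price
          FROM sales, items
          WHERE itemId = id) AS prov
    ON agg.name = prov.g
"""

# Sort only to obtain a stable order for printing.
rows = sorted(conn.execute(PROVENANCE_QUERY).fetchall())
for row in rows:
    print(row)
```

Because the relations are bags, the duplicate `(Migros, 2)` and `(Coop, 3)` sales tuples each yield their own provenance row.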


3. Query Rewriting for Provenance Computation
- Rewrite method basics
  - Use the algebra representation of the query
  - Replace every algebra operator with an algebra statement that propagates provenance alongside the original results
  - -> need a rewrite rule for each relational algebra operator
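
The idea of one rewrite rule per operator can be sketched as a recursive function over a small operator tree. Everything in the following block is a simplified illustration under stated assumptions: the classes `Table`, `Select`, and `Aggregate`, the functions `to_sql` and `rewrite`, and the `p_<table>_<attribute>` naming scheme are hypothetical, the rules are reduced to three operators, and SQL strings executed on SQLite stand in for Perm's internal algebra.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Table:            # access to a base relation
    name: str
    attrs: list


@dataclass
class Select:           # selection (filter) over one input
    child: object
    condition: str


@dataclass
class Aggregate:        # grouping on one attribute plus one aggregate
    child: object
    group_by: str
    agg_expr: str
    agg_name: str


def to_sql(op):
    """SQL text of the original query, without provenance."""
    if isinstance(op, Table):
        return f"SELECT {', '.join(op.attrs)} FROM {op.name}"
    if isinstance(op, Select):
        return f"SELECT * FROM ({to_sql(op.child)}) WHERE {op.condition}"
    if isinstance(op, Aggregate):
        return (f"SELECT {op.group_by}, {op.agg_expr} AS {op.agg_name} "
                f"FROM ({to_sql(op.child)}) GROUP BY {op.group_by}")
    raise TypeError(op)


def rewrite(op):
    """Return (sql, provenance_attrs) of the provenance-propagating query."""
    if isinstance(op, Table):
        # Rule for base relations: duplicate every attribute.
        prov = [f"p_{op.name}_{a}" for a in op.attrs]
        cols = op.attrs + [f"{a} AS {p}" for a, p in zip(op.attrs, prov)]
        return f"SELECT {', '.join(cols)} FROM {op.name}", prov
    if isinstance(op, Select):
        # Rule for selection: filter the rewritten input unchanged.
        sql, prov = rewrite(op.child)
        return f"SELECT * FROM ({sql}) WHERE {op.condition}", prov
    if isinstance(op, Aggregate):
        # Rule for aggregation: join the original aggregate with the
        # rewritten input on the grouping attribute.
        sql, prov = rewrite(op.child)
        g, p = op.group_by, ", ".join(prov)
        return (f"SELECT agg.{g} AS {g}, agg.{op.agg_name} AS {op.agg_name}, {p} "
                f"FROM ({to_sql(op)}) AS agg LEFT OUTER JOIN "
                f"(SELECT {g} AS g_prov, {p} FROM ({sql})) AS prov "
                f"ON agg.{g} = prov.g_prov"), prov
    raise TypeError(op)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES ('Migros', 'Jan', 100), ('Migros', 'Feb', 10),
                             ('Migros', 'Mar', 10), ('Coop', 'Jan', 25),
                             ('Coop', 'Feb', 25);
""")

# Illustrative query: sum of revenue per shop over months with revenue >= 20.
query = Aggregate(Select(Table("sales", ["shop", "month", "revenue"]),
                         "revenue >= 20"),
                  "shop", "sum(revenue)", "total")
sql, prov_attrs = rewrite(query)
rows = sorted(conn.execute(sql).fetchall())
```

Each branch of `rewrite` plays the role of one rewrite rule; a fuller version would need further branches for joins, projections, set operations, and so on.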

- Rewrite process
  [Figure: algebra tree of the query with the operators op1, op2, and op3. Applying a rewrite rule replaces op1 with the operators op1a, op1b, and op1c; the rewrite rules are applied in this way to the tree.]

- Rewrite rule notation:
  - T+: rewritten statement (query)
  - P(T+): provenance attributes

- Rewrite rule example:

(SELECT agg, G
 FROM T
 GROUP BY G)+
=
SELECT agg, G, P(T)
FROM
  (SELECT agg, G FROM T GROUP BY G) AS agg
  LEFT OUTER JOIN
  (SELECT G AS G', P(T) FROM T) AS prov
  ON (G = G')
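
One observable consequence of using LEFT OUTER JOIN in this rule is that no tuple of the original aggregation result can be lost by the join. The slides do not state the motivation, so the following is only an illustration of standard SQL semantics on a hypothetical relation `t(g, v)` containing a NULL group key, run on SQLite through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (g TEXT, v INTEGER);   -- hypothetical relation T
    INSERT INTO t VALUES ('a', 1), ('a', 2), (NULL, 5);
""")

# The aggregation rule with the join type left open.
TEMPLATE = """
    SELECT agg.g, agg.s, prov.p_g, prov.p_v
    FROM (SELECT g, sum(v) AS s FROM t GROUP BY g) AS agg
    {join}
         (SELECT g AS g2, g AS p_g, v AS p_v FROM t) AS prov
    ON agg.g = prov.g2
"""

outer_rows = conn.execute(TEMPLATE.format(join="LEFT OUTER JOIN")).fetchall()
inner_rows = conn.execute(TEMPLATE.format(join="INNER JOIN")).fetchall()

# GROUP BY puts the NULL key into its own group, but NULL = NULL is not
# true in SQL, so that group finds no join partner.
print(len(outer_rows), len(inner_rows))
```

With the inner join the result tuple for the NULL group disappears, while the outer join keeps it with NULL provenance attributes. A complete implementation would presumably compare grouping attributes in a NULL-safe way so that such a tuple also receives its provenance; that is an assumption, not something shown on the slide.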

- Rewrite rule example:

SELECT sum(revenue) AS sum, shop
FROM sales
GROUP BY shop

sales
shop    month  revenue
Migros  Jan    100
Migros  Feb    10
Migros  Mar    10
Coop    Jan    25
Coop    Feb    25

result
sum  shop
120  Migros
50   Coop

SELECT sum, shop, pShop, pMonth, pRevenue
FROM
  (SELECT sum(revenue) AS sum, shop
   FROM sales GROUP BY shop) AS agg
  LEFT OUTER JOIN
  (SELECT shop AS shop', pShop, pMonth, pRevenue
   FROM sales) AS prov
  ON (shop = shop')

sum  shop    pShop   pMonth  pRevenue
120  Migros  Migros  Jan     100
120  Migros  Migros  Feb     10
120  Migros  Migros  Mar     10
50   Coop    Coop    Jan     25
50   Coop    Coop    Feb     25
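
The rewritten query and its result table can be checked on a stock database. The sketch below runs on SQLite through Python's `sqlite3` module and makes two adaptations that are assumptions of this illustration: the provenance attributes `pShop`, `pMonth`, and `pRevenue` are produced explicitly by renaming the attributes of `sales`, and `shop'` is spelled `shop2` because a quote is not valid in an unquoted identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES ('Migros', 'Jan', 100), ('Migros', 'Feb', 10),
                             ('Migros', 'Mar', 10), ('Coop', 'Jan', 25),
                             ('Coop', 'Feb', 25);
""")

# Rewritten aggregation query: original result joined with the input
# tuples, whose attributes are duplicated as provenance attributes.
REWRITTEN = """
    SELECT agg.sum, agg.shop, prov.pShop, prov.pMonth, prov.pRevenue
    FROM
      (SELECT sum(revenue) AS sum, shop
       FROM sales GROUP BY shop) AS agg
      LEFT OUTER JOIN
      (SELECT shop AS shop2,
              shop AS pShop, month AS pMonth, revenue AS pRevenue
       FROM sales) AS prov
      ON (agg.shop = prov.shop2)
"""

rows = conn.execute(REWRITTEN).fetchall()
for row in sorted(rows):
    print(row)
```

Each printed row pairs one tuple of the original result with one `sales` tuple that contributed to it.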



4. Perm Implementation
- Extension of the PostgreSQL DBMS
  - Implemented inside of PostgreSQL
  - -> does not affect client applications
  - Extended SQL language
- Perm module
  - Implements the algebraic rewrite rules as query rewrites

- SQL-PLE: SQL extension
  - SELECT PROVENANCE ...
- Nice benefits:
  - CREATE VIEW x AS SELECT PROVENANCE ...
  - SELECT PROVENANCE ... INTO x ...
  - SELECT ... FROM (SELECT PROVENANCE ...
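
Since a provenance query yields an ordinary relation, it can be stored as a view and queried further with plain SQL. SQLite does not understand `SELECT PROVENANCE`, so the sketch below (Python's `sqlite3` module, illustrative `sales` data) puts a hand-written rewritten query in the view body as a stand-in; the view name `x` follows the slide, while the column names `total`, `pShop`, `pMonth`, and `pRevenue` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES ('Migros', 'Jan', 100), ('Migros', 'Feb', 10),
                             ('Migros', 'Mar', 10), ('Coop', 'Jan', 25),
                             ('Coop', 'Feb', 25);

    -- Stand-in for: CREATE VIEW x AS SELECT PROVENANCE ...
    CREATE VIEW x AS
    SELECT agg.total AS total, agg.shop AS shop,
           prov.pShop AS pShop, prov.pMonth AS pMonth,
           prov.pRevenue AS pRevenue
    FROM (SELECT sum(revenue) AS total, shop
          FROM sales GROUP BY shop) AS agg
    LEFT OUTER JOIN
         (SELECT shop AS g, shop AS pShop, month AS pMonth,
                 revenue AS pRevenue
          FROM sales) AS prov
    ON agg.shop = prov.g;
""")

# Ordinary SQL over the stored provenance:
# which months contributed to a total above 100?
months = [m for (m,) in conn.execute(
    "SELECT pMonth FROM x WHERE total > 100 ORDER BY pMonth")]

# The provenance relation used as a subquery:
# how many input tuples contributed to each shop's total?
counts = dict(conn.execute(
    "SELECT shop, count(*) FROM (SELECT * FROM x) GROUP BY shop"))

print(months, counts)
```

Both follow-up queries treat provenance like any other data, which is the point of keeping it in the relational model.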

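- Perm Architecture
  [Figure: query processing pipeline with the components Parser & Analyser, Rewriter, Perm Module, Planner, and Executor. The intermediate representations of a query are labelled "SELECT PROVENANCE ....", "Q =...", "Q' =...", "Q'+ =...", and "MergeJoin (...".]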


5. Experimental Results
- TPC-H benchmark


6. Conclusion
- Benefits
  - Compute provenance for SQL
  - Full SQL query power for provenance data
  - Lazy or eager computation
  - Reuse existing database technology
  - Supports external provenance

6. Conclusion
- Future work
  - Physical operators for more efficient provenance computation
  - Storage compression
  - Include transformation provenance
  - Support different contribution semantics
  - Support various granularities

Questions
