Perm
Processing Provenance and Data on the Same Data Model through Query Rewriting
Boris Glavic
Database Technology Group
Department of Informatics
University of Zurich
glavic@ifi.uzh.ch
Gustavo Alonso
Systems Group
Department of Computer Science
ETH Zurich
alonso@inf.ethz.ch
Overview
1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Experimental Results
6. Conclusion
1. Introduction
• Relational provenance
  - Transformation: query
  - Data items (output): result relation
  - Data items (input): base relations
1. Introduction
• Which input data item(s) of a query influenced which output data item(s)?
• Granularity
  - Tuple
  - Attribute value
  - ...
• Contribution semantics
  - Influence (Why)
  - Copy (Where)
  - ...
1. Introduction
• The problem of computing this type of provenance has been solved before
  - See, e.g., [Cui, Widom ICDE '00]
• but...
  - Non-relational representation of provenance data
  - Separation of provenance and “normal” data
  - Non-relational computation of provenance data
1. Introduction
• Perm
  - Provenance Extension of the Relational Model
  - Provenance management system
• “Pure” relational representation of provenance
  - Query result tuples and provenance tuples are represented as a single relation
1. Introduction
• Benefits: provenance can be...
  - ... stored in a standard DBMS
  - ... queried using SQL
  - ... directly interpreted by a user
• Direct association between provenance and “normal” data
1. Introduction
• Provenance computation
  - Use query rewriting
  - Given a query q
  - Generate a query q+
  - q+ computes the provenance of all result tuples of q
1. Introduction
• Benefits:
  - The rewritten query is expressed in relational algebra
  - It can be optimized and executed by an RDBMS
  - E.g., it can be stored as a view
  - It can be used as a subquery
2. The Perm Approach

sales
sName   itemId
Migros  1
Migros  2
Migros  2
Coop    3
Coop    3

items
id  price
1   100
2   10
3   25
2. The Perm Approach
• Compute the sum of sales for each shop:
SELECT sName, sum(price)
FROM sales, items
WHERE itemId = id
GROUP BY sName;
2. The Perm Approach

sales
sName   itemId
Migros  1
Migros  2
Migros  2
Coop    3
Coop    3

items
id  price
1   100
2   10
3   25

result
sName   sum(price)
Migros  120
Coop    50
2. The Perm Approach
• Desired result format:

| Original attributes | Relation 1 attributes | ... | Relation n attributes |
2. The Perm Approach

Original result      | sales                | items
sName   sum(price)   | P(sName)  P(itemId)  | P(id)  P(price)
Migros  120          | Migros    1          | 1      100
Migros  120          | Migros    2          | 2      10
Migros  120          | Migros    2          | 2      10
Coop    50           | Coop      3          | 3      25
Coop    50           | Coop      3          | 3      25
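
The single-relation format above can be reproduced with ordinary SQL. The following is a minimal sketch using Python's built-in `sqlite3` module and the example tables from the slides. The query is written by hand for illustration (it is not output generated by Perm), and the column names `total` and `p_...` are assumptions standing in for `sum(price)` and the `P(...)` attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sName TEXT, itemId INTEGER);
    CREATE TABLE items (id INTEGER, price INTEGER);
    INSERT INTO sales VALUES ('Migros', 1), ('Migros', 2), ('Migros', 2),
                             ('Coop', 3), ('Coop', 3);
    INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# Each result tuple of the aggregation is repeated once per contributing
# pair of sales and items tuples; the p_* columns hold those input tuples.
PROVENANCE_QUERY = """
    SELECT agg.sName, agg.total,
           prov.p_sName, prov.p_itemId, prov.p_id, prov.p_price
    FROM (SELECT sName, sum(price) AS total
          FROM sales, items
          WHERE itemId = id
          GROUP BY sName) AS agg
    LEFT OUTER JOIN
         (SELECT s.sName AS p_sName, s.itemId AS p_itemId,
                 i.id AS p_id, i.price AS p_price
          FROM sales AS s, items AS i
          WHERE s.itemId = i.id) AS prov
    ON agg.sName = prov.p_sName
    ORDER BY agg.sName, prov.p_itemId
"""

rows = conn.execute(PROVENANCE_QUERY).fetchall()
for row in rows:
    print(row)
```

Because the output is a plain relation, it can be filtered, joined, or stored like any other query result.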
3. Query Rewriting for Provenance Computation
• Rewrite method basics
  - Use the algebra representation of the query
  - Replace every algebra operator with an algebra statement that propagates provenance alongside the original results
  - This requires a rewrite rule for each relational algebra operator
3. Query Rewriting for Provenance Computation
• Rewrite process
[Figure: algebra tree of a query with the operators op1, op2, and op3. Applying a rewrite rule replaces op1 with the operators op1a, op1b, and op1c; rewrite rules are then applied to the remaining operators.]
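
The operator-by-operator rewrite can be sketched as a recursive function over a small algebra tree. This is an illustrative sketch only, not Perm's implementation: the classes `Relation`, `Select`, and `Project`, the function `rewrite`, and the `p_<relation>_<column>` naming scheme are all hypothetical, and only three operators are covered.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Relation:
    name: str
    columns: List[str]


@dataclass
class Select:
    child: "Node"
    condition: str


@dataclass
class Project:
    child: "Node"
    columns: List[str]


Node = Union[Relation, Select, Project]


def rewrite(node: Node) -> Tuple[str, List[str], List[str]]:
    """Return (sql, result columns, provenance columns) for the rewritten node."""
    if isinstance(node, Relation):
        # Base relation: duplicate every attribute as a provenance attribute.
        prov = [f"p_{node.name}_{c}" for c in node.columns]
        copies = [f"{c} AS {p}" for c, p in zip(node.columns, prov)]
        sql = f"SELECT {', '.join(node.columns + copies)} FROM {node.name}"
        return sql, list(node.columns), prov
    if not isinstance(node, (Select, Project)):
        raise TypeError(f"no rewrite rule for {type(node).__name__}")
    # Unary operators: rewrite the input first, then wrap it.
    child_sql, cols, prov = rewrite(node.child)
    if isinstance(node, Select):
        # Selection: filter the rewritten input; provenance passes through.
        return f"SELECT * FROM ({child_sql}) AS t WHERE {node.condition}", cols, prov
    # Projection: keep the requested columns plus all provenance columns.
    sql = f"SELECT {', '.join(node.columns + prov)} FROM ({child_sql}) AS t"
    return sql, list(node.columns), prov


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER, price INTEGER);
    INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# Query: project id from the items whose price exceeds 20.
query = Project(Select(Relation("items", ["id", "price"]), "price > 20"), ["id"])
sql, cols, prov = rewrite(query)
rows = conn.execute(sql + " ORDER BY id").fetchall()
print(sql)
print(rows)
```

Each operator only needs to know how to wrap its already rewritten input, which is why one rule per algebra operator is enough to handle arbitrary trees.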
3. Query Rewriting for Provenance Computation
• Rewrite rule notation:
  - T+: rewritten statement (query)
  - P(T+): provenance attributes
3. Query Rewriting for Provenance Computation
• Rewrite rule example:

Query q:
SELECT agg, G
FROM T
GROUP BY G

Rewritten query q+:
SELECT agg, G, P(T)
FROM
  (SELECT agg, G FROM T GROUP BY G) AS agg
  LEFT OUTER JOIN
  (SELECT G AS G', P(T) FROM T) AS prov
  ON (G = G')
3. Query Rewriting for Provenance Computation
• Rewrite rule example:

SELECT sum(revenue) AS sum, shop
FROM sales
GROUP BY shop

sales
shop    month  revenue
Migros  Jan    100
Migros  Feb    10
Migros  Mar    10
Coop    Jan    25
Coop    Feb    25

result
sum  shop
120  Migros
50   Coop
3. Query Rewriting for Provenance Computation

Rewritten query q+:
SELECT sum, shop, pShop, pMonth, pRevenue
FROM
  (SELECT sum(revenue) AS sum, shop
   FROM sales GROUP BY shop) AS agg
  LEFT OUTER JOIN
  (SELECT shop AS shop', pShop, pMonth, pRevenue
   FROM sales) AS prov
  ON (shop = shop')

sum  shop    pShop   pMonth  pRevenue
120  Migros  Migros  Jan     100
120  Migros  Migros  Feb     10
120  Migros  Migros  Mar     10
50   Coop    Coop    Jan     25
50   Coop    Coop    Feb     25
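
To make the aggregation rule concrete, the sketch below builds the rewritten query as a string and runs it on the `sales` table with Python's `sqlite3` module. The helper `rewrite_aggregation`, the join column `g_join` (standing in for shop'), the alias `sum_revenue`, and the `p_` prefix are assumptions for illustration. In this sketch the provenance attributes are produced by renaming copies of the base-table columns.

```python
import sqlite3


def rewrite_aggregation(table, columns, group_col, agg_expr, agg_alias):
    """Build the provenance version of
    SELECT agg_expr AS agg_alias, group_col FROM table GROUP BY group_col."""
    prov_cols = [f"p_{c}" for c in columns]
    copies = ", ".join(f"{c} AS {p}" for c, p in zip(columns, prov_cols))
    prov_list = ", ".join(f"prov.{p}" for p in prov_cols)
    return (
        f"SELECT agg.{agg_alias}, agg.{group_col}, {prov_list}\n"
        f"FROM (SELECT {agg_expr} AS {agg_alias}, {group_col}\n"
        f"      FROM {table} GROUP BY {group_col}) AS agg\n"
        f"LEFT OUTER JOIN\n"
        f"     (SELECT {group_col} AS g_join, {copies}\n"
        f"      FROM {table}) AS prov\n"
        f"ON agg.{group_col} = prov.g_join"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES ('Migros', 'Jan', 100), ('Migros', 'Feb', 10),
                             ('Migros', 'Mar', 10), ('Coop', 'Jan', 25),
                             ('Coop', 'Feb', 25);
""")

sql = rewrite_aggregation(
    "sales", ["shop", "month", "revenue"], "shop", "sum(revenue)", "sum_revenue"
)
rows = conn.execute(sql).fetchall()
print(sql)
for row in sorted(rows):
    print(row)
```

The aggregate is computed once per group in the `agg` subquery and then joined back to the input tuples of the same group, so every result tuple appears once per contributing `sales` tuple.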
4. Perm Implementation
• Extension of the PostgreSQL DBMS
  - Implemented inside PostgreSQL
  - Does not affect client applications
• Extended SQL language
• Perm module
  - Implements the algebraic rewrite rules as query rewrites
4. Perm Implementation
• SQL-PLE: SQL extension
  - SELECT PROVENANCE ...
• Nice benefits:
  - CREATE VIEW x AS SELECT PROVENANCE ...
  - SELECT PROVENANCE ... INTO x ...
  - SELECT ... FROM (SELECT PROVENANCE ...
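
The view and subquery benefits can be imitated outside Perm. `SELECT PROVENANCE` is Perm's SQL extension and is not available in SQLite, so the sketch below substitutes a hand-written rewritten query inside the view. The view name `sales_prov` and the `p_...` and `sum_revenue` column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES ('Migros', 'Jan', 100), ('Migros', 'Feb', 10),
                             ('Migros', 'Mar', 10), ('Coop', 'Jan', 25),
                             ('Coop', 'Feb', 25);

    -- Stand-in for CREATE VIEW ... AS SELECT PROVENANCE ...:
    -- the rewritten query is spelled out by hand.
    CREATE VIEW sales_prov AS
    SELECT agg.sum_revenue AS sum_revenue, agg.shop AS shop,
           prov.p_shop AS p_shop, prov.p_month AS p_month,
           prov.p_revenue AS p_revenue
    FROM (SELECT sum(revenue) AS sum_revenue, shop
          FROM sales GROUP BY shop) AS agg
    LEFT OUTER JOIN
         (SELECT shop AS p_shop, month AS p_month, revenue AS p_revenue
          FROM sales) AS prov
    ON agg.shop = prov.p_shop;
""")

# Provenance is queried like ordinary data: which sales tuples
# contributed to a shop total above 100?
rows = conn.execute("""
    SELECT p_month, p_revenue
    FROM sales_prov
    WHERE sum_revenue > 100
    ORDER BY p_revenue DESC, p_month
""").fetchall()
print(rows)
```

Since the view is an ordinary relation, it can also appear in the FROM clause of a larger query, which mirrors the subquery use listed above.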
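4. Perm Implementation
• Perm architecture
[Figure: query processing pipeline with the components Parser & Analyser, Rewriter, Perm Module, Planner, and Executor. Labels shown along the pipeline: "SELECT PROVENANCE ....", "Q =...", "Q' =...", "Q'+ =...", "MergeJoin (...".]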
5. Experimental Results
• TPC-H benchmark
6. Conclusion
• Benefits
  - Compute provenance for SQL
  - Full SQL query power for provenance data
  - Lazy or eager computation
  - Reuse existing database technology
  - Supports external provenance
6. Conclusion
• Future work
  - Physical operators for more efficient provenance computation
  - Storage compression
  - Include transformation provenance
  - Support different contribution semantics
  - Support various granularities
Questions

ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

  • 1. Perm Processing Provenance and Data on the Same Data Model through Query Rewriting Boris Glavic Database Technology Group Department of Informatics University of Zurich glavic@ifi.uzh.ch Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. Gustavo Alonso Systems Group Department of Computer Science ETH Zurich alonso@inf.ethz.ch
  • 2. 2 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. Overview 1. Introduction to Perm 2. The Perm Provenance Representation 3. Query Rewriting for Provenance Computation 4. Perm Implementation 5. Experimental Results 6. Conclusion
  • 3. 3 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. 1. Introduction Query Transformation Data items: Result relation Data items: Base relations  Relational Provenance
  • 4. 4 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. 1. Introduction Query  Which input data item(s) influenced which output data item(s)?  Granularity  Tuple  Attribute Value  ...  Contribution semantics  Influence (Why)  Copy (Where)  ...
  • 5. 5 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt.  The problem of computing this type of provenance has been solved before  See e.g. [Cui, Widom ICDE ‘00]  but...  Non-relational representation of provenance data  Separation of provenance and “normal” data  Non-relational computation of provenance data 1. Introduction
  • 6. 6 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. 1. Introduction  Perm  Provenance Extension of the Relational Model  Provenance Management System  “Pure” Relational representation of provenance  Query result tuples and provenance tuples are represented as a single relation
  • 7. 7 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. 1. Introduction  Benefits: Provenance can be...  ... Stored in standard DBMS  ... Queried using SQL  ... Directly interpreted by a user  Direct association between provenance and “normal data”
  • 8. 8 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. 1. Introduction  Provenance Computation  -> Use query rewrite  Given query q  Generate query q+  Computes the provenance of all result tuples from q
  • 9. 9 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. 1. Introduction  Benefits:  Rewritten query is expressed in relational algebra  Can be optimized and executed by a R-DBMS  E.g. can be stored as a view  Used as a subquery
  • 10. 10 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. Overview 1. Introduction to Perm 2. The Perm Provenance Representation 3. Query Rewriting for Provenance Computation 4. Perm Implementation 5. Results 6. Conclusion
  • 11. 11 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. 2. The Perm Approach sNam e itemID Migros 1 Migros 2 Migros 2 Coop 3 Coop 3 id price 1 100 2 10 3 25 itemssales
  • 12. 12 Zur Anzeige wird der QuickTime™ Dekompressor „“ benötigt. 2. The Perm Approach  Compute the sum of sales for each shop SELECT sName, sum(price) FROM sales, items WHERE itemId = id GROUP BY sName;
  • 13. 2. The Perm Approach: tables sales and items as on slide 11, together with the query result
        result:
          name    sum(price)
          Migros  120
          Coop    50
  • 14. 2. The Perm Approach: same tables and result as on slide 13
  • 15. 2. The Perm Approach: same tables and result as on slide 13
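The example query can be reproduced on the sample tables with a short script. This is a minimal sketch that assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the DBMS used in the talk; the column types and the added `ORDER BY` are assumptions made only to get a deterministic row order.

```python
import sqlite3

# In-memory database holding the two example tables from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sName TEXT, itemId INTEGER);
CREATE TABLE items (id INTEGER, price INTEGER);
INSERT INTO sales VALUES ('Migros', 1), ('Migros', 2), ('Migros', 2),
                         ('Coop', 3), ('Coop', 3);
INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# The query from slide 12; ORDER BY is an addition for a stable row order.
rows = conn.execute("""
SELECT sName, sum(price)
FROM sales, items
WHERE itemId = id
GROUP BY sName
ORDER BY sName
""").fetchall()

for name, total in rows:
    print(name, total)
```

The Migros group sums the prices 100, 10 and 10, and the Coop group sums 25 twice, which matches the result table on the slide.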
  • 16. 2. The Perm Approach: Desired result format: Original attributes | Relation 1 attributes | ... | Relation n attributes
  • 17. 2. The Perm Approach: original result with the provenance from sales and items attached
        Original result      sales                items
        name    sum(price)   P(sName)  P(itemId)  P(id)  P(price)
        Migros  120          Migros    1          1      100
        Migros  120          Migros    2          2      10
        Migros  120          Migros    2          2      10
        Coop    50           Coop      3          3      25
        Coop    50           Coop      3          3      25
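A relation in this format can be produced with ordinary SQL by joining the aggregated result back to the tuples that fed each group. The sketch below is hand-written for this one query and runs on SQLite as an assumed stand-in; the names `total`, `g` and the `p_` prefix are hypothetical, and the query is an illustration of the representation, not the output of the Perm rewriter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sName TEXT, itemId INTEGER);
CREATE TABLE items (id INTEGER, price INTEGER);
INSERT INTO sales VALUES ('Migros', 1), ('Migros', 2), ('Migros', 2),
                         ('Coop', 3), ('Coop', 3);
INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# Original result (agg) joined with the contributing sales/items tuples (prov).
# Each output row = one result tuple + one combination of provenance tuples.
rows = conn.execute("""
SELECT agg.sName, agg.total,
       prov.p_sName, prov.p_itemId, prov.p_id, prov.p_price
FROM (SELECT sName, sum(price) AS total
      FROM sales, items WHERE itemId = id GROUP BY sName) AS agg
LEFT OUTER JOIN
     (SELECT sName AS g, sName AS p_sName, itemId AS p_itemId,
             id AS p_id, price AS p_price
      FROM sales, items WHERE itemId = id) AS prov
ON agg.sName = prov.g
ORDER BY agg.sName, prov.p_itemId
""").fetchall()

for row in rows:
    print(row)
```

Every row repeats the complete result tuple, so the aggregate for the Coop group is 50 in both of its rows, in agreement with the result table on slide 13.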
  • 18. Overview: 1. Introduction to Perm; 2. The Perm Provenance Representation; 3. Query Rewriting for Provenance Computation; 4. Perm Implementation; 5. Results; 6. Conclusion
  • 19. 3. Query Rewriting for Provenance Computation: Rewrite method basics: use the algebra representation of the query; replace every algebra operator with an algebra statement that propagates provenance alongside the original results -> a rewrite rule is needed for each relational algebra operator
  • 20. 3. Query Rewriting for Provenance Computation: Rewrite process [diagram: algebra tree with operators op3, op1, op2]
  • 21. 3. Query Rewriting for Provenance Computation: Rewrite process [diagram: "Apply rewrite rule" turns the tree op3, op1, op2 into the tree op3, op1b, op2, op1a, op1c]
  • 22. 3. Query Rewriting for Provenance Computation: Rewrite process [diagram: tree op3, op1b, op2, op1a, op1c; caption "Apply rewrite rules"]
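The operator-by-operator rewrite can be sketched as a recursive function over a small algebra tree. This is an illustration under stated assumptions, not Perm's implementation: the classes `Base`, `Select` and `Project`, the function `rewrite`, and the `p_<relation>_<attribute>` naming are hypothetical, and the three rules used here (a base relation duplicates its attributes as provenance attributes, a selection is applied unchanged to the rewritten input, a projection keeps all provenance attributes) are assumptions chosen for illustration. SQLite is used only to execute the generated SQL.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Base:            # base relation with its attribute names
    name: str
    attrs: list

@dataclass
class Select:          # selection with an SQL condition on the child
    cond: str
    child: object

@dataclass
class Project:         # projection onto a list of attributes
    attrs: list
    child: object

def rewrite(op):
    """Return (sql, provenance_attributes) for the rewritten operator tree."""
    if isinstance(op, Base):
        # Copy every attribute into a renamed provenance attribute.
        prov = [f"p_{op.name}_{a}" for a in op.attrs]
        cols = list(op.attrs) + [f"{a} AS {p}" for a, p in zip(op.attrs, prov)]
        return f"SELECT {', '.join(cols)} FROM {op.name}", prov
    sql, prov = rewrite(op.child)      # rewrite the input first (recursion)
    if isinstance(op, Select):
        return f"SELECT * FROM ({sql}) AS t WHERE {op.cond}", prov
    if isinstance(op, Project):
        cols = list(op.attrs) + prov   # keep provenance next to the result
        return f"SELECT {', '.join(cols)} FROM ({sql}) AS t", prov
    raise TypeError(f"no rewrite rule for {type(op).__name__}")

# Project id from the items with price > 20, with provenance attached.
query = Project(["id"], Select("price > 20", Base("items", ["id", "price"])))
sql, prov_attrs = rewrite(query)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER, price INTEGER);
INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")
rows = sorted(conn.execute(sql).fetchall())
print(sql)
print(rows)
```

Because each rule only needs the rewritten form of its input, supporting another operator means adding one more case to `rewrite`.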
  • 23. 3. Query Rewriting for Provenance Computation: Rewrite rules notation: T+ = rewritten statement (query); P(T+) = provenance attributes
  • 24. 3. Query Rewriting for Provenance Computation: Rewrite rules example: (SELECT agg, G FROM T GROUP BY G)+ = SELECT agg, G, P(T) FROM (SELECT agg, G FROM T GROUP BY G) AS agg LEFT OUTER JOIN (SELECT G AS G’, P(T) FROM T) AS prov ON (G = G’)
  • 25. 3. Query Rewriting for Provenance Computation: Rewrite rules example: SELECT sum(revenue) AS sum, shop FROM sales GROUP BY shop
        sales:
          shop    month  revenue
          Migros  Jan    100
          Migros  Feb    10
          Migros  Mar    10
          Coop    Jan    25
          Coop    Feb    25
        result:
          sum  shop
          120  Migros
          50   Coop
  • 26. 3. Query Rewriting for Provenance Computation: rewritten (+) form of the query from slide 25: SELECT sum, shop, pShop, pMonth, pRevenue FROM (SELECT sum(revenue) AS sum, shop FROM sales GROUP BY shop) AS agg LEFT OUTER JOIN (SELECT shop AS shop’, pShop, pMonth, pRevenue FROM sales) AS prov ON (shop = shop’)
        result:
          sum  shop    pShop   pMonth  pRevenue
          120  Migros  Migros  Jan     100
          120  Migros  Migros  Feb     10
          120  Migros  Migros  Mar     10
          50   Coop    Coop    Jan     25
          50   Coop    Coop    Feb     25
  • 27. 3. Query Rewriting for Provenance Computation: same query and result as on slide 26
  • 28. 3. Query Rewriting for Provenance Computation: same query and result as on slide 26
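The aggregation rule can be turned into a small query generator. This is a sketch under assumptions: the helper `rewrite_aggregation`, the `p_` and `g_` prefixes and the alias `total` (used instead of `sum` to avoid clashing with the function name) are hypothetical, the rule is applied to a single base table only, and SQLite stands in for the actual DBMS.

```python
import sqlite3

def rewrite_aggregation(table, attrs, group_col, agg_expr, agg_alias):
    """Build the provenance form of
    SELECT agg_expr AS agg_alias, group_col FROM table GROUP BY group_col."""
    prov_defs = ", ".join(f"{a} AS p_{a}" for a in attrs)   # P(T) definitions
    prov_names = ", ".join(f"p_{a}" for a in attrs)         # P(T) in the output
    return (
        f"SELECT {agg_alias}, {group_col}, {prov_names} "
        f"FROM (SELECT {agg_expr} AS {agg_alias}, {group_col} "
        f"FROM {table} GROUP BY {group_col}) AS agg "
        f"LEFT OUTER JOIN (SELECT {group_col} AS g_{group_col}, {prov_defs} "
        f"FROM {table}) AS prov "
        f"ON ({group_col} = g_{group_col})"
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
INSERT INTO sales VALUES ('Migros', 'Jan', 100), ('Migros', 'Feb', 10),
                         ('Migros', 'Mar', 10), ('Coop', 'Jan', 25),
                         ('Coop', 'Feb', 25);
""")

sql = rewrite_aggregation("sales", ["shop", "month", "revenue"],
                          "shop", "sum(revenue)", "total")
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Each of the five input tuples shows up exactly once as provenance, attached to the aggregated tuple of its group.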
  • 29. Overview: 1. Introduction to Perm; 2. The Perm Provenance Representation; 3. Query Rewriting for Provenance Computation; 4. Perm Implementation; 5. Results; 6. Conclusion
  • 30. 4. Perm Implementation: Extension of the PostgreSQL DBMS; implemented inside PostgreSQL -> does not affect client applications; extended SQL language; the Perm module implements the algebraic rewrite rules as query rewrites
  • 31. 4. Perm Implementation: SQL-PLE: SQL extension: SELECT PROVENANCE ...; nice benefits: CREATE VIEW x AS SELECT PROVENANCE ...; SELECT PROVENANCE ... INTO x ...; SELECT ... FROM (SELECT PROVENANCE ...
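Since the result of a provenance computation is an ordinary relation, it can be stored as a view and queried like any other table. The sketch below imitates this on SQLite, which has no `SELECT PROVENANCE` keyword, so the rewritten query is written out by hand; the view name `sales_prov` and the `p_`/`g_` column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
INSERT INTO sales VALUES ('Migros', 'Jan', 100), ('Migros', 'Feb', 10),
                         ('Migros', 'Mar', 10), ('Coop', 'Jan', 25),
                         ('Coop', 'Feb', 25);

-- Hand-written stand-in for: CREATE VIEW sales_prov AS SELECT PROVENANCE ...
CREATE VIEW sales_prov AS
SELECT total, shop, p_shop, p_month, p_revenue
FROM (SELECT sum(revenue) AS total, shop FROM sales GROUP BY shop) AS agg
LEFT OUTER JOIN
     (SELECT shop AS g_shop, shop AS p_shop,
             month AS p_month, revenue AS p_revenue FROM sales) AS prov
ON (shop = g_shop);
""")

# Plain SQL over provenance: which result tuples depend on a January sale?
jan = conn.execute(
    "SELECT shop, total FROM sales_prov WHERE p_month = 'Jan' ORDER BY shop"
).fetchall()

# Provenance computation used as a subquery: contributing tuples per result.
counts = conn.execute("""
SELECT shop, count(*)
FROM (SELECT * FROM sales_prov) AS p
GROUP BY shop
ORDER BY shop
""").fetchall()

print(jan)
print(counts)
```

Both follow-up queries are plain SQL, which is the point of keeping provenance and data in the same data model.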
  • 32. 4. Perm Implementation: Perm architecture: SELECT PROVENANCE .... -> Parser & Analyser -> Q = ... -> Rewriter -> Q’ = ... -> Perm Module -> Q’+ = ... -> Planner -> MergeJoin (... -> Executor
  • 33. Overview: 1. Introduction to Perm; 2. The Perm Provenance Representation; 3. Query Rewriting for Provenance Computation; 4. Perm Implementation; 5. Experimental Results; 6. Conclusion
  • 34. 5. Experimental Results: TPC-H benchmark [figures: benchmark result charts]
  • 35. Overview: 1. Introduction to Perm; 2. The Perm Provenance Representation; 3. Query Rewriting for Provenance Computation; 4. Perm Implementation; 5. Experimental Results; 6. Conclusion
  • 36. 6. Conclusion: Benefits: compute provenance for SQL; full SQL query power for provenance data; lazy or eager computation; reuse of existing database technology; supports external provenance
  • 37. 6. Conclusion: Future work: physical operators for more efficient provenance computation; storage compression; include transformation provenance; support different contribution semantics; support various granularities
  • 38. Questions

Editor's Notes

  1. ICDE 2009, 22.5 minutes. Welcome to my talk; my name; I'm from the DBTG, in association with Gustavo Alonso of the ETH Systems Group.
  2. The outline of the talk: 1. Introduction: what is provenance? 2. Present a relational representation of provenance information and why this is good. 3. Show how to produce such provenance information.
  3. We are focusing on the problem of provenance for relational database systems, where data items are e.g. tuples and transformations are queries, view definitions, user-defined functions, etc.
  4. The main problem faced can then be stated as: Which input... This problem can be solved for different levels of granularity of data items: tuples, attribute values and so on (we are looking at tuple-level granularity), and for different definitions of what "influences" means (we call this contribution semantics), for example only tuples that have been copied literally from the source to the result (we are looking at influence contribution semantics, which has also been called Why-provenance). -add related work slide (my approach)
  5-6. Problem solved, but with a non-relational representation of provenance, which leads to a number of problems: the computation returns only provenance data, so the association between provenance and normal data is lost; it is not possible to store provenance in an unmodified relational database; provenance and normal data use different data models, so the query language and processing of relational systems cannot be reused; the computation requires a completely new system, or at least some middleware that computes the provenance, and is limited to a subset of SQL. ? Leave out other disadvantages that are hard to explain without more preliminaries (e.g. more user friendly to include complete tuples)
  7. Explain benefits. Directly interpreted by the user because the result contains complete provenance tuples. So it is nice that we have this beneficial format, but how do we create it?
  9. So let's see what we get from using this type of computation: the same language is used for the query and the provenance computation, so we can feed the rewritten query into the optimizer of a normal DBMS and thus store it as a view or use it as a subquery (this fulfills the need for querying provenance and data using the same query language!)
  10. Know how to represent provenance information
  11. Let's have an example: we have a sales database with shops and the items that were sold there, and items with an id and a price
  12. Consider the following query that computes the sum of sales for each shop
  13. Here is the result of this query for the given table instances
  14. So if we want to know which tuples the result tuple (Migros, 120) is derived from, intuitively these are...
  15. The sales tuples with sName “Migros” and all the item tuples for items sold there (we use a formal definition introduced by Cui and Widom to check whether a tuple belongs to the provenance)
  16. We want to present the normal results of a query together with the provenance as a single relation, which contains complete result tuples with attached provenance tuples
  17. So, back to our example: for each tuple of the result we can directly see which tuples influenced it
  18. Our solution is to use query rewriting for computation
  19-22. The rewrite is performed on the algebraic representation of query q, by replacing every algebra operator of the original query with an algebra statement that propagates the provenance alongside the original result. So we need a rewrite rule for each algebra operator (this simplifies things, e.g. incremental computation, because of the recursive definition)
  23. Before I show you an example of what these rewrite rules look like, some notational preliminaries: the + operator stands for the rewrite operation; P(T+) of a rewritten statement is the list of provenance attributes in the result of the rewritten statement; bold characters are used to denote the schema of a relation or algebra statement; "a arrow b" means rename attribute a to b (we use this for lists of attributes too)
  24-28. Let's have a short example (the + operator transforms an operator or algebra statement into a provenance computation; P is the list of provenance attributes of an algebra expression). Here we have the rewrite rule for the aggregation operator. -SQL! Fast too to intro
  29. Enough about the theory, now to the implementation -> Perm -> Perm implementation
  30. Our implementation of these principles is called Perm (which, besides being a geological period, stands for Provenance Extension of the Relational Model). It is implemented as an extension of Postgres. The provenance computation is triggered by the use of a few additional SQL keywords we added. The algebraic rewrite rules are implemented as query rewriting (some pattern matching is needed because we have QGM-like structures here)
  31. Let's have a short look at the SQL extension we are using. The keyword PROVENANCE after SELECT is used to indicate that this query block (and of course all contained query blocks) should be rewritten. This suffices to, for example, store a provenance computation as a view, store the provenance in a table (using SQL INTO), or use a provenance computation in a subquery
  32. In the Postgres system, the major change we made was to add a new module directly above the planner module. The input to this module is a rewritten query graph (here rewriting is basically view unfolding). The module checks whether the incoming query has parts that should be rewritten, applies the rewrites if necessary, and sends the rewritten query graph to the planner, which chooses an execution plan and calls the executor, which executes the query and returns the results to the client
  33-35. Result slide (TPC-H): explain that everything works; early version of Perm
  36. I hope I convinced you that the query rewrite techniques implemented in Perm allow .... (benefits). But there are also disadvantages: the representation we use might store a lot of redundant information, and the performance is limited for some types of queries, because for some operators it is not possible to propagate the provenance without introducing extra joins. -more merchandise (whole of SQL); not disadvantages, but open issues
  37. So what we are doing now or would like to do: implement physical operators for provenance computation (performance); test how some of the proposed storage compression mechanisms for provenance data can be integrated into our system and what the ROI is; store and compute information about the queries or other kinds of transformations and integrate query language support for this; support different contribution semantics and granularities, for flexibility and as further proof that our approach is feasible