Full Disjunctions: Polynomial-Delay Iterators in Action
1. Full Disjunctions: Polynomial-Delay Iterators in Action
   VLDB 2006, Seoul, Korea
   Sara Cohen (Technion, Israel), Yaron Kanza (University of Toronto, Canada), Benny Kimelfeld (Hebrew University, Israel), Yehoshua Sagiv (Hebrew University, Israel), Itzhak Fadida (Technion, Israel)
5. The Natural Join Operator
   Climates(Country, Climate): (Canada, diverse), (Bahamas, tropical), (UK, temperate)
   Accommodations(Country, City, Hotel, Stars): (Canada, Toronto, Plaza, 4), (Canada, London, Ramada, 3), (Bahamas, Nassau, Hilton, null)
   Sites(Country, City, Site): (Canada, London, Air Show), (Canada, null, Mount Logan), (UK, London, Buckingham), (UK, London, Hyde Park)
   Climates ⋈ Accommodations ⋈ Sites, over (Country, Climate, City, Hotel, Stars, Site): (Canada, diverse, London, Ramada, 3, Air Show)
6. The Natural Join Misses Information
   Same relations and join result as on slide 5.
   Bahamas is not in Sites, so the natural join misses it.
7. The Natural Join Misses Information
   Mount Logan is not in a city, hence it is missed as well.
   An empty cell in the slide's tables means a null value (written null here).
8. The Natural Join Misses Information
   A looser notion of join is needed: one that enables joining tuples from only some of the tables.
9. The Natural Join Operator
   A tuple of Climates ⋈ Accommodations ⋈ Sites corresponds to a set of tuples from the source relations (those of slide 5) that is:
   Join consistent
   Connected: no Cartesian product
   Complete: one tuple from each relation
   Example: (Canada, diverse, London, Ramada, 3, Air Show) corresponds to the source tuples (Canada, diverse), (Canada, London, Ramada, 3), and (Canada, London, Air Show).
10. Join-Consistent Sets of Tuples
   Two tuples t1 and t2 are join-consistent if, for every common attribute A:
   1. t1[A] and t2[A] are non-null
   2. t1[A] = t2[A]
   A set T of tuples is join-consistent if every two tuples of T are join-consistent.
   Example: the Accommodations tuple for the Ramada (City: London, Country: Canada) and the Sites tuple for the Air Show (City: London, Country: Canada) are join-consistent.
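The definition on slide 10 can be sketched in a few lines of Python. This is only an illustration, under the assumption that a tuple is a dict from attribute names to values and that None stands for null; the function names are hypothetical and do not come from the talk.

```python
from itertools import combinations

def join_consistent(t1, t2):
    """Slide 10: equal, non-null values on every common attribute."""
    common = t1.keys() & t2.keys()
    # t1[a] is non-null and equals t2[a], so t2[a] is non-null as well.
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def is_join_consistent_set(tuples):
    """A set of tuples is join-consistent if every two of its tuples are."""
    return all(join_consistent(a, b) for a, b in combinations(tuples, 2))

# Tuples taken from the example relations of slide 5.
ramada = {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3}
air_show = {"Country": "Canada", "City": "London", "Site": "Air Show"}
mount_logan = {"Country": "Canada", "City": None, "Site": "Mount Logan"}

print(join_consistent(ramada, air_show))     # True
print(join_consistent(ramada, mount_logan))  # False: City is null in one tuple
```

A null in a common attribute rules out join consistency, which is why Mount Logan cannot be combined with any hotel.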
12. Natural Join (w/o Cartesian Product)
   Each tuple of the result corresponds to a set T of tuples from the source relations such that:
   1. T is join consistent
   2. T is connected (no Cartesian product)
   3. T is complete (one tuple from each relation)
   JCC: join consistent and connected (conditions 1 and 2)
13. Full Disjunction (Galindo-Legaria 1994)
   Each tuple of the result corresponds to a set T of tuples from the source relations such that:
   1. T is join consistent
   2. T is connected (no Cartesian product)
   3. T is maximal (not properly contained in any JCC set); this replaces the natural join's requirement that T be complete (one tuple from each relation)
   JCC: join consistent and connected (conditions 1 and 2)
14. An Example of a Full Disjunction
   R consists of three relations:
   Climates(Country, Climate): (Canada, diverse), (UK, temperate)
   Accommodations(Country, City, Hotel, Stars): (Canada, Toronto, Plaza, 4), (Canada, London, Ramada, 3)
   Sites(Country, City, Site): (Canada, London, Air Show), (Canada, null, Mount Logan), (UK, London, Buckingham)
   FD(R), over (Country, Climate, City, Hotel, Stars, Site), is built up tuple by tuple on slides 15 to 18.
15. An Example of a Full Disjunction
   FD(R) gains (Canada, diverse, Toronto, Plaza, 4, null)
16. An Example of a Full Disjunction
   FD(R) gains (Canada, diverse, London, Ramada, 3, Air Show)
17. An Example of a Full Disjunction
   FD(R) gains (Canada, diverse, null, null, null, Mount Logan)
18. An Example of a Full Disjunction
   FD(R) gains (UK, temperate, London, null, null, Buckingham)
19. An Example of a Full Disjunction
   The complete FD(R): (Canada, diverse, Toronto, Plaza, 4, null), (Canada, diverse, London, Ramada, 3, Air Show), (Canada, diverse, null, null, null, Mount Logan), (UK, temperate, London, null, null, Buckingham)
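20. Padding Joined Tuple Sets with Nulls
   The Sites tuple (Country: Canada, City: null, Site: Mount Logan) and the Climates tuple (Country: Canada, Climate: diverse) are joined and padded with nulls.
   Result over (Country, Climate, City, Hotel, Stars, Site): (Canada, diverse, null, null, null, Mount Logan)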
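The definition of the full disjunction and the null padding can be checked by brute force on the example relations. The sketch below is not one of the algorithms of the talk: it enumerates every subset of tuples, which takes exponential time, and serves only to make the definition concrete. It assumes tuples are dicts with None for null, and that two tuples are adjacent in the scheme graph when their dicts share an attribute name. All function names are hypothetical.

```python
from itertools import combinations

def join_consistent(t1, t2):
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def is_jcc(tuples):
    """Join consistent and connected (tuples must be a non-empty list)."""
    if not all(join_consistent(a, b) for a, b in combinations(tuples, 2)):
        return False
    reached, stack = {0}, [0]
    while stack:                       # depth-first search over shared attributes
        i = stack.pop()
        for j in range(len(tuples)):
            if j not in reached and tuples[i].keys() & tuples[j].keys():
                reached.add(j)
                stack.append(j)
    return len(reached) == len(tuples)

def brute_force_fd(db, attributes):
    """Join every maximal JCC set of tuples and pad the result with nulls."""
    jcc = [frozenset(s)
           for r in range(1, len(db) + 1)
           for s in combinations(range(len(db)), r)
           if is_jcc([db[i] for i in s])]
    maximal = [s for s in jcc if not any(s < other for other in jcc)]
    rows = []
    for s in maximal:
        row = dict.fromkeys(attributes)   # every attribute starts as null
        for i in s:
            row.update(db[i])
        rows.append(row)
    return rows

ATTRS = ["Country", "Climate", "City", "Hotel", "Stars", "Site"]
DB = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "UK", "Climate": "temperate"},
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3},
    {"Country": "Canada", "City": "London", "Site": "Air Show"},
    {"Country": "Canada", "City": None, "Site": "Mount Logan"},
    {"Country": "UK", "City": "London", "Site": "Buckingham"},
]
rows = brute_force_fd(DB, ATTRS)
print(len(rows))  # 4
```

The four rows are the ones listed in the example, including the Mount Logan row whose City, Hotel, and Stars are null.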
21. The Outerjoin Operator
   The outerjoin R1 ⟗ R2 of two relations R1 and R2 consists of the natural join R1 ⋈ R2 and, in addition, all dangling tuples padded with nulls.
22. Example of an Outerjoin
   Climates(Country, Climate): (Canada, diverse), (Bahamas, tropical), (UK, temperate)
   Accommodations(Country, City, Hotel, Stars): (Canada, Toronto, Plaza, 4), (Bahamas, Nassau, Hilton, null), (France, Paris, Atala, 4)
   The outerjoin Climates ⟗ Accommodations, over (Country, Climate, City, Hotel, Stars): (Canada, diverse, Toronto, Plaza, 4), (Bahamas, tropical, Nassau, Hilton, null), (UK, temperate, null, null, null), (France, null, Paris, Atala, 4)
23. Combining Relations using Outerjoins
   The outerjoin operator is not associative.
   For more than two relations, the result depends on the order in which the outerjoins are applied.
   In general, outerjoins cannot maximally combine relations (no matter what order is used).
   Outerjoin is not suitable for combining more than two relations!
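A small experiment makes the outerjoin and its lack of associativity concrete. The sketch below assumes tuples are dicts with None for null and that all tuples of a relation carry the same attribute names; the function names and the three one-tuple relations R, S, and T are made up for illustration and do not appear in the talk.

```python
def join_consistent(t1, t2):
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def schema(rel):
    """Attribute names of a relation, read off its tuples."""
    return set().union(*(t.keys() for t in rel))

def outerjoin(r1, r2):
    """Natural join of r1 and r2 plus all dangling tuples, padded with nulls."""
    blank = dict.fromkeys(schema(r1) | schema(r2))   # all attributes, null by default
    out, matched = [], set()
    for t1 in r1:
        partners = [j for j, t2 in enumerate(r2) if join_consistent(t1, t2)]
        matched.update(partners)
        if partners:
            out += [{**blank, **t1, **r2[j]} for j in partners]
        else:
            out.append({**blank, **t1})              # dangling tuple of r1
    out += [{**blank, **t2} for j, t2 in enumerate(r2) if j not in matched]
    return out

# The example of slide 22.
climates = [{"Country": "Canada", "Climate": "diverse"},
            {"Country": "Bahamas", "Climate": "tropical"},
            {"Country": "UK", "Climate": "temperate"}]
accommodations = [
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton", "Stars": None},
    {"Country": "France", "City": "Paris", "Hotel": "Atala", "Stars": 4}]
print(len(outerjoin(climates, accommodations)))  # 4

# Hypothetical relations R(A, B), S(B, C), T(A, C) with one tuple each.
R, S, T = [{"A": 1, "B": 1}], [{"B": 2, "C": 1}], [{"A": 1, "C": 1}]
left = outerjoin(outerjoin(R, S), T)
right = outerjoin(R, outerjoin(S, T))
print(len(left), len(right))  # 3 2
```

Grouping to the left keeps the three tuples apart, because the nulls introduced by padding block later matches, while grouping to the right combines the tuples of S and T. The two results differ, so the outerjoin is not associative on this input.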
25. Efficiency of Evaluation
   The full-disjunction operator (as well as other operators, like the Cartesian product or the natural join) can generate a number of tuples that is exponential in the input size.
   Polynomial running time in the input alone is therefore not a suitable yardstick.
   The usual notion: polynomial time in the combined size of the input and the output.
26. History of Algorithms for Full Disjunctions
   n: number of relations
   N: number of tuples in the DB
   F: number of tuples in the FD
   Source   Time                                              Databases
   RU96     O(n + F^2)                                        γ-acyclic
   KS03     O(n^5 N^2 F^2)                                    general
   CS05     O(n^3 N F^2), "incremental polynomial"            general
   This paper: linear dependence on F.
   F is typically very large; it can be exponential in the size of the database.
27. Polynomial Delay
   One way to obtain an evaluation whose running time is linear in the output is to devise an algorithm that acts as an iterator with an efficient next() operator, that is, an enumeration algorithm that runs with polynomial delay.
   An enumeration algorithm runs with polynomial delay if the time between every two successive answers is polynomial in the size of the input.
30. Main Contributions
   1. First algorithm for computing full disjunctions with polynomial delay
   2. First algorithm for computing full disjunctions in time linear in the output
   3. A general optimization technique for computing full disjunctions: division into biconnected components
   A substantial improvement over the state of the art is shown both theoretically and experimentally.
32. Our Algorithms
   Algorithm NLOJ: tree schemes
   Algorithm PDelayFD: general schemes
   Optimization: division into biconnected components
   Algorithm BiComNLOJ (the main algorithm): combines the above, for general schemes
34. Tree Schemes
   Tree schemes: scheme graphs w/o cycles.
   In the scheme graph, the relation schemes are the nodes, and there is an edge between every two schemes with one or more common attributes.
   (Figure: an example tree scheme over R1, …, R7.)
35. Left-Deep Sequence of Outerjoins
   R: a set of relations with a tree scheme
   R1, …, Rn: a connected-prefix order of R
   Proposition: FD(R) = (…((R1 ⟗ R2) ⟗ R3) …) ⟗ Rn, where ⟗ denotes the outerjoin
   Algorithm NLOJ (Nested Loop Outer Join):
   1. Compute a connected-prefix order of R
   2. Apply outerjoins in a left-deep order
36. Connected-Prefix Order of Relations
   A connected-prefix order of relations: each prefix forms a (connected) subtree.
   (Figure: a tree scheme over R1, …, R7 with the connected-prefix order R1, R3, R2, R7, R4, R5, R6.)
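One simple way to obtain a connected-prefix order is a breadth-first traversal of the scheme graph, since every relation is appended only after one of its neighbours. The sketch below assumes the schemes are given as a dict from relation names to attribute sets and that the scheme graph is connected; the four schemes and the function name are hypothetical, and the talk does not say how its own order is computed.

```python
from collections import deque

def connected_prefix_order(schemes):
    """Breadth-first order of the scheme graph, starting at the first relation."""
    names = list(schemes)
    order, seen = [names[0]], {names[0]}
    queue = deque(order)
    while queue:
        current = queue.popleft()
        for other in names:
            # An edge of the scheme graph: the two schemes share an attribute.
            if other not in seen and schemes[current] & schemes[other]:
                seen.add(other)
                order.append(other)
                queue.append(other)
    return order

# A hypothetical tree scheme with edges R1-R2, R1-R3, and R2-R4.
schemes = {"R1": {"A", "B"}, "R2": {"B", "C"}, "R3": {"A", "D"}, "R4": {"C", "E"}}
order = connected_prefix_order(schemes)
print(order)  # ['R1', 'R2', 'R3', 'R4']
```

A depth-first traversal would serve equally well; what matters is that each new relation shares an attribute with some relation that precedes it.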
37. Achieving Polynomial Delay
   Algorithm NLOJ (Nested Loop Outer Join):
   1. Compute a connected-prefix order of R
   2. Apply outerjoins in a left-deep order over R1, R2, R3, …, Rn-1, Rn
   Problem: exponential delay, because an intermediate result can already have exponential size.
   Solution: use iterators.
39. Using Iterators for Outerjoins
   (Figure: the left-deep sequence of outerjoins over R1, R2, R3, …, Rn-1, Rn, evaluated through a chain of iterators: Iterator 1, Iterator 2, …, Iterator n-1, Iterator n.)
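The idea of chaining iterators can be sketched with Python generators: each stage pulls one tuple at a time from the stage before it, so no intermediate result is ever materialized. This is a minimal illustration and not the NLOJ implementation from the paper. It assumes tuples are dicts with None for null, that the relations are already listed in a connected-prefix order, and that every relation is non-empty with uniform attribute names. The names outerjoin_iter and nloj, as well as the three relations, are hypothetical.

```python
def join_consistent(t1, t2):
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def outerjoin_iter(left, left_attrs, right):
    """Lazily outerjoin a stream of tuples (left) with a stored relation (right)."""
    right_attrs = set().union(*(t.keys() for t in right))
    blank = dict.fromkeys(left_attrs | right_attrs)
    matched = set()
    for t1 in left:                       # pulls one upstream tuple at a time
        partners = [j for j, t2 in enumerate(right) if join_consistent(t1, t2)]
        matched.update(partners)
        if partners:
            for j in partners:
                yield {**blank, **t1, **right[j]}
        else:
            yield {**blank, **t1}         # dangling upstream tuple
    for j, t2 in enumerate(right):
        if j not in matched:
            yield {**blank, **t2}         # dangling tuple of the stored relation

def nloj(relations):
    """Left-deep chain of lazy outerjoins, one iterator per relation after the first."""
    it = iter(relations[0])
    attrs = set().union(*(t.keys() for t in relations[0]))
    for rel in relations[1:]:
        it = outerjoin_iter(it, attrs, rel)
        attrs = attrs | set().union(*(t.keys() for t in rel))
    return it

# A hypothetical chain scheme R1(A, B), R2(B, C), R3(C, D).
relations = [[{"A": 1, "B": 2}], [{"B": 5, "C": 3}], [{"C": 3, "D": 4}]]
result = list(nloj(relations))
print(len(result))  # 2
```

Every tuple that enters a stage produces at least one tuple on the way out, so a call to next() on the last iterator triggers only a bounded scan at each stage. The sketch does not reproduce the delay analysis of the talk.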
40. Outerjoins are not Always Applicable
   It is not always possible to formulate a full disjunction as a left-deep sequence of outerjoins.
   Rajaraman and Ullman [PODS 96]: some full disjunctions cannot be formulated as expressions of outerjoins at all (i.e., even with arbitrary placement of parentheses).
43. Shifting a Maximal JCC Tuple Set
   The t-shift of a maximal JCC tuple set T, for a tuple t:
   1. Add t to T
   2. Extract the maximal JCC subset containing t
   3. Extend it to a maximal JCC set
44. Algorithm PDelayFD
   Q holds maximal JCC sets that are waiting to be printed; C holds those already printed.
   1. Generate a maximal JCC set T0
   2. Insert T0 into Q
   Repeat until Q is empty:
   1. Move some T from Q to C
   2. Print the join of T, padded with nulls
   3. For every tuple t in the database, insert into Q the t-shift of T, after validating that it is not already in Q or C
   Theorem: PDelayFD(R) computes FD(R) with polynomial delay.
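The t-shift and the loop of PDelayFD can be sketched as a Python generator. This follows the steps as stated on the slides and is not the authors' implementation. It assumes the database is a list of distinct tuples, each a dict with None for null, and that two tuples are adjacent in the scheme graph when their dicts share an attribute name. Tuple sets are frozensets of indices into the database, and a single set named seen plays the role of Q and C together for the membership check. All function names are hypothetical.

```python
from collections import deque

def join_consistent(t1, t2):
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def connected_part(db, members, start):
    """Indices of members reachable from start through shared attributes."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j in members:
            if j not in seen and db[i].keys() & db[j].keys():
                seen.add(j)
                stack.append(j)
    return seen

def extend_to_maximal(db, members):
    """Greedily add tuples while the set stays join consistent and connected."""
    members, changed = set(members), True
    while changed:
        changed = False
        for k in range(len(db)):
            if (k not in members
                    and all(join_consistent(db[k], db[i]) for i in members)
                    and any(db[k].keys() & db[i].keys() for i in members)):
                members.add(k)
                changed = True
    return frozenset(members)

def shift(db, members, t):
    """t-shift: add t, keep the maximal JCC subset containing t, then extend."""
    keep = {i for i in members if join_consistent(db[t], db[i])} | {t}
    return extend_to_maximal(db, connected_part(db, keep, t))

def pdelay_fd(db, attributes):
    first = extend_to_maximal(db, {0})
    queue, seen = deque([first]), {first}   # seen = Q and C together
    while queue:
        current = queue.popleft()
        row = dict.fromkeys(attributes)     # pad with nulls
        for i in current:
            row.update(db[i])
        yield row
        for t in range(len(db)):
            candidate = shift(db, current, t)
            if candidate not in seen:
                seen.add(candidate)
                queue.append(candidate)

ATTRS = ["Country", "Climate", "City", "Hotel", "Stars", "Site"]
DB = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "UK", "Climate": "temperate"},
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3},
    {"Country": "Canada", "City": "London", "Site": "Air Show"},
    {"Country": "Canada", "City": None, "Site": "Mount Logan"},
    {"Country": "UK", "City": "London", "Site": "Buckingham"},
]
rows = list(pdelay_fd(DB, ATTRS))
print(len(rows))  # 4
```

On the example relations of slide 14, the generator yields the four tuples of FD(R). Because it is a generator, the first answer is available before the remaining maximal sets have been explored.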
47. Biconnected Components
   Biconnected component: a maximal subset B of relations such that the scheme graph has two (or more) disjoint paths between every two relations of B.
   (Figure: example scheme graphs over relations R1, R2, … and their division into biconnected components.)
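Biconnected components can be computed with the classic depth-first search that keeps a stack of edges (the Hopcroft-Tarjan method). The sketch below is a standard textbook technique and not code from the talk. It assumes the scheme graph is a simple undirected graph given as a dict of adjacency lists; the four-node graph and the function name are hypothetical. Under the usual graph-theoretic convention, an edge that lies on no cycle forms a two-node component of its own.

```python
def biconnected_components(graph):
    """Return the node sets of the biconnected components of a simple graph."""
    index, low, edge_stack, components = {}, {}, [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for v in graph[u]:
            if v not in index:                      # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:              # u separates v's subtree
                    component = set()
                    while True:
                        edge = edge_stack.pop()
                        component.update(edge)
                        if edge == (u, v):
                            break
                    components.append(component)
            elif v != parent and index[v] < index[u]:   # back edge
                edge_stack.append((u, v))
                low[u] = min(low[u], index[v])

    for node in graph:
        if node not in index:
            dfs(node, None)
    return components

# A hypothetical scheme graph: a triangle a-b-c with an extra edge c-d.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
components = biconnected_components(graph)
print(sorted(sorted(c) for c in components))  # [['a', 'b', 'c'], ['c', 'd']]
```

The recursion depth equals the depth of the search tree, which is fine for scheme graphs with a handful of relations.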
48. Left-Deep Sequence of Outerjoins
   R: a set of relations
   Theorem: There exists an (efficiently computable) order B1, …, Bk of the biconnected components of R, s.t. FD(R) = (…(FD(B1) ⟗ FD(B2)) ⟗ …) ⟗ FD(Bk), where ⟗ denotes the outerjoin
   Optimized algorithm:
   1. Compute the biconnected components of R
   2. Compute the full disjunction of each component
   3. Apply outerjoins in a suitable order
49. BiComNLOJ: a Naïve Attempt
   1. Divide R into biconnected components B1, …, Bk, in a suitable order
   2. Compute FD(B1), …, FD(Bk) using PDelayFD
   3. Using NLOJ, compute (…(FD(B1) ⟗ FD(B2)) ⟗ …) ⟗ FD(Bk), where ⟗ denotes the outerjoin
   Problem: each FD(Bi) can be exponential in the input, so the delay is non-polynomial!
   Solution: use iterators.
54. State-of-Art vs. Main Algorithm
   (Charts for Scheme 1, Scheme 2, and Scheme 3. x-axis: number of tuples in each relation; y-axis: average delay in msec. Series: IncrementalFD, the state of the art from CS05, and BiComNLOJ, our main algorithm.)
   BiComNLOJ is a substantial improvement over the state of the art.
55. Division into Biconnected Components
   (Charts for Scheme 1, Scheme 2, and Scheme 3. x-axis: number of tuples in each relation; y-axis: average delay in msec. Series: PDelayFD, with no division into biconnected components, and BiComNLOJ, our main algorithm.)
   Division reduces delays (the amount depends on the scheme).
56. Behavior of Delay
   (Chart. x-axis: tuple number; y-axis: delay in msec. Series: IncrementalFD, the state of the art from CS05, and BiComNLOJ, our main algorithm.)
   The delay is measured before each generated tuple.
   While IncrementalFD slows down, the delay of BiComNLOJ remains almost constant.
58. Summary
   Full disjunction: an associative extension of the outerjoin operator to an arbitrary number of relations
   Three algorithms for computing FD:
   NLOJ (Nested-Loop Outerjoin): tree-structured schemes
   PDelayFD (Polynomial-Delay Full Disjunction): general schemes
   BiComNLOJ: combines the first two and employs division into biconnected components; general schemes