8. Iterative Process
LOAD (user, url)
TRANSFORM user, canonicalize(url): bug in UDF canonicalize?
LOAD (url, pagerank)
JOIN on url: joining on the right attribute?
GROUP on user
TRANSFORM user, AVG(pagerank)
FILTER avgPR > 0.5: everything being filtered out?
No output
9. How to do test runs?
Run with real data: too inefficient (TBs of data).
Create smaller data sets (e.g., by sampling): empty results due to joins [Chaudhuri et al. 99] and selective filters.
Biased sampling for joins: indexes not always present.
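The "empty results due to joins" point can be made concrete with a small simulation. This is only a sketch under simplifying assumptions that are not in the slides: a one-to-one join on an integer key, and independent uniform sampling of both inputs at the same rate. The function names are hypothetical.

```python
import random

def sampled_join_size(n_keys, rate, seed=0):
    """Sample both sides of a one-to-one join independently; count surviving matches."""
    rng = random.Random(seed)
    left = {k for k in range(n_keys) if rng.random() < rate}
    right = {k for k in range(n_keys) if rng.random() < rate}
    return len(left & right)

def expected_join_size(n_keys, rate):
    # A key survives only if it is kept on both sides: probability rate * rate.
    return n_keys * rate * rate

n_keys, rate = 10_000, 0.01
print("expected matches:", expected_join_size(n_keys, rate))
print("observed matches:", sampled_join_size(n_keys, rate))
```

Under these assumptions a 1% sample of each input keeps only about 0.01% of the join result, so a later selective filter can easily leave nothing.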
11. Value Addition From Examples
Examples can be used for:
Debugging
Understanding a program written by someone else
Learning a new operator or language
13. Good Examples: Consistency
0. Consistency: output example = operator applied on input example
LOAD (user, url): (Amy, cnn.com) (Amy, http://www.frogs.com) (Fred, www.snails.com/index.html)
TRANSFORM user, canonicalize(url): (Amy, www.cnn.com) (Amy, www.frogs.com) (Fred, www.snails.com)
LOAD (url, pagerank)
JOIN on url
GROUP on user
TRANSFORM user, AVG(pagerank)
FILTER avgPR > 0.5
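The consistency requirement can be checked mechanically: apply the operator to the input examples and compare with the output examples. The sketch below does this for the canonicalize step. The `canonicalize` body is a hypothetical stand-in (the slides do not show the UDF), chosen only so that it reproduces the three example records, and `is_consistent` is an illustrative helper name.

```python
from urllib.parse import urlsplit

def canonicalize(url):
    """Hypothetical stand-in for the UDF: drop scheme and path, ensure a www. prefix."""
    # urlsplit only recognizes the host when a scheme separator is present.
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).netloc
    return host if host.startswith("www.") else "www." + host

def is_consistent(operator, inputs, outputs):
    """True if the output examples equal the operator applied to the input examples."""
    return [operator(rec) for rec in inputs] == outputs

inputs = [("Amy", "cnn.com"), ("Amy", "http://www.frogs.com"),
          ("Fred", "www.snails.com/index.html")]
outputs = [("Amy", "www.cnn.com"), ("Amy", "www.frogs.com"),
           ("Fred", "www.snails.com")]

# TRANSFORM user, canonicalize(url)
transform = lambda rec: (rec[0], canonicalize(rec[1]))
print(is_consistent(transform, inputs, outputs))
```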
14. Good Examples: Realism
1. Realism
Formalization: fraction of examples that are real or are derived from real records
LOAD (user, url): (Amy, cnn.com) (Amy, http://www.frogs.com) (Fred, www.snails.com/index.html)
TRANSFORM user, canonicalize(url): (Amy, www.cnn.com) (Amy, www.frogs.com) (Fred, www.snails.com)
LOAD (url, pagerank)
JOIN on url
GROUP on user
TRANSFORM user, AVG(pagerank)
FILTER avgPR > 0.5
15. Good Examples: Completeness
2. Completeness: demonstrate the salient properties of each operator, e.g., FILTER
LOAD (user, url)
TRANSFORM user, canonicalize(url)
LOAD (url, pagerank)
JOIN on url
GROUP on user
TRANSFORM user, AVG(pagerank): (Amy, 0.6) (Fred, 0.4)
FILTER avgPR > 0.5: (Amy, 0.6)
16. Good Examples: Completeness
2. Completeness: demonstrate the salient properties of each operator, e.g., JOIN
LOAD (user, url)
TRANSFORM user, canonicalize(url): (Amy, www.cnn.com) (Amy, www.frogs.com) (Fred, www.snails.com)
LOAD (url, pagerank): (www.cnn.com, 0.9) (www.frogs.com, 0.3) (www.snails.com, 0.4)
JOIN on url: (Amy, www.cnn.com, 0.9) (Amy, www.frogs.com, 0.3) (Fred, www.snails.com, 0.4)
GROUP on user
TRANSFORM user, AVG(pagerank)
FILTER avgPR > 0.5
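To see how the example records flow through the rest of the program, the steps from JOIN onward can be sketched in plain Python on the slide's data. This is an illustration only, not the dataflow system's implementation; the function names are hypothetical, and the input is assumed to be already canonicalized.

```python
from collections import defaultdict

visits = [("Amy", "www.cnn.com"), ("Amy", "www.frogs.com"), ("Fred", "www.snails.com")]
pages = [("www.cnn.com", 0.9), ("www.frogs.com", 0.3), ("www.snails.com", 0.4)]

def join_on_url(visits, pages):
    """JOIN on url: attach the pagerank to each (user, url) record."""
    pagerank = dict(pages)
    return [(user, url, pagerank[url]) for user, url in visits if url in pagerank]

def avg_pagerank_by_user(joined):
    """GROUP on user, then TRANSFORM user, AVG(pagerank)."""
    groups = defaultdict(list)
    for user, _url, pr in joined:
        groups[user].append(pr)
    return {user: sum(prs) / len(prs) for user, prs in groups.items()}

def filter_high(avg_pr, threshold=0.5):
    """FILTER avgPR > threshold."""
    return {user: pr for user, pr in avg_pr.items() if pr > threshold}

joined = join_on_url(visits, pages)
avg_pr = avg_pagerank_by_user(joined)
result = filter_high(avg_pr)
print(joined)
print({user: round(pr, 2) for user, pr in avg_pr.items()})
print(sorted(result))
```

On this data every join input finds a match, and the filter sees both a passing record (Amy) and a failing one (Fred), which is what the completeness slides illustrate.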
20. Formalizing Completeness
Operator Completeness: fraction of equivalence classes that have at least one example record.
Overall Completeness: average of per-operator completeness.
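The two definitions translate directly into a few lines of code. The sketch below assumes, for illustration, that FILTER has two equivalence classes (records that pass and records that fail); the slides do not list the classes explicitly, and the helper names are hypothetical.

```python
def operator_completeness(records, classify, all_classes):
    """Fraction of equivalence classes with at least one example record."""
    covered = {classify(rec) for rec in records}
    return len(covered & set(all_classes)) / len(all_classes)

def overall_completeness(per_operator):
    """Average of the per-operator completeness values."""
    return sum(per_operator) / len(per_operator)

# Assumed equivalence classes for FILTER avgPR > 0.5.
filter_classes = ["pass", "fail"]
classify_filter = lambda rec: "pass" if rec[1] > 0.5 else "fail"

both = operator_completeness([("Amy", 0.6), ("Fred", 0.4)], classify_filter, filter_classes)
only_pass = operator_completeness([("Amy", 0.6)], classify_filter, filter_classes)
print(both, only_pass, overall_completeness([both, only_pass]))
```

With both Amy and Fred as examples the filter is fully covered; with Amy alone only half of its classes are illustrated.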
21. Good Examples: Conciseness
3. Conciseness
Operator Conciseness: (# equivalence classes) / (# example records)
Overall Conciseness: average of per-operator conciseness
LOAD (user, url): (Amy, cnn.com) (Amy, http://www.frogs.com) (Fred, www.snails.com/index.html)
TRANSFORM user, canonicalize(url): (Amy, www.cnn.com) (Amy, www.frogs.com) (Fred, www.snails.com)
LOAD (url, pagerank)
JOIN on url
GROUP on user
TRANSFORM user, AVG(pagerank)
FILTER avgPR > 0.5
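As a rough illustration of the conciseness ratio, the sketch below computes it for an operator assumed to have two equivalence classes (for example, a filter with passing and failing records). The class count and the extra records are assumptions made for the example, and the function names are hypothetical.

```python
def operator_conciseness(num_classes, records):
    """Ratio of equivalence classes to example records for one operator."""
    if not records:
        raise ValueError("conciseness is undefined without example records")
    return num_classes / len(records)

def overall_conciseness(per_operator):
    """Average of the per-operator conciseness values."""
    return sum(per_operator) / len(per_operator)

# Two assumed classes, shown with two records versus four records.
tight = operator_conciseness(2, [("Amy", 0.6), ("Fred", 0.4)])
padded = operator_conciseness(2, [("Amy", 0.6), ("Fred", 0.4), ("Bob", 0.7), ("Eve", 0.2)])
print(tight, padded, overall_conciseness([tight, padded]))
```

Adding records that fall into already-covered classes lowers the ratio without improving completeness.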
23. Related Work
Related areas: reverse query processing, database testing, software and hardware verification.
Differences: realism is not a concern; the notion of conciseness is different; intermediate result size is immaterial.
24. Strawman I: Downstream Propagation
Take some portion of the input data and run the program over it.
1. Realism
2. Completeness
3. Conciseness
25. Strawman II: Upstream Propagation
Start from the desired output and work backwards.
1. Realism
2. Completeness
3. Conciseness
31. Formalization of Pruning
Example records -> elements
Equivalence classes -> sets
Pick the minimum number of records to cover every equivalence class -> set-cover problem
More involved because completeness of other operators must be maintained; details in paper.
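One common way to approach a covering problem of this shape is the standard greedy heuristic: repeatedly keep the record that illustrates the most still-uncovered equivalence classes. The sketch below shows that heuristic only. It is not claimed to be the algorithm from the paper, it ignores the extra constraint that other operators' completeness must be maintained, and the record and class names are hypothetical.

```python
def greedy_cover(record_classes):
    """Pick records until every equivalence class is illustrated.

    record_classes maps each record to the set of equivalence classes it illustrates.
    """
    uncovered = set().union(*record_classes.values())
    chosen = []
    while uncovered:
        # Keep the record that covers the most classes not yet illustrated.
        best = max(record_classes, key=lambda r: len(record_classes[r] & uncovered))
        chosen.append(best)
        uncovered -= record_classes[best]
    return chosen

# Hypothetical records r1..r4 and classes A, B, C.
record_classes = {
    "r1": {"A"},
    "r2": {"A", "B"},
    "r3": {"B", "C"},
    "r4": {"C"},
}
kept = greedy_cover(record_classes)
print(kept)
```

Here two of the four records suffice to cover all three classes, so the other two can be pruned.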
41. Performance Evaluation
Program I (Web Search Result Viewing Statistics):
LOAD
FILTER by compound arithmetic expression
GROUP
TRANSFORM using built-in aggregate function
43. Performance Evaluation
Program II (Web Advertising Activity):
LOAD table A
FILTER A by compound logical expression
JOIN with table B (highly selective)
TRANSFORM using 4 string manipulation UDFs (non-invertible)