Query Optimization over Crowdsourced Data
Hyunjung Park, Jennifer Widom
Stanford University
Deco: Declarative Crowdsourcing
Give me a Spanish-speaking country.
Give me a country.
What language do they speak in country X?
What is the capital of country X?
8/27/2013 Hyunjung Park 2
“Find the capitals of eight Spanish-speaking countries”
[Figure: Deco System layered over a DBMS, holding a table of the form:]
country    language    capital
Italy      Italian     Rome
Spain      Spanish     Madrid
…          …           …
Deco Query Optimization
•  Crowd incurs monetary cost
•  Some query plans are much cheaper than others
•  Cost estimation is complicated by:
–  Previously collected data
–  Unknown database state
–  Inconsistency of human answers
Outline
•  Motivating example
•  Deco data model and queries
•  Cost and cardinality estimation
•  Experimental results
Everything implemented in full prototype
Motivating Example: Plan 1
“Find the capitals of eight Spanish-speaking countries”
[Figure: flowchart with an 8x loop. “Give me a country.” → unseen? (T/F) → “What language do they speak in country X?” → Spanish? (T/F) → “What is the capital of country X?”]
Motivating Example: Plan 2
“Find the capitals of eight Spanish-speaking countries”
[Figure: flowchart with an 8x loop. “Give me a Spanish-speaking country.” → unseen? (T/F) → “What language do they speak in country X?” → Spanish? (T/F) → “What is the capital of country X?”]
Preview of Experimental Results
[Figure: bar chart of actual costs spent on Mechanical Turk ($, axis 0 to 15) for Plan 1 and Plan 2, broken down by question: “What is the capital of country X?”, “What language do they speak in country X?”, “Give me a Spanish-speaking country.”, “Give me a country.”]
Deco: Data Model (1/2)
•  Conceptual Relation: visible to end-users
Country (country, language, capital)
•  Resolution Rules: cleanse raw data using UDFs
country: dupElim
language: majority(3)
capital: majority(3)
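The two resolution functions named above can be sketched in a few lines. This is a minimal illustration, not Deco's implementation: the names `dup_elim` and `majority` are hypothetical, duplicates are detected by exact string match, and `majority(3)` is assumed to mean "the value given by a strict majority of the first three answers, or unresolved if fewer than three answers exist".

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional


def dup_elim(values: Iterable[Hashable]) -> List[Hashable]:
    """Keep the first occurrence of each raw value, preserving order.

    Assumption: duplicates are exact matches; a real UDF might also
    normalize case or spelling.
    """
    seen = set()
    unique = []
    for value in values:
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


def majority(values: Iterable[Hashable], n: int = 3) -> Optional[Hashable]:
    """Return the strict-majority value among the first n answers.

    Returns None when fewer than n answers are available or when no
    value reaches a strict majority (assumed semantics).
    """
    answers = list(values)[:n]
    if len(answers) < n:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count > n / 2 else None


print(dup_elim(["Spain", "Peru", "Spain"]))
print(majority(["Madrid", "Madrid", "Barcelona"]))
```

Under these assumptions a value stays unresolved until enough raw answers have been collected, which is why a resolution rule can require several fetches per output value.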
Deco: Data Model (2/2)
•  Fetch Rules: “access methods” for the crowd
language => country
“Give me a {language}-speaking country.”
Ø => country
“Give me a country.”
country => language
“What language do they speak in {country}?”
country => capital
“What is the capital of {country}?”
Prices per fetch: [$0.05] [$0.01] [$0.02] [$0.03]
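One way to picture a fetch rule is as a small record pairing a left-hand attribute, a right-hand attribute, and a question template. The sketch below is illustrative only: the `FetchRule` class and its field names are hypothetical, and prices are left out on purpose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FetchRule:
    given: Optional[str]  # left-hand attribute; None stands for Ø
    asks: str             # right-hand attribute the crowd supplies
    template: str         # question text with a {given} placeholder

    def question(self, value: Optional[str] = None) -> str:
        """Instantiate the question for one left-hand value."""
        if self.given is None:
            return self.template
        if value is None:
            raise ValueError(f"a value for {self.given!r} is required")
        return self.template.format(**{self.given: value})


# The four fetch rules from the slide, in the same order.
RULES = [
    FetchRule("language", "country", "Give me a {language}-speaking country."),
    FetchRule(None, "country", "Give me a country."),
    FetchRule("country", "language", "What language do they speak in {country}?"),
    FetchRule("country", "capital", "What is the capital of {country}?"),
]

print(RULES[0].question("Spanish"))
print(RULES[3].question("Spain"))
```

Treating rules as data makes the "access method" analogy concrete: a plan picks which rule to invoke and with which bound value.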
Deco: Queries
•  Deco query: SQL query over conceptual relations
SELECT country, capital
FROM Country
WHERE language='Spanish'
MINTUPLES 8
•  Query processor: access the crowd as needed to produce the query result while:
1.  Minimizing monetary cost (query optimizer)
2.  Reducing latency (query execution engine)
Query Optimization
•  Find the best query plan in terms of estimated monetary cost
•  As in a traditional query optimizer:
1.  Cost and cardinality estimation
2.  Search space
3.  Plan enumeration algorithm
Cost Estimation
•  Total monetary cost = \sum_{\text{Fetch } F} F.\text{price} \times F.\text{cardinality}
–  Existing data is “free”
•  Definition of cardinality in Deco
–  Expected total number of tuples an operator outputs before query execution terminates
•  Cardinality estimation
–  Final database state needs to be estimated simultaneously
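The cost formula above amounts to a sum of price times cardinality over the Fetch operators of a plan. The sketch below applies it to the fetch cardinalities this deck reports for its two example plans; the `total_cost` helper and the tuple layout are hypothetical, and `Decimal` is used only to keep the dollar arithmetic exact.

```python
from decimal import Decimal
from typing import Iterable, Tuple


def total_cost(fetches: Iterable[Tuple[str, int]]) -> Decimal:
    """Sum price x cardinality over the Fetch operators of one plan.

    Each entry is (price as a decimal string, estimated cardinality).
    Existing data never appears here because it is not fetched.
    """
    return sum(
        (Decimal(price) * cardinality for price, cardinality in fetches),
        Decimal("0"),
    )


# Cardinalities taken from the Plan 1 and Plan 2 slides, $0.05 per fetch.
plan1 = [("0.05", 100), ("0.05", 200), ("0.05", 20)]
plan2 = [("0.05", 10), ("0.05", 20), ("0.05", 20)]

print(total_cost(plan1))  # 16.00
print(total_cost(plan2))  # 2.50
```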
Cardinality Estimation: Setting
•  $0.05 for all fetch rules
•  No existing data
•  Selectivity factors
–  language=‘Spanish’: 0.1
–  dupElim: 0.8
–  majority(3): 0.4 (=1/2.5)
Cardinality Estimation: Plan 1
SELECT country, capital
FROM Country
WHERE language='Spanish'
MINTUPLES 8
[Figure: query plan tree with operators numbered 1 to 14: MinTuples[8], Project[co,ca], DLOJoin[co], DLOJoin[co], Resolve[dupeli], Filter[la='Spanish'], Resolve[maj3], Resolve[maj3], Scan[CtryA] with Fetch[Ø→co], Scan[CtryD1] with Fetch[co→la], Scan[CtryD2] with Fetch[co→ca]]
Estimated Fetch cardinalities:
Ø => country: 100
country => language: 200
country => capital: 20
Cost estimation: $0.05 × (100 + 200 + 20) = $16.00
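The Plan 1 cardinalities can be reproduced from the selectivity factors in the setting by working backwards from the eight required tuples. The sketch below is one reading that matches the slide's numbers, not the paper's estimation algorithm; the function and parameter names are hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def plan1_fetch_cardinalities(min_tuples, sel_filter, sel_dupelim, sel_majority):
    """Back-propagate the MINTUPLES target through Plan 1's operators."""
    # majority(3) has selectivity 0.4, i.e. 2.5 raw answers per resolved value.
    answers_per_value = 1 / sel_majority
    # Only 1 in 10 countries passes the language filter, so 80 distinct
    # countries must be examined to obtain 8 Spanish-speaking ones.
    countries_needed = min_tuples / sel_filter
    return {
        # dupElim keeps 80% of raw answers as new countries.
        "Ø => country": countries_needed / sel_dupelim,
        # Every examined country needs its language resolved.
        "country => language": countries_needed * answers_per_value,
        # Only the 8 surviving countries need a capital.
        "country => capital": min_tuples * answers_per_value,
    }


cards = plan1_fetch_cardinalities(
    min_tuples=8,
    sel_filter=Fraction("0.1"),
    sel_dupelim=Fraction("0.8"),
    sel_majority=Fraction("0.4"),
)
cost = Fraction("0.05") * sum(cards.values())

print({rule: int(n) for rule, n in cards.items()})
print(f"${float(cost):.2f}")  # $16.00
```

Most of the spend goes to resolving the language of countries that the filter then discards, which is the inefficiency the alternative plan avoids.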
Cardinality Estimation: Plan 2
SELECT country, capital
FROM Country
WHERE language='Spanish'
MINTUPLES 8
[Figure: query plan tree with operators numbered 1 to 14 (8 replaced by 8a): MinTuples[8], Project[co,ca], DLOJoin[co], DLOJoin[co], Resolve[dupeli], Filter[la='Spanish'], Resolve[maj3], Resolve[maj3], Scan[CtryA] with Fetch[la→co], Scan[CtryD1] with Fetch[co→la], Scan[CtryD2] with Fetch[co→ca]]
Estimated Fetch cardinalities:
language => country: 10
country => language: 20
country => capital: 20
Cost estimation: $0.05 × (10 + 20 + 20) = $2.50
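As a worked check, the Plan 2 figures follow from the same selectivity factors if one assumes that countries returned by the language => country fetch all pass the Spanish filter, so only eight distinct countries are needed. This is an interpretation that reproduces the slide's numbers, not a statement of the paper's algorithm:

```latex
\begin{align*}
\text{language} \Rightarrow \text{country}: &\quad 8 / 0.8 = 10 \\
\text{country} \Rightarrow \text{language}: &\quad 8 \times 2.5 = 20 \\
\text{country} \Rightarrow \text{capital}: &\quad 8 \times 2.5 = 20 \\
\text{estimated cost}: &\quad \$0.05 \times (10 + 20 + 20) = \$2.50
\end{align*}
```

Here 0.8 is the dupElim selectivity and 2.5 = 1/0.4 is the number of raw answers per value resolved by majority(3).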
Experimental Results
[Figure: two bar charts of cost ($), Plan 1 (axis 0 to 15) and Plan 2 (axis 0 to 3), each comparing Actual and Estimated cost broken down by fetch rule: country => capital, country => language, language => country, Ø => country]
Related Work
•  Declarative approach for crowdsourcing
–  Arnold, CrowdDB, CrowdSearcher, Jabberwocky, Qurk, ...
•  Crowd-powered algorithms/operations
–  Filter, sort, join, max, entity resolution, …
•  Also:
–  Traditional query optimization
–  Heterogeneous or federated database systems
Summary
•  Cost estimation in Deco
–  Distinguish between existing data vs. new data
–  Estimate cardinality and final database state simultaneously
•  In the paper:
–  Full description of cost estimation and plan enumeration algorithms
–  More experimental results
Thank you!