CrowdFill: Collecting Structured
Data from the Crowd
Hyunjung Park Jennifer Widom
Stanford University
Goal
•Collect high-quality structured data from the
crowd, while capping total monetary cost and
keeping latency low
6/25/2014 Hyunjung Park 2
Example of a partially-filled table (- marks an empty cell):
name   nationality  position  caps  goals
-      Brazil       -         -     -
Messi  -            FW        -     -
Klose  Germany      -         133   -
Traditional Microtask-based Approach
1. Decompose the data collection task into a set
of microtasks
e.g., “What position does Klose play?”
“How many goals has Messi scored?”
2. Each worker provides specific pieces of data
via microtasks
3. Assemble the collected pieces of data into the
final table
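As a rough illustration of step 1, the sketch below turns each empty cell of a partially-filled table into one microtask question. The function name `microtasks`, the question wording, the rule of skipping rows whose name is unknown, and the placement of 133 in the caps column are all assumptions made for this example, not part of the talk.

```python
COLUMNS = ["name", "nationality", "position", "caps", "goals"]

# Partially-filled table from the Goal slide; None marks an empty cell.
# Putting 133 under "caps" is an assumption about the original slide layout.
table = [
    {"name": None, "nationality": "Brazil", "position": None, "caps": None, "goals": None},
    {"name": "Messi", "nationality": None, "position": "FW", "caps": None, "goals": None},
    {"name": "Klose", "nationality": "Germany", "position": None, "caps": 133, "goals": None},
]


def microtasks(rows):
    """Yield one (name, column, question) microtask per empty cell of a named row."""
    for row in rows:
        if row["name"] is None:
            continue  # hypothetical rule: a per-cell question needs a known key
        for col in COLUMNS:
            if row[col] is None:
                yield (row["name"], col, f"What is the {col} of {row['name']}?")


tasks = list(microtasks(table))
for _, _, question in tasks:
    print(question)
```

Each answer would then be written back into the cell identified by the (name, column) pair, which corresponds to the assembly step.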
CrowdFill’s Table-filling Approach
1. Present an entire partially-filled table to all
participating workers
2. Each worker contributes what they know to the
table by filling in empty cells, and voting on
data entered by others
3. Propagate worker actions in real-time to
synchronize the table across all workers
Outline
•Formal model
•Overall architecture
•Concurrent operations
•Satisfying values constraint
•Compensation scheme
•Experimental evaluation
•Related work
Formal Model: Schema
•Table Specification
Column definitions and primary key
SoccerPlayer(name, nationality, position, caps, goals)
•Scoring Function
Accept a row r if and only if f(u_r, d_r) > 0,
where u_r and d_r are its upvote and downvote counts
e.g., “majority of three or more”:
f(u_r, d_r) = u_r − d_r if u_r + d_r ≥ 2, and 0 otherwise
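The scoring function on this slide can be written out directly. The sketch below uses the threshold exactly as it appears on the slide; the function names `score` and `accepted` are chosen here for illustration.

```python
def score(upvotes: int, downvotes: int) -> int:
    """Slide's example: upvotes minus downvotes once at least two votes are in, else 0."""
    if upvotes + downvotes >= 2:
        return upvotes - downvotes
    return 0


def accepted(upvotes: int, downvotes: int) -> bool:
    """A row is accepted if and only if its score is strictly positive."""
    return score(upvotes, downvotes) > 0


for votes in [(1, 0), (2, 0), (2, 1), (1, 2)]:
    print(votes, score(*votes), accepted(*votes))
```

A single upvote is not enough under this function: the row only becomes acceptable once a second vote arrives.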
Formal Model: Constraints
•Values Constraint
Final table S must “match” template T (a partially-filled
table)
•Cardinality Constraint
Final table S must contain at least N rows
Special case of values constraint
Template T (- marks an empty cell):
name  nationality  position
-     Argentina    -
-     -            FW
A final table S that matches T:
name    nationality  position
Messi   Argentina    FW
Rooney  England      FW
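The slide does not spell out what “match” means. A minimal sketch, assuming it means that every template row can be paired with a distinct row of S that agrees on all values the template specifies, is a small bipartite matching. The helper names `agrees` and `matches` are hypothetical.

```python
def agrees(template_row, final_row):
    """True if final_row has every value the template row specifies (None = unspecified)."""
    return all(v is None or final_row.get(c) == v for c, v in template_row.items())


def matches(template, final):
    """True if every template row can be paired with a distinct agreeing final row."""
    owner = {}  # index of final row -> index of the template row paired with it

    def assign(t, seen):
        # Augmenting-path step: look for a final row for template row t,
        # re-pairing earlier template rows when that frees one up.
        for s, row in enumerate(final):
            if s in seen or not agrees(template[t], row):
                continue
            seen.add(s)
            if s not in owner or assign(owner[s], seen):
                owner[s] = t
                return True
        return False

    return all(assign(t, set()) for t in range(len(template)))


T = [
    {"name": None, "nationality": "Argentina", "position": None},
    {"name": None, "nationality": None, "position": "FW"},
]
S = [
    {"name": "Messi", "nationality": "Argentina", "position": "FW"},
    {"name": "Rooney", "nationality": "England", "position": "FW"},
]
print(matches(T, S))
print(matches(T, S[:1]))
```

Under the same assumption, a cardinality constraint of N rows is a template of N entirely empty rows, which is why the slide calls it a special case.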
Formal Model: Candidate Table
•Candidate table R
Exposed to clients
Primary key not enforced
Each row annotated with its upvote and downvote
counts
name     nationality  position  upvotes  downvotes
Messi    Argentina    FW        2        0
Ronaldo  Portugal     FW        3        0
Ronaldo  Portugal     MF        2        1
Neymar   Brazil       -         0        1
Formal Model: Operations
•Primitive Operations on R
Insert a new empty row into R
Fill in an empty column of a row with a value
Upvote a complete row
Downvote a non-empty row
Example, starting from the candidate table R:
name     nationality  position  upvotes  downvotes
Messi    Argentina    FW        2        0
Ronaldo  Portugal     FW        3        0
Ronaldo  Portugal     MF        2        1
Neymar   Brazil       -         0        1
A new row is then built up step by step (the other rows are unchanged):
operation         name   nationality  position  upvotes  downvotes
insert            -      -            -         0        0
fill name         Klose  -            -         0        0
fill nationality  Klose  Germany      -         0        0
fill position     Klose  Germany      DF        0        0
upvote            Klose  Germany      DF        1        0
downvote          Klose  Germany      DF        1        1
downvote          Klose  Germany      DF        1        2
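The four primitive operations can be sketched as small functions over a list of rows. Representing rows as dictionaries and updating them in place are simplifying assumptions for this example; the paper's formal model may define the operations differently.

```python
COLUMNS = ("name", "nationality", "position")


def insert(R):
    """Insert a new empty row into R and return it."""
    row = {"name": None, "nationality": None, "position": None, "up": 0, "down": 0}
    R.append(row)
    return row


def fill(row, column, value):
    """Fill in an empty column of a row with a value."""
    if row[column] is not None:
        raise ValueError(f"{column} is already filled")
    row[column] = value


def upvote(row):
    """Upvote a complete row (every column filled)."""
    if any(row[c] is None for c in COLUMNS):
        raise ValueError("only complete rows can be upvoted")
    row["up"] += 1


def downvote(row):
    """Downvote a non-empty row (at least one column filled)."""
    if all(row[c] is None for c in COLUMNS):
        raise ValueError("empty rows cannot be downvoted")
    row["down"] += 1


# Replay the Klose example from the slide.
R = []
klose = insert(R)
fill(klose, "name", "Klose")
fill(klose, "nationality", "Germany")
fill(klose, "position", "DF")
upvote(klose)
downvote(klose)
downvote(klose)
print(klose)
```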
Formal Model: Final Table
•Final table S
Derived from candidate table R
Contains each complete row r in R such that f(u_r, d_r) > 0 and
f(u_r, d_r) is the highest score of any row with the same
primary key as r
Candidate table R:
name     nationality  position  upvotes  downvotes
Messi    Argentina    FW        2        0
Ronaldo  Portugal     FW        3        0
Ronaldo  Portugal     MF        2        1
Neymar   Brazil       -         0        1
Klose    Germany      DF        1        2
Final table S:
name     nationality  position
Messi    Argentina    FW
Ronaldo  Portugal     FW
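The derivation of S from R can be sketched as follows, using the example scoring function from the schema slide. The function name `final_table` and the tie-breaking rule (the earlier row wins) are assumptions; the slides do not say how ties are resolved.

```python
def score(u, d):
    """Example scoring function from the schema slide."""
    return u - d if u + d >= 2 else 0


def final_table(R, key="name", columns=("name", "nationality", "position")):
    """Keep, per primary key, the complete row with the highest positive score."""
    best = {}  # primary key value -> (score, row)
    for row in R:
        if any(row[c] is None for c in columns):
            continue  # only complete rows qualify
        s = score(row["up"], row["down"])
        if s <= 0:
            continue
        # Assumption: on a tie the earlier row wins.
        if row[key] not in best or s > best[row[key]][0]:
            best[row[key]] = (s, row)
    return [{c: row[c] for c in columns} for _, row in best.values()]


R = [
    {"name": "Messi", "nationality": "Argentina", "position": "FW", "up": 2, "down": 0},
    {"name": "Ronaldo", "nationality": "Portugal", "position": "FW", "up": 3, "down": 0},
    {"name": "Ronaldo", "nationality": "Portugal", "position": "MF", "up": 2, "down": 1},
    {"name": "Neymar", "nationality": "Brazil", "position": None, "up": 0, "down": 1},
    {"name": "Klose", "nationality": "Germany", "position": "DF", "up": 1, "down": 2},
]
S = final_table(R)
for row in S:
    print(row)
```

Neymar's row is dropped because it is incomplete, Klose's because its score is negative, and the second Ronaldo row because a row with the same key scores higher.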
CrowdFill Architecture
[Architecture diagram. Components: Front-end Server, Back-end Server, Execution Server, Database, Central Client, Worker Clients, Web Interface, Crowdsourcing Marketplace. Edge labels: task acceptance; task setup, payment; results collection; table specs, payment; data entry.]
Concurrent Operations
•Model designed to minimize effects of
concurrency (details in paper)
Operations are easily merged
Conflicts are resolved seamlessly
•Convergence theorem
Architecture ensures that the server and all clients apply the
same operations, possibly in different orders
Theorem guarantees that the server and all clients
converge to the same candidate table whenever the
system quiesces
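To give some intuition for why applying the same operations in different orders can still converge, the sketch below replays a set of votes in every possible order and checks that the resulting state is identical. It is only an illustration of commutative operations under a deliberately simplified state (vote counts keyed by a hypothetical row id); it is neither the paper's model nor its proof.

```python
from itertools import permutations


def apply_op(state, op):
    """Return a new state with one vote applied; state maps row id -> (upvotes, downvotes)."""
    kind, row_id = op
    up, down = state.get(row_id, (0, 0))
    new_state = dict(state)
    new_state[row_id] = (up + 1, down) if kind == "upvote" else (up, down + 1)
    return new_state


def replay(ops):
    """Apply a sequence of operations to an empty state."""
    state = {}
    for op in ops:
        state = apply_op(state, op)
    return state


ops = [("upvote", "r1"), ("downvote", "r1"), ("upvote", "r2"), ("downvote", "r1")]
results = [replay(order) for order in permutations(ops)]
print(all(result == results[0] for result in results))
```

Because each vote only increments a counter, the order in which a replica receives the votes does not affect the final counts.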
Satisfying Values Constraint
• Values constraint
Final table S must match template T
• Worker clients
Perform fill, upvote, and downvote operations
Need not be aware of the template T
• Special “Central client”
Automatically populates new rows to guide the final table S
towards the template T
• Probable Row Invariant (PRI)
R always contains just enough “probable” rows matching
template T
PRI maintained based on maximum bipartite matching
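One way to picture the Central client's job is as a matching problem: pair each template row with a distinct candidate row that could still satisfy it, then add a pre-filled row for every template row left unpaired. The sketch below does this with a simple augmenting-path matching. The compatibility test `compatible`, the helper names, and the choice to ignore votes are simplifying assumptions; the paper's actual definition of a “probable” row and its PRI maintenance algorithm are not reproduced here.

```python
def compatible(template_row, candidate_row):
    """Assumed test: no filled candidate cell contradicts a template value."""
    return all(
        t is None or candidate_row[c] is None or candidate_row[c] == t
        for c, t in template_row.items()
    )


def unmatched_template_rows(T, R):
    """Indexes of template rows left unpaired by a maximum bipartite matching."""
    owner = {}  # candidate row index -> template row index

    def assign(t, seen):
        for r, row in enumerate(R):
            if r in seen or not compatible(T[t], row):
                continue
            seen.add(r)
            if r not in owner or assign(owner[r], seen):
                owner[r] = t
                return True
        return False

    return [t for t in range(len(T)) if not assign(t, set())]


def central_client_step(T, R):
    """Append one row pre-filled from the template for each unpaired template row."""
    for t in unmatched_template_rows(T, R):
        R.append(dict(T[t]))
    return R


T = [
    {"name": None, "nationality": "Argentina", "position": None},
    {"name": None, "nationality": None, "position": "FW"},
]
R = [{"name": "Messi", "nationality": None, "position": "FW"}]
central_client_step(T, R)
print(R)
```

In this example the single candidate row can serve only one of the two template rows, so one extra row pre-filled with position FW is added.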
Compensation Scheme: Overview
•After data collection
Divide a fixed total monetary budget among workers according to each
worker’s overall contribution to the final table
Encourage workers to submit useful work
Make total monetary cost predictable
•During data collection
Provide estimated compensation for individual actions
to keep workers engaged
Compensation Scheme: Contribution
•Given final table S, operation op contributed to S
if:
op filled in a cell in S (“direct” contribution)
op first provided a value for S while creating a subset
of a row in S (“indirect” contribution)
op upvoted a row in S
op downvoted a combination of values not present in S
Compensation Scheme: Allocation
•Uniform allocation
Each cell and contributing vote has the same
compensation
Each cell divided into direct and indirect contributions
•Column-weighted allocation
Take into account varying difficulty of filling in
different columns and casting votes
•Dual-weighted allocation
Also take into account that entering new key values can get
progressively more difficult as the table fills up
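A minimal sketch of the uniform scheme, assuming the budget is split evenly over all cells of S and all contributing votes, and that each cell's share is then divided between its direct and indirect contributors. The 50/50 split (`direct_share`), the function name, and the worker names are hypothetical; the slides do not give the actual proportions.

```python
from collections import defaultdict


def uniform_allocation(budget, cells, votes, direct_share=0.5):
    """Split the budget evenly over cells and contributing votes, then per worker.

    cells: list of (direct_worker, indirect_worker), one pair per cell of S
    votes: list of workers, one entry per contributing vote
    direct_share: assumed fraction of a cell's pay given to the direct contributor
    """
    unit = budget / (len(cells) + len(votes))
    pay = defaultdict(float)
    for direct_worker, indirect_worker in cells:
        pay[direct_worker] += unit * direct_share
        pay[indirect_worker] += unit * (1 - direct_share)
    for worker in votes:
        pay[worker] += unit
    return dict(pay)


pay = uniform_allocation(10.0, cells=[("ann", "ann"), ("bob", "ann")], votes=["cy", "cy"])
print(pay)
```

A column-weighted variant would replace the single `unit` with a per-column weight, and a dual-weighted variant would additionally vary the weight of key cells with how full the table is.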
Experimental Evaluation: Setting
•SoccerPlayer(name, nationality, position, caps,
goals, date-of-birth)
•Scoring function: “majority of three or more”
•Goal: information about 20 players with caps
between 80 and 99
•Five volunteer workers
•Total monetary budget: $10
•Dual-weighted allocation scheme
Experimental Evaluation: Summary
•In our representative run
Overall latency: 10m 44s
#Rows in the candidate table: 23
Final compensations: $0.51, $1.68, $2.08, $2.24, $3.49
No “slowdown” in obtaining new primary keys
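As a quick arithmetic check, the five reported compensations add up to exactly the $10 budget from the experimental setting. The sketch works in cents to avoid floating-point rounding; the variable names are chosen here for illustration.

```python
budget_cents = 1000  # total monetary budget of $10
compensations_cents = [51, 168, 208, 224, 349]  # final compensations from the run

total_cents = sum(compensations_cents)
largest_share = max(compensations_cents) / total_cents

print(f"total paid: ${total_cents / 100:.2f}")
print(f"largest share: {largest_share:.1%}")
```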
Accuracy of Estimated Compensation
Related Work
•Crowdsourcing structured data
CrowdDB [Franklin et al. 2011]
Deco [Park et al. 2012]
•Real-time cooperative editing systems
Convergence [Ellis and Gibbs 1989]
Intention preservation [Sun et al. 1998]
•Monetary compensation for crowdsourcing
Incentive designs [Shaw et al. 2011]
Summary
•CrowdFill’s novel table-filling approach
Real-time collaboration among workers
Intuitive data entry interface
Compensation based on contribution
•In the paper:
Full description of the formal model
PRI maintenance algorithm with examples
More details about the compensation scheme
More experimental results
Thank you
