CrowdFill is a system that collects structured data from crowdsourcing workers in a table format. It presents workers with a partially filled table that they complete collaboratively by filling in empty cells and voting on existing entries. This real-time table-filling approach aims to collect high-quality data while capping monetary cost and keeping latency low. The paper presents CrowdFill's formal model, its architecture, its algorithms for handling concurrent edits and satisfying data constraints, and a compensation scheme that rewards workers according to their contributions to the final table. In an experimental evaluation with five volunteer workers, a representative run collected data on 20 soccer players in under 11 minutes (10m 44s), and the compensation estimates shown to workers during collection were close to their actual final compensation.
2. Goal
•Collect high-quality structured data from the
crowd, while capping total monetary cost and
keeping latency low
6/25/2014 Hyunjung Park 2
name    nationality  position  caps  goals
        Brazil
Messi                FW
Klose   Germany                133
3. Traditional Microtask-based Approach
1. Decompose the data collection task into a set
of microtasks
e.g., “What position does Klose play?”
“How many goals has Messi scored?”
2. Each worker provides specific pieces of data
via microtasks
3. Assemble the collected pieces of data into the
final table
4. CrowdFill’s Table-filling Approach
1. Present an entire partially-filled table to all
participating workers
2. Each worker contributes what they know to the
table by filling in empty cells, and voting on
data entered by others
3. Propagate worker actions in real-time to
synchronize the table across all workers
7. Formal Model: Schema
•Table Specification
Column definitions and primary key
SoccerPlayer(name, nationality, position, caps, goals)
•Scoring Function
Accept a row r if and only if f(u_r, d_r) > 0,
where u_r and d_r are its upvote and downvote counts
e.g., “majority of three or more”
$$
f(u_r, d_r) =
\begin{cases}
u_r - d_r & \text{if } u_r + d_r \ge 2 \\
0 & \text{otherwise}
\end{cases}
$$
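The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the names `score`, `accepted`, and `min_votes` are hypothetical, and the vote threshold of 2 is copied from the formula as it appears on the slide.

```python
def score(upvotes: int, downvotes: int, min_votes: int = 2) -> int:
    """Return u - d once at least `min_votes` votes were cast, else 0.

    `min_votes` is a hypothetical parameter name; its default of 2 is the
    threshold shown in the slide's formula.
    """
    if upvotes + downvotes >= min_votes:
        return upvotes - downvotes
    return 0


def accepted(upvotes: int, downvotes: int) -> bool:
    """A row is accepted if and only if its score is strictly positive."""
    return score(upvotes, downvotes) > 0
```

Under this sketch, a row with a single upvote is not yet accepted (too few votes), while a row with more downvotes than upvotes gets a negative score and is rejected.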
8. Formal Model: Constraints
•Values Constraint
Final table S must “match” template T (a partially-filled
table)
•Cardinality Constraint
Final table S must contain at least N rows
Special case of values constraint
Template T:
name    nationality  position
        Argentina
                     FW

Final table S:
name    nationality  position
Messi   Argentina    FW
Rooney  England      FW
9. Formal Model: Candidate Table
•Candidate table R
Exposed to clients
Primary key not enforced
Each row annotated with its upvote and downvote
counts
name     nationality  position  upvotes  downvotes
Messi    Argentina    FW        2        0
Ronaldo  Portugal     FW        3        0
Ronaldo  Portugal     MF        2        1
Neymar   Brazil                 0        1
10. Formal Model: Operations
•Primitive Operations on R
Insert a new empty row into R
Fill in an empty column of a row with a value
Upvote a complete row
Downvote a non-empty row
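The four primitive operations can be sketched as a small in-memory class. This is a simplification under stated assumptions, not CrowdFill's implementation: the class and method names are hypothetical, rows are identified by list position, and a fill mutates the row in place, which may differ from how the paper models it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

COLUMNS = ("name", "nationality", "position")  # example schema from the slides


@dataclass
class Row:
    values: Dict[str, Optional[str]] = field(
        default_factory=lambda: {c: None for c in COLUMNS})
    upvotes: int = 0
    downvotes: int = 0

    def is_complete(self) -> bool:
        return all(v is not None for v in self.values.values())

    def is_empty(self) -> bool:
        return all(v is None for v in self.values.values())


class CandidateTable:
    """Hypothetical container for candidate table R (primary key not enforced)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self) -> int:
        """Insert a new empty row and return its index."""
        self.rows.append(Row())
        return len(self.rows) - 1

    def fill(self, row_id: int, column: str, value: str) -> None:
        """Fill in an empty column of a row; filled cells are not overwritten."""
        row = self.rows[row_id]
        if row.values[column] is not None:
            raise ValueError(f"column {column!r} is already filled")
        row.values[column] = value

    def upvote(self, row_id: int) -> None:
        """Upvote a row; only complete rows may be upvoted."""
        if not self.rows[row_id].is_complete():
            raise ValueError("only complete rows can be upvoted")
        self.rows[row_id].upvotes += 1

    def downvote(self, row_id: int) -> None:
        """Downvote a row; only non-empty rows may be downvoted."""
        if self.rows[row_id].is_empty():
            raise ValueError("only non-empty rows can be downvoted")
        self.rows[row_id].downvotes += 1


# Replay the Klose example: insert, three fills, one upvote, two downvotes.
table = CandidateTable()
klose = table.insert()
table.fill(klose, "name", "Klose")
table.fill(klose, "nationality", "Germany")
table.fill(klose, "position", "DF")
table.upvote(klose)
table.downvote(klose)
table.downvote(klose)
```

After the replay, the row holds Klose, Germany, DF with one upvote and two downvotes.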
Example: starting from the candidate table R

name     nationality  position  upvotes  downvotes
Messi    Argentina    FW        2        0
Ronaldo  Portugal     FW        3        0
Ronaldo  Portugal     MF        2        1
Neymar   Brazil                 0        1

a new row is built up one primitive operation at a time (the four rows above stay unchanged throughout):

operation         name   nationality  position  upvotes  downvotes
insert                                          0        0
fill name         Klose                         0        0
fill nationality  Klose  Germany                0        0
fill position     Klose  Germany      DF        0        0
upvote            Klose  Germany      DF        1        0
downvote          Klose  Germany      DF        1        1
downvote          Klose  Germany      DF        1        2
11. Formal Model: Final Table
•Final table S
Derived from candidate table R
S contains each complete row r in R such that f(u_r, d_r) > 0 and
f(u_r, d_r) is the highest score of any row with the same
primary key as r
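Deriving S from R can be sketched as a single pass that keeps, for each primary key, the complete row with the highest positive score. The function names and the tuple layout are hypothetical, the threshold of 2 is taken from the scoring example on the Schema slide, and the tie-breaking rule (first row seen wins) is an assumption that the slides do not specify.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# A candidate row: column values followed by upvote and downvote counts.
CandidateRow = Sequence[object]


def score(upvotes: int, downvotes: int) -> int:
    """Scoring function as written on the Schema slide (threshold of 2 votes)."""
    return upvotes - downvotes if upvotes + downvotes >= 2 else 0


def final_table(candidates: List[CandidateRow],
                key_index: int = 0) -> List[Tuple[Optional[str], ...]]:
    """Keep the best-scoring complete row per primary key, if its score is > 0."""
    best: Dict[object, Tuple[int, Tuple[Optional[str], ...]]] = {}
    for row in candidates:
        *values, up, down = row
        if any(v is None for v in values):
            continue  # incomplete rows never reach the final table
        s = score(up, down)
        if s <= 0:
            continue  # rejected by the scoring function
        key = values[key_index]
        # Assumption: on a tie, the row encountered first is kept.
        if key not in best or s > best[key][0]:
            best[key] = (s, tuple(values))
    return [values for _, values in best.values()]


R = [
    ("Messi", "Argentina", "FW", 2, 0),
    ("Ronaldo", "Portugal", "FW", 3, 0),
    ("Ronaldo", "Portugal", "MF", 2, 1),
    ("Neymar", "Brazil", None, 0, 1),
    ("Klose", "Germany", "DF", 1, 2),
]
S = final_table(R)
```

For this R, the sketch keeps the Messi row and the higher-scoring of the two Ronaldo rows, and drops the incomplete Neymar row and the negatively scored Klose row.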
Candidate table R:
name     nationality  position  upvotes  downvotes
Messi    Argentina    FW        2        0
Ronaldo  Portugal     FW        3        0
Ronaldo  Portugal     MF        2        1
Neymar   Brazil                 0        1
Klose    Germany      DF        1        2

Final table S:
name     nationality  position
Messi    Argentina    FW
Ronaldo  Portugal     FW
12. CrowdFill Architecture
[Architecture diagram. Components shown: Crowdsourcing Marketplace, Front-end Server, Back-end Server, Execution Server, Database, Central Client, and several Worker Clients (Web Interface), each used by a Worker. Labeled flows: task setup, payment; task acceptance; table specs, payment; results collection; data entry.]
14. Concurrent Operations
•Model designed to minimize effects of
concurrency (details in paper)
Operations are easily merged
Conflicts are resolved seamlessly
•Convergence theorem
Architecture ensures server and all clients apply the
same operations, possibly in different orders
Theorem guarantees that server and all clients
converge to the same candidate table whenever the
system quiesces
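The intuition behind convergence can be illustrated with a toy example: when operations commute, every replica that applies the same set of operations ends in the same state, whatever the order. The sketch below covers only vote operations, modeled as counter increments. It is not the paper's model or proof, and `apply_all` and the row identifiers are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Iterable, Tuple

Vote = Tuple[str, str]  # (row identifier, "up" or "down"); identifiers are made up


def apply_all(ops: Iterable[Vote]) -> Counter:
    """Apply vote operations in the given order and return the vote counts."""
    state: Counter = Counter()
    for row_id, kind in ops:
        state[(row_id, kind)] += 1  # increments commute with each other
    return state


ops = [("r1", "up"), ("r1", "down"), ("r2", "up"), ("r1", "up")]

# Simulate replicas that receive the same operations in every possible order.
states = [apply_all(order) for order in permutations(ops)]
converged = all(state == states[0] for state in states)
```

Fill operations on the same cell do not commute this simply, which is why the slide refers to the paper for how conflicts are resolved.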
15. Satisfying Values Constraint
• Values constraint
Final table S must match template T
• Worker clients
Perform fill, upvote, and downvote operations
Need not be aware of the template T
• Special “Central client”
Automatically populates new rows to guide the final table S
towards the template T
• Probable Row Invariant (PRI)
R always contains just enough “probable” rows matching
template T
PRI maintained based on maximum bipartite matching
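To make the role of bipartite matching concrete, the sketch below checks whether a set of rows can satisfy a template. It assumes that "match" means each template row is assigned a distinct row that agrees with it on every non-empty template cell; that reading, and all function names, are assumptions. It uses a standard augmenting-path matching and is not the paper's PRI maintenance algorithm.

```python
from typing import List, Optional, Sequence, Set

TableRow = Sequence[Optional[str]]


def row_matches(template_row: TableRow, row: TableRow) -> bool:
    """A row matches a template row if it agrees on every non-empty template cell."""
    return all(t is None or t == v for t, v in zip(template_row, row))


def max_matching(template: List[TableRow], rows: List[TableRow]) -> int:
    """Size of a maximum bipartite matching between template rows and table rows."""
    assigned = [-1] * len(rows)  # assigned[j] = template row matched to rows[j]

    def try_assign(i: int, seen: Set[int]) -> bool:
        for j, row in enumerate(rows):
            if j in seen or not row_matches(template[i], row):
                continue
            seen.add(j)
            # Take a free row, or re-route the template row currently using it.
            if assigned[j] == -1 or try_assign(assigned[j], seen):
                assigned[j] = i
                return True
        return False

    return sum(try_assign(i, set()) for i in range(len(template)))


def satisfies_template(template: List[TableRow], rows: List[TableRow]) -> bool:
    """True if every template row can be matched to its own distinct row."""
    return max_matching(template, rows) == len(template)


template = [(None, None, "FW"), (None, "Argentina", None)]
rows = [("Messi", "Argentina", "FW"), ("Rooney", "England", "FW")]
ok = satisfies_template(template, rows)
```

In this example the first template row initially takes the Messi row and is then re-routed to the Rooney row so that the Argentina template row can be matched as well. Under the same reading, a cardinality constraint of N rows corresponds to a template of N entirely empty rows.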
16. Compensation Scheme: Overview
•After data collection
Allocate a total monetary budget based on each
worker’s overall contribution to the final table
Encourage workers to submit useful work
Make total monetary cost predictable
•During data collection
Provide estimated compensation for individual actions
to keep workers engaged
17. Compensation Scheme: Contribution
•Given final table S, operation op contributed to S
if:
op filled in a cell in S (“direct” contribution)
op first provided a value for S while creating a subset
of a row in S (“indirect” contribution)
op upvoted a row in S
op downvoted a combination of values not present in S
18. Compensation Scheme: Allocation
•Uniform allocation
Each cell and contributing vote has the same
compensation
Each cell divided into direct and indirect contributions
•Column-weighted allocation
Take into account varying difficulty of filling in
different columns and casting votes
•Dual-weighted allocation
Also take into account that entering new key values can get
progressively more difficult as the table fills up
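Uniform allocation can be sketched as follows, under loudly stated assumptions: the budget is divided equally among all cells of the final table and all contributing votes, and each cell's amount is split between its direct and indirect contributors by a fixed fraction. The `direct_share` parameter, its default of 0.5, the function name, and the worker names are all hypothetical; the slides do not give the actual split.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def uniform_allocation(budget: float,
                       cells: List[Tuple[str, str]],
                       votes: List[str],
                       direct_share: float = 0.5) -> Dict[str, float]:
    """Split `budget` equally over final-table cells and contributing votes.

    cells: one (direct_worker, indirect_worker) pair per cell of the final table.
    votes: one worker id per contributing vote.
    direct_share: assumed fraction of a cell's amount paid to the direct contributor.
    """
    units = len(cells) + len(votes)
    if units == 0:
        return {}
    unit = budget / units
    pay: Dict[str, float] = defaultdict(float)
    for direct, indirect in cells:
        pay[direct] += unit * direct_share
        pay[indirect] += unit * (1 - direct_share)
    for worker in votes:
        pay[worker] += unit
    return dict(pay)


payments = uniform_allocation(
    budget=10.0,
    cells=[("alice", "alice"), ("bob", "alice"), ("bob", "bob")],
    votes=["carol", "carol"],
)
```

Because every unit receives the same amount and each unit is fully paid out, the payments always sum to the budget, which is what makes the total cost predictable. The column-weighted and dual-weighted schemes would replace the equal `unit` with per-column or per-key weights.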
19. Experimental Evaluation: Setting
•SoccerPlayer(name, nationality, position, caps,
goals, date-of-birth)
•Scoring function: “majority of three or more”
•Goal: information about 20 players with caps
between 80 and 99
•Five volunteer workers
•Total monetary budget: $10
•Dual-weighted allocation scheme
20. Experimental Evaluation: Summary
•In our representative run
Overall latency: 10m 44s
#Rows in the candidate table: 23
Final compensations: $0.51, $1.68, $2.08, $2.24, $3.49
No “slowdown” in obtaining new primary keys
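As a quick sanity check on the reported figures, the five final compensations can be summed and compared with the $10 budget stated in the experimental setting. The snippet only uses numbers from the slides; the variable names are illustrative.

```python
from decimal import Decimal

# Final compensations reported for the five volunteer workers.
compensations = [Decimal("0.51"), Decimal("1.68"), Decimal("2.08"),
                 Decimal("2.24"), Decimal("3.49")]

total = sum(compensations)

# Each worker's share of the total payout, as a fraction.
shares = [c / total for c in compensations]
```

The total equals the $10.00 budget exactly, consistent with the goal of making the total monetary cost predictable.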
22. Related Work
•Crowdsourcing structured data
CrowdDB [Franklin et al. 2011]
Deco [Park et al. 2012]
•Real-time cooperative editing systems
Convergence [Ellis and Gibbs 1989]
Intention preservation [Sun et al. 1998]
•Monetary compensation for crowdsourcing
Incentive designs [Shaw et al. 2011]
23. Summary
•CrowdFill’s novel table-filling approach
Real-time collaboration among workers
Intuitive data entry interface
Compensation based on contribution
•In the paper:
Full description of the formal model
PRI maintenance algorithm with examples
More details about the compensation scheme
More experimental results