Micro-task crowdsourcing is rapidly gaining popularity among research communities and businesses as a means to leverage Human Computation in their daily operations. Unlike other services, a crowdsourcing platform is in fact a marketplace subject to human factors that affect its performance, both in terms of speed and quality. Indeed, such factors shape the dynamics of the crowdsourcing market. For example, a known behavior of such markets is that increasing the reward of a set of tasks leads to faster results. However, it is still unclear how different dimensions interact with each other: reward, task type, market competition, requester reputation, etc.
In this paper, we adopt a data-driven approach to (A) perform a long-term analysis of a popular micro-task crowdsourcing platform and understand the evolution of its main actors (workers, requesters, and platform). (B) We leverage the main findings of our five-year log analysis to propose features used in a predictive model aimed at determining the expected performance of any batch at a specific point in time. We show that the number of tasks left in a batch and how recent the batch is are two key features of the prediction. (C) Finally, we conduct an analysis of the demand (new tasks posted by the requesters) and supply (number of tasks completed by the workforce) and show how they affect task prices on the marketplace.
1. The Dynamics of Micro-Task Crowdsourcing: The Case of Amazon MTurk
Djellel Eddine Difallah, Michele Catasta, Gianluca Demartini,
Panos Ipeirotis, Philippe Cudré-Mauroux
WWW’15 - 20th May 2015 - Florence
3. Background
A crowdsourcing platform allows requesters to publish a crowdsourcing request (a batch) composed of multiple tasks (HITs).
Requesters can programmatically invoke the crowd via APIs.
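As a concrete illustration of the programmatic interface, a requester can assemble a batch of HITs as a set of parameter dictionaries for MTurk's CreateHIT API. This is a minimal sketch: the parameter names follow the CreateHIT API, but the task content, reward, and counts are invented for illustration.

```python
# Sketch of how a requester might assemble a batch of HITs for the
# MTurk CreateHIT API (parameter names from the API; task content,
# reward, and counts here are hypothetical).

def make_hit_params(question_xml, reward_usd, title, description, keywords):
    """Build the keyword arguments for one CreateHIT call."""
    return {
        "Title": title,
        "Description": description,
        "Keywords": ", ".join(keywords),
        "Reward": f"{reward_usd:.2f}",       # the reward is a string, in USD
        "MaxAssignments": 3,                  # repetitions per task
        "AssignmentDurationInSeconds": 600,   # time allotted per worker
        "LifetimeInSeconds": 7 * 24 * 3600,   # how long the batch stays up
        "Question": question_xml,
    }

# A batch is simply many HITs published under the same parameters.
batch = [
    make_hit_params(f"<QuestionForm>...task {i}...</QuestionForm>",
                    0.05, "Label an image", "Pick the best label",
                    ["image", "label"])
    for i in range(100)
]
```

In practice each dictionary would be passed to the platform's create-HIT call (e.g., boto3's MTurk client); the sketch stops at building the request parameters.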
7. MTurk is a Marketplace for HITs
Direct factors: price, time of day, #workers, #HITs, etc.
Other factors: forums, reputation systems (Turkopticon), recommendation systems (Openturk)
10. ...Five Years Later
[2009 - 2014]
mturk-tracker collected 2.5 million distinct batches with over 130 million HITs
11. mturk-tracker.com
● Collects metadata about each visible batch (title, description, rewards, required qualifications, HITs available, etc.)
● Records batch progress (every ~20 minutes)
Note that the tracker reports data only periodically and does not capture fine-grained information (e.g., real-time variations)
12. Menu
1. Notable Facts Extracted from the Data
2. Large-scale HIT Type Classification
3. Analyzing the Features Affecting Batch Throughput
4. Market Analysis
22. HIT Classes: Classifying HITs into Types (Gadiraju et al., 2014)
- Information Finding (IF)
- Verification and Validation (VV)
- Interpretation and Analysis (IA)
- Content Creation (CC)
- Surveys (SU)
- Content Access (CA)
23. Supervised Classification with the Crowd
We trained a Support Vector Machine (SVM) model
- Features: HIT title, description, keywords, reward, date, allocated time, and batch size
- Created labeled data on MTurk for 5,000 uniformly sampled HITs
- Our HIT used 3 repetitions
- Consensus reached for 89% of the tasks
- 10-fold cross validation
- Precision of 0.895
- Recall of 0.899
- F-Measure of 0.895
- We then performed a large-scale classification of all 2.5M batches
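The classification setup above can be sketched with scikit-learn: TF-IDF features over the HIT text, fed into a linear SVM. This is a toy sketch, not the paper's model; the eight example HITs and their labels are invented for illustration (the paper also used non-textual features such as reward and batch size).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy HITs (title/description text), labeled with a subset of the
# classes of Gadiraju et al.; both texts and labels are invented.
texts = [
    "Find the official website of a company", "Search for a phone number",
    "Verify that this address is correct",    "Check whether the two records match",
    "Transcribe this audio clip",             "Write a short product review",
    "Answer a short demographic survey",      "Fill in this opinion questionnaire",
]
labels = ["IF", "IF", "VV", "VV", "CC", "CC", "SU", "SU"]

# TF-IDF text features followed by a linear SVM classifier.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["Please complete this survey about your habits"]))
```

The real pipeline would add the numeric features (reward, batch size, allocated time) alongside the text, and would be evaluated with 10-fold cross-validation as the slide describes.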
24. Distribution of HIT Types
Content Access batches are the least common; Content Creation is the most popular type
25. 3) Analyzing the Features Affecting Batch Throughput
[Figure: batch throughput over time, measured in #HITs per minute]
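Batch throughput, as plotted above, can be computed directly from consecutive tracker snapshots of a batch's "HITs available" count (a simplified sketch; the timestamps and counts are illustrative):

```python
def throughput(timestamps_min, hits_remaining):
    """#HITs completed per minute between consecutive observations.

    Negative deltas (the requester added HITs) are clipped to zero
    in this simplified view.
    """
    rates = []
    for (t0, h0), (t1, h1) in zip(zip(timestamps_min, hits_remaining),
                                  zip(timestamps_min[1:], hits_remaining[1:])):
        rates.append(max(h0 - h1, 0) / (t1 - t0))
    return rates

# A batch observed every ~20 minutes, shrinking from 600 to 460 HITs.
print(throughput([0, 20, 40], [600, 500, 460]))  # [5.0, 2.0]
```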
26. Batch Throughput Prediction
29 Features
HIT features: HITs available, start time, reward, description length, title length, keywords, requester_id, time allotted, task type, age (minutes), etc.
Market features: total HITs available, HITs arrived, rewards arrived, % HITs completed, etc.
27. Batch Throughput Prediction
[Figure: throughput predicted at time T from training samples in the preceding window of length delta]
- Predict batch throughput at time T by training a Random Forest Regression model with samples taken in the [T-delta, T) time span
- 29 features (including the batch type)
- Hourly data from June through October 2014
- We sampled 50 time points for evaluation purposes
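The prediction step can be sketched with scikit-learn's RandomForestRegressor. This uses synthetic features and targets: the two features stand in for HITs_available and Age_minutes, and the throughput function is invented for illustration; the paper's model uses the full set of 29 HIT and market features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic training set: column 0 stands in for HITs_available,
# column 1 for Age_minutes. The target is an invented throughput
# that grows with batch size and decays with batch age.
X_train = rng.uniform([10, 0], [10000, 5000], size=(500, 2))
y_train = 0.01 * X_train[:, 0] / (1 + X_train[:, 1] / 600)

# Train on samples drawn from [T - delta, T), then predict at T.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

pred = model.predict([[5000, 600]])  # a large, two-hour-old batch
print(pred)
```

In the real setting X_train would be the hourly tracker samples falling inside the [T-delta, T) window, one row per observed batch.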
28. Batch Throughput Prediction (cont.)
We are interested in cases where the prediction works reasonably well
29. Predicted vs. Actual Batch Throughput (delta=4 hours)
Prediction works best for large batches with high momentum
30. Significant Features
- Which features contribute most when the prediction works well?
- We proceed by feature ablation: re-run the prediction removing one feature at a time
- 1,000 samples
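The ablation procedure above can be sketched as a generic loop: score the model on the full feature set, then on each set with one feature removed, and rank features by the resulting drop. The scorer here is a toy stand-in with invented weights, not the paper's Random Forest.

```python
def ablation_importance(features, train_and_score):
    """Score the full feature set, then each set with one feature
    removed; a large drop in score means that feature matters."""
    full = train_and_score(features)
    drops = {}
    for f in features:
        reduced = [g for g in features if g != f]
        drops[f] = full - train_and_score(reduced)
    return drops

# Toy scorer: pretend HITs_available and Age_minutes carry most of
# the predictive power (the weights are invented for illustration).
WEIGHTS = {"HITs_available": 0.5, "Age_minutes": 0.3, "Reward": 0.1,
           "Title_length": 0.05, "Keywords": 0.05}

def toy_score(features):
    return sum(WEIGHTS[f] for f in features)

drops = ablation_importance(list(WEIGHTS), toy_score)
print(max(drops, key=drops.get))  # 'HITs_available'
```

In the paper's setting `train_and_score` would retrain the Random Forest on the reduced feature set and return its prediction quality over the 1,000 samples.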
31. Significant Features
The two most significant features are:
- HITs_Available (the number of tasks in the batch)
- Age_Minutes (how long ago the batch was created)
32. 4) Market Analysis
Demand: the number of new tasks published on the platform by requesters
Supply: the amount of work the crowd provides (HITs completed by the workforce)
39. Conclusions
- Long-term data analysis uncovers hidden trends
- Large-scale HIT classification
- Most important features for throughput prediction: HITs available and Age_minutes
- Supply is elastic (more work available -> more work done)
- Supply and demand are periodic (7-10 days)
40. Is a Crowdsourcing Marketplace the right paradigm for efficient and predictable crowdsourcing?
Q&A
Djellel Difallah
ded@exascale.info