The Dynamics of Micro-Task Crowdsourcing
The Case of Amazon MTurk
Djellel Eddine Difallah, Michele Catasta, Gianluca Demartini, Panos Ipeirotis, Philippe Cudré-Mauroux
WWW’15 - 20th May 2015 - Florence
1
Background
Crowdsourcing is an effective solution to certain classes of problems
2
Background
A crowdsourcing platform allows requesters to publish a crowdsourcing request (batch) composed of multiple tasks (Human Intelligence Tasks, HITs)
Programmatically invoke the crowd with APIs
3
Background
Paid microtask crowdsourcing scales out but remains highly unpredictable
4
Background
[Figure: Batch Throughput (#HITs per minute) over time]
5
SLAs are expensive
6
MTurk is a Marketplace for HITs
Direct: price, time of day, #workers, #HITs, etc.
Other: forums, reputation systems (TurkOpticon), recommendation systems (Openturk)
7
A Data-Driven Approach
8
...Five Years Later [2009 - 2014]
mturk-tracker collected 2.5 million distinct batches with over 130 million HITs
10
mturk-tracker.com
● Collects metadata about each visible batch (title, description, rewards, required qualifications, HITs available, etc.)
● Records batch progress (every ~20 minutes)
Note that the tracker reports data only periodically and does not capture fine-grained information (e.g., real-time variations).
11
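The periodic snapshots described above can be turned into a throughput estimate (#HITs per minute) along the following lines. This is a minimal sketch, not the tracker's actual code: the function name, the (minute, hits_available) tuple layout, and the assumption that every drop in available HITs is completed work (ignoring HITs that expire or are added mid-batch) are all illustrative.

```python
def throughput_per_minute(snapshots):
    """Estimate throughput between consecutive snapshots of one batch.

    snapshots: list of (minute, hits_available) tuples sorted by time.
    Assumption: a decrease in hits_available means HITs were completed.
    """
    rates = []
    for (t0, h0), (t1, h1) in zip(snapshots, snapshots[1:]):
        completed = max(h0 - h1, 0)  # ignore increases (new HITs added)
        rates.append(completed / (t1 - t0))
    return rates


# Synthetic example: one batch observed every 20 minutes.
observed = [(0, 1000), (20, 900), (40, 860)]
rates = throughput_per_minute(observed)
```

Because the snapshots are roughly 20 minutes apart, each rate is an average over that interval; faster variations stay invisible, as the slide notes.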
Menu
1. Notable Facts Extracted from the Data
2. Large-scale HIT Type Classification
3. Analyzing the Features Affecting Batch Throughput
4. Market Analysis
12
1) Notable Facts Extracted from the Data
13
Country-Specific HITs
US and India?
14
Country-Specific HITs
Workers from the US, India, and Canada are the most sought after.
15
Distribution of Batch Size
“Power-law”
16
Evolution of Batch Sizes
Very large batches start to appear
17
HIT Pricing
Is 1 cent per HIT the norm?
18
HIT Pricing
5 cents is the new 1 cent
19
Requesters and Reward Evolution
Increasing number of new and distinct requesters
20
2) Large-scale HIT Type Classification
21
HIT Classes
Classify HITs into types (Gadiraju et al., 2014)
- Information Finding (IF)
- Verification and Validation (VV)
- Interpretation and Analysis (IA)
- Content Creation (CC)
- Surveys (SU)
- Content Access (CA)
22
Supervised Classification With the Crowd
We trained a Support Vector Machine (SVM) model
- Features: HIT title, description, keywords, reward, date, allocated time, and batch size
- Created labeled data on MTurk for 5,000 uniformly sampled HITs
  - Our HIT used 3 repetitions
  - Consensus reached for 89% of the tasks
- 10-fold cross-validation
  - Precision of 0.895
  - Recall of 0.899
  - F-measure of 0.895
- We then performed a large-scale classification of all 2.5M batches
23
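The labeling step (3 repetitions per task, consensus for 89% of tasks) can be sketched as a simple majority vote. The slides do not state the exact consensus rule, so the "at least 2 of 3 workers agree" threshold, the function names, and the sample labels below are assumptions for illustration only.

```python
from collections import Counter


def consensus_label(labels, min_votes=2):
    """Return the majority label, or None if no label reaches min_votes.

    Assumption: consensus means at least min_votes identical answers.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_votes else None


def consensus_rate(all_labels, min_votes=2):
    """Fraction of tasks for which a consensus label exists."""
    decided = [consensus_label(labels, min_votes) for labels in all_labels]
    return sum(d is not None for d in decided) / len(decided)


# Synthetic worker answers using the HIT class abbreviations.
answers = [
    ["IF", "IF", "VV"],   # consensus: IF
    ["CC", "CC", "CC"],   # consensus: CC
    ["SU", "IA", "CA"],   # no consensus
    ["VV", "IF", "VV"],   # consensus: VV
]
rate = consensus_rate(answers)
```

Only tasks with a consensus label would then serve as training data for the classifier; tasks without agreement are dropped in this sketch.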
Distribution of HIT Types
Fewer Content Access batches
Content Creation is the most popular
24
3) Analyzing the Features Affecting Batch Throughput
25
Batch Throughput Prediction
[Figure: Batch Throughput (#HITs per minute) over time]
29 Features
HIT Features: HITs available, Start Time, Reward, Description length, Title length, Keywords, requester_id, Time_alloted, Task type, Age (minutes), etc.
Market Features: Total HITs available, HITs arrived, rewards arrived, % HITs completed, etc.
26
Batch Throughput Prediction
[Figure: timeline marking the prediction time T and the training window delta that precedes it]
- Predict batch throughput at time T by training a Random Forest Regression model with samples taken in the [T-delta, T) time span
- 29 features (including the type of the batch)
- Hourly data in the range June-October 2014
- We sampled 50 time points for evaluation purposes
27
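The windowing idea on this slide, training only on samples from [T-delta, T) and predicting at T, can be sketched as follows. The sample layout and function names are hypothetical. The mean-of-window predictor is a deliberately simple stand-in, not the Random Forest Regression model used in the talk, and any regressor could be fitted on the selected window instead.

```python
def window_samples(samples, t, delta):
    """Return samples whose timestamp lies in the half-open span [t - delta, t)."""
    return [s for s in samples if t - delta <= s[0] < t]


def predict_with_mean(train):
    """Stand-in predictor: mean throughput over the training window.

    This replaces the Random Forest of the slides for illustration only.
    """
    targets = [throughput for _, _, throughput in train]
    return sum(targets) / len(targets) if targets else None


# Synthetic hourly samples: (hour, features, observed throughput).
samples = [
    (0, {"hits_available": 900}, 3.0),
    (1, {"hits_available": 800}, 4.0),
    (3, {"hits_available": 650}, 6.0),
    (5, {"hits_available": 400}, 8.0),
    (6, {"hits_available": 300}, 9.0),
]

train = window_samples(samples, t=6, delta=4)  # keeps hours 3 and 5
prediction = predict_with_mean(train)
```

The half-open interval matters: the sample observed at T itself is excluded, so the model never sees the value it is asked to predict.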
Batch Throughput Prediction
We are interested in cases where the prediction works reasonably well.
28
Predicted vs. Actual Batch Throughput (delta = 4 hours)
Prediction works best for larger batches with large momentum
29
Significant Features
- Which features contribute most when the prediction works reasonably well?
- We proceed by feature ablation
- Re-run the prediction, removing one feature at a time
- 1,000 samples
30
Significant Features
The most significant features:
- HITs_Available (number of tasks in the batch)
- Age_Minutes (how long ago the batch was created)
31
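Feature ablation as described here, re-running the prediction with one feature removed at a time, can be sketched with a generic loop. The function names are hypothetical, and the toy evaluator with its per-feature numbers is invented purely to exercise the loop; it does not reproduce the talk's measurements. In practice `evaluate` would retrain the model on the reduced feature set and return its prediction error.

```python
def ablation_ranking(features, evaluate):
    """Rank features by how much the error grows when each one is removed.

    evaluate: callable taking a list of feature names and returning an error.
    """
    baseline = evaluate(features)
    impact = {}
    for name in features:
        reduced = [f for f in features if f != name]
        impact[name] = evaluate(reduced) - baseline
    # Largest error increase first: the most significant feature.
    return sorted(impact.items(), key=lambda item: item[1], reverse=True)


# Hypothetical evaluator: each feature lowers the error by a made-up amount.
TOY_CONTRIBUTION = {"reward": 0.05, "hits_available": 0.30, "age_minutes": 0.20}


def toy_error(feature_subset):
    return 1.0 - sum(TOY_CONTRIBUTION[f] for f in feature_subset)


ranking = ablation_ranking(list(TOY_CONTRIBUTION), toy_error)
top_features = [name for name, _ in ranking]
```

Removing one feature at a time keeps the cost linear in the number of features (29 retrainings per sample point here), at the price of ignoring interactions between features.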
4) Market Analysis
32
Supply Elasticity
Demand - The number of new tasks published on the platform by the requesters
Supply - The workforce that the crowd is providing
How does the market react when new tasks arrive on the platform?
33
Supply Elasticity
We regressed the percentage of work done (within 1 hour) against the number of new HITs
34
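The regression on this slide can be sketched as an ordinary least-squares line fit of "percentage of work done within the hour" against "number of new HITs". The slides do not say which fitting procedure was used, so plain OLS is an assumption; the function name and the data points below are synthetic and chosen only so the fit is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic hourly observations (not the talk's data).
new_hits = [0, 1000, 2000, 3000]        # new HITs arriving in the hour
pct_work_done = [2.0, 3.0, 4.0, 5.0]    # % of work done within the hour

intercept, slope = fit_line(new_hits, pct_work_done)
```

A positive slope is what "elastic supply" means in this setting: more newly posted work is associated with a larger share of work getting done.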
Supply Elasticity
Intercept = 2.5
Slope = 0.5%
20% of new work gets completed within an hour
35
Demand and Supply Periodicity
[Figure: two panels, Demand and Supply]
37
Demand and Supply Periodicity
Strong weekly periodicity (7-10 days).
38
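One common way to look for such periodicity is the autocorrelation of a daily time series: the lag with the highest autocorrelation suggests the period. The slides do not say which method was used, so this is an assumed technique; the function names and the synthetic series with a built-in 7-day cycle are illustrative only.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def dominant_lag(series, max_lag):
    """Lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Synthetic daily demand: five busy weekdays, two quiet weekend days, six weeks.
daily_hits = [10, 10, 10, 10, 10, 2, 2] * 6
period = dominant_lag(daily_hits, max_lag=10)
```

On real tracker data the peak would be less sharp, which is consistent with the slide reporting a range rather than a single value.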
Conclusions
- Long-term data analysis uncovers some hidden trends
- Large-scale HIT classification
- Important features in throughput prediction (HITs available, Age_minutes)
- Supply is elastic
  - (More work available -> more work done)
- Supply and demand are periodic (7-10 days)
39
Is a crowdsourcing marketplace the right paradigm for efficient and predictable crowdsourcing?
40
Q&A
Djellel Difallah
ded@exascale.info

Weitere ähnliche Inhalte

Ähnlich wie The Dynamics of Micro-Task Crowdsourcing

Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014 Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014 eswcsummerschool
 
The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?Raffael Marty
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleSriram Krishnan
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsVincenzo Gulisano
 
Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.Dr. Kim (Kyllesbech Larsen)
 
SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...
SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...
SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...South Tyrol Free Software Conference
 
Streaming and Social Media
Streaming and Social MediaStreaming and Social Media
Streaming and Social MediaJoe Olson
 
Experience with Kafka & Storm
Experience with Kafka & StormExperience with Kafka & Storm
Experience with Kafka & StormOtto Mok
 
Predictive Analytics
Predictive AnalyticsPredictive Analytics
Predictive Analyticsrkpv2002
 
Predictive Analytics
Predictive AnalyticsPredictive Analytics
Predictive Analyticsrkpv2002
 
Efficient Query Processing Infrastructures
Efficient Query Processing InfrastructuresEfficient Query Processing Infrastructures
Efficient Query Processing InfrastructuresCrai Macdonald
 
How to break apart a monolithic system safely without destroying your team - ...
How to break apart a monolithic system safely without destroying your team - ...How to break apart a monolithic system safely without destroying your team - ...
How to break apart a monolithic system safely without destroying your team - ...Matthew Skelton
 
Teams and monoliths - Matthew Skelton - Velocity EU 2016
Teams and monoliths - Matthew Skelton - Velocity EU 2016Teams and monoliths - Matthew Skelton - Velocity EU 2016
Teams and monoliths - Matthew Skelton - Velocity EU 2016Skelton Thatcher Consulting Ltd
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
Three Pillars, No Answers: Helping Platform Teams Solve Real Observability Pr...
Three Pillars, No Answers: Helping Platform Teams Solve Real Observability Pr...Three Pillars, No Answers: Helping Platform Teams Solve Real Observability Pr...
Three Pillars, No Answers: Helping Platform Teams Solve Real Observability Pr...DevOps.com
 
Micro-Architectural Attacks on Cyber-Physical Systems
Micro-Architectural Attacks on Cyber-Physical SystemsMicro-Architectural Attacks on Cyber-Physical Systems
Micro-Architectural Attacks on Cyber-Physical SystemsHeechul Yun
 
Teams and monoliths - Matthew Skelton - Agile in the City Bristol 2016
Teams and monoliths - Matthew Skelton - Agile in the City Bristol 2016Teams and monoliths - Matthew Skelton - Agile in the City Bristol 2016
Teams and monoliths - Matthew Skelton - Agile in the City Bristol 2016Skelton Thatcher Consulting Ltd
 
What is platform as a product? Clues from Team Topologies - Puppetize 2020 - ...
What is platform as a product? Clues from Team Topologies - Puppetize 2020 - ...What is platform as a product? Clues from Team Topologies - Puppetize 2020 - ...
What is platform as a product? Clues from Team Topologies - Puppetize 2020 - ...Matthew Skelton
 
2017 Melbourne YOW! CTO Summit - Monolith to micro-services with CQRS & Event...
2017 Melbourne YOW! CTO Summit - Monolith to micro-services with CQRS & Event...2017 Melbourne YOW! CTO Summit - Monolith to micro-services with CQRS & Event...
2017 Melbourne YOW! CTO Summit - Monolith to micro-services with CQRS & Event...Douglas English
 

Ähnlich wie The Dynamics of Micro-Task Crowdsourcing (20)

Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014 Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
 
The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 
Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.
 
SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...
SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...
SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...
 
Streaming and Social Media
Streaming and Social MediaStreaming and Social Media
Streaming and Social Media
 
Experience with Kafka & Storm
Experience with Kafka & StormExperience with Kafka & Storm
Experience with Kafka & Storm
 
Predictive Analytics
Predictive AnalyticsPredictive Analytics
Predictive Analytics
 
Predictive Analytics
Predictive AnalyticsPredictive Analytics
Predictive Analytics
 
Efficient Query Processing Infrastructures
Efficient Query Processing InfrastructuresEfficient Query Processing Infrastructures
Efficient Query Processing Infrastructures
 
How to break apart a monolithic system safely without destroying your team - ...
How to break apart a monolithic system safely without destroying your team - ...How to break apart a monolithic system safely without destroying your team - ...
How to break apart a monolithic system safely without destroying your team - ...
 
Teams and monoliths - Matthew Skelton - Velocity EU 2016
Teams and monoliths - Matthew Skelton - Velocity EU 2016Teams and monoliths - Matthew Skelton - Velocity EU 2016
Teams and monoliths - Matthew Skelton - Velocity EU 2016
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
P78
P78P78
P78
 
Three Pillars, No Answers: Helping Platform Teams Solve Real Observability Pr...
Three Pillars, No Answers: Helping Platform Teams Solve Real Observability Pr...Three Pillars, No Answers: Helping Platform Teams Solve Real Observability Pr...
Three Pillars, No Answers: Helping Platform Teams Solve Real Observability Pr...
 
Micro-Architectural Attacks on Cyber-Physical Systems
Micro-Architectural Attacks on Cyber-Physical SystemsMicro-Architectural Attacks on Cyber-Physical Systems
Micro-Architectural Attacks on Cyber-Physical Systems
 
Teams and monoliths - Matthew Skelton - Agile in the City Bristol 2016
Teams and monoliths - Matthew Skelton - Agile in the City Bristol 2016Teams and monoliths - Matthew Skelton - Agile in the City Bristol 2016
Teams and monoliths - Matthew Skelton - Agile in the City Bristol 2016
 
What is platform as a product? Clues from Team Topologies - Puppetize 2020 - ...
What is platform as a product? Clues from Team Topologies - Puppetize 2020 - ...What is platform as a product? Clues from Team Topologies - Puppetize 2020 - ...
What is platform as a product? Clues from Team Topologies - Puppetize 2020 - ...
 
2017 Melbourne YOW! CTO Summit - Monolith to micro-services with CQRS & Event...
2017 Melbourne YOW! CTO Summit - Monolith to micro-services with CQRS & Event...2017 Melbourne YOW! CTO Summit - Monolith to micro-services with CQRS & Event...
2017 Melbourne YOW! CTO Summit - Monolith to micro-services with CQRS & Event...
 

Mehr von eXascale Infolab

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictioneXascale Infolab
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...eXascale Infolab
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex GraphseXascale Infolab
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapeXascale Infolab
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...eXascale Infolab
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...eXascale Infolab
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceanseXascale Infolab
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutioneXascale Infolab
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataeXascale Infolab
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data ManagementeXascale Infolab
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataeXascale Infolab
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...eXascale Infolab
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingeXascale Infolab
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big DataeXascale Infolab
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)eXascale Infolab
 

Mehr von eXascale Infolab (20)

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex Graphs
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory map
 
Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference Resolution
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
 
OLTP-Bench
OLTP-BenchOLTP-Bench
OLTP-Bench
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
 
Hasler2014
Hasler2014Hasler2014
Hasler2014
 
Dagstuhl2014
Dagstuhl2014Dagstuhl2014
Dagstuhl2014
 

Kürzlich hochgeladen

Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squaresusmanzain586
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicAditi Jain
 

Kürzlich hochgeladen (20)

Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 

The Dynamics of Micro-Task Crowdsourcing

  • 1. The Dynamics of Micro-Task Crowdsourcing: The Case of Amazon MTurk. Djellel Eddine Difallah, Michele Catasta, Gianluca Demartini, Panos Ipeirotis, Philippe Cudré-Mauroux. WWW’15, 20th May 2015, Florence.
  • 2. Background: Crowdsourcing is an effective solution to certain classes of problems.
  • 3. Background: A crowdsourcing platform allows requesters to publish a crowdsourcing request (batch) composed of multiple tasks (HITs), and to invoke the crowd programmatically with APIs.
  • 5. Background: Paid microtask crowdsourcing scales out but remains highly unpredictable. [Figure: Batch Throughput; x-axis: time, y-axis: #HITs/minute]
  • 7. MTurk is a marketplace for HITs. Direct: price, time of day, #workers, #HITs, etc. Other: forums, reputation systems (TurkOpticon), recommendation systems (Openturk).
  • 10. Five years later [2009-2014]: mturk-tracker collected 2.5 million different batches with over 130 million HITs.
  • 11. mturk-tracker.com: collects metadata about each visible batch (title, description, rewards, required qualifications, HITs available, etc.) and records batch progress every ~20 minutes. Note that the tracker reports data only periodically and does not reflect fine-grained information (e.g., real-time variations).
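
Because the tracker only records periodic snapshots, batch throughput (#HITs/minute) has to be estimated from the difference between consecutive observations. The following is a minimal sketch of that idea, not the tracker's actual code: the function name, the `(minute, hits_available)` tuple layout, and the sample numbers are all assumptions made for illustration. It also assumes that a drop in available HITs means work was completed, which is only a proxy, since HITs may leave a batch for other reasons.

```python
def throughput_per_minute(snapshots):
    """Estimate throughput between consecutive snapshots.

    snapshots: list of (minute, hits_available), sorted by strictly
    increasing minute (hypothetical layout, not the tracker's schema).
    Returns a list of (minute, estimated HITs completed per minute).
    """
    rates = []
    for (t0, h0), (t1, h1) in zip(snapshots, snapshots[1:]):
        # A rise in available HITs (requester added work) counts as zero progress.
        completed = max(h0 - h1, 0)
        rates.append((t1, completed / (t1 - t0)))
    return rates


# Invented snapshots taken every 20 minutes for one batch.
snapshots = [(0, 1000), (20, 900), (40, 860), (60, 860)]
rates = throughput_per_minute(snapshots)
print(rates)
```

With snapshots roughly 20 minutes apart, any variation inside an interval is averaged out, which is the limitation the slide points out.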
  • 12. Menu: (1) Notable Facts Extracted from the Data; (2) Large-scale HIT Type Classification; (3) Analyzing the Features Affecting Batch Throughput; (4) Market Analysis.
  • 13. 1) Notable Facts Extracted from the Data
  • 15. Country-Specific HITs: Workers from the US, India, and Canada are the most sought after.
  • 16. Distribution of Batch Size: “Power-law”.
  • 17. Evolution of Batch Sizes: Very large batches start to appear.
  • 18. HIT Pricing: Is 1 cent per HIT the norm?
  • 19. HIT Pricing: 5 cents is the new 1 cent.
  • 20. Requesters and Reward Evolution: Increasing number of new and distinct requesters.
  • 21. 2) Large-scale HIT Type Classification
  • 22. HIT Classes: Classify HITs into types (Gadiraju et al. 2014): Information Finding (IF), Verification and Validation (VV), Interpretation and Analysis (IA), Content Creation (CC), Surveys (SU), Content Access (CA).
  • 23. Supervised Classification With the Crowd: We trained a Support Vector Machine (SVM) model.
      - Features: HIT title, description, keywords, reward, date, allocated time, and batch size
      - Created labeled data on MTurk for 5,000 uniformly sampled HITs
      - Our HIT used 3 repetitions
      - Consensus reached for 89% of the tasks
      - 10-fold cross-validation: precision of 0.895, recall of 0.899, F-measure of 0.895
      - We then performed a large-scale classification for all 2.5M HITs
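
The labeling step above (3 repetitions per HIT, consensus on 89% of tasks) can be illustrated with a simple majority vote. This is a sketch under an assumption the slides do not state: that "consensus" means at least 2 of the 3 workers chose the same class. The function name and the sample answers are hypothetical.

```python
from collections import Counter


def majority_label(labels, min_votes=2):
    """Return the most common label if it has at least min_votes, else None.

    min_votes=2 is an assumed consensus rule for 3 repetitions.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_votes else None


# Invented worker answers for three HITs, 3 repetitions each,
# using the class abbreviations from the HIT Classes slide.
answers = {
    "hit_a": ["SU", "SU", "CC"],
    "hit_b": ["IF", "VV", "IA"],  # no two workers agree
    "hit_c": ["CC", "CC", "CC"],
}
consensus = {hit: majority_label(votes) for hit, votes in answers.items()}
consensus_rate = sum(label is not None for label in consensus.values()) / len(consensus)
print(consensus, consensus_rate)
```

Only HITs with a consensus label would then be usable as training examples for the classifier.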
  • 24. Distribution of HIT Types: Fewer Content Access batches; Content Creation is the most popular.
  • 25. 3) Analyzing the Features Affecting Batch Throughput. [Figure: Batch Throughput; x-axis: time, y-axis: #HITs/minute]
  • 26. Batch Throughput Prediction: 29 features. HIT features: HITs available, start time, reward, description length, title length, keywords, requester_id, Time_alloted, task type, age (minutes), etc. Market features: total HITs available, HITs arrived, rewards arrived, % HITs completed, etc.
  • 28. Batch Throughput Prediction: Predict batch throughput at time T by training a Random Forest Regression model with samples taken in the [T-delta, T) time span. [Figure: timeline marking T and delta]
      - 29 features (including the type of the batch)
      - Hourly data in the range June-October 2014
      - We sampled 50 time points for evaluation purposes
      - We are interested in cases where prediction works reasonably
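
The training-window selection described above can be sketched as follows. Only the half-open span [T-delta, T) comes from the slide. The sample records, the field names, and the mean-throughput baseline are assumptions for illustration. The baseline is a deliberately trivial stand-in and is not the Random Forest Regression model used in the talk.

```python
from statistics import mean


def training_window(samples, T, delta):
    """Keep samples whose timestamp lies in the half-open span [T - delta, T)."""
    return [s for s in samples if T - delta <= s["time"] < T]


# Invented hourly samples: time in hours, one feature, observed throughput.
samples = [
    {"time": 0, "hits_available": 500, "throughput": 4.0},
    {"time": 2, "hits_available": 450, "throughput": 6.0},
    {"time": 3, "hits_available": 400, "throughput": 8.0},
    {"time": 4, "hits_available": 380, "throughput": 9.0},
]

window = training_window(samples, T=4, delta=3)
# Stand-in predictor: average throughput seen in the window.
# A regression model would be fitted on `window` at this point instead.
baseline_prediction = mean(s["throughput"] for s in window)
print(len(window), baseline_prediction)
```

The sample taken exactly at T is excluded from training, so the value to be predicted never leaks into the model.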
  • 29. Predicted vs. Actual Batch Throughput (delta = 4 hours): Prediction works best for larger batches with large momentum.
  • 31. Significant Features: Which features contribute most when the prediction works reasonably?
      - We proceed by feature ablation: re-run the prediction, removing 1 feature at a time
      - 1000 samples
      - Most significant: HITs_Available (number of tasks in the batch) and Age_Minutes (how long ago the batch was created)
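
Feature ablation, as described above, re-runs the prediction with one feature removed at a time and compares the error against the full feature set. The sketch below shows the loop on invented data. To stay dependency-free it uses a leave-one-out 1-nearest-neighbour predictor as a stand-in, which is an assumption and not the Random Forest from the talk. The two feature names are borrowed from the slides only as labels.

```python
def loo_error(rows, targets, features):
    """Leave-one-out mean absolute error of a 1-nearest-neighbour predictor."""
    total = 0.0
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(
            others,
            key=lambda j: sum((rows[j][f] - row[f]) ** 2 for f in features),
        )
        total += abs(targets[nearest] - targets[i])
    return total / len(rows)


def ablate(rows, targets, features):
    """Error increase observed when each feature is removed in turn."""
    base = loo_error(rows, targets, features)
    return {
        f: loo_error(rows, targets, [g for g in features if g != f]) - base
        for f in features
    }


# Invented data: throughput follows hits_available, title_length is noise.
rows = [
    {"hits_available": 1, "title_length": 0.2},
    {"hits_available": 2, "title_length": 0.9},
    {"hits_available": 3, "title_length": 0.1},
    {"hits_available": 4, "title_length": 0.8},
]
targets = [10, 20, 30, 40]
increase = ablate(rows, targets, ["hits_available", "title_length"])
print(increase)
```

A feature whose removal raises the error the most is the one the model depends on most. In this toy data that is `hits_available`.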
  • 32. 4) Market Analysis. Demand: the number of new tasks published on the platform by the requesters. Supply: the workforce that the crowd provides.
  • 33. Supply Elasticity: How does the market react when new tasks arrive on the platform?
  • 34. Supply Elasticity: We regressed the percentage of work done (within 1 hour) against the number of new HITs.
  • 36. Supply Elasticity: Intercept = 2.5, slope = 0.5%. 20% of new work gets completed within an hour.
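
The regression behind the intercept and slope above is an ordinary least-squares line fit. The sketch below shows the computation on made-up numbers. The data points, the units, and the function name are assumptions and do not reproduce the tracker data or the reported coefficients.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Invented hourly observations: new HITs posted vs. percentage of work done
# within the hour. The points lie exactly on y = 10 + 0.002 * x.
new_hits = [1000, 2000, 3000, 4000]
pct_work_done = [12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(new_hits, pct_work_done)
print(slope, intercept)
```

A positive slope is what the talk calls elastic supply: when more new work arrives, more work gets done.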
  • 37. Demand and Supply Periodicity. [Figure panels: Demand, Supply]
  • 38. Demand and Supply Periodicity: Strong weekly periodicity (7-10 days).
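
One common way to look for such a period in a daily series is to find the lag with the highest autocorrelation. The slides do not say which method the authors used, so this is an illustrative assumption, and the daily counts are invented.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    m = mean(series)
    num = sum(
        (series[i] - m) * (series[i + lag] - m) for i in range(len(series) - lag)
    )
    den = sum((x - m) ** 2 for x in series)
    return num / den


# Invented daily counts of newly arrived HITs: six identical weeks
# with quieter weekends.
week = [100, 110, 105, 108, 102, 40, 35]
daily = week * 6

best_lag = max(range(1, 15), key=lambda lag: autocorrelation(daily, lag))
print(best_lag)
```

On this synthetic series the strongest autocorrelation falls at a lag of 7 days. Real marketplace data is noisier, so the peak can be broader.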
  • 39. Conclusions:
      - Long-term data analysis uncovers some hidden trends
      - Large-scale HIT classification
      - Important features in throughput prediction (HITs available, Age_minutes)
      - Supply is elastic (more work available -> more work done)
      - Supply and demand are periodic (7-10 days)
  • 41. Is a crowdsourcing marketplace the right paradigm for efficient and predictable crowdsourcing? Q&A: Djellel Difallah, ded@exascale.info