
Beyond monetary incentives: experiments with paid microtasks

Experiments using gamification, social incentives, and contests in the context of paid microtask crowdsourcing. Presentation at Data Science with Humans in the Loop, Amsterdam, September 2017.

  1. BEYOND MONETARY INCENTIVES: EXPERIMENTS IN PAID MICROTASK CROWDSOURCING
     Elena Simperl (@esimperl)
     Data Science with Humans in the Loop, Amsterdam, September 14-15, 2017
  2. THIS IS ME
     Computer scientist (Web science, Semantic Web, crowd computing), based at the University of Southampton, UK.
     Working in:
     • Web-based socio-technical systems
     • Crowdsourcing and human computation
     • Human data interaction
     • Open innovation
  3. THEORY OF MOTIVATION: LOVE, MONEY, GLORY
     Love and glory keep costs down; money and glory deliver faster.
  4. PAID MICROTASKS
     Money makes the crowd work faster [Mason & Watts, 2009]. How about love and glory?
  5. EXPERIMENT 1
     Make paid microtasks more cost-effective with gamification.
     Hypothesis: workers will perform better if tasks are more engaging.
     • Increased accuracy through higher inter-annotator agreement
     • Cost savings through reduced unit costs
     Micro-targeting incentives when players attempt to quit improves retention.
  6. MICROTASK DESIGN
     Image labelling tasks, published on a microtask platform: free-text labels, varying numbers of labels per image, taboo words (checked as sketched below). Workers can skip images and play as much as they want.
     Conditions:
     • Baseline: 'standard' tasks with basic spam control
     • Gamified: same requirements and rewards, but the crowd is asked to complete tasks in Wordsmith
     • Gamified & furtherance incentives: additional rewards to stay (random, personalised)
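The taboo-word check mirrors the ESP game's mechanic. A minimal sketch in Python; the function name, the normalisation step, and the duplicate-filtering rule are assumptions for illustration, not part of the original task description:

```python
# Sketch of label validation for the task design above: free-text labels are
# accepted unless they match one of the image's taboo words (ESP-game style).
# Normalisation and duplicate filtering are assumed details.

def accept_label(label: str, taboo_words: set[str], already_given: set[str]) -> bool:
    """Accept a free-text label unless it is empty, taboo, or a repeat
    from the same worker on this image."""
    norm = label.strip().lower()
    return bool(norm) and norm not in taboo_words and norm not in already_given

print(accept_label("Dog", {"dog"}, set()))    # False: taboo word
print(accept_label("puppy", {"dog"}, set()))  # True
```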
  7. LOVE & GLORY
     Gamification:
     • Levels: 9 levels from 'newbie' to 'Wordsmith', a function of # images tagged
     • Badges: a function of # images tagged
     • Bonus points: for new tags
     • Treasure points: for multiples of bonus points
     • Leaderboard: hourly scores and top 5 players
     • Feedback alerts: related to badges, points, levels
     • Activities widget: real-time updates on other players
     Furtherance incentives:
     • Leaderboard: 'global' leaderboard seen by everyone
     • Badges: 'Ultimate' badge and avatar
     • Levels: go straight to the next level
     • Access: quicker access to treasure points
     • Power: see how other players tag
     • Money: 5 cents extra
     A sketch of this scoring logic follows.
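To make the mechanics concrete, here is a minimal sketch of Wordsmith-style scoring. The level names besides 'newbie' and 'Wordsmith', the cut-offs, and the treasure-point multiple are illustrative assumptions; the slide only fixes the general shape (levels and badges as a function of images tagged, treasure points at multiples of bonus points):

```python
# Minimal sketch of the scoring mechanics above. Level names, cut-offs, and
# the 10-point treasure rule are illustrative assumptions.

LEVELS = [(0, "newbie"), (5, "novice"), (15, "amateur"), (30, "enthusiast"),
          (60, "regular"), (100, "expert"), (175, "master"),
          (275, "grandmaster"), (400, "Wordsmith")]

def level_for(images_tagged: int) -> str:
    """Levels are a function of the number of images tagged."""
    current = LEVELS[0][1]
    for cutoff, name in LEVELS:
        if images_tagged >= cutoff:
            current = name
    return current

def score_tag(is_new_tag: bool, bonus_points: int) -> tuple[int, int]:
    """Bonus points for new tags; a treasure point for every multiple of
    10 bonus points (assumed multiple)."""
    if is_new_tag:
        bonus_points += 1
    return bonus_points, bonus_points // 10

print(level_for(32))       # -> 'enthusiast' under the assumed cut-offs
print(score_tag(True, 9))  # -> (10, 1): crossing a treasure multiple
```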
  8. EVALUATION
     ESP data set as gold standard. Measured # labels, agreement, mean & max # labels/worker.
     Three tasks:
     • Nano: 1 image
     • Micro: 11 images
     • Small: up to 2,000 images
     Probabilistic reasoning to predict worker exit and personalise furtherance incentives (see the sketch below).
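The slides do not detail the exit model; below is a minimal sketch assuming a logistic regression over simple per-session features, with the feature set and the tiny training set invented purely for illustration:

```python
# Sketch of exit prediction used to trigger furtherance incentives. The slide
# only states that probabilistic reasoning predicts worker exit; the features
# (images tagged, seconds idle, recent skips) and the training data here are
# illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.array([[32, 4, 0], [3, 40, 3], [120, 2, 1], [11, 25, 4]])
y_train = np.array([0, 1, 0, 1])  # 1 = worker quit shortly afterwards

model = LogisticRegression().fit(X_train, y_train)

def should_offer_incentive(features, threshold=0.5):
    """Trigger a (random or personalised) furtherance incentive when the
    predicted exit probability crosses the threshold."""
    p_exit = model.predict_proba(np.array([features]))[0, 1]
    return p_exit >= threshold

# A worker who tagged 8 images, has been idle 30s, and skipped 2 recently:
if should_offer_incentive([8, 30, 2]):
    print("Offer: bonus points, a level jump, or 5 cents extra")
```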
  9. RESULTS (GAMIFICATION, 1 IMAGE): BETTER, CHEAPER, BUT FEWER WORKERS
     Metric               CrowdFlower   Wordsmith
     Total workers        600           423
     Total keywords       1,200         41,206
     Unique keywords      111           5,708
     Avg. agreement       5.72%         37.7%
     Avg. images/person   1             32
     Max images/person    1             200
  10. RESULTS (GAMIFICATION, 11 IMAGES): COMPARABLE QUALITY, HIGHER UNIT COSTS, FEWER DROPOUTS
     Metric               CrowdFlower   Wordsmith
     Total workers        600           514
     Total keywords       13,200        35,890
     Unique keywords      1,323         4,091
     Avg. agreement       6.32%         10.9%
     Avg. images/person   11            27
     Max images/person    11            351
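The tables report 'Avg. agreement' without defining the measure. One plausible reading, sketched below under that assumption, is the per-image share of keyword occurrences contributed by more than one worker, averaged over images:

```python
# Sketch of one possible inter-annotator agreement measure; the slides do not
# define 'Avg. agreement', so this exact formulation is an assumption.
from collections import Counter

def avg_agreement(labels_per_image: dict[str, list[str]]) -> float:
    """labels_per_image maps image id -> all keywords from all workers.
    Per image: fraction of keyword occurrences given by at least two workers."""
    scores = []
    for keywords in labels_per_image.values():
        counts = Counter(keywords)
        repeated = sum(c for c in counts.values() if c > 1)
        scores.append(repeated / len(keywords) if keywords else 0.0)
    return sum(scores) / len(scores)

print(avg_agreement({"img1": ["dog", "dog", "pet", "dog"],  # 0.75
                     "img2": ["car", "tree"]}))             # 0.0 -> avg 0.375
```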
  11. RESULTS (WITH FURTHERANCE INCENTIVES): MORE ENGAGEMENT, TARGETING WORKS
     Increased participation:
     • People come back (20 times) and play longer (43 hours vs 3 hours without incentives)
     • Financial incentives play an important role
     Targeted incentives work:
     • 77% of players stayed, vs 27% in the randomised condition
     • 19% more labels compared to the no-incentives condition
  12. EXPERIMENT 2
     Make paid microtasks more cost-effective with social incentives.
     Hypothesis: working in pairs is more effective than the baseline.
     • Higher inter-annotator agreement
     • Higher output
     Social incentives improve retention past the payment threshold.
  13. MICROTASK DESIGN
     Image labelling tasks published on a microtask platform: free-text labels, varying numbers of labels per image, taboo words.
     Conditions:
     • Baseline: 'standard' tasks with basic spam control
     • Pairs: Wordsmith-based, randomly formed pairs; people join and leave all the time, with more partner switches over time
     • Pairs & social incentives: 'let's play' vs 'please stay', offered to a worker when we expect their partner to leave
  14. INCENTIVES
     No global leaderboard.
     • Empathic social pressure: stay (and help your partner get paid)
     • Social flow: keep playing and having fun together
  15. EVALUATION
     ESP data set as gold standard. Evaluated # labels, agreement, avg/max # labels/worker.
     Two tasks:
     • Low threshold: 1 image
     • High threshold: 11 images
     Probabilistic reasoning to predict worker exit ([Kobren et al., 2015], extended with utility features) and offer a social incentive; a sketch of the dispatch follows.
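A minimal sketch of how the two messages might be dispatched. The policy below (social pressure when the partner has not yet reached the payment threshold, social flow otherwise) is an assumption loosely motivated by slide 14's wording; the slides do not state the actual rule:

```python
# Sketch of social-incentive dispatch. The 0.5 threshold and the
# paid/unpaid policy are illustrative assumptions; the slides only say an
# incentive is offered when the exit model expects a partner to leave.

def social_incentive(exit_prob: float, partner_past_payment: bool,
                     threshold: float = 0.5) -> str | None:
    """Pick a message for a worker whose exit the model predicts."""
    if exit_prob < threshold:
        return None  # worker likely to stay; no intervention
    if not partner_past_payment:
        # empathic social pressure: stay (and help your partner get paid)
        return "please stay"
    # social flow: keep playing and having fun together
    return "let's play"

print(social_incentive(0.8, partner_past_payment=False))  # -> 'please stay'
print(social_incentive(0.8, partner_past_payment=True))   # -> "let's play"
```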
  16. RESULTS (COLLABORATION): BETTER, CHEAPER, FEWER WORKERS, ADDS COMPLEXITY
  17. RESULTS (SOCIAL INCENTIVES): IMPROVED RETENTION, 'PLEASE STAY' MORE EFFECTIVE
  18. SUMMARY OF FINDINGS
     Social incentives generate more tags and improve retention.
     Social dynamics: responses differ depending on whether the partner has been paid or not.
     • A paid worker is 76% more likely to stay after social pressure; an unpaid worker is 95% more likely to stay
     • Paid workers who decide to stay annotate more than unpaid workers
     Social flow is more effective than social pressure at generating tags (99% of unpaid workers are likely to stay); social pressure works more often overall.
  19. EXPERIMENT 3
     Make real-time crowdsourcing affordable: workers compete against each other in a live contest.
     • Contest produces accurate answers faster
     • Task thresholds and reward spreads affect volume of work and retention
  20. MICROTASK DESIGN
     Twitter labelling tasks published on a microtask platform: named entity recognition (people, places, organisations, etc.).
     Conditions:
     • Baseline: 'standard' task with basic spam control
     • Live contest: Wordsmith-based, with different reward spreads and task thresholds (see the sketch below)
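The slides name 'reward spreads' and 'task thresholds' without defining them. A sketch under the assumption that a spread is a payout schedule over leaderboard ranks and the threshold is a minimum number of completed tasks:

```python
# Sketch of a contest payout under an assumed rank-based 'reward spread' and a
# minimum-work 'task threshold'; both interpretations are assumptions.

def contest_payout(ranked_workers: list[tuple[str, int]],
                   spread_cents: list[int],
                   task_threshold: int) -> dict[str, int]:
    """ranked_workers: (worker id, tasks completed), best first.
    spread_cents: payout per rank, e.g. [50, 30, 20] for the top three.
    Workers below the task threshold earn nothing (assumed rule)."""
    payouts, rank = {}, 0
    for worker, completed in ranked_workers:
        if completed < task_threshold:
            continue  # did not meet the threshold
        if rank >= len(spread_cents):
            break  # spread exhausted
        payouts[worker] = spread_cents[rank]
        rank += 1
    return payouts

# Wider spreads pay more ranks; raising task_threshold disqualifies workers.
print(contest_payout([("a", 12), ("b", 9), ("c", 11)], [50, 30, 20], 10))
# -> {'a': 50, 'c': 30}
```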
  21. EVALUATION
     Four Twitter datasets with gold standard; compared to the baseline from [Feyisetan et al., 2015].
     Evaluated: F1, time/entity, # labels, # labels/worker, # labels/top-10, exit prediction, # workers.
     Two tasks:
     • Low threshold: 1 tweet
     • High threshold: 10 tweets
     Probabilistic reasoning to predict worker exit.
  22. SUMMARY OF FINDINGS
     With twice the task speed, contests could potentially serve as a real-time task model.
     • An increase in reward spread leads to more tasks completed by the best workers
     • Increasing the task threshold within a reward spread reduces the number of tasks completed
     • Workers exit a task when they perceive an overall loss of the utility accrued by remaining; tasks with high rewards and low task thresholds keep workers on longer
  23. CONCLUSIONS
     Monetary incentives are just the tip of the iceberg; layering other incentives on top of payments works.
     Open questions: how do workers assess the utility of a task? Does time change any of the findings, and how?
  24. E.SIMPERL@SOTON.AC.UK | @ESIMPERL | QROWD-PROJECT.EU | WDAQUA.EU
     References:
     • Feyisetan, O., Simperl, E., Van Kleek, M., Shadbolt, N.: Improving paid microtasks through gamification and adaptive furtherance incentives. In: 24th International Conference on World Wide Web, pp. 333-343, 2015.
     • Feyisetan, O., Simperl, E.: Social incentives in paid collaborative crowdsourcing. ACM Transactions on Intelligent Systems and Technology (TIST) 8(6), to appear, 2017.
