With Amazon Mechanical Turk (MTurk), you can leverage the power of the crowd for a host of tasks ranging from image moderation and video transcription to data collection and user testing. You simply build a process that submits tasks to the Mechanical Turk marketplace and get results quickly, accurately, and at scale. In this session, Russ from Rainforest QA shares best practices and lessons learned from his experience using MTurk. The session covers the key concepts of MTurk, getting started as a Requester, and using MTurk via the API. You learn how to set and manage Worker incentives, achieve high Worker quality, and integrate and scale your crowdsourced application. By the end of this session, you will have a comprehensive understanding of MTurk and know how to get started harnessing the power of the crowd.
2. What to Expect from the Session
• Learn what Mechanical Turk (MTurk) is
• Understand the basics
• Learn about scaling beyond the basics
• How Rainforest leverages MTurk
3. Who am I?
Russell Smith
• CTO & Co-Founder of Rainforest QA
• Programmer
• MTurk Requester for ~5 years
• 250m+ questions through MTurk
• Follow me on Twitter: @rhs
4. What is Rainforest?
QA-as-a-Service: Fast Crowdsourced Testing for Web and Mobile Apps thanks to Mechanical Turk:
• Customers write tests in plain English
• Results in ~30 minutes, anytime, 24x7
• Powered by humans
5. What is Mechanical Turk?
• Super early AWS service
• Public since 2005
• First invented in 2001
• 24 x 7, on-demand, programmatic interface to do Human Intelligence Tasks (HITs)
• “Automate” the un-automatable
6. What is Mechanical Turk?
• Pay (lots of) humans to do (lots of) things. Classic things:
  • Extract data from receipts
  • Identify things in photos
  • Search for data for you (find the phone number of XYZ restaurant)
  • Transcribe audio
• More hip / upcoming things:
  • Data science – build ground truth for machine learning and AI
8. Marketplace
• Connects Workers and Requesters
• Requesters are you!
• Web interface where Workers execute your tasks
• Searchable list of HITs; Workers pick which ones to do
9. Requester interface
1. Select a template
2. Provide info on your task and how much you want to pay.
3. Design the layout of your task
4. Load your variables
5. Publish
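Steps 3 and 4 amount to a layout with `${variable}` placeholders plus a CSV file in which each row becomes one HIT. The sketch below mimics that substitution locally with Python's `string.Template`, which happens to use the same `${name}` syntax. The layout, column names and data are hypothetical, and this is not how MTurk renders templates internally.

```python
import csv
import io
from string import Template

# Hypothetical task layout using ${...} placeholders, like a Requester UI template.
LAYOUT = Template("<p>Find the phone number of ${restaurant} in ${city}.</p>")

# Hypothetical variables file: one row per HIT, one column per placeholder.
CSV_DATA = """restaurant,city
Blue Fin,Seattle
Casa Verde,Austin
"""


def render_hits(layout: Template, csv_text: str) -> list[str]:
    """Return one rendered task body per CSV row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # substitute() raises KeyError when a placeholder has no matching column,
    # which catches typos before anything is published.
    return [layout.substitute(row) for row in rows]


if __name__ == "__main__":
    for body in render_hits(LAYOUT, CSV_DATA):
        print(body)
```

Failing loudly on a missing variable is the useful part: a HIT published with an empty `${city}` is exactly the kind of unclear task that costs you reputation.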
10. Requester interface
- The results of your task can be viewed in the Manage tab.
- This is also where you can view and manage your Workers.
11. Worker interface
- Workers visit mturk.com to find HITs they want to work on.
- Description, reward, and reputation all matter in determining if your work gets done.
12. Worker interface
- Workers can choose to Accept a HIT or Skip to the next one in a set.
- Once they’ve accepted the HIT, they must Submit it before the allotted time expires.
- Workers can also Return the task if they decide they don’t want to complete it.
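The Accept / Submit / Return flow is a small state machine. The sketch models it from the Worker's side as a simplification, not MTurk's internal logic; the state and action names are chosen for illustration, although MTurk's notification events use similar names (for example AssignmentAccepted, AssignmentSubmitted, AssignmentReturned and AssignmentAbandoned). Skip is not modelled because it changes nothing: the Worker simply moves on.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "Available"  # listed, not yet accepted
    ACCEPTED = "Accepted"    # Worker is working on it; the timer is running
    SUBMITTED = "Submitted"  # submitted before the allotted time expired
    RETURNED = "Returned"    # Worker gave the task back
    ABANDONED = "Abandoned"  # allotted time expired without a submission


# Allowed Worker-side transitions in this simplified model.
TRANSITIONS = {
    (State.AVAILABLE, "accept"): State.ACCEPTED,
    (State.ACCEPTED, "submit"): State.SUBMITTED,
    (State.ACCEPTED, "return"): State.RETURNED,
    (State.ACCEPTED, "timeout"): State.ABANDONED,
}


def step(state: State, action: str) -> State:
    """Apply one action, rejecting transitions the model does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} from {state.value}") from None
```

Returned and abandoned work frees the slot for another Worker, so a task whose time limit is too tight shows up as a high abandon rate rather than as bad answers.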
14. Basics - Task design
Design is critical:
• Bad tasks = bad reputation + bad results
• Unclear tasks = bad reputation + bad results
• Good tasks ~= good reputation + good results
15. Basics - Task design
My rules:
1. Have instructions and/or rules
2. Must be easy to understand (note: not necessarily simple)
3. Must protect against mistakes or fraud
4. Have a fair price
5. Include a feedback field
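One common way to follow rule 3 is to mix in questions whose answers you already know ("gold" questions) and score each submission against them; the assignment-level review policies covered later work on the same idea. This sketch scores locally, and the question ids, normalization and 75% threshold are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    worker_id: str
    answers: dict[str, str]  # question id -> answer as typed by the Worker


# Hypothetical gold questions mixed into the task; answers stored lowercase.
KNOWN_ANSWERS = {"q_gold_1": "cat", "q_gold_2": "no"}


def gold_score(sub: Submission, known: dict[str, str]) -> float:
    """Fraction of known-answer questions this submission got right."""
    if not known:
        raise ValueError("need at least one known answer")
    correct = sum(
        1
        for qid, expected in known.items()
        # Normalize whitespace and case so "Cat " still counts as "cat".
        if sub.answers.get(qid, "").strip().lower() == expected
    )
    return correct / len(known)


def passes(sub: Submission, known: dict[str, str], threshold: float = 0.75) -> bool:
    """True if the submission meets the threshold; failures go to manual review."""
    return gold_score(sub, known) >= threshold
```

A failed check is better treated as "look at this" than as an automatic rejection, since rejections hurt Workers' approval rates and, in turn, your reputation.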
16. Basics - Task design
Ask:
• Can the worker get in a groove and churn through tasks?
• Can anyone read the instructions and do this right?
• Do we need to qualify the workers?
17. Basics - Task design
Pricing iteration
1. Work out a budget per assignment
2. Do a small run
3. Verify quality vs speed* of results
4. Fix your task, optimize spend** and go back to step 2 (repeat forever)
* Speed depends on: Qualifications, SEO (how easily Workers find your HITs), # of Workers
** Spend levers: payment, repetition, requirements
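The budget in step 1 has to include Amazon's commission on top of the reward. This sketch assumes MTurk's published fee schedule (20% of the reward, an extra 20% for HITs with 10 or more assignments, minimum $0.01 per assignment); check the current pricing page before relying on it. Rounding the fee half-up to whole cents is an assumption, and bonuses are not covered.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
BASE_FEE = Decimal("0.20")       # assumed 20% commission on the reward
LARGE_HIT_FEE = Decimal("0.20")  # assumed extra 20% for HITs with 10+ assignments
MIN_FEE = Decimal("0.01")        # assumed minimum commission per assignment


def cost_per_assignment(reward: str, assignments_per_hit: int) -> Decimal:
    """Reward plus commission for one assignment (rounding is an assumption)."""
    r = Decimal(reward)
    rate = BASE_FEE + (LARGE_HIT_FEE if assignments_per_hit >= 10 else Decimal(0))
    fee = max((r * rate).quantize(CENT, rounding=ROUND_HALF_UP), MIN_FEE)
    return r + fee


def run_cost(reward: str, hits: int, assignments_per_hit: int) -> Decimal:
    """Total cost of a run, e.g. the small trial run in step 2."""
    return cost_per_assignment(reward, assignments_per_hit) * hits * assignments_per_hit
```

For example, a 50-HIT trial with 3 assignments per HIT at $0.10 costs `run_cost("0.10", 50, 3)` = $18.00, and moving to 10 assignments per HIT raises the per-assignment cost from $0.12 to $0.14, which is why repetition is one of the spend levers.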
21. Workers
• Motivations
  • Earn money
  • Status
• Incentives
  • Leveling up
  • Pride
• Expectations
  • Traditionally: being treated like an API
  • Now: being treated like a human
  • Fairness, transparency
24. Community
- Retention is key
- Finding the leaders
- Worker enablement
- Help Workers improve
- We do: video tutorials, community forum, clear rules, automated training, re-training
- Ask them what they need!
- Listen to complaints
- Add a comment box to your tasks to collect feedback
- NPS (Net Promoter Score)
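NPS is the standard Net Promoter Score: ask Workers a 0-10 question such as "How likely are you to recommend our tasks to another Worker?" (the wording is just an example), then subtract the percentage of detractors (0-6) from the percentage of promoters (9-10). A minimal sketch:

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("no responses")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) count in the denominator but in neither group.
    return 100 * (promoters - detractors) / len(scores)
```

The result ranges from -100 to +100; tracking it over time matters more than any single value.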
25. Community
- Handling Workers that you don’t want doing your tasks:
  - Rejecting
  - Qualifications
  - Blocking
- Finding spammers and cheaters
- Join the external forums
- Your reputation matters
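One way to find spammers and cheaters before rejecting, un-qualifying or blocking them is to compare each Worker against the majority on HITs that several Workers answered. The sketch below assumes a hypothetical input shape and thresholds; a low agreement rate is a signal to investigate, not proof of cheating.

```python
from collections import Counter, defaultdict


def majority_agreement(answers: dict[str, dict[str, str]], min_answers: int = 3) -> dict[str, float]:
    """Each Worker's rate of agreeing with the majority answer.

    answers maps hit_id -> {worker_id: answer}. HITs with fewer than
    `min_answers` responses are skipped because their majority is not meaningful.
    """
    agreed = defaultdict(int)
    total = defaultdict(int)
    for by_worker in answers.values():
        if len(by_worker) < min_answers:
            continue
        # On a tie, Counter.most_common returns the answer seen first.
        majority, _ = Counter(by_worker.values()).most_common(1)[0]
        for worker, answer in by_worker.items():
            total[worker] += 1
            agreed[worker] += answer == majority
    return {w: agreed[w] / total[w] for w in total}


def suspects(rates: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Workers whose agreement rate falls below the threshold, sorted by id."""
    return sorted(w for w, r in rates.items() if r < threshold)
```

In practice you would also require a minimum number of HITs per Worker before acting, so one unlucky disagreement does not flag someone.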
27. HITs
- HITType
- HIT
- Assignments
- Notifications
(Diagram: a HITType groups several HITs; each HIT has several Assignments; a “Reviewable” Notification signals that a HIT is ready for review.)
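The hierarchy in the diagram can be modelled as plain data: a HITType holds the settings shared by many HITs, and each HIT collects up to a set number of Assignments from different Workers. This is a local model for illustration, not MTurk's API objects; the field names are simplified and the reviewable rule below ignores expiry.

```python
from dataclasses import dataclass, field


@dataclass
class Assignment:
    worker_id: str
    submitted: bool = False  # simplified: ignores approved/rejected/returned states


@dataclass
class HIT:
    hit_id: str
    max_assignments: int  # how many different Workers should answer this HIT
    assignments: list[Assignment] = field(default_factory=list)

    def is_reviewable(self) -> bool:
        # Simplified rule: every requested Assignment has been submitted.
        return sum(a.submitted for a in self.assignments) >= self.max_assignments


@dataclass
class HITType:
    # Settings shared by every HIT of this type (a hypothetical subset).
    title: str
    reward: str  # rewards are written as strings such as "0.05"
    hits: list[HIT] = field(default_factory=list)

    def reviewable_hits(self) -> list[str]:
        """HIT ids that would trigger a Reviewable notification in this model."""
        return [h.hit_id for h in self.hits if h.is_reviewable()]
```

Reacting to the Reviewable notification, instead of polling every HIT, is what keeps this structure cheap to process at scale.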
28. Useful API operations
CreateHIT: Create new tasks for Workers to do.
GetAccountBalance: Check the funding available for publishing new tasks.
RevokeQualification / GrantQualification: Modify the Qualifications assigned to Workers.
ForceExpireHIT: Immediately remove a HIT from MTurk.
GetAssignment: The status and results from an Assignment.
NotifyWorkers: Send a message to your Workers.
GrantBonus: Provide a bonus payment to Workers.
Use the Sandbox environment to experiment with creating and responding to HITs without spending money.
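The operation names above come from the original MTurk API; in the current API (version 2017-01-17), which the AWS SDKs expose, several were renamed. This sketch maps them to boto3 method names and builds a client that points at the Sandbox by default. It assumes boto3 is installed and AWS credentials are configured; the helper name `mturk_client` is hypothetical.

```python
# Requester API endpoints (MTurk's API is served from us-east-1).
PRODUCTION = "https://mturk-requester.us-east-1.amazonaws.com"
SANDBOX = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"

# Operation names on the slide -> method names on the boto3 "mturk" client.
BOTO3_METHOD = {
    "CreateHIT": "create_hit",
    "GetAccountBalance": "get_account_balance",
    "GrantQualification": "associate_qualification_with_worker",
    "RevokeQualification": "disassociate_qualification_from_worker",
    "ForceExpireHIT": "update_expiration_for_hit",  # pass an ExpireAt in the past
    "GetAssignment": "get_assignment",
    "NotifyWorkers": "notify_workers",
    "GrantBonus": "send_bonus",
}


def mturk_client(sandbox: bool = True):
    """Create a boto3 MTurk client; defaults to the Sandbox so experiments are free."""
    import boto3  # imported lazily so the table above is usable without boto3

    return boto3.client(
        "mturk",
        region_name="us-east-1",
        endpoint_url=SANDBOX if sandbox else PRODUCTION,
    )
```

For example, `mturk_client().get_account_balance()["AvailableBalance"]` returns the Sandbox balance as a string. Switching to production should be a deliberate `sandbox=False`, never the default.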
29. Question types
• QuestionForm – XML-defined questions.
• HTMLQuestion – HTML form-based questions.
• ExternalQuestion – Questions hosted on your own website.
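HTMLQuestion and ExternalQuestion are both XML documents passed as the `Question` parameter of CreateHIT. The sketch builds them with the namespaces from MTurk's data-structure schemas and checks that the result is well-formed; the function names are hypothetical, and QuestionForm is left out because its schema is much larger. The HTML itself still needs a form that posts the answers, including the assignmentId MTurk passes in the URL, back to MTurk's externalSubmit endpoint.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

HTML_NS = "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd"
EXTERNAL_NS = "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"


def _checked(doc: str) -> str:
    ET.fromstring(doc)  # raises ParseError if the document is not well-formed
    return doc


def html_question(html: str, frame_height: int = 600) -> str:
    """Wrap an HTML page in an HTMLQuestion (the HTML goes in a CDATA section)."""
    if "]]>" in html:
        raise ValueError("HTML must not contain ']]>' inside a CDATA section")
    return _checked(
        f'<HTMLQuestion xmlns="{HTML_NS}">'
        f"<HTMLContent><![CDATA[{html}]]></HTMLContent>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        "</HTMLQuestion>"
    )


def external_question(url: str, frame_height: int = 600) -> str:
    """Point Workers at a page you host; MTurk requires an HTTPS URL."""
    if not url.startswith("https://"):
        raise ValueError("ExternalQuestion URLs must use HTTPS")
    return _checked(
        f'<ExternalQuestion xmlns="{EXTERNAL_NS}">'
        f"<ExternalURL>{escape(url)}</ExternalURL>"  # escape & in query strings
        f"<FrameHeight>{frame_height}</FrameHeight>"
        "</ExternalQuestion>"
    )
```

ExternalQuestion is the most flexible of the three because the page can decide what to show at the moment the Worker opens it.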
30. Review Policies
- Review Policies can be specified in your CreateHIT call to automatically evaluate Worker submissions.
- Assignment-level policies can be used to validate Worker responses against known answers.
- HIT-level policies look for consensus among Workers on each HIT.
(Diagram: Worker answers B, B, C, B, C, B from the first six Assignments, then B, B from two added Assignments.)
• Imagine you want to ask six Workers and get 75% agreement.
• If two Workers disagree, the policy will add additional Assignments until there is agreement (up to a limit you set).
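The arithmetic behind the example: with six Workers and a 75% threshold, 4 of 6 agreeing (67%) is not enough, so Assignments are added; two more matching answers give 6 of 8 = 75%. The sketch replays that locally. It simulates the idea rather than MTurk's built-in policy; adding one Assignment at a time and the `max_total` cap are assumptions.

```python
from collections import Counter


def agreement(answers: list[str]) -> tuple[str, float]:
    """Most common answer and the fraction of Workers who gave it."""
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)


def collect_until_agreement(pool: list[str], threshold: float = 0.75,
                            initial: int = 6, max_total: int = 10):
    """Start with `initial` answers, then add one Assignment at a time until
    the top answer reaches `threshold` or no more answers can be collected.
    `pool` stands in for the answers Workers would give, in order."""
    got = list(pool[:initial])
    top, share = agreement(got)
    while share < threshold and len(got) < min(max_total, len(pool)):
        got.append(pool[len(got)])
        top, share = agreement(got)
    return got, top, share
```

Run on the slide's answers, `collect_until_agreement(list("BBCBCBBB"))` stops after 8 answers with "B" at exactly 75%; with only one dissenter (5 of 6, about 83%) no Assignments are added.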
38. Scaling - Rainforest v1
• Initially linked jobs to HITs 1:1
• Balanced a list of HITs against an internal list of jobs
• Constantly pulling HITs on / off MTurk when jobs were added, cancelled, or changed
(Diagram: Jobs and HITs, linked 1:1)
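The v1 approach boils down to a reconciliation loop: whenever the internal job list changes, diff it against the HITs that are live and create or expire HITs to match. A minimal sketch of that diff with hypothetical ids; the MTurk calls that actually create or expire HITs are left out.

```python
def reconcile(jobs: set[str], live_hits: dict[str, str]) -> tuple[set[str], set[str]]:
    """1:1 balancing of jobs against HITs.

    jobs: ids of jobs that currently need doing.
    live_hits: hit_id -> job_id for HITs currently published on MTurk.
    Returns (job ids that need a new HIT, hit ids to expire).
    """
    covered = set(live_hits.values())
    to_create = jobs - covered
    to_expire = {hit for hit, job in live_hits.items() if job not in jobs}
    return to_create, to_expire
```

Every add, cancel or change turns into MTurk calls, which is the constant on / off churn the later versions move away from.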
39. Scaling - Rainforest v2
• Decoupled jobs from HITs
• Still balanced the list of HITs against the internal list of jobs
• Used Qualifications; still constantly pulling HITs on / off MTurk
(Diagram: Jobs and HITs, decoupled)
40. Scaling - Rainforest v3
• Unbalanced jobs / HITs: no 1:1 ratio, allowing for more SEO and a higher chance of Workers finding us
• Stopped using Qualifications
(Diagram: Jobs and HITs, no 1:1 ratio)
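Decoupling means a HIT no longer stands for one specific job: the job is picked only when a Worker opens the HIT (for example through an ExternalQuestion page), so adding or cancelling jobs needs no MTurk call, and the number of live HITs can be chosen for discoverability instead of matching the job count. The sketch is a hypothetical illustration of that idea, not Rainforest's actual code; the class and method names are invented.

```python
from collections import deque
from typing import Optional


class JobDispatcher:
    """Hypothetical decoupled design: HITs are generic slots, and a job is
    bound to an assignment only when the Worker opens the HIT."""

    def __init__(self) -> None:
        self.queue = deque()  # pending job ids, oldest first
        self.assigned = {}    # assignment_id -> job_id

    def add_job(self, job_id: str) -> None:
        self.queue.append(job_id)  # no MTurk call needed

    def cancel_job(self, job_id: str) -> None:
        if job_id in self.queue:
            self.queue.remove(job_id)  # still no MTurk call

    def next_job(self, assignment_id: str) -> Optional[str]:
        """Called when a Worker opens any of our HITs."""
        if assignment_id in self.assigned:
            return self.assigned[assignment_id]  # page reload: same job
        if not self.queue:
            return None  # nothing to do; the page can ask the Worker to return the HIT
        job = self.queue.popleft()
        self.assigned[assignment_id] = job
        return job
```

Because jobs are only bound at open time, the HIT count can stay high and stable for search visibility while the job list changes underneath it.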