Mturk > Machine Learning. Bhaskar Rao, Polyvore
What is Polyvore? An online fashion community.
Discover your style
How Big is Polyvore? (6/24/11) 1M sets created monthly. 7 minutes average time on site. 10M unique visitors to Polyvore monthly. 1.5M images clipped monthly. 12.4% of Polyvore’s users visit 100+ times monthly. 140M pageviews on Polyvore monthly.
Polyvores in the Wild: General Behavior. Collect & Create: clip from the Internet, organize, tag; create sets, make collections. Consume: explore, search, browse; like stuff, leave comments, build social networks. Share: embed in an offsite instance; get alerts for offsite activity.
What is the Mechanical Turk?
The Turk (circa 1770): Invented in 1770 by Wolfgang von Kempelen. The first “machine” that could play chess. Beat challengers like Napoleon and Benjamin Franklin. Hoax revealed 50 years later. (Source: wikipedia.com)
Amazon Mechanical Turk (circa 2007): Artificial AI. Crowd-sourced marketplace. HIT = Question. 24/7; 100,000s of on-demand workers.
Amazon Mechanical Turk (circa 2007): mturk.com
Why Turk?
The power of The Turk: Surveys. Startup idea validation. Training classifiers. Gathering data. Attempt to find Jim Gray. Validating recommendations. Removing porn. Art.
Power of the Turk: Wisdom of Crowds (Source: crowdflower.com)
Power of the Turk: Replacing Journalism? (mybossisarobot.com)
Mturk > ML if (problem == hard OR time == startup OR …). Mturk is a fantastic resource for startups! Classify 10,000 sites as stores: 2-3 weeks of researcher time ($7,000+) vs. 1 day of Mturk ($500): POSSIBLE. Porn / not porn classification: HARD. Find the official website of Chanel: YIKES! Is Chanel a fashion brand? / Perfect logo of Chanel: IMPOSSIBLE?
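The cost comparison on this slide can be worked through per site. A small sketch using only the figures quoted on the slide ($7,000+ of researcher time vs. $500 of Mturk for 10,000 sites); the function name is illustrative, and the researcher figure is a lower bound because the slide says "$7000+".

```python
def cost_per_item(total_cost_usd: float, n_items: int) -> float:
    """Average cost of labelling a single item."""
    return total_cost_usd / n_items


N_SITES = 10_000          # sites to classify as "store" / "not store"
RESEARCHER_COST = 7000    # USD, lower bound quoted on the slide
MTURK_COST = 500          # USD, quoted on the slide

researcher_per_site = cost_per_item(RESEARCHER_COST, N_SITES)
mturk_per_site = cost_per_item(MTURK_COST, N_SITES)
cost_ratio = RESEARCHER_COST / MTURK_COST

print(f"researcher: ${researcher_per_site:.2f} per site")
print(f"mturk:      ${mturk_per_site:.2f} per site")
print(f"researcher time costs at least {cost_ratio:.0f}x more")
```

At these figures Mturk works out to about 5 cents per site against at least 70 cents for researcher time.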
How to Turk? The basics: Designing complex crowdsourcing tasks is hard. Stick to simple tasks. Iterate.
The golden rule: We are all human… you and I and Mturk. Say hello at www.turkernation.com. Get feedback. Be fair. Do not get ripped off.
Ready, Set, Fire… Is this website an e-commerce store? Fire 50 questions. 60% accuracy: FAIL! (Twitter.com)
How to design a HIT?
Supervision needed…
Retry 50 questions. Allow only reputed workers. New HIT design after feedback. That should do it, right?
80%
Better? NO! Call a crowdsourcing company? Hire an army? Write a classifier?
EUREKA: The golden rule REDUX. Qualification tests… duh! So very overlooked or so very obvious? Automate it all. Training data for Mturk?
97%
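The "automate it all" idea for qualification tests (the requester answers a few questions, and a test for workers is generated from them) could look roughly like the sketch below. This is not the Mturk API and not Polyvore's code: the class and function names, the example domains, and the 0.8 pass threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldQuestion:
    """A question the requester has already answered."""
    prompt: str
    options: tuple[str, ...]
    answer: str


def build_qualification_test(gold: list[GoldQuestion]) -> list[dict]:
    """Turn gold questions into a worker-facing test with the answers removed."""
    return [
        {"id": i, "prompt": q.prompt, "options": list(q.options)}
        for i, q in enumerate(gold)
    ]


def score(gold: list[GoldQuestion], responses: dict[int, str]) -> float:
    """Fraction of gold questions the worker answered correctly."""
    correct = sum(1 for i, q in enumerate(gold) if responses.get(i) == q.answer)
    return correct / len(gold)


def qualifies(gold: list[GoldQuestion], responses: dict[int, str],
              threshold: float = 0.8) -> bool:
    """Admit a worker only if the score reaches the (assumed) threshold."""
    return score(gold, responses) >= threshold


# Made-up gold set; the domains are placeholders, not real data from the talk.
PROMPT = "Is this website an e-commerce store? {}"
gold = [
    GoldQuestion(PROMPT.format("shop.example.com"), ("yes", "no"), "yes"),
    GoldQuestion(PROMPT.format("blog.example.org"), ("yes", "no"), "no"),
    GoldQuestion(PROMPT.format("store.example.net"), ("yes", "no"), "yes"),
    GoldQuestion(PROMPT.format("news.example.com"), ("yes", "no"), "no"),
]

test_form = build_qualification_test(gold)
careful_worker = {0: "yes", 1: "no", 2: "yes", 3: "no"}   # 4 of 4 correct
sloppy_worker = {0: "yes", 1: "yes", 2: "yes", 3: "no"}   # 3 of 4 correct

print(qualifies(gold, careful_worker))  # True
print(qualifies(gold, sloppy_worker))   # False (0.75 is below 0.8)
```

The tricky questions are worth answering by hand because they double as the filter: workers who miss them never see the real HITs.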
The process (successful Mturk recipe): Design a “HIT”. Iterate on the design. Answer a few tricky ones. Upload the HITs. Go home, drink beer, and watch reruns. Next day: 87+% accuracy (usually).
Best Practices: Automate it all… $ASK->ask($Question, $Options); $ASK->final_answer()
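A Python analogue of the `ask` / `final_answer` pair on this slide, as a minimal in-memory sketch. The talk does not show how the wrapper is implemented, so everything here is an assumption: the `record_answer` method stands in for whatever fetches worker responses, and majority voting with a minimum vote count is just one plausible way `final_answer` could aggregate them.

```python
from collections import Counter
from typing import Optional


class Ask:
    """Hides the crowd behind a question-and-answer interface (illustrative only)."""

    def __init__(self, min_votes: int = 3):
        self.min_votes = min_votes
        self.question: Optional[str] = None
        self.options: list[str] = []
        self.votes: list[tuple[str, str]] = []  # (worker_id, answer)

    def ask(self, question: str, options: list[str]) -> None:
        """Register a question; a real system would upload a HIT here."""
        self.question = question
        self.options = list(options)
        self.votes = []

    def record_answer(self, worker_id: str, answer: str) -> None:
        """Store one worker's answer (hypothetical stand-in for fetching results)."""
        if answer not in self.options:
            raise ValueError(f"unknown option: {answer!r}")
        self.votes.append((worker_id, answer))

    def final_answer(self) -> Optional[str]:
        """Majority answer, or None if there are too few votes or a tie."""
        if not self.votes or len(self.votes) < self.min_votes:
            return None
        ranked = Counter(answer for _, answer in self.votes).most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            return None
        return ranked[0][0]


q = Ask(min_votes=3)
q.ask("Is this website an e-commerce store?", ["yes", "no"])
q.record_answer("worker-1", "yes")
q.record_answer("worker-2", "no")
pending = q.final_answer()        # only two votes so far
q.record_answer("worker-3", "yes")
decided = q.final_answer()

print(pending)  # None
print(decided)  # yes
```

Calling code only ever sees a question going in and an answer coming out, which is the point of hiding the human behind an API.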
Maximum Awesome: What happens if you meld a classifier, Mturk, and yourself into an Unholy Q&A System? Answer a few questions, and the system self-calibrates. NEXT TECH TALK…
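The talk leaves the design of this system to a later session, so the following is only one possible reading of "self-calibrates": the requester answers a few questions, those answers are used to find the confidence level above which a classifier can be trusted, and everything below that level goes to the crowd. The function names, the 0.95 target accuracy, and the sample numbers are all hypothetical.

```python
from typing import Optional


def calibrate_threshold(scored: list[tuple[float, bool]],
                        target_accuracy: float = 0.95) -> Optional[float]:
    """Pick the lowest confidence cut-off that still meets the target accuracy.

    `scored` holds (classifier_confidence, classifier_was_correct) pairs for
    the questions the requester answered by hand. Returns None if no cut-off
    reaches the target.
    """
    best = None
    for cutoff in sorted({conf for conf, _ in scored}, reverse=True):
        kept = [ok for conf, ok in scored if conf >= cutoff]
        if sum(kept) / len(kept) >= target_accuracy:
            best = cutoff  # keep lowering the cut-off while accuracy holds
    return best


def route(confidence: float, threshold: Optional[float]) -> str:
    """Send confident predictions to the classifier, the rest to the crowd."""
    if threshold is not None and confidence >= threshold:
        return "classifier"
    return "crowd"


# Made-up calibration data: five hand-answered questions.
scored = [(0.99, True), (0.95, True), (0.90, True), (0.70, False), (0.60, True)]
threshold = calibrate_threshold(scored)

print(threshold)               # 0.9
print(route(0.93, threshold))  # classifier
print(route(0.65, threshold))  # crowd
```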
Thank You. Questions? bhaskar@polyvore.com www.polyvore.com/cgi/about
Editor's Notes

  1. Add a Polyvore stats slide.
  2. The precursor to IBM Deep Blue? Hoax revealed in 1820!
  3. Artificial AI (AAI) = APIs can completely hide the human from your computing systems. Crowd-sourced marketplace. Pennies for answers! Questions are asked in the form of HITs (Human Intelligence Tasks). 24/7; 100,000s of flexible workers who could be doing your HITs.
  4. Surveys and opinions. Startup idea validation (swayable.com). Validating / training classifiers (Twitter sentiment analysis). Gathering data (location, phone numbers, Twitter handles from the web). Finding Jim Gray. Validating recommendations generated by your recommendation engines. Keeping porn away from your site. Art (10000 sheep project).
  5. Mention the CrowdFlower blog.
  6. Not doing so well… Raison d’être for CrowdFlower. Uses CrowdForge. Quora: What are the most crazy Mturk uses?
  7. Mturk is a fantastic resource for startups! (except if…) Sweet spot of volume and accuracy… Classify 10000 sites as store: 2-3 weeks of researcher time ($7000+) vs. 1 day of Mturk ($500): Possible? Porn / not porn classification: Hard… Find the official website of Chanel: Yikes! Perfect logo of Chanel: Impossible.
  8. Mturk is a fantastic resource for startups! (except if…) Classify 10000 sites as store: 2-3 weeks of researcher time ($7000+) vs. 1 day of Mturk ($500): Possible? Porn / not porn classification: Hard… Find the official website of Chanel: Yikes! Perfect logo of Chanel: Impossible.
  9. Mturk is a fantastic resource for startups! (except if…) Classify 10000 sites as store: 2-3 weeks of researcher time ($7000+) vs. 1 day of Mturk ($500): Possible? Porn / not porn classification: Hard… Find the official website of Chanel: Yikes! Perfect logo of Chanel: Impossible.
  10. Mturk is a fantastic resource for startups! (except if…) Classify 10000 sites as store: 2-3 weeks of researcher time ($7000+) vs. 1 day of Mturk ($500): Possible? Porn / not porn classification: Hard… Find the official website of Chanel: Yikes! Perfect logo of Chanel: Impossible.
  11. Mturk is a fantastic resource for startups! (except if…) ML = machine learning. Classify 10000 sites as store: 2-3 weeks of researcher time ($7000+) vs. 1 day of Mturk ($500): Possible? Porn / not porn classification: Hard… Find the official website of Chanel: Yikes! Perfect logo of Chanel: Impossible.
  12. The basics… Designing a complex crowdsourcing solution is hard… there are startups… Stick to simple tasks, or break tasks into simple tasks. Be a startup (iterate): deploy, gather results, modify… rinse and repeat.
  13. Do not block. Rarely reject, especially for highly qualified workers. Make sure you pay in proportion to the time taken for the tasks.
  14. Make this image bigger.
  15. 98% acceptance rate.
  16. Reading through the docs… custom qualification tests. Why the golden rule? We always have tests and interviews to select workers, so it should be the same with Mturk. Automate: the programmer answers a few questions and the system creates a test for the Mturk tasks.
  17. Create an “Ask”. Answer a few tricky ones to create the qualification test automatically. Upload the HITs with the qualification test (you automated it, right?). Go home and watch a rerun of “Game of Thrones”. Come back the next day to get a warm and fuzzy 87+% accuracy.
  18. If you do not automate, why bother hiding the human? Bad SOAP API, bad docs. $ASK->approve(ANSWER); $ASK->reject(ANSWER); $ASK->final_answer(); $ASK->grant_bonus(WORKER). All test generation, fetching of answers, approving and rejecting answers, formatting, etc. are hidden using internal processes and APIs.
  19. Create an “Ask”. Answer a few tricky ones. Upload the HITs with the qualification test (you automated it, right?). Go home and watch a rerun of “Game of Thrones”. Come back the next day to get a warm and fuzzy 87+% accuracy.