Crowd Agents
                            Interactive Crowd-Powered Systems in the Real World




                                                     Jeffrey P. Bigham
                                                     University of Rochester
University of Rochester Human-Computer Interaction                   Jeffrey P. Bigham
Introduction                  VizWiz            Crowd Agents    Scribe

Human Assistance in History




What the Disability Community Can Teach Us About Interactive Crowdsourcing. Jeffrey P. Bigham and Richard Ladner. Interactions magazine. July 2011.

Connectivity

(Image courtesy of John Brabyn)

Remote Assistance

• Video Relay Services
• Real-time Captioning

Connectivity -> Crowd
• Mechanical Turk
• Friends and Family on Social Networks

VizWiz

Bigham et al. Nearly Real-Time Answers to Visual Questions. UIST 2010.

Access Technology
     •    Optical Character Recognition
     •    Color Recognizers
     •    Talking GPS
     •    …

           Problems
           1. Limited Scope
           2. Unacceptable Error Rate
           3. $$$
           4. Not Exactly What Users Want

Releasing VizWiz
     • Released on May 31, 2011
          – 5000 users asked more than 50,000 questions
          – answers in less than a minute

Recruiting Crowd Quickly
How many workers do we need?
- number of current workers
- likelihood of needing more workers

• Post jobs or remove jobs
• Turkers answer multiple questions

quikTurkit
For $4/hr, response time goes down to under 30 s from start to finish.
quikturkit.googlecode.com
                        Bigham et al. Nearly Real-Time Answers to Visual Questions. UIST 2010.
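The recruiting questions above (how many workers are active, how likely it is that more are needed) can be turned into a post-or-remove decision. The sketch below is a hypothetical policy, not quikTurkit's actual logic: `plan_recruiting`, the `answers_per_question` default of 3, and the reserve formula are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RecruitingPlan:
    post: int    # new HITs to post
    remove: int  # open, not-yet-accepted HITs to withdraw


def plan_recruiting(active_workers: int, open_hits: int, queued_questions: int,
                    p_new_question: float, answers_per_question: int = 3) -> RecruitingPlan:
    """Hypothetical post-or-remove policy for keeping a warm pool of workers.

    Workers answer multiple questions, so everyone already working counts
    toward supply, as does any HIT that is posted but not yet accepted.
    """
    # Workers needed for questions that are already waiting.
    needed_now = queued_questions * answers_per_question
    # Reserve workers, scaled by how likely another question is (assumed rule).
    reserve = math.ceil(p_new_question * answers_per_question)
    desired = needed_now + reserve

    supply = active_workers + open_hits
    if desired > supply:
        return RecruitingPlan(post=desired - supply, remove=0)
    # Only unaccepted HITs can be withdrawn; active workers are never cut off.
    return RecruitingPlan(post=0, remove=min(open_hits, supply - desired))
```

For example, with 2 active workers, no open HITs, 1 queued question, and a 50% chance of another question, the sketch asks for 3 new HITs. Counting already-active workers toward supply reflects the point that Turkers answer multiple questions.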

Characterization of the Crowd

- Workers Come and Go
- Some May Do the Wrong Thing

Supporting a Continuous Interaction?

Where’s the coffee?
Walk to end of this hall, turn right.
Turn right into the kitchen.
Soda on left, coffee on the right
How do I use this machine?

Model for Crowd Agents

Input Mediation
Learning

                                  • What interface is being controlled?
                                  • How is input mediation done?
                                  • Role of automated agents?
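The three questions on this slide can be made concrete with a small skeleton: the controlled interface receives one action per step, a pluggable mediator decides how inputs are combined, and automated agents vote alongside human workers. `CrowdAgent`, `plurality`, and the recorded `history` (a stand-in for the learning component) are hypothetical names, not part of any system shown in the talk.

```python
from collections import Counter
from typing import Callable, Optional

# A mediator turns many candidate inputs into at most one action.
Mediator = Callable[[list[str]], Optional[str]]
# An automated agent proposes an input from the current observation.
AutomatedAgent = Callable[[str], str]


def plurality(inputs: list[str]) -> Optional[str]:
    """Most common input; ties go to the input that appears first."""
    if not inputs:
        return None
    return Counter(inputs).most_common(1)[0][0]


class CrowdAgent:
    """Hypothetical crowd agent: many contributors, one voice toward the interface."""

    def __init__(self, mediator: Mediator, automated: tuple[AutomatedAgent, ...] = ()):
        self.mediator = mediator
        self.automated = list(automated)
        # (observation, action) pairs a learning component could train on later.
        self.history: list[tuple[str, str]] = []

    def step(self, observation: str, worker_inputs: list[str]) -> Optional[str]:
        # Automated agents vote alongside the human workers.
        proposals = list(worker_inputs) + [agent(observation) for agent in self.automated]
        action = self.mediator(proposals)
        if action is not None:
            self.history.append((observation, action))
        return action
```

With `CrowdAgent(plurality, automated=(lambda obs: "right",))`, a step where workers send `["left", "right"]` returns `"right"`, because the automated agent breaks the tie. Swapping in a different mediator changes how input mediation is done without touching the rest of the loop.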

Chorus

Legion: Control of Any Interface

[Figure: Legion architecture]
Legion Client -> Legion Server: video stream, task description
Legion Server (Flash Media Server, Input Mediators, quikTurkit) -> Worker Interface (multiple workers): video stream, task description, crowd agreement/payment info
Worker Interface -> Legion Server: worker input (key presses, mouse clicks)
Legion Server -> Legion Client: mediated input

[Chart: time to complete the task (sec, 0-250) for the Solo, Mob, Vote, Active, and Leader mediators, each bar annotated with a success count out of 10 (from 4/10 to 10/10)]

[Figure: Worker interface]
- Explanation of controls, and feedback regarding current bonus level (tied to crowd agreement).
- Feedback reflecting worker’s last key press, and whether the interface last followed the crowd or the worker.

                         W. Lasecki, S. White, K. Murray, R. Miller, and J.P. Bigham “Real-Time Control of
                         Existing Interfaces.” UIST 2011.
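Two of the mediators in the chart can be sketched in a few lines: Vote forwards the majority key press in each time window, and a leader-style mediator hands control to the worker who has agreed with the crowd most. This is a minimal illustration, not Legion's implementation: the fixed epochs, the decay factor of 0.9, the scoring rule, and the names `vote` and `run_leader_mediator` are all assumptions.

```python
from collections import Counter
from typing import Optional


def vote(epoch_inputs: dict[str, str]) -> Optional[str]:
    """Majority key press among workers who pressed something this epoch."""
    if not epoch_inputs:
        return None
    return Counter(epoch_inputs.values()).most_common(1)[0][0]


def run_leader_mediator(epochs: list[dict[str, str]], decay: float = 0.9
                        ) -> tuple[list[Optional[str]], list[Optional[str]]]:
    """Sketch of a leader-style mediator.

    After each epoch, every worker's agreement score decays, and workers whose
    key matched the crowd vote gain a point. The highest scorer is the leader
    whose input would be forwarded directly in the next epoch.
    Returns (crowd_votes, leaders), one entry per epoch.
    """
    scores: dict[str, float] = {}
    crowd_votes: list[Optional[str]] = []
    leaders: list[Optional[str]] = []
    for epoch_inputs in epochs:
        choice = vote(epoch_inputs)
        for worker in scores:
            scores[worker] *= decay  # older agreement counts for less
        for worker, key in epoch_inputs.items():
            scores[worker] = scores.get(worker, 0.0) + (1.0 if key == choice else 0.0)
        crowd_votes.append(choice)
        leaders.append(max(scores, key=scores.get) if scores else None)
    return crowd_votes, leaders
```

Given two epochs in which workers `w1`, `w2`, `w3` press `up, up, left` and then `down, left, left`, the crowd votes are `up` then `left`, and leadership passes from `w1` to `w2` once `w1` stops agreeing with the crowd.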

Crowd Memory




                          W.S. Lasecki, S.C. White, K.I. Murray and J.P. Bigham. “Crowd Memory: Learning in
                          the Collective.” Collective Intelligence 2012.

Deployable Activity Recognition




                         W.S. Lasecki, Y. Song, H. Kautz, and J.P. Bigham. “Real-Time Activity Labeling for
                         Deployable Activity Recognition.” Submitted to CSCW 2012. Pervasive 2012 (poster)

Legion:Scribe
Real-Time Captions by Groups of Non-Experts

Real-Time Captioning
Problem: produce a text transcript of speech with less than 5 seconds of latency.

Stenographers             ASR
expensive                 cheap
difficult to schedule     available on demand
lack domain expertise     can be trained for new vocab
pretty accurate           does not work*

"Can I help?"
"NO, you are worse than ASR."

* in real settings, from an unknown mic, with a speaker who hasn’t trained the ASR

W. Lasecki, C. Miller, A. Sadilek, A. Abumoussa, D. Borrello, R. Kushalnagar, J.P. Bigham. “Real-Time Captioning by Groups of Non-Experts.” UIST 2012.

Input Mediator
Multiple Sequence Alignment

[Figure: Online Version, a word graph shown at Stages 1, 2, and 3 over time]
Worker 1: open the file now
Worker 2: the java fiel
Worker 3: open java file up and
Baseline: open the java file now and


                         W.S. Lasecki, C.D. Miller, D. Borrello and J.P. Bigham. “Online Sequence Alignment
                         for Real-Time Audio Transcription by Non-Experts.” AAAI 2012 (poster).
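The input mediator can be illustrated with a simplified, offline take on multiple sequence alignment: each worker's partial caption is aligned word by word (Needleman-Wunsch) against the consensus built so far, and every aligned column keeps a vote count. This is not the online algorithm from the cited paper; `align`, `merge_captions`, the scoring values, and `min_votes` are assumptions, and because workers are folded in one at a time, the result can depend on their order.

```python
from collections import Counter


def align(columns: list[Counter], words: list[str], match: int = 2,
          mismatch: int = -1, gap: int = 1) -> list[Counter]:
    """Needleman-Wunsch alignment of one worker's words against consensus columns.

    A word aligned to a column adds a vote there; an unaligned word opens a
    new column in its place. Returns a new list of columns.
    """
    n, m = len(columns), len(words)

    def pair(i: int, j: int) -> int:
        return match if words[j - 1] in columns[i - 1] else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i * gap
    for j in range(1, m + 1):
        score[0][j] = -j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # word joins column
                              score[i - 1][j] - gap,             # column gets no word
                              score[i][j - 1] - gap)             # word opens new column

    # Trace back from the bottom-right corner to rebuild the merged columns.
    merged: list[Counter] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            column = columns[i - 1].copy()
            column[words[j - 1]] += 1
            merged.append(column)
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] - gap:
            merged.append(columns[i - 1].copy())
            i -= 1
        else:
            merged.append(Counter({words[j - 1]: 1}))
            j -= 1
    merged.reverse()
    return merged


def merge_captions(worker_inputs: list[str], min_votes: int = 2) -> str:
    """Fold each worker's partial caption into the consensus, then vote per column."""
    columns: list[Counter] = []
    for text in worker_inputs:
        columns = align(columns, text.lower().split())
    output = []
    for column in columns:
        word, votes = column.most_common(1)[0]
        if votes >= min_votes:
            output.append(word)
    return " ".join(output)
```

On inputs loosely adapted from the example above (Worker 2's typo and Worker 3's "up" dropped), `merge_captions(["open the file now", "the java file", "open java file and"])` yields "open the java file". Lowering `min_votes` to 1 also keeps "now", trading accuracy for coverage.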

Scribe Interface

 Encourages:
 - real-time input
 - global coverage
 - short sequences



   Co-evolution of
   Interface and
   Algorithm
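One way an interface could push workers toward short sequences with global coverage is to rotate which worker is responsible for each short stretch of audio. The slide does not say how the Scribe interface does this; the round-robin rule, the segment length, and `focus_schedule` are assumptions for illustration.

```python
def focus_schedule(num_workers: int, duration_s: float,
                   segment_s: float) -> list[tuple[float, float, int]]:
    """Rotate short stretches of audio among workers in round-robin order.

    Each (start, end, worker) entry marks a stretch that worker is asked to
    type in full. Together the entries cover the whole stream, while workers
    remain free to type outside their own stretches.
    """
    if num_workers < 1 or segment_s <= 0:
        raise ValueError("need at least one worker and a positive segment length")
    schedule: list[tuple[float, float, int]] = []
    start, index = 0.0, 0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        schedule.append((start, end, index % num_workers))
        start, index = end, index + 1
    return schedule
```

With 3 workers, 10 seconds of audio, and 4-second segments, worker 0 covers 0-4 s, worker 1 covers 4-8 s, and worker 2 covers 8-10 s. Shorter segments keep each worker's burst of typing short; some overlap between workers is still useful, since alignment needs shared words.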

Coverage Graph

Tradeoff
 Failures:
 “n-factorial” 
 “in pectoral”

Interesting Qualities
     • Captionists can be experts
          – not at captioning but in the subject

     • Low cost
          – $30/hour on MTurk (not optimized)

          – or free (impossible before)

     • Recruited on demand
          – for only as long as needed

Scribe:
Web prefetching is 1 technique that ressearchers rely on history based to and the non history based technique the downloaded pages will be scanned and all hyperlinks will be…

ASR:
A lactate fencing is one thinking that etc. rely on to improve network. Phillipe pitching. Anything survived incident techniques…

Incorporating ASR




                            Coverage Increase: 28% to 55%
                            (single worker case)
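The slide reports coverage without defining it. Assuming coverage means the share of words in a reference transcript that the caption recovers in order (an assumption, computed here as longest-common-subsequence length divided by reference length), a figure such as 55% could be computed like this; `coverage` is a hypothetical helper.

```python
def coverage(reference: str, caption: str) -> float:
    """Fraction of reference words recovered, in order, by the caption.

    Computed as longest-common-subsequence length over reference length.
    """
    ref = reference.lower().split()
    cap = caption.lower().split()
    if not ref:
        return 0.0
    # prev[j] holds the LCS length of the reference words processed so far vs. cap[:j].
    prev = [0] * (len(cap) + 1)
    for r in ref:
        cur = [0] * (len(cap) + 1)
        for j, c in enumerate(cap, start=1):
            cur[j] = prev[j - 1] + 1 if r == c else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1] / len(ref)
```

For instance, `coverage("open the java file now and", "open the java file")` is 4/6, about 0.67. Requiring words to appear in order means a caption with the right words in the wrong order is not credited for all of them.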

Conclusions
General Lessons, Science, and the Future

“What would it take for me to be proud of my daughter being a crowd worker?”
- Niki Kittur @ CrowdCamp

Currence Bigham after her first running race.

Do Good
Connect to help and support.

Do Better
Do better work than anyone could alone.

Thanks!
hci.cs.rochester.edu
@jeffbigham

Funded by: National Science Foundation Grants (#IIS-1149709, #IIS-1116051, #IIS-1049080) and Google.
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 

Kürzlich hochgeladen (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 

Crowd Agents: Interactive Crowd-Powered Systems in the Real World

  • 1. Crowd Agents Interactive Crowd-Powered Systems in the Real World Jeffrey P. Bigham University of Rochester University of Rochester Human-Computer Interaction Jeffrey P. Bigham
  • 2. Crowd Agents Interactive Crowd-Powered Systems in the Real World Jeffrey P. Bigham University of Rochester
  • 3. Introduction VizWiz Crowd Agents Scribe
  • 4. Human Assistance in History. What the Disability Community Can Teach Us About Interactive Crowdsourcing. Jeffrey P. Bigham and Richard Ladner. Interactions magazine. July 2011.
  • 5. Connectivity
  • 6. Courtesy of John Brabyn
  • 7. Remote Assistance: Video Relay Services, Real-time Captioning
  • 8. Connectivity → Crowd: Mechanical Turk, Friends and Family on Social Networks
  • 9. VizWiz. Bigham et al. Nearly Real-Time Answers to Visual Questions. UIST 2010.
  • 10. Access Technology: Optical Character Recognition; Color Recognizers; Talking GPS; … Problems: (1) Limited Scope, (2) Unacceptable Error Rate, (3) $$$, (4) Not Exactly What Users Want
  • 11. Releasing VizWiz: released on May 31, 2011; 5000 users asked more than 50,000 questions; answers in less than a minute
  • 12. Recruiting Crowd Quickly. How many workers do we need? (number of current workers; likelihood of needing more workers). Post jobs or remove jobs. Turkers answer multiple questions. TurKit. For $4/hr, latency goes down to under 30 s from start to finish. quikturkit.googlecode.com. Bigham et al. Nearly Real-Time Answers to Visual Questions. UIST 2010.
  • 13. Characterization of the Crowd: workers come and go; some may do the wrong thing.
  • 14. Supporting a Continuous Interaction? Example exchange: “Where’s the coffee?” “Walk to end of this hall, turn right.” “Turn right into the kitchen.” “Where’s the coffee?” “Soda on left, coffee on the right.” “How do I use this machine?”
  • 15. Model for Crowd Agents
  • 16. Model for Crowd Agents: Input, Mediation, Learning
  • 17. Model for Crowd Agents: What interface is being controlled? How is input mediation done? Role of automated agents?
  • 18. Chorus
  • 19.
  • 20. Legion: Control of Any Interface. Components: Legion Server (Flash Media Server, Input Mediators, quikTurkit), Worker Interface, Legion Client. Data exchanged: video stream, task description, crowd agreement/payment info, worker input (key presses, mouse clicks), mediated input. Worker interface: explanation of controls, and feedback regarding current bonus level (tied to crowd agreement); feedback reflecting the worker’s last key press, and whether the interface last followed the crowd or the worker. [Chart: time (sec) by input mediator (Solo, Mob, Vote, Active, Leader), annotated with success counts out of 10.] W. Lasecki, S. White, K. Murray, R. Miller, and J.P. Bigham. “Real-Time Control of Existing Interfaces.” UIST 2011.
  • 21.
  • 22. Crowd Memory. W.S. Lasecki, S.C. White, K.I. Murray and J.P. Bigham. “Crowd Memory: Learning in the Collective.” Collective Intelligence 2012.
  • 23. Crowd Memory
  • 24. Deployable Activity Recognition. W.S. Lasecki, Y. Song, H. Kautz, and J.P. Bigham. “Real-Time Activity Labeling for Deployable Activity Recognition.” Submitted to CSCW 2012. Pervasive 2012 (poster).
  • 25. Legion:Scribe: Real-Time Captions by Groups of Non-Experts
  • 26. Real-Time Captioning. Problem: produce a text transcript of speech with less than 5-second latency. Stenographers: pretty accurate, but expensive, difficult to schedule, and lack domain expertise. ASR: cheap, available on demand, can be trained for new vocab, but does not work*. Can I help? NO, you are worse than ASR. (* in real settings, from an unknown mic, with a speaker who hasn’t trained the ASR)
  • 27. Real-Time Captioning. W. Lasecki, C. Miller, A. Sadilek, A. Abumoussa, D. Borrello, R. Kushalnagar, J.P. Bigham. “Real-Time Captioning by Groups of Non-Experts.” UIST 2012.
  • 28. Input Mediator: Multiple Sequence Alignment, Online Version. [Figure: alignment graph shown at Stages 1, 2, and 3 over time.] Worker 1: open the file now. Worker 2: the java fiel. Worker 3: open java file up and. Baseline: open the java file now and. W.S. Lasecki, C.D. Miller, D. Borrello and J.P. Bigham. “Online Sequence Alignment for Real-Time Audio Transcription by Non-Experts.” AAAI 2012 (poster).
  • 29. Scribe Interface. Encourages: real-time input; global coverage; short sequences. Co-evolution of Interface and Algorithm
  • 30. Coverage Graph
  • 31. Tradeoff Failures: “n-factorial” → “in pectoral”
  • 32. Interesting Qualities: Captionists can be experts, not at captioning but in the subject. Low cost: $30/hour on MTurk (did not optimize), or free (impossible before). Recruited on demand, for only as long as needed.
  • 33. Scribe / ASR: Web prefetching is 1 technique that A lactate fencing is one thinking that ressearchers rely on history based to and etc. rely on to improve network. the non history based technique the Phillipe pitching. Anything survived downloaded pages will be scanned and all incident techniques… hyperlinks will be…
  • 34. Incorporating ASR. Coverage increase: 28% to 55% (single worker case)
  • 35. Conclusions: General Lessons, Science, and the Future
  • 36. “What would it take for me to be proud of my daughter being a crowd worker?” (Niki Kittur @ CrowdCamp). Currence Bigham after her first running race.
  • 37. Do Good: Connect to help and support. Do Better: Do better work than anyone could alone.
  • 38. hci.cs.rochester.edu @jeffbigham Thanks! Funded by: National Science Foundation Grants (#IIS-1149709, #IIS-1116051, #IIS-1049080) and Google.
  • 39.
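Legion’s best-performing strategy used the crowd’s vote to elect a leader rather than to pick each action. The sketch below illustrates that idea under stated assumptions: time is already divided into windows, each active worker contributes one key per window, and a worker’s standing is simply how often their recent keys matched the window’s plurality. `LeaderMediator`, its `window` parameter, and the tie-breaking rule are hypothetical illustrations, not Legion’s actual code.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LeaderMediator:
    """Hypothetical leader-election input mediator (illustration, not Legion's code)."""

    window: int = 5  # assumed: how many recent time windows count toward a worker's score
    history: Dict[str, List[int]] = field(default_factory=dict)  # worker -> 1/0 agreement flags
    leader: Optional[str] = None

    def step(self, inputs: Dict[str, str]) -> Optional[str]:
        """Take one time window of input (worker id -> key) and return the key to forward."""
        if not inputs:
            return None
        # The plurality key is what a plain per-window Vote mediator would forward.
        plurality, _ = Counter(inputs.values()).most_common(1)[0]
        # Score each active worker by agreement with the crowd over recent windows.
        for worker, key in inputs.items():
            flags = self.history.setdefault(worker, [])
            flags.append(1 if key == plurality else 0)
            del flags[:-self.window]  # keep only the last `window` flags
        # Elect the best-scoring active worker; on ties, prefer the sitting leader.
        self.leader = max(
            inputs, key=lambda w: (sum(self.history[w]), w == self.leader)
        )
        # The leader has full control: their key is forwarded even if it differs from the plurality.
        return inputs[self.leader]


if __name__ == "__main__":
    mediator = LeaderMediator()
    for window in (
        {"a": "up", "b": "up", "c": "left"},
        {"a": "left", "b": "up", "c": "up"},
        {"b": "down", "a": "left", "c": "left"},
    ):
        print(mediator.step(window), mediator.leader)
```

In the third window of the usage loop, the mediator forwards "down" even though "left" is the plurality, because worker "b" is the sitting leader and ties on agreement. Keeping the current leader on ties is one way to damp the thrashing that plain per-window voting suffers from.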

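Scribe merges overlapping partial captions online with a greedy search over a graph whose edges join words that appeared next to each other in the crowd input. The following is a deliberately simplified sketch of that idea, not the AAAI 2012 algorithm. It uses one node per distinct lowercase word, ignores timing, and does no spelling or language-model scoring. The function name `merge_partial_transcripts` and the ranking rule (edge count, then word frequency) are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List


def merge_partial_transcripts(partials: List[str]) -> str:
    """Greedily merge overlapping partial captions using a word-adjacency graph.

    Simplified illustration: one node per distinct lowercase word, so repeated
    words collapse into one node and misspellings (e.g. "fiel") become separate nodes.
    """
    edges: DefaultDict[str, Counter] = defaultdict(Counter)  # word -> counts of following words
    indegree: Counter = Counter()  # how often each word follows another word
    freq: Counter = Counter()  # how often each word was typed at all
    for text in partials:
        words = text.lower().split()
        freq.update(words)
        for a, b in zip(words, words[1:]):
            edges[a][b] += 1
            indegree[b] += 1
    if not freq:
        return ""
    # Start at the most frequent word that never follows another word;
    # if every word has a predecessor, fall back to the most frequent word overall.
    starts = [w for w in freq if indegree[w] == 0] or list(freq)
    current = max(starts, key=lambda w: freq[w])
    path, seen = [current], {current}
    while True:
        # Unvisited successors, ranked by how many workers typed that adjacency, then word frequency.
        options = [(nxt, n) for nxt, n in edges[current].items() if nxt not in seen]
        if not options:
            break
        current = max(options, key=lambda item: (item[1], freq[item[0]]))[0]
        path.append(current)
        seen.add(current)
    return " ".join(path)


if __name__ == "__main__":
    print(merge_partial_transcripts(["open the java", "the java file", "java file now"]))
```

On the three-worker example from the slides ("open the file now", "the java fiel", "open java file up and"), this toy version returns "open the file now" and drops "java", falling short of the baseline "open the java file now and". That gap illustrates why a real alignment, with a language model in place of the nucleotide mutation model, is needed.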
Editor's Notes

  1. Hi everyone, I’m Jeff from the University of Rochester. Over the past few years, we have been working on crowd-powered systems designed to be used in the real world, to help real people solve everyday problems. Today, I’ll tell you about some of those systems, general lessons I think we can take away from them, and how users and workers have reacted to interacting with them on real tasks.
  2. But before I do, I want to take a bit of a step back, in order to place our work in the context of history and to set a foundation for my vision for the future. Since the earliest days of computer science, computer scientists have dreamed of a future in which we work seamlessly with machines to get things done in the real world. The AI and HCI communities in particular have taken up this challenge, with slightly different focuses but, I believe, often similar end goals. What I’m excited about is that I believe we’re finally at a point where we can actually build the intelligent interactive agents of our dreams. A big part of how we’ll build them is real-time human computation, which I believe requires a tight coupling of AI and HCI.
  3. A lot of my research is in building applications targeted at helping people with disabilities, and nowhere is the long history of human assistance as readily apparent as it is there. People provide one another assistance every day. Volunteers may go to a blind person’s home to read her mail, sign language interpreters help ensure education is available to deaf students, and friends help people with physical disabilities get around. This has been true forever; what has changed is connectivity.
  4. Connectivity means that wherever I am, whatever I need, I can now easily recruit a person to help me with it. I needn’t rely on having someone nearby or on technology that is itself intelligent enough to help me.
  5. And people with disabilities were some of the first to leverage what we today might call human computation. This sketch from the early 90s illustrates a service developed by the Smith-Kettlewell Eye Research Institute, in which a blind person has scanned a frozen dinner and is talking to a remote supporter to find out more about it. I especially like this picture because the blind person is being assisted remotely by a person in a wheelchair.
  6. As technology improved, so did the services available to people with disabilities. By 2000 or so, deaf people were connecting to video relay services that allowed them to sign to hearing folks on the phone, and to remote real-time captionists who could convert speech to written text. These were huge advances, but because they require experts who must be available for long stretches, they are very expensive, in the range of $100 to $200 an hour.
  7. I’m excited because increasing connectivity now means that anyone can help: workers on Mechanical Turk, volunteers, and friends and family. Potentially, this makes the market for assistance much more elastic.
  8. A few years ago, we explored this potential through an iPhone application that we developed called VizWiz. VizWiz lets blind people take a picture, speak a question, and get an answer back in a few seconds from people out on the web.
  9. There is already a lot of great access technology that serves as sensors onto an inaccessible world for people with disabilities. OCR recognizes text, color recognizers can help people coordinate outfits, and talking GPS units can help people find their way. Unfortunately, despite its promise, this technology remains limited in the scope of problems it can reliably solve and still has unacceptable error rates for real applications. The technology is expensive, costing hundreds to thousands of dollars, and in the end often isn’t exactly what users would want anyway. In fact, we as technologists often don’t really know what users want.
  10. So, in the course of running what we called a deployable Wizard-of-Oz experiment, we released VizWiz on the App Store about a year ago, with pretty dramatic results: 5000 users have asked more than 50,000 questions. This provides us an unprecedented look at what blind people might actually want to know about their visual environments.
  11. So, how do we get answers back quickly for VizWiz? On the backend, we run a service called quikTurkit. The goal of quikTurkit is to keep workers around to answer questions. It can either be used on demand (when a question is received) or keep a pool of workers around at all times to further reduce latency. To help improve on-demand response times, the VizWiz application lets quikTurkit know when someone has started to interact with it (that is, has taken a picture), so it can begin recruiting workers. An interesting result from our initial work is that time to answer depends strongly on how difficult the work is: VizWiz questions are all answered fairly quickly, but they are answered most quickly when the question can actually be answered from the photo and the spoken question could be automatically converted to text using speech recognition. It turns out that keeping a steady pool around isn’t that expensive, and doing so further reduces latency to under 30 seconds from when a question is received to when an answer is sent.
  12. Our experience with VizWiz led us to characterize the crowd that is easy to recruit online as follows: the crowd is dynamic, which means that workers come and go, and some workers may do the wrong thing.
  13. So, given that characterization of the crowd, imagine that we wanted to support a richer, continuous interaction like this one. How could that work with the crowds that we have?
  14. We could imagine recruiting a single worker from the crowd, who could chat with the user much like they would on IM. This has definite advantages. For instance, by using existing interfaces, we can leverage all that we know about making them usable, and we can leverage the experience that people have using them: turkers know how to use instant messaging, and so do blind people. But doing this naively fails under our model of the crowd; in particular, what if a worker provides bad input, or disappears entirely? To accommodate this, we add more workers, all controlling IM as they know how to do. But now we have another problem: the user’s interaction is not what they’re accustomed to, since they’re expected to hold multiple conversations at once.
  15. To address this, we introduce an input mediation layer that takes all the input it receives and condenses it into a single stream that is forwarded on. This layer could be powered by an automatic algorithm, or by the crowd itself. We might also introduce learning into the pipeline, so that the system can learn to serve as one of its own workers, for instance allowing the crowd to take on the difficult work of adapting to new environments, after which the automatic agents take over.
  16. This model is what we mean by crowd agents: crowd workers acting as one. Questions that can be asked to define a particular crowd agent are: What interface is being controlled? How is input mediation done? And what is the role of automated agents?
  17. Our Chorus system demonstrates how this works for chatting with the crowd agent. Each crowd worker chats in an interface that looks a lot like instant messenger. To maintain consistency, workers are provided a space for shared memory. The crowd mediates its own inputs by voting responses through.
  18. This is an example conversation; in this case, the user chats with the crowd agent about a place to eat in Los Angeles. It seems as though this real-time chat is happening with one real person. Behind the scenes, Chorus makes sure that happens: workers propose messages, and only those that receive enough votes are forwarded on. In our experiments, the crowd agent was able to reliably carry on a conversation with the user, answering nearly all questions in a reasonable way. Even though the crowd is composed of people, issues like consistency and memory make a Crowd Turing Test something we might reasonably explore in the future.
  19. Legion is another system that we created. In this case, we put the crowd agent in control of an existing desktop interface via VNC (remote desktop). Crowd workers send their commands (key presses or mouse clicks), and the Legion input mediators decide how to forward them on. The most basic strategy is to divide time into windows and just take a vote, but it turns out this is slow and leads to thrashing. What worked best for us was to use the vote not to decide what to do next, but to elect leaders who temporarily assume full control. Over a number of trials on different tasks, the leader input mediator showed the best compromise between speed and successful task completion.
  20. Legion can be used for all sorts of tasks. In this example, we used it to copy a table we drew on a whiteboard into a spreadsheet. We even drove a robot with it, in this case turning a cheap mobile webcam into a robot that followed natural language commands.
  21. We also used Legion to investigate properties of our crowd: specifically, with people coming and going, would the crowd learn from each other? We set up a simple board in a first-person shooter in which players needed to press one of two buttons (either a white or a black button) to progress through the game. We told the first generation of crowd workers which button to press, and then let them loose.
  22. Over the course of an hour-long experiment, the crowd completely turned over several times, but workers continued to press only the white button, presumably because they were learning from each other. We relate this back to the concept of organizational learning, one construct that helps to explain how culture and traditions are passed down from generation to generation in organizations ranging in size from families to nations. Of course, the time scales of the crowd are much shorter.
  23. We also created a system for more deployable activity recognition using this model. The idea is that while automated systems can do a decent job at recognizing activities, they struggle in new environments or when someone does an activity in a new way. In our system, when the automated system (in this case an HMM-based activity recognizer) is not confident about a label, it sends the video out to the crowd. Each crowd worker inputs activity labels, and other crowd workers serve as the input mediator to decide what is forwarded. The labels are sent along with the sensor stream to train the system to work better next time. As an interesting side note, automated suggestions serve a dual purpose. Clearly, they can be used directly when they are correct, but they also help tune the crowd to the desired granularity of response; for instance, if the suggested label is “making breakfast”, workers are less likely to suggest and choose lower-level actions like “raising spoon” or “closing bag of cereal.”
  24. The final system I’ll describe is called Legion:Scribe, which allows students to caption speech in real time for deaf and hard of hearing students.
  25. Real-time captioning is the problem of producing a text version of speech with less than 5 seconds of latency. Currently, there are two main approaches to real-time captioning, and both have drawbacks. The first is to employ professional stenographers: they are pretty accurate, but expensive, difficult to schedule, and often lack domain expertise, which makes it difficult to caption advanced technical material. The second is automatic speech recognition (ASR): it’s cheap, available on demand, and can be adapted to new vocabulary. Unfortunately, despite impressive advances over the past few decades, it does not work. That is only a slight exaggeration: it does not work in novel contexts, such as when a deaf student shows up to a classroom and pulls out her iPhone. So that led me to ask whether I could help. I type pretty fast and I know about computer science, so maybe I could at least help caption our courses. Unfortunately, I can’t. In fact, by some metrics, I’m worse than ASR because I just can’t type fast enough.
  26. So, we built a system that allows me to help. It’s called Scribe. A traditional stenographer setup looks like this: you stream audio to someone, they type what they hear, and the digital text is forwarded back to you. Unfortunately, if that person is me, I can’t type the 225 wpm or so necessary to keep up with natural speaking rates. So instead, we distribute the audio to multiple people, they all type, and then we merge the text they type to form a single output. Making this work well has two main components: the interface side, which encourages workers to type what they hear and to type different parts of the speech, and the algorithm side, which takes these pieces and stitches them back together. First, the algorithm:
  27. It turns out that our problem is similar to one encountered in computational biology. In shotgun sequencing, DNA is broken into multiple short strands that can be more easily sequenced. These sequences are then merged back together in order by computing the best alignment with multiple sequence alignment (MSA). To use MSA, we replaced the mutation model for nucleotides with a natural language model. MSA is usually an offline procedure. To do alignment online, we perform a greedy search on a dependency graph that we create, in which edges join words that appeared next to each other in the crowd input.
  28. Unfortunately, this is only half the story, because it turns out the interface design for this task was non-trivial. The task is actually pretty difficult, and by its nature it is frustrating because you really can’t do it perfectly. Our interface encourages real-time input by giving captionists feedback, and encourages global coverage by systematically varying the volume of the clip. The algorithm only works well with continuous sequences, so the interface rewards workers for typing a few words in a row. Each additional word a worker types in a row arrives with more latency than the last, so the interface stops rewarding workers after sequences of about eight words. Scribe required us to carefully consider the interface and the algorithm at once, so that we could make up for a deficiency in one with the other, and so we describe this process as one of co-evolution.
  29. So, we ran some experiments with a bunch of workers, both local undergraduates and Mechanical Turk workers, captioning some technical lectures drawn from MITx courses. The first thing you’d want to know is whether, in aggregate, our workers can actually type all of the words they hear. This graph shows that they can, reaching that point at about 7 workers, although it’s important to point out that these workers were complete novices. As expected, Scribe quickly outperforms both ASR and a single worker.
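Aggregate coverage as a function of the number of workers can be approximated with a bag-of-words sketch: a reference word counts as covered if at least one of the first k workers typed it. This ignores word order and timing, so it is only a rough approximation of how coverage was actually measured; `aggregate_coverage` and `coverage_curve` are hypothetical names.

```python
from collections import Counter


def aggregate_coverage(reference, worker_inputs):
    """Fraction of reference words found in the union of the workers' inputs.

    Bag-of-words simplification: order and timing are ignored, and a reference
    word is matched at most as many times as any single worker typed it.
    """
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    typed = Counter()
    for text in worker_inputs:
        typed |= Counter(text.lower().split())  # multiset union: max count per word
    matched = sum(min(count, typed[word]) for word, count in ref.items())
    return matched / sum(ref.values())


def coverage_curve(reference, worker_inputs):
    """Coverage using the first 1, 2, ..., n workers."""
    return [aggregate_coverage(reference, worker_inputs[:k])
            for k in range(1, len(worker_inputs) + 1)]
```

For example, `coverage_curve("the quick brown fox jumps", ["the quick", "brown fox", "fox jumps"])` returns `[0.4, 0.8, 1.0]`: each added worker fills in words the others missed.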
  30. Here’s a precision vs. coverage graph; in this case, coverage is roughly recall. We can get pretty close to CART (Communication Access Realtime Translation, i.e., professional stenographers), although metrics in this space are tricky, because not all errors are created equal. Because of how the systems stenographers use to convert phonemes to text work, they often make homophone errors. These errors are compounded when the captionist is not a domain expert. For instance, when transcribing an Electrical Engineering lecture, CART transcribed “n-factorial” as “in pectoral”, whereas our workers got it right.
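Precision and coverage can be computed the same rough way, assuming bag-of-words matching against a reference transcript; `precision_and_coverage` is a hypothetical helper, and the example sentence below is made up for illustration.

```python
from collections import Counter


def precision_and_coverage(reference, output):
    """Bag-of-words precision (correct output words / output words) and
    coverage (correct output words / reference words, i.e. roughly recall)."""
    ref = Counter(reference.lower().split())
    out = Counter(output.lower().split())
    correct = sum((ref & out).values())  # multiset intersection: min count per word
    precision = correct / sum(out.values()) if out else 0.0
    coverage = correct / sum(ref.values()) if ref else 0.0
    return precision, coverage
```

For example, scoring the output "so in pectoral grows fast" against the reference "so n factorial grows fast" gives a precision and coverage of 0.6 each. That is the same score any two unrelated word substitutions would get, even though “in pectoral” destroys the meaning, which is why these metrics are tricky.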
  31. Believe it or not, deaf and hard of hearing people often actually prefer our captions. Here’s a quick video that illustrates one of the reasons why. First, you’ll notice that while our captions aren’t perfect, their errors make much more sense than ASR’s. This is one reason that, even though ASR seems competitive with individual workers on automated metrics, in practice it is much worse.
  32. This is also one reason why incorporating ASR back into the overall system is difficult. Nevertheless, doing so does increase coverage substantially, from 28% to 55%, showing there is information there that could be leveraged. I’m most excited about the work we’re beginning that will use real-time crowd captions to train ASR on the fly.
  33. So, I am done with the majority of my talk. But I want to end with a challenge, a partial solution, and another challenge.
  34. The first challenge is not mine. It comes from Niki Kittur at CrowdCamp at CHI this year. He asks, “What would it take for me to be proud of my daughter for being a crowd worker?” I think this is a very interesting question. So much of our research in human computation is about how to get the crowd to do work we don’t want to do, and how to compensate for the low-quality work they often provide, that I think we are missing an enormous opportunity: to leverage the crowd to do work we can be proud of, work we would be proud for our sons and daughters to do, and work we would be proud to do ourselves.
  35. So, part of my answer to Niki’s question is to pursue systems that allow us to come together to Do Good. I think VizWiz is a great example of this: spend a few seconds, and help a blind person go about their day more independently. I would be proud of my daughter for doing that. … Eventually, I think we can build interactive, crowd-powered systems that provide real value to all of us in our everyday lives. But I think we can also DO BETTER. One of the reasons I am excited about Scribe is that it allows me to do something as part of a crowd that I simply could not do alone. Real-time captioning requires motor and cognitive performance at the outer limits of what humans can do. The challenge we are currently pursuing is to better understand the capabilities of crowd agents, both by developing new applications that leverage them and their potentially super-human abilities, and by developing a basic science of crowd motor and cognitive performance, modeled on the one we have for individual humans. Collectively, I hope these directions will allow crowdsourcing to transition from work we don’t want to do to work we can be proud to do.
  36. The content of this talk is the result of the hard work of a whole bunch of collaborators, some of whom are shown here, and a result of generous funding by the National Science Foundation and Google.