CROWDSOURCING CONTENT MANAGEMENT: CHALLENGES AND OPPORTUNITIES
ELENA SIMPERL, UNIVERSITY OF SOUTHAMPTON
LIBER 2014, 03-Jul-14
EXECUTIVE SUMMARY
Crowdsourcing helps with content management tasks. However,
• there is crowdsourcing and crowdsourcing → pick your faves and mix them
• human intelligence is a valuable resource → experiment design is key
• sustaining engagement is an art → crowdsourcing analytics may help
• computers are sometimes better than humans → the age of ‘social machines’
CROWDSOURCING: PROBLEM SOLVING VIA OPEN CALLS
"Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call. This can take the form of peer-production (when the job is performed collaboratively), but is also often undertaken by sole individuals. The crucial prerequisite is the use of the open call format and the large network of potential laborers."
[Howe, 2006]
THE MANY FACES OF CROWDSOURCING
CROWDSOURCING AND RESEARCH LIBRARIES
CHALLENGES
• Understand what drives participation
• Design systems to reach critical mass and sustain engagement
OPPORTUNITIES
• Better ‘customer’ experience
• Enhanced information management
• Capitalize on crowdsourced scientific workflows
IN THIS TALK: CROWDSOURCING AS ‘HUMAN COMPUTATION’
Outsourcing tasks that machines find difficult to solve to humans
[See also: Tutorial@ISWC2013]
IN THIS TALK: CROWDSOURCING DATA CITATION
‘The USEWOD experiment’
• Goal: collect information about the usage of Linked Data sets in research papers
• Explore different crowdsourcing methods
• Online tool to link publications to data sets (and their versions); see the sketch below
• 1st feasibility study with 10 researchers in May 2014
http://prov.usewod.org/ (9,650 publications)
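To make the linking task concrete, here is a minimal sketch of the kind of record such an online tool could collect for each publication-to-dataset link; the class and field names are illustrative assumptions, not the actual USEWOD tool.

```python
# Minimal sketch of a publication-to-dataset link record, as a crowd
# contributor might submit it. Field names are illustrative assumptions,
# not the USEWOD schema.
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class DatasetCitation:
    publication_id: str            # e.g. an identifier from the paper corpus
    dataset: str                   # e.g. "DBpedia"
    version: Optional[str] = None  # e.g. "3.7"; None if the paper does not say
    evidence: str = ""             # quote or page reference supporting the link
    contributor: str = ""          # who made the annotation
    submitted: date = field(default_factory=date.today)

# Example annotation a contributor might submit (illustrative values)
link = DatasetCitation(
    publication_id="paper-0042",
    dataset="DBpedia",
    version="3.7",
    evidence="Section 4: 'experiments were run on DBpedia 3.7'",
    contributor="annotator-01",
)
print(link)
```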
DIMENSIONS OF CROWDSOURCING
WHAT IS OUTSOURCED
Tasks based on human skills not easily replicable by machines:
• Visual recognition
• Language understanding
• Knowledge acquisition
• Basic human communication
• ...
WHO IS THE CROWD
• Open call (crowd accessible through a platform)
• Call may target specific skills and expertise (qualification tests; see the sketch below)
• Requester typically knows less about the ‘workers’ than in other ‘work’ environments
See also [Quinn & Bederson, 2012]
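The ‘qualification tests’ bullet can be made concrete with a small screening routine: admit a worker only if they answer enough gold questions correctly. A minimal sketch, with purely illustrative questions and thresholds, not tied to any particular platform:

```python
# Minimal sketch of a qualification test: admit a worker only if they answer
# enough gold questions (with known answers) correctly.
# Questions and threshold are illustrative assumptions.

GOLD_QUESTIONS = {
    "q1": "DBpedia",   # "Which data set is extracted from Wikipedia?"
    "q2": "SPARQL",    # "Which query language is used for RDF data?"
}

def qualifies(worker_answers: dict[str, str], threshold: float = 0.8) -> bool:
    correct = sum(
        1 for q, gold in GOLD_QUESTIONS.items()
        if worker_answers.get(q, "").strip().lower() == gold.lower()
    )
    return correct / len(GOLD_QUESTIONS) >= threshold

print(qualifies({"q1": "DBpedia", "q2": "SPARQL"}))   # True
print(qualifies({"q1": "Wikidata", "q2": "SPARQL"}))  # False
```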
DIMENSIONS OF CROWDSOURCING (2)
HOW IS THE TASK OUTSOURCED
• Explicit vs. implicit participation
• Tasks broken down into smaller units undertaken in parallel by different people
• Coordination required to handle cases with more complex workflows
• Partial or independent answers consolidated and aggregated into a complete solution (see the sketch below)
See also [Quinn & Bederson, 2012]
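A minimal sketch of the decompose-and-aggregate pattern described above: each item becomes a microtask answered by several people in parallel, and the independent answers are consolidated by majority vote. Data and names are illustrative.

```python
# Minimal sketch: break a job into per-item microtasks, collect several
# independent answers per item, and consolidate them by majority vote.
from collections import Counter

def consolidate(answers_per_item: dict[str, list[str]]) -> dict[str, str]:
    """Pick the most frequent answer for every item."""
    return {item: Counter(answers).most_common(1)[0][0]
            for item, answers in answers_per_item.items()}

# Three workers labelled each item independently (illustrative data).
answers = {
    "image-1": ["galaxy", "galaxy", "star"],
    "image-2": ["star", "star", "star"],
}
print(consolidate(answers))  # {'image-1': 'galaxy', 'image-2': 'star'}
```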
EXAMPLE: CITIZEN SCIENCE
WHAT IS OUTSOURCED
• Object recognition, labeling, categorization in media content
WHO IS THE CROWD
• Anyone
HOW IS THE TASK OUTSOURCED
• Highly parallelizable tasks
• Every item is handled by multiple annotators
• Every annotator provides an answer
• Consolidated answers solve scientific problems
USEWOD EXPERIMENT: TASK AND CROWD
WHAT IS OUTSOURCED
Annotating research papers with data set information
• Alternative representations of the domain
• What if the paper is not available?
• What if the domain is not known in advance or is infinite?
• Do we know the list of potential answers?
• Is there only one correct solution to each atomic task?
• How many people would solve the same task?
WHO IS THE CROWD
• People who know the papers or the data sets
• Experts in the (broader) field
• Casual gamers
• Librarians
• Anyone (knowledgeable of English, with a computer/cell phone…)
• Combinations thereof…
USEWOD EXPERIMENT: TASK DESIGN
HOW IS THE TASK OUTSOURCED: ALTERNATIVE MODELS
• Use the data collected here to train an IE algorithm
• Use paid microtask workers to do a first screening, then an expert crowd to sort out challenging cases (see the sketch below)
  • What if you have very long documents potentially mentioning different/unknown data sets?
• Competition via Twitter
  • ‘Which version of DBpedia does this paper use?’
  • One question a day, prizes
  • Needs a gold standard to bootstrap, plus redundancy
• Involve the authors
  • Use crowdsourcing to find their Twitter accounts, then launch a campaign on Twitter
  • Write an email to the authors…
• Change the task
  • Which papers use DBpedia 3.x?
  • Competition to find all papers
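A minimal sketch of the first screening model above: paid microtask workers label every paper, and only papers with low agreement are escalated to the expert crowd. This illustrates the idea and is not the USEWOD implementation; thresholds and data are assumptions.

```python
# Minimal sketch of a two-stage pipeline: microtask workers screen every
# paper; papers with low agreement are escalated to an expert crowd.
from collections import Counter

def screen(worker_answers: list[str], min_agreement: float = 0.7):
    """Return (majority answer, needs_expert) for one paper."""
    answer, votes = Counter(worker_answers).most_common(1)[0]
    agreement = votes / len(worker_answers)
    return answer, agreement < min_agreement

papers = {
    "paper-1": ["DBpedia 3.7", "DBpedia 3.7", "DBpedia 3.7"],
    "paper-2": ["DBpedia 3.6", "DBpedia 3.8", "don't know"],
}
for paper, answers in papers.items():
    answer, escalate = screen(answers)
    target = "expert crowd" if escalate else "accepted"
    print(f"{paper}: {answer!r} -> {target}")
```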
DIMENSIONS OF CROWDSOURCING (3)
HOW ARE THE RESULTS VALIDATED
• Solution space: closed vs. open
• Performance measurements / ground truth
• Statistical techniques employed to predict accurate solutions; may take into account confidence values of algorithmically generated solutions (see the sketch below)
HOW CAN THE PROCESS BE OPTIMIZED
• Incentives and motivators
• Assigning tasks to people based on their skills and performance (as opposed to random assignments)
• Symbiotic combinations of human- and machine-driven computation, including combinations of different forms of crowdsourcing
See also [Quinn & Bederson, 2012]
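One way to read the validation bullet above: treat worker answers and algorithmically generated candidates as votes weighted by an estimated confidence, and accept the answer with the highest total weight. A minimal sketch under that assumption; weights and data are illustrative.

```python
# Minimal sketch of confidence-weighted answer selection: worker votes and an
# algorithmically generated candidate are combined, each weighted by an
# estimated confidence; the answer with the largest total weight wins.
from collections import defaultdict

def weighted_consensus(votes: list[tuple[str, float]]) -> str:
    totals: dict[str, float] = defaultdict(float)
    for answer, confidence in votes:
        totals[answer] += confidence
    return max(totals, key=totals.get)

votes = [
    ("DBpedia 3.7", 0.9),  # reliable worker
    ("DBpedia 3.8", 0.5),  # less reliable worker
    ("DBpedia 3.7", 0.6),  # extraction algorithm's candidate and its confidence
]
print(weighted_consensus(votes))  # DBpedia 3.7
```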
USEWOD EXPERIMENT: VALIDATION
• Domain is fairly restricted
• Spam and obviously wrong answers can be detected easily
• When are two answers the same? Can there be more than one correct answer per question?
• Redundancy may not be the final answer
• Most people will be able to identify the data set, but sometimes the actual version is not trivial to reproduce
• Make an educated version guess based on time intervals and other features (see the sketch below)
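The ‘educated version guess’ could, for example, pick the most recent data set release that precedes the paper’s date. A minimal sketch; the release dates below are rough placeholders, not the authoritative DBpedia release history.

```python
# Minimal sketch: guess which data set version a paper most likely used by
# taking the latest release published before the paper's date.
# Release dates are illustrative placeholders, not authoritative.
from datetime import date
from typing import Optional

RELEASES = [          # (version, approximate release date) - placeholders
    ("3.6", date(2011, 1, 1)),
    ("3.7", date(2011, 9, 1)),
    ("3.8", date(2012, 8, 1)),
    ("3.9", date(2013, 9, 1)),
]

def guess_version(paper_date: date) -> Optional[str]:
    candidates = [v for v, released in RELEASES if released <= paper_date]
    return candidates[-1] if candidates else None

print(guess_version(date(2013, 5, 15)))  # '3.8' under these placeholder dates
```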
ALIGNING INCENTIVES IS ESSENTIAL
Successful volunteer crowdsourcing is difficult to predict or replicate
• Highly context-specific
• Not applicable to arbitrary tasks
Reward models are often easier to study and control (if performance can be reliably measured)
• Different models: pay-per-time, pay-per-unit, winner-takes-all
• Not always easy to abstract from social aspects (free-riding, social pressure)
• May undermine intrinsic motivation
IT’S NOT ALWAYS JUST ABOUT MONEY
http://www.crowdsourcing.org/editorial/how-to-motivate-the-crowd-infographic/
http://www.oneskyapp.com/blog/tips-to-motivate-participants-of-crowdsourced-translation/
[Source: Kaufmann, Schulze, Veit, 2011]
[Source: Ipeirotis, 2008]
CROWDSOURCING ANALYTICS
[Chart: active users in % by month since registration]
See also [Luczak-Rösch et al. 2014]
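A minimal sketch of the analytics behind such a chart, assuming an activity log of (user, date) events and known registration dates; all data and names are illustrative.

```python
# Minimal sketch: percentage of registered users active N months after
# registration, computed from a simple activity log. Illustrative data.
from datetime import date

def months_between(start: date, end: date) -> int:
    return (end.year - start.year) * 12 + (end.month - start.month)

def retention_curve(registrations: dict[str, date],
                    activity: list[tuple[str, date]]) -> dict[int, float]:
    active: dict[int, set[str]] = {}
    for user, when in activity:
        month = months_between(registrations[user], when)
        active.setdefault(month, set()).add(user)
    total = len(registrations)
    return {m: 100 * len(users) / total for m, users in sorted(active.items())}

registrations = {"alice": date(2013, 1, 10), "bob": date(2013, 1, 20)}
activity = [("alice", date(2013, 1, 15)), ("alice", date(2013, 3, 2)),
            ("bob", date(2013, 1, 25))]
print(retention_curve(registrations, activity))  # {0: 100.0, 2: 50.0}
```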
USEWOD EXPERIMENT: OTHER INCENTIVE MODELS
• Who benefits from the results?
• Who owns the results?
• Twitter-based contest
  • ‘Which version of DBpedia does this paper use?’
  • One question a day, prizes
  • If a question is not answered correctly, increase the prize
  • If participation is low, re-focus the audience or change the incentive
• Altruism: for every ten papers annotated, we send a student to ESWC…
[Source: Nature.com]
DIFFERENT CROWDS FOR DIFFERENT TASKS
• Find: contest among Linked Data experts, difficult task, final prize (TripleCheckMate [Kontokostas et al., 2013])
• Verify: microtasks for paid workers, easy task, micropayments (MTurk, http://mturk.com)
See also [Acosta et al., 2013]
COMBINING HUMAN AND COMPUTATIONAL INTELLIGENCE
EXAMPLE: BIBLIOGRAPHIC DATA INTEGRATION
Source A:
paper              conf
Data integration   VLDB-01
Data mining        SIGMOD-02
Source B:
title          author   email    venue
OLAP           Mike     mike@a   ICDE-02
Social media   Jane     jane@b   PODS-05
Generate plausible matches
• paper = title, paper = author, paper = email, paper = venue
• conf = title, conf = author, conf = email, conf = venue
Ask users to verify (see the sketch below), e.g. ‘Does attribute paper match attribute author?’ [Yes / No / Not sure]
See also [McCann, Shen, Doan, 2008]
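A minimal sketch of the pattern shown above: enumerate candidate attribute matches between the two schemas and phrase each one as a crowd verification question. This only illustrates the idea; it is not the method of [McCann, Shen, Doan, 2008].

```python
# Minimal sketch: generate plausible attribute matches between two schemas
# and phrase each one as a crowd verification question. Illustrative only.
from itertools import product

schema_a = ["paper", "conf"]
schema_b = ["title", "author", "email", "venue"]

candidates = list(product(schema_a, schema_b))  # every attribute pair

for a, b in candidates:
    print(f"Does attribute '{a}' match attribute '{b}'?  [Yes / No / Not sure]")
```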
SUMMARY AND FINAL REMARKS
[Source: Dave de Roure]
SUMMARY
• There is crowdsourcing and crowdsourcing → pick your faves and mix them
• Human intelligence is a valuable resource → experiment design is key
• Sustaining engagement is an art → crowdsourcing analytics may help
• Computers are sometimes better than humans → the age of ‘social machines’
THE AGE OF SOCIAL MACHINES
E.SIMPERL@SOTON.AC.UK
@ESIMPERL
WWW.SOCIAM.ORG
WWW.PLANET-DATA.EU
THANKS TO MARIBEL ACOSTA, LAURA DRAGAN, MARKUS LUCZAK-RÖSCH, RAMINE TINATI, AND MANY OTHERS