Crowdsourcing Linked Data management

HUMAN COMPUTATION IN THE LINKED DATA MANAGEMENT LIFE CYCLE
ELENA SIMPERL
UNIVERSITY OF SOUTHAMPTON
7/18/2013
1st PRELIDA workshop

HUMAN COMPUTATION
Outsourcing to humans the tasks that machines find difficult to solve (accuracy, efficiency, cost)

SEMANTIC TECHNOLOGIES ARE ALL ABOUT AUTOMATION
…but many tasks rely on human input
• Modeling a domain
• Integrating data sources originating from different contexts
• Producing semantic markup for various types of digital artifacts
• ...

DIMENSIONS OF HUMAN COMPUTATION SYSTEMS
What
• Tasks that require basic human skills
How
• Distribution
• Coordination
• Aggregation
Quality
• Closed vs open answers
• Ground truth
• Quantitative vs qualitative
• Who is the evaluator?
Optimize!
• Incentives
• Reduce problem size
• Task assignment

GAMES WITH A PURPOSE (GWAP)
Human computation disguised as casual games
Tasks are divided into parallelizable atomic units (challenges) solved (consensually) by players
Game models
• Single vs. multi-player
• Selection agreement vs. input agreement vs. inversion-problem games

MICROTASK CROWDSOURCING
Similar types of tasks, but a different incentive model (monetary reward, PPP)
Successfully applied to transcription, classification, content generation, data collection, image tagging, website feedback, usability tests…

THE SAME, BUT DIFFERENT
• Tasks leveraging common human skills, appealing to large audiences
• Selection of domain and task more constrained in games to create typical UX
• Tasks decomposed into smaller units of work to be solved independently
• Complex workflows
• Creating a casual game experience vs. patterns in microtasks
• Quality assurance
• Synchronous interaction in games
• Levels of difficulty and near-real-time feedback in games
• Many methods applied in both cases (redundancy, votes, statistical techniques)
• Different set of incentives and motivators
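The redundancy and voting methods mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a method taken from the slides: the function name `majority_vote`, the `min_agreement` threshold, and the sample answers are all hypothetical.

```python
from collections import Counter


def majority_vote(answers, min_agreement=0.5):
    """Aggregate redundant answers to one task.

    Returns (label, share) for the most frequent answer, or (None, share)
    when its share does not exceed `min_agreement` (a hypothetical threshold).
    """
    if not answers:
        return None, 0.0
    label, count = Counter(answers).most_common(1)[0]
    share = count / len(answers)
    if share > min_agreement:
        return label, share
    return None, share


# Hypothetical answers from several workers or players per task.
votes = {
    "paper = title": ["yes", "yes", "no"],
    "paper = author": ["no", "no", "no"],
    "conf = venue": ["yes", "no"],
}

results = {task: majority_vote(answers) for task, answers in votes.items()}
for task, (label, share) in results.items():
    print(task, label, round(share, 2))
```

Tasks without a clear majority come back with `None`, so they could be sent out again to more contributors instead of being accepted on a tie.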

HYBRID SYSTEMS
[Diagram labels: Physical world (people and devices); Virtual world (network of social interactions); Design and composition; Participation and data supply; Model of social interaction. Credit: Dave Robertson]

EXAMPLE: HYBRID DATA INTEGRATION

paper             conf
Data integration  VLDB-01
Data mining       SIGMOD-02

title         author  email   venue
OLAP          Mike    mike@a  ICDE-02
Social media  Jane    jane@b  PODS-05

Generate plausible matches
– paper = title, paper = author, paper = email, paper = venue
– conf = title, conf = author, conf = email, conf = venue
Ask users to verify
Does attribute paper match attribute author?
Yes / No / Not sure
[McCann, Shen, Doan, ICDE 2008]
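The two steps on this slide (generate plausible matches, then ask users to verify) can be sketched as below. The attribute names and the question wording come from the slide; everything else is an assumption made for illustration, including the function names and the `simulated_user` callback that stands in for a real crowd interface. It is not the implementation from the cited paper.

```python
from itertools import product

# Attribute names taken from the two example tables on the slide.
SOURCE_ATTRS = ["paper", "conf"]
TARGET_ATTRS = ["title", "author", "email", "venue"]


def candidate_matches(source_attrs, target_attrs):
    """Pair every source attribute with every target attribute."""
    return list(product(source_attrs, target_attrs))


def verification_question(source_attr, target_attr):
    """Phrase one candidate match as a closed question for a user."""
    return f"Does attribute {source_attr} match attribute {target_attr}?"


def verify(candidates, ask):
    """Keep the candidates for which the `ask` callback answers "Yes"."""
    return [pair for pair in candidates
            if ask(verification_question(*pair)) == "Yes"]


# Hypothetical answers; a real system would collect these from users.
simulated_answers = {
    "Does attribute paper match attribute title?": "Yes",
    "Does attribute conf match attribute venue?": "Yes",
}


def simulated_user(question):
    return simulated_answers.get(question, "No")


candidates = candidate_matches(SOURCE_ATTRS, TARGET_ATTRS)
confirmed = verify(candidates, simulated_user)
print(len(candidates), "candidates;", "confirmed:", confirmed)
```

Enumerating the full cross product gives the eight candidate matches listed on the slide; in practice the candidate set would likely be pruned before any question reaches a user.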

EXAMPLES FROM THE LINKED DATA WORLD

WHAT IS DIFFERENT ABOUT SEMANTIC SYSTEMS?
Semantic Web tools vs. applications
• Intelligent (specialized) Web sites (portals) with improved (local) search based on vocabularies and ontologies
• X2X integration (often combined with Web services)
• Knowledge representation, communication and exchange

TASKS NAMED IN METHODOLOGIES ARE TOO HIGH-LEVEL
Crowdsource very specific tasks that are (highly) divisible
• Labeling (in different languages)
• Finding relationships
• Populating the ontology
• Aligning and interlinking
• Ontology-based annotation
• Validating the results of automatic methods
• …
Think about the context of the application (social structure) and about how to hide tasks behind existing practices and tools
Tutorial@ESWC2013

TASTE IT! TRY IT!
• Restaurant review Android app developed in the Insemtives project
• Uses DBpedia concepts to generate structured reviews
• Uses mechanism design/gamification to configure incentives
• User study
• 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts
https://play.google.com/store/apps/details?id=insemtives.android&hl=en
[Bar chart, scale 0 to 2500, by venue type (CAFE, FASTFOOD, PUB, RESTAURANT): number of reviews; number of semantic annotations (type of cuisine); number of semantic annotations (dishes)]

LODREFINE
http://research.zemanta.com/crowds-to-the-rescue/

DBPEDIA CURATION
http://aksw.org/Projects/TripleCheckMate.html

CROWDMAP
Experiments using MTurk, CrowdFlower and established benchmarks
Enhancing the results of automatic techniques
Fast, accurate, cost-effective [Sarasua, Simperl, Noy, ISWC2012]
           CartP    100R50P      100R50P      100R50P   100R50P      Imp
           301-304  Edas-Iasted  Ekaw-Iasted  Cmt-Ekaw  ConfOf-Ekaw  301-304
PRECISION  0.53     0.8          1.0          1.0       0.93         0.73
RECALL     1.0      0.42         0.7          0.75      0.65         1.0
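For readers less familiar with the two measures reported above, the sketch below shows how precision and recall of an ontology alignment are commonly computed against a reference alignment. The entity names and both alignments are made up for illustration and are not the benchmark data behind the figures on this slide.

```python
def precision_recall(found, reference):
    """Compare a set of found correspondences with a reference alignment.

    precision = correct found / all found
    recall    = correct found / all reference correspondences
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference alignment between two conference ontologies.
reference = {
    ("Paper", "Article"),
    ("Author", "Writer"),
    ("Reviewer", "Referee"),
    ("Chair", "SessionChair"),
}

# Hypothetical crowd-validated alignment: two correct pairs, one wrong.
found = {
    ("Paper", "Article"),
    ("Author", "Writer"),
    ("Paper", "Referee"),
}

precision, recall = precision_recall(found, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these toy sets, two of the three found pairs are correct and two of the four reference pairs are recovered.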

ONTOLOGY POPULATION

LINKED DATA CURATION

PROBLEMS AND CHALLENGES
• What is feasible, and how can tasks be optimally translated into microtasks?
  • Examples: data quality assessment for technical and contextual features; subjective vs. objective tasks (also in modeling); open-ended questions
• What to show to users
  • Natural language descriptions of Linked Data/SPARQL
  • How much context
  • What form of rendering
  • How about links?
• How to combine with automatic tools
  • Which results to validate
  • Low precision (no fun for gamers...)
  • Low recall (vs. all possible questions)
• How to embed it into an existing application
  • Tasks are fine-grained and perceived as an additional burden on top of the actual functionality
• What to do with the resulting data?
  • Integration into existing practices
  • Vocabularies!
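One possible way to approach the "which results to validate" question is to send only the uncertain output of an automatic tool to the crowd. The sketch below assumes the tool attaches a confidence score between 0 and 1 to each result; the function name `route_matches`, both thresholds, and the sample scores are hypothetical and not taken from the slides.

```python
def route_matches(scored_matches, accept_at=0.9, reject_below=0.3):
    """Split automatic results into accepted, crowd-validated, and rejected.

    `scored_matches` is a list of (match, confidence) pairs. Both thresholds
    are illustrative and would need tuning for a real tool and data set.
    """
    accepted, for_crowd, rejected = [], [], []
    for match, score in scored_matches:
        if score >= accept_at:
            accepted.append(match)      # trusted without human input
        elif score < reject_below:
            rejected.append(match)      # too unlikely to be worth a question
        else:
            for_crowd.append(match)     # uncertain: ask people to validate
    return accepted, for_crowd, rejected


# Hypothetical output of an automatic matcher.
scored = [
    (("paper", "title"), 0.95),
    (("conf", "venue"), 0.60),
    (("paper", "email"), 0.10),
]

accepted, for_crowd, rejected = route_matches(scored)
print("accepted:", accepted)
print("for crowd:", for_crowd)
print("rejected:", rejected)
```

Raising `reject_below` reduces the number of questions shown to contributors but risks discarding correct results, which reflects the precision and recall trade-off listed above.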
