The Bayesian Crowd: scalable information combination for Citizen Science and Crowdsourcing
Stephen Roberts
Machine Learning Research Group & Oxford-Man Institute
University of Oxford

Alan Turing Institute
Joint work with Edwin Simpson, Steven Reece & Matteo Venanzi
Bayes Nets Meeting, January 2017
• Bayesian modelling allows for explicit incorporation of all desiderata
• Effort focused not only on theory development, but also on algorithmic implementations that are timely and practical for real-world, real-time scenarios
• Single, under- and over-arching philosophy…
“one method to rule them all… and in the darkness bind them”
“The language is that of Bayesian inference, which I will not utter
here...”
p(a|b) = p(b|a) p(a) / p(b)
Core methodology – Bayesian inference
• Uncertainty at all levels of inference is naturally taken into account
• Optimal fusion of information: subjective, objective
• Handling missing values
• Handling of noise
• Principled inference of confidence and risk
• Optimal decision making
What does this buy us?
The scaling issue...
Data growth: Moore's law
The scale of things
The big data we generate
The rise of the flop and the fall in price
Science: the 4th paradigm?
Decision
Combination
How can we deal with unreliable worker responses and
very large datasets?
Big data: Square Kilometer Array,
10 petabytes compressed images/day
Noisy reports: Twitter, Typhoon Haiyan
Aims: Reliability and Efficiency
● Challenge: volunteers have varying reliability
– Different knowledge, interests, skills
– Typically handled with redundancy → build a consensus
● Challenge: datasets are large, what to prioritise?
● Aim: increase accuracy by learning reliability
● Aim: use our volunteers' time efficiently
– Reduce redundant decisions
– Deploy experts where needed
– Use additional data to scale up to larger datasets
Machine Learning: aggregate responses and assign tasks intelligently
● Probabilistic models of people and data
● Handle uncertainty in model
● Optimise and automate analysis to reduce costs
[Figure: workflow linking Data, Crowd, Crowd Annotations, Machine learning and Results]
Zooniverse has 26 current applications across a range of domains, with over 1 million volunteers
● Can we use ML to handle variations in ability?
● Or to match tasks to people's interests and skills?
How can we combine annotations from different members of the crowd?
● Fewer annotations needed from more reliable labellers
● Confidence and trust → user weights
● But weighted majority is soft selection
– Blurred decision boundaries
● Need to combine different expertise + weak labellers
Bayesian Methods
● Optimal framework for combining evidence
● Quantify prior beliefs explicitly
– E.g. workers are mostly better than random
● Quantifies uncertainty at all levels
– Which agents are reliable?
– Do we need more evidence for an object's target class?
● Principled approach
– Move away from fine-tuning each project
– E.g. avoid trial-and-error thresholds to determine when consensus is reached
How can we aggregate responses intelligently?
● Bayes' rule combines different pieces of information
● Weight workers' contributions through their likelihood $p(c^{(k)} \mid t)$ of response $c^{(k)}$ to class $t$
● Optimal weighted majority decision
● Error guarantees
● Soft selection
$$p(t \mid \mathbf{c}) \propto p(t) \prod_{k \in K} p(c^{(k)} \mid t)$$
Likelihood defined by a confusion matrix
● Likelihood $p(c^{(k)} \mid t)$ of response $c^{(k)}$ to class $t$ is given by the worker's confusion matrix $\pi^{(k)}$
● Richer than user accuracy weights:
– Differing skill levels in each class
– Responses need not be votes
Example confusion matrix π^(k) (rows: target class t; columns: response c^(k)):
t    A    B    C
1    0.7  0.1  0.2
2    0.4  0.4  0.2
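The combination rule and the example confusion matrix can be tried out in a few lines. This is a minimal sketch, not the authors' code: the worker names, the uniform prior and the assumption that both workers share the example matrix are illustrative choices.

```python
import math

def combine_responses(prior, confusion, responses):
    """Return p(t | c), proportional to p(t) * prod_k p(c^(k) | t).

    prior:     list of p(t) for each target class t
    confusion: dict worker -> matrix with confusion[k][t][c] = p(c^(k)=c | t)
    responses: dict worker -> index of the response that worker gave
    """
    log_post = [math.log(p) for p in prior]
    for worker, response in responses.items():
        for t in range(len(prior)):
            # Assumes no zero entries in the confusion matrices.
            log_post[t] += math.log(confusion[worker][t][response])
    # Normalise in log space for numerical stability.
    peak = max(log_post)
    unnorm = [math.exp(v - peak) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Example matrix from the slide: rows are target classes 1 and 2,
# columns are responses A, B, C.
pi_k = [[0.7, 0.1, 0.2],
        [0.4, 0.4, 0.2]]
confusion = {"worker_1": pi_k, "worker_2": pi_k}  # hypothetical workers
# worker_1 answers A (index 0), worker_2 answers B (index 1).
posterior = combine_responses([0.5, 0.5], confusion,
                              {"worker_1": 0, "worker_2": 1})
print(posterior)
```

With these numbers the unnormalised scores are 0.5 × 0.7 × 0.1 for class 1 and 0.5 × 0.4 × 0.4 for class 2, so class 2 ends up more probable even though one worker answered A.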
Independent Bayesian classifier combination (IBCC) handles parameter uncertainty
[Figure: IBCC graphical model. Target labels (multinomial); observed worker responses (multinomial); worker-specific confusion matrix (Dirichlet); proportions of each class (Dirichlet)]
● Deal rationally with limited or missing data
Hyperparameters encode prior beliefs in worker behaviour, e.g. worker is better than random
● Optimise/marginalise to handle model uncertainty
● Share prior pseudo-counts between similar projects
● Ratio → relative probability of agent responses given class t
● Magnitude → strength of prior beliefs
Example prior pseudo-counts (rows: target class t; columns: response c^(k)):
t    A    B    C
1    7    1    2
2    4    4    2
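The roles of ratio and magnitude can be illustrated with the pseudo-count table. A minimal sketch, assuming a handful of hypothetical gold-labelled tasks (in IBCC the true classes are latent, so the counts added would be expected rather than observed):

```python
def expected_confusion(alpha):
    """Mean of a row-wise Dirichlet: each row of pseudo-counts normalised."""
    return [[a / sum(row) for a in row] for row in alpha]

def update_counts(alpha, true_classes, responses):
    """Add one pseudo-count per observed (true class, response) pair."""
    updated = [list(row) for row in alpha]
    for t, c in zip(true_classes, responses):
        updated[t][c] += 1.0
    return updated

# Prior pseudo-counts from the slide (rows: class 1, class 2; columns: A, B, C).
alpha0 = [[7.0, 1.0, 2.0],
          [4.0, 4.0, 2.0]]
prior_mean = expected_confusion(alpha0)  # ratios give 0.7/0.1/0.2 and 0.4/0.4/0.2

# Hypothetical data: three objects of class 2 (row index 1), answered B, B, A.
alpha_post = update_counts(alpha0, [1, 1, 1], [1, 1, 0])
post_mean = expected_confusion(alpha_post)

# Same ratios at ten times the magnitude: a stronger prior moves less.
strong_prior = [[10 * a for a in row] for row in alpha0]
strong_mean = expected_confusion(update_counts(strong_prior, [1, 1, 1], [1, 1, 0]))
print(post_mean[1], strong_mean[1])
```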
Joint, conditioned on hyper-hyperparameters
Inference
Gibbs sampling – rather slow
Variational Bayes – offers fast inference, at the expense of approximations
Variational Bayes
[Equation: log evidence split into the -ve free energy and a Kullback-Leibler divergence]
Variational Bayes: inflating the balloon
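To make the variational updates concrete, here is a small mean-field sketch for an IBCC-style model. It is an illustration under assumptions, not PyIBCC: the function names, the hand-written digamma, the shared prior `alpha0` for all workers and the toy data are all hypothetical.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upwards, then use the asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series

def vb_ibcc(responses, n_objects, n_workers, nu0, alpha0, n_iter=50):
    """responses: list of (worker, object, response_index) triples."""
    n_classes = len(alpha0)
    q = [[1.0 / n_classes] * n_classes for _ in range(n_objects)]
    alpha = None
    for _ in range(n_iter):
        # Dirichlet posteriors given the current q(t): prior plus expected counts.
        nu = [nu0[j] + sum(q[i][j] for i in range(n_objects))
              for j in range(n_classes)]
        alpha = [[list(row) for row in alpha0] for _ in range(n_workers)]
        for k, i, c in responses:
            for j in range(n_classes):
                alpha[k][j][c] += q[i][j]
        # Expected log class proportions and log confusion matrices.
        ln_kappa = [digamma(v) - digamma(sum(nu)) for v in nu]
        ln_pi = [[[digamma(a) - digamma(sum(row)) for a in row]
                  for row in alpha[k]] for k in range(n_workers)]
        # Update q(t_i) for every object, then normalise.
        log_q = [list(ln_kappa) for _ in range(n_objects)]
        for k, i, c in responses:
            for j in range(n_classes):
                log_q[i][j] += ln_pi[k][j][c]
        for i in range(n_objects):
            peak = max(log_q[i])
            weights = [math.exp(v - peak) for v in log_q[i]]
            q[i] = [w / sum(weights) for w in weights]
    return q, alpha

# Toy data: workers 0 and 1 always agree with the truth, worker 2 always disagrees.
truth = [0, 1, 0, 1, 0, 1]
responses = []
for i, t in enumerate(truth):
    responses += [(0, i, t), (1, i, t), (2, i, 1 - t)]
q, alpha = vb_ibcc(responses, n_objects=6, n_workers=3,
                   nu0=[1.0, 1.0], alpha0=[[2.0, 1.0], [1.0, 2.0]])
print(q[0], alpha[2])
```

The prior `alpha0` says workers are mostly better than random, which breaks the label symmetry; the contrary worker's posterior counts then drift off the diagonal, so its answers still carry information.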
Users rate each presented object which provides a score of
-1 : very unlikely SN object
1 : possible SN object
3 : likely SN object
(“true” labels obtained retrospectively via Palomar Transient Factory
spectrographic analysis)
Zooniverse: Galaxy Zoo Supernovae
IBCC-VB outperforms alternatives
Galaxy Zoo Supernovae, area under the ROC curve (AUC); a larger area defines a better solution:
Method              AUC
IBCC-VB             0.90
Mean                0.65
Weighted Sum        0.64
Weighted Majority   0.58
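For readers unfamiliar with the metric, AUC can be computed directly from its pairwise definition. A minimal sketch with made-up scores; it is quadratic in the number of items, so it suits small checks rather than full datasets.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for two positives and two negatives.
example_auc = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
print(example_auc)
```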
IBCC outperforms alternatives across domains
[Figure: CrowdFlower Tweet Sentiment. Accuracy against #labels (25,000 to 150,000) for IBCC, DawidSkene, MV and Vote distribution]
Community detection over E[π] matrices: behaviour types among Zooniverse users
[Figure: behaviour types: Sensible, Extreme, Random, Optimist, Pessimist]
● vbIBCC provides insights into crowd behaviour using Bayesian community analysis
● Design training to influence these types
● CommunityBCC builds these types into the model to better predict new workers
CommunityBCC builds these distinct types into the model to better understand new workers
● Priors constrain the worker model
● Fewer examples needed to learn reliabilities
Dynamic IBCC: behaviour changes as people learn, get bored, move...
● Detect a worker's current state: aggregate correctly, select suitable tasks, influence behaviour
Current state
[Figure: graphical model. "True" decision label (multinomial); set of all observed decisions (multinomial); agent-specific "confusion" matrix; Dirichlet priors]
What about dynamics?
[Figure: the same graphical model with a time axis added]
Dynamic IBCC tracks changes to the confusion matrix over time
● Bayes' filter estimates evolving Markov chain
● Assumption: unexpected behaviour → state changes
[Figures: Galaxy Zoo Supernovae example volunteer; Mechanical Turk document classification]
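The idea of a confusion matrix that drifts over time can be illustrated with a much simpler stand-in than the Bayes' filter used by DynIBCC: discounting old pseudo-counts with a forgetting factor. The function name, the `forget` parameter and the toy data are assumptions for illustration only.

```python
def track_confusion(alpha0, observations, forget=0.9):
    """Track a worker's expected confusion matrix with discounted counts.

    observations: list of (true_class, response) pairs in time order.
    forget: discount in (0, 1]; 1.0 recovers the static Dirichlet update.
    Returns the expected confusion matrix after each observation.
    """
    counts = [[0.0] * len(row) for row in alpha0]
    history = []
    for t, c in observations:
        # Older evidence fades, so recent behaviour dominates.
        counts = [[forget * v for v in row] for row in counts]
        counts[t][c] += 1.0
        alpha = [[a0 + v for a0, v in zip(r0, r)]
                 for r0, r in zip(alpha0, counts)]
        history.append([[a / sum(row) for a in row] for row in alpha])
    return history

alpha0 = [[2.0, 1.0], [1.0, 2.0]]
# Hypothetical worker: 20 correct answers on class 0, then 20 wrong ones.
observations = [(0, 0)] * 20 + [(0, 1)] * 20
dynamic = track_confusion(alpha0, observations, forget=0.9)
static = track_confusion(alpha0, observations, forget=1.0)
print(dynamic[19][0][0], dynamic[-1][0][0], static[-1][0][0])
```

The discounted tracker follows the change in behaviour, whereas the static update averages the two regimes and reports a worker who looks close to random.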
Modelling the data so we can deploy the crowd more efficiently...
Combining the crowd with features: TREC Crowdsourcing Challenge
● IBCC + 2000 LDA features acting as additional classifiers [11]
● Classify unlabelled documents
● Results:
– 0.81 AUC with only 16% of documents labelled at all
– 0.77 for next-best approach
– 1st place required multiple labellings of all documents
BCCWords: an efficient way to learn language in new contexts
[Figure: CrowdFlower Tweet Sentiment. Accuracy against #labels (25,000 to 150,000) for IBCC, CBCC, ScalBCCWords, MV (Text classifier), DawidSkene, MV and Vote distribution]
[Figure: Positive words about the weather learnt by BCCWords]
BCCWords increases accuracy with limited labels
Unstructured data in social media: a rich source of timely information
Real-time, local events – e.g. emergency reports after an earthquake
Sentiment about products, health and social issues – e.g. opinions about H1N1, product reviews
Butler 2013, Morrow et al. 2011
Understanding Textual Data Streams
● Turn unstructured data into reliable, machine-readable information
● Automated classifiers struggle to understand diverse, evolving language in new contexts
● Need new tools to resolve ambiguity and lack of training data
Ushahidi – From Haiti 2010 earthquake
Morrow et al. 2011
Categories of earthquake reports
Nepal, 2015, Quakemap.org
Gender
Kivran-Swaine et al., 2013
“Love” “Dude”
Interpreting Language through Crowdsourcing
● Biased and noisy interpretations
● Scalability: the workers cannot label everything multiple times
● New techniques needed to reduce the workload of labellers using textual information
● How to learn a language model from unreliable judgements?
[Figure: crowd judgements (+, +, -, +); Repetitive Tasks, Time, Costs]
Scenario: Sentiment Analysis of Tweets and Reviews
Dataset | Text | Platform | Sentiment Classes | No. Documents | No. Judgements | No. Workers
2013 CrowdScale shared task challenge | Tweets about weather | CrowdFlower | Positive, Negative, Neutral (–), Not related (X), Unknown (?) | 98,980 | 569,375 | 461
Rodrigues et al., 2013 | Rotten Tomatoes Movie Reviews | Amazon Mechanical Turk | Positive, Negative | 5,000 | 27,747 | 203
Example tweets:
“Morning sunshine” (09:18 PM June 7, 2011)
“Is it rainy too? Totally hate it” (10:05 PM June 7, 2011)
“lovely sunny day” (10:06 PM June 7, 2011)
Bayesian Classifier Combination with Words: BCCWords
● Bayes' theorem provides a principled mathematical framework for classifier combination
– Dawid & Skene, 1979; Kim & Ghahramani, 2012; Simpson et al., 2013; Venanzi et al., 2014.
– Outperforms weighted majority voting etc.
● Novel approach to combine weak signals from text and crowd
– Model the reliability of members of the crowd
– Train a language model to reduce the number of judgements needed
[Figure: crowd judgements (+, +, -, +) feeding into BCCWords]
Reliability of judgements defined by a confusion matrix for each worker
● Defines likelihood for worker k: $p(\text{label}^{(k)} \mid \text{true class})$
● Aggregate support for class c using Bayes' rule: $\prod_{k \in K} p(\text{label}^{(k)} \mid \text{true class} = c)$
● Richer than weighting by overall accuracy:
– Accounts for bias and random noise
– Differing skill levels in each class
– Labels need not be votes for true class
Example confusion matrix (rows: true class; columns: label^(k)):
True class   +ve   uncertain   -ve
+ve          0.7   0.1         0.2
-ve          0.4   0.4         0.2
Likelihood of text features in each class: bag-of-words
$$\omega_c = p(\text{word}_n \mid \text{true class} = c)$$
● Words have different likelihoods in each sentiment class
● Prior distribution over word likelihoods in each class
● Learning the posterior over ω_c: update pseudo-counts as we observe words in a document of class c
[Figure: word likelihoods ω_c for two classes; "good" and "nice" are more likely in one, "terrible" in the other]
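The pseudo-count update for word likelihoods can be sketched as follows. This is a simplified, supervised illustration: the class labels for the three example tweets, the tiny vocabulary and the symmetric prior count are assumptions, and in BCCWords the true classes are inferred rather than given.

```python
from collections import Counter

def word_likelihoods(documents, labels, vocabulary, n_classes, prior_count=1.0):
    """Posterior mean of omega_c = p(word | true class = c).

    Each class starts with `prior_count` pseudo-counts per vocabulary word;
    every vocabulary word seen in a document of class c adds one count.
    """
    counts = [Counter() for _ in range(n_classes)]
    for text, c in zip(documents, labels):
        counts[c].update(w for w in text.lower().split() if w in vocabulary)
    omega = []
    for c in range(n_classes):
        total = sum(counts[c].values()) + prior_count * len(vocabulary)
        omega.append({w: (counts[c][w] + prior_count) / total
                      for w in vocabulary})
    return omega

vocabulary = ["lovely", "sunny", "sunshine", "rainy", "hate"]
documents = ["Morning sunshine",
             "Is it rainy too? Totally hate it",
             "lovely sunny day"]
labels = [0, 1, 0]  # assumed for illustration: 0 = positive, 1 = negative
omega = word_likelihoods(documents, labels, vocabulary, n_classes=2)
print(omega[0]["sunny"], omega[1]["hate"])
```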
BCCWords: integrating this into one model...
BCCWords: judgements are conditioned on true class
[Figure: graphical model with nodes Confusion Matrix, Judgement Label and True Class, over N documents]
BCCWords: judgements and words are conditioned on the true class
[Figure: graphical model extended with Word Likelihoods ω_c and Words, over N documents]
Use Bayes' rule to infer true class from labels and words
… but we need to learn the likelihoods from true class labels
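Once the likelihoods are known, inferring the true class from labels and words is a single application of Bayes' rule. A minimal sketch, assuming fixed parameters: the confusion matrix is the example from the earlier slide, while the word likelihoods and worker name are invented for illustration. BCCWords itself learns all of these jointly.

```python
import math

def infer_true_class(prior, confusion, judgements, omega, words):
    """p(class | labels, words), proportional to
    p(class) * prod_k p(label_k | class) * prod_n p(word_n | class)."""
    log_post = []
    for c, p in enumerate(prior):
        lp = math.log(p)
        for worker, label in judgements.items():
            lp += math.log(confusion[worker][c][label])
        for w in words:
            if w in omega[c]:  # assumes every class shares one vocabulary
                lp += math.log(omega[c][w])
        log_post.append(lp)
    peak = max(log_post)
    unnorm = [math.exp(v - peak) for v in log_post]
    return [u / sum(unnorm) for u in unnorm]

# Rows: true class +ve, -ve; columns: label +ve, uncertain, -ve.
confusion = {"worker_1": [[0.7, 0.1, 0.2],
                          [0.4, 0.4, 0.2]]}
# Hypothetical word likelihoods for the +ve and -ve classes.
omega = [{"sunny": 0.30, "hate": 0.05},
         {"sunny": 0.05, "hate": 0.30}]
judgements = {"worker_1": 1}  # the worker answered "uncertain"

labels_only = infer_true_class([0.5, 0.5], confusion, judgements, omega, [])
with_words = infer_true_class([0.5, 0.5], confusion, judgements, omega, ["sunny"])
print(labels_only, with_words)
```

Here the single "uncertain" label favours the negative class, and the word "sunny" tips the posterior back towards positive, which is the sense in which text reduces the number of judgements needed.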
Variational Bayes: learn confusion matrices, language model and true class with limited training data
● Computationally efficient: 20 mins for 500k judgements, 98k tweets
● Iteratively updates each variable in turn, learning from latent structure and any prior knowledge or training data
● Algorithm can be distributed to constrain memory requirements
Experiments: Sentiment Analysis of Tweets and Reviews
Datasets: 2013 CrowdScale shared task challenge (weather tweets, CrowdFlower; 98,980 documents, 569,375 judgements, 461 workers) and Rodrigues et al., 2013 (Rotten Tomatoes movie reviews, Amazon Mechanical Turk; 5,000 documents, 27,747 judgements, 203 workers)
Language Model for Weather Sentiment
[Figure: most likely words and discriminative words for the Positive and Negative classes]
Distinct worker types show the importance of learning reliability
[Figure: confusion matrices for a Good Worker and an Inaccurate Worker; axes: true class, worker label, probability. CrowdFlower Weather – 5 classes]
Summary: BCCWords fuses subjective interpretations to learn models of language in the wild
● Important to account for skills and bias of individuals in crowd
● Learns worker reliability and language model in a single integrated inference algorithm
● Uses textual information to reduce the number of judgements required
● Bayesian inference
– Proven framework for fusing information
– Handles uncertainty in true class labels and model itself
Moving towards efficient learning with Crowd in-the-Loop
● Turn masses of unstructured, heterogeneous data into reliable, machine-readable information
● Use the model to choose who does what task
● Detect different interpretations of language between communities in the crowd?
Intelligent agent-task assignment: who should classify which object?
● Aim: direct crowd's effort to learn quickly & cheaply
● Prioritise tasks by considering their features and confidence in their classification
● Task choice depends on the workers available
● Maximise expected utility
DynIBCC confusion matrix describes individual skills
Utility of response: information gain about targets when DynIBCC is updated
● Naturally balances exploration & exploitation
● Explore an agent's behaviour from silver tasks
– Objects already labelled confidently by crowd
– Increases utility of past responses
● Exploit an agent's skills to learn uncertain targets t
$$E[U_\tau(k, i)] = E\left[I\left(t;\, c_i^{(k)} \mid D_\tau\right)\right]$$
where i is the index of the target object, k is the worker ID, D_τ is the crowdsourced data collected so far and τ is the time index.
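The exploitation part of this utility can be approximated by the mutual information between an object's class and a worker's response. A minimal sketch under simplifying assumptions: it uses a fixed expected confusion matrix per worker and ignores what the response teaches about the worker (the exploration term), so it is not the full DynIBCC update. Worker and object names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(v * math.log(v) for v in p if v > 0)

def expected_information_gain(p_t, pi):
    """I(t; c) = H(t) - E_c[H(t | c)].

    p_t: current posterior over target classes for one object
    pi:  expected confusion matrix of one worker, pi[t][c] = p(c | t)
    """
    gain = entropy(p_t)
    for c in range(len(pi[0])):
        p_c = sum(p_t[t] * pi[t][c] for t in range(len(p_t)))
        if p_c > 0:
            post = [p_t[t] * pi[t][c] / p_c for t in range(len(p_t))]
            gain -= p_c * entropy(post)
    return gain

workers = {"reliable": [[0.9, 0.1], [0.1, 0.9]],
           "random": [[0.5, 0.5], [0.5, 0.5]]}
objects = {"uncertain": [0.5, 0.5], "confident": [0.99, 0.01]}

# Greedy choice: the (worker, object) pair with the largest expected gain.
gains = {(k, i): expected_information_gain(p_t, pi)
         for k, pi in workers.items() for i, p_t in objects.items()}
best = max(gains, key=gains.get)
print(best, gains[best])
```

A worker who answers at random yields zero gain on any object, while a reliable worker is best spent on the object whose class is still uncertain.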
Hiring and firing algorithm makes greedy assignments to reduce computational cost
● Hire for priority task that matches current skills
● Fire if new crowd members likely to do better
Loose crowds on the web & in organisations: Disaster Response
● Extracting key information from noisy background
– Text: Twitter, Ushahidi 15000 messages in a few weeks [8]
– Images: Satellite, Social Media
– Team communications, other agencies
● Locations of emergencies:
– continuous target function
Bayesian crowdsourced heatmaps visualise likely emergencies and information gaps
● Neighbouring reports related by spatial Gaussian process (GP) classifier
[Figure: graphical model with nodes Κ, t_i, c_i^(k), π^(k) and α0^(k); labels: density of emergencies at (x,y), emergency state at (x,y)]
Sigmoid function maps GP to Dirichlet
GP Variance
Ushahidi crowd + trusted report from first responder
Future Opportunities
Adaptive training and motivation to create diverse skills and stimulate workers
● Model worker preferences, rewards
● Fast approximations to future utility
– Deduct cost of rewards
– Add retention, work rate, reliability
– Target clusters of workers
● Selecting tasks/training: consider person's history
Apprenticeship/Peer Training
Infer improvements in confusion matrices from effect of task on others
Models for combining new data types & target functions
● Targets have multiple dimensions
– Shapes in PlanetFour
● Poisson processes, event rates
– Malaria rates
Actively switch types of tasks to optimise learning from the crowd
● Select questions from decision tree
● Labelling, comparing, marking features, grouping...
● Utility varies: accuracy of responses, current model of features...
Maximise information about t
[Figure labels: 34.556; ...is like...]
Learn how people make decisions by actively adapting tasks
● Improve automation, reduce work
● Select interaction mode or questions in the micro-task
● Maximise information given current model
● Crowd-supervised feature extraction, e.g. adapting PCA to learn more useful features from the crowd
[Figure: Projection]
Summary: Bayesian models enable accurate and scalable crowdsourcing across domains
● Quantify uncertainty in data model & worker behaviour
● Actively learn from crowds using model of features
● Opportunities: optimisation and learning to automate with humans-in-the-loop
[Figure: workflow linking Data, Crowd, Crowd Annotations, Machine learning and Results]
ORCHID and Zooniverse collaborators worked with Rescue Global to identify and then refine their critical information requirements.
• placement of life detectors and water filters within 50 mile radius of Kathmandu.
Crowd labelled 1200 Planet Labs satellite images using Zooniverse software.
• Recruited 25 image labellers from within Oxford University and Rescue Global staff (they worked hard over the bank holiday weekend).
Folded in OpenStreetMap building density data and inferred population density map using ORCHID data processing algorithms.
Delivered map overlay to Rescue Global for dissemination to their CaDRA partners (SARaid, Team Rubicon, CADENA).
29/04/15 to
2/05/15
02/05/15 to
20:13 GMT 05/05/15
00:15 GMT
06/05/15
05/05/15
25/04/15, 7.8 Earthquake in Gorkha District of Nepal
Software on Github
● http://www.robots.ox.ac.uk/~edwin/
– Please use and report bugs
● PyIBCC: IBCC-VB and DynIBCC-VB in Python 2
– Collaborating with Zooniverse
● MatlabIBCC: IBCC-VB and DynIBCC-VB in Matlab
Acknowledgements
● Uni of Southampton: Nick Jennings, Alex Rogers, Sarvapali Ramchurn, Matteo Venanzi
● Oxford: Edwin Simpson, Steve Reece, Chris Lintott & Zooniverse team
● EPSRC (UK research council), the ORCHID project, Rescue Global, Microsoft, Zooniverse
References
[1] Dawid, A. P., & Skene, A. M. (1979). Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 20-28.
[2] Kim, H. C., & Ghahramani, Z. (2012). Bayesian classifier combination. In International Conference on Artificial Intelligence and Statistics (pp. 619-627).
[3] Simpson, E., Roberts, S., Psorakis, I., Smith, A., & Lintott, C. (2011). Bayesian Combination of Multiple, Imperfect Classifiers. Proceedings of NIPS 2011 workshop.
[4] Simpson, E., Roberts, S., Psorakis, I., & Smith, A. (2013). Dynamic Bayesian combination of multiple imperfect classifiers. In Decision Making and Imperfection (pp. 1-35). Springer.
[5] Psorakis, I., Roberts, S., Ebden, M., & Sheldon, B. (2011). Overlapping Community Detection using Bayesian Nonnegative Matrix Factorization. Physical Review E, 83.
[6] Venanzi, M., Guiver, J., Kazai, G., Kohli, P., & Shokouhi, M. (2014). Community-based Bayesian aggregation models for crowdsourcing. In Proceedings of the 23rd International Conference on World Wide Web (pp. 155-164). International World Wide Web Conferences Steering Committee.
[7] Simpson, E., & Roberts, S. (2015, to appear). Bayesian Methods for Intelligent Task Assignment in Crowdsourcing Systems. In Scalable Decision Making: Uncertainty, Imperfection, Deliberation; Studies in Computational Intelligence, Springer.
[8] Morrow, N., Mock, N., Papendieck, A., & Kocmich, N. (2011). Independent Evaluation of the Ushahidi Haiti Project. Development Information Systems, 8:2011.
[9] MacKay, D. J. C. (1992). Information-based objective functions for active data selection. Neural Computation, 4(4):590–604.
[10] Chen, X., Bennett, P. N., Collins-Thompson, K., & Horvitz, E. (2013). Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM.
[11] Simpson, E., Reece, S., Penta, A., Ramchurn, G., & Roberts, S. (2012). Using a Bayesian Model to Combine LDA Features with Crowdsourced Responses. In The Twenty-First Text REtrieval Conference (TREC 2012), Crowdsourcing Track, NIST.
[12] Nitzan, S., & Paroush, J. (1982). Optimal decision rules in uncertain dichotomous choice situations. International Economic Review, 23(2):289–297.
[13] Berend, D., & Kontorovich, A. (2014). Consistency of Weighted Majority Votes. NIPS.
[14] Zhang, Y., Chen, X., Zhou, D., & Jordan, M. (2014). Spectral methods meet EM: a Provably Optimal Algorithm for Crowdsourcing.
Questions?

Professor Steve Roberts; The Bayesian Crowd: scalable information combination for Citizen Science and Crowdsourcing

  • 1. The Bayesian Crowd: scalable information combination for Citizen Science and Crowdsourcing. Stephen Roberts, Machine Learning Research Group & Oxford-Man Institute, University of Oxford; Alan Turing Institute. Joint work with Edwin Simpson, Steven Reece & Matteo Venanzi. Bayes Nets Meeting, January 2017
  • 2.
  • 3. Core methodology – Bayesian inference • Bayesian modelling allows for explicit incorporation of all desiderata • Effort focused not only on theory development, but algorithmic implementations that are timely & practical for real-world, real-time scenarios • Single, under- and over-arching philosophy… “one method to rule them all… and in the darkness bind them” “The language is that of Bayesian inference, which I will not utter here...” p(a|b) = p(b|a)p(a)/p(b)
  • 4. What does this buy us? • Uncertainty at all levels of inference is naturally taken into account • Optimal fusion of information: subjective, objective • Handling missing values • Handling of noise • Principled inference of confidence and risk • Optimal decision making
  • 7. The scale of things
  • 8. The big data we generate
  • 9. The rise of the flop and the fall in price
  • 10. Science: the 4th paradigm?
  • 12. How can we deal with unreliable worker responses and very large datasets? Big data: Square Kilometer Array, 10 petabytes compressed images/day. Noisy reports: Twitter, Typhoon Haiyan
  • 13. Aims: Reliability and Efficiency ● Challenge: volunteers have varying reliability – Different knowledge, interests, skills – Typically handled with redundancy → build a consensus ● Challenge: datasets are large, what to prioritise? ● Aim: increase accuracy by learning reliability ● Aim: use our volunteers' time efficiently – Reduce redundant decisions – Deploy experts where needed – Use additional data to scale up to larger datasets
  • 14. Machine Learning: aggregate responses and assign tasks intelligently ● Probabilistic models of people and data ● Handle uncertainty in model ● Optimise and automate analysis to reduce costs [Diagram labels: Machine learning; Data; Crowd; Annotations; Results]
  • 15. Zooniverse has 26 current applications across a range of domains, with 1 million volunteers ● Can we use ML to handle variations in ability? ● Or to match tasks to people's interests and skills?
  • 16. How can we combine annotations from different members of the crowd? ● Fewer annotations needed from more reliable labellers ● Confidence and trust → user weights ● But weighted majority is soft selection – Blurred decision boundaries ● Need to combine different expertise + weak labellers
  • 17. Bayesian Methods ● Optimal framework for combining evidence ● Quantify prior beliefs explicitly – E.g. workers are mostly better than random ● Quantifies uncertainty at all levels – Which agents are reliable? – Do we need more evidence for an object's target class? ● Principled approach – Move away from fine-tuning each project – E.g. avoid trial-and-error thresholds to determine when consensus reached
  • 18. How can we aggregate responses intelligently? ● Bayes' rule combines different pieces of information ● Weight workers' contributions through their likelihood $p(c^{(k)} \mid t)$ of response $c^{(k)}$ to class $t$ ● Optimal weighted majority decision ● Error guarantees ● Soft selection. $p(t \mid \mathbf{c}) \propto p(t) \prod_{k \in K} p(c^{(k)} \mid t)$
  • 19. Likelihood defined by a confusion matrix ● Likelihood $p(c^{(k)} \mid t) = \pi^{(k)}_{t, c^{(k)}}$ of response $c^{(k)}$ to class $t$ ● Richer than user accuracy weights: – Differing skill levels in each class – Responses need not be votes. Example confusion matrix $\pi^{(k)}$ (rows: target class $t$; columns: response $c^{(k)}$): t = 1: A 0.7, B 0.1, C 0.2; t = 2: A 0.4, B 0.4, C 0.2
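The combination rule on slides 18 and 19 can be illustrated with a minimal Python sketch. It assumes each worker's confusion matrix is known exactly (IBCC instead treats it as uncertain) and that no entry is zero; the function name `combine_responses` and the worker IDs are hypothetical, and the matrix values are the example from slide 19.

```python
import math

def combine_responses(prior, confusion, responses):
    """Posterior over target classes given one response per worker.

    prior: list of p(t) for each class t.
    confusion: dict worker_id -> matrix with confusion[k][t][c] = p(c | t).
    responses: dict worker_id -> observed response index c.
    """
    # Work in log space so products over many workers do not underflow.
    log_post = [math.log(p) for p in prior]
    for k, c in responses.items():
        for t in range(len(prior)):
            log_post[t] += math.log(confusion[k][t][c])
    # Normalise with the log-sum-exp trick.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two target classes (rows), three possible responses A, B, C (columns).
pi_example = [[0.7, 0.1, 0.2],
              [0.4, 0.4, 0.2]]
posterior = combine_responses(
    prior=[0.5, 0.5],
    confusion={"w1": pi_example, "w2": pi_example},  # hypothetical workers
    responses={"w1": 0, "w2": 1},                    # w1 answered A, w2 answered B
)
print(posterior)
```

Because the responses index columns of the confusion matrix rather than classes, a response such as "C" still carries evidence even though it is not a vote for either class.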
  • 20. Independent Bayesian classifier combination (IBCC) handles parameter uncertainty ● Deal rationally with limited or missing data [Graphical model labels: target labels (multinomial); observed worker responses (multinomial); worker-specific confusion matrix (Dirichlet); proportions of each class (Dirichlet)]
  • 21. Hyperparameters encode prior beliefs in worker behaviour, e.g. worker is better than random ● Optimise/marginalise to handle model uncertainty ● Share prior pseudo-counts between similar projects ● Ratio → relative probability of agent responses given class t ● Magnitude → strength of prior beliefs. Example pseudo-counts (rows: class $t$; columns: response $c^{(k)}$): t = 1: A 7, B 1, C 2; t = 2: A 4, B 4, C 2
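A short Python sketch of the "ratio" and "magnitude" reading of the pseudo-counts on slide 21. It assumes fully observed (true class, response) pairs, which is a simplification; the helper names and the ten observations are hypothetical.

```python
def expected_confusion(alpha):
    """Row-normalise Dirichlet pseudo-counts: E[pi[t][c]] = alpha[t][c] / sum(alpha[t])."""
    return [[a / sum(row) for a in row] for row in alpha]

def update_counts(alpha0, observations):
    """Add observed (true_class, response) index pairs to the prior pseudo-counts."""
    alpha = [list(row) for row in alpha0]
    for t, c in observations:
        alpha[t][c] += 1
    return alpha

strong_prior = [[7, 1, 2],          # pseudo-counts from slide 21
                [4, 4, 2]]
weak_prior = [[0.7, 0.1, 0.2],      # same ratios, ten times smaller magnitude
              [0.4, 0.4, 0.2]]

# Hypothetical evidence: ten objects of the second class, all answered "B".
observations = [(1, 1)] * 10

strong_post = expected_confusion(update_counts(strong_prior, observations))
weak_post = expected_confusion(update_counts(weak_prior, observations))
print(strong_post[1], weak_post[1])
```

Both priors give the same expected confusion matrix before any data arrive; the weaker prior is simply overridden faster by the observations.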
  • 22. Inference. Joint, conditioned on hyper-hyper parameters. Gibbs sampling – rather slow. Variational Bayes – offers fast inference, at expense of approximations
  • 23. Variational Bayes [Equation labels: -ve free energy; Kullback-Leibler divergence]
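The two labels on slide 23 match the standard variational decomposition of the log evidence, sketched below. The slide's own equation did not survive extraction, so the notation is an assumption: $\mathbf{c}$ denotes the observed responses, $\theta$ collects all unknowns (target labels, confusion matrices, class proportions), and $q(\theta)$ is the approximating distribution.

```latex
\ln p(\mathbf{c})
  = \underbrace{\int q(\theta)\,
      \ln\frac{p(\mathbf{c},\theta)}{q(\theta)}\,d\theta}
      _{\mathcal{L}(q):\ \text{negative free energy}}
  \;+\;
    \underbrace{\int q(\theta)\,
      \ln\frac{q(\theta)}{p(\theta\mid\mathbf{c})}\,d\theta}
      _{\mathrm{KL}\left(q\,\|\,p\right)\ \ge\ 0}
```

Since the left-hand side does not depend on $q$, raising $\mathcal{L}(q)$ necessarily lowers the Kullback-Leibler divergence between $q$ and the true posterior.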
  • 25. Variational Bayes: inflating the balloon
  • 26. Variational Bayes: inflating the balloon
  • 27. Variational Bayes: inflating the balloon
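To make the variational scheme concrete, here is a minimal pure-Python sketch of mean-field updates for a simplified IBCC-style model. It is not the PyIBCC implementation: it assumes a single prior `alpha0` shared by all workers, fixed hyperparameters, responses that are class votes (used only for initialisation), and a hand-written `digamma` approximation. All names are hypothetical.

```python
import math

def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def normalise_logs(row):
    """Turn unnormalised log-probabilities into probabilities."""
    m = max(row)
    e = [math.exp(v - m) for v in row]
    z = sum(e)
    return [v / z for v in e]

def vb_ibcc(labels, n_classes, alpha0, nu0, n_iter=50):
    """labels: (item, worker, response) index triples.
    Returns (q, alpha): q[i][t] approximates p(t_i = t | labels);
    alpha[k] holds worker k's posterior confusion pseudo-counts."""
    n_items = 1 + max(i for i, _, _ in labels)
    workers = sorted({k for _, k, _ in labels})
    # Initialise q from smoothed vote counts (assumes responses are class votes).
    q = [[1.0] * n_classes for _ in range(n_items)]
    for i, _, c in labels:
        q[i][c] += 1.0
    q = [[v / sum(row) for v in row] for row in q]
    alpha = {}
    for _ in range(n_iter):
        # Update Dirichlet pseudo-counts with expected (soft) counts.
        nu = [nu0[t] + sum(row[t] for row in q) for t in range(n_classes)]
        alpha = {k: [list(row) for row in alpha0] for k in workers}
        for i, k, c in labels:
            for t in range(n_classes):
                alpha[k][t][c] += q[i][t]
        # Expected log-parameters under the Dirichlet posteriors.
        ln_kappa = [digamma(v) - digamma(sum(nu)) for v in nu]
        ln_pi = {k: [[digamma(a) - digamma(sum(row)) for a in row]
                     for row in alpha[k]] for k in workers}
        # Update each q(t_i) given the expected log-parameters.
        log_q = [list(ln_kappa) for _ in range(n_items)]
        for i, k, c in labels:
            for t in range(n_classes):
                log_q[i][t] += ln_pi[k][t][c]
        q = [normalise_logs(row) for row in log_q]
    return q, alpha

# Toy data: workers 0 and 1 always agree; worker 2 is a hypothetical less reliable worker.
truth = [0, 0, 0, 1, 1, 1]
noisy = [0, 1, 0, 1, 0, 1]
labels = ([(i, 0, t) for i, t in enumerate(truth)]
          + [(i, 1, t) for i, t in enumerate(truth)]
          + [(i, 2, c) for i, c in enumerate(noisy)])
q, alpha = vb_ibcc(labels, 2, alpha0=[[2, 1], [1, 2]], nu0=[1, 1])
print([row.index(max(row)) for row in q])
```

Each pass alternates between updating the pseudo-counts from the current label beliefs and updating the label beliefs from the expected log-parameters, which is the coordinate-wise pattern variational Bayes relies on.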
  • 28. Zooniverse: Galaxy Zoo Supernovae. Users rate each presented object, which provides a score of -1: very unlikely SN object; 1: possible SN object; 3: likely SN object (“true” labels obtained retrospectively via Palomar Transient Factory spectrographic analysis)
  • 29. IBCC-VB outperforms alternatives. Galaxy Zoo Supernovae AUC (area under ROC curve, defining better solutions): IBCC-VB 0.90; Mean 0.65; Weighted Sum 0.64; Weighted Majority 0.58
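For readers unfamiliar with the metric in this comparison, the area under the ROC curve can be computed from its rank interpretation, as in the sketch below. The scores and labels are made-up illustrative values, not data from the talk, and `auc` is a hypothetical helper name.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive is scored above
    a randomly chosen negative, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Illustrative values only: three true supernovae and two non-supernovae.
example_scores = [0.9, 0.8, 0.3, 0.6, 0.2]
example_labels = [1, 1, 1, 0, 0]
print(auc(example_scores, example_labels))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the figures in the table should be read.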
  • 30. IBCC outperforms alternatives across domains [Plot: CrowdFlower Tweet Sentiment, accuracy (0.2 to 0.8) against #labels (25,000 to 150,000) for IBCC, DawidSkene, MV and Vote distribution] Galaxy Zoo Supernovae AUC: IBCC-VB 0.90; Mean 0.65; Weighted Sum 0.64; Weighted Majority 0.58
  • 31. Community detection over E[π] matrices: behaviour types among Zooniverse users (Sensible, Extreme, Random, Optimist, Pessimist) ● vbIBCC provides insights into crowd behaviour using Bayesian community analysis ● Design training to influence these types ● CommunityBCC builds these types into the model to better predict new workers
  • 32. CommunityBCC builds these distinct types into the model to better understand new workers ● Priors constrain the worker model ● Fewer examples needed to learn reliabilities
  • 33. Dynamic IBCC: behaviour changes as people learn, get bored, move... ● Detect a worker's current state: aggregate correctly, select suitable tasks, influence behaviour [Figure label: Current state]
  • 34. What about dynamics? [Graphical model labels: “true” decision label (multinomial); set of all observed decisions (multinomial); Dirichlet; Dirichlet; agent-specific “confusion” matrix]
  • 35. What about dynamics? [Graphical model labels: “true” decision label (multinomial); set of all observed decisions (multinomial); Dirichlet; Dirichlet; agent-specific “confusion” matrix; time]
  • 36. Dynamic IBCC tracks changes to the confusion matrix over time ● Bayes' filter estimates evolving Markov chain ● Assumption: unexpected behaviour → state changes [Figure: Galaxy Zoo Supernovae example volunteer]
  • 37. Dynamic IBCC tracks changes to the confusion matrix over time ● Bayes' filter estimates evolving Markov chain ● Assumption: unexpected behaviour → state changes [Figure: Mechanical Turk document classification]
  • 38. Modelling the data so we can deploy the crowd more efficiently...
  • 39. Combining the crowd with features: TREC Crowdsourcing Challenge ● IBCC + 2000 LDA features acting as additional classifiers [11] ● Classify unlabelled documents ● Results: – 0.81 AUC with only 16% documents labelled at all – 0.77 for next-best approach – 1st place required multiple labellings of all documents
  • 40. BCCWords: an efficient way to learn language in new contexts ● BCCWords increases accuracy with limited labels [Plot: CrowdFlower Tweet Sentiment, accuracy (0.2 to 0.8) against #labels (25,000 to 150,000) for IBCC, CBCC, ScalBCCWords, MV (Text classifier), DawidSkene, MV and Vote distribution] [Figure: positive words about the weather learnt by BCCWords]
  • 41. Unstructured data in social media: a rich source of timely information. Real-time, local events – e.g. emergency reports after an earthquake. Sentiment about products, health and social issues – e.g. opinions about H1N1, product reviews. Butler 2013, Morrow et al. 2011
  • 42. Understanding Textual Data Streams ● Turn unstructured data into reliable, machine-readable information ● Automated classifiers struggle to understand diverse, evolving language in new contexts ● Need new tools to resolve ambiguity and lack of training data [Figures: Ushahidi – from Haiti 2010 earthquake (Morrow et al. 2011); categories of earthquake reports, Nepal, 2015, Quakemap.org; gender (Kivran-Swaine et al., 2013): “Love”, “Dude”]
  • 43. Interpreting Language through Crowdsourcing ● Biased and noisy interpretations ● Scalability: the workers cannot label everything multiple times ● New techniques needed to reduce the workload of labellers using textual information ● How to learn a language model from unreliable judgements? [Figure labels: + + - +; Repetitive Tasks; Time; Costs]
  • 44. Scenario: Sentiment Analysis of Tweets and Reviews. Datasets: (1) 2013 CrowdScale shared task challenge: tweets about weather; platform CrowdFlower; sentiment classes Positive, Negative, Neutral (–), Not related (X), Unknown (?); 98,980 documents; 569,375 judgements; 461 workers. (2) Rodrigues et al., 2013: Rotten Tomatoes movie reviews; platform Amazon Mechanical Turk; sentiment classes Positive, Negative; 5,000 documents; 27,747 judgements; 203 workers. Example tweets: “Morning sunshine” 09:18 PM June 7, 2011; “Is it rainy too? Totally hate it” 10:05 PM June 7, 2011; “lovely sunny day” 10:06 PM June 7, 2011
  • 45. Bayesian Classifier Combination with Words (BCCWords) ● Bayes' theorem provides a principled mathematical framework for classifier combination – Dawid & Skene, 1979; Kim & Ghahramani, 2012; Simpson et al., 2013; Venanzi et al., 2014. – Outperforms weighted majority voting etc. [Figure labels: + + - +; BCCWords]
  • 46. Bayesian Classifier Combination with Words (BCCWords) ● Novel approach to combine weak signals from text and crowd – Model the reliability of members of the crowd – Train a language model to reduce the number of judgements needed [Figure labels: + + - +; BCCWords]
  • 47. Reliability of judgements defined by a confusion matrix for each worker ● Defines likelihood for worker k: $p(\text{label}^{(k)} \mid \text{true class})$ ● Aggregate support for class c using Bayes' rule: $\prod_{k \in K} p(\text{label}^{(k)} \mid \text{true class} = c)$ ● Richer than weighting by overall accuracy: – Accounts for bias and random noise – Differing skill levels in each class – Labels need not be votes for true class. Example confusion matrix (rows: true class; columns: $\text{label}^{(k)}$): +ve: +ve 0.7, uncertain 0.1, -ve 0.2; -ve: +ve 0.4, uncertain 0.4, -ve 0.2
  • 48. Likelihood of text features in each class: bag-of-words $\omega_c = p(\text{word}_n \mid \text{true class} = c)$ ● Words have different likelihoods in each sentiment class ● Prior distribution over word likelihoods in each class ● Learning posterior $\omega_c$: update pseudo-counts as we observe words in document of class c [Figure labels: “Good”, “nice”: more likely; “Terrible”: more likely]
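The pseudo-count update for the word likelihoods on slide 48 can be sketched as follows. This simplified version assumes the class of each document is known, a symmetric Dirichlet prior, and whitespace tokenisation; the vocabulary, prior strength and helper name `word_posterior` are hypothetical, while the example tweet is one quoted in the talk.

```python
def word_posterior(vocab, prior_count, documents):
    """Posterior mean of omega_c for one class.

    vocab: list of words in the bag-of-words model.
    prior_count: symmetric Dirichlet pseudo-count given to every word.
    documents: texts assumed to belong to class c.
    """
    counts = {w: float(prior_count) for w in vocab}
    for doc in documents:
        for w in doc.lower().split():
            if w in counts:          # words outside the vocabulary are ignored
                counts[w] += 1.0
    total = sum(counts.values())
    return {w: counts[w] / total for w in vocab}

vocab = ["lovely", "sunny", "hate", "rainy"]
omega_pos = word_posterior(vocab, prior_count=1, documents=["lovely sunny day"])
print(omega_pos)
```

In the full model the class of a document is itself uncertain, so each word would contribute a fractional count weighted by the current belief in class c rather than a whole count.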
  • 49. BCCWords: integrating this into one model...
  • 50. BCCWords: judgements are conditioned on true class [Graphical model labels: Confusion Matrix; Judgement Label; True Class]
  • 51. BCCWords: judgements are conditioned on true class [Graphical model labels: Confusion Matrix; Judgement Label; True Class; N documents]
  • 52. BCCWords: judgements and words are conditioned on the true class [Graphical model labels: Confusion Matrix; Judgement Label; True Class; Word Likelihoods $\omega_c$; Words; N documents]
  • 53. BCCWords: judgements and words are conditioned on the true class ● Use Bayes' rule to infer true class from labels and words … but we need to learn the likelihoods from true class labels [Graphical model labels: Confusion Matrix; Judgement Label; True Class; Word Likelihoods $\omega_c$; Words; N documents]
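A minimal sketch of the inference step on slide 53: given point estimates of the confusion matrices and word likelihoods, Bayes' rule combines worker labels and words into a posterior over the true class. BCCWords learns these quantities jointly, so the fixed numbers below are hypothetical stand-ins, as are the function and worker names; the sketch also assumes every class shares the same vocabulary.

```python
import math

def infer_class(prior, confusion, worker_labels, word_lik, words):
    """Posterior over the true class from crowd labels and document words.

    confusion[k][c][l] = p(worker k gives label l | true class c).
    word_lik[c][w] = p(word w | true class c); same vocabulary for all classes.
    """
    log_post = [math.log(p) for p in prior]
    for c in range(len(prior)):
        for k, label in worker_labels.items():
            log_post[c] += math.log(confusion[k][c][label])
        for w in words:
            if w in word_lik[c]:     # out-of-vocabulary words are skipped
                log_post[c] += math.log(word_lik[c][w])
    m = max(log_post)
    unnorm = [math.exp(v - m) for v in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Classes: 0 = positive, 1 = negative. All values are illustrative.
confusion = {"worker_a": [[0.8, 0.2], [0.3, 0.7]]}
word_lik = [
    {"lovely": 0.4, "sunny": 0.4, "hate": 0.1, "rainy": 0.1},
    {"lovely": 0.1, "sunny": 0.1, "hate": 0.4, "rainy": 0.4},
]
post = infer_class(
    prior=[0.5, 0.5],
    confusion=confusion,
    worker_labels={"worker_a": 1},          # the worker said "negative"
    word_lik=word_lik,
    words="lovely sunny day".split(),
)
print(post)
```

Here the text evidence outweighs a single noisy judgement, which is the sense in which a language model can reduce the number of judgements needed per document.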
  • 54. Variational Bayes: learn confusion matrices, language model and true class with limited training data ● Computationally efficient: 20 mins for 500k judgements, 98k tweets ● Iteratively updates each variable in turn, learning from latent structure and any prior knowledge or training data ● Algorithm can be distributed to constrain memory requirements
  • 55. Experiments: Sentiment Analysis of Tweets and Reviews. Datasets: (1) 2013 CrowdScale shared task challenge: tweets about weather; platform CrowdFlower; sentiment classes Positive, Negative, Neutral (–), Not related (X), Unknown (?); 98,980 documents; 569,375 judgements; 461 workers. (2) Rodrigues et al., 2013: Rotten Tomatoes movie reviews; platform Amazon Mechanical Turk; sentiment classes Positive, Negative; 5,000 documents; 27,747 judgements; 203 workers. Example tweets: “Morning sunshine” 09:18 PM June 7, 2011; “Is it rainy too? Totally hate it” 10:05 PM June 7, 2011; “lovely sunny day” 10:06 PM June 7, 2011
  • 56. Language Model for Weather Sentiment [Figure: most likely words and discriminative words for the Positive and Negative classes]
  • 57. Distinct worker types show the importance of learning reliability [Figure: confusion matrices (probability from 0 to 1, by true class and worker label) for a good worker and an inaccurate worker; CrowdFlower Weather – 5 classes]
  • 58. Summary: BCCWords fuses subjective interpretations to learn models of language in the wild ● Important to account for skills and bias of individuals in crowd ● Learns worker reliability and language model in a single integrated inference algorithm ● Uses textual information to reduce the number of judgements required ● Bayesian inference – Proven framework for fusing information – Handles uncertainty in true class labels and model itself
  • 59. Moving towards efficient learning with Crowd in-the-Loop ● Turn masses of unstructured, heterogeneous data into reliable, machine-readable information ● Use the model to choose who does what task ● Detect different interpretations of language between communities in the crowd?
  • 60. Intelligent agent-task assignment: who should classify which object? ● Aim: direct crowd's effort to learn quickly & cheaply ● Prioritise tasks by considering their features and confidence in their classification ● Task choice depends on the workers available ● Maximise expected utility [Figure caption: DynIBCC confusion matrix describes individual skills]
  • 61. Utility of response: information gain about targets when DynIBCC is updated ● Naturally balances exploration & exploitation ● Explore an agent's behaviour from silver tasks – Objects already labelled confidently by crowd – Increases utility of past responses ● Exploit an agent's skills to learn uncertain targets $\mathbf{t}$. $E[U_\tau(k, i)] = E[I(\mathbf{t}; c_i^{(k)} \mid D_\tau)]$, where $i$ is the index of the target object, $k$ the worker ID, $D_\tau$ the crowdsourced data collected so far and $\tau$ the time index
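The utility on slide 61 can be approximated for a single object with the sketch below, which computes the mutual information between one target label and a worker's next response. It is a deliberate simplification: it uses a point estimate of the confusion matrix and ignores the knock-on effect of a DynIBCC update on other targets and workers. The function names and candidate values are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def expected_information_gain(belief, confusion):
    """I(t; c) = H(t) - E_c[H(t | c)] for one object and one worker.

    belief: current p(t) for the object.
    confusion[t][c]: point estimate of p(response c | class t) for the worker.
    """
    n_classes = len(belief)
    gain = entropy(belief)
    for c in range(len(confusion[0])):
        p_c = sum(belief[t] * confusion[t][c] for t in range(n_classes))
        if p_c == 0:
            continue
        post = [belief[t] * confusion[t][c] / p_c for t in range(n_classes)]
        gain -= p_c * entropy(post)
    return gain

def best_assignment(beliefs, workers):
    """Greedy choice of the (worker, object) pair with the largest expected gain."""
    return max(((k, i) for k in workers for i in beliefs),
               key=lambda ki: expected_information_gain(beliefs[ki[1]], workers[ki[0]]))

beliefs = {"obj_uncertain": [0.5, 0.5], "obj_confident": [0.99, 0.01]}
workers = {"expert": [[0.95, 0.05], [0.05, 0.95]],
           "random": [[0.5, 0.5], [0.5, 0.5]]}
print(best_assignment(beliefs, workers))
```

Under these assumptions the greedy rule sends the reliable worker to the uncertain object; capturing the exploration of workers through confidently labelled "silver" tasks would require the full model update described on the slide.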
  • 62. Hiring and firing algorithm makes greedy assignments to reduce computational cost ● Hire for priority task that matches current skills ● Fire if new crowd members likely to do better
  • 63. Loose crowds on the web & in organisations: Disaster Response ● Extracting key information from noisy background – Text: Twitter, Ushahidi 15000 messages in a few weeks [8] – Images: Satellite, Social Media – Team communications, other agencies ● Locations of emergencies: – continuous target function
  • 64. Bayesian crowdsourced heatmaps visualise likely emergencies and information gaps ● Neighbouring reports related by spatial Gaussian process (GP) classifier [Graphical model labels: Κ; $t_i$; density of emergencies at (x,y); emergency state at (x,y); $c_i^{(k)}$; $\pi^{(k)}$; $\alpha_0^{(k)}$; sigmoid function maps GP to Dirichlet; GP variance]
  • 65. Bayesian crowdsourced heatmaps visualise likely emergencies and information gaps [Figure: Ushahidi crowd + trusted report from first responder]
  • 67. Adaptive training and motivation to create diverse skills and stimulate workers ● Model worker preferences, rewards ● Fast approximations to future utility – Deduct cost of rewards – Add retention, work rate, reliability – Target clusters of workers ● Selecting tasks/training: consider person's history. Apprenticeship/Peer Training: infer improvements in confusion matrices from effect of task on others
  • 68. Models for combining new data types & target functions ● Targets have multiple dimensions – Shapes in PlanetFour ● Poisson processes, event rates – Malaria rates
  • 69. Actively switch types of tasks to optimise learning from the crowd ● Select questions from decision tree ● Labelling, comparing, marking features, grouping... ● Utility varies: accuracy of responses, current model of features... ● Maximise information about t [Figure labels: 34.556; ...is like...]
  • 70. Learn how people make decisions by actively adapting tasks ● Improve automation, reduce work ● Select interaction mode or questions in the micro-task ● Maximise information given current model ● Crowd-supervised feature extraction, e.g. adapting PCA to learn more useful features from the crowd [Figure label: Projection]
  • 71. Summary: Bayesian models enable accurate and scalable crowdsourcing across domains ● Quantify uncertainty in data & model worker behaviour ● Actively learn from crowds using model of features ● Opportunities: optimisation and learning to automate with humans-in-the-loop [Diagram labels: Machine learning; Data; Crowd; Annotations; Results]
  • 72. 25/04/15: 7.8 Earthquake in Gorkha District of Nepal. ORCHID and Zooniverse collaborators worked with Rescue Global to identify and then refine their critical information requirements. • placement of life detectors and water filters within 50 mile radius of Kathmandu. Crowd labelled 1200 Planet Labs satellite images using Zooniverse software. • Recruited 25 image labellers from within Oxford University and Rescue Global staff (they worked hard over the bank holiday weekend). Folded in OpenStreetMap building density data and inferred population density map using ORCHID data processing algorithms. Delivered map overlay to Rescue Global for dissemination to their CaDRA partners (SARaid, Team Rubicon, CADENA). [Timeline labels: 29/04/15 to 2/05/15 02/05/15 to 20:13 GMT 05/05/15 00:15 GMT 06/05/15 05/05/15]
  • 73. Software on Github ● http://www.robots.ox.ac.uk/~edwin/ – Please use and report bugs ● PyIBCC: IBCC-VB and DynIBCC-VB in Python 2 – Collaborating with Zooniverse ● MatlabIBCC: IBCC-VB and DynIBCC-VB in Matlab. Acknowledgements ● Uni of Southampton: Nick Jennings, Alex Rogers, Sarvapali Ramchurn, Matteo Venanzi ● Oxford: Edwin Simpson, Steve Reece, Chris Lintott & Zooniverse team ● EPSRC (UK research council), the ORCHID project, Rescue Global, Microsoft, Zooniverse
  • 74. References
    [1] Dawid, A. P., & Skene, A. M. (1979). Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 20-28.
    [2] Kim, H. C., & Ghahramani, Z. (2012). Bayesian classifier combination. In International Conference on Artificial Intelligence and Statistics (pp. 619-627).
    [3] Simpson, E., Roberts, S., Psorakis, I., Smith, A., & Lintott, C. (2011). Bayesian Combination of Multiple, Imperfect Classifiers. Proceedings of NIPS 2011 workshop.
    [4] Simpson, E., Roberts, S., Psorakis, I., & Smith, A. (2013). Dynamic Bayesian combination of multiple imperfect classifiers. In Decision Making and Imperfection (pp. 1-35). Springer.
    [5] Psorakis, I., Roberts, S., Ebden, M., & Sheldon, B. (2011). Overlapping Community Detection using Bayesian Nonnegative Matrix Factorization. Physical Review E, 83.
    [6] Venanzi, M., Guiver, J., Kazai, G., Kohli, P., & Shokouhi, M. (2014). Community-based Bayesian aggregation models for crowdsourcing. In Proceedings of the 23rd International Conference on World Wide Web (pp. 155-164). International World Wide Web Conferences Steering Committee.
    [7] Simpson, E., & Roberts, S. (2015 – to appear). Bayesian Methods for Intelligent Task Assignment in Crowdsourcing Systems. Scalable Decision Making: Uncertainty, Imperfection, Deliberation; Studies in Computational Intelligence, Springer.
    [8] Morrow, N., Mock, N., Papendieck, A., & Kocmich, N. (2011). Independent Evaluation of the Ushahidi Haiti Project. Development Information Systems, 8:2011.
    [9] MacKay, D. J. C. (1992). Information-based objective functions for active data selection. Neural Computation, 4(4):590–604.
    [10] Chen, X., Bennett, P. N., Collins-Thompson, K., & Horvitz, E. (2013). Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM.
    [11] Simpson, E., Reece, S., Penta, A., Ramchurn, G., & Roberts, S. (2012). Using a Bayesian Model to Combine LDA Features with Crowdsourced Responses. In The Twenty-First Text REtrieval Conference (TREC 2012), Crowdsourcing Track, NIST.
    [12] Nitzan, S., & Paroush, J. (1982). Optimal decision rules in uncertain dichotomous choice situations. International Economic Review, 23(2):289–297.
    [13] Berend, D., & Kontorovich, A. (2014). Consistency of Weighted Majority Votes. NIPS.
    [14] Zhang, Y., Chen, X., Zhou, D., & Jordan, M. (2014). Spectral methods meet EM: a Provably Optimal Algorithm for Crowdsourcing.