This document discusses active learning. It begins with an overview and definition of active learning, contrasting it with passive learning. It then discusses applications of active learning in areas like speech recognition and medical diagnosis. Examples of active learning techniques like pool-based active learning are provided. The document examines whether active learning makes a difference compared to passive learning and discusses related human experiments. It explores active learning from multiple oracles, dealing with weak and strong oracles, and applications to crowdsourcing. The document concludes with discussing future directions and providing references.
2. Overview
● What is active learning?
● Does active learning make any difference?
● Active learning from multiple oracles
● Active learning with weak and strong oracle
● Multiple oracles with varying expertise
3. What is Active Learning?
● Introduced in education in the 1990s
● Let students participate actively
● Doing things rather than just listening
● Inspired machine learning
● Also known as Query Learning
5. Applications
● Requires fewer labeled examples
● Speech Recognition
○ Word level annotation can take ten times longer
than actual audio (Zhu, 2005)
● Medical Diagnosis
○ Expert doctors
● Document Classification
7. Active Learning Examples
(a) toy dataset of two Gaussian classes; (b) a logistic regression model trained on random samples reaches 70% accuracy; (c) logistic regression with active querying reaches 90% accuracy (Settles, 2009)
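The active querying behind this example can be sketched as pool-based uncertainty sampling. The 1-D Gaussian pool, the tiny seed set, and the hand-rolled logistic model below are illustrative stand-ins, not the exact setup from Settles (2009):

```python
import math
import random

def train_logistic_1d(data, epochs=200, lr=0.5):
    """Fit p(y=1|x) = sigmoid(w*x + b) by per-sample gradient ascent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

def uncertainty(w, b, x):
    """Distance of p(y=1|x) from 0.5; smaller means less certain."""
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return abs(p - 0.5)

random.seed(0)
# Unlabeled pool: two 1-D Gaussian classes, true boundary near x = 0.
pool = [(random.gauss(-2, 1), 0) for _ in range(100)] + \
       [(random.gauss(2, 1), 1) for _ in range(100)]

labeled = [(-2.5, 0), (2.5, 1)]       # tiny hypothetical seed set
unlabeled = list(pool)
for _ in range(10):                   # ten active queries
    w, b = train_logistic_1d(labeled)
    # Query the pool point the current model is least certain about.
    x_star = min(unlabeled, key=lambda xy: uncertainty(w, b, xy[0]))
    unlabeled.remove(x_star)
    labeled.append(x_star)            # the oracle reveals its label

w, b = train_logistic_1d(labeled)
acc = sum((1 if w * x + b > 0 else 0) == y for x, y in pool) / len(pool)
print(f"accuracy after 10 active queries: {acc:.2f}")
```

The queried points cluster near the decision boundary, which is where labels are most informative for this model.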
9. Does AL make any difference?
“Learners do benefit from the opportunity to actively select examples during learning. But it is very difficult to assess the magnitude of difference that active learning makes compared to passive learning.”
Laughlin (1973)
There were conflicting claims throughout the literature on the effectiveness of active learning.
10. Does AL make any difference?
“People make inappropriate queries to assess simple logical hypotheses such as ‘if p then q’ (frequently examining q instances to see if they are p, and failing to explore not-q instances).”
Wason et al. (1972)
“If the learning task is properly construed, humans actually do a great job of asking questions.”
Gigerenzer et al. (2002)
Oaksford et al. (2007)
11. Does AL make any difference?
Castro et al. (2008) addressed these questions:
[Q1] Do humans perform better when they can select their own examples for labeling,
compared to passive observation of labeled examples?
[Q2] If so, do they achieve the full benefit of active learning suggested by statistical
learning theory?
[Q3] If they do not, can machine learning be used to enhance human performance?
[Q4] Do the answers to these questions vary depending upon the difficulty of the
learning problem?
12. Task Formulation
● Binary classification on the interval [0, 1]
● Unknown decision boundary between class 0 and class 1
● n samples: Xi ∈ [0, 1], Yi ∈ {0, 1}
● Yi is correct with probability 1 − ε, where 0 ≤ ε < 1/2
[Source: Castro et al. (2008)]
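The noisy-boundary task above can be simulated directly. THETA, EPS, the sample size, and the grid-search passive learner below are illustrative choices, not Castro et al.'s exact protocol:

```python
import random

random.seed(1)
THETA, EPS = 0.35, 0.1   # illustrative boundary and noise rate

def noisy_label(x):
    """y = 1[x >= THETA], flipped with probability EPS (0 <= EPS < 1/2)."""
    y = 1 if x >= THETA else 0
    return 1 - y if random.random() < EPS else y

# Passive learner: n uniform samples, then empirical risk minimization
# over a grid of candidate decision boundaries.
n = 500
data = [(x, noisy_label(x)) for x in (random.random() for _ in range(n))]

def empirical_error(t):
    return sum((1 if x >= t else 0) != y for x, y in data) / n

grid = [i / 200 for i in range(201)]
theta_hat = min(grid, key=empirical_error)
print(f"estimated boundary: {theta_hat:.3f} (true: {THETA})")
```

An active learner on the same task would concentrate its queries near the current boundary estimate instead of sampling uniformly.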
15. Experiment
A few 3D visual stimuli and their X values used in our experiment.
Participants were asked to guess the decision boundary after every three iterations.
16. Experiment
● Random
○ No queries
● Human Active
○ Active queries
● Machine Yoked
○ Machine makes query
○ Human observes
18. Answers
[Q1] Do humans perform better when they can select their own examples for labeling, compared to passive observation of labeled examples? - Yes, at low noise levels
[Q2] If so, do they achieve the full benefit of active learning suggested by statistical learning theory? - No, error decays with slower constants
[Q3] If they do not, can machine learning be used to enhance human performance? - Inconclusive
[Q4] Do the answers to these questions vary depending upon the difficulty of the learning problem? - Yes, they vary with the noise level
19. Conclusion
● Simple learning task
● Machine Yoked Learning
● Impact on:
○ Fields of psychology and cognitive sciences
○ Intelligent tutoring systems
22. Multiple Oracle: Challenges
● How to select the most informative query?
● How to select the best oracle to ask questions?
● How to deal with disagreement among the
oracles?
● How to deal with a noisy or weak oracle?
23. Weak and strong labeler
● Zhang et al. (2015) considered exactly two oracles
● One standard oracle
○ Accurate but costly
● One weak oracle
○ Noisy but cheap
● Goal
○ Reduce number of queries to standard oracle
○ No impact on accuracy
24. Observations
● Difference Classifier to predict disagreement between
strong and weak labeler
○ Might not be statistically consistent
○ Can use cost-sensitive difference classifier
● Active learning queries a localized region of space
○ Train difference classifier on that localized region
26. Problem Formulation
● Unlabeled distribution U over input space X; label space Y
● Hypothesis class H; data distribution D
● Two labelers: a strong oracle O (accurate but costly) and a weak oracle W (noisy but cheap)
● Excess error of h: err(h) − min over h* ∈ H of err(h*)
● Goal: output a classifier with small excess error using as few queries to O as possible
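The excess-error objective can be made concrete with a toy hypothesis class. The threshold classifiers and the four data points below are hypothetical, chosen only to illustrate the definition:

```python
def error(h, data):
    """Empirical error of hypothesis h on (x, y) pairs."""
    return sum(h(x) != y for x, y in data) / len(data)

def excess_error(h, hypothesis_class, data):
    """err(h) minus the best error achievable within the class."""
    best = min(error(g, data) for g in hypothesis_class)
    return error(h, data) - best

# Toy hypothesis class: threshold classifiers 1[x >= t] on [0, 1].
H = [lambda x, t=t: 1 if x >= t else 0 for t in (0.2, 0.5, 0.8)]
data = [(0.1, 0), (0.3, 0), (0.6, 1), (0.9, 1)]
print(excess_error(H[0], H, data))  # → 0.25
```

Here the best classifier in the class (threshold 0.5) has zero error, so the excess error of threshold 0.2 is just its own error, 0.25.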
27. Algorithm
● Three key ideas
○ Difference classifier
○ Disagreement region DIS(V)
■ Region of the input space where two classifiers in V disagree
○ Epoch-based agnostic CAL
■ Train a fresh difference classifier in each epoch
[Source: Theory of Active Learning (Steve Hanneke, 2014)]
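The disagreement region DIS(V) is easy to compute for a toy version space. The sketch below uses 1-D threshold classifiers as a stand-in hypothesis class; the specific thresholds and grid are illustrative:

```python
def dis_region(version_space, xs):
    """DIS(V): points on which at least two classifiers in V disagree.
    Classifiers here are 1-D thresholds t mapping x to 1[x >= t]."""
    def labels(x):
        return {1 if x >= t else 0 for t in version_space}
    return [x for x in xs if len(labels(x)) > 1]

V = [0.3, 0.5]                      # two surviving threshold classifiers
xs = [i / 10 for i in range(11)]    # 0.0, 0.1, ..., 1.0
print(dis_region(V, xs))            # → [0.3, 0.4]
```

Outside DIS(V), every classifier still under consideration predicts the same label, so querying there teaches the learner nothing.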
28. Algorithm
● Initialize error ε0, total number of epochs k′, and draw n0 examples to form labeled dataset S0
● In each epoch k = 1, …, k′:
○ Set target error εk
○ Draw nk unlabeled samples
○ Identify disagreement region Ak
○ Train difference classifier hdf on Ak using O and W
○ Active learning using hdf
■ Draw mk examples, use hdf to query either O or W, and add the labeled data to Sk
● Return a classifier learned from the labeled dataset Sk′
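The query-routing step can be sketched as follows. The boundary, the weak oracle's noise model, and the hard-coded difference classifier are all hypothetical; in the actual algorithm hdf is trained each epoch on the disagreement region using labels from both oracles:

```python
import random

random.seed(2)
BOUNDARY = 0.4   # hypothetical true decision boundary

def strong_oracle(x):
    """O: accurate but costly."""
    return 1 if x >= BOUNDARY else 0

def weak_oracle(x):
    """W: cheap, but noisy near the boundary."""
    y = strong_oracle(x)
    flip = abs(x - BOUNDARY) < 0.1 and random.random() < 0.5
    return 1 - y if flip else y

def h_df(x):
    """Difference-classifier stand-in: predicts where W disagrees with O."""
    return abs(x - BOUNDARY) < 0.1

strong_queries = 0
labeled = []
for _ in range(200):
    x = random.random()
    if h_df(x):                 # route risky points to the strong oracle
        y = strong_oracle(x)
        strong_queries += 1
    else:                       # trust the weak oracle elsewhere
        y = weak_oracle(x)
    labeled.append((x, y))

print(f"strong-oracle queries: {strong_queries} of {len(labeled)}")
```

Only the points where W is predicted to disagree with O go to the costly oracle, so roughly 20% of the 200 queries hit O while every collected label stays correct.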
29. Performance Guarantee
● The label-complexity bound has two terms: the first for learning, the second for training the difference classifier
● The second term is of lower order when d ≈ d’
● Hence fitting the difference classifier does not incur a high overhead
31. AL from crowds
● Multiple experts in supervised learning (Raykar et al.,
2009 and Yan et al., 2010)
● NLP tasks from AMT data (Snow et al., 2008)
● Yan et al. (2011) proposed a novel active learning method
● Focus:
○ Most informative query
○ Most useful annotator
34. Algorithm
● Two key steps
○ Select a sample to label next
○ Select the best annotator to label
● Select sample
○ Uncertainty sampling
■ Select the sample the classifier is least certain about
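The two selection steps can be sketched together. The model confidences and the two annotators with region-dependent reliability are hypothetical, mimicking the expertise structure in Yan et al. (2011) rather than reproducing their probabilistic model:

```python
def most_uncertain(probs):
    """Uncertainty sampling: index whose p(y=1) is closest to 0.5."""
    return min(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))

def best_annotator(x, reliability):
    """Pick the annotator with the highest estimated accuracy at x.
    reliability maps annotator name -> function x -> estimated accuracy."""
    return max(reliability, key=lambda a: reliability[a](x))

# Hypothetical model confidences for four pool samples.
probs = [0.9, 0.52, 0.1, 0.7]
# Hypothetical annotators: one reliable on low x, one on high x.
reliability = {
    "ann_low":  lambda x: 0.9 if x < 0.5 else 0.6,
    "ann_high": lambda x: 0.6 if x < 0.5 else 0.9,
}
print(most_uncertain(probs), best_annotator(0.8, reliability))  # → 1 ann_high
```

Sample 1 (confidence 0.52) is nearest the decision boundary, and for a query at x = 0.8 the annotator whose expertise covers high x is preferred.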
38. Experiment
(left) Labels, (center) Areas of Labeler expertise and (right) annotator selection information for the
simplified two dimensional Galaxy Dim Data (Yan et al., 2011)
39. Experiment: Baselines
● active learning+majority vote
○ Active query based on majority vote of all annotators
● random sample+multi-labeler
○ Multi labeler algorithm on randomly sampled
examples
● random sample+majority vote
○ Random sampling with majority vote
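The majority-vote aggregation used by two of these baselines amounts to:

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate one sample's labels from several annotators.
    Ties go to the label seen first (Counter preserves insertion order)."""
    return Counter(labels).most_common(1)[0][0]

print(majority_vote([1, 0, 1, 1, 0]))  # → 1
```

Majority vote weights every annotator equally, which is exactly what the multi-labeler approach improves on by modeling per-annotator expertise.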
41. More Analyses
● Decision boundary intersects all regions of expertise
● Comparison with single oracle
AL
● Specialized vs General
expertise
[Source: Yan et al. (2011)]
42. Future Direction
● More Applications
○ Real world problems
● Optimal number of oracles
○ Do multiple oracles always perform better than a single oracle?
○ Is there an optimal number of oracles that works best?
● Cost function associated with labeling
○ Choose between single and multiple oracles
● General expertise
○ Each of the multiple oracles has general expertise
43. References
● Castro, Rui M. et al. (2008). “Human Active Learning”. In: NIPS.
● Gigerenzer, Gerd and Reinhard Selten (2002). Bounded rationality: The
adaptive toolbox. MIT press.
● Laughlin, Patrick R. (1973). “Focusing strategy in concept attainment as a
function of instructions and task”. In: Journal of Experimental Psychology.
● Oaksford, Mike and Nick Chater (2007). Bayesian rationality: The
probabilistic approach to human reasoning. Oxford University Press.
● Raykar, Vikas C. et al. (2009). “Supervised learning from multiple experts:
whom to trust when everyone lies a bit”. In: ICML.
● Settles, Burr (2009). Active Learning Literature Survey. Computer Sciences
Technical Report 1648. University of Wisconsin–Madison.
44. References
● Snow, Rion et al. (2008). “Cheap and Fast - But is it Good? Evaluating
Non-Expert Annotations for Natural Language Tasks”. In: EMNLP.
● Wason, Peter Cathcart and Philip N Johnson-Laird (1972). Psychology of
reasoning: Structure and content. Vol. 86. Harvard University Press.
● Yan, Yan et al. (2010). “Modeling annotator expertise: Learning when
everybody knows a bit of something”. In: AISTATS.
● Yan, Yan et al. (2011). “Active Learning from Crowds”. In: ICML.
● Zhang, Chicheng and Kamalika Chaudhuri (2015). “Active Learning from
Weak and Strong Labelers”. In: NIPS.
● Zhu, Xiaojin (2005). “Semi-supervised Learning with Graphs”. AAI3179046. PhD thesis. Pittsburgh, PA, USA.
● Hanneke, Steve (2014). “Theory of Active Learning”.