Machine Learning : 1BMVA Summer School 2014
Machine Learning for
Computer Vision
a whirlwind tour of key
concepts for the uninitiated
Toby Breckon
School of Engineering and Computing Sciences
Durham University
www.durham.ac.uk/toby.breckon/mltutorial/ toby.breckon@durham.ac.uk
Machine Learning : 2BMVA Summer School 2014
Machine Learning ?
 Why Machine Learning?
– we cannot program everything
– some tasks are difficult to define algorithmically
• especially in computer vision
…. visual sensing has few rules
 Well-defined learning problems ?
– easy to learn Vs. difficult to learn
• ..... varying complexity of visual patterns
 An example: learning to recognise objects ...
Image: DK
Machine Learning : 3BMVA Summer School 2014
Learning ? - in humans
Machine Learning : 4BMVA Summer School 2014
Learning ? - in computers
Machine Learning : 5BMVA Summer School 2014
Learning ...
 learning verb
–the activity of obtaining knowledge
–knowledge obtained by study
– English Dictionary, Cambridge University Press
Machine Learning : 6BMVA Summer School 2014
Machine Learning
 Definition:
– A set of methods for the automated analysis of
structure in data. …. two main strands of
work, (i) unsupervised learning ….
and (ii) supervised learning.
….similar to ... data mining, but ... focus .. more
on autonomous machine performance, ….
rather than enabling humans to learn from the data.
[Dictionary of Image Processing & Computer Vision, Fisher et al., 2014]
Machine Learning : 7BMVA Summer School 2014
Supervised Vs. Unsupervised
 Supervised
– knowledge of output - learning with the
presence of an “expert” / teacher
• data is labelled with a class or value
• Goal: predict class or value label
– e.g. Neural Network, Support Vector Machines,
Decision Trees, Bayesian Classifiers ....
 Unsupervised
– no knowledge of output class or value
• data is unlabelled or value un-known
• Goal: determine data patterns/groupings
– Self-guided learning algorithm
– (internal self-evaluation against some criteria)
– e.g. k-means, genetic algorithms, clustering
approaches ...
….
c1
c2
c3
…. ?
Machine Learning : 8BMVA Summer School 2014
… in the big picture
[Diagram: input imagery → Feature Detection + Representation (e.g. SIFT, HoG, …; histogram, Bag of Words, PCA, …) → Machine Learning = “Decision or Prediction” → class label {person, cat, dog, cow, car, rhino, … etc.}]
Machine Learning : 9BMVA Summer School 2014
ML Input “Examples” in Comp. Vis. ...
 “Direct Sample” inputs
– Pixels / Voxels / 3D Points
– sub-samples (key-points, feature-points)
… i.e. “samples” taken directly from the imagery
 Feature Vectors
– shape measures
– edge distributions
– colour distributions
– texture measures / distributions
[…. SIFT, SURF, HOG …. etc.]
… i.e. calculated summary “numerical” descriptors
V = { 0.103900, 120.102, 30.10101, .... }
Machine Learning : 10BMVA Summer School 2014
Common ML Tasks in Comp. Vis. ...
 Object Classification
what object ?
 Object Detection
object or no-object ?
 Instance Recognition ?
who (or what) is it ?
 Sub-category analysis
which object type ?
 Sequence { Recognition | Classification } ?
what is happening / occurring ?
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
{people | vehicle | … intruder ….}
{gender | type | species | age …...}
{face | vehicle plate| gait …. → biometrics}
Machine Learning : 11BMVA Summer School 2014
Types of ML Problem
 Classification
– Predict (classify) sample → discrete set of class labels
• e.g. classes {object 1, object 2 … } for recognition task
• e.g. classes {object, !object} for detection task
 Regression (traditionally less common in comp. vis.)
– Predict sample → associated numerical value (variable)
• e.g. distance to target based on shape features
– Linear and non-linear attribute to value relationships
 Association or clustering
– grouping a set of instances by attribute similarity
• e.g. image segmentation
[Ess et al, 2009]
…. ?
Machine Learning : 12BMVA Summer School 2014
Regression Example – Head Pose Estimation
 Input: image features (HOG)
 Output: { yaw | pitch }
 varying illumination + vibration
[Walger / Breckon, 2013]
Machine Learning : 13BMVA Summer School 2014
Learning in general, from specific examples
 N.B.
– E is specific : a specific example (e.g. female face)
– T is general : a general task (e.g. gender recognition)
Program P “learns” task T from a set of examples {E}.
Thus, if P improves with the examples, the learning must be of the following form:
– “to learn a general [ability | behaviour | function | rules] from specific examples”
…. ?
Machine Learning : 14BMVA Summer School 2014
A simple learning example ....
 Learn prediction of “Safe conditions to fly ?”
– based on the weather conditions = attributes
– classification problem, class = {yes, no}
Outlook     Temperature   Humidity   Windy   Fly
Sunny       85            85         False   No
Sunny       80            90         True    No
Overcast    83            86         False   Yes
Rainy       75            80         False   Yes
……………
(Attributes → Classification)
Machine Learning : 15BMVA Summer School 2014
Decision Trees
 Attribute based, decision
logic construction
– boolean outcome
– discrete set of outputs
Safe conditions to fly ?
Machine Learning : 16BMVA Summer School 2014
Fly
Decision Trees
 Set of Specific Examples ...
Safe conditions to fly ?
GENERALIZED RULE
LEARNING
(training data)
Machine Learning : 17BMVA Summer School 2014
Decision Tree Logic
(Outlook = Sunny AND Humidity = Normal)
OR (Outlook = Overcast)
OR (Outlook = Rain AND Wind = Weak)
Machine Learning : 18BMVA Summer School 2014
Growing Decision Trees
 Construction is carried out top down based on node
splits that maximise the reduction in the entropy in each
resulting sub-branch of the tree
[Quinlan, '86]
 Key Algorithmic Steps
1. Calculate the information gain of splitting on each attribute
(i.e. reduction in entropy (variance))
2. Select attribute with maximum information gain to be a new node
3. Split the training data based on this attribute
4. Repeat recursively (steps 1 → 3) for each sub-node until all examples at a node share the same class (or no attributes remain); a sketch of the information-gain calculation follows below
.. see extra slides on ... “Building Decision Trees”
.. see extra slides
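To make the node-split criterion concrete, here is a minimal sketch (not from the slides) of the entropy and information-gain calculation in plain Python, on a cut-down version of the “safe to fly?” data; the attribute names and values here are illustrative only.

```python
# Minimal sketch: entropy and information gain for decision tree node splits.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, labels, attribute):
    """Reduction in entropy from splitting the examples on one attribute."""
    parent = entropy(labels)
    remainder = 0.0
    for value in set(e[attribute] for e in examples):
        subset = [l for e, l in zip(examples, labels) if e[attribute] == value]
        remainder += (len(subset) / len(labels)) * entropy(subset)
    return parent - remainder

# toy "safe to fly?" examples (illustrative attributes only)
examples = [
    {"Outlook": "Sunny",    "Windy": False},
    {"Outlook": "Sunny",    "Windy": True},
    {"Outlook": "Overcast", "Windy": False},
    {"Outlook": "Rainy",    "Windy": False},
]
labels = ["No", "No", "Yes", "Yes"]

for attr in ("Outlook", "Windy"):
    print(attr, information_gain(examples, labels, attr))
```

On this toy data the gain for Outlook exceeds that for Windy, so Outlook would become the new node, which is exactly the “maximum information gain” selection in step 2 above.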
Machine Learning : 19BMVA Summer School 2014
Extension : Continuous Valued Attributes
 Create a discrete attribute to test continuous attributes
– chosen threshold that gives greatest information gain
[Example: a row of Temperature values with the corresponding Fly labels]
Machine Learning : 20BMVA Summer School 2014
Growing Decision Trees
(from data to “decision maker”)
Fly
(training data)
LEARNING
 “Safe conditions to fly ?”
Machine Learning : 21BMVA Summer School 2014
(Growing =) Learning from Data
 Training data: used to train the system
– i.e. build the rules / learnt target function
– specific examples (used to learn)
 Test data: used to test performance of the system
– unseen by the system during training
– also known as validation data
– specific examples (used to evaluate)
 Training/test data made up of instances
• also referred to as examples/samples
N.B.
– training data == training set
– test data == test/validation set
e.g. facial gender classification
….
….
Machine Learning : 22BMVA Summer School 2014
Is it really that simple ?
Credit: Bekios-Calfa et al., Revisiting Linear Discriminant Techniques in Gender Recognition
IEEE Trans. Pattern Analysis and Machine Intelligence http://dx.doi.org/10.1109/TPAMI.2010.208
Machine Learning : 23BMVA Summer School 2014
Must (Must, Must!)
avoid over-fitting …..
(i.e. over-learning)
Machine Learning : 24BMVA Summer School 2014
Occam's Razor
 Follow the principle of Occam's Razor
 Occam's Razor
– “entia non sunt multiplicanda praeter necessitatem” (Latin)
– “entities should not be multiplied beyond necessity” (English)
– “All things being equal, the simplest
solution tends to be the best one”
 Machine Learning : prefer the simplest
{model | hypothesis | …. tree | network }
that fits the data
14th-century English
logician William of
Ockham
Machine Learning : 25BMVA Summer School 2014
Problem of Overfitting
 Consider adding noisy training example #15:
– [ Sunny, Hot, Normal, Strong, Fly=Yes ] (WRONG LABEL)
 What training effect would it have on earlier tree?
Machine Learning : 26BMVA Summer School 2014
Problem of Overfitting
 Consider adding noisy training example #15:
– [ Sunny, Hot, Normal, Strong, Fly=Yes ]
 What effect on earlier decision tree?
– error in the example = error in tree construction! (the noisy example forces an extra, spurious split, e.g. on Wind)
Machine Learning : 27BMVA Summer School 2014
Overfitting in general
 Performance on the training data (with noise) improves
 Performance on the unseen test data decreases
– For decision trees: tree complexity increases, learns training
data too well! (over-fits)
Machine Learning : 28BMVA Summer School 2014
Overfitting in general
 Hypothesis is too specific towards training examples
 Hypothesis not general enough for test data
Increasing model complexity
Machine Learning : 29BMVA Summer School 2014
Graphical Example: function approximation (via regression)
[Figure: function f(), learning model (approximation of f()) and training samples (from the function), for a low degree-of-polynomial model. Source: PRML, Bishop, 2006]
Machine Learning : 30BMVA Summer School 2014
Increased Complexity
[Figure: the same function f(), learning model and training samples, now with a higher-complexity model. Source: PRML, Bishop, 2006]
Machine Learning : 31BMVA Summer School 2014
Increased Complexity: Good Approximation
[Figure: the learning model now approximates f() well from the training samples. Source: PRML, Bishop, 2006]
Machine Learning : 32BMVA Summer School 2014
Over-fitting! Poor approximation
[Figure: with still more model complexity the learning model fits the training samples exactly but approximates f() poorly. Source: PRML, Bishop, 2006]
Machine Learning : 33BMVA Summer School 2014
Avoiding Over-fitting
 Robust Testing & Evaluation
– strictly separate training and test sets
• train iteratively, test for over-fitting divergence
– advanced training / testing strategies (K-fold cross validation)
 For Decision Tree Case:
– control complexity of tree (e.g. depth)
• stop growing when data split not statistically significant
• grow full tree, then post-prune
– minimize { size(tree) + size(misclassifications(tree)) }
• i.e. simplest tree that does the job! (Occam again)
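As a concrete illustration of the first bullet, here is a minimal sketch, assuming scikit-learn (a library choice not made in the slides), of a strict train/test split plus K-fold cross-validation around a depth-limited decision tree; the data is synthetic.

```python
# Minimal sketch: strict train/test separation + K-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# strictly separate training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# control tree complexity (max_depth) to limit over-fitting
clf = DecisionTreeClassifier(max_depth=3, random_state=0)

# 5-fold cross-validation on the training set only
scores = cross_val_score(clf, X_train, y_train, cv=5)
print("cross-validation accuracy:", scores.mean())

# final, one-off evaluation on the unseen test set
clf.fit(X_train, y_train)
print("held-out test accuracy:", clf.score(X_test, y_test))
```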
Machine Learning : 34BMVA Summer School 2014
Fact 1: Decision Trees are Simple
Fact 2: Performance on Vision Problems is Poor
… unless we combine them in an Ensemble Classifier
Machine Learning : 35BMVA Summer School 2014
Extending to Multi-Tree Ensemble Classifiers
 Key Concept: combining multiple classifiers
– strong classifier: output strongly correlated to correct classification
– weak classifier: output weakly correlated to correct classification
» i.e. it makes a lot of mis-classifications (e.g. a tree with limited depth)
 How to combine:
– Bagging:
• train N classifiers on random sub-sets of training set; classify using majority vote of
all N (and for regression use average of N predictions)
– Boosting:
• As per bagging, but introduce weights for each classifier based on
performance over the training set
 Two examples: Boosted Trees + (Random) Decision Forests
– N.B. Can be used with any classifiers (not just decision trees!)
Machine Learning : 36BMVA Summer School 2014
Extending to Multi-Tree Classifiers
 To bag or to boost .....
....... that is the question.
Machine Learning : 37BMVA Summer School 2014
Extending to Multi-Tree Classifiers
 Bagging = all equal (simplest approach)
 Boosting = classifiers weighted by performance
– poor performers are given zero (or very low) weight
– the (t+1)-th classifier concentrates on the examples the t-th classifier got wrong
To bag or boost ? - boosting generally works very well
(but what about over-fitting ?)
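A minimal sketch of the two ensemble strategies, assuming scikit-learn; a shallow decision tree plays the role of the weak classifier, and the data is synthetic.

```python
# Minimal sketch: bagging vs. boosting of shallow ("weak") decision trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
weak = DecisionTreeClassifier(max_depth=2)   # weak classifier: limited depth

# Bagging: N trees on random subsets of the training set, majority vote
bagged = BaggingClassifier(weak, n_estimators=50, random_state=0)

# Boosting: later trees concentrate on examples earlier trees got wrong,
# and each tree is weighted by its performance on the training set
boosted = AdaBoostClassifier(weak, n_estimators=50, random_state=0)

for name, clf in [("bagging", bagged), ("boosting", boosted)]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```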
Machine Learning : 38BMVA Summer School 2014
Decision Forests (a.k.a. Random Forests/Trees)
 Bagging using multiple decision trees, where each tree in the ensemble classifier ...
– is trained on a random subset of the training data
– computes a node split on a random subset of the attributes
– close to “state of the art” for object segmentation / classification (inputs: feature vector descriptors)
[Breiman 2001] [Bosch 2007] [Schroff 2008]
Machine Learning : 39BMVA Summer School 2014
Decision Forests (a.k.a. Random Forests/Trees)
 Decision Forest = Multi Decision Tree Ensemble
Classifier
– bagging approach used to return classification
– [alternatively weighted by number of training items assigned to the final leaf
node reached in tree that have the same class as the sample
(classification) or statistical value (regression)]
 Benefits: efficient on large data sets with multi attributes and/or
missing data, inherent variable importance calc., unbiased test error
(“out of bag”), “does not overfit”
 Drawbacks: evaluation can be slow, lots of data for good
performance, complexity of storage ...
[“Random Forests”, Breiman 2001]
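For illustration, a minimal Random Forest sketch assuming scikit-learn, showing the random attribute subset per split, the bootstrap data subset per tree, and the “out of bag” error and variable importance mentioned above; the data is synthetic.

```python
# Minimal sketch: Random Forest with out-of-bag error and variable importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    max_features="sqrt",   # random subset of attributes considered per node split
    bootstrap=True,        # random subset of the training data per tree
    oob_score=True,        # unbiased "out of bag" test error estimate
    random_state=0)
forest.fit(X, y)

print("out-of-bag accuracy:", forest.oob_score_)
print("variable importances:", forest.feature_importances_)
```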
Machine Learning : 40BMVA Summer School 2014
Decision Forests (a.k.a. Random Forests/Trees)
Gall J. and Lempitsky V., Class-Specific Hough Forests for Object Detection,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09),
2009.
Montillo et al.. "Entangled decision forests and their application
for semantic segmentation of CT images." In Information
Processing in Medical Imaging, pp. 184-196. 2011.
http://research.microsoft.com/en-us/projects/decisionforests/
Machine Learning : 41BMVA Summer School 2014
Microsoft Kinect ….
 Body Pose Estimation in Real-
time From Depth Images
– uses Decision Forest Approach
Shotton et al., Real-Time Human Pose Recognition in Parts from a Single Depth Image, CVPR, 2011 -
http://research.microsoft.com/apps/pubs/default.aspx?id=145347
Machine Learning : 42BMVA Summer School 2014
What if every weak classifier was just the
presence/absence of an image feature ?
( i.e. feature present = {yes, no} )
As the number of features present from a given
object, in a given scene location, goes up the
probability of the object not being present goes
down!
This is the concept of feature cascades.
Machine Learning : 43BMVA Summer School 2014
Feature Cascading .....
 Use boosting to order image features from most to least discriminative for a given object ....
– allow a high false positive rate per feature (i.e. it's a weak classifier!)
 As features F1 to FN of an object are found to be present → the probability of the object's non-occurrence at that image location tends to zero
 e.g. Extended Haar features
– set of differences between image regions
– rapid evaluation (and non-occurrence) rejection
[Viola / Jones 2004]
[Diagram: feature cascade; each stage F1, F2, …, FN either PASSes a candidate region on to the next feature test or FAILs (rejects) it, and only regions passing all N features are declared OBJECT]
Machine Learning : 44BMVA Summer School 2014
Haar Feature Cascades
 Real-time Generalised Object
Recognition
 Benefits
– Multi-scale evaluation
• scale invariant
– Fast, real-time detection
– “Direct” on image
• no feature extraction
– Haar features
• contrast/ colour invariant
 Limitations
– poor performance on non-rigid
objects
– object rotation
[Breckon / Eichner / Barnes / Han / Gaszczak 08-09]
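A minimal sketch of running such a cascade with OpenCV (an assumption; the slides do not prescribe a library): the frontal-face cascade is the stock XML file shipped with OpenCV, and "scene.jpg" is a hypothetical input image.

```python
# Minimal sketch: multi-scale detection with a pre-trained Haar feature cascade.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("scene.jpg")                 # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# multi-scale evaluation: the cascade is slid over the image at several scales
detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in detections:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", image)
```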
Machine Learning : 45BMVA Summer School 2014
[Breckon / Eichner / Barnes / Han / Gaszczak 08-09] http://www.durham.ac.uk/toby.breckon/demos/modgrandchallenge/
Machine Learning : 46BMVA Summer School 2014
The big ones - “neural inspired approaches”
Machine Learning : 47BMVA Summer School 2014
 Real Neural Networks
– human brain as a collection of
biological neurons and synapses
(10^11 neurons, 10^4 synapse connections per neuron)
– powerful, adaptive and noise
resilient pattern recognition
– combination of:
• Memory
– Memorisation
• Generalisation
– Learning “rules”
– Learning “patterns”
Biological Motivation
Images: DK Publishing
Machine Learning : 48BMVA Summer School 2014
 Neural Networks (biological
and computational) are
good at noise resilient
pattern recognition
– the human brain can cope
with extreme noise in
pattern recognition
Images: Wikimedia Commons
Machine Learning : 49BMVA Summer School 2014
Artificial Neurons (= a perceptron)
 An n-dimensional input vector x is mapped to output
variable o by means of the scalar product and a
nonlinear function mapping, f
[Han / Kamber 2006]
[Diagram: perceptron, with input vector x = (x0, x1, …, xn), weight vector w = (w0, w1, …, wn), a weighted sum ∑ with bias μk, an activation function f, and output o]
For example, let f() be the sign function:
o = sign( ∑i=0..n wi xi + μk )
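A minimal NumPy sketch of that single neuron: a weighted sum of the input vector plus the bias, passed through the sign activation; the numbers here are arbitrary.

```python
# Minimal sketch: a single artificial neuron, o = sign( sum_i w_i * x_i + bias ).
import numpy as np

def perceptron(x, w, bias):
    """Weighted sum of the input vector followed by a sign activation."""
    return np.sign(np.dot(w, x) + bias)

x = np.array([0.5, -1.2, 3.0])   # input (attribute) vector
w = np.array([0.4,  0.1, 0.7])   # weight vector
bias = -0.5

print(perceptron(x, w, bias))    # -> +1.0 or -1.0
```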
Machine Learning : 50BMVA Summer School 2014
Artificial Neural Networks (ANN)
 Multiple Layers of
Perceptrons
• N.B. Neural Networks a.k.a. Multi-Layer Perceptrons (MLP)
– N Layers of M
Perceptrons
• Each fully connected
(in graph sense to the
next)
– Every node of layer N
takes outputs of all M
nodes of layer N-1
[Han / Kamber 2006]
[Diagram: input layer → hidden layer(s) → output layer → output vector]
Machine Learning : 51BMVA Summer School 2014
In every node we have ....
 Input to network = (numerical) attribute vector
describing classification examples
 Output of network = vector representing classification
• e.g. {1,0}, {0,1}, {1,1}, {0,0} for classes A, B, C, D
• or alt. {1,0,0}, {0,1,0}, {0,0,1} for classes A, B, C
[Diagram: each node forms a weighted sum ∑ of the outputs of layer N-1, adds its bias μk, and passes the result on to layer N+1]
Machine Learning : 52BMVA Summer School 2014
Essentially, input to output is
mapped as a weighted sum
occurring at multiple (fully
connected) layers
in the network ....
… so the weights are key.
Machine Learning : 53BMVA Summer School 2014
If everything else is constant
...
(i.e. activation function, network topology)
… the weights are only thing
that changes
Machine Learning : 54BMVA Summer School 2014
Thus …
setting the weights =
training the network
Machine Learning : 55BMVA Summer School 2014
Backpropagation Summary
 Modifications are made in the “backwards” direction:
from the output layer, through each hidden layer down to
the first hidden layer, hence “Backpropagation”
 Key Algorithmic Steps
– Initialize weights (to small random values) in the network
– Propagate the inputs forward (by applying activation function) at
each node
– Backpropagate the error backwards (by updating weights and
biases)
– Terminating condition (when error is very small or enough
iterations)
Backpropagation details are beyond scope/time (see Mitchell '97).
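For readers who want to see the mechanics, here is a minimal sketch (not the slides' code) of forward propagation and error backpropagation for a tiny one-hidden-layer sigmoid network, trained on the toy XOR problem with plain NumPy.

```python
# Minimal sketch: forward pass + backpropagation for a 2-4-1 sigmoid network (XOR).
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)   # small random init
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    # propagate the inputs forward, applying the activation at each node
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backpropagate the error: output layer first, then the hidden layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # update weights and biases (learning rate 0.5)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())   # typically approaches [0, 1, 1, 0]
```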
Machine Learning : 56BMVA Summer School 2014
Example: speed sign recognition
 Input:
– Extracted binary text image
– Scaled to 20x20 pixels
 Network:
– 30 hidden nodes
– 2 layers
– backpropagation
 Output:
– 12 classes
– {10, 20, 30, 40, 50, 60, 70, 80, 90, 100, national speed limit, non-sign}
 Results: ~97% (success)
[Eichner / Breckon '08]
Machine Learning : 57BMVA Summer School 2014
http://www.durham.ac.uk/toby.breckon/demos/speedsigns/
Machine Learning : 58BMVA Summer School 2014
Problems suited to ANNs
 Input is high-dimensional discrete or real-valued
– e.g. raw sensor input – signal samples or image pixels
 Output
– discrete or real valued
– is a vector of one or more values
 Possibly noisy data
 Form of target function is generally unknown
– i.e. don't know input to output relationship
 Human readability of result is unimportant
– rules such as IF..THEN .. ELSE not required
Machine Learning : 59BMVA Summer School 2014
Problems with ANNs
 Termination of backpropagation
– Too many iterations can lead to overfitting (to training data)
– Too few iterations can fail to reduce output error sufficiently
 Needs parameter selection
– Learning rate (weight up-dates)
– Network Topology (number of hidden nodes / number of layers)
– Choice of activation function
– ....
 What is the network learning?
– How can we be sure the correct (classification) function is being
learned ?
c.f. AI folk-lore “the tanks story”
Machine Learning : 60BMVA Summer School 2014
Problems with ANNs
 May find a local minimum within the search space of all possible weights (due to the nature of backprop. gradient descent)
– i.e. backpropagation is not guaranteed to find the best weights
to learn the classification/regression function
– Thus “learned” neural
network may not find
optimal solution to the
classification/regression
problem
[Figure: error surface over the weight space {Wij}, showing local minima. Images: Wikimedia Commons]
Machine Learning : 61BMVA Summer School 2014
… towards the future state of the art
 Deep Learning (Deep Neural Networks)
– multi-layer neural networks, varying layer sizes
• varying levels of abstraction / intermediate feature representations
– trained one layer at a time, followed by backprop.
– complex and computationally demanding to train
 Convolutional Neural Networks
– leverage local spatial layout of features in input
– locally adjacent neurons connected layer to layer
– instead of full layer to layer connectivity
– units in m-th layer connected to local subset of units in
(m-1)-th layer (which are spatially adjacent)
 Often combined together : “state of the art” results in Large Scale Visual Recognition Challenge
– Image-Net Challenge - http://image-net.org/challenges/LSVRC/2013/ (1000 classes of object)
Image: http://theanalyticsstore.com/deep-learning/
Machine Learning : 62BMVA Summer School 2014
Deep Learning Neural Networks
Results are impressive.
But the same problems remain.
(overfitting et al.)
Machine Learning : 63BMVA Summer School 2014
The big ones - “kernel driven approaches”
Machine Learning : 64BMVA Summer School 2014
Support Vector Machines
 Basic Approach:
– project instances into high dimensional space
– learn linear separators (in high dim. space) with maximum
margin
– learning as optimizing bound on expected error
 Positives
– good performance on character recognition, text classification, ...
– “appears” to avoid overfitting in high dimensional space
– global optimisation, thus avoids local minima
 Negatives
– applying trained classifier can be expensive (i.e. query time)
Machine Learning : 65BMVA Summer School 2014
N.B. “Machines” is just a sexy name (probably to make
them sound different), they are really just computer
algorithms like everything else in machine learning!
… so don't get confused by the whole “machines” thing
:o)
Machine Learning : 66BMVA Summer School 2014
Simple Example
 How can we separate (i.e. classify) these data examples ?
(i.e. learn +ve / -ve)
e.g. gender recognition ....
{male, female} = {+1, -1}
Machine Learning : 67BMVA Summer School 2014
Simple Example
 Linear separation
e.g. gender recognition ....
{male, female} = {+1, -1}
Machine Learning : 68BMVA Summer School 2014
Simple Example
 Linear separators – which one ?
e.g. gender recognition ....
{male, female} = {+1, -1}
Machine Learning : 69BMVA Summer School 2014
Simple Example
 Linear separators – which one ?
e.g. gender recognition ....
{male, female} = {+1, -1}
Machine Learning : 70BMVA Summer School 2014
Linear Separator
 Instances (i.e. examples) {xi, yi}
– xi = point in instance space (R^n) made up of n attributes
– yi = class value for classification of xi
 Want a linear separator. Can view this as a constraint satisfaction problem over the two classes y = +1 and y = -1.
 Equivalently, classification of an example is a function f(x) = y = {+1, -1}, i.e. 2 classes
N.B. we have a vector of weight coefficients w⃗
Machine Learning : 71BMVA Summer School 2014
Linear Separator
 Now find the hyperplane (separator) with maximum margin
– thus if size of margin is defined as (see extras)
 So view our problem as a constrained optimization problem:
– Find “hyperplane” using computational optimization approach (beyond scope)
“hyperplane” = separator boundary; a hyperplane in R^2 == a 2D line
Machine Learning : 72BMVA Summer School 2014
What about Non-separable Training Sets ?
– Find “hyperplane” via computational optimization
Penalty term > 0
for each “wrong side of
the boundary” case
Machine Learning : 73BMVA Summer School 2014
Linear Separator
 separator margin determined by just a
few examples
– call these support vectors
– can define separator in terms of support
vectors and classifier examples x as:
Machine Learning : 74BMVA Summer School 2014
 Support vectors = sub-set of training instances that define
decision boundary between classes
This is the simplest kind of Linear SVM (LSVM)
Machine Learning : 75BMVA Summer School 2014
How do we classify a new example ?
 New unseen (test) example attribute vector x
 Set of support vectors {si} with weights {wi}, bias b
– these define the “hyperplane” boundary in the original dimension (this is the linear case)
 The output of the classification is:
f(x) = sign( ∑i wi (si · x) + b )
– i.e. f(x) = {-1, +1}
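A minimal sketch, assuming scikit-learn, of training a linear SVM and classifying new examples: the support vectors {si} are exposed by the fitted model, and prediction returns the sign of the decision function. The toy data is made up.

```python
# Minimal sketch: linear SVM, its support vectors, and classification of new examples.
import numpy as np
from sklearn.svm import SVC

# toy 2-class data (two attribute measurements per example)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],    # class +1
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])   # class -1
y = np.array([+1, +1, +1, -1, -1, -1])

svm = SVC(kernel="linear")
svm.fit(X, y)

print("support vectors:\n", svm.support_vectors_)      # the {s_i}
print("decision values:", svm.decision_function([[2.0, 2.0], [5.0, 5.5]]))
print("f(x):", svm.predict([[2.0, 2.0], [5.0, 5.5]]))  # {-1, +1}
```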
Machine Learning : 76BMVA Summer School 2014
What about this ..... ?
Not linearly
Separable !
Machine Learning : 77BMVA Summer School 2014
e.g. 2D to 3D
[Figure: 2D points denoting the +1 and -1 classes]
 Projection from 2D to 3D allows separation by a hyperplane (surface) in R^3
N.B. A hyperplane in R^3 == a 3D plane
Machine Learning : 78BMVA Summer School 2014
Machine Learning : 79BMVA Summer School 2014
Non Linear SVMs
 Just as before but now with the kernel ....
– For the (earlier) linear case K is the identity function (or matrix)
• This is often referred to as the “linear kernel”
Machine Learning : 80BMVA Summer School 2014
Non Linear SVMs
 Suppose we have instance space X = R^n and need a non-linear separator.
– project X into some higher dimensional space X' = R^m where the data will be linearly separable
– let Φ : X → X' be this projection.
 Interestingly,
– Training depends only on dot products of the form Φ(xi) · Φ(xj)
• i.e. dot products of (projected) instances
– So we can train in R^m with the same computational complexity as in R^n, provided we can find a kernel function K such that:
• K(xi, xj) = Φ(xi) · Φ(xj) (the “kernel trick”)
– Classifying a new instance x then requires calculating the sign of the (kernelised) decision function.
Machine Learning : 81BMVA Summer School 2014
This is how SVMs solve difficult problems without
linear separation boundaries (hyperplanes) occurring in
their original dimension.
“project data to higher dimension where data is
separable”
Machine Learning : 82BMVA Summer School 2014
Why use maximum margin?
 Intuitively this feels safest.
 a small error in the location of the boundary gives us
least chance of causing a misclassification.
 Model is immune to removal of any non support-vector
data-points
 Some related theory (using V-C dimension) (beyond scope)
 Distance from boundary ~= measure of “good-ness”
 Empirically it works very well
Machine Learning : 83BMVA Summer School 2014
Choosing Kernels ?
 Kernel functions (commonly used):
– Polynomial function of degree p
– Gaussian radial basis function (size σ)
– 2D sigmoid function (as per neural networks)
Commonly chosen by grid search of parameter space
“glorified trial and error”
Machine Learning : 84BMVA Summer School 2014
Application to Image Classification
 Common Model - Bag of Visual Words
1. build histograms of feature occurrence over training data (features:
SIFT, SURF, MSER ....)
2. Use histograms as input to SVM (or other ML approach)
 Cluster features in R^n space
 Cluster “membership” creates a histogram of feature occurrence
[Diagram: per-image histograms of visual-word occurrence → SVM → class label, e.g. Bike, Violin]
Caltech objects database 101 object
classes
Features:
SIFT detector / PCA-SIFT descriptor, d=10
30 training images / class
43% recognition rate
(1% chance performance)
Example: Kristen Grauman
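A minimal sketch of the bag-of-visual-words pipeline, assuming scikit-learn and NumPy; real local descriptors (SIFT / SURF / MSER) are assumed to be pre-extracted, so random arrays stand in for them here.

```python
# Minimal sketch: cluster descriptors into a vocabulary, build per-image
# histograms of visual-word occurrence, then train an SVM on the histograms.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def bow_histogram(descriptors, kmeans):
    """Histogram of visual-word occurrence for one image's descriptors."""
    words = kmeans.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(kmeans.n_clusters + 1))
    return hist / hist.sum()

# stand-ins for per-image descriptor arrays (n_i x d) and their class labels
rng = np.random.default_rng(0)
train_descriptors = [rng.normal(size=(50, 16)) + c for c in (0, 0, 3, 3)]
train_labels = np.array([0, 0, 1, 1])

vocabulary = KMeans(n_clusters=8, n_init=10, random_state=0)
vocabulary.fit(np.vstack(train_descriptors))        # build the visual vocabulary

X = np.array([bow_histogram(d, vocabulary) for d in train_descriptors])
classifier = SVC(kernel="rbf").fit(X, train_labels)

print(classifier.predict([bow_histogram(rng.normal(size=(50, 16)) + 3, vocabulary)]))
```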
Machine Learning : 85BMVA Summer School 2014
 Bag of Words Model
– SURF features
– SVM {people | vehicle} detection
– Decision Forest - sub-categories  www.durham.ac.uk/toby.breckon/demos/multimodal/
[Breckon / Han / Richardson, 2012]
Machine Learning : 86BMVA Summer School 2014
Application to Image Classification
 Searching for Cell Nuclei
Locations with SVM
[Han / Breckon et al. 2010]
– input: “laplace” enhanced pixel
values as vector
• scaled to common size
– process:
• exhaustively extract each image
neighbourhood over multiple
scales
• pass pixels to SVM
– Is it a cell nuclei ?
– output: {cell, nocell}
Grid parameter search for RBF kernel
Machine Learning : 87BMVA Summer School 2014
Application: automatic cell counting / cell architecture (position) evaluation
Machine Learning : 88BMVA Summer School 2014
http://www.durham.ac.uk/toby.breckon/demos/cell
Machine Learning : 89BMVA Summer School 2014
A probabilistic interpretation ….
Machine Learning : 90BMVA Summer School 2014
Learning Probability Distributions
 Bayes Formula
P(ωj | x) = p(x | ωj) P(ωj) / p(x),  where the evidence p(x) = ∑j p(x | ωj) P(ωj)
in words: posterior = (likelihood × prior) / evidence
 Assign an observed feature vector x to the maximally probable class ωk, for j = {1 → k classes}
Thomas Bayes (1701-1761)
 Captures:
– prior probability of occurrence
• e.g. from training examples
– probability of each class given the
evidence (feature vector)
– Return most probable (Maximum
A Posteriori – MAP) class
– Optimal (costly) Vs. Naive
(simple)
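A minimal sketch of the MAP rule above, assuming NumPy and SciPy: Gaussian class-conditional likelihoods p(x|ωj) and priors P(ωj) are estimated from (synthetic) training feature vectors, and a new vector is assigned to the class with the largest posterior.

```python
# Minimal sketch: Maximum A Posteriori (MAP) classification with Gaussian likelihoods.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
classes = {
    "cell":   rng.normal(loc=2.0, size=(100, 3)),   # training feature vectors
    "nocell": rng.normal(loc=0.0, size=(100, 3)),
}

# estimate prior P(w_j) and likelihood p(x|w_j) for each class
n_total = sum(len(v) for v in classes.values())
models = {
    name: (len(v) / n_total,                                   # prior
           multivariate_normal(v.mean(axis=0), np.cov(v.T)))   # likelihood
    for name, v in classes.items()
}

def map_classify(x):
    # posterior is proportional to likelihood * prior; the evidence p(x) is common
    posteriors = {name: lik.pdf(x) * prior for name, (prior, lik) in models.items()}
    return max(posteriors, key=posteriors.get)

print(map_classify(np.array([1.8, 2.1, 2.2])))   # -> "cell"
```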
Machine Learning : 91BMVA Summer School 2014
“Approx” State of the Art
 Recent approaches – SVM (+ variants), Decision Forests
(+ variants), Boosted Approaches, Bagging ....
– outperform Neural Networks
– some renaissance in Deep Learning
– can find a globally optimal solution (SVM)
– less prone to over-fitting
– allow for extraction of meaning
(e.g. if-then-else rules for tree based approaches)
 Several other ML approaches
– clustering – k-NN, k-Means in multi-dimensional space
– graphical models
– Bayesian methods
– Gaussian Processes (largely regression problems – sweeping generalization!)
Machine Learning : 92BMVA Summer School 2014
But how do we evaluate how
well it is working* ?
* and produce convincing results for our papers and funders
Machine Learning : 93BMVA Summer School 2014
Evaluating Machine Learning
 For classification problems ....
True Positives (TP)
– example correctly classified as an +ve instance of given class A
False Positives (FP)
– example wrongly classified as an +ve instance of given class A
• i.e. it is not an instance of class A
True Negatives (TN)
– example correctly classified as an -ve instance of given class A
False Negatives (FN)
– example wrongly classified as an -ve instance of given class A
• i.e. classified as not class A but is a true class A
Machine Learning : 94BMVA Summer School 2014
Evaluating Machine Learning
 Confusion Matrices
– Table of TP, FP, TN, FN
– e.g. 2 class labels {yes, no}
– can also show TP, FP, TN, FN weighted by cost of mis-classification Vs. true-
classification
N.B. A common name for “actual class” (i.e. the true class) is “ground truth” or “the ground truth data”.
                        Predicted class
                        Yes               No
Actual class    Yes     True positive     False negative
                No      False positive    True negative
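A minimal sketch, assuming scikit-learn, of computing the confusion matrix (and the TP / FP / TN / FN counts) from ground-truth versus predicted class labels; the label lists here are made up.

```python
# Minimal sketch: confusion matrix from ground truth vs. predicted labels.
from sklearn.metrics import confusion_matrix

actual    = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]   # ground truth
predicted = ["yes", "no",  "no", "yes", "yes", "no", "yes", "no"]  # classifier output

cm = confusion_matrix(actual, predicted, labels=["yes", "no"])
tp, fn, fp, tn = cm.ravel()

print(cm)                 # rows = actual class, columns = predicted class
print("TP, FP, TN, FN =", tp, fp, tn, fn)
```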
Machine Learning : 95BMVA Summer School 2014
Evaluating Machine Learning
 Receiver Operating Characteristic (ROC) Curve
– used to show trade-off between hit rate and false alarm rate
over noisy channel (in communications originally)
• here a “noisy” (i.e. error prone) classifier
– % TP Vs. %FP
– “jagged steps” = actual data
– - - - - = averaged result
(over multiple cross-validation
folds) or best line fit
Machine Learning : 96BMVA Summer School 2014
[Figure: ROC curves; curves towards the top-left are BETTER, curves towards the diagonal are WORSE]
Machine Learning : 97BMVA Summer School 2014
Evaluating Machine Learning
 Receiver Operating Characteristic (ROC) Curve
– Used to compare different classifiers on common dataset
– Used to tune a given classifier on a dataset
• by varying a given threshold or parameter of the learner that affects the TP to FP ratio
[Source: Bradski '09]
See also: precision/recall curves
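A minimal sketch, assuming scikit-learn and matplotlib, of producing a ROC curve by sweeping the decision threshold over an SVM's output scores on held-out synthetic data.

```python
# Minimal sketch: ROC curve from a classifier's scores on a held-out test set.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import matplotlib.pyplot as plt

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = SVC(kernel="rbf").fit(X_train, y_train).decision_function(X_test)

# vary the threshold on the scores to trade off %TP against %FP
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("area under ROC curve:", auc(fpr, tpr))

plt.plot(fpr, tpr)
plt.xlabel("false positive rate")
plt.ylabel("true positive rate (hit rate)")
plt.savefig("roc.png")
```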
Machine Learning : 98BMVA Summer School 2014
Evaluating Machine Learning
 The key is:
Robust experimentation on independent training and
testing sets
– Perhaps using cross-validation or similar
.. see extra slides on ...
“Data Training Methodologies”
Machine Learning : 99BMVA Summer School 2014
Many, many ways to perform machine learning ....
… we have seen (only) some of them.
(very briefly!)
Which one is “the best” ?
[Diagram: unknown input → ML “classifier” → ?]
Machine Learning : 100BMVA Summer School 2014
No Free Lunch! (Theorem)
 .... the idea that it is impossible to get something for
nothing
 This is very true in Machine Learning
– approaches that train quickly or require little memory or few
training examples produce poor results
• and vice versa .... !!!!!
– poor data = poor learning
• problems with data = problems with learning
• problems = {not enough data, poorly labelled, biased,
unrepresentative … }
Machine Learning : 101BMVA Summer School 2014
What we have seen ...
 The power of combining simple things ….
– Ensemble Classifiers
• Decision Forests / Random Forests
– concept extends to all ML approaches
 Neural Inspired Approaches
– Neural Networks
– many, many variants
 Kernel Inspired Approaches
– Support Vector Machines
– beginning of the story
Machine Learning : 102BMVA Summer School 2014
Further Reading - textbooks
 Machine Learning - Tom
Mitchell (McGraw-Hill, 1997)
 Pattern Recognition &
Machine Learning -
Christopher Bishop
(Springer, 2006)
 Data Mining - Witten / Frank
(Morgan-Kaufmann, 2005)
Machine Learning : 103BMVA Summer School 2014
Further Reading - textbooks
 Bayesian Reasoning and
Machine Learning
– David Barber
http://www.cs.ucl.ac.uk/staff/d.barber/brml/
(Cambs. Univ. Press, 2012)
 Computer Vision: Models,
Learning, and Inference
– Simon Prince
(Springer, 2012)
http://www.computervisionmodels.com/
… both very probability driven, both available
as free PDF online (woo, hoo!)
Machine Learning : 104BMVA Summer School 2014
Publications – a selection
 Decision Forests: A Unified Framework for Classification, Regression, Density
Estimation, Manifold Learning and Semi-Supervised Learning - A. Criminisi et al., in
Foundations and Trends in Computer Graphics and Vision, 2012.
 OverFeat: Integrated Recognition, Localization and Detection using Convolutional
Networks, P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun:
International Conference on Learning Representations 2014.
 Image Classification using Random Forests and Ferns
A Bosch, A Zisserman, X Munoz – Int. Conf. Comp. Vis., 2007.
 Robust Real-time Object Detection
P Viola, M Jones - International Journal of Computer Vision, 2002.
 The PASCAL Visual Object Classes (VOC) challenge
M Everingham, L Van Gool, C Williams, J Winn, A Zisserman - International Journal of
Computer Vision, 2010.
Machine Learning : 105BMVA Summer School 2014
Computer Vision – general overview
 Computer Vision: Algorithms
and Applications
– Richard Szeliski, 2010
(Springer)
 PDF download:
– http://szeliski.org/Book/
Supporting specific content from this
machine learning lecture:
• see Chapter 14
Machine Learning : 106BMVA Summer School 2014
Final shameless plug ….
 Dictionary of Computer Vision
and Image Processing
R.B. Fisher, T.P. Breckon, K.
Dawson-Howe, A. Fitzgibbon, C.
Robertson, E. Trucco, C.K.I.
Williams, Wiley, 2014.
– … maybe it will be useful!
Machine Learning : 107BMVA Summer School 2014
That's all folks ...
Slides, examples, demo code, links + extra supporting slides @ www.durham.ac.uk/toby.breckon/mltutorial/

More Related Content

Viewers also liked

CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
zukun
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
butest
 
Machine Learning CSCI 5622
Machine Learning CSCI 5622Machine Learning CSCI 5622
Machine Learning CSCI 5622
butest
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...
butest
 
A Novel Approach for Breast Cancer Detection using Data Mining Techniques
A Novel Approach for Breast Cancer Detection using Data Mining TechniquesA Novel Approach for Breast Cancer Detection using Data Mining Techniques
A Novel Approach for Breast Cancer Detection using Data Mining Techniques
ahmad abdelhafeez
 

Viewers also liked (19)

Stay well with machine learning
Stay well with machine learningStay well with machine learning
Stay well with machine learning
 
"Leveraging Computer Vision and Machine Learning to Power the Visual Commerce...
"Leveraging Computer Vision and Machine Learning to Power the Visual Commerce..."Leveraging Computer Vision and Machine Learning to Power the Visual Commerce...
"Leveraging Computer Vision and Machine Learning to Power the Visual Commerce...
 
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
 
Parallel algorithm in linear algebra
Parallel algorithm in linear algebraParallel algorithm in linear algebra
Parallel algorithm in linear algebra
 
Machine Learning CSCI 5622
Machine Learning CSCI 5622Machine Learning CSCI 5622
Machine Learning CSCI 5622
 
Aggregation computation over distributed data streams
Aggregation computation over distributed data streamsAggregation computation over distributed data streams
Aggregation computation over distributed data streams
 
TensorFlow in Context
TensorFlow in ContextTensorFlow in Context
TensorFlow in Context
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...
 
Detecting Malicious Websites using Machine Learning
Detecting Malicious Websites using Machine LearningDetecting Malicious Websites using Machine Learning
Detecting Malicious Websites using Machine Learning
 
25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centers25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centers
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
 
14. mohsin dalvi artificial neural networks presentation
14. mohsin dalvi   artificial neural networks presentation14. mohsin dalvi   artificial neural networks presentation
14. mohsin dalvi artificial neural networks presentation
 
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseR + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
Künstliche Intelligenz - Maschinelles Lernen - Grundlagen
Künstliche Intelligenz - Maschinelles Lernen - GrundlagenKünstliche Intelligenz - Maschinelles Lernen - Grundlagen
Künstliche Intelligenz - Maschinelles Lernen - Grundlagen
 
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
 
A Novel Approach for Breast Cancer Detection using Data Mining Techniques
A Novel Approach for Breast Cancer Detection using Data Mining TechniquesA Novel Approach for Breast Cancer Detection using Data Mining Techniques
A Novel Approach for Breast Cancer Detection using Data Mining Techniques
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and R
 

Similar to Machine learning fro computer vision - a whirlwind of key concepts for the uninitiated

Chapter01.ppt
Chapter01.pptChapter01.ppt
Chapter01.ppt
butest
 
Introduction
IntroductionIntroduction
Introduction
butest
 
Introduction
IntroductionIntroduction
Introduction
butest
 
Introduction
IntroductionIntroduction
Introduction
butest
 
Basic Notions of Learning, Introduction to Learning ...
Basic Notions of Learning, Introduction to Learning ...Basic Notions of Learning, Introduction to Learning ...
Basic Notions of Learning, Introduction to Learning ...
butest
 
Machine Learning Fall, 2007 Course Information
Machine Learning Fall, 2007 Course InformationMachine Learning Fall, 2007 Course Information
Machine Learning Fall, 2007 Course Information
butest
 
Machine learning SVM
Machine  learning SVMMachine  learning SVM
Machine learning SVM
Raj Kamal
 
Machine Learning and Inductive Inference
Machine Learning and Inductive InferenceMachine Learning and Inductive Inference
Machine Learning and Inductive Inference
butest
 
powerpoint
powerpointpowerpoint
powerpoint
butest
 
Introduction to machine learning-2023-IT-AI and DS.pdf
Introduction to machine learning-2023-IT-AI and DS.pdfIntroduction to machine learning-2023-IT-AI and DS.pdf
Introduction to machine learning-2023-IT-AI and DS.pdf
SisayNegash4
 

Similar to Machine learning fro computer vision - a whirlwind of key concepts for the uninitiated (20)

Machine learning for computer vision part 2
Machine learning for computer vision part 2Machine learning for computer vision part 2
Machine learning for computer vision part 2
 
Chapter01.ppt
Chapter01.pptChapter01.ppt
Chapter01.ppt
 
UNIT 1 Machine Learning [KCS-055] (1).pptx
UNIT 1 Machine Learning [KCS-055] (1).pptxUNIT 1 Machine Learning [KCS-055] (1).pptx
UNIT 1 Machine Learning [KCS-055] (1).pptx
 
Fast Distributed Online Classification
Fast Distributed Online ClassificationFast Distributed Online Classification
Fast Distributed Online Classification
 
ML_ Unit_1_PART_A
ML_ Unit_1_PART_AML_ Unit_1_PART_A
ML_ Unit_1_PART_A
 
M25 ABC Curriculum Design
M25 ABC Curriculum DesignM25 ABC Curriculum Design
M25 ABC Curriculum Design
 
Introduction
IntroductionIntroduction
Introduction
 
Introduction
IntroductionIntroduction
Introduction
 
Introduction
IntroductionIntroduction
Introduction
 
Basic Notions of Learning, Introduction to Learning ...
Basic Notions of Learning, Introduction to Learning ...Basic Notions of Learning, Introduction to Learning ...
Basic Notions of Learning, Introduction to Learning ...
 
Machine Learning an Research Overview
Machine Learning an Research OverviewMachine Learning an Research Overview
Machine Learning an Research Overview
 
Machine Learning Fall, 2007 Course Information
Machine Learning Fall, 2007 Course InformationMachine Learning Fall, 2007 Course Information
Machine Learning Fall, 2007 Course Information
 
Machine learning Lecture 1
Machine learning Lecture 1Machine learning Lecture 1
Machine learning Lecture 1
 
Machine learning SVM
Machine  learning SVMMachine  learning SVM
Machine learning SVM
 
Machine Learning and Inductive Inference
Machine Learning and Inductive InferenceMachine Learning and Inductive Inference
Machine Learning and Inductive Inference
 
Lecture 09(introduction to machine learning)
Lecture 09(introduction to machine learning)Lecture 09(introduction to machine learning)
Lecture 09(introduction to machine learning)
 
Overview of machine learning
Overview of machine learning Overview of machine learning
Overview of machine learning
 
powerpoint
powerpointpowerpoint
powerpoint
 
How can we train with few data
How can we train with few dataHow can we train with few data
How can we train with few data
 
Introduction to machine learning-2023-IT-AI and DS.pdf
Introduction to machine learning-2023-IT-AI and DS.pdfIntroduction to machine learning-2023-IT-AI and DS.pdf
Introduction to machine learning-2023-IT-AI and DS.pdf
 

More from potaters

Ln l.agapito
Ln l.agapitoLn l.agapito
Ln l.agapito
potaters
 

More from potaters (15)

Image formation
Image formationImage formation
Image formation
 
Ln l.agapito
Ln l.agapitoLn l.agapito
Ln l.agapito
 
Motion and tracking
Motion and trackingMotion and tracking
Motion and tracking
 
BMVA summer school MATLAB programming tutorial
BMVA summer school MATLAB programming tutorialBMVA summer school MATLAB programming tutorial
BMVA summer school MATLAB programming tutorial
 
Statistical models of shape and appearance
Statistical models of shape and appearanceStatistical models of shape and appearance
Statistical models of shape and appearance
 
Vision Algorithmics
Vision AlgorithmicsVision Algorithmics
Vision Algorithmics
 
Performance characterization in computer vision
Performance characterization in computer visionPerformance characterization in computer vision
Performance characterization in computer vision
 
Low level vision - A tuturial
Low level vision - A tuturialLow level vision - A tuturial
Low level vision - A tuturial
 
Local feature descriptors for visual recognition
Local feature descriptors for visual recognitionLocal feature descriptors for visual recognition
Local feature descriptors for visual recognition
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
 
A primer for colour computer vision
A primer for colour computer visionA primer for colour computer vision
A primer for colour computer vision
 
Cognitive Vision - After the hype
Cognitive Vision - After the hypeCognitive Vision - After the hype
Cognitive Vision - After the hype
 
Graphical Models for chains, trees and grids
Graphical Models for chains, trees and gridsGraphical Models for chains, trees and grids
Graphical Models for chains, trees and grids
 
Medical image computing - BMVA summer school 2014
Medical image computing - BMVA summer school 2014Medical image computing - BMVA summer school 2014
Medical image computing - BMVA summer school 2014
 
Decision Forests and discriminant analysis
Decision Forests and discriminant analysisDecision Forests and discriminant analysis
Decision Forests and discriminant analysis
 

Recently uploaded

biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
AlMamun560346
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 

Recently uploaded (20)

High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 

Machine learning fro computer vision - a whirlwind of key concepts for the uninitiated

  • 1. Machine Learning : 1BMVA Summer School 2014 Machine Learning for Computer Vision a whirlwind tour of key concepts for the uninitiated Toby Breckon School of Engineering and Computing Sciences Durham University www.durham.ac.uk/toby.breckon/mltutorial/ toby.breckon@durham.ac.uk
  • 2. Machine Learning : 2BMVA Summer School 2014 Machine Learning ?  Why Machine Learning? – we cannot program everything – some tasks are difficult to define algorithmically • especially in computer vision …. visual sensing has few rules  Well-defined learning problems ? – easy to learn Vs. difficult to learn • ..... varying complexity of visual patterns  An example: learning to recognise objects ... Image: DK
  • 3. Machine Learning : 3BMVA Summer School 2014 Learning ? - in humans
  • 4. Machine Learning : 4BMVA Summer School 2014 Learning ? - in computers
  • 5. Machine Learning : 5BMVA Summer School 2014 Learning ...  learning verb –the activity of obtaining knowledge –knowledge obtained by study – English Dictionary, Cambridge University Press
  • 6. Machine Learning : 6BMVA Summer School 2014 Machine Learning  Definition: – A set of methods for the automated analysis of structure in data. …. two main strands of work, (i) unsupervised learning …. and (ii) supervised learning. ….similar to ... data mining, but ... focus .. more on autonomous machine performance, …. rather than enabling humans to learn from the data. [Dictionary of Image Processing & Computer Vision, Fisher et al., 2014]
  • 7. Machine Learning : 7BMVA Summer School 2014 Supervised Vs. Unsupervised  Supervised – knowledge of output - learning with the presence of an “expert” / teacher • data is labelled with a class or value • Goal: predict class or value label – e.g. Neural Network, Support Vector Machines, Decision Trees, Bayesian Classifiers ....  Unsupervised – no knowledge of output class or value • data is unlabelled or value un-known • Goal: determine data patterns/groupings – Self-guided learning algorithm – (internal self-evaluation against some criteria) – e.g. k-means, genetic algorithms, clustering approaches ... …. c1 c2 c3 …. ?
  • 8. Machine Learning : 8BMVA Summer School 2014 Machine Learning = “Decision or Prediction” FeatureDetection+Representation (e.g.SIFT,HoG,….)(e.g.histogram,BagofWords, PCA...) person cat dog cow …. …. …. car rhino …. etc. … in the big picture
  • 9. Machine Learning : 9BMVA Summer School 2014 ML Input “Examples” in Comp. Vis. ...  “Direct Sample” inputs – Pixels / Voxels / 3D Points – sub-samples (key-points, feature-points) … i.e. “samples” direct the imagery  Feature Vectors – shape measures – edge distributions – colour distributions – texture measures / distributions […. SIFT, SURF, HOG …. etc.] … i.e. calculated summary “numerical” descriptors V = { 0.103900, 120.102, 30.10101, .... , ...}
  • 10. Machine Learning : 10BMVA Summer School 2014 Common ML Tasks in Comp. Vis. ...  Object Classification what object ?  Object Detection object or no-object ?  Instance Recognition ? who (or what) is it ?  Sub-category analysis which object type ?  Sequence { Recognition | Classification } ? what is happening / occurring ? http://pascallin.ecs.soton.ac.uk/challenges/VOC/ {people | vehicle | … intruder ….} {gender | type | species | age …...} {face | vehicle plate| gait …. → biometrics}
  • 11. Machine Learning : 11BMVA Summer School 2014 Types of ML Problem  Classification – Predict (classify) sample → discrete set of class labels • e.g. classes {object 1, object 2 … } for recognition task • e.g. classes {object, !object} for detection task  Regression (traditionally less common in comp. vis.) – Predict sample → associated numerical value (variable) • e.g. distance to target based on shape features – Linear and non-linear attribute to value relationships  Association or clustering – grouping a set of instances by attribute similarity • e.g. image segmentation [Ess et al, 2009] …. ?
  • 12. Machine Learning : 12BMVA Summer School 2014 Regression Example – Head Pose Estimation  Input: image features (HOG)  Output: { yaw | pitch }  varying illumination + vibration [Walger / Breckon, 2013]
  • 13. Machine Learning : 13BMVA Summer School 2014 Learning in general, from specific examples  N.B. – E is specific : a specific example (e.g. female face) – T is general : a general task (e.g. gender recognition) Program P “learns” task T from set of examples {E} Thus if P improves learning must be of the following form: – “to learn a general [ability|behaviour|function|rules] from specific examples” …. ?
  • 14. Machine Learning : 14BMVA Summer School 2014 A simple learning example ....  Learn prediction of “Safe conditions to fly ?” – based on the weather conditions = attributes – classification problem, class = {yes, no} …………… YesFalse8075Rainy YesFalse8683Overcast NoTrue9080Sunny NoFalse8585Sunny FlyWindyHumidityTemperatureOutlook Attributes Classification
  • 15. Machine Learning : 15BMVA Summer School 2014 Decision Trees  Attribute based, decision logic construction – boolean outcome – discrete set of outputs Safe conditions to fly ?
  • 16. Machine Learning : 16BMVA Summer School 2014 Fly Decision Trees  Set of Specific Examples ... Safe conditions to fly ? GENERALIZED RULE LEARNING (training data)
  • 17. Machine Learning : 17BMVA Summer School 2014 Decision Tree Logic (Outlook = Sunny AND Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain AND Wind = Weak)
  • 18. Machine Learning : 18BMVA Summer School 2014 Growing Decision Trees  Construction is carried out top down, based on node splits that maximise the reduction in entropy in each resulting sub-branch of the tree [Quinlan, '86]  Key Algorithmic Steps 1. Calculate the information gain of splitting on each attribute (i.e. reduction in entropy (variance)) 2. Select the attribute with maximum information gain to be a new node 3. Split the training data based on this attribute 4. Repeat recursively (steps 1 → 3) for each sub-node until all examples at a node share the same class (or no attributes remain) .. see extra slides on ... “Building Decision Trees”
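For concreteness, a minimal Python sketch of steps 1–2 above (entropy and information gain), assuming the small “safe to fly ?” table is held as parallel Python lists; the variable names are illustrative only:

```python
# A minimal sketch of information gain over categorical attributes (numpy-free).
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Reduction in entropy obtained by splitting 'labels' on 'attribute_values'."""
    n = len(labels)
    split_entropy = 0.0
    for value in set(attribute_values):
        subset = [lab for att, lab in zip(attribute_values, labels) if att == value]
        split_entropy += (len(subset) / n) * entropy(subset)
    return entropy(labels) - split_entropy

# the four example rows from the table above
outlook = ['Sunny', 'Sunny', 'Overcast', 'Rainy']
fly     = ['No',    'No',    'Yes',      'Yes']
print(information_gain(outlook, fly))   # 1.0 bit: Outlook alone separates the classes here
```

On this toy data Outlook gives the maximum information gain, so it would become the root node of the tree.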
  • 19. Machine Learning : 19BMVA Summer School 2014 Extension : Continuous Valued Attributes  Create a discrete attribute to test continuous attributes – chosen threshold that gives greatest information gain Temperature Fly
  • 20. Machine Learning : 20BMVA Summer School 2014 Growing Decision Trees (from data to “decision maker”) Fly (training data) LEARNING  “Safe conditions to fly ?”
  • 21. Machine Learning : 21BMVA Summer School 2014 (Growing =) Learning from Data  Training data: used to train the system – i.e. build the rules / learnt target function – specific examples (used to learn)  Test data: used to test performance of the system – unseen by the system during training – also known as validation data – specific examples (used to evaluate)  Training/test data made up of instances • also referred to as examples/samples N.B. – training data == training set – test data == test/validation set e.g. facial gender classification …. ….
  • 22. Machine Learning : 22BMVA Summer School 2014 Is it really that simple ? Credit: Bekios-Calfa et al., Revisiting Linear Discriminant Techniques in Gender Recognition IEEE Trans. Pattern Analysis and Machine Intelligence http://dx.doi.org/10.1109/TPAMI.2010.208
  • 23. Machine Learning : 23BMVA Summer School 2014 Must (Must, Must!) avoid over-fitting ….. (i.e. over-learning)
  • 24. Machine Learning : 24BMVA Summer School 2014 Occam's Razor  Follow the principle of Occam's Razor  Occam's Razor – “entia non sunt multiplicanda praeter necessitatem” (latin!) – “entities should not be multiplied beyond necessity” (english) – “All things being equal, the simplest solution tends to be the best one”  Machine Learning : prefer the simplest {model | hypothesis | …. tree | network } that fits the data 14th-century English logician William of Ockham
  • 25. Machine Learning : 25BMVA Summer School 2014 Problem of Overfitting  Consider adding noisy training example #15: – [ Sunny, Hot, Normal, Strong, Fly=Yes ] (WRONG LABEL)  What training effect would it have on earlier tree?
  • 26. Machine Learning : 26BMVA Summer School 2014 Problem of Overfitting  Consider adding noisy training example #15: – [ Sunny, Hot, Normal, Strong, Fly=Yes ]  What effect on earlier decision tree? – error in example = error in tree construction! (a spurious extra split, e.g. on Wind, is grown just to fit the noisy example)
  • 27. Machine Learning : 27BMVA Summer School 2014 Overfitting in general  Performance on the training data (with noise) improves  Performance on the unseen test data decreases – For decision trees: tree complexity increases, learns training data too well! (over-fits)
  • 28. Machine Learning : 28BMVA Summer School 2014 Overfitting in general  Hypothesis is too specific towards training examples  Hypothesis not general enough for test data Increasing model complexity
  • 29. Machine Learning : 29BMVA Summer School 2014 Function f() Learning Model (approximation of f()) Training Samples (from function) Source: [PRML, Bishop, 2006] Degree of Polynomial Model Graphical Example: function approximation (via regression)
  • 30. Machine Learning : 30BMVA Summer School 2014 Function f() Learning Model (approximation of f()) Training Samples (from function) Source: [PRML, Bishop, 2006] Increased Complexity
  • 31. Machine Learning : 31BMVA Summer School 2014 Function f() Learning Model (approximation of f()) Training Samples (from function) Source: [PRML, Bishop, 2006] Increased Complexity Good Approximation
  • 32. Machine Learning : 32BMVA Summer School 2014 Function f() Learning Model (approximation of f()) Training Samples (from function) Source: [PRML, Bishop, 2006] Over-fitting! Poor approximation
  • 33. Machine Learning : 33BMVA Summer School 2014 Avoiding Over-fitting  Robust Testing & Evaluation – strictly separate training and test sets • train iteratively, test for over-fitting divergence – advanced training / testing strategies (K-fold cross validation)  For Decision Tree Case: – control complexity of tree (e.g. depth) • stop growing when data split not statistically significant • grow full tree, then post-prune – minimize { size(tree) + size(misclassifications(tree)) } • i.e. simplest tree that does the job! (Occam again)
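A minimal sketch of the “strictly separate training and test sets” plus K-fold cross-validation idea, using scikit-learn; the feature matrix X and labels y below are synthetic placeholders, and the depth limit is just one way of controlling tree complexity:

```python
# Train/test separation + 5-fold cross-validation with a depth-limited decision tree.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(200, 10)            # 200 examples, 10 attributes (dummy data)
y = np.random.randint(0, 2, size=200)  # binary class labels (dummy data)

# strictly separate training and (unseen) test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# limit tree depth to control model complexity (guard against over-fitting)
clf = DecisionTreeClassifier(max_depth=3)

# 5-fold cross-validation on the training set only
scores = cross_val_score(clf, X_train, y_train, cv=5)
print("cross-validation accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))

# final, single evaluation on the held-out test set
clf.fit(X_train, y_train)
print("held-out test accuracy: %.2f" % clf.score(X_test, y_test))
```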
  • 34. Machine Learning : 34BMVA Summer School 2014 Fact 1: Decision Trees are Simple Fact 2: Performance on Vision Problems is Poor … unless we combine them in an Ensemble Classifier
  • 35. Machine Learning : 35BMVA Summer School 2014 Extending to Multi-Tree Ensemble Classifiers  Key Concept: combining multiple classifiers – strong classifier: output strongly correlated to correct classification – weak classifier: output weakly correlated to correct classification » i.e. it makes a lot of mis-classifications (e.g. tree with limited depth)  How to combine: – Bagging: • train N classifiers on random sub-sets of training set; classify using majority vote of all N (and for regression use average of N predictions) – Boosting: • As per bagging, but introduce weights for each classifier based on performance over the training set  Two examples: Boosted Trees + (Random) Decision Forests – N.B. Can be used with any classifiers (not just decision trees!) WEAK
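A minimal sketch contrasting bagging and boosting over weak, depth-limited trees (“stumps”) in scikit-learn; the data is synthetic and AdaBoost is used here purely as one common boosting variant:

```python
# Bagging vs. boosting of weak (depth-2) decision trees on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
weak = DecisionTreeClassifier(max_depth=2)   # a deliberately weak classifier

# Bagging: N weak classifiers trained on random subsets; majority vote at query time
bagged = BaggingClassifier(weak, n_estimators=50, random_state=0)

# Boosting: classifiers added sequentially, each weighted by performance and
# concentrating on the examples the previous ones got wrong (AdaBoost here)
boosted = AdaBoostClassifier(weak, n_estimators=50, random_state=0)

for name, clf in [("single weak tree", weak), ("bagged", bagged), ("boosted", boosted)]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```

Typically the ensemble scores noticeably beat the single weak tree, which is the whole point of the slide above.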
  • 36. Machine Learning : 36BMVA Summer School 2014 Extending to Multi-Tree Classifiers  To bag or to boost ..... ....... that is the question.
  • 37. Machine Learning : 37BMVA Summer School 2014 Extending to Multi-Tree Classifiers  Bagging = all classifiers weighted equally (simplest approach)  Boosting = classifiers weighted by performance – poor performers receive zero (or very low) weight – the (t+1)-th classifier concentrates on the examples the t-th classifier got wrong To bag or boost ? - boosting generally works very well (but what about over-fitting ?)
  • 38. Machine Learning : 38BMVA Summer School 2014 Decision Forests (a.k.a. Random Forests/Trees)  Bagging using multiple decision trees where each tree in the ensemble classifier ... – is trained on a random subset of the training data – computes a node split on a random subset of the attributes – close to “state of the art” for object segmentation / classification (inputs : feature vector descriptors) [Breiman 2001] [Bosch 2007] [Schroff 2008]
  • 39. Machine Learning : 39BMVA Summer School 2014 Decision Forests (a.k.a. Random Forests/Trees)  Decision Forest = Multi Decision Tree Ensemble Classifier – bagging approach used to return classification – [alternatively weighted by number of training items assigned to the final leaf node reached in tree that have the same class as the sample (classification) or statistical value (regression)]  Benefits: efficient on large data sets with multi attributes and/or missing data, inherent variable importance calc., unbiased test error (“out of bag”), “does not overfit”  Drawbacks: evaluation can be slow, lots of data for good performance, complexity of storage ... [“Random Forests”, Breiman 2001]
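A minimal sketch of such a forest over feature-vector descriptors using scikit-learn; the data is synthetic, and the parameter values are illustrative rather than tuned:

```python
# A (random) decision forest: bagged trees + random attribute subsets per split.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=64, n_informative=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    max_features="sqrt",   # random subset of attributes considered at each split
    bootstrap=True,        # each tree trained on a random (bootstrap) subset of the data
    oob_score=True,        # "out of bag" estimate of the unbiased test error
    random_state=0)
forest.fit(X, y)

print("out-of-bag score:", forest.oob_score_)
print("variable importance (first 5):", forest.feature_importances_[:5])
```

Note how the out-of-bag score and per-attribute importances come “for free”, matching the benefits listed above.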
  • 40. Machine Learning : 40BMVA Summer School 2014 Decision Forests (a.k.a. Random Forests/Trees) Gall J. and Lempitsky V., Class-Specific Hough Forests for Object Detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09), 2009. Montillo et al.. "Entangled decision forests and their application for semantic segmentation of CT images." In Information Processing in Medical Imaging, pp. 184-196. 2011. http://research.microsoft.com/en-us/projects/decisionforests/
  • 41. Machine Learning : 41BMVA Summer School 2014 Microsoft Kinect ….  Body Pose Estimation in Real-time From Depth Images – uses Decision Forest Approach Shotton et al., Real-Time Human Pose Recognition in Parts from a Single Depth Image, CVPR, 2011 - http://research.microsoft.com/apps/pubs/default.aspx?id=145347
  • 42. Machine Learning : 42BMVA Summer School 2014 What if every weak classifier was just the presence/absence of an image feature ? ( i.e. feature present = {yes, no} ) As the number of features present from a given object, in a given scene location, goes up the probability of the object not being present goes down! This is the concept of feature cascades.
  • 43. Machine Learning : 43BMVA Summer School 2014 Feature Cascading .....  Use boosting to order image features from most to least discriminative for a given object .... – allow a high false positive rate per feature (i.e. it's a weak classifier!)  As features F1 to FN of an object are found present → probability of non-occurrence within the image tends to zero  e.g. Extended Haar features – set of differences between image regions – rapid evaluation (and rapid rejection of non-occurrence) [Viola / Jones 2004] F1 FAIL PASS F2 FAIL PASS FN FAIL PASS ... OBJECT N-features
  • 44. Machine Learning : 44BMVA Summer School 2014 Haar Feature Cascades  Real-time Generalised Object Recognition  Benefits – Multi-scale evaluation • scale invariant – Fast, real-time detection – “Direct” on image • no feature extraction – Haar features • contrast/ colour invariant  Limitations – poor performance on non-rigid objects – object rotation [Breckon / Eichner / Barnes / Han / Gaszczak 08-09]
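By way of illustration only (not the vehicle/people detectors shown above), a minimal sketch of applying one of OpenCV's stock pre-trained Haar feature cascades; the frontal-face cascade shipped with the opencv-python package is used, and the image path is a placeholder:

```python
# Multi-scale Haar cascade detection with OpenCV (pre-trained frontal face cascade).
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("example.jpg")                  # placeholder input image path
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# the cascade is swept over the image at multiple scales; most windows are
# rejected cheaply by the early (most discriminative) features in the cascade
detections = cascade.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=4)

for (x, y, w, h) in detections:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)
```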
  • 45. Machine Learning : 45BMVA Summer School 2014 [Breckon / Eichner / Barnes / Han / Gaszczak 08-09]http://www.durham.ac.uk/toby.breckon/demos/modgrandchallenge/
  • 46. Machine Learning : 46BMVA Summer School 2014 The big ones - “neural inspired approaches”
  • 47. Machine Learning : 47BMVA Summer School 2014  Real Neural Networks – human brain as a collection of biological neurons and synapses (~10^11 neurons, ~10^4 synapse connections per neuron) – powerful, adaptive and noise resilient pattern recognition – combination of: • Memory – Memorisation • Generalisation – Learning “rules” – Learning “patterns” Biological Motivation Images: DK Publishing
  • 48. Machine Learning : 48BMVA Summer School 2014  Neural Networks (biological and computational) are good at noise resilient pattern recognition – the human brain can cope with extreme noise in pattern recognition Images: Wikimedia Commons
  • 49. Machine Learning : 49BMVA Summer School 2014 Artificial Neurons (= a perceptron)  An n-dimensional input vector x is mapped to output variable o by means of the scalar product and a nonlinear function mapping, f [Han / Kamber 2006] [diagram: input vector x (x0 … xn), weight vector w (w0 … wn), weighted sum Σ, bias µk, activation function f, output o] For example, let f() be the sign function: o = sign( Σ_{i=0..n} wi·xi + µk )
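A minimal numpy sketch of this single artificial neuron (weighted sum plus bias, passed through a sign activation); the input and weight values are illustrative only:

```python
# One perceptron: o = sign( sum_i w_i * x_i + bias )
import numpy as np

def perceptron(x, w, bias):
    """Two-class output (+1 / -1) for input vector x."""
    return np.sign(np.dot(w, x) + bias)

x = np.array([0.2, 0.7, 1.0])    # input (attribute) vector, illustrative values
w = np.array([0.4, -0.3, 0.8])   # weight vector (set during training)
print(perceptron(x, w, bias=-0.5))   # -> +1.0 here
```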
  • 50. Machine Learning : 50BMVA Summer School 2014 Artificial Neural Networks (ANN)  Multiple Layers of Perceptrons • N.B. Neural Networks a.k.a. Multi Layer Perceptrons (MLP) – N layers of M perceptrons • Each layer fully connected (in the graph sense) to the next – Every node of layer N takes the outputs of all M nodes of layer N-1 [Han / Kamber 2006] [network diagram: Input layer → Hidden layer(s) → Output layer → Output vector]
  • 51. Machine Learning : 51BMVA Summer School 2014 In every node we have ....  Input to network = (numerical) attribute vector describing classification examples  Output of network = vector representing classification • e.g. {1,0}, {0,1}, {1,1}, {0,0} for classes A, B, C, D • or alt. {1,0,0}, {0,1,0}, {0,0,1} for classes A, B, C [per-node diagram: Σ with bias µk – output of layer N-1 feeding layer N+1]
  • 52. Machine Learning : 52BMVA Summer School 2014 Essentially, input to output is mapped as a weighted sum occurring at multiple (fully connected) layers in the network .... … so the weights are key.
  • 53. Machine Learning : 53BMVA Summer School 2014 If everything else is constant ... (i.e. activation function, network topology) … the weights are the only thing that changes
  • 54. Machine Learning : 54BMVA Summer School 2014 Thus … setting the weights = training the network
  • 55. Machine Learning : 55BMVA Summer School 2014 Backpropagation Summary  Modifications are made in the “backwards” direction: from the output layer, through each hidden layer down to the first hidden layer, hence “Backpropagation”  Key Algorithmic Steps – Initialize weights (to small random values) in the network – Propagate the inputs forward (by applying activation function) at each node – Backpropagate the error backwards (by updating weights and biases) – Terminating condition (when error is very small or enough iterations) Backpropagation details beyond scope/time (see Mitchell '97).
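The full derivation is flagged as beyond scope above; purely as a hedged illustration of the four steps, a minimal numpy sketch of a single-hidden-layer network with sigmoid activations, trained by backpropagation on the toy XOR problem (learning rate and hidden-layer size chosen arbitrarily):

```python
# Single hidden layer network trained with backpropagation on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# 1. initialise weights (and biases) to small random values
W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)
lr = 0.5   # learning rate (a parameter that must be chosen)

for it in range(10000):
    # 2. propagate the inputs forward through each layer
    h = sigmoid(X @ W1 + b1)          # hidden layer activations
    o = sigmoid(h @ W2 + b2)          # output layer activations

    # 3. backpropagate the error, updating weights layer by layer
    d_o = (o - y) * o * (1 - o)            # output layer error term
    d_h = (d_o @ W2.T) * h * (1 - h)       # hidden layer error term
    W2 -= lr * h.T @ d_o;  b2 -= lr * d_o.sum(axis=0)
    W1 -= lr * X.T @ d_h;  b1 -= lr * d_h.sum(axis=0)

# 4. terminate after a fixed number of iterations; inspect the learned mapping
print(np.round(o, 2))   # should approximate [[0], [1], [1], [0]]
```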
  • 56. Machine Learning : 56BMVA Summer School 2014 Example: speed sign recognition  Input: – Extracted binary text image – Scaled to 20x20 pixels  Network: – 30 hidden nodes – 2 layers – backpropagation  Output: – 12 classes – {10, 20, 30, 40, 50, 60, 70, 80, 90, 100, national speed limit, non-sign}  Results: ~97% (success) [Eichner / Breckon '08]
  • 57. Machine Learning : 57BMVA Summer School 2014 http://www.durham.ac.uk/toby.breckon/demos/speedsigns/
  • 58. Machine Learning : 58BMVA Summer School 2014 Problems suited to ANNs  Input is high-dimensional discrete or real-valued – e.g. raw sensor input – signal samples or image pixels  Output – discrete or real valued – is a vector of one or more values  Possibly noisy data  Form of target function is generally unknown – i.e. don't know input to output relationship  Human readability of result is unimportant – rules such as IF..THEN .. ELSE not required
  • 59. Machine Learning : 59BMVA Summer School 2014 Problems with ANNs  Termination of backpropagation – Too many iterations can lead to overfitting (to training data) – Too few iterations can fail to reduce output error sufficiently  Needs parameter selection – Learning rate (weight up-dates) – Network Topology (number of hidden nodes / number of layers) – Choice of activation function – ....  What is the network learning? – How can we be sure the correct (classification) function is being learned ? c.f. AI folk-lore “the tanks story”
  • 60. Machine Learning : 60BMVA Summer School 2014 Problems with ANNs  May find local minimum within the search space of all possible weights (due to nature of backprop. gradient descent) – i.e. backpropagation is not guaranteed to find the best weights to learn the classification/regression function – Thus “learned” neural network may not find optimal solution to the classification/regression problem Weight space {Wij } Images [Wikimedia Commons]
  • 61. Machine Learning : 61BMVA Summer School 2014 … towards the future state of the art  Deep Learning (Deep Neural Networks) – multi-layer neural networks, varying layer sizes • varying levels of abstraction / intermediate feature representations – trained one layer at a time, followed by backprop. – complex and computationally demanding to train  Convolutional Neural Networks – leverage local spatial layout of features in input – locally adjacent neurons connected layer to layer – instead of full layer to layer connectivity – units in m-th layer connected to local subset of units in (m-1)-th layer (which are spatially adjacent)  Often combined together : “state of the art” results in Large Scale Visual Recognition Challenge – Image-Net Challenge - http://image-net.org/challenges/LSVRC/2013/ (1000 classes of object) Image: http://theanalyticsstore.com/deep-learning/
  • 62. Machine Learning : 62BMVA Summer School 2014 Deep Learning Neural Networks Results are impressive. But the same problems remain. (overfitting et al.)
  • 63. Machine Learning : 63BMVA Summer School 2014 The big ones - “kernel driven approaches”
  • 64. Machine Learning : 64BMVA Summer School 2014 Support Vector Machines  Basic Approach: – project instances into high dimensional space – learn linear separators (in high dim. space) with maximum margin – learning as optimizing bound on expected error  Positives – good performance on character recognition, text classification, ... – “appears” to avoid overfitting in high dimensional space – global optimisation, thus avoids local minima  Negatives – applying trained classifier can be expensive (i.e. query time)
  • 65. Machine Learning : 65BMVA Summer School 2014 N.B. “Machines” is just a sexy name (probably to make them sound different), they are really just computer algorithms like everything else in machine learning! … so don't get confused by the whole “machines” thing :o)
  • 66. Machine Learning : 66BMVA Summer School 2014 Simple Example  How can we separate (i.e. classify) these data examples ? (i.e. learn +ve / -ve) e.g. gender recognition .... {male, female} = {+1, -1}
  • 67. Machine Learning : 67BMVA Summer School 2014 Simple Example  Linear separation e.g. gender recognition .... {male, female} = {+1, -1}
  • 68. Machine Learning : 68BMVA Summer School 2014 Simple Example  Linear separators – which one ? e.g. gender recognition .... {male, female} = {+1, -1}
  • 69. Machine Learning : 69BMVA Summer School 2014 Simple Example  Linear separators – which one ? e.g. gender recognition .... {male, female} = {+1, -1}
  • 70. Machine Learning : 70BMVA Summer School 2014 Linear Separator  Instances (i.e. examples) {xi , yi} – xi = point in instance space (R^n) made up of n attributes – yi = class value for classification of xi  Want a linear separator. Can view this as a constraint satisfaction problem: w·xi + b ≥ +1 for yi = +1, and w·xi + b ≤ -1 for yi = -1  Equivalently, yi(w·xi + b) ≥ 1 for all i  Classification of example: function f(x) = y = {+1, -1} i.e. 2 classes N.B. we have a vector of weight coefficients w
  • 71. Machine Learning : 71BMVA Summer School 2014 Linear Separator  Now find the hyperplane (separator) with maximum margin – the size of the margin is 2 / ||w|| (see extras)  So view our problem as a constrained optimization problem: minimise ||w|| subject to the constraints above – Find “hyperplane” using a computational optimization approach (beyond scope) “hyperplane” = separator boundary; a hyperplane in R^2 == a 2D line
  • 72. Machine Learning : 72BMVA Summer School 2014 What about Non-separable Training Sets ? – Find “hyperplane” via computational optimization Penalty term > 0 for each “wrong side of the boundary” case
  • 73. Machine Learning : 73BMVA Summer School 2014 Linear Separator  separator margin determined by just a few examples – call these support vectors – can define the separator in terms of the support vectors alone, and classify examples x, as: f(x) = sign( Σi wi (si · x) + b ), summing over the support vectors si
  • 74. Machine Learning : 74BMVA Summer School 2014  Support vectors = sub-set of training instances that define decision boundary between classes This is the simplest kind of Linear SVM (LSVM)
  • 75. Machine Learning : 75BMVA Summer School 2014 How do we classify a new example ?  New unseen (test) example attribute vector x  Set of support vectors {si} with weights {wi}, bias b – define the “hyperplane” boundary in the original dimension (this is a linear case)  The output of the classification f(x) is: f(x) = sign( Σi wi (si · x) + b ) – i.e. f(x) = {-1, +1}
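A minimal numpy sketch of this linear classification rule; the support vectors, weights and bias below are made-up illustrative values, not the result of an actual training run:

```python
# Linear-SVM classification rule: f(x) = sign( sum_i w_i (s_i . x) + b )
import numpy as np

def svm_classify(x, support_vectors, weights, bias):
    """Return +1 or -1 for a new example x (linear kernel case)."""
    score = sum(w * np.dot(s, x) for s, w in zip(support_vectors, weights)) + bias
    return int(np.sign(score))

support_vectors = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]  # illustrative
weights = [0.5, -0.5]   # per-support-vector weights, learned during training
bias = 0.0

print(svm_classify(np.array([2.0, 1.5]), support_vectors, weights, bias))   # -> +1
```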
  • 76. Machine Learning : 76BMVA Summer School 2014 What about this ..... ? Not linearly Separable !
  • 77. Machine Learning : 77BMVA Summer School 2014 e.g. 2D to 3D [scatter plot legend: classes denoted +1 / -1]  Projection from 2D to 3D allows separation by a hyperplane (surface) in R^3 N.B. A hyperplane in R^3 == a 3D plane
  • 78. Machine Learning : 78BMVA Summer School 2014
  • 79. Machine Learning : 79BMVA Summer School 2014 Non Linear SVMs  Just as before but now with the kernel .... – For the (earlier) linear case K is the identity function (or matrix) • This is often referred to as the “linear kernel”
  • 80. Machine Learning : 80BMVA Summer School 2014 Non Linear SVMs  Suppose we have instance space X = R^n, need non-linear separator. – project X into some higher dimensional space X' = R^m where data will be linearly separable – let Φ : X → X' be this projection.  Interestingly, – Training depends only on dot products of the form Φ(xi) · Φ(xj) • i.e. Φ never needs to be computed explicitly – So we can train in R^m with the same computational complexity as in R^n, provided we can find a kernel (basis) function K such that: • K(xi , xj) = Φ(xi) · Φ(xj) (the kernel trick) – Classifying a new instance x now requires calculating the sign of: f(x) = Σi wi K(si , x) + b
  • 81. Machine Learning : 81BMVA Summer School 2014 This is how SVMs solve difficult problems for which no linear separation boundary (hyperplane) exists in the original dimension: “project the data to a higher dimension where it is separable”
  • 82. Machine Learning : 82BMVA Summer School 2014 Why use maximum margin?  Intuitively this feels safest.  a small error in the location of the boundary gives us least chance of causing a misclassification.  Model is immune to removal of any non support-vector data-points  Some related theory (using V-C dimension) (beyond scope)  Distance from boundary ~= measure of “good-ness”  Empirically it works very well
  • 83. Machine Learning : 83BMVA Summer School 2014 Choosing Kernels ?  Kernel functions (commonly used): – Polynomial function of degree p – Gaussian radial basis function (size σ) – 2D sigmoid function (as per neural networks) Commonly chosen by grid search of parameter space “glorified trial and error”
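A minimal sketch of that “glorified trial and error”: a grid search over the RBF kernel parameters (C, gamma) of an SVM using scikit-learn; the data and the parameter ranges are placeholders:

```python
# Grid search over SVM RBF kernel parameters with cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {
    "C":     [0.1, 1, 10, 100],       # penalty for "wrong side of the boundary" cases
    "gamma": [1e-3, 1e-2, 1e-1, 1],   # size of the Gaussian radial basis function
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validation score:", search.best_score_)
```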
  • 84. Machine Learning : 84BMVA Summer School 2014 Application to Image Classification  Common Model - Bag of Visual Words 1. build histograms of feature occurrence over training data (features: SIFT, SURF, MSER ....) 2. Use histograms as input to SVM (or other ML approach)  Cluster Features in R^n space  Cluster “membership” creates a histogram of feature occurrence .... [pipeline figure: features → histogram → SVM → {Bike | Violin | ....}] Caltech objects database 101 object classes Features: SIFT detector / PCA-SIFT descriptor, d=10 30 training images / class 43% recognition rate (1% chance performance) Example: Kristen Grauman
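A minimal sketch of this bag-of-visual-words pipeline (vocabulary by clustering, per-image histogram, SVM on the histograms). ORB features are used here purely for availability in stock OpenCV rather than the SIFT/SURF of the slide, the image file names are placeholders, and this is not the Caltech-101 experiment quoted above:

```python
# Bag of visual words: local features -> visual vocabulary -> histograms -> SVM.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

orb = cv2.ORB_create()

def local_descriptors(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(img, None)
    return desc.astype(np.float32)

# placeholder labelled training images (class 0 = bike, class 1 = violin)
train_images = ["bike_01.jpg", "bike_02.jpg", "violin_01.jpg", "violin_02.jpg"]
train_labels = [0, 0, 1, 1]

# 1. cluster all training descriptors to build the visual vocabulary
all_desc = np.vstack([local_descriptors(p) for p in train_images])
vocab = MiniBatchKMeans(n_clusters=100, random_state=0).fit(all_desc)

# 2. represent each image as a (normalised) histogram of visual-word occurrences
def bow_histogram(image_path):
    words = vocab.predict(local_descriptors(image_path))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float32)
    return hist / (hist.sum() + 1e-6)

X_train = np.array([bow_histogram(p) for p in train_images])

# 3. train an SVM on the histograms and classify an unseen query image
clf = SVC(kernel="rbf").fit(X_train, train_labels)
print(clf.predict([bow_histogram("query.jpg")]))
```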
  • 85. Machine Learning : 85BMVA Summer School 2014  Bag of Words Model – SURF features – SVM {people | vehicle} detection – Decision Forest - sub-categories www.durham.ac.uk/toby.breckon/demos/multimodal/ [Breckon / Han / Richardson, 2012]
  • 86. Machine Learning : 86BMVA Summer School 2014 Application to Image Classification  Searching for Cell Nuclei Locations with SVM [Han / Breckon et al. 2010] – input: “Laplace” enhanced pixel values as a vector • scaled to common size – process: • exhaustively extract each image neighbourhood over multiple scales • pass pixels to SVM – Is it a cell nucleus ? – output: {cell, nocell} Grid parameter search for RBF kernel
  • 87. Machine Learning : 87BMVA Summer School 2014 Application: automatic cell counting / cell architecture (position) evaluation
  • 88. Machine Learning : 88BMVA Summer School 2014 http://www.durham.ac.uk/toby.breckon/demos/cell
  • 89. Machine Learning : 89BMVA Summer School 2014 A probabilistic interpretation ….
  • 90. Machine Learning : 90BMVA Summer School 2014 Learning Probability Distributions  Bayes Formula: P(ωj | x) = p(x | ωj) P(ωj) / p(x), where p(x) = Σj p(x | ωj) P(ωj)  In words: posterior = (likelihood × prior) / evidence  Assign observed feature vector x to the maximally probable class ωk, for j = {1 → k classes} [Thomas Bayes (1701-1761)]  Captures: – prior probability of occurrence • e.g. from training examples – probability of each class given the evidence (feature vector) – Return most probable (Maximum A Posteriori – MAP) class – Optimal (costly) Vs. Naive (simple)
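A minimal numpy sketch of this MAP decision; the class names, prior and likelihood values below are purely illustrative (in practice they would be estimated from training examples):

```python
# Maximum a posteriori (MAP) classification via Bayes rule.
import numpy as np

classes    = ["face", "non-face"]
priors     = np.array([0.1, 0.9])   # P(w_j), e.g. estimated from training examples
likelihood = np.array([0.7, 0.2])   # p(x | w_j) for the observed feature vector x

evidence  = np.sum(likelihood * priors)        # p(x) = sum_j p(x | w_j) P(w_j)
posterior = likelihood * priors / evidence     # P(w_j | x)

print(dict(zip(classes, np.round(posterior, 3))))
print("MAP class:", classes[int(np.argmax(posterior))])
```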
  • 91. Machine Learning : 91BMVA Summer School 2014 “Approx” State of the Art  Recent approaches – SVM (+ variants), Decision Forests (+ variants), Boosted Approaches, Bagging .... – outperform Neural Networks – some renaissance in Deep Learning – can find the globally optimal solution (SVM) – less prone to over-fitting – allow for extraction of meaning (e.g. if-then-else rules for tree based approaches)  Several other ML approaches – clustering – k-NN, k-Means in multi-dimensional space – graphical models – Bayesian methods – Gaussian Processes (largely regression problems – sweeping generalization!)
  • 92. Machine Learning : 92BMVA Summer School 2014 But how do we evaluate how well it is working* ? * and produce convincing results for our papers and funders
  • 93. Machine Learning : 93BMVA Summer School 2014 Evaluating Machine Learning  For classification problems .... True Positives (TP) – example correctly classified as a +ve instance of given class A False Positives (FP) – example wrongly classified as a +ve instance of given class A • i.e. it is not an instance of class A True Negatives (TN) – example correctly classified as a -ve instance of given class A False Negatives (FN) – example wrongly classified as a -ve instance of given class A • i.e. classified as not class A but is a true class A
  • 94. Machine Learning : 94BMVA Summer School 2014 Evaluating Machine Learning  Confusion Matrices – Table of TP, FP, TN, FN – e.g. 2 class labels {yes, no} – can also show TP, FP, TN, FN weighted by cost of mis-classification Vs. true-classification N.B. A common name for “actual class” (i.e. the true class) is “ground truth” or “the ground truth data”.
                          Predicted: Yes    Predicted: No
    Actual class: Yes     True positive     False negative
    Actual class: No      False positive    True negative
  • 95. Machine Learning : 95BMVA Summer School 2014 Evaluating Machine Learning  Receiver Operating Characteristic (ROC) Curve – used to show trade-off between hit rate and false alarm rate over noisy channel (in communications originally) • here a “noisy” (i.e. error prone) classifier – % TP Vs. %FP – “jagged steps” = actual data – - - - - = averaged result (over multiple cross-validation folds) or best line fit
  • 96. Machine Learning : 96BMVA Summer School 2014 [ROC plot: curves towards the top-left are BETTER, curves towards the diagonal are WORSE]
  • 97. Machine Learning : 97BMVA Summer School 2014 Evaluating Machine Learning  Receiver Operating Characteristic (ROC) Curve – Used to compare different classifiers on common dataset – Used to tune a given classifier on a dataset • by varying a given threshold or parameter of the learner that affects the TP to FP ratio [Source: Bradski '09] See also: precision/recall curves
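A minimal sketch of these evaluation measures with scikit-learn: a confusion matrix of TP/FP/TN/FN on a held-out test set, plus a ROC curve obtained by sweeping a threshold over the classifier's decision scores; data and classifier are synthetic placeholders:

```python
# Confusion matrix + ROC curve for a classifier on an independent test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, roc_curve, auc

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)

# confusion matrix: rows = actual class ("ground truth"), columns = predicted class
print(confusion_matrix(y_test, clf.predict(X_test)))

# ROC curve: vary a threshold over the decision function scores (%TP vs %FP)
scores = clf.decision_function(X_test)
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("area under ROC curve: %.3f" % auc(fpr, tpr))
```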
  • 98. Machine Learning : 98BMVA Summer School 2014 Evaluating Machine Learning  The key is: Robust experimentation on independent training and testing sets – Perhaps using cross-validation or similar .. see extra slides on ... “Data Training Methodologies”
  • 99. Machine Learning : 99BMVA Summer School 2014 Many, many ways to perform machine learning .... … we have seen (only) some of them. (very briefly!) Which one is “the best” ? ? ML “classifier”
  • 100. Machine Learning : 100BMVA Summer School 2014 No Free Lunch! (Theorem)  .... the idea that it is impossible to get something for nothing  This is very true in Machine Learning – approaches that train quickly or require little memory or few training examples produce poor results • and vice versa .... !!!!! – poor data = poor learning • problems with data = problems with learning • problems = {not enough data, poorly labelled, biased, unrepresentative … }
  • 101. Machine Learning : 101BMVA Summer School 2014 What we have seen ...  The power of combining simple things …. – Ensemble Classifiers • Decision Forests / Random Forests – concept extends to all ML approaches  Neural Inspired Approaches – Neural Networks – many, many variants  Kernel Inspired Approaches – Support Vector Machines – beginning of the story
  • 102. Machine Learning : 102BMVA Summer School 2014 Further Reading - textbooks  Machine Learning - Tom Mitchell (McGraw-Hill, 1997)  Pattern Recognition & Machine Learning - Christopher Bishop (Springer, 2006)  Data Mining - Witten / Frank (Morgan-Kaufmann, 2005)
  • 103. Machine Learning : 103BMVA Summer School 2014 Further Reading - textbooks  Bayesian Reasoning and Machine Learning – David Barber http://www.cs.ucl.ac.uk/staff/d.barber/brml/ (Cambs. Univ. Press, 2012)  Computer Vision: Models, Learning, and Inference – Simon Prince (Springer, 2012) http://www.computervisionmodels.com/ … both very probability driven, both available as free PDF online (woo, hoo!)
  • 104. Machine Learning : 104BMVA Summer School 2014 Publications – a selection  Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning - A. Criminisi et al., in Foundations and Trends in Computer Graphics and Vision, 2012.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun: International Conference on Learning Representations 2014.  Image Classification using Random Forests and Ferns A Bosch, A Zisserman, X Munoz – Int. Conf. Comp. Vis., 2007.  Robust Real-time Object Detection P Viola, M Jones - International Journal of Computer Vision, 2002.  The PASCAL Visual Object Classes (VOC) challenge M Everingham, L Van Gool, C Williams, J Winn, A Zisserman - International Journal of Computer Vision, 2010.
  • 105. Machine Learning : 105BMVA Summer School 2014 Computer Vision – general overview  Computer Vision: Algorithms and Applications – Richard Szeliski, 2010 (Springer)  PDF download: – http://szeliski.org/Book/ Supporting specific content from this machine learning lecture: • see Chapter 14
  • 106. Machine Learning : 106BMVA Summer School 2014 Final shameless plug ….  Dictionary of Computer Vision and Image Processing R.B. Fisher, T.P. Breckon, K. Dawson-Howe, A. Fitzgibbon, C. Robertson, E. Trucco, C.K.I. Williams, Wiley, 2014. – … maybe it will be useful!
  • 107. Machine Learning : 107BMVA Summer School 2014 That's all folks ... Slides, examples, demo code,links + extra supporting slides @ www.durham.ac.uk/toby.breckon/mltutorial/