Introduction to Machine
Learning
(Supervised learning)
Dmytro Fishman (dmytro@ut.ee)
This is an introduction to the topic; we will try to provide some beautiful scenery along the way.
“We love you, Mummy!”
Every kind of input can be represented with numbers. An image of the note becomes a grid of pixel intensities:
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
The sentence itself becomes a table of word counts:
Word 1 25
Word 2 23
Word 3 12
… …
A flower becomes measurements of its Petal and Sepal.
Big Data: Astronomical or Genomical?
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
Astronomical? 1 Exabyte/year
Youtubical? 1-2 Exabyte/year
Genomical? 2-40 Exabyte/year
(1 Exabyte = 10^12 MB)
There is a lot of data produced nowadays, but there is also a vast number of potential ways to use this data.
Supervised Learning
Skin cancer example: Benign vs Malignant.
Plot Malignant? (No = 0 / Yes = 1) against Tumour size.
With two features: plot Tumour size against Age; is the tumour Malignant?
Other features:
Lesion type
Lesion configuration
Texture
Location
Distribution
…
Potentially infinitely many features!
Classification task
Predicting a discrete-valued output using previously labelled examples.
Binary classification: every time you have to distinguish between TWO CLASSES, it is a binary classification.
Multiclass classification: distinguishing between more than two classes.
Housing price prediction
Plot Price in 1000's ($) (100 to 400) against Size in m2 (100 to 500). Given previously sold houses, predict the price of a new house from its size.
Regression task: predicting a continuous value output using previously labelled examples.
Supervised Learning: Classification (Malignant? Yes/No vs Tumour size) VS Regression (Price vs Size in m2).
Q: You are running a company which has two problems, namely:
1. For each user in the database, predict whether this user will continue using your company's product or will move to a competitor (churn).
2. Predict the profit of your company at the end of this year based on previous records.
How would you approach these problems?
a. Both problems are examples of classification problems
b. The first one is a classification task and the second one is a regression problem
c. The first one is a regression problem and the second one is a classification task
d. Both problems are regression problems
(Answer: b)
Unsupervised Learning
(example slides: clustering with Google queries; gene expression clustering)
Contrary to the first category, we don't have labels for our classes: the graphs with two features from the previous examples turn into unlabelled ones.
Quiz question: “Of the following examples, which would you address using an unsupervised learning algorithm?”
Supervised Learning: Tumour size vs Age with labelled points.
Unsupervised Learning: the same Tumour size vs Age data without labels. Is there any interesting hidden structure in this data? What does this hidden structure correspond to?
Gene expression clustering: two interesting groups of species, and two interesting groups of genes.
Q1: A telecommunication company wants to segment their customers into distinct groups in order to send appropriate subscription offers; this is an example of clustering.
Q2: You are given data about seismic activity in Japan, and you want to predict the magnitude of the next earthquake; this is an example of regression.
Q3: Assume you want to perform supervised learning and predict the number of newborns according to the size of the storks' population (http://www.brixtonhealth.com/storksBabies.pdf); this is an example of regression (and of a famously spurious correlation).
Q4: Discriminating between spam and ham e-mails is a classification task: true.
MNIST dataset (10000 images)
Can be downloaded from: http://yann.lecun.com/exdb/mnist/
Each instance is a 28px x 28px image of a handwritten digit together with a label, e.g. the digit 3 stored as a grid of pixel values:
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
Each instance is flattened into a single feature vector, e.g. (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …): in total 784 pixel values.
Features are also sometimes referred to as dimensions; these images are 784-dimensional.
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
(each row is one instance)
Data is loaded. What should we do now?
We would like to build a tool that would be able to automatically recognise handwritten digits.
Let's get to the first algorithm.
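The slides point to the original MNIST site; as a hedged illustration (not the author's code), one common way to obtain the same data in Python is scikit-learn's fetch_openml, assuming scikit-learn and an internet connection are available:

```python
# Sketch: load MNIST as 784-dimensional feature vectors with labels.
# Assumes scikit-learn is installed; fetch_openml downloads the data on first use.
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data, mnist.target   # X: (70000, 784) pixel values, y: digit labels as strings

print(X.shape)       # (70000, 784) -> each row is one flattened 28x28 image
print(X[0][:20])     # first 20 pixel values of the first instance
print(y[0])          # its label, e.g. '5'
```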
How can we quantitatively say which of these pairs is more similar: A & B, or A & C?
What about computing their pixel-wise difference?
Sum over i = 1…784 of |Ai - Bi| = 137.03
Sum over i = 1…784 of |Ai - Ci| = 107.38
A is more similar (closer) to C than to B.
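A minimal sketch (my own illustration, not the slides' code) of this pixel-wise distance, assuming the images are given as 784-long arrays of pixel values:

```python
import numpy as np

def l1_distance(a, b):
    """Sum of absolute pixel-wise differences between two flattened images."""
    return np.sum(np.abs(a.astype(float) - b.astype(float)))

# Toy 784-dimensional "images" just to show the call; real values would come from the dataset.
rng = np.random.default_rng(0)
A, B, C = rng.integers(0, 256, size=(3, 784))

print(l1_distance(A, B))   # distance between A and B
print(l1_distance(A, C))   # distance between A and C; the smaller sum means "more similar"
```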
Nearest Neighbour classifier
We asked our friend to write a bunch of new digits so that we have something to recognise; here is the first one of them (an Instance with Label ?).
For each new instance:
1. Compute the pixel-wise distance to all training examples in the Dataset
2. Find the closest training example
3. Report its label (here the predicted Label is 3)
Advantages of NN
Very easy to implement
Fast training time O(C)
Could be a good choice for low-dimensional problems
Disadvantages of NN
Very slow classification time
Suffers from the curse of dimensionality
Curse of dimensionality: remember we said that our instances are 784-dimensional? This is a lot! (illustrations: http://cs231n.github.io/classification/)
NN is rarely used in practice.
Can we find a better algorithm?
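Here is a minimal sketch of the nearest-neighbour recipe above (compute all distances, take the closest example's label), assuming the training data sits in NumPy arrays; it is an illustration under those assumptions, not the author's implementation:

```python
import numpy as np

def nn_predict(x_new, X_train, y_train):
    """1-nearest-neighbour prediction with the pixel-wise (L1) distance."""
    # 1. Compute the pixel-wise distance to all training examples.
    distances = np.sum(np.abs(X_train.astype(float) - x_new.astype(float)), axis=1)
    # 2. Find the closest training example.
    closest = np.argmin(distances)
    # 3. Report its label.
    return y_train[closest]

# Tiny made-up example: three 784-dimensional training images with labels.
rng = np.random.default_rng(1)
X_train = rng.integers(0, 256, size=(3, 784))
y_train = np.array([3, 6, 1])
x_new = X_train[1] + rng.integers(-5, 5, size=784)   # a noisy copy of the second image

print(nn_predict(x_new, X_train, y_train))           # expected to print 6
```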
Back to binary classification (for a sec): 3 VS 6
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
(each row is one instance)
Decision tree: split on pixel #213 (> 163 vs <= 163), then split on pixel #216 (> 30 vs <= 30).
How do you know which features to use for the best splits?
Using various goodness metrics, such as information gain or Gini impurity, to define “best”.
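As a hedged illustration of the "goodness metric" idea, here is a small sketch of Gini impurity for scoring a candidate split; the metric and thresholds actually used in the slides' tree are the author's choice, the values below are toy data:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 = pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_quality(feature_values, labels, threshold):
    """Weighted Gini impurity after splitting on `feature > threshold` (lower is better)."""
    left = [y for x, y in zip(feature_values, labels) if x > threshold]
    right = [y for x, y in zip(feature_values, labels) if x <= threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Pixel #213 values and labels from a toy version of the table above.
pixel_213 = [254, 254, 254, 163, 227]
labels    = [3, 6, 6, 3, 6]
print(gini(labels))                              # impurity before the split (0.48)
print(split_quality(pixel_213, labels, 163))     # impurity after splitting at > 163 (0.30)
```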
Decision (classification) tree algorithm
1. Construct a decision tree based on the training examples (pixel #213 > 163 / <= 163, then pixel #216 > 30 / <= 30).
For each new instance (an Instance with Label ?):
2. Make the corresponding comparisons (#213, then #216).
3. Report the label (here: 6).
Depth = 2: once the tree is constructed, at most 2 comparisons are needed to test a new example.
In general, decision trees are *always faster than the NN algorithm (*remember, shit happens).
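In practice one would rarely build such a tree by hand; a minimal scikit-learn sketch (the library choice and parameters are my assumptions, not the slides' prescription) looks like this:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: rows are flattened images, labels are the digits 3 and 6.
rng = np.random.default_rng(2)
X_train = rng.integers(0, 256, size=(20, 784))
y_train = np.array([3, 6] * 10)

# Fit a shallow tree (depth 2, as in the example) using Gini impurity for the splits.
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X_train, y_train)

x_new = X_train[:1]          # pretend this is a new instance
print(tree.predict(x_new))   # the reported label, e.g. [3]
```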
Can we find a better algorithm?
Disadvantages of NN: very slow classification time; suffers from the curse of dimensionality.
Disadvantages of DT: also suffers from the curse of dimensionality.
Is there a way to break the curse?
The decision tree algorithm (pixel #213 > 163 / <= 163, pixel #216 > 30 / <= 30) is non-parametric and deterministic: the shape of the tree is determined by the data, not by our choice. This means that we will always get the same output given the same input…
Are all input dimensions equally important for classification?
How about building a lot of trees from random parts of the data and then merging their predictions?
Random forest algorithm
Start from the training data (the feature vectors with labels 3 and 6, as above).
Randomly discard some rows and columns, and build a decision tree on the remaining data (e.g. pixel #213 > 163 / <= 163, then pixel #216 > 0 / = 0).
Repeat N times until N trees are constructed; because each tree sees a different random subset of the data, the trees differ (e.g. another tree splits on pixel #213 and then on pixel #214 > 253 / <= 253).
For each new instance, use all constructed trees to generate predictions (Tree #1, Tree #2, Tree #3) and average them: here 2/3 = 66.6% of the trees vote for 6, so the reported label is 6.
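A hedged scikit-learn sketch of the same idea (many trees built on random subsets of rows and features, predictions averaged); the parameter values here are illustrative, not the author's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X_train = rng.integers(0, 256, size=(40, 784))       # toy flattened images
y_train = np.array([3, 6] * 20)

# Each tree sees a bootstrap sample of the rows and a random subset of the 784 features.
forest = RandomForestClassifier(n_estimators=3, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

x_new = X_train[:1]
print(forest.predict(x_new))         # majority vote of the trees, e.g. [3]
print(forest.predict_proba(x_new))   # fraction of trees voting for each class, e.g. [[0.67 0.33]]
```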
Quiz time
Q1: Which classification algorithm(s) has/have the following weaknesses:
• It takes more time to train the classifier than to classify a new instance
• It suffers from the curse of dimensionality
A. Nearest neighbour algorithm
B. Decision tree
C. Random forest algorithm
D. None of the above
E. All of the above
Q2: Which of the following statements best defines the curse of dimensionality?
A. Prohibitively slow running time at training given a lot of data
B. Highly biased classification due to the prevalence of one of the classes
C. High classification error due to an excessively complex classifier
D. Poor performance of the classifier trained on data with a large number of features
E. None of the above
Q3: Which of the following algorithm(s) would you prefer if you had to classify instances from low-dimensional data?
A. Nearest neighbour algorithm
B. Decision tree algorithm
C. Random forest algorithm
D. All mentioned would cope
E. None of the above are suitable
Support Vector Machine
Let us go primitive and focus only on two pixels. It does not really matter which ones; I will take these two because we got used to them already :)
Features (two pixel values per instance) Labels
254 254 … 3
254 193 … 6
254 0 … 6
163 202 … 3
227 84 … 6
Now, let's visualise them on a 2-D plot: Pixel #215 against Pixel #213, both ranging from 0 to 254.
Support Vector Machine (SVM)
1. Identify the right hyper-plane: is it A, B or C?
2. Maximise the distance between the nearest points and the hyper-plane; this distance is called the margin. The closest points that define the hyper-plane are called support vectors.
3. The larger the distance from the hyper-plane to an instance, the more confident the classifier is about its prediction.
For each new instance (an Instance with Label ?), check on which side of the hyper-plane it falls and report the label (here: 6).
Support Vector Machine (SVM): what should we do now, when the classes are not linearly separable in the (x, y) plane?
Let's make another dimension: z = a*x^2 + b*y^2
Plotted against (x, z), the classes become separable by a hyper-plane.
This transformation is called the kernel trick, and the function z is the kernel.
Wow, wow, wow, hold on! How does this actually work?
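To make the "add a new dimension z = a*x^2 + b*y^2" idea concrete, here is a small sketch of my own (made-up data, a = b = 1) that adds such a feature by hand and fits a linear SVM; in practice the same effect is obtained implicitly by passing a kernel to the SVM:

```python
import numpy as np
from sklearn.svm import SVC

# Two classes that are not linearly separable in (x, y): one inside a ring, one outside.
rng = np.random.default_rng(4)
angles = rng.uniform(0, 2 * np.pi, 100)
inner = np.c_[50 * np.cos(angles[:50]) + 127, 50 * np.sin(angles[:50]) + 127]
outer = np.c_[110 * np.cos(angles[50:]) + 127, 110 * np.sin(angles[50:]) + 127]
X = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

# Kernel trick "by hand": add z = a*x^2 + b*y^2 as a third dimension (features centred first).
a, b = 1.0, 1.0
x_c, y_c = X[:, 0] - 127, X[:, 1] - 127
z = a * x_c ** 2 + b * y_c ** 2
X3 = np.c_[x_c, y_c, z]

print(SVC(kernel="linear").fit(X3, y).score(X3, y))   # separable in 3-D -> accuracy 1.0
print(SVC(kernel="rbf").fit(X, y).score(X, y))        # the RBF kernel does a similar lift implicitly
```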
Comparison with SVM (recall the NN recipe and its advantages/disadvantages above).
Disadvantages of SVM:
Very slow classification time
Suffers from the curse of dimensionality
It might be tricky to choose the right kernel
Quiz time
Q: How would you approach a multiclass classification task using SVM?
Support Vector Machine (SVM): 100% accurate!
accuracy = correctly classified instances / total number of instances
Can we trust this model? Consider the following example: whatever happens, predict 0. Accuracy = 49/50 = 98%.
What if my data is unbalanced? There are a few ways to deal with it; we are going to discuss them later. A histogram of class counts can help you figure out whether your dataset is unbalanced.
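A minimal sketch (my own, not from the slides) of the accuracy formula and of the histogram-style check for class balance:

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Correctly classified instances divided by the total number of instances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# "Whatever happens, predict 0" on an unbalanced set: 49 zeros and 1 one.
y_true = [0] * 49 + [1]
y_pred = [0] * 50
print(accuracy(y_true, y_pred))   # 0.98 -> looks great, but the model is useless

# A quick class-count "histogram" reveals the imbalance.
print(Counter(y_true))            # Counter({0: 49, 1: 1})
```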
Can we trust this model? In our case we have balanced data, and still 100% accurate… 😒
So, what happened?
Training the model (Feature #1 vs Feature #2)
Let's add more examples: still linearly separable. How about now?
Simple, not a perfect fit VS complicated, ideal fit: which model should we use?
So, what happened? Models range from too general, through just right, to overfitting; our 100% accurate model was overfitting.
We should split our data into train and test sets.
Split into train and test
Normally we would split the data into an 80% train and a 20% test set. As we have a lot of data, we can afford a 50/50 ratio.
Can we do better than 90%?
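A hedged scikit-learn sketch of the split (80/20 here; the 50/50 ratio mentioned above is just a matter of changing test_size):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.integers(0, 256, size=(1000, 784))   # toy feature vectors
y = rng.integers(0, 10, size=1000)           # toy digit labels

# 80% train / 20% test; use test_size=0.5 for a 50/50 split when data is plentiful.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print(X_train.shape, X_test.shape)           # (800, 784) (200, 784)
```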
Parameter tuning
Model hyper-parameter: for the SVM, C = 1 (plotted for Pixel #215 vs Pixel #213).
In red are the areas where a penalty is applied to instances close to the line; in green are the areas where no penalty is applied.
The total amount of penalty applied to the classifier is called the loss. Classifiers try to minimise the loss by adjusting their parameters.
This instance increases the penalty… and after the boundary is adjusted it lies in a green area.
Parameter tuning
Algorithm            Hyper-parameters
K-nearest neighbour  K, the number of neighbours (1, …, 100)
Decision Tree        Split metric ('gini', 'information gain')
Random Forest        Number of trees (3, …, 100; more is usually better), split metric ('gini', 'information gain')
SVM                  C (10^-5, …, 10^2) and gamma (10^-15, …, 10^2)
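As a sketch of searching over the SVM's C and gamma ranges above (the grid values and the use of GridSearchCV are my assumptions, not the author's prescription):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # toy binary labels

param_grid = {
    "C": [1e-5, 1e-3, 1e-1, 1, 10, 100],    # a few points from the 10^-5 ... 10^2 range
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1, 1],   # a few points from the gamma range
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)                  # e.g. {'C': 10, 'gamma': 0.01}
print(round(search.best_score_, 3))         # mean cross-validated accuracy
```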
Let's try a different C; maybe our score will improve. Nope… Fail again… It is getting depressing… Hurrah!
You may not have noticed, but… we are overfitting again.
The whole dataset (100%) is now split three ways:
Training 60%: for fitting the initial model.
Validation 20%: for parameter tuning & performance evaluation (our validation score climbs from 5/7 to 7/7 as we tune).
Testing 20%: for a one-shot evaluation of the trained model (5/5: you're doing great! 🙂).
But what happens when you overfit the validation set? The validation score still looks perfect while the test score drops (4/5 😒).
Cross Validation (CV) Algorithm
The whole dataset (100%) is split into Training data (80%) and Test (20%).
The training data is further split into four 20% folds. Train on 60% of the data and validate on the remaining 20%, rotating which fold is used for validation:
Train Train Train Val   0.75
Val Train Train Train   0.85
Train Val Train Train   0.91
Train Train Val Train   0.68
MEAN (0.75, 0.85, 0.91, 0.68) ≈ 0.80
Choose the best model/parameters based on this estimate and then apply it to the test set.
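A minimal sketch of the same rotation with scikit-learn's cross_val_score (the scores here are whatever the toy data gives, not the 0.75/0.85/0.91/0.68 from the slide):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# cv=4: train on three folds and validate on the fourth, rotating four times.
scores = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=4)
print(scores)          # one validation score per fold
print(scores.mean())   # the estimate used to compare models / parameter settings
```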
Machine Learning pipeline
Raw Data -> Preprocessing -> Feature extraction -> Split into train & test (set the test set aside) -> Choose a model -> Find the best parameters using CV -> Train the model on the whole training set -> Evaluate the final model on the test set -> Report your results
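Putting the whole pipeline together in code, under the same assumptions as the earlier sketches (scikit-learn, toy data, illustrative parameters), might look roughly like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Raw data (toy stand-in); features are already extracted as fixed-length vectors here.
rng = np.random.default_rng(8)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Split into train & test; the test set is held out until the very end.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Choose a model (preprocessing + SVM) and find its best parameters using CV on the training set.
model = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])
search = GridSearchCV(model, {"svm__C": [0.1, 1, 10], "svm__gamma": [0.01, 0.1, 1]}, cv=5)
search.fit(X_train, y_train)   # refits the best model on the whole training set

# Evaluate the final model once on the test set and report the result.
print(search.best_params_)
print(round(search.score(X_test, y_test), 3))
```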
A machine learning algorithm usually corresponds to a combination of the following 3 elements:
1. The choice of a specific mapping function family F (K-NN, SVM, DT, RF, Neural Networks, etc.).
2. A way to evaluate the quality of a function f from F: a way of saying how well or how badly this function f is doing at classifying real-world objects.
3. A way to search for a better function f in F: how to choose the parameters so that the performance of f improves.
https://github.com/sugyan/tensorflow-mnist
References
• Machine Learning by Andrew Ng (https://www.coursera.org/learn/machine-learning)
• Introduction to Machine Learning by Pascal Vincent, given at the Deep Learning Summer School, Montreal 2015 (http://videolectures.net/deeplearning2015_vincent_machine_learning/)
• Welcome to Machine Learning by Konstantin Tretyakov, delivered at the AACIMP Summer School 2015 (http://kt.era.ee/lectures/aacimp2015/1-intro.pdf)
• Stanford CS class: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy (http://cs231n.github.io/)
• Data Mining Course by Jaak Vilo at the University of Tartu (https://courses.cs.ut.ee/MTAT.03.183/2017_spring/uploads/Main/DM_05_Clustering.pdf)
• Machine Learning Essential Concepts by Ilya Kuzovkin (https://www.slideshare.net/iljakuzovkin)
• From the brain to deep learning and back by Raul Vicente Zafra and Ilya Kuzovkin (http://www.uttv.ee/naita?id=23585&keel=eng)
www.biit.cs.ut.ee www.ut.ee www.quretec.ee
You guys rock!

1 Supervised learning

  • 1. Introduction to Machine Learning (Supervised learning) Dmytro Fishman (dmytro@ut.ee)
  • 2.
  • 3. This is an introduction to the topic
  • 4. This is an introduction to the topic We will try to provide a beautiful scenery
  • 5. “We love you, Mummy!”
  • 6. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 “We love you, Mummy!”
  • 7. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 “We love you, Mummy!” Petal Sepal
  • 8. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 “We love you, Mummy!” Petal Sepal Word 1 25 Word 2 23 Word 3 12 … …
  • 9. Petal Sepal 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 Word 1 25 Word 2 23 Word 3 12 … … “We love you, Mummy!”
  • 10. http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195 Big DataBig Data: Astronomical or Genomical? http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
  • 11. http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195 Big Data Astronomical? Youtubical? Big Data: Astronomical or Genomical? http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195 Genomical?
  • 12. http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195 Big Data Astronomical? Genomical? Youtubical? 1 Exabyte/year 2-40 Exabyte/year 1-2 Exabyte/year Big Data: Astronomical or Genomical? http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
  • 13. http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195 Big Data Astronomical? Genomical? Youtubical? 1 Exabyte/year 2-40 Exabyte/year 1-2 Exabyte/year Big Data: Astronomical or Genomical? http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195 1 Exabyte =1012 Mb
  • 14. There is a lot of data produced nowadays But there are also a vast number of potential ways to use this data
  • 17. Malignant? Tumour size Benign Malignant Skin cancer example Yes(1) No(0)
  • 18. Malignant? Tumour size Benign Malignant Skin cancer example Yes(1) No(0)
  • 19. Malignant? Tumour size Benign Malignant Skin cancer example Yes(1) No(0)
  • 25. Tumour size Age Malignant? Other features: Lesion type Lesion configuration Texture Location Distribution … Potentially infinitely many features!
  • 26. Classification task Predicting discrete value output using previously labeled examples Binary classification
  • 27. Classification task Predicting discrete value output using previously labeled examples also binary classification
  • 28. Classification task Predicting discrete value output using previously labeled examples also binary classification Every time you have to distinguish between TWO CLASSES it is a binary classification
  • 29. Classification task Multiclass classification Predicting discrete value output using previously labeled examples
  • 31. Housing price prediction Size in m2 Price in 1000’s ($) 400 100 200 300 100 200 300 400 500
  • 32. Housing price prediction Size in m2 Price in 1000’s ($) 400 100 200 300 100 200 300 400 500
  • 33. Housing price prediction Size in m2 Price in 1000’s ($) 400 100 200 300 100 200 300 400 500 Price?
  • 34. Housing price prediction Size in m2 Price in 1000’s ($) 400 100 200 300 100 200 300 400 500 Price?
  • 35. Housing price prediction Size in m2 Price in 1000’s ($) 400 100 200 300 100 200 300 400 500 Price
  • 36. Housing price prediction Size in m2 Price in 1000’s ($) 400 100 200 300 100 200 300 400 500 Price
  • 37. Regression task Size in m2 Price in 1000’s ($) 400 100 200 300 100 200 300 400 500 Price
  • 38. Malignant? Tumour size Yes(1) No(0) Benign Malignant Malignant? VS Classification Regression Supervised Learning Size in m2 Pricein1000’s($) 400 100 200 300 100 200 300 400 500 Price?
  • 39. You are running a company which has two problems, namely:
 Q: a. Both problems are examples of classification problems b. The first one is a classification task and the second one regression problem c. The first one is regression problem and the second one as classification task d. Both problems are regression problems 1. For each user in the database predict if this user will continue using your company’s product or will move to competitors (churn). 2. Predict the profit of your company at the end of this year based on previous records. How would you approach these problems?
  • 40. You are running a company which has two problems, namely:
 Q: a. Both problems are examples of classification problems b. The first one is a classification task and the second one regression problem c. The first one is regression problem and the second one as classification task d. Both problems are regression problems 1. For each user in the database predict if this user will continue using your company’s product or will move to competitors (churn). 2. Predict the profit of your company at the end of this year based on previous records. How would you approach these problems?
  • 41. Unsupervised Learning examples slides Clustering with google queries On a contrary to the first category we don’t have labels to our classes (graphs with two features from previous examples turns into unlabelled one) Gene expression clustering Quiz question: “of the following examples, which would you address using a n unsupervised learning algorithm?”
  • 44. Tumour size Age Unsupervised Learning Is there any interesting hidden structure in this data?
  • 45. Tumour size Age Unsupervised Learning Is there any interesting hidden structure in this data? What does this hidden structure correspond to?
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56. Q1: Some telecommunication company wants to segment their customers into distinct groups in order to send appropriate subscription offers, this is an example of ... Q2: You are given data about seismic activity in Japan, and you want to predict a magnitude of the next earthquake, this is in an example of ... Q3: Assume you want to perform supervised learning and to predict number of newborns according to size of storks' population (http://www.brixtonhealth.com/storksBabies.pdf), it is an example of ... Q4: Discriminating between spam and ham e-mails is a classification task, true or false?
  • 57. Q1: Some telecommunication company wants to segment their customers into distinct groups in order to send appropriate subscription offers, this is an example of clustering Q2: You are given data about seismic activity in Japan, and you want to predict a magnitude of the next earthquake, this is in an example of ... Quiz: Assume you want to perform supervised learning and to predict number of newborns according to size of storks' population (http://www.brixtonhealth.com/storksBabies.pdf), it is an example of ... Quiz: Discriminating between spam and ham e-mails is a classification task, true or false?
  • 58. Q1: Some telecommunication company wants to segment their customers into distinct groups in order to send appropriate subscription offers, this is an example of clustering Q2: You are given data about seismic activity in Japan, and you want to predict a magnitude of the next earthquake, this is in an example of regression Q3: Assume you want to perform supervised learning and to predict number of newborns according to size of storks' population (http://www.brixtonhealth.com/storksBabies.pdf), it is an example of ... Quiz: Discriminating between spam and ham e-mails is a classification task, true or false?
  • 59. Q3: Assume you want to perform supervised learning and to predict number of newborns according to size of storks' population (http://www.brixtonhealth.com/storksBabies.pdf), it is an example of stupidity regression Q4: Discriminating between spam and ham e-mails is a classification task, true or false? Q2: You are given data about seismic activity in Japan, and you want to predict a magnitude of the next earthquake, this is in an example of regression Q1: Some telecommunication company wants to segment their customers into distinct groups in order to send appropriate subscription offers, this is an example of clustering
  • 60. Q4: Discriminating between spam and ham e-mails is a classification task. Q3: Assume you want to perform supervised learning and to predict number of newborns according to size of storks' population (http://www.brixtonhealth.com/storksBabies.pdf), it is an example of stupidity regression Q2: You are given data about seismic activity in Japan, and you want to predict a magnitude of the next earthquake, this is in an example of regression Q1: Some telecommunication company wants to segment their customers into distinct groups in order to send appropriate subscription offers, this is an example of clustering
  • 62. MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px 3
  • 63. MNIST dataset (10000 images) In total 784 pixel values Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) 3
  • 64. Pixel values Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1 Instances MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) could be downloaded from: http://yann.lecun.com/exdb/mnist/ 3 In total 784 pixel values
  • 65. Pixel values Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1 Instances MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) could be downloaded from: http://yann.lecun.com/exdb/mnist/ 3 In total 784 pixel values Feature
  • 66. Pixel values Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1 Instances MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) could be downloaded from: http://yann.lecun.com/exdb/mnist/ 3 In total 784 pixel values Feature
  • 67. Pixel values Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1 Instances MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) could be downloaded from: http://yann.lecun.com/exdb/mnist/ 3 In total 784 pixel values Feature
  • 68. Pixel values Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1 Instances MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) could be downloaded from: http://yann.lecun.com/exdb/mnist/ 3 In total 784 pixel valuesFeatures are also some times referred to as dimensions
  • 69. Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1 Instances MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) could be downloaded from: http://yann.lecun.com/exdb/mnist/ 3 In total 784 pixel valuesFeatures are also some times referred to as dimensions This images are 784 dimensional
  • 70. Pixel values Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1 Instances MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) could be downloaded from: http://yann.lecun.com/exdb/mnist/ 3 In total 784 pixel values
  • 71. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1 Instances MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) could be downloaded from: http://yann.lecun.com/exdb/mnist/ 3 In total 784 pixel values Data is loaded. What should we do now?
  • 72. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1 Instances MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) could be downloaded from: http://yann.lecun.com/exdb/mnist/ 3 In total 784 pixel values Data is loaded. What should we do now? We would like to build a tool that would be able to automatically recognise handwritten images
  • 73. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1 Instances MNIST dataset (10000 images) Instance Label0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 255 255 255 155 0 255 255 255 255 255 255 255 255 155 78 78 155 255 255 255 0 0 0 0 155 255 28px 28px (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) could be downloaded from: http://yann.lecun.com/exdb/mnist/ 3 In total 784 pixel values Data is loaded. What should we do now? We would like to build a tool that would be able to automatically recognise handwritten images Let’s get to the first algorithm
  • 74. How to quantitatively say which of these pairs are more similar? & & A B CA OR
  • 75. How to quantitatively say which of these pairs are more similar? & & A B CA What about computing their pixel-wise difference? OR
  • 76. How to quantitatively say which of these pairs are more similar? & & A B CA OR Σ 784 |Ai - Ci|Σ 784 |Ai - Bi|i i
  • 77. How to quantitatively say which of these pairs are more similar? & & A B CA OR Σ 784 |Ai - Ci|Σ 784 |Ai - Bi|i i
  • 78. How to quantitatively say which of these pairs are more similar? & & A B CA OR Σ 784 |Ai - Ci|Σ 784 |Ai - Bi|i i
  • 79. How to quantitatively say which of these pairs are more similar? & & A B CA OR Σ 784 |Ai - Ci| = 107.38Σ 784 |Ai - Bi| = 137.03i i
  • 80. How to quantitatively say which of these pairs are more similar? & & A B CA OR Σ 784 |Ai - Ci| = 107.38Σ 784 |Ai - Bi| = 137.03i i A is more similar to C than B
  • 81. How to quantitatively say which of these pairs are more similar? & & A B CA OR Σ 784 |Ai - Ci| = 107.38Σ 784 |Ai - Bi| = 137.03i i A is more similar (closer) to C than B
  • 82. Σ 784 |Ai - Ci| = 107.38Σ 784 |Ai - Bi| = 137.03 How to quantitatively say which of these pairs are more similar? & & A B CA i i OR A is more similar (closer) to C than B
  • 83. Σ 784 |Ai - Ci| = 107.38Σ 784 |Ai - Bi| = 137.03 How to quantitatively say which of these pairs are more similar? & & A B CA i i OR A is more similar (closer) to C than B
  • 84. Instance Label ? DatasetFor each new instance We asked our friend to write a bunch of new digits so that we can have something to recognise, here is the first one of them
  • 86. Instance Label ? 1.Compute pixel-wise distance to all training examples For each new instance Dataset
  • 87. Instance Label ? 1.Compute pixel-wise distance to all training examples For each new instance Dataset
  • 88. Instance Label ? 1.Compute pixel-wise distance to all training examples For each new instance Dataset
  • 89. Instance Label ? 1.Compute pixel-wise distance to all training examples For each new instance Dataset
  • 90. Instance Label ? 1.Compute pixel-wise distance to all training examples 2. Find the closest training example For each new instance Dataset
  • 91. Instance Label ? 1.Compute pixel-wise distance to all training examples 2. Find the closest training example For each new instance Dataset
  • 92. Instance Label 3 1.Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report it’s label Nearest Neighbour classifier For each new instance Dataset
  • 93. Instance Label 3 1.Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report it’s label Advantages of NN Disadvantages of NN For each new instance
  • 94. Instance Label 3 1.Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report it’s label Advantages of NN Disadvantages of NN Very easy to implement For each new instance
  • 95. Instance Label 3 1.Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report it’s label Advantages of NN Disadvantages of NN Very easy to implement Very slow classification time For each new instance
  • 96. Instance Label 3 1.Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report it’s label Advantages of NN Disadvantages of NN Very easy to implement Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems For each new instance Very slow classification time
  • 97. Curse of dimensionality Remember we said that our instances are 784 dimensional?
  • 98. Curse of dimensionality Remember we said that our instances are 784 dimensional? This is a lot!
  • 99.
  • 102. Instance Label 3 1.Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report it’s label Advantages of NN Disadvantages of NN Very easy to implement Very slow classification time Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems For each new instance
  • 103. For each test example Instance Label 3 1.Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report it’s label Advantages of NN Disadvantages of NN Fast training time O(C) Very easy to implement Very slow classification time Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems NN is rarely used in practice
  • 104. For each test example Instance Label 3 1.Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report it’s label Advantages of NN Disadvantages of NN Fast training time O(C) Very easy to implement Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems Can we find a better algorithm? Very slow classification time
  • 105. VS Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Back to binary classification *for a sec
  • 106. VS Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Back to binary classification *for a sec pixel #213 pixel #213
  • 107. VS Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Back to binary classification *for a sec pixel #213 > 163 <= 163 pixel #213
  • 108. VS Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Back to binary classification *for a sec pixel #213 > 163 <= 163 pixel #216 pixel #216 > 30 <= 30
  • 109. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances pixel #216 VS Back to binary classification *for a sec pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Decision tree
  • 110. Instances pixel #216 Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 pixel #216 > 30 <= 30 VS Back to binary classification *for a sec pixel #213 > 163 <= 163 Split
  • 111. VS Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Back to binary classification pixel #213 > 163 <= 163 pixel #216 pixel #216 > 30 <= 30
  • 112. VS Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Back to binary classification *for a sec pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 How do you know which features to use for best splits? Split
  • 113. VS Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Back to binary classification *for a sec pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 How do you know which features to use for best splits? Split Using various goodness metrics such as information gain or gini impurity to define “best”
  • 114. Decision (classification) tree algorithm 1.Construct a decision tree based on training examples
  • 115. Decision (classification) tree algorithm 1.Construct a decision tree based on training examples pixel #213 > 163 <= 163 pixel #216 > 30 <= 30
  • 116. Decision (classification) tree algorithm pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Instance Label ? 2.Make corresponding comparisons 1.Construct a decision tree based on training examples #213 For each new instance
  • 117. Decision (classification) tree algorithm pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Instance Label ? 2.Make corresponding comparisons 1.Construct a decision tree based on training examples #213 #216 For each new instance
  • 118. Decision (classification) tree algorithm pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Instance Label 6 3. Report label 1.Construct a decision tree based on training examples 2.Make corresponding comparisons #213 #216 For each new instance
  • 119. Decision (classification) tree algorithm pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Instance Label 6 Depth=2 Once the tree is constructed maximum 2 comparisons would be needed to test a new example3. Report label 1.Construct a decision tree based on training examples 2.Make corresponding comparisons #213 #216 For each new instance
  • 120. Decision (classification) tree algorithm pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Instance Label 6 Depth=2 In general decision trees are *always faster than NN algorithm3. Report label 1.Construct a decision tree based on training examples 2.Make corresponding comparisons *remember, shit happens #213 #216 For each new instance
  • 121. Can we find a better algorithm? Disadvantages of NN Very slow classification time Suffers from the curse of dimensionality
  • 122. Can we find a better algorithm? Disadvantages of NNDisadvantages of DT Very slow classification time Very slow classification time Suffers from the curse of dimensionality
  • 123. Can we find a better algorithm? Disadvantages of NNDisadvantages of DT Also suffers from the curse of dimensionality Very slow classification time Very slow classification time Suffers from the curse of dimensionality
  • 124. Is there a way to break the curse?
  • 125. Is there a way to break the curse?
  • 126. pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Decision tree algorithm is non-parametric and deterministic
  • 127. pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Decision tree algorithm is non-parametric and deterministic The shape of the tree is determined by data not our choice
  • 128. pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Decision tree algorithm is non-parametric and deterministic This means that we will always have the same output given the same input… The shape of the tree is determined by data not our choice
  • 129. The shape of the tree is determined by data not our choice pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Decision tree algorithm is non-parametric and deterministic This means that we will always have the same output given the same input… Are all input dimensions equally important for classification?
  • 130. The shape of the tree is determined by data not our choice pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Decision tree algorithm is non-parametric and deterministic This means that we will always have the same output given the same input… How about building a lot of trees from random parts of the data and then merging their predictions? Are all input dimensions equally important for classification?
  • 131. The shape of the tree is determined by data not our choice pixel #213 > 163 <= 163 pixel #216 > 30 <= 30 Decision tree algorithm is non-parametric and deterministic This means that we will always have the same output given the same input… How about building a lot of trees from random parts of the data and then merging their predictions? Are all input dimensions equally important for classification? Random forest algorithm
  • 132. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm
  • 133. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm Randomly discard some rows
  • 134. Randomly discard some rows and columns Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm
  • 135. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm Build a decision tree based on remaining data pixel #213 > 163 <= 163 pixel #216 > 0 = 0
  • 136. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 Build a decision tree based on remaining data Repeat N times until N trees are constructed
  • 137. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163
  • 138. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30
  • 139. Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30 Instance Label ? For each new instance
  • 140. Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30 Instance Label ? For each new instance Use all constructed trees to generate predictions
  • 141. Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30 Instance Label ? For each new instance Predictions Tree #2 Tree #1 Tree #3
  • 142. Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30 Instance Label ? For each new instance Predictions Tree #2 Tree #1 Tree #3 Average 2/3
  • 143. Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30 Instance Label 6 For each new instance Predictions Tree #2 Tree #1 Tree #3 Average 2/3 = 66.6%
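Put together, the recipe from the last few slides fits in a short script. This is only a sketch (the toy data and the half/half row and column sampling sizes are assumptions, not the slides' actual numbers): each tree is trained on a random subset of rows and columns, and a new instance gets the majority vote of all trees. In practice scikit-learn's RandomForestClassifier implements the same idea, with bootstrapped rows and per-split feature sampling, out of the box.

```python
# A minimal sketch of the random-forest idea from the slides (toy data assumed):
# every tree sees a random subset of rows (instances) and columns (pixels),
# and a new instance gets the majority vote over all trees.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(100, 20))          # 100 instances, 20 "pixel" features
y = rng.choice([3, 6], size=100)                  # made-up labels, echoing the slides

n_trees, trees, feature_subsets = 10, [], []
for _ in range(n_trees):
    rows = rng.choice(len(X), size=len(X) // 2, replace=False)   # randomly discard some rows
    cols = rng.choice(X.shape[1], size=10, replace=False)        # ... and some columns
    tree = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
    trees.append(tree)
    feature_subsets.append(cols)                  # remember which columns each tree saw

new_instance = X[0]
votes = [int(t.predict(new_instance[cols].reshape(1, -1))[0])
         for t, cols in zip(trees, feature_subsets)]
print(votes, "->", np.bincount(votes).argmax())   # merge predictions by majority vote
```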
  • 144. Random forest algorithm Instance Label 6 For each new instance Predictions Tree #2 Tree #1 Tree #3 Average 2/3 = 66.6% Quiz time pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30
  • 145. Q1: Which classification algorithm(s) has (have) the following weaknesses: • It takes more time to train the classifier than to classify a new instance • It suffers from the curse of dimensionality A. Nearest neighbour algorithm B. Decision tree C. Random forest algorithm D. None of the above E. All of the above
  • 146. Q1: A. Nearest neighbour algorithm B. Decision tree C. Random forest algorithm D. None of the above E. All of the above • It takes more time to train the classifier than to classify a new example • It suffers from the curse of dimensionality pixel #213 > 163 <= 163 pixel #216 > 0 = 0 Which classification algorithm(s) has (have) the following weaknesses:
  • 147. Q2: A. Prohibitively slow running time at training given a lot of data B. Highly biased classification due to the prevalence of one of the classes C. High classification error due to an excessively complex classifier D. Poor performance of the classifier trained on data with a large number of features E. None of the above Which of the following statements best defines the curse of dimensionality?
  • 148. Q2: Which of the following statements best defines the curse of dimensionality? A. Prohibitively slow running time at training given a lot of data B. Highly biased classification due to the prevalence of one of the classes C. High classification error due to an excessively complex classifier D. Poor performance of the classifier trained on data with a large number of features E. None of the above
  • 149. Q3: Which of the following algorithms would you prefer if you had to classify instances from low-dimensional data? A. Nearest neighbour algorithm B. Decision tree algorithm C. Random forest algorithm D. All mentioned would cope E. None of the above are suitable
  • 150. A. Nearest neighbour algorithm B. Decision tree algorithm C. Random forest algorithm D. All mentioned would cope E. None of the above are suitable Q3: Which of the following algorithm(s) would you prefer if you had to classify instances from low-dimensional data?
  • 152. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 202 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Let us go primitive, and focus only on two pixels
  • 153. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 202 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Let us go primitive, and focus only on two pixels It does not really matter which ones; I will take these two because we got used to them already :)
  • 154. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Let us go primitive, and focus only on two pixels
  • 155. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Now, let's visualise them on a 2-D plot (Pixel #215 vs Pixel #213, axes from 0 to 254)
  • 156. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Now, let's visualise them on a 2-D plot (Pixel #215 vs Pixel #213, axes from 0 to 254)
  • 157. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Now, let's visualise them on a 2-D plot (Pixel #215 vs Pixel #213, axes from 0 to 254)
  • 158. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Now, let's visualise them on a 2-D plot (Pixel #215 vs Pixel #213, axes from 0 to 254)
  • 159. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Support Vector Machine (SVM) (plot: Pixel #215 vs Pixel #213, axes from 0 to 254)
  • 160. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane A B C Is it A, B or C? Support Vector Machine (SVM)
  • 161. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane 2. Maximise the distance between the nearest points and the hyper-plane Support Vector Machine (SVM)
  • 162. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane 2. Maximise the distance between the nearest points and the hyper-plane Margin Support Vector Machine (SVM)
  • 163. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane 2. Maximise the distance between the nearest points and the hyper-plane Margin Support Vector Machine (SVM)
  • 164. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane 2. Maximise the distance between the nearest points and the hyper-plane Support Vector Machine (SVM) The closest points that define the hyper-plane are called support vectors
  • 165. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane 2. Maximise the distance between the nearest points and the hyper-plane 3. The larger the distance from the hyper-plane to the instance, the more confident the classifier is about its prediction (more confidence) Support Vector Machine (SVM) The closest points that define the hyper-plane are called support vectors
  • 166. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane 2. Maximise the distance between the nearest points and the hyper-plane Support Vector Machine (SVM) 3. The larger the distance from the hyper-plane to the instance, the more confident the classifier is about its prediction The closest points that define the hyper-plane are called support vectors
  • 167. (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane 2. Maximise the distance between the nearest points and the hyper-plane Support Vector Machine (SVM) 3. The larger the distance from the hyper-plane to the instance, the more confident the classifier is about its prediction The closest points that define the hyper-plane are called support vectors Instance Label ? For each new instance
  • 168. (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane 2. Maximise the distance between the nearest points and the hyper-plane Support Vector Machine (SVM) 3. The larger the distance from the hyper-plane to the instance, the more confident the classifier is about its prediction The closest points that define the hyper-plane are called support vectors Instance Label ? For each new instance
  • 169. (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane 2. Maximise the distance between the nearest points and the hyper-plane Support Vector Machine (SVM) 3. The larger the distance from the hyper-plane to the instance, the more confident the classifier is about its prediction The closest points that define the hyper-plane are called support vectors Instance Label 6 For each new instance
  • 170. (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) 1. Identify the right hyper-plane 2. Maximise the distance between the nearest points and the hyper-plane Support Vector Machine (SVM) 3. The larger the distance from the hyper-plane to the instance, the more confident the classifier is about its prediction The closest points that define the hyper-plane are called support vectors Instance Label 6 For each new instance
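A minimal linear SVM sketch of the steps above (the tiny two-pixel dataset is made up to echo the table on the slides): the fitted model exposes the support vectors that define the hyper-plane, and the signed distance from the hyper-plane that the slide reads as confidence.

```python
# Sketch only (toy two-pixel data assumed): maximum-margin classifier with a linear kernel.
import numpy as np
from sklearn.svm import SVC

X = np.array([[254, 254], [163, 202], [240, 230],   # "3"-like points
              [254, 0], [254, 193], [227, 84]])      # "6"-like points
y = np.array([3, 3, 3, 6, 6, 6])

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)                # the closest points that define the hyper-plane
print(clf.predict([[250, 40]]))            # label for a new instance
print(clf.decision_function([[250, 40]]))  # distance from the hyper-plane ~ confidence
```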
  • 171. (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) Support Vector Machine (SVM) What should we do now?
  • 172. Support Vector Machine (SVM) (plot: y vs x, axes from 0 to 254) Let's make another dimension: z = a*x² + b*y²
  • 173. Support Vector Machine (SVM) (plot: y vs x, axes from 0 to 254) Let's make another dimension: z = a*x² + b*y² (new plot: z vs x)
  • 174. Support Vector Machine (SVM) (plot: y vs x, axes from 0 to 254) Let's make another dimension: z = a*x² + b*y² (new plot: z vs x)
  • 175. Support Vector Machine (SVM) (plot: y vs x, axes from 0 to 254) Let's make another dimension: z = a*x² + b*y² (new plot: z vs x)
  • 176. Support Vector Machine (SVM) (plot: y vs x, axes from 0 to 254) Let's make another dimension: z = a*x² + b*y² (new plot: z vs x) This transformation is called the kernel trick and the function z is the kernel
  • 177. Support Vector Machine (SVM) (plot: y vs x, axes from 0 to 254) Let's make another dimension: z = a*x² + b*y² (new plot: z vs x) This transformation is called the kernel trick and the function z is the kernel Wow, wow, wow, hold on! How does this actually work?
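To make the "how" concrete, here is a sketch under assumed toy data (two concentric rings, which no straight line can separate): adding the extra dimension z = a*x² + b*y² makes the classes separable by a simple threshold, and passing a non-linear kernel to SVC achieves the same effect without ever building z explicitly.

```python
# Sketch only (made-up ring data): explicit z = a*x^2 + b*y^2 mapping vs. an implicit kernel.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
inner = np.c_[np.cos(angles[:100]), np.sin(angles[:100])] * 1.0   # class 0: inner circle
outer = np.c_[np.cos(angles[100:]), np.sin(angles[100:])] * 3.0   # class 1: outer ring
X = np.vstack([inner, outer])
y = np.array([0] * 100 + [1] * 100)

z = (X ** 2).sum(axis=1)                   # the new dimension with a = b = 1
print(z[:100].max(), "<", z[100:].min())   # 1.0 < 9.0: one threshold on z separates the classes

clf = SVC(kernel="rbf").fit(X, y)          # same effect, no explicit z needed
print(clf.score(X, y))                     # ~1.0 training accuracy
```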
  • 178. For each test example Instance Label 3 1. Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report its label Advantages of NN Disadvantages of NN Fast training time O(C) Very easy to implement Very slow classification time Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems Comparison with SVM Disadvantages of SVM Very slow classification time Suffers from the curse of dimensionality
  • 179. For each test example Instance Label 3 1. Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report its label Advantages of NN Disadvantages of NN Fast training time O(C) Very easy to implement Very slow classification time Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems Comparison with SVM Disadvantages of SVM Very slow classification time Suffers from the curse of dimensionality
  • 180. For each test example Instance Label 3 1. Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report its label Advantages of NN Disadvantages of NN Fast training time O(C) Very easy to implement Very slow classification time Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems Comparison with SVM Disadvantages of SVM Very slow classification time Suffers from the curse of dimensionality
  • 181. For each test example Instance Label 3 1. Compute pixel-wise distance to all training examples 2. Find the closest training example 3. Report its label Advantages of NN Disadvantages of NN Fast training time O(C) Very easy to implement Very slow classification time Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems Comparison with SVM Disadvantages of SVM Very slow classification time Suffers from the curse of dimensionality It might be tricky to choose the right kernel
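For completeness, the nearest-neighbour procedure summarised above is short enough to write out directly (the two-pixel toy arrays are assumptions, not the slides' data):

```python
# Sketch of 1-nearest-neighbour: the three steps from the slide, on made-up two-pixel data.
import numpy as np

X_train = np.array([[254, 254], [254, 193], [163, 202], [227, 84]])
y_train = np.array([3, 6, 3, 6])
new_instance = np.array([250, 240])

distances = np.linalg.norm(X_train - new_instance, axis=1)  # 1. pixel-wise distance to all training examples
closest = distances.argmin()                                # 2. find the closest training example
print(y_train[closest])                                     # 3. report its label
```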
  • 183. Q: How would you approach a multi-class classification task using SVM?
  • 184. Q: How would you approach a multi-class classification task using SVM? (plot: Pixel #215 vs Pixel #213, axes from 0 to 254)
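One common answer, sketched below (the slides do not show their own solution, and the dataset here is an assumption): train one binary SVM per class and pick the class whose hyper-plane is the most confident, the "one-vs-rest" strategy. scikit-learn's SVC actually uses one-vs-one between every pair of classes under the hood; OneVsRestClassifier makes the one-vs-rest variant explicit.

```python
# Sketch only: multi-class classification with binary SVMs via one-vs-rest.
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)                        # 10 digit classes
clf = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)  # one binary SVM per class
print(clf.predict(X[:5]))                                  # most confident hyper-plane wins
print(y[:5])
```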
  • 188. Support Vector Machine (SVM) 100% accurate!
  • 190. accuracy = correctly classified instances / total number of instances 100% accurate!
  • 191. Can we trust this model? 100% accurate!
  • 192. Can we trust this model? Consider the following example: 100% accurate!
  • 193. Can we trust this model? Consider the following example: Whatever happens, predict 0 100% accurate!
  • 194. Can we trust this model? Consider the following example: Whatever happens, predict 0 Accuracy = 49/50 100% accurate!
  • 195. Can we trust this model? Consider the following example: Whatever happens, predict 0 Accuracy = 98% 100% accurate!
  • 196. Can we trust this model? Consider the following example: A histogram of class counts could help you figure out if your dataset is unbalanced 100% accurate!
  • 197. Can we trust this model? Consider the following example: What if my data is unbalanced? A histogram of class counts could help you figure out if your dataset is unbalanced 100% accurate!
  • 198. Can we trust this model? Consider the following example: There are a few ways; we are going to discuss them later. What if my data is unbalanced? A histogram of class counts could help you figure out if your dataset is unbalanced 100% accurate!
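The "whatever happens, predict 0" example above in a few lines (the 49/1 class counts come from the slides; everything else is an assumed sketch):

```python
# A useless classifier can still look accurate on unbalanced data.
import numpy as np

y_true = np.array([0] * 49 + [1])        # 49 negatives, 1 positive
y_pred = np.zeros_like(y_true)           # whatever happens, predict 0
print((y_true == y_pred).mean())         # 0.98 -> "98% accuracy"
```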
  • 199. Can we trust this model? In our case data is balanced: 100% accurate!
  • 200. 100% accurate! Can we trust this model? We have balanced data:
  • 201. 100% accurate! Can we trust this model? We have balanced data:
  • 202. 100% accurate! Can we trust this model? We have balanced data:
  • 203. 100% accurate! Can we trust this model? We have balanced data: 😒
  • 205. Training the model Feature #2 Feature #1 Let's add more examples
  • 207. Training the model Still linearly separable Feature #2 Feature #1
  • 208. Still linearly separable Training the model Feature #2 Feature #1
  • 211. Feature #2 Feature #1 Training the model Feature #2 Feature #1
  • 212. Simple; not perfect fit Complicated; ideal fit Which model should we use? Training the model Feature #2 Feature #1 Feature #2 Feature #1
  • 213. Simple; not perfect fit Complicated; ideal fit Training the model Which model should we use? Feature #2 Feature #1 Feature #2 Feature #1
  • 214. Simple; not perfect fit Complicated; ideal fit Training the model Which model should we use? Feature #2 Feature #1 Feature #2 Feature #1
  • 215. Simple; not perfect fit Complicated; ideal fit Training the model Which model should we use? Feature #2 Feature #1 Feature #2 Feature #1
  • 217. So, what happened? Too general model Just right! Overfitting 100% accurate!
  • 218. So, what happened? Too general model Just right! Overfitting We should split our data into train and test sets 100% accurate!
  • 219. Split into train and test
  • 220. Split into train and test Normally we would split the data into 80% train and 20% test sets
  • 221. Split into train and test Normally we would split the data into 80% train and 20% test sets As we have a lot of data, we can afford a 50/50 ratio
  • 222. Split into train and test Can we do better than 90%? Normally we would split the data into 80% train and 20% test sets As we have a lot of data, we can afford a 50/50 ratio
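A minimal sketch of this split (the digits dataset, the SVC model and the fixed random seed are assumptions): hold out a fraction of the data, fit on the rest, and measure accuracy only on the held-out part.

```python
# Sketch only: 80/20 train/test split; use test_size=0.5 for the 50/50 split mentioned above.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC().fit(X_train, y_train)        # train only on the training part
print(clf.score(X_test, y_test))         # accuracy on data the model has never seen
```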
  • 227. (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) In red are areas where a penalty is applied to instances close to the line C = 1
  • 228. (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) In red are areas where a penalty is applied to instances close to the line In green are areas where no penalty is applied C = 1
  • 229. (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) In red are areas where a penalty is applied to instances close to the line In green are areas where no penalty is applied The total amount of penalty applied to the classifier is called the loss Classifiers try to minimise the loss by adjusting their parameters C = 1
  • 230. (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) In red are areas where a penalty is applied to instances close to the line In green are areas where no penalty is applied The total amount of penalty applied to the classifier is called the loss Classifiers try to minimise the loss by adjusting their parameters C = 1 This instance increases the penalty
  • 231. (plot: Pixel #215 vs Pixel #213, axes from 0 to 254) The total amount of penalty applied to the classifier is called the loss Classifiers try to minimise the loss by adjusting their parameters Now it is in a green area In red are areas where a penalty is applied to instances close to the line In green are areas where no penalty is applied C = 1
  • 232. Parameter tuning. Algorithm and its hyper-parameters: K-nearest neighbour: K, the number of neighbours (1,…,100). Decision Tree: metric ('gini', 'information gain'). Random Forest: number of trees (3,…,100; more is better) and metric ('gini', 'information gain'). SVM: C (10⁻⁵,…,10²) and gamma (10⁻¹⁵,…,10²)
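A small sketch of what the C hyper-parameter trades off (the blob data is made up, and the ranges in the table above are typical search grids rather than fixed rules): a small C tolerates points near or inside the margin, while a large C penalises them heavily and fits the training data more tightly.

```python
# Sketch only: the effect of C on a linear SVM fitted to noisy, overlapping blobs.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)
for C in [1e-3, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(C, "support vectors:", len(clf.support_vectors_),
          "training accuracy:", clf.score(X, y))
```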
  • 233.
  • 234. Let’s try different C maybe our score will improve
  • 235. Let’s try different C maybe our score will improve Nope…
  • 236. Let’s try different C maybe our score will improve Fail again…
  • 237. Let’s try different C maybe our score will improve It is getting depressive…
  • 238. Let’s try different C maybe our score will improve Hurrah!
  • 239. Let’s try different C maybe our score will improve Hurrah! You may not have noticed but…
  • 240. Let’s try different C maybe our score will improve Hurrah! You may not have noticed but… We are overfitting again…
  • 243. Training 60% For fitting initial model The whole dataset 100%
  • 244. Training 60% For fitting initial model Validation 20% The whole dataset 100%
  • 245. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 5/7
  • 246. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 5/7
  • 247. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 5/7
  • 248. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 5/7
  • 249. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 7/7
  • 250. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 7/7 Testing 20%
  • 251. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 7/7 Testing 20% For one-shot evaluation of the trained model 5/5
  • 252. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 7/7 Testing 20% For one-shot evaluation of the trained model 5/5 But what happens when you overfit the validation set?
  • 253. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation Testing 20% For one-shot evaluation of the trained model 5/5 You're doing great! 🙂
  • 254. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation Testing 20% For one-shot evaluation of the trained model 5/5 You're doing great! 🙂
  • 255. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation Testing 20% For one-shot evaluation of the trained model 4/5 You're doing great! 🙂
  • 256. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation Testing 20% For one-shot evaluation of the trained model 4/5 You're doing great! 🙂 😒
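The 60/20/20 split above can be produced with two calls (the dataset and the random seed are assumptions): tune on the validation part, and keep the test part untouched until the very end.

```python
# Sketch only: 60% train / 20% validation / 20% test.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
print(len(X_train), len(X_val), len(X_test))   # roughly 60% / 20% / 20% of the data
```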
  • 257. The whole dataset 100% Cross Validation (CV) Algorithm
  • 258. Training data 80% Cross Validation (CV) Algorithm Test 20%
  • 259. Training data 80% Cross Validation (CV) Algorithm
  • 260. 20% 20% 20% 20% Training data 80% Cross Validation (CV) Algorithm
  • 261. Training data 80% Cross Validation (CV) Algorithm 20% 20% 20% 20% Train on 60% of data Validate on 20%
  • 262. 20% 20% 20% 20% Training data 80% Cross Validation (CV) Algorithm [Train Train Train Val] Train on 60% of data Validate on 20%
  • 263. Cross Validation (CV) Algorithm 20% 20% 20% 20% Training data 80% [Train Train Train Val]: 0.75 Train on 60% of data Validate on 20%
  • 264. Cross Validation (CV) Algorithm 20% 20% 20% 20% Training data 80% [Train Train Train Val]: 0.75 [Val Train Train Train]: 0.85
  • 265. 20% 20% 20% 20% Training data 80% Cross Validation (CV) Algorithm [Train Train Train Val]: 0.75 [Val Train Train Train]: 0.85 [Train Val Train Train]: 0.91
  • 266. Cross Validation (CV) Algorithm 20% 20% 20% 20% Training data 80% [Train Train Train Val]: 0.75 [Val Train Train Train]: 0.85 [Train Val Train Train]: 0.91 [Train Train Val Train]: 0.68
  • 267. Cross Validation (CV) Algorithm 20% 20% 20% 20% Training data 80% [Train Train Train Val]: 0.75 [Val Train Train Train]: 0.85 [Train Val Train Train]: 0.91 [Train Train Val Train]: 0.68 MEAN (0.75, 0.85, 0.91, 0.68) = ?
  • 268. Cross Validation (CV) Algorithm 20% 20% 20% 20% Training data 80% [Train Train Train Val]: 0.75 [Val Train Train Train]: 0.85 [Train Val Train Train]: 0.91 [Train Train Val Train]: 0.68 MEAN (0.75, 0.85, 0.91, 0.68) ≈ 0.80
  • 269. Cross Validation (CV) Algorithm 20% 20% 20% 20% Training data 80% [Train Train Train Val]: 0.75 [Val Train Train Train]: 0.85 [Train Val Train Train]: 0.91 [Train Train Val Train]: 0.68 MEAN (0.75, 0.85, 0.91, 0.68) ≈ 0.80 Choose the best model/parameters based on this estimate and then apply it to the test set
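The same 4-fold procedure with scikit-learn (a sketch: the dataset, the SVC model and its C value are assumptions, and the fold scores shown on the slides are illustrative):

```python
# Sketch only: 4-fold cross-validation on the training data, with the test set kept aside.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scores = cross_val_score(SVC(C=1), X_train, y_train, cv=4)   # one score per fold
print(scores, scores.mean())   # pick the model/parameters with the best mean, then test once
```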
  • 272. Raw Data Preprocessing Machine Learning pipeline
  • 274. Raw Data Preprocessing Feature extraction Split into train & test Machine Learning pipeline
  • 275. Raw Data Preprocessing Feature extraction Split into train & test test set Machine Learning pipeline
  • 276. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Machine Learning pipeline
  • 277. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Machine Learning pipeline
  • 278. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Machine Learning pipeline
  • 279. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Train the model on the whole training set Machine Learning pipeline
  • 280. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Train the model on the whole training set Evaluate final model on the test set test set Machine Learning pipeline
  • 281. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Train the model on the whole training set Evaluate final model on the test set test set Machine Learning pipeline Report your results
  • 282. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Train the model on the whole training set Evaluate final model on the test set test set Machine Learning pipeline Report your results Problem
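An end-to-end sketch of the pipeline above (the model choice, parameter grid and dataset are assumptions, not the slides' own setup): split off a test set, search for parameters with cross-validation on the training part, refit the best model on the whole training set, and evaluate it exactly once on the test set.

```python
# Sketch only: the full pipeline from data to a single final test-set score.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)                              # raw data / features already extracted
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = make_pipeline(StandardScaler(), SVC())                   # preprocessing + chosen model
grid = GridSearchCV(model, {"svc__C": [0.1, 1, 10],
                            "svc__gamma": ["scale", 0.01]}, cv=5)
grid.fit(X_train, y_train)                                       # best parameters found by CV,
                                                                 # then refit on the whole training set
print(grid.best_params_)
print(grid.score(X_test, y_test))                                # one-shot evaluation on the test set
```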
  • 283. A machine learning algorithm usually corresponds to a combination of the following 3 elements: The choice of a specific mapping function family F (K-NN, SVM, DT, RF, Neural Networks etc.).
  • 284. A machine learning algorithm usually corresponds to a combination of the following 3 elements: A way to evaluate the quality of a function f out of F, i.e. a way of saying how badly or how well this function f is doing at classifying real-world objects. The choice of a specific mapping function family F (K-NN, SVM, DT, RF, Neural Networks etc.).
  • 285. A machine learning algorithm usually corresponds to a combination of the following 3 elements: A way to search for a better function f out of F, i.e. how to choose parameters so that the performance of f improves. A way to evaluate the quality of a function f out of F, i.e. a way of saying how badly or how well this function f is doing at classifying real-world objects. The choice of a specific mapping function family F (K-NN, SVM, DT, RF, Neural Networks etc.).
  • 287.
  • 288. References • Machine Learning by Andrew Ng (https://www.coursera.org/learn/machine-learning) • Introduction to Machine Learning by Pascal Vincent given at Deep Learning Summer School, Montreal 2015 (http://videolectures.net/deeplearning2015_vincent_machine_learning/) • Welcome to Machine Learning by Konstantin Tretyakov delivered at AACIMP Summer School 2015 (http://kt.era.ee/lectures/aacimp2015/1-intro.pdf) • Stanford CS class: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy (http://cs231n.github.io/) • Data Mining Course by Jaak Vilo at University of Tartu (https://courses.cs.ut.ee/MTAT.03.183/2017_spring/uploads/Main/DM_05_Clustering.pdf) • Machine Learning Essential Concepts by Ilya Kuzovkin (https://www.slideshare.net/iljakuzovkin) • From the brain to deep learning and back by Raul Vicente Zafra and Ilya Kuzovkin (http://www.uttv.ee/naita?id=23585&keel=eng)
  • 290.