INTRO TO MACHINE LEARNING
150 MIN
5.0
DMYTRO FISHMAN
UNIVERSITY OF TARTU
INSTITUTE OF COMPUTER SCIENCE
New York City Taxi Fare Prediction
https://www.kaggle.com/c/new-york-city-taxi-fare-prediction
[Figure: scatter plot of y against x; x axis from -0.75 to 1.00, y axis from -0.8 to 0.6]
Type in your browser: tinyurl.com/yxb5k5jl (save a copy to your drive)
The following slides are inspired by the video “An Introduction to Linear Regression Analysis”
https://youtu.be/zPG4NjIkCjc
Linear Regression
[Figure: the independent variable X on the horizontal axis and the dependent variable y on the vertical axis]
How does a change in the independent variable influence the dependent variable?
Positive relationship: y increases as X increases.
Negative relationship: y decreases as X increases.
In order to build a linear regression we need observations.
We want to find a line such that it minimises the sum of squared errors, where each error is the difference between the actual and the estimated value of y.
Linear Regression
Least squares method: we want to find the line that minimises the sum of squared errors.
$$\text{Regression line} = \arg\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Linear Regression
[Figure: fare amount (y) plotted against distance (X)]
$$\hat{y} = w_0 + w_1 x$$
$$\arg\min_{w_0, w_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
The regression line minimises the sum of squared errors with respect to $w_0$ and $w_1$.
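The step from this objective to the formulas used in the worked example is not shown on the slides. A short sketch of the standard derivation follows; the symbol $E$ for the sum of squared errors is introduced here only for illustration, and $\bar{x}$, $\bar{y}$ denote the means of the observations.

```latex
\begin{aligned}
E(w_0, w_1) &= \sum_{i=1}^{n} (y_i - w_0 - w_1 x_i)^2 \\
\frac{\partial E}{\partial w_0} = 0 &\;\Rightarrow\; w_0 = \bar{y} - w_1 \bar{x} \\
\frac{\partial E}{\partial w_1} = 0 &\;\Rightarrow\; w_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```

The first condition says that the fitted line always passes through the point $(\bar{x}, \bar{y})$.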
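Linear Regression (example)
[Figure: fare amount (y) against distance (X) for five observations, with the fitted line crossing the y axis at 2.2]
x    y    x - x̄    y - ȳ    (x - x̄)^2    (x - x̄)(y - ȳ)
1    2    -2       -2       4            4
2    4    -1        0       1            0
3    5     0        1       0            0
4    4     1        0       1            0
5    5     2        1       4            2
Means: x̄ = 3, ȳ = 4. Column sums: Σ(x - x̄)^2 = 10, Σ(x - x̄)(y - ȳ) = 6.
$$w_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{6}{10} = 0.6$$
Substituting the means into $\hat{y} = w_0 + w_1 x$ gives $4 = w_0 + 0.6 \cdot 3$, so $w_0 = 2.2$.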
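The worked example can be checked with a few lines of plain Python. This is a minimal sketch rather than the code from the Colab notebook; the function name `fit_line` is made up for illustration.

```python
# Observations from the worked example: distance (x) and fare amount (y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def fit_line(xs, ys):
    """Return (w0, w1) of the least squares line y_hat = w0 + w1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of (x - x_mean) * (y - y_mean): the last column of the table.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Sum of (x - x_mean)^2: the second-to-last column of the table.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return w0, w1


w0, w1 = fit_line(xs, ys)
print(round(w0, 2), round(w1, 2))  # → 2.2 0.6
```

The two sums inside `fit_line` are the column totals 6 and 10 from the table.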
Let’s return to our Colabs
Decision Tree Algorithm
By asking a simple question about the value of the independent variable, it tries to predict the value of the dependent variable.
[Figure: fare amount (y) against distance (X) for the five observations of the example]
Root node: Is distance > X?
Left child (False): fare amount = Y
Right child (True): fare amount = Z
The left and right children are the leaves of the tree.
Decision Tree Algorithm
Here, X may correspond to any vertical line. For example, if X = 2.5:
Is distance > 2.5?
False: fare amount = Y
True: fare amount = Z
[Figure: the five observations with a vertical line at distance = 2.5]
What are the most reasonable values for Y and Z (those that minimise the total MSE)?
Decision Tree Algorithm
What would the MSE be if Y = 4 and Z = 5?
Is distance > 2.5?
False: fare amount = 4
True: fare amount = 5
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Here $y_i$ is the real value and $\hat{y}_i$ is the predicted value of observation $i$.
$$\mathrm{MSE} = \frac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + (y_3 - \hat{y}_3)^2 + (y_4 - \hat{y}_4)^2 + (y_5 - \hat{y}_5)^2}{5}$$
$$\mathrm{MSE} = \frac{(2 - 4)^2 + (4 - 4)^2 + (5 - 5)^2 + (4 - 5)^2 + (5 - 5)^2}{5} = \frac{4 + 0 + 0 + 1 + 0}{5} = \frac{5}{5} = 1$$
So, if X = 2.5, Y = 4 and Z = 5, the MSE is 1.
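The same calculation can be written as a small function, which makes it easy to try other leaf values. This is an illustrative sketch; the helper names `stump_predict`, `mse` and `stump_mse` are not from the slides.

```python
# The five observations of the example: distance (x) and fare amount (y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def stump_predict(x, threshold, left_value, right_value):
    """Answer 'Is distance > threshold?' and return the matching leaf value."""
    return right_value if x > threshold else left_value


def mse(ys, predictions):
    """Mean squared error between real and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


def stump_mse(threshold, left_value, right_value):
    """MSE of a one-question tree on the example data."""
    predictions = [stump_predict(x, threshold, left_value, right_value) for x in xs]
    return mse(ys, predictions)


print(stump_mse(2.5, 4, 5))  # X = 2.5, Y = 4, Z = 5 → 1.0
print(stump_mse(2.5, 3, 5))  # X = 2.5, Y = 3, Z = 5 → 0.6
```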
Decision Tree Algorithm
Can we find better Y and Z? Try Y = 3 and Z = 5:
Is distance > 2.5?
False: fare amount = 3
True: fare amount = 5
$$\mathrm{MSE} = \frac{(2 - 3)^2 + (4 - 3)^2 + (5 - 5)^2 + (4 - 5)^2 + (5 - 5)^2}{5} = \frac{1 + 1 + 0 + 1 + 0}{5} = \frac{3}{5} = 0.6$$
So, if X = 2.5, Y = 3 and Z = 5, the MSE is 0.6.
Decision Tree Algorithm
What are the most reasonable values for Y and Z (those that minimise the total MSE)?
Is distance > 2.5?
False: fare amount = 3
True: fare amount = 4.67
Y = 3 is the mean of the fares on the False side (2 and 4); Z = 14/3 ≈ 4.67 is the mean of the fares on the True side (5, 4 and 5).
$$\mathrm{MSE} = \frac{(2 - 3)^2 + (4 - 3)^2 + (5 - \tfrac{14}{3})^2 + (4 - \tfrac{14}{3})^2 + (5 - \tfrac{14}{3})^2}{5} = \frac{1 + 1 + \tfrac{1}{9} + \tfrac{4}{9} + \tfrac{1}{9}}{5} \approx \frac{2.67}{5} \approx 0.53$$
So, if Y = 3 and Z = 4.67, the MSE for the split at 2.5 is smallest.
Are we happy?
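For a fixed split, the leaf values that minimise the squared error are the means of the fares on each side. A minimal sketch of that rule on the example data, with hypothetical helper names:

```python
# The five observations of the example: distance (x) and fare amount (y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def mean(values):
    return sum(values) / len(values)


def best_leaf_values(threshold):
    """Mean fare on the False side (x <= threshold) and on the True side."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return mean(left), mean(right)


def split_mse(threshold):
    """MSE of the split when both leaves predict their side's mean."""
    left_value, right_value = best_leaf_values(threshold)
    errors = [
        (y - (right_value if x > threshold else left_value)) ** 2
        for x, y in zip(xs, ys)
    ]
    return sum(errors) / len(errors)


y_leaf, z_leaf = best_leaf_values(2.5)
print(y_leaf, round(z_leaf, 2), round(split_mse(2.5), 2))  # → 3.0 4.67 0.53
```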
Decision Tree Algorithm
Is distance > 2.5?
False: fare amount = 3
True: fare amount = 4.67
Hold on, how did we choose this split in the first place? Maybe there are better options?
Decision Tree Algorithm
Is distance > X?
False: fare amount = Y
True: fare amount = Z
What are the possible split options in this case?
Candidate values of X: 0.5, 1.5, 2.5, 3.5, 4.5, 5.5
Are 0.5 and 5.5 meaningful? No: each of them leaves all five observations on one side of the split.
How do we compare the remaining candidates 1.5, 2.5, 3.5 and 4.5?
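The remaining candidates are the midpoints between neighbouring distances. One way to generate them is sketched below; the function name `candidate_thresholds` is an assumption made for illustration.

```python
# Distances of the five observations in the example.
xs = [1, 2, 3, 4, 5]


def candidate_thresholds(xs):
    """Midpoints between consecutive distinct x values.

    Every returned threshold leaves at least one observation on each side,
    so values such as 0.5 and 5.5 are never produced.
    """
    values = sorted(set(xs))
    return [(a + b) / 2 for a, b in zip(values, values[1:])]


print(candidate_thresholds(xs))  # → [1.5, 2.5, 3.5, 4.5]
```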
Decision Tree Algorithm
How do we compare the remaining splits? For each one we can compute the MSE, using the mean fare on each side as Y and Z:
X      Y       Z       MSE
1.5    2       4.5     (0 + 0.25 + 0.25 + 0.25 + 0.25)/5 = 0.2
2.5    3       4.67    0.53
3.5    3.67    4.5     1.03
4.5    3.75    5       0.95
We choose the split that minimises the total MSE.
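The comparison of the four splits can be reproduced in code. This is a sketch under the assumption, taken from the preceding slides, that each leaf predicts the mean fare of its side; the names `split_mse`, `scores` and `best` are illustrative.

```python
# The five observations of the example: distance (x) and fare amount (y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def mean(values):
    return sum(values) / len(values)


def split_mse(xs, ys, threshold):
    """MSE over all points when each side predicts its own mean fare."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    squared_errors = sum((y - mean(left)) ** 2 for y in left)
    squared_errors += sum((y - mean(right)) ** 2 for y in right)
    return squared_errors / len(ys)


thresholds = [1.5, 2.5, 3.5, 4.5]
scores = {t: split_mse(xs, ys, t) for t in thresholds}
best = min(scores, key=scores.get)

for t in thresholds:
    print(t, round(scores[t], 2))  # → 0.2, 0.53, 1.03, 0.95 in turn
print("best split:", best)  # → best split: 1.5
```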
Decision Tree Algorithm
Thus, the resulting tree (split at 1.5, MSE = 0.2):
Is distance > 1.5?
  False: fare amount = 2
  True: fare amount = 4.5
Can we make our decision tree more accurate? Yes, by going deeper!
Is distance > 1.5?
  False: fare amount = 2
  True: Is distance > X?
    False: fare amount = Y
    True: fare amount = Z
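The slides leave X, Y and Z of the second question open. A common approach, assumed here rather than taken from the slides, is to repeat the same split search inside each child until a depth limit is reached. The sketch below does that in plain Python; `build_tree`, `predict` and `tree_mse` are hypothetical names, and this is not the implementation used in the Colab notebook.

```python
# The five observations of the example as (distance, fare amount) pairs.
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared differences from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(points, depth):
    """Return a leaf value (float) or a dict describing one question."""
    targets = [y for _, y in points]
    distinct = sorted({x for x, _ in points})
    if depth == 0 or len(distinct) < 2:
        return mean(targets)  # leaf: predict the mean fare
    best = None
    for a, b in zip(distinct, distinct[1:]):
        threshold = (a + b) / 2
        left = [p for p in points if p[0] <= threshold]
        right = [p for p in points if p[0] > threshold]
        score = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    _, threshold, left, right = best
    return {
        "threshold": threshold,
        "false": build_tree(left, depth - 1),
        "true": build_tree(right, depth - 1),
    }


def predict(tree, x):
    """Follow the questions down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["true"] if x > tree["threshold"] else tree["false"]
    return tree


def tree_mse(tree):
    return sum((y - predict(tree, x)) ** 2 for x, y in points) / len(points)


shallow = build_tree(points, depth=1)
deep = build_tree(points, depth=2)
print(shallow)
print(round(tree_mse(shallow), 2), round(tree_mse(deep), 2))  # → 0.2 0.13
```

With one question the sketch recovers the split at 1.5 with leaves 2 and 4.5. Allowing a second question lowers the MSE on these five points from 0.2 to about 0.13.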
Let’s return to our Colabs
Overfitting
[Figure: two models fitted to the same five observations of fare amount (y) against distance (X)]
Simple, but imperfect VS Complicated, but ideal
Train/val split
Initial dataset
Models compared: simple, but imperfect; complicated, but ideal
Train dataset (randomly select 60%): MSE = 1.0 and MSE = 0.0
Validation (val) dataset (randomly select 40%): MSE = 2.5 and MSE = 0.5
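A random 60/40 split and the two MSE measurements can be sketched as follows. Everything in this block is an assumption made for illustration: the data points are made up, the "simple" model is a least squares line, and the "complicated" model simply memorises the training points and answers with the fare of the nearest known distance.

```python
import random


def train_val_split(points, train_fraction=0.6, seed=0):
    """Shuffle a copy of the points and cut it into train and val parts."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_line(points):
    """Simple model: least squares line through the training points."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    w1 = sum((x - x_mean) * (y - y_mean) for x, y in points) / sum(
        (x - x_mean) ** 2 for x, _ in points
    )
    w0 = y_mean - w1 * x_mean
    return lambda x: w0 + w1 * x


def fit_memoriser(points):
    """Complicated model: return the fare of the closest training distance."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, points):
    return sum((y - model(x)) ** 2 for x, y in points) / len(points)


# Hypothetical observations (distance, fare amount), not from the slides.
points = [(1, 2.0), (2, 4.1), (3, 4.9), (4, 4.2), (5, 5.3),
          (6, 6.8), (7, 6.1), (8, 8.2), (9, 8.9), (10, 9.4)]

train, val = train_val_split(points)
simple = fit_line(train)
complicated = fit_memoriser(train)

for name, model in [("simple", simple), ("complicated", complicated)]:
    print(name, "train MSE:", mse(model, train), "val MSE:", mse(model, val))
```

The memorising model always reaches a train MSE of 0.0, so only the validation MSE shows how well it handles observations it has not seen.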
POINTS
1. MACHINE LEARNING MODEL IS NOT MAGIC
2. YOU CAN SAVE AND LOAD ML MODELS
3. EVALUATING MODEL PERFORMANCE IS IMPORTANT
4. YOU MAY NEED TO RETRAIN YOUR MODELS
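Point 2 can be illustrated with Python's standard `pickle` module. This is only a sketch: the model here is a plain dictionary holding the coefficients from the linear regression example, the file name `model.pkl` is arbitrary, and the Colab notebook may save its models differently.

```python
import os
import pickle
import tempfile

# A stand-in "model": the coefficients fitted in the linear regression example.
model = {"w0": 2.2, "w1": 0.6}


def predict(model, x):
    """Predict the fare amount for a given distance."""
    return model["w0"] + model["w1"] * x


# Save the model to a file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back, for example in another session, and use it for prediction.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(predict(loaded, 3))
```

Only load pickle files from sources you trust, because unpickling can execute arbitrary code.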
THANK YOU

 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 

Kürzlich hochgeladen (20)

Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 

Introduction to Machine Learning for Taxify/Bolt

  • 1. INTRO TO MACHINE LEARNING 150 MIN 5.0 DMYTRO FISHMAN UNIVERSITY OF TARTU INSTITUTE OF COMPUTER SCIENCE
  • 2. New York City Taxi Fare Prediction https://www.kaggle.com/c/new-york-city-taxi-fare-prediction
  • 3. [Figure: scatter plot of y against x.] Type in your browser: tinyurl.com/yxb5k5jl (save a copy to your drive)
  • 4. The following slides are inspired by “An Introduction to Linear Regression Analysis” video https://youtu.be/zPG4NjIkCjc
  • 6. Linear Regression [plot: independent variable X on the horizontal axis, dependent variable y on the vertical axis]. How does a change in the independent variable influence the dependent variable?
  • 9-10. Linear Regression. To build a linear regression, we need observations.
  • 12-14. Linear Regression. We want to find a line that minimises the sum of errors, where each error is the gap between the actual value and the value estimated by the line.
  • 15. Linear Regression. Least squares method: the regression line is the one that minimises the sum of squared errors, $\arg\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
  • 19. Linear Regression [plot: fare amount (y) against distance (X)]. The line is $\hat{y} = w_0 + w_1 x$. Least squares minimises the sum of squared errors with respect to $w_0$ and $w_1$: $\arg\min_{w_0, w_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
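The slides do not spell out how the minimising $w_0$ and $w_1$ are obtained. As a sketch, the standard closed-form result for a single independent variable follows from setting both partial derivatives of the sum of squared errors to zero. The symbol $S$ and the means $\bar{x}$, $\bar{y}$ are notation introduced here for the derivation.

```latex
\begin{aligned}
S(w_0, w_1) &= \sum_{i=1}^{n} (y_i - w_0 - w_1 x_i)^2 \\
\frac{\partial S}{\partial w_0} = -2 \sum_{i=1}^{n} (y_i - w_0 - w_1 x_i) = 0
  &\;\Rightarrow\; w_0 = \bar{y} - w_1 \bar{x} \\
\frac{\partial S}{\partial w_1} = -2 \sum_{i=1}^{n} x_i (y_i - w_0 - w_1 x_i) = 0
  &\;\Rightarrow\; w_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```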
  • 20-21. Linear Regression (example) [plot: fare amount (y) against distance (X) for the five observations below].

        x    y    x - x̄   y - ȳ   (x - x̄)^2   (x - x̄)(y - ȳ)
        1    2    -2      -2      4           4
        2    4    -1       0      1           0
        3    5     0       1      0           0
        4    4     1       0      1           0
        5    5     2       1      4           2
        sum                       10          6

      x̄ = 3, ȳ = 4
      $w_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{6}{10} = 0.6$
      Substituting the means into $\hat{y} = w_0 + w_1 x$: $4 = w_0 + 0.6 \cdot 3$, so $w_0 = 2.2$ and $\hat{y} = 2.2 + 0.6x$.
      Let’s return to our Colabs.
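The worked example can be reproduced in a few lines of plain Python. This is a minimal sketch, not the code from the course Colab; the function name `fit_line` is made up for illustration.

```python
# Five (distance, fare amount) observations from the worked example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def fit_line(xs, ys):
    """Return (w0, w1) of the least-squares line y_hat = w0 + w1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the slope, as in the table columns.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    # The fitted line passes through the point of means.
    w0 = y_mean - w1 * x_mean
    return w0, w1

w0, w1 = fit_line(xs, ys)
print(round(w0, 2), round(w1, 2))  # 2.2 0.6
```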
  • 22-30. Decision Tree Algorithm. By asking a simple question about the value of the independent variable, it tries to predict the value of the dependent variable [plot: fare amount (y) against distance (X)]. The question "Is distance > X?" sits in the root node. If the answer is False, the left child predicts fare amount = Y; if True, the right child predicts fare amount = Z. Both children are leaves.
  • 31-34. Decision Tree Algorithm. Here, X may correspond to any vertical line. For example, if X = 2.5, the question becomes "Is distance > 2.5?". What are the most reasonable values for Y and Z (that minimise total MSE)? As a first guess, what would the MSE be if Y = 4 and Z = 5?
  • 35-44. Decision Tree Algorithm. Split: is distance > 2.5? False: fare amount = 4 (Y = 4). True: fare amount = 5 (Z = 5). With real values $y_i$ and predicted values $\hat{y}_i$: $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{(2 - 4)^2 + (4 - 4)^2 + (5 - 5)^2 + (4 - 5)^2 + (5 - 5)^2}{5} = \frac{4 + 0 + 0 + 1 + 0}{5} = \frac{5}{5} = 1$. So, if X = 2.5, Y = 4 and Z = 5, MSE is 1. Can we find better Y and Z?
  • 45-49. Decision Tree Algorithm. Split: is distance > 2.5? False: fare amount = 3 (Y = 3). True: fare amount = 5 (Z = 5). $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{(2 - 3)^2 + (4 - 3)^2 + (5 - 5)^2 + (4 - 5)^2 + (5 - 5)^2}{5} = \frac{1 + 1 + 0 + 1 + 0}{5} = \frac{3}{5} = 0.6$. So, if X = 2.5, Y = 3 and Z = 5, MSE is 0.6.
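The two hand calculations for the split at 2.5 can be checked with a small helper. This is an illustrative sketch; `stump_mse` is a hypothetical name, and the data are the five points from the slides.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def stump_mse(xs, ys, threshold, left_value, right_value):
    """MSE of a one-question tree.

    Predicts right_value when x > threshold (answer True) and
    left_value otherwise (answer False).
    """
    squared_errors = [
        (y - (right_value if x > threshold else left_value)) ** 2
        for x, y in zip(xs, ys)
    ]
    return sum(squared_errors) / len(squared_errors)

print(stump_mse(xs, ys, 2.5, 4, 5))  # 1.0
print(stump_mse(xs, ys, 2.5, 3, 5))  # 0.6
```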
  • 50-52. Decision Tree Algorithm. Split: is distance > 2.5? False: fare amount = 3 (Y = 3). True: fare amount = 4.66 (Z = 4.66, the mean of the three fares to the right of the split: 5, 4 and 5). $\mathrm{MSE} = \frac{(2 - 3)^2 + (4 - 3)^2 + (5 - 4.66)^2 + (4 - 4.66)^2 + (5 - 4.66)^2}{5} \approx \frac{1 + 1 + 0.12 + 0.43 + 0.12}{5} = \frac{2.67}{5} = 0.53$. So, if Y = 3 and Z = 4.66, MSE is smallest. Are we happy?
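Why the leaf averages give the smallest MSE can be shown in one step. The sketch below uses $c$ for the constant predicted in a leaf and $m$ for the number of points that fall into it; both symbols are introduced here and do not appear on the slides.

```latex
\frac{d}{dc} \sum_{i \in \text{leaf}} (y_i - c)^2
  = -2 \sum_{i \in \text{leaf}} (y_i - c) = 0
\;\Rightarrow\;
c = \frac{1}{m} \sum_{i \in \text{leaf}} y_i
```

For the split at 2.5 this gives $Y = (2 + 4)/2 = 3$ for the left leaf and $Z = (5 + 4 + 5)/3 \approx 4.67$ for the right leaf.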
  • 53-54. Decision Tree Algorithm. Split: is distance > 2.5? False: fare amount = 3. True: fare amount = 4.66. Hold on, how did we choose this split in the first place? Maybe there are better options?
  • 55-60. Decision Tree Algorithm. What are the possible split options in this case? With observations at distances 1, 2, 3, 4 and 5, the candidates for X are 0.5, 1.5, 2.5, 3.5, 4.5 and 5.5. Are 0.5 and 5.5 meaningful? No: they leave all five points on one side, so they do not split the data. How do we compare the remaining ones (1.5, 2.5, 3.5, 4.5)? For each one we can compute the MSE. For X = 2.5 we already have MSE = 0.53.
  • 61-63. Decision Tree Algorithm. Split at X = 1.5: Y = 2, Z = 4.5, MSE = (0 + 0.25 + 0.25 + 0.25 + 0.25)/5 = 0.2. MSE so far: 0.2 for X = 1.5, 0.53 for X = 2.5, still unknown for X = 3.5 and X = 4.5.
  • 64-66. Decision Tree Algorithm. Split at X = 3.5: Y = 3.66, Z = 4.5, MSE = 1.03. MSE so far: 0.2 for X = 1.5, 0.53 for X = 2.5, 1.03 for X = 3.5, still unknown for X = 4.5.
  • 67-69. Decision Tree Algorithm. Split at X = 4.5: Y = 3.75, Z = 5, MSE = 0.95. MSE for every candidate: 0.2 for X = 1.5, 0.53 for X = 2.5, 1.03 for X = 3.5, 0.95 for X = 4.5.
  • 70-72. Decision Tree Algorithm. We choose the split that minimises total MSE. Among X = 1.5 (0.2), 2.5 (0.53), 3.5 (1.03) and 4.5 (0.95), that is X = 1.5. Thus, the resulting tree: is distance > 1.5? False: fare amount = 2. True: fare amount = 4.5. Its MSE is 0.2. Can we make our decision tree more accurate?
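The search over candidate splits can be sketched as a loop over midpoints between neighbouring distances, using the leaf means as Y and Z. This is an illustrative plain-Python version with hypothetical helper names (`mean`, `best_split`), not the course's Colab code.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def mean(values):
    return sum(values) / len(values)

def best_split(xs, ys):
    """Try every midpoint between neighbouring x values.

    Returns (mse, threshold, left_value, right_value) for the split
    with the lowest MSE, where each leaf predicts the mean of its points.
    """
    points = sorted(zip(xs, ys))
    results = []
    for (x_a, _), (x_b, _) in zip(points, points[1:]):
        if x_a == x_b:
            continue  # no threshold fits between equal x values
        threshold = (x_a + x_b) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = sum((y - left_value) ** 2 for y in left)
        sse += sum((y - right_value) ** 2 for y in right)
        results.append((sse / len(points), threshold, left_value, right_value))
    return min(results)  # tuples compare by MSE first

mse, threshold, left_value, right_value = best_split(xs, ys)
print(threshold, left_value, right_value, round(mse, 2))  # 1.5 2.0 4.5 0.2
```

The endpoints 0.5 and 5.5 never appear as candidates because only midpoints between observed distances are tried.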
  • 73-74. Decision Tree Algorithm. Can we make our decision tree more accurate than MSE = 0.2? Yes, by going deeper! The root still asks whether distance > 1.5. The False branch keeps fare amount = 2, while the True branch asks a new question, "distance > X?", whose own False and True leaves predict fare amount = Y and fare amount = Z. Let’s return to our Colabs.
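"Going deeper" amounts to repeating the same split search inside each branch. The sketch below is one possible recursive version, assuming a greedy search and a `max_depth` stopping rule; the dict-based node layout and the function names are choices made here for illustration only.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def mean(values):
    return sum(values) / len(values)

def build_tree(points, max_depth):
    """points is a list of (x, y). Returns a leaf value or a dict node."""
    distinct_xs = sorted({x for x, _ in points})
    # Stop when the depth budget is used up or nothing is left to split.
    if max_depth == 0 or len(distinct_xs) < 2:
        return mean([y for _, y in points])
    best = None
    for x_a, x_b in zip(distinct_xs, distinct_xs[1:]):
        threshold = (x_a + x_b) / 2
        left = [(x, y) for x, y in points if x <= threshold]
        right = [(x, y) for x, y in points if x > threshold]
        sse = 0.0
        for side in (left, right):
            side_mean = mean([y for _, y in side])
            sse += sum((y - side_mean) ** 2 for _, y in side)
        if best is None or sse < best[0]:
            best = (sse, threshold, left, right)
    _, threshold, left, right = best
    return {
        "threshold": threshold,
        "false": build_tree(left, max_depth - 1),   # distance <= threshold
        "true": build_tree(right, max_depth - 1),   # distance > threshold
    }

def predict(tree, x):
    # Walk down the questions until a leaf value is reached.
    while isinstance(tree, dict):
        tree = tree["true"] if x > tree["threshold"] else tree["false"]
    return tree

def mse(tree, xs, ys):
    return sum((y - predict(tree, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

points = list(zip(xs, ys))
for depth in (1, 2, 4):
    print(depth, round(mse(build_tree(points, depth), xs, ys), 3))
```

With `max_depth=1` this reproduces the single split at 1.5. On the points the tree was built from, the MSE can only stay the same or drop as the depth grows.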
  • 75. Overfitting [two plots of fare amount (y) against distance (X)]: a simple but imperfect model vs a complicated but ideal one.
  • 76. Train/val split. From the initial dataset, randomly select 60% as the train dataset and 40% as the validation (val) dataset. MSE values shown for the two models (simple but imperfect; complicated but ideal): 1.0 and 0.0, then 2.5 and 0.5.
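A 60/40 split and the comparison of two models can be sketched with the standard library alone. Everything below is an assumption made for illustration: the data are synthetic (a noisy line), the "simple" model is a least-squares line, and the "complicated" model is a memoriser that returns the fare of the nearest training distance, used as a stand-in for a very deep tree.

```python
import random

def train_val_split(points, train_fraction=0.6, seed=0):
    """Shuffle a copy of the points and cut it into train and validation parts."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def mse(model, points):
    return sum((y - model(x)) ** 2 for x, y in points) / len(points)

def fit_line(points):
    """Least-squares line through (x, y) points, returned as a function."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return lambda x: w0 + w1 * x

# Synthetic data (assumed): fare = 2.2 + 0.6 * distance plus Gaussian noise.
rng = random.Random(42)
distances = [rng.uniform(0, 6) for _ in range(50)]
points = [(x, 2.2 + 0.6 * x + rng.gauss(0, 0.5)) for x in distances]

train, val = train_val_split(points)

simple = fit_line(train)  # fitted on the train part only

def memoriser(x):
    # Returns the fare of the closest train distance, so it is perfect on train.
    return min(train, key=lambda p: abs(p[0] - x))[1]

for name, model in (("simple", simple), ("memoriser", memoriser)):
    print(name, "train MSE:", round(mse(model, train), 3),
          "val MSE:", round(mse(model, val), 3))
```

The memoriser scores an MSE of 0 on the train part by construction. The validation numbers are the ones that indicate how each model would behave on unseen rides.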
  • 78. Points:
      1. A machine learning model is not magic.
      2. You can save and load ML models.
      3. Evaluating model performance is important.
      4. You may need to retrain your models.