Linear Regression, Costs & Gradient Descent
Pallavi Mishra & Revanth Kumar
Introduction to Linear Regression
• Linear regression is a predictive model that maps the relation between a dependent variable and one or more independent variables.
• It is a supervised learning method for regression problems, i.e. it predicts a real-valued output.
• The prediction is made by forming a hypothesis based on the learning algorithm.
$\hat{Y} = \theta_0 + \theta_1 x_1$ (single independent variable)
$\hat{Y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k$ (multiple independent variables)
$\hat{Y} = \sum_{i=0}^{k} \theta_i x_i$, where $x_0 = 1$ .......(1)
where $\theta_i$ = parameter for the $i$-th independent variable.
To estimate the performance of the linear model, the SSE is used:
Squared Sum Error (SSE) $= \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, summed over the $n$ training instances.
Note: Here, $Y$ is the actual observed output and $\hat{Y}$ is the predicted output.
[Figure: training points with the fitted hypothesis line, marking the actual output ($Y$), the predicted output ($\hat{Y}$), and the error between them.]
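As a concrete check of equation (1) and the SSE, here is a minimal sketch in Python (the data and parameter values are made up for illustration):

    import numpy as np

    # Toy data: n = 4 instances, k = 2 independent variables
    X = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])
    Y = np.array([3.1, 3.9, 6.2, 9.0])

    # Parameters theta_0..theta_k; prepend x_0 = 1 to each instance, as in eq. (1)
    theta = np.array([0.5, 1.2, 0.8])
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])

    Y_hat = X1 @ theta                 # hypothesis: sum over theta_i * x_i
    sse = np.sum((Y - Y_hat) ** 2)     # Squared Sum Error (SSE)
    print(Y_hat, sse)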
Model Representation
[Fig. 1: Model representation of linear regression — the training set feeds the learning algorithm, which produces the hypothesis ($\hat{Y}$); an unknown independent value fed to the hypothesis yields the estimated output value.]
Hint: Gradient descent as learning algorithm
How to Represent Hypothesis?
• We know the hypothesis is represented by $\hat{Y}$, which can be formulated for single-variable linear regression (univariate linear regression) or for multivariate linear regression.
• $\hat{Y} = \theta_0 + \theta_1 x_1$
• Here, $\theta_0$ = intercept, $\theta_1$ = slope $= \frac{\Delta y}{\Delta x}$, and $x_1$ = independent variable.
• The question arises: how do we choose the $\theta_i$ values for the best-fitting hypothesis?
• Idea: choose $\theta_0, \theta_1$ so that $\hat{Y}$ is close to $Y$ for our training examples $(x, y)$.
• Objective: minimize $J(\theta_0, \theta_1)$.
• Note: $J(\theta_0, \theta_1)$ is the cost function.
• Formulation: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(\hat{Y}^{(i)} - Y^{(i)}\right)^2$
Note: $m$ = number of instances in the dataset.
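A minimal sketch of this cost function in Python (the function name and toy data are illustrative, not from the slides):

    import numpy as np

    def cost(theta0, theta1, x, y):
        # J(theta0, theta1) = (1/2m) * sum_i (Y_hat_i - Y_i)^2
        m = len(x)
        y_hat = theta0 + theta1 * x    # univariate hypothesis
        return np.sum((y_hat - y) ** 2) / (2 * m)

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.2, 1.9, 3.2])
    print(cost(0.0, 1.0, x, y))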
Objective function for linear regression
• The most important objective of the linear regression model is to minimize the cost function by choosing optimal values for $\theta_0, \theta_1$.
• As the optimization technique, gradient descent is the one most commonly used for predictive models.
• Fixing $\theta_0 = 0$ and sweeping $\theta_1$ over a range of values (in the univariate case), the graph of $\theta_1$ vs $J(\theta_1)$ takes a bow shape.
Advantage of gradient descent in the linear regression model
• There is no way to get stuck in a local optimum, since the graph is convex and there is only one global optimum, where the slope of $J(\theta_1)$ is zero.
[Plot: the convex, bow-shaped curve of $J(\theta_1)$ against $\theta_1$.]
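The bow shape is easy to reproduce by sweeping $\theta_1$; a sketch assuming matplotlib is available (the data are made up):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = 2.0 * x                        # data lying exactly on y = 2x

    thetas = np.linspace(-1, 5, 100)   # sweep theta_1 with theta_0 fixed at 0
    J = [np.sum((t * x - y) ** 2) / (2 * len(x)) for t in thetas]

    plt.plot(thetas, J)                # convex curve with its minimum at theta_1 = 2
    plt.xlabel("theta_1")
    plt.ylabel("J(theta_1)")
    plt.show()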
Normal Distribution $N(\mu, \sigma^2)$
Estimation of mean ($\mu$) and variance ($\sigma^2$):
• Let the size of the data set be $n$, with observations denoted $y_1, y_2, \dots, y_n$.
• Assume $y_1, y_2, \dots, y_n$ are independent and identically distributed (iid) normally distributed random variables.
• Assuming there are no independent variables ($x$), in order to estimate the future value of $y$ we need to find the unknown parameters ($\mu$ & $\sigma^2$).
Concept of Maximum Likelihood Estimation:
• Using the Maximum Likelihood Estimation (MLE) concept, we try to find the optimal values of the mean ($\mu$) and the standard deviation ($\sigma$) of a distribution, given a set of observed measurements.
• The goal of MLE is to find the optimal way to fit a distribution to the data, so as to work easily with the data.
Continue…
Estimation of $\mu$ & $\sigma^2$:
• Density of a normal random variable: $f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(y-\mu)^2}$
$L(\mu, \sigma^2)$ is the joint density. Now let
$L(\mu, \sigma^2) = f(y_1, y_2, \dots, y_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(y_i-\mu)^2}$
Letting $\sigma^2 = \theta$,
$L(\mu, \theta) = \frac{1}{(\sqrt{2\pi\theta})^n}\, e^{-\frac{1}{2\theta}\sum_{i=1}^{n}(y_i-\mu)^2}$
Taking the log on both sides:
$LL(\mu, \theta) = \log\left((2\pi\theta)^{-n/2}\right) + \log\left(e^{-\frac{1}{2\theta}\sum_{i=1}^{n}(y_i-\mu)^2}\right)$   (* $LL(\mu, \theta)$ denotes the log of the joint density)
$= -\frac{n}{2}\log(2\pi\theta) - \frac{1}{2\theta}\sum_{i=1}^{n}(y_i-\mu)^2$ .......(2)   (* using $\log_e e^x = x$)
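As a numeric sanity check on equation (2), the closed form should equal the direct sum of the log-densities; a sketch (SciPy's norm.logpdf is used only for the comparison):

    import numpy as np
    from scipy.stats import norm

    y = np.array([1.2, 0.7, 2.3, 1.8, 0.9])
    mu, theta = 1.0, 0.5               # theta = sigma^2

    # Closed form (2): -n/2 * log(2*pi*theta) - (1/(2*theta)) * sum((y - mu)^2)
    n = len(y)
    ll_closed = -n / 2 * np.log(2 * np.pi * theta) - np.sum((y - mu) ** 2) / (2 * theta)

    # Direct sum of the log-densities
    ll_direct = np.sum(norm.logpdf(y, loc=mu, scale=np.sqrt(theta)))

    print(np.isclose(ll_closed, ll_direct))   # True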
Continue…
• Our objective is to estimate the next occurrence of a data point $y$ in the distribution of the data. Using MLE we can find the optimal values of $(\mu, \sigma^2)$: for a given training set we need to find the maximum of $LL(\mu, \theta)$.
• Let us assume $\theta = \sigma^2$ for simplicity.
• Now we take partial derivatives to find the optimal values of $(\mu, \sigma^2)$, setting each derivative to zero: $LL' = 0$.
$LL(\mu, \theta) = -\frac{n}{2}\log(2\pi\theta) - \frac{1}{2\theta}\sum_{i=1}^{n}(y_i-\mu)^2$
• Taking the partial derivative of eq. (2) with respect to $\mu$, we get
$LL'_{\mu} = 0 - \frac{2}{2\theta}\sum_{i=1}^{n}(y_i - \mu)(-1)$   (* $LL'_{\mu}$ is the partial derivative of $LL$ w.r.t. $\mu$)
$\Rightarrow \sum_{i=1}^{n}(y_i - \mu) = 0$
$\Rightarrow \sum_{i=1}^{n} y_i = n\mu$
Continue…
$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} y_i$   (* $\hat{\mu}$ is the estimated mean value)
Again taking the partial derivative of eq. (2), now with respect to $\theta$:
$LL'_{\theta} = -\frac{n}{2}\cdot\frac{1}{2\pi\theta}\cdot 2\pi - \left(-\frac{1}{2\theta^2}\right)\sum_{i=1}^{n}(y_i - \mu)^2$
Setting the above to zero, we get
$\frac{1}{2\theta^2}\sum_{i=1}^{n}(y_i - \mu)^2 = \frac{n}{2}\cdot\frac{1}{\theta}$
Finally, this leads to the solution
$\hat{\sigma}^2 = \hat{\theta} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \mu)^2$   (* $\hat{\sigma}^2$ is the estimated variance)
After plugging in the estimate of $\hat{\mu}$:
$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y})^2$, with $\hat{y} = \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} y_i$
Continue…
• The above estimate can be generalized to $\hat{\sigma}^2 = \frac{1}{n}\sum \text{error}^2$, where $\text{error} = y - \hat{y}$.
• Finally, we have estimated the mean and the variance in order to predict the future occurrence of the $y$ ($\hat{y}$) data points.
• Therefore the best estimate of the next $y$ ($\hat{y}$) likely to occur is $\hat{\mu}$, and the solution is arrived at using the SSE ($\hat{\sigma}^2$):
$\hat{\sigma}^2 = \frac{1}{n}\sum \text{error}^2$
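A minimal numeric sketch of these two estimators (note that the derivation gives the 1/n MLE normalization for the variance, not the 1/(n−1) sample variance):

    import numpy as np

    y = np.array([2.1, 1.8, 2.5, 2.2, 1.9, 2.4])

    mu_hat = y.mean()                  # mu_hat = (1/n) * sum(y_i)
    errors = y - mu_hat                # error = y - y_hat, with y_hat = mu_hat
    sigma2_hat = np.mean(errors ** 2)  # sigma2_hat = (1/n) * sum(error^2)

    print(mu_hat, sigma2_hat)
    # np.var also uses the 1/n (MLE) normalization by default
    assert np.isclose(sigma2_hat, np.var(y))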
Optimization & Derivatives
$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{k} x_{ij}\,\theta_j\right)^2$
$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix};\quad X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1k} \\ x_{21} & x_{22} & \dots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \dots & x_{nk} \end{bmatrix};\quad \theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_k \end{bmatrix}$
$\sum_{j=1}^{k} x_{ij}\,\theta_j$ is simply the multiplication of the $i$-th row of the matrix $X$ with the vector $\theta$. Hence
$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(Y - X\theta\right)_i^2$
Continue…
$J(\theta) = \frac{1}{2n}(Y - \hat{Y})'(Y - \hat{Y})$, since $\hat{Y} = X\theta$
$= \frac{1}{2n}(Y - X\theta)'(Y - X\theta)$
$= \frac{1}{2n}\left(Y'Y - Y'X\theta - \theta'X'Y + \theta'X'X\theta\right)$
Now, taking the derivative with respect to $\theta$:
$\frac{\partial J}{\partial \theta} = \frac{1}{2n}\left(0 - 2X'Y + 2X'X\theta\right)$
$= -\frac{2}{2n}\left(X'Y - X'X\theta\right)$
$= -\frac{1}{n}\,X'\left(Y - X\theta\right)$
$= -\frac{1}{n}\,X'(Y - \hat{Y})$
$\nabla_\theta J(\theta) = \frac{1}{n}\,X'(\hat{Y} - Y)$
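A sketch of this gradient in NumPy, with a finite-difference check (X, Y, and theta are toy values; the prime above denotes the transpose):

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5]])
    Y = np.array([3.0, 2.5, 5.0])
    theta = np.array([0.5, 1.0])
    n = X.shape[0]

    Y_hat = X @ theta                  # Y_hat = X theta
    grad = X.T @ (Y_hat - Y) / n       # (1/n) X'(Y_hat - Y)

    def J(t):
        return np.sum((Y - X @ t) ** 2) / (2 * n)

    eps = 1e-6
    t = theta.copy()
    t[0] += eps
    print(grad[0], (J(t) - J(theta)) / eps)   # the two values should nearly match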
How to start with Gradient Descent
• The basic idea is to start at a random position $x_0$ and evaluate the derivative there.
• 1st case: if the derivative value > 0, the function is increasing.
• Action: update the $\theta_1$ value using the gradient descent formula:
• $\theta_1 := \theta_1 - \alpha\,\frac{d\,J(\theta_1)}{d\theta_1}$
• Here, $\alpha$ = learning rate / parameter.
Gradient Descent algorithm
• Repeat until convergence { $\theta_1 := \theta_1 - \alpha\,\frac{d\,J(\theta_1)}{d\theta_1}$ }   (here assuming $\theta_0 = 0$ for univariate linear regression)
For multivariate linear regression:
• Repeat until convergence { $\theta_j := \theta_j - \alpha\,\frac{\partial\,J(\theta_0, \theta_1)}{\partial\theta_j}$ }
Simultaneous update of $\theta_0, \theta_1$:
temp0 $:= \theta_0 - \alpha\,\frac{\partial\,J(\theta_0, \theta_1)}{\partial\theta_0}$
temp1 $:= \theta_1 - \alpha\,\frac{\partial\,J(\theta_0, \theta_1)}{\partial\theta_1}$
$\theta_0 :=$ temp0
$\theta_1 :=$ temp1
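A compact sketch of this loop with the simultaneous update (the data, learning rate, and fixed iteration count are illustrative; the fixed count stands in for "repeat until convergence"):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])     # roughly y = 2x
    m = len(x)

    theta0, theta1, alpha = 0.0, 0.0, 0.05
    for _ in range(2000):                  # stand-in for "repeat until convergence"
        y_hat = theta0 + theta1 * x
        # partial derivatives of J(theta0, theta1) = (1/2m) sum((y_hat - y)^2)
        d0 = np.sum(y_hat - y) / m
        d1 = np.sum((y_hat - y) * x) / m
        # simultaneous update: compute both temps before assigning either theta
        temp0 = theta0 - alpha * d0
        temp1 = theta1 - alpha * d1
        theta0, theta1 = temp0, temp1

    print(theta0, theta1)                  # theta1 should approach ~2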
Effects associated with varying values of the learning rate ($\alpha$)
[Figure: gradient descent steps for small vs. large values of $\alpha$.]
Continue…
• In the first case, we may find it difficult to reach the global optimum, since a large value of $\alpha$ may overshoot the optimal position due to aggressive updating of the $\theta$ values.
• As we approach the optimal position, however, gradient descent automatically takes smaller steps, because the derivative itself shrinks toward zero even with a fixed $\alpha$.
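The overshoot is easy to reproduce; a sketch on toy data comparing a moderate and an aggressive $\alpha$ (the specific values are illustrative and tied to this data's scale):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = 2.0 * x
    m = len(x)

    def run(alpha, steps=20):
        t = 0.0                            # theta_1, with theta_0 fixed at 0
        for _ in range(steps):
            t -= alpha * np.sum((t * x - y) * x) / m   # gradient step
        return t

    print(run(0.05))   # converges toward the optimum theta_1 = 2
    print(run(0.30))   # too large for this data: overshoots and diverges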
Conclusion
• The cost function for linear regression is always going to be a bow-shaped (convex) function.
• This function has no local optima other than the one global optimum.
• Therefore, with a cost function of the type $J(\theta_0, \theta_1)$, which we get whenever we use linear regression, gradient descent will always converge to the global optimum.
• What matters most is making sure our gradient descent algorithm is working properly.
• As the number of iterations increases, the value of $J(\theta_0, \theta_1)$ should decrease after every iteration.
• Determining an automatic convergence test is difficult because we don't know the right threshold value in advance.
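One practical sketch of such a test: declare convergence when the per-iteration decrease in $J$ falls below a chosen tolerance. The tolerance here is an arbitrary assumption, which is exactly the difficulty the last bullet points out:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])
    m = len(x)

    theta0 = theta1 = 0.0
    alpha, tol = 0.05, 1e-9                # tol is an arbitrary choice

    def J(t0, t1):
        return np.sum((t0 + t1 * x - y) ** 2) / (2 * m)

    prev = J(theta0, theta1)
    for i in range(100000):
        y_hat = theta0 + theta1 * x
        theta0, theta1 = (theta0 - alpha * np.sum(y_hat - y) / m,
                          theta1 - alpha * np.sum((y_hat - y) * x) / m)
        cur = J(theta0, theta1)
        if prev - cur < tol:               # cost barely decreased: stop
            break
        prev = cur

    print(i, theta0, theta1)               # J decreased on every iteration up to here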