Derivation of Backpropagation
1 Introduction
Figure 1: Neural network processing
Conceptually, a network forward propagates activation to produce an output, and it backward propagates error to determine weight changes (as shown in Figure 1). The weights on the connections between neurons mediate the values passed in both directions.
The backpropagation algorithm is used to learn the weights of a multilayer neural network with a fixed architecture. It performs gradient descent to try to minimize the sum-squared error between the network's output values and the given target values.
Figure 2 depicts the network components which affect a particular weight change. Notice that
all the necessary components are locally related to the weight being updated. This is one feature
of backpropagation that seems biologically plausible. However, brain connections appear to be
unidirectional and not bidirectional as would be required to implement backpropagation.
2 Notation
For the purpose of this derivation, we will use the following notation:
• The subscript k denotes the output layer.
• The subscript j denotes the hidden layer.
• The subscript i denotes the input layer.
Figure 2: The change to a hidden-to-output weight depends on error (depicted as a lined pattern) at the output node and activation (depicted as a solid pattern) at the hidden node, while the change to an input-to-hidden weight depends on error at the hidden node (which in turn depends on error at all the output nodes) and activation at the input node.
• $w_{kj}$ denotes a weight from the hidden to the output layer.
• $w_{ji}$ denotes a weight from the input to the hidden layer.
• a denotes an activation value.
• t denotes a target value.
• net denotes the net input.
3 Review of Calculus Rules
$$\frac{d(e^u)}{dx} = e^u \frac{du}{dx}$$

$$\frac{d(g + h)}{dx} = \frac{dg}{dx} + \frac{dh}{dx}$$

$$\frac{d(g^n)}{dx} = n\,g^{n-1} \frac{dg}{dx}$$
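As a quick sanity check (not part of the original notes), the following sketch verifies each rule symbolically on concrete example functions, assuming SymPy is available.

```python
# Sketch: symbolic check of the three rules on concrete example functions
# (illustrative only; assumes SymPy is installed).
import sympy as sp

x = sp.symbols('x')
u = x**2           # arbitrary inner function for the exponential rule
g = sp.sin(x)
h = sp.cos(x)
n = 3

# d(e^u)/dx = e^u du/dx
assert sp.simplify(sp.diff(sp.exp(u), x) - sp.exp(u) * sp.diff(u, x)) == 0
# d(g + h)/dx = dg/dx + dh/dx
assert sp.simplify(sp.diff(g + h, x) - (sp.diff(g, x) + sp.diff(h, x))) == 0
# d(g^n)/dx = n g^(n-1) dg/dx
assert sp.simplify(sp.diff(g**n, x) - n * g**(n - 1) * sp.diff(g, x)) == 0
print("All three rules hold for these examples.")
```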
4 Gradient Descent on Error
We can motivate the backpropagation learning algorithm as gradient descent on sum-squared error (we square the error because we are interested in its magnitude, not its sign). The total error in a network is given by the following equation (the $\frac{1}{2}$ will simplify things later).

$$E = \frac{1}{2} \sum_k (t_k - a_k)^2$$
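As a concrete illustration (a sketch with invented numbers, not from the original notes), $E$ can be computed directly from the output activations and targets:

```python
# Sketch: E = 1/2 * sum_k (t_k - a_k)^2 for hypothetical outputs and targets.
import numpy as np

a = np.array([0.8, 0.2, 0.6])   # hypothetical output activations a_k
t = np.array([1.0, 0.0, 1.0])   # hypothetical target values t_k

E = 0.5 * np.sum((t - a) ** 2)
print(E)                        # 0.5 * (0.04 + 0.04 + 0.16) = 0.12
```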
We want to adjust the network’s weights to reduce this overall error.
$$\Delta W \propto -\frac{\partial E}{\partial W}$$
We will begin at the output layer with a particular weight.
$$\Delta w_{kj} \propto -\frac{\partial E}{\partial w_{kj}}$$
However, the error is not directly a function of a weight, so we expand this derivative using the chain rule as follows.
$$\Delta w_{kj} = -\varepsilon \frac{\partial E}{\partial a_k} \frac{\partial a_k}{\partial net_k} \frac{\partial net_k}{\partial w_{kj}}$$
Let’s consider each of these partial derivatives in turn. Note that only one term of the E summation
will have a non-zero derivative: the one associated with the particular weight we are considering.
4.1 Derivative of the error with respect to the activation
$$\frac{\partial E}{\partial a_k} = \frac{\partial \left(\frac{1}{2}(t_k - a_k)^2\right)}{\partial a_k} = -(t_k - a_k)$$
Now we see why the $\frac{1}{2}$ in the $E$ term was useful.
4.2 Derivative of the activation with respect to the net input
For a sigmoid (logistic) unit the activation is $a_k = (1 + e^{-net_k})^{-1}$, so:

$$\frac{\partial a_k}{\partial net_k} = \frac{\partial (1 + e^{-net_k})^{-1}}{\partial net_k} = \frac{e^{-net_k}}{(1 + e^{-net_k})^2}$$
We’d like to be able to rewrite this result in terms of the activation function. Notice that:
$$1 - \frac{1}{1 + e^{-net_k}} = \frac{e^{-net_k}}{1 + e^{-net_k}}$$
Using this fact, we can rewrite the result of the partial derivative as:

$$\frac{\partial a_k}{\partial net_k} = a_k(1 - a_k)$$
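This identity is easy to confirm numerically. The sketch below (illustrative, not from the original notes) compares a finite-difference estimate of the sigmoid's slope against $a_k(1 - a_k)$.

```python
# Sketch: check that d(sigmoid)/d(net) equals a * (1 - a) numerically.
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

net = 0.7                                        # arbitrary net input
a = sigmoid(net)
analytic = a * (1.0 - a)                         # the closed-form result above
eps = 1e-6
numeric = (sigmoid(net + eps) - sigmoid(net - eps)) / (2 * eps)

print(analytic, numeric)                         # agree to about 1e-10
```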
4.3 Derivative of the net input with respect to a weight
Note that only one term of the net summation will have a non-zero derivative: again the one
associated with the particular weight we are considering.
$$\frac{\partial net_k}{\partial w_{kj}} = \frac{\partial (w_{kj} a_j)}{\partial w_{kj}} = a_j$$
4.4 Weight change rule for a hidden-to-output weight
Now substituting these results back into our original equation we have:
$$\Delta w_{kj} = \varepsilon \overbrace{(t_k - a_k)\,a_k(1 - a_k)}^{\delta_k} \, a_j$$
Notice that this looks very similar to the Perceptron Training Rule. The only difference is the inclusion of the derivative of the activation function. This equation is typically simplified as shown below, where the $\delta$ term represents the product of the error with the derivative of the activation function.

$$\Delta w_{kj} = \varepsilon \delta_k a_j$$
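To make the rule concrete, here is a small vectorized sketch (all names and numbers are invented for illustration) that computes $\delta_k$ and the weight change for every hidden-to-output connection at once.

```python
# Sketch: hidden-to-output updates, delta_w[k, j] = eps * delta_k[k] * a_hidden[j].
import numpy as np

eps = 0.5                                   # learning rate (epsilon)
a_hidden = np.array([0.4, 0.9])             # hypothetical hidden activations a_j
a_out = np.array([0.7, 0.3])                # hypothetical output activations a_k
t = np.array([1.0, 0.0])                    # target values t_k

delta_k = (t - a_out) * a_out * (1 - a_out)      # error times activation derivative
delta_w_kj = eps * np.outer(delta_k, a_hidden)   # shape: (num outputs, num hidden)
print(delta_w_kj)
```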
4.5 Weight change rule for an input-to-hidden weight
Now we have to determine the appropriate weight change for an input-to-hidden weight. This is more complicated because it depends on the error at all of the nodes this weighted connection can lead to.
$$\Delta w_{ji} \propto -\left[\sum_k \frac{\partial E}{\partial a_k} \frac{\partial a_k}{\partial net_k} \frac{\partial net_k}{\partial a_j}\right] \frac{\partial a_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ji}}$$

$$= \varepsilon \left[\sum_k \overbrace{(t_k - a_k)\,a_k(1 - a_k)}^{\delta_k} \, w_{kj}\right] a_j(1 - a_j)\, a_i$$

$$= \varepsilon \overbrace{\left[\sum_k \delta_k w_{kj}\right] a_j(1 - a_j)}^{\delta_j} \, a_i$$

$$\Delta w_{ji} = \varepsilon \delta_j a_i$$
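Putting the two rules together, the following sketch (a minimal example with invented weights and inputs, assuming sigmoid units throughout) runs one forward pass and one backpropagation step on a tiny 2-3-2 network.

```python
# Sketch: one forward/backward pass on a tiny 2-3-2 sigmoid network, applying
# delta_w_kj = eps * delta_k * a_j and delta_w_ji = eps * delta_j * a_i.
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(0)
eps = 0.5                                   # learning rate (epsilon)

a_in = np.array([0.05, 0.10])               # input activations a_i
t = np.array([1.0, 0.0])                    # target values t_k

w_ji = rng.normal(size=(3, 2))              # input-to-hidden weights
w_kj = rng.normal(size=(2, 3))              # hidden-to-output weights

# Forward pass
a_hid = sigmoid(w_ji @ a_in)                # hidden activations a_j
a_out = sigmoid(w_kj @ a_hid)               # output activations a_k
print("error before update:", 0.5 * np.sum((t - a_out) ** 2))

# Backward pass
delta_k = (t - a_out) * a_out * (1 - a_out)
delta_j = (w_kj.T @ delta_k) * a_hid * (1 - a_hid)

w_kj += eps * np.outer(delta_k, a_hid)      # hidden-to-output updates
w_ji += eps * np.outer(delta_j, a_in)       # input-to-hidden updates
```

Repeating this update over many training examples is exactly the gradient-descent procedure derived above.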