This document discusses backpropagation, an algorithm for supervised learning of artificial neural networks using gradient descent. It provides a definition and history of backpropagation, and explains how to use it with three main points:
1) It uses the chain rule to calculate derivatives of the cost with respect to weights in different layers, which are then used to update those weights.
2) Preparations include defining a cost function and deriving the derivative of the commonly used sigmoid activation function.
3) The update for a weight depends on derivatives from the layers between it and the output, and both forward and backward paths must be considered to calculate some of these derivatives. Gradient descent is then applied to renew the weights.
1. What is Backpropagation?
1. Definition
Backpropagation is an algorithm for supervised learning of artificial neural networks using gradient descent.
2. History
The backpropagation algorithm was developed in the 1970s, but in 1986 Rumelhart, Hinton and Williams showed experimentally that this method can generate useful internal representations of incoming data in the hidden layers of neural networks.
3. How to use Backpropagation?
Backpropagation consists of applying the simple chain rule. However, we often use non-linear functions as activation functions, which makes the derivatives more involved.
(In this document, I will use the sigmoid function as the activation function.)
2. Preparations
1. Cost function (loss function)
I will use the cost function below ($y_o$ is the value of the hypothesis, $y_t$ is the true value):

$$C = \frac{1}{2}(y_o - y_t)^2$$
2. Derivative of the sigmoid function

$$\frac{dS(z)}{dz} = \frac{1}{(1+e^{-z})^2} \times (-1) \times (-e^{-z}) = S(z)\big(1 - S(z)\big)$$

$$\because \; S(z) = \frac{1}{1+e^{-z}}, \qquad \left(\frac{1}{f(z)}\right)' = \frac{-f'(z)}{f(z)^2}$$
3. How to renew weights using gradient descent

$$w_{h-1,h}^{\,new} = w_{h-1,h}^{\,old} - \eta\,\frac{\partial C}{\partial w_{h-1,h}} \qquad (\eta \text{ is the learning rate})$$
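The renewal rule above can be sketched in a few lines of Python. All numeric values here (the old weight, the gradient, and the learning rate) are made up purely for illustration, and the function name is my own:

```python
# One gradient-descent step on a single weight:
# w_new = w_old - eta * dC/dw
def update_weight(w_old, grad, eta):
    return w_old - eta * grad

# Illustrative values only: w_old = 0.5, dC/dw = 0.2, eta = 0.1.
w = update_weight(0.5, grad=0.2, eta=0.1)  # 0.5 - 0.1 * 0.2 = 0.48
```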
3. Jump to the Backpropagation
1. Derivative Relationship between weights
1-1. The weight update depends on derivatives that reside in previous layers. (The word "previous" here means located to the right, i.e. closer to the output.)
$$C = \frac{1}{2}(y_o - y_t)^2$$

$$\Rightarrow \frac{\partial C}{\partial w_{2,3}} = (y_o - y_t) \times \frac{\partial y_o}{\partial w_{2,3}} = (y_o - y_t) \times \frac{\partial}{\partial w_{2,3}}\big[\sigma\{z^{(3)}\}\big] \qquad (\sigma \text{ is the sigmoid function})$$

$$\frac{\partial C}{\partial w_{2,3}} = (y_o - y_t) \times \sigma\{z^{(3)}\} \times \big[1 - \sigma\{z^{(3)}\}\big] \times \frac{\partial z_3}{\partial w_{2,3}} = (y_o - y_t) \times \sigma\{z^{(3)}\}\big[1 - \sigma\{z^{(3)}\}\big] \times \frac{\partial}{\partial w_{2,3}}\big(a^{(2)} w_{2,3}\big)$$

$$\therefore \; \frac{\partial C}{\partial w_{2,3}} = (y_o - y_t)\,\sigma\{z^{(3)}\}\big[1 - \sigma\{z^{(3)}\}\big]\, a^{(2)}$$
[Figure: chain network in → 1 → 2 → 3 with weights $w_{in,1}$, $w_{1,2}$, $w_{2,3}$; each node $h$ computes $z^{(h)}$ and $a^{(h)} = \sigma(z^{(h)})$, with $a^{(3)} = y_o$. Feed forward runs left to right, backpropagation right to left.]
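The derivation above can be checked numerically on a tiny chain network with one unit per layer. This is only a sketch: the weight values, input, and target are made-up illustrative numbers, and the function names are my own. The chain-rule gradient for $w_{2,3}$ should match a finite difference of the cost:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(a_in, w_in1, w12, w23):
    # Chain network: in -> 1 -> 2 -> 3, with a sigmoid at each node.
    a1 = sigmoid(a_in * w_in1)
    a2 = sigmoid(a1 * w12)
    z3 = a2 * w23
    a3 = sigmoid(z3)
    return a2, z3, a3

# Illustrative values only.
a_in, w_in1, w12, w23, y_t = 1.0, 0.4, 0.3, 0.6, 1.0

a2, z3, y_o = forward(a_in, w_in1, w12, w23)
s3 = sigmoid(z3)
# dC/dw23 = (y_o - y_t) * sigma(z3) * [1 - sigma(z3)] * a2
grad = (y_o - y_t) * s3 * (1.0 - s3) * a2

# Numerical check: perturb w23 and difference the cost C = 1/2 (y_o - y_t)^2.
h = 1e-6
def cost(w):
    _, _, y = forward(a_in, w_in1, w12, w)
    return 0.5 * (y - y_t) ** 2
numeric = (cost(w23 + h) - cost(w23 - h)) / (2 * h)
```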
1-1 (continued). The same dependence extends through every layer between the weight and the output.
$$C = \frac{1}{2}(y_o - y_t)^2$$

$$\Rightarrow \frac{\partial C}{\partial w_{in,1}} = (y_o - y_t) \times \frac{\partial y_o}{\partial w_{in,1}} = (y_o - y_t) \times \frac{\partial}{\partial w_{in,1}}\big[\sigma\{z^{(3)}\}\big] \qquad (\sigma \text{ is the sigmoid function})$$

Using the same steps as before, we get the equation below:

$$\frac{\partial C}{\partial w_{in,1}} = (y_o - y_t)\,\sigma\{z^{(3)}\}\big[1-\sigma\{z^{(3)}\}\big]\, w_{2,3}\,\sigma\{z^{(2)}\}\big[1-\sigma\{z^{(2)}\}\big]\, w_{1,2}\,\sigma\{z^{(1)}\}\big[1-\sigma\{z^{(1)}\}\big]\, a_{in}$$
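The full product for the input-side weight can be verified the same way. Again, every number below is an illustrative assumption and the helper names are mine; the point is that the chain of sigmoid-derivative and weight factors matches a numerical gradient:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sp(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1 - s)

def forward(a_in, w_in1, w12, w23):
    z1 = a_in * w_in1; a1 = sigmoid(z1)
    z2 = a1 * w12;     a2 = sigmoid(z2)
    z3 = a2 * w23;     a3 = sigmoid(z3)
    return z1, a1, z2, a2, z3, a3

# Illustrative values only.
a_in, w_in1, w12, w23, y_t = 1.0, 0.4, 0.3, 0.6, 1.0
z1, a1, z2, a2, z3, y_o = forward(a_in, w_in1, w12, w23)

# dC/dw_in1 = (y_o - y_t) * s'(z3) * w23 * s'(z2) * w12 * s'(z1) * a_in
grad = (y_o - y_t) * sp(z3) * w23 * sp(z2) * w12 * sp(z1) * a_in

# Numerical check against a central finite difference.
h = 1e-6
def cost(w):
    *_, y = forward(a_in, w, w12, w23)
    return 0.5 * (y - y_t) ** 2
numeric = (cost(w_in1 + h) - cost(w_in1 - h)) / (2 * h)
```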
1-2. The weight update depends on derivatives that reside on both paths.
Getting this result requires more tedious calculation than the previous one, so I will just write the result here. (The full calculation process is on the next slide.)
$$\frac{\partial C}{\partial w_{in,1}} = (y_o - y_t)\,\sigma\{z^{(4)}\}\big[1-\sigma\{z^{(4)}\}\big]\, a_{in}\,\Big[\,\sigma\{z^{(2)}\}\big[1-\sigma\{z^{(2)}\}\big]\, w_{2,4}\,\sigma\{z^{(1)}\}\big[1-\sigma\{z^{(1)}\}\big]\, w_{1,2} \;+\; \sigma\{z^{(3)}\}\big[1-\sigma\{z^{(3)}\}\big]\, w_{3,4}\,\sigma\{z^{(1)}\}\big[1-\sigma\{z^{(1)}\}\big]\, w_{1,3}\,\Big]$$
[Figure: network in → 1, then 1 → 2 and 1 → 3 in parallel, both feeding node 4, with weights $w_{in,1}$, $w_{1,2}$, $w_{1,3}$, $w_{2,4}$, $w_{3,4}$; each node $h$ computes $z^{(h)}$ and $a^{(h)} = \sigma(z^{(h)})$, with $a^{(4)} = y_o$. Feed forward runs left to right.]
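The two-path sum can also be checked numerically. In the sketch below (weights, input, and target are all made-up illustrative numbers, and the dictionary keys are my own naming), the gradient includes the output node's own sigmoid derivative, since $y_o = \sigma(z^{(4)})$, and sums the contributions of the path through node 2 and the path through node 3:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sp(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1 - s)

def forward(a_in, w):
    # in -> 1; 1 -> 2 and 1 -> 3; both 2 and 3 feed node 4 (the output).
    z1 = a_in * w["in,1"]; a1 = sigmoid(z1)
    z2 = a1 * w["1,2"];    a2 = sigmoid(z2)
    z3 = a1 * w["1,3"];    a3 = sigmoid(z3)
    z4 = a2 * w["2,4"] + a3 * w["3,4"]
    return z1, z2, z3, z4, sigmoid(z4)

# Illustrative values only.
w = {"in,1": 0.4, "1,2": 0.3, "1,3": -0.2, "2,4": 0.6, "3,4": 0.5}
a_in, y_t = 1.0, 1.0

z1, z2, z3, z4, y_o = forward(a_in, w)
# Sum over both paths (through node 2 and through node 3):
grad = (y_o - y_t) * sp(z4) * a_in * (
    sp(z2) * w["2,4"] * sp(z1) * w["1,2"]
    + sp(z3) * w["3,4"] * sp(z1) * w["1,3"]
)

# Numerical check against a central finite difference on w_in,1.
h = 1e-6
def cost(w_in1):
    w2 = dict(w)
    w2["in,1"] = w_in1
    *_, y = forward(a_in, w2)
    return 0.5 * (y - y_t) ** 2
numeric = (cost(w["in,1"] + h) - cost(w["in,1"] - h)) / (2 * h)
```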
1-3. The derivative for a weight does not depend on the derivatives of any of the other weights in the same layer.
This one is easy, so I will not explain it here. (Homework!)
[Figure: network with inputs $x_1, x_2$, hidden nodes 3, 4, 5, and output node 6; first-layer weights $w_{13}, w_{14}, w_{15}, w_{23}, w_{24}, w_{25}$ (layer $w^{(1)}$) and second-layer weights $w_{36}, w_{46}, w_{56}$ (layer $w^{(2)}$). Weights within a layer are independent.]
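As a hint for the homework, the claim can be illustrated on the pictured 2-3-1 network: each second-layer gradient factors as the output error signal times that weight's own input activation, so it involves no other weight from the same layer. All numbers below are made up for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny 2-3-1 network matching the figure: inputs x1, x2 -> nodes 3, 4, 5 -> node 6.
x = [1.0, -0.5]
w_hidden = {3: [0.1, 0.2], 4: [-0.3, 0.4], 5: [0.5, -0.1]}  # w_13..w_25
w_out = {3: 0.6, 4: -0.2, 5: 0.3}                            # w_36, w_46, w_56
y_t = 1.0

# Feed forward.
a = {j: sigmoid(sum(wi * xi for wi, xi in zip(ws, x))) for j, ws in w_hidden.items()}
z6 = sum(w_out[j] * a[j] for j in w_out)
y_o = sigmoid(z6)

# Output error signal: delta_6 = (y_o - y_t) * sigma(z6) * (1 - sigma(z6)).
delta6 = (y_o - y_t) * y_o * (1 - y_o)

# dC/dw_j6 = delta_6 * a_j: each gradient uses only its OWN input activation,
# never another same-layer weight.
grads = {j: delta6 * a[j] for j in w_out}
```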
2. Application of Gradient descent
$$w_{h-1,h}^{\,new} = w_{h-1,h}^{\,old} - \eta\,\frac{\partial C}{\partial w_{h-1,h}} \qquad (\eta \text{ is the learning rate})$$
① First, we initialize the weights and biases with an initializer → we know it!
② We can control the learning rate → we know it!
③ We can get this value through the equations derived above → we know it!
Then we can renew the weights using the equation above. But isn't it too difficult to apply directly?
So, we will define the Error Signal for simpler application.
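Before moving to error signals, here is what the three ingredients look like together in a minimal training loop. This is only a sketch for a single sigmoid unit with made-up input, target, and hyperparameters; repeated renewal of the weight should drive the cost down:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Minimal loop for one sigmoid unit: y_o = sigmoid(w * x).
# (1) initialize the weight, (2) pick a learning rate eta,
# (3) compute dC/dw via the chain rule, then renew w each step.
x, y_t = 1.5, 0.8   # illustrative input and target
w = 0.0             # (1) initial weight
eta = 0.5           # (2) learning rate

costs = []
for _ in range(200):
    y_o = sigmoid(w * x)
    costs.append(0.5 * (y_o - y_t) ** 2)
    grad = (y_o - y_t) * y_o * (1 - y_o) * x   # (3) dC/dw
    w = w - eta * grad                          # renew the weight
```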
3. Error Signals
1-1. Definition: $\delta_j = \dfrac{\partial C}{\partial z_j}$
1-2. General form of signals

$$\delta_j = \frac{\partial C}{\partial z_j} = \frac{\partial}{\partial z_j}\,\frac{1}{2}(y_o - y_t)^2 = (y_o - y_t)\,\frac{\partial y_o}{\partial z_j} \qquad \text{------- ①}$$

$$\frac{\partial y_o}{\partial z_j} = \frac{\partial y_o}{\partial a_j}\,\frac{\partial a_j}{\partial z_j} = \frac{\partial y_o}{\partial a_j} \times \sigma'(z_j) \qquad (\because\; a_j = \sigma(z_j))$$

Because a neural network consists of multiple units, we have to account for all of the units $k \in \text{outs}(j)$ that node $j$ feeds into. So,

$$\frac{\partial y_o}{\partial z_j} = \sigma'(z_j) \sum_{k \in \text{outs}(j)} \frac{\partial y_o}{\partial z_k}\,\frac{\partial z_k}{\partial a_j}$$

$$\frac{\partial y_o}{\partial z_j} = \sigma'(z_j) \sum_{k \in \text{outs}(j)} \frac{\partial y_o}{\partial z_k}\, w_{jk} \qquad \Big(\because\; z_k = \sum_j w_{jk}\, a_j\Big)$$
By equation ① above and $\delta_k = (y_o - y_t)\,\dfrac{\partial y_o}{\partial z_k}$:

$$\delta_j = (y_o - y_t)\,\sigma'(z_j) \sum_{k \in \text{outs}(j)} \frac{\partial y_o}{\partial z_k}\, w_{jk} = \sigma'(z_j) \sum_{k \in \text{outs}(j)} \delta_k\, w_{jk}$$

$$\therefore \; \delta_j = \sigma'(z_j) \sum_{k \in \text{outs}(j)} \delta_k\, w_{jk}, \quad \text{and for starting, we define } \delta_{initial} = (y_o - y_t)\,\sigma\{z^{(initial)}\}\big[1 - \sigma\{z^{(initial)}\}\big]$$
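The error-signal recursion makes the backward pass mechanical: start from the output delta and propagate it back, then each weight's gradient is its delta times its input activation. Below is a sketch on the earlier chain network (in → 1 → 2 → 3); all weight and target values are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sp(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1 - s)

# Illustrative values only: chain network in -> 1 -> 2 -> 3.
a_in, y_t = 1.0, 0.0
w = [0.4, 0.3, 0.6]   # w_in1, w_12, w_23

# Feed forward: collect each z and activation.
zs, acts = [], [a_in]
for wi in w:
    zs.append(acts[-1] * wi)
    acts.append(sigmoid(zs[-1]))
y_o = acts[-1]

# Backward pass: delta_initial at the output, then
# delta_j = sigma'(z_j) * delta_k * w_jk down the chain.
deltas = [0.0] * 3
deltas[2] = (y_o - y_t) * sp(zs[2])
for j in (1, 0):
    deltas[j] = sp(zs[j]) * deltas[j + 1] * w[j + 1]

# dC/dw_j = delta_j * (activation feeding into node j).
grads = [deltas[j] * acts[j] for j in range(3)]
```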
4. Summarize
Although the picture below is a bit different from my description, a few calculations will show you that it is exactly the same as my explanation.
(Picture source: http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html)