Parveen Malik
Assistant Professor
KIIT University
Neural Networks
Backpropagation
Background
• The Perceptron Learning Algorithm and Hebbian Learning can classify input patterns only if the patterns are linearly separable.
• We need an algorithm that can train multiple layers of perceptrons and classify patterns that are not linearly separable.
• The algorithm should also be able to use non-linear activation functions.
[Figure: two scatter plots in the $(x_1, x_2)$ plane showing Class 1 vs. Class 2; one data set is linearly separable, the other is linearly non-separable.]
• Non-linear decision boundaries are needed
• The perceptron algorithm can't be used
• A variation of the gradient-descent (GD) rule is used
• More layers are required
• Non-linear activation functions are required
Perceptron algorithm: $W_i^{new} = W_i^{old} + (t - a)\,x_i$

Gradient descent algorithm: $W_i^{new} = W_i^{old} - \eta\,\dfrac{\partial L}{\partial w_i}$, with loss function $L = \dfrac{1}{2}(t - a)^2$
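As a concrete illustration (a minimal sketch, not from the slides; the helper names, eta, and the derivative argument are assumptions), the two update rules can be written side by side in Python:

```python
import numpy as np

def perceptron_update(w, x, t, a):
    """Perceptron rule: w_new = w_old + (t - a) * x, with t, a the 0/1 target and output."""
    return w + (t - a) * x

def gradient_descent_update(w, x, t, a, f_prime_at_wx, eta=0.1):
    """Gradient-descent rule on L = 0.5*(t - a)^2, where a = f(w.x):
    dL/dw = -(t - a) * f'(w.x) * x (chain rule), so w_new = w_old - eta * dL/dw."""
    dL_dw = -(t - a) * f_prime_at_wx * x
    return w - eta * dL_dw
```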
Background- Back Propagation
• The perceptron learning rule of Frank Rosenblatt and the LMS algorithm of Bernard Widrow and
Marcian Hoff were designed to train single-layer perceptron-like networks.
• Single-layer networks suffer from the disadvantage that they are only able to solve linearly separable
classification problems. Both Rosenblatt and Widrow were aware of these limitations and proposed
multilayer networks that could overcome them, but they were not able to generalize their algorithms
to train these more powerful networks.
• The first description of an algorithm to train multilayer networks was contained in the 1974 thesis of Paul Werbos. The thesis presented the algorithm in the context of general networks, with neural networks as a special case, and was not disseminated in the neural network community.
• It was not until the mid-1980s that the backpropagation algorithm was rediscovered and widely publicized. It was rediscovered independently by David Rumelhart, Geoffrey Hinton and Ronald Williams (1986), by David Parker (1985), and by Yann Le Cun (1985).
• The algorithm was popularized by its inclusion in the book Parallel Distributed Processing [RuMc86], which described the work of the Parallel Distributed Processing Group led by psychologists David Rumelhart and James McClelland.
• The multilayer perceptron, trained by the backpropagation algorithm, is currently the most widely
used neural network.
Network Design
Problem: Decide whether you watch a movie or not.
Step 1: Design – The output can be Yes (1) or No (0), so one neuron (perceptron) is sufficient.
Step 2: Choose a suitable activation function at the output along with a rule to update the weights (hard-limit function for the perceptron learning algorithm, sigmoid for the Widrow–Hoff or delta rule).
$W_i^{new} = W_i^{old} - \eta\,\dfrac{\partial L}{\partial w_i}$

$L = \dfrac{1}{2}(y - \hat{y})^2 = \dfrac{1}{2}\big(y - f(wx + b)\big)^2$

$\dfrac{\partial L}{\partial w} = 2 \cdot \dfrac{1}{2}\big(y - f(wx + b)\big)\cdot\left(-\dfrac{\partial f(wx + b)}{\partial w}\right) = -\,(y - \hat{y})\,f'(wx + b)\,x$
[Figure: a single neuron; input x (director, actor, genre, or IMDB rating), weight w, bias $w_0 = b$ fed by a constant input 1, summation $wx + b$, activation f, and output Yes (1) or No (0).]

$\hat{y} = f(wx + b) = \dfrac{1}{1 + e^{-(wx + b)}}$
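A minimal sketch of this single sigmoid neuron and one gradient-descent step (variable names and the learning rate are illustrative, not taken from the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(w, b, x, y, eta=0.25):
    """One gradient-descent step on L = 0.5*(y - yhat)^2 with yhat = sigmoid(w*x + b)."""
    yhat = sigmoid(w * x + b)
    # sigmoid'(wx+b) = yhat * (1 - yhat), so dL/dw = -(y - yhat) * yhat * (1 - yhat) * x
    grad_w = -(y - yhat) * yhat * (1 - yhat) * x
    grad_b = -(y - yhat) * yhat * (1 - yhat)
    return w - eta * grad_w, b - eta * grad_b
```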
Network Design
Problem: Sort the students into 4 houses based on three qualities: lineage, choice, and ethics.

Step 1: Design – Here the input vector is 3-D, i.e. for each student, $\text{Student 1} = \begin{bmatrix} L_1 \\ C_1 \\ E_1 \end{bmatrix}$, $\text{Student 2} = \begin{bmatrix} L_2 \\ C_2 \\ E_2 \end{bmatrix}$.
[Figure: left, a single neuron N with inputs $x_1, x_2, x_3$, bias input $x_0 = 1$, weights $w_1, w_2, w_3$, $w_0 = b$, and a Yes (1) / No (0) output; right, a layer of two neurons $N_1, N_2$ sharing the inputs $x_1, x_2, x_3$ with weights $w_{ij}$ and biases $b_1, b_2$.]

$\hat{y}_1 = f(w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + b_1)$
$\hat{y}_2 = f(w_{21}x_1 + w_{22}x_2 + w_{23}x_3 + b_2)$

Since each output is binary, the actual output pair $(\hat{y}_1, \hat{y}_2)$ forms a 2-bit code that is compared with the target code $(y_1, y_2)$ assigned to the four houses A, B, C, D. A concrete sketch of this design follows below.
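A minimal sketch of the two-neuron design mapping a 3-D student vector (lineage, choice, ethics) to a 2-bit house code; the weight values and the 0.5 threshold are illustrative assumptions, not from the slide:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[ 0.2, -0.5,  0.7],   # w11, w12, w13  (illustrative values)
              [ 0.4,  0.1, -0.3]])  # w21, w22, w23
b = np.array([0.0, 0.0])            # b1, b2

student = np.array([1.0, 0.0, 1.0])        # [lineage, choice, ethics] for one student
y_hat = sigmoid(W @ student + b)           # (yhat1, yhat2)
house_code = (y_hat >= 0.5).astype(int)    # threshold each output to get a 2-bit code
print(y_hat, house_code)
```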
Network Design
Step 2 : Choosing the activation function and rule to update weights
Loss function: $L = \dfrac{1}{2}(y - \hat{y})^2$

[Figure: the two-neuron layer again, with inputs $x_1, x_2, x_3$ (bias inputs of 1), outputs $\hat{y}_1, \hat{y}_2$, and a table comparing the actual output code with the target code for houses A, B, C, D.]

Weight update rule: $W_{ij}(t+1) = W_{ij}(t) - \eta\,\dfrac{\partial L}{\partial w_{ij}}$

For example, $\dfrac{\partial L}{\partial w_{11}} = -\,(y_1 - \hat{y}_1)\,f'(w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + b_1)\,x_1$
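A sketch of one update step for this two-neuron layer, assuming sigmoid activations and treating each output independently (notation follows the slide; the data values passed in are placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_step(W, b, x, y, eta=0.25):
    """One gradient-descent step on L = 0.5 * sum((y - yhat)^2) for a single layer.
    W: 2x3 weights, b: 2 biases, x: 3-D input, y: 2-D target code."""
    yhat = sigmoid(W @ x + b)
    delta = -(y - yhat) * yhat * (1 - yhat)   # dL/d(pre-activation) per output neuron
    W_new = W - eta * np.outer(delta, x)      # dL/dW_ij = delta_i * x_j
    b_new = b - eta * delta
    return W_new, b_new
```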
Network Architectures (Complex)
[Figure: a fully connected network; input layer $x_1, x_2, \ldots, x_i, \ldots, x_n$, hidden layer $h_1, h_2, \ldots, h_j, \ldots, h_m$, output layer $y_1, y_2, \ldots, y_l, \ldots, y_k$, with weight matrix $W^{(1)}$ between input and hidden layers and $W^{(2)}$ between hidden and output layers.]
Network Architectures (More Complex)
[Figure: a deeper network; four input units, two hidden layers $h_1^{(1)}, h_2^{(1)}, h_3^{(1)}$ and $h_1^{(2)}, h_2^{(2)}, h_3^{(2)}$, two outputs $y_1, y_2$, connected by weight matrices $W^{(1)}, W^{(2)}, W^{(3)}$.]
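For the deeper architecture the forward pass is just repeated matrix multiplication followed by the activation; below is a minimal sketch with illustrative shapes matching the figure (four inputs, two hidden layers of three units, two outputs) and randomly initialized weights, all of which are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
shapes = [(3, 4), (3, 3), (2, 3)]                      # W(1), W(2), W(3)
Ws = [rng.normal(scale=0.1, size=s) for s in shapes]
bs = [np.zeros(s[0]) for s in shapes]

x = np.array([1.0, 0.0, 1.0, 0.5])   # an arbitrary 4-D input
z = x
for W, b in zip(Ws, bs):             # layer by layer: z <- sigma(W z + b)
    z = sigmoid(W @ z + b)
print(z)                             # outputs y1, y2
```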
Back-propagation Algorithm (Generalized Expression)

[Figure: three successive layers of units with pre-activations and activations $(a_l, z_l) \to (a_i, z_i) \to (a_j, z_j)$, connected by weights $W_{il}$ and $W_{ji}$; the error signal flows backwards toward the input layer.]

Cost function: $L = \dfrac{1}{2}(y - \hat{y})^2$, with $z_i = \sigma(a_i)$ and $a_i = \sum_l W_{il}\, z_l$.

Define $\delta_i = \dfrac{\partial L}{\partial a_i}$ and $\delta_j = \dfrac{\partial L}{\partial a_j}$.

Output layer: $\delta_j = \dfrac{\partial L}{\partial a_j} = -\,(y - \hat{y})\,\dfrac{\partial \hat{y}}{\partial a_j}$

Hidden layer (chain rule over the units $j$ that unit $i$ feeds): $\delta_i = \dfrac{\partial L}{\partial a_i} = \sum_j \dfrac{\partial L}{\partial a_j}\,\dfrac{\partial a_j}{\partial a_i}$, where $\dfrac{\partial a_j}{\partial a_i} = \dfrac{\partial a_j}{\partial z_i}\,\dfrac{\partial z_i}{\partial a_i} = W_{ji}\,\sigma'(a_i)$

Hence $\delta_i = \sigma'(a_i)\sum_j \delta_j W_{ji}$, with $\sigma'(a_i) = \sigma(a_i)\big(1 - \sigma(a_i)\big)$ for the sigmoid, and the weight gradient is

$\dfrac{\partial L}{\partial W_{il}} = \delta_i\, z_l$
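A compact sketch of these generalized expressions for a fully connected sigmoid network (the list-of-matrices representation and the variable names are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(weights, biases, x, y):
    """Gradients of L = 0.5*(y - yhat)^2 for a stack of sigmoid layers.
    weights[k] maps layer-k activations to layer-(k+1) pre-activations."""
    zs = [x]                                   # activations, zs[0] = input
    for W, b in zip(weights, biases):          # forward pass
        zs.append(sigmoid(W @ zs[-1] + b))
    # Output delta: dL/da_out = -(y - yhat) * sigma'(a_out), with sigma' = z*(1-z)
    delta = -(y - zs[-1]) * zs[-1] * (1 - zs[-1])
    grads_W, grads_b = [], []
    for k in reversed(range(len(weights))):
        grads_W.insert(0, np.outer(delta, zs[k]))            # dL/dW_il = delta_i * z_l
        grads_b.insert(0, delta)
        if k > 0:                                            # propagate delta backwards
            delta = (weights[k].T @ delta) * zs[k] * (1 - zs[k])
    return grads_W, grads_b
```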
Back-propagation Algorithm
Example: a 2-2-1 network with sigmoid activations.

[Figure: inputs $x_1 = 0$, $x_2 = 1$ (plus constant bias inputs of 1), two hidden units with pre-activations $a_1, a_2$ and activations $\sigma(a_1), \sigma(a_2)$, and one output unit with pre-activation $b_1$ and output $\hat{y} = \sigma(b_1)$. Target: $y = 1$.]

Initial weights and biases:
Layer 1: $W_{11}^{(1)} = 0.6$, $W_{12}^{(1)} = -0.1$, $W_{10}^{(1)} = 0.3$ (bias), $W_{21}^{(1)} = -0.3$, $W_{22}^{(1)} = 0.4$, $W_{20}^{(1)} = 0.5$ (bias)
Layer 2: $W_{11}^{(2)} = 0.4$, $W_{12}^{(2)} = 0.1$, $W_{10}^{(2)} = -0.2$ (bias)

Loss function: $L = \dfrac{1}{2}(y - \hat{y})^2$

Weight update rule: $W^{New} = W^{old} - \eta\,\dfrac{\partial L}{\partial W^{old}}$
Back-propagation Algorithm
Step 1: Forward pass

$a_1 = W_{11}^{(1)}x_1 + W_{12}^{(1)}x_2 + W_{10}^{(1)} = 0.6(0) + (-0.1)(1) + 0.3 = 0.2, \qquad \sigma(a_1) = \dfrac{1}{1 + e^{-0.2}} = 0.5498$

$a_2 = W_{21}^{(1)}x_1 + W_{22}^{(1)}x_2 + W_{20}^{(1)} = -0.3(0) + 0.4(1) + 0.5 = 0.9, \qquad \sigma(a_2) = \dfrac{1}{1 + e^{-0.9}} = 0.7109$

$b_1 = W_{11}^{(2)}\sigma(a_1) + W_{12}^{(2)}\sigma(a_2) + W_{10}^{(2)} = 0.4(0.5498) + 0.1(0.7109) - 0.2 = 0.09101$

$\hat{y} = \sigma(b_1) = 0.5227 \qquad (\text{target } y = 1)$
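A runnable sketch that reproduces this forward pass with the slide's numbers (array layout is an assumption; the printed values match the slide to four decimal places):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[0.6, -0.1],    # W11^(1), W12^(1)
               [-0.3, 0.4]])   # W21^(1), W22^(1)
b1 = np.array([0.3, 0.5])      # W10^(1), W20^(1)
W2 = np.array([0.4, 0.1])      # W11^(2), W12^(2)
b2 = -0.2                      # W10^(2)

x = np.array([0.0, 1.0])       # inputs x1 = 0, x2 = 1
a = W1 @ x + b1                # -> [0.2, 0.9]
z = sigmoid(a)                 # -> [0.5498, 0.7109]
b_out = W2 @ z + b2            # -> 0.09101
y_hat = sigmoid(b_out)         # -> 0.5227
print(a, z, b_out, y_hat)
```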
Back-propagation Algorithm
Step 2: Backpropagation of error

Using the forward-pass values ($\sigma(a_1) = 0.5498$, $\sigma(a_2) = 0.7109$, $\hat{y} = \sigma(b_1) = 0.5227$), apply the generalized expressions
$\dfrac{\partial L}{\partial W_{il}} = \delta_i z_l, \qquad \delta_i = \sigma'(a_i)\sum_j \delta_j W_{ji}$
to this network: imagine the 2-2-1 example relabelled with the generic layers $(a_l, z_l) \to (a_i, z_i) \to (a_j, z_j)$ and weights $W_{il}, W_{ji}$.
Back-propagation Algorithm
[Figure: the example network with hidden activations $z_1 = \sigma(a_1)$, $z_2 = \sigma(a_2)$, output pre-activation $b_{out}$, and output $z_{out} = \sigma(b_{out}) = \hat{y}$.]

Cost/Error function: $L = \dfrac{1}{2}(y - \hat{y})^2$, with $z_i = \sigma(a_i)$ and $\sigma'(a_i) = \sigma(a_i)\big(1 - \sigma(a_i)\big)$.

Output delta:
$\delta_{out} = \dfrac{\partial L}{\partial b_{out}} = -\,(y - \hat{y})\,\dfrac{\partial \hat{y}}{\partial b_{out}} = -\,(y - z_{out})\,\sigma'(b_{out}) = -\,(y - z_{out})\,\sigma(b_{out})\big(1 - \sigma(b_{out})\big)$

Hidden deltas:
$\delta_1 = \sigma'(a_1)\,\delta_{out}\,W_{11}^{(2)} = -\,\sigma(a_1)\big(1 - \sigma(a_1)\big)\,(y - z_{out})\,\sigma(b_{out})\big(1 - \sigma(b_{out})\big)\,W_{11}^{(2)}$
$\delta_2 = \sigma'(a_2)\,\delta_{out}\,W_{12}^{(2)} = -\,\sigma(a_2)\big(1 - \sigma(a_2)\big)\,(y - z_{out})\,\sigma(b_{out})\big(1 - \sigma(b_{out})\big)\,W_{12}^{(2)}$
Back-propagation Algorithm - error propagation (Update of Layer 1 weights)
Substituting the numbers ($x_1 = 0$, $x_2 = 1$, $y = 1$, $\sigma(a_1) = 0.5498$, $\sigma(a_2) = 0.7109$, $z_{out} = 0.5227$, $W_{11}^{(2)} = 0.4$, $W_{12}^{(2)} = 0.1$):

$\delta_1 = -\,(0.5498)(1 - 0.5498)(1 - 0.5227)(0.5227)(1 - 0.5227)(0.4) = -0.011789$
$\delta_2 = -\,(0.7109)(1 - 0.7109)(1 - 0.5227)(0.5227)(1 - 0.5227)(0.1) = -0.002447$
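A sketch that reproduces these delta values from the forward-pass quantities (variable names are illustrative):

```python
import numpy as np

z = np.array([0.5498, 0.7109])   # hidden activations sigma(a1), sigma(a2)
y_hat, y = 0.5227, 1.0           # network output and target
W2 = np.array([0.4, 0.1])        # W11^(2), W12^(2)

delta_out = -(y - y_hat) * y_hat * (1 - y_hat)   # approx -0.11908
delta_hidden = z * (1 - z) * delta_out * W2      # approx [-0.011789, -0.002447]
print(delta_out, delta_hidden)
```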
Back-propagation Algorithm (Update of Layer 1 weights)
$W_{ij}^{New} = W_{ij}^{old} - \eta\,\dfrac{\partial L}{\partial W_{ij}^{old}}$, with $\eta = 0.25$, $\delta_1 = -0.011789$, $\delta_2 = -0.002447$, inputs $x_1 = 0$, $x_2 = 1$, and bias input $x_0 = 1$.

Gradients:
$\dfrac{\partial L}{\partial W_{11}^{(1)}} = \delta_1 x_1 = -0.011789 \times 0 = 0$
$\dfrac{\partial L}{\partial W_{12}^{(1)}} = \delta_1 x_2 = -0.011789 \times 1 = -0.011789$
$\dfrac{\partial L}{\partial W_{10}^{(1)}} = \delta_1 x_0 = -0.011789 \times 1 = -0.011789$
$\dfrac{\partial L}{\partial W_{21}^{(1)}} = \delta_2 x_1 = -0.002447 \times 0 = 0$
$\dfrac{\partial L}{\partial W_{22}^{(1)}} = \delta_2 x_2 = -0.002447 \times 1 = -0.002447$
$\dfrac{\partial L}{\partial W_{20}^{(1)}} = \delta_2 x_0 = -0.002447 \times 1 = -0.002447$

Updated weights:
$W_{11}^{(1)}(t+1) = 0.6 - 0.25 \times 0 = 0.6$
$W_{12}^{(1)}(t+1) = -0.1 + 0.25 \times 0.011789 = -0.09705$
$W_{10}^{(1)}(t+1) = 0.3 + 0.25 \times 0.011789 = 0.30295$
$W_{21}^{(1)}(t+1) = -0.3 - 0.25 \times 0 = -0.3$
$W_{22}^{(1)}(t+1) = 0.4 + 0.25 \times 0.002447 = 0.400612$
$W_{20}^{(1)}(t+1) = 0.5 + 0.25 \times 0.002447 = 0.500612$
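The same layer-1 update expressed as a short sketch in matrix form (values as on the slide; the array layout is an assumption):

```python
import numpy as np

eta = 0.25
x = np.array([0.0, 1.0])                        # x1, x2 (bias input x0 = 1)
delta_hidden = np.array([-0.011789, -0.002447])
W1 = np.array([[0.6, -0.1], [-0.3, 0.4]])       # [[W11, W12], [W21, W22]] of layer 1
b1 = np.array([0.3, 0.5])                       # W10^(1), W20^(1)

# dL/dW_ij^(1) = delta_i * x_j and dL/dW_i0^(1) = delta_i * 1
W1_new = W1 - eta * np.outer(delta_hidden, x)   # -> [[0.6, -0.09705], [-0.3, 0.400612]]
b1_new = b1 - eta * delta_hidden                # -> [0.302947, 0.500612]
print(W1_new, b1_new)
```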
Back-propagation Algorithm (Update of layer 2 Weights)
$W_{ij}^{New} = W_{ij}^{old} - \eta\,\dfrac{\partial L}{\partial W_{ij}^{old}}$, with $b_{out} = z_1 W_{11}^{(2)} + z_2 W_{12}^{(2)} + W_{10}^{(2)}$ and $z_{out} = \sigma(b_{out}) = \hat{y}$.

Gradients:
$\dfrac{\partial L}{\partial W_{11}^{(2)}} = -\,(y - \hat{y})\,\dfrac{\partial \hat{y}}{\partial W_{11}^{(2)}} = -\,(y - \hat{y})\,\sigma'(b_{out})\,z_1 = \delta_{out}\,z_1$
$\dfrac{\partial L}{\partial W_{12}^{(2)}} = -\,(y - \hat{y})\,\sigma'(b_{out})\,z_2 = \delta_{out}\,z_2$
$\dfrac{\partial L}{\partial W_{10}^{(2)}} = -\,(y - \hat{y})\,\sigma'(b_{out}) = \delta_{out}$

where $\sigma'(b_{out}) = \sigma(b_{out})\big(1 - \sigma(b_{out})\big)$, so
$\delta_{out} = -\,(y - \hat{y})\,\sigma(b_{out})\big(1 - \sigma(b_{out})\big) = -\,(1 - 0.5227)(0.5227)(1 - 0.5227) = -0.11908$

Updated weights ($\eta = 0.25$, $z_1 = 0.5498$, $z_2 = 0.7109$):
$W_{11}^{(2)}(t+1) = W_{11}^{(2)}(t) - \eta\,\dfrac{\partial L}{\partial W_{11}^{(2)}} = 0.4 + 0.25 \times 0.11908 \times 0.5498 = 0.4164$
$W_{12}^{(2)}(t+1) = 0.1 + 0.25 \times 0.11908 \times 0.7109 = 0.1212$
$W_{10}^{(2)}(t+1) = -0.2 + 0.25 \times 0.11908 = -0.17023$
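And the matching sketch for the layer-2 update (values as on the slide):

```python
import numpy as np

eta = 0.25
z = np.array([0.5498, 0.7109])   # hidden activations z1, z2
y_hat, y = 0.5227, 1.0
W2 = np.array([0.4, 0.1])        # W11^(2), W12^(2)
b2 = -0.2                        # W10^(2)

delta_out = -(y - y_hat) * y_hat * (1 - y_hat)   # approx -0.11908
W2_new = W2 - eta * delta_out * z                # -> [0.4164, 0.1212]
b2_new = b2 - eta * delta_out                    # -> -0.17023
print(W2_new, b2_new)
```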