Memo: Backpropagation in Convolutional Neural Network

Hiroshi Kuwajima
13-03-2014 Created
14-08-2014 Revised

1 / 14
2 / 14
Note

■ Purpose
The purpose of this memo is to understand and recall the backpropagation algorithm in Convolutional Neural Network, based on a discussion with Prof. Masayuki Tanaka.

■ Table of Contents
In this memo, backpropagation algorithms in different neural networks are explained in the following order.

□ Single neuron ... 3
□ Multi-layer neural network ... 5
□ General cases ... 7
□ Convolution layer ... 9
□ Pooling layer ... 11
□ Convolutional Neural Network ... 13

■ Notation
This memo follows the notation in the UFLDL tutorial (http://ufldl.stanford.edu/tutorial).
3 / 14
Neural Network as a Composite Function

A neural network is decomposed into a composite function in which each function element corresponds to a differentiable operation.

■ Single neuron (the simplest neural network) example
A single neuron is decomposed into a composite function of an affine function element, parameterized by W and b, and an activation function element f, which we choose to be the sigmoid function.

h_{W,b}(x) = f(W^T x + b) = \mathrm{sigmoid}(\mathrm{affine}_{W,b}(x)) = (\mathrm{sigmoid} \circ \mathrm{affine}_{W,b})(x)

Derivatives of both the affine and the sigmoid function elements w.r.t. both inputs and parameters are known. Note that the sigmoid function has neither parameters nor derivatives w.r.t. parameters. The sigmoid function is applied element-wise; '•' denotes the Hadamard (element-wise) product.

\frac{\partial a}{\partial z} = a \bullet (1 - a) \quad \text{where } a = h_{W,b}(x) = \mathrm{sigmoid}(z) = \frac{1}{1 + \exp(-z)}

\frac{\partial z}{\partial x} = W, \quad \frac{\partial z}{\partial W} = x, \quad \frac{\partial z}{\partial b} = I \quad \text{where } z = \mathrm{affine}_{W,b}(x) = W^T x + b \text{ and } I \text{ is the identity matrix}

[Figure: decomposition of a single neuron. Standard network representation: inputs x1, x2, x3, +1 feed one neuron producing h_{W,b}(x). Composite function representation: the same inputs pass through an affine element (output z) and an activation element such as the sigmoid (output a = h_{W,b}(x)).]
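As a concrete illustration, a minimal NumPy sketch of this decomposition; the function names affine and sigmoid and the toy sizes are illustrative assumptions, not part of the memo's notation.

```python
import numpy as np

def affine(W, b, x):
    # z = W^T x + b
    return W.T @ x + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 1))        # parameters of the affine element (3 inputs, 1 unit)
b = np.zeros(1)
x = rng.normal(size=3)

z = affine(W, b, x)                # affine function element
a = sigmoid(z)                     # activation function element, a = h_{W,b}(x)

# Local derivatives that backpropagation will reuse
da_dz = a * (1 - a)                # Hadamard (element-wise) product
dz_dx, dz_dW, dz_db = W, x, np.eye(1)
```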
4 / 14
Chain Rule of Error Signals and Gradients

Error signals are defined as the derivatives of a cost function J, which we choose to be the square error. Error signals are computed (propagated backward) by the chain rule of derivatives and are useful for computing the gradient of the cost function.

■ Single neuron example
Suppose we have m labeled training examples {(x^(1), y^(1)), ..., (x^(m), y^(m))}. The square error cost function for each example is as follows; the overall cost function is the sum of the cost functions over all examples.

J(W,b;x,y) = \frac{1}{2} \left\| y - h_{W,b}(x) \right\|^2

Error signals of the square error cost function for each example are propagated using the derivatives of the function elements w.r.t. inputs.

\delta^{(a)} = \frac{\partial}{\partial a} J(W,b;x,y) = -(y - a)

\delta^{(z)} = \frac{\partial}{\partial z} J(W,b;x,y) = \frac{\partial J}{\partial a} \frac{\partial a}{\partial z} = \delta^{(a)} \bullet a \bullet (1 - a)

The gradient of the cost function w.r.t. the parameters for each example is computed using the error signals and the derivatives of the function elements w.r.t. parameters. Summing the gradients over all examples gives the overall gradient.

\nabla_W J(W,b;x,y) = \frac{\partial}{\partial W} J(W,b;x,y) = \frac{\partial J}{\partial z} \frac{\partial z}{\partial W} = \delta^{(z)} x^T

\nabla_b J(W,b;x,y) = \frac{\partial}{\partial b} J(W,b;x,y) = \frac{\partial J}{\partial z} \frac{\partial z}{\partial b} = \delta^{(z)}
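A sketch of these per-example computations; for the code only, W is assumed to be stored as an outputs-by-inputs matrix so that z = W x + b and the gradient formula δ^(z) x^T lands directly in W's shape.

```python
import numpy as np

def single_neuron_gradients(W, b, x, y):
    # Forward pass: z = W x + b, a = sigmoid(z) = h_{W,b}(x)
    z = W @ x + b
    a = 1.0 / (1.0 + np.exp(-z))
    # Error signals for the square error cost J = 0.5 * ||y - a||^2
    delta_a = -(y - a)
    delta_z = delta_a * a * (1 - a)
    # Per-example gradients: delta^(z) x^T and delta^(z)
    grad_W = np.outer(delta_z, x)
    grad_b = delta_z
    return grad_W, grad_b

# The overall gradient is the sum of these per-example gradients over all m examples.
```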
5 / 14
Decomposition of Multi-Layer Neural Network

■ Composite function representation of a multi-layer neural network

h_{W,b}(x) = \left( \mathrm{sigmoid} \circ \mathrm{affine}_{W^{(2)},b^{(2)}} \circ \mathrm{sigmoid} \circ \mathrm{affine}_{W^{(1)},b^{(1)}} \right)(x), \quad a^{(1)} = x, \quad a^{(l_{\max})} = h_{W,b}(x)

■ Derivatives of function elements w.r.t. inputs and parameters

\frac{\partial a^{(l+1)}}{\partial z^{(l+1)}} = a^{(l+1)} \bullet \left( 1 - a^{(l+1)} \right) \quad \text{where } a^{(l+1)} = \mathrm{sigmoid}\left( z^{(l+1)} \right) = \frac{1}{1 + \exp\left( -z^{(l+1)} \right)}

\frac{\partial z^{(l+1)}}{\partial a^{(l)}} = W^{(l)}, \quad \frac{\partial z^{(l+1)}}{\partial W^{(l)}} = a^{(l)}, \quad \frac{\partial z^{(l+1)}}{\partial b^{(l)}} = I \quad \text{where } z^{(l+1)} = \left( W^{(l)} \right)^T a^{(l)} + b^{(l)}

[Figure: decomposition of a two-layer network. Standard network representation: inputs x1, x2, x3, +1 feed Layer 1 (outputs a1^(2), a2^(2), a3^(2)) and Layer 2 (output h_{W,b}(x)). Composite function representation: x = a^(1) → Affine 1 (outputs z^(2)) → Sigmoid 1 (outputs a^(2)) → Affine 2 (output z1^(3)) → Sigmoid 2 (output a1^(3) = h_{W,b}(x)).]
6 / 14
Error Signals and Gradients in Multi-Layer NN

■ Error signals of the square error cost function for each example

\delta^{(a^{(l)})} = \frac{\partial}{\partial a^{(l)}} J(W,b;x,y) =
\begin{cases}
-\left( y - a^{(l)} \right) & \text{for } l = l_{\max} \\
\frac{\partial J}{\partial z^{(l+1)}} \frac{\partial z^{(l+1)}}{\partial a^{(l)}} = \left( W^{(l)} \right)^T \delta^{(z^{(l+1)})} & \text{otherwise}
\end{cases}

\delta^{(z^{(l)})} = \frac{\partial}{\partial z^{(l)}} J(W,b;x,y) = \frac{\partial J}{\partial a^{(l)}} \frac{\partial a^{(l)}}{\partial z^{(l)}} = \delta^{(a^{(l)})} \bullet a^{(l)} \bullet \left( 1 - a^{(l)} \right)

■ Gradient of the cost function w.r.t. parameters for each example

\nabla_{W^{(l)}} J(W,b;x,y) = \frac{\partial}{\partial W^{(l)}} J(W,b;x,y) = \frac{\partial J}{\partial z^{(l+1)}} \frac{\partial z^{(l+1)}}{\partial W^{(l)}} = \delta^{(z^{(l+1)})} \left( a^{(l)} \right)^T

\nabla_{b^{(l)}} J(W,b;x,y) = \frac{\partial}{\partial b^{(l)}} J(W,b;x,y) = \frac{\partial J}{\partial z^{(l+1)}} \frac{\partial z^{(l+1)}}{\partial b^{(l)}} = \delta^{(z^{(l+1)})}
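The same formulas for a two-layer network, as a sketch; it keeps the outputs-by-inputs storage assumption for W^(1) and W^(2), and the dictionary keys are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_gradients(W1, b1, W2, b2, x, y):
    # Forward pass: a^(1) -> z^(2) -> a^(2) -> z^(3) -> a^(3) = h_{W,b}(x)
    a1 = x
    z2 = W1 @ a1 + b1; a2 = sigmoid(z2)
    z3 = W2 @ a2 + b2; a3 = sigmoid(z3)

    # Error signals, propagated backward
    d_a3 = -(y - a3)                       # output layer, l = l_max
    d_z3 = d_a3 * a3 * (1 - a3)
    d_a2 = W2.T @ d_z3                     # (W^(2))^T delta^(z^(3))
    d_z2 = d_a2 * a2 * (1 - a2)

    # Per-example gradients: delta^(z^(l+1)) (a^(l))^T and delta^(z^(l+1))
    return {"W1": np.outer(d_z2, a1), "b1": d_z2,
            "W2": np.outer(d_z3, a2), "b2": d_z3}
```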
7 / 14
Backpropagation in General Cases

1. Decompose the operations in the layers of a neural network into function elements whose derivatives w.r.t. inputs are known by symbolic computation.
2. Backpropagate error signals corresponding to a differentiable cost function by numerical computation (starting from the cost function, plug in error signals backward).
3. Use the backpropagated error signals to compute gradients w.r.t. parameters, only for the function elements with parameters whose derivatives w.r.t. parameters are known by symbolic computation.
4. Sum the gradients over all examples to get the overall gradient.

h_\theta(x) = \left( f^{(l_{\max})} \circ \dots \circ f^{(l)}_{\theta^{(l)}} \circ \dots \circ f^{(2)}_{\theta^{(2)}} \circ f^{(1)} \right)(x) \quad \text{where } f^{(1)} = x, \ f^{(l_{\max})} = h_\theta(x), \text{ and } \forall l : \frac{\partial f^{(l+1)}}{\partial f^{(l)}} \text{ is known}

\delta^{(l)} = \frac{\partial}{\partial f^{(l)}} J(\theta;x,y) = \frac{\partial J}{\partial f^{(l+1)}} \frac{\partial f^{(l+1)}}{\partial f^{(l)}} = \delta^{(l+1)} \frac{\partial f^{(l+1)}}{\partial f^{(l)}} \quad \text{where } \frac{\partial J}{\partial f^{(l_{\max})}} \text{ is known}

\nabla_{\theta^{(l)}} J(\theta;x,y) = \frac{\partial}{\partial \theta^{(l)}} J(\theta;x,y) = \frac{\partial J}{\partial f^{(l)}} \frac{\partial f^{(l)}_{\theta^{(l)}}}{\partial \theta^{(l)}} = \delta^{(l)} \frac{\partial f^{(l)}_{\theta^{(l)}}}{\partial \theta^{(l)}} \quad \text{where } \frac{\partial f^{(l)}_{\theta^{(l)}}}{\partial \theta^{(l)}} \text{ is known}

\nabla_{\theta^{(l)}} J(\theta) = \sum_{i=1}^{m} \nabla_{\theta^{(l)}} J\left( \theta; x^{(i)}, y^{(i)} \right)
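One way to read these four steps in code, as a sketch: each function element is an object with a known forward map, a known derivative w.r.t. its input (backward) and, if parameterized, a known derivative w.r.t. its parameters (grad). The class and function names are illustrative assumptions.

```python
import numpy as np

class SigmoidElement:
    # Function element with no parameters; derivative w.r.t. input is known.
    def forward(self, x):
        self.a = 1.0 / (1.0 + np.exp(-x))
        return self.a
    def backward(self, delta):
        return delta * self.a * (1 - self.a)

class AffineElement:
    # Parameterized function element; derivatives w.r.t. input and parameters are known.
    def __init__(self, W, b):
        self.W, self.b = W, b
    def forward(self, x):
        self.x = x
        return self.W @ x + self.b
    def backward(self, delta):
        return self.W.T @ delta                   # error signal for the previous element
    def grad(self, delta):
        return np.outer(delta, self.x), delta     # gradients w.r.t. W and b

def per_example_gradients(elements, x, y):
    # Step 1 is the decomposition above; steps 2 and 3 follow; step 4 is a sum over examples.
    a = x
    for el in elements:                           # forward pass through the composite function
        a = el.forward(a)
    delta = -(y - a)                              # error signal of the square error cost
    grads = []
    for el in reversed(elements):                 # plug in error signals backward
        if hasattr(el, "grad"):
            grads.append(el.grad(delta))          # gradients only for parameterized elements
        delta = el.backward(delta)
    return list(reversed(grads))
```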
8 / 14
Convolutional Neural Network

A convolution-pooling layer in a Convolutional Neural Network is a composite function decomposed into the function elements f^(conv), f^(sigm), and f^(pool). Let x be the output from the previous layer. The sigmoid nonlinearity is optional.

\left( f^{(\mathrm{pool})} \circ f^{(\mathrm{sigm})} \circ f^{(\mathrm{conv})}_{w} \right)(x)

[Figure: forward propagation runs x → Convolution → Sigmoid → Pooling; backward propagation runs through the same elements in reverse.]
9 / 14
Derivatives of Convolution

■ Discrete convolution parameterized by a feature w, and its derivatives
Let x be the input and y the output of the convolution layer. Here we focus on only one feature vector w, although a convolution layer usually has multiple features W = [w1 w2 ... wn]. n indexes x and y, where 1 ≤ n ≤ |x| for x_n and 1 ≤ n ≤ |y| = |x| - |w| + 1 for y_n; i indexes w, where 1 ≤ i ≤ |w|. (f ∗ g)[n] denotes the n-th element of f ∗ g.

y = x \ast w = [y_n], \quad y_n = (x \ast w)[n] = \sum_{i=1}^{|w|} x_{n+i-1} w_i = w^T x_{n:n+|w|-1}

\frac{\partial y_{n-i+1}}{\partial x_n} = w_i, \quad \frac{\partial y_n}{\partial w_i} = x_{n+i-1} \quad \text{for } 1 \le i \le |w|

[Figure: connectivity of the convolution. y_n has incoming connections from x_{n:n+|w|-1}. From the standpoint of a fixed x_n, x_n has outgoing connections to y_{n-|w|+1:n}, i.e., all of y_{n-|w|+1:n} have derivatives w.r.t. x_n; note that the y and w indices run in reverse order.]
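A sketch of this 1-D valid convolution under the index convention above. Note that with this definition the sum does not flip w, so np.correlate(x, w, 'valid') computes the same quantity; the assertion is only a sanity check on toy data.

```python
import numpy as np

def conv_valid(x, w):
    # y_n = sum_i x_{n+i-1} w_i, with |y| = |x| - |w| + 1
    n_out = len(x) - len(w) + 1
    return np.array([x[n:n + len(w)] @ w for n in range(n_out)])

x = np.arange(6.0)
w = np.array([1.0, -2.0, 0.5])
y = conv_valid(x, w)
assert np.allclose(y, np.correlate(x, w, mode="valid"))
```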
10 / 14
Backpropagation in Convolution Layer

Error signals and the gradient for each example are computed by convolution, using the commutativity property of convolution and the multivariable chain rule of derivatives. Let us focus on single elements of the error signals and of the gradient w.r.t. w.

\delta^{(x)}_n = \frac{\partial J}{\partial x_n} = \frac{\partial J}{\partial y} \frac{\partial y}{\partial x_n} = \sum_{i=1}^{|w|} \frac{\partial J}{\partial y_{n-i+1}} \frac{\partial y_{n-i+1}}{\partial x_n} = \sum_{i=1}^{|w|} \delta^{(y)}_{n-i+1} w_i = \left( \delta^{(y)} \ast \mathrm{flip}(w) \right)[n], \quad \delta^{(x)} = \left[ \delta^{(x)}_n \right] = \delta^{(y)} \ast \mathrm{flip}(w)

(The sum over i is a reverse-order linear combination.)

\frac{\partial J}{\partial w_i} = \frac{\partial J}{\partial y} \frac{\partial y}{\partial w_i} = \sum_{n=1}^{|x|-|w|+1} \frac{\partial J}{\partial y_n} \frac{\partial y_n}{\partial w_i} = \sum_{n=1}^{|x|-|w|+1} \delta^{(y)}_n x_{n+i-1} = \left( \delta^{(y)} \ast x \right)[i], \quad \frac{\partial J}{\partial w} = \left[ \frac{\partial J}{\partial w_i} \right] = \delta^{(y)} \ast x = x \ast \delta^{(y)}

[Figure: forward propagation x ∗ w = y is a valid convolution; backward propagation δ^(x) = flip(w) ∗ δ^(y) is a full convolution; gradient computation x ∗ δ^(y) = ∂J/∂w is a valid convolution.]
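The two formulas as a sketch, continuing the convention above; the full convolution for δ^(x) is realized here by zero-padding δ^(y) with |w| - 1 zeros on each side before a valid pass with flip(w).

```python
import numpy as np

def conv_layer_backward(x, w, delta_y):
    # delta_y = dJ/dy for y = conv_valid(x, w); returns dJ/dx and dJ/dw.
    k = len(w)
    # Error signal: delta^(x) = delta^(y) * flip(w), full convolution
    padded = np.pad(delta_y, (k - 1, k - 1))
    flipped = w[::-1]
    delta_x = np.array([padded[n:n + k] @ flipped for n in range(len(x))])
    # Gradient: dJ/dw = delta^(y) * x = x * delta^(y), valid convolution
    grad_w = np.array([x[i:i + len(delta_y)] @ delta_y for i in range(k)])
    return delta_x, grad_w
```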
11 / 14
Derivatives of Pooling

A pooling layer subsamples statistics to obtain summary statistics with any aggregate function (or filter) g whose input is a vector and whose output is a scalar. Subsampling is an operation like convolution, except that g is applied to disjoint (non-overlapping) regions.

■ Definition: subsample (or downsample)
Let m be the size of the pooling region, x the input, and y the output of the pooling layer. subsample(f, g)[n] denotes the n-th element of subsample(f, g).

y_n = \mathrm{subsample}(x, g)[n] = g\left( x_{(n-1)m+1:nm} \right), \quad y = \mathrm{subsample}(x, g) = [y_n]

g(x) = \frac{1}{m} \sum_{k=1}^{m} x_k, \quad \frac{\partial g}{\partial x} = \frac{1}{m} \qquad \text{(mean pooling)}

g(x) = \max(x), \quad \frac{\partial g}{\partial x_i} = \begin{cases} 1 & \text{if } x_i = \max(x) \\ 0 & \text{otherwise} \end{cases} \qquad \text{(max pooling)}

g(x) = \| x \|_p = \left( \sum_{k=1}^{m} x_k^p \right)^{1/p}, \quad \frac{\partial g}{\partial x_i} = \left( \sum_{k=1}^{m} x_k^p \right)^{1/p - 1} x_i^{p-1} \qquad (L^p \text{ pooling})

or any other differentiable R^m → R function.

[Figure: the pooling layer applies g to each region of m consecutive inputs x_{(n-1)m+1:nm}, producing one output y_n per region.]
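A sketch of subsample for 1-D inputs, assuming |x| is a multiple of the region size m; np.mean and np.max serve as the aggregate function g.

```python
import numpy as np

def subsample(x, g, m):
    # y_n = g(x_{(n-1)m+1 : nm}) over disjoint regions of length m
    regions = x.reshape(-1, m)
    return np.array([g(r) for r in regions])

x = np.array([1.0, 3.0, 2.0, 8.0, 5.0, 4.0])
print(subsample(x, np.mean, 2))   # mean pooling: [2.  5.  4.5]
print(subsample(x, np.max, 2))    # max pooling:  [3.  8.  5.]
```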
12 / 14
Backpropagation in Pooling Layer

Error signals for each example are computed by upsampling. Upsampling is an operation which backpropagates (distributes) the error signals over the aggregate function g using its derivatives g'_n = ∂g/∂x_{(n-1)m+1:nm}. g'_n can change depending on the pooling region n.

□ In max pooling, the unit which was the max at forward propagation receives all the error at backward propagation, and that unit differs from region to region.

■ Definition: upsample
upsample(f, g)[n] denotes the n-th element of upsample(f, g).

\delta^{(x)}_{(n-1)m+1:nm} = \mathrm{upsample}\left( \delta^{(y)}, g' \right)[n] = \delta^{(y)}_n g'_n = \delta^{(y)}_n \frac{\partial g}{\partial x_{(n-1)m+1:nm}} = \frac{\partial J}{\partial y_n} \frac{\partial y_n}{\partial x_{(n-1)m+1:nm}} = \frac{\partial J}{\partial x_{(n-1)m+1:nm}}

\delta^{(x)} = \mathrm{upsample}\left( \delta^{(y)}, g' \right) = \left[ \delta^{(x)}_{(n-1)m+1:nm} \right]

[Figure: forward propagation (subsampling) maps each region x_{(n-1)m+1:nm} through g to y_n; backward propagation (upsampling) maps δ^(y)_n back to δ^(x)_{(n-1)m+1:nm} through ∂g/∂x.]
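A corresponding sketch of upsample for mean and max pooling; the function names are illustrative, and for max pooling the forward input x is needed to locate the maximizing unit of each region.

```python
import numpy as np

def upsample_mean(delta_y, m):
    # Mean pooling: g'_n = 1/m for every unit in region n
    return np.repeat(delta_y / m, m)

def upsample_max(delta_y, x, m):
    # Max pooling: the unit that was the max at forward propagation receives all the error
    delta_x = np.zeros_like(x, dtype=float)
    for n, region in enumerate(x.reshape(-1, m)):
        delta_x[n * m + np.argmax(region)] = delta_y[n]
    return delta_x
```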
13 / 14
Backpropagation in CNN (Summary)

1. Propagate error signals δ^(pool) back through the pooling and sigmoid elements to obtain δ^(conv); the factor f^(sigm) • (1 − f^(sigm)) is the derivative of the sigmoid.

\delta^{(\mathrm{conv})} = \mathrm{upsample}\left( \delta^{(\mathrm{pool})}, g' \right) \bullet f^{(\mathrm{sigm})} \bullet \left( 1 - f^{(\mathrm{sigm})} \right)

2. Propagate error signals δ^(conv) back through the convolution element (full convolution).

\delta^{(x)} = \delta^{(\mathrm{conv})} \ast \mathrm{flip}(w)

3. Compute the gradient ∇_w J (valid convolution).

x \ast \delta^{(\mathrm{conv})} = \nabla_w J

[Figure: the error signals δ^(conv) computed in step 1 are plugged into steps 2 and 3; the layer chain is Convolution → Sigmoid → Pooling.]
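Putting the three steps together for a single 1-D feature w and mean pooling, as an end-to-end sketch; it assumes the conventions of the earlier sketches and that the convolution output length is a multiple of the pooling size m.

```python
import numpy as np

def cnn_layer_backward(x, w, delta_pool, m):
    # Forward pass, re-evaluated to obtain the sigmoid output
    conv = np.correlate(x, w, mode="valid")            # f^(conv)
    sigm = 1.0 / (1.0 + np.exp(-conv))                 # f^(sigm)

    # 1. Propagate delta^(pool) through pooling (mean: g' = 1/m) and the sigmoid derivative
    delta_conv = np.repeat(delta_pool / m, m) * sigm * (1 - sigm)

    # 2. Propagate delta^(conv) to the layer input: full convolution with flip(w)
    k = len(w)
    padded = np.pad(delta_conv, (k - 1, k - 1))
    delta_x = np.array([padded[n:n + k] @ w[::-1] for n in range(len(x))])

    # 3. Gradient w.r.t. the feature: valid convolution of x with delta^(conv)
    grad_w = np.array([x[i:i + len(delta_conv)] @ delta_conv for i in range(k)])
    return delta_x, grad_w
```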
14 / 14
Remarks

■ References
□ UFLDL Tutorial, http://ufldl.stanford.edu/tutorial
□ Chain Rule of Neural Network is Error Back Propagation, http://like.silk.to/studymemo/ChainRuleNeuralNetwork.pdf

■ Acknowledgement
This memo was written thanks to a good discussion with Prof. Masayuki Tanaka.