Neural Turing Machine Tutorial
- 2. Outline
• Neurons -> neural networks
• Short-term memory -> from neural networks to deep learning
• Neural Turing Machine
- 7. Binary Classification: AND Gate
x1 x2 y
0 0 0
0 1 0
1 0 0
1 1 1
[Figure: the points (0,0), (0,1), (1,0) are labeled 0 and (1,1) is labeled 1; a single sigmoid neuron n with weights 20, 20 on x1, x2 and bias -30 computes y and separates them.]
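The single-neuron AND gate above can be checked directly; a minimal Python sketch (the weights 20, 20 and bias -30 are the values from the slide):

```python
import math

def neuron(x1, x2, w1, w2, wb):
    """Single sigmoid neuron: y = 1 / (1 + e^-(w1*x1 + w2*x2 + wb))."""
    n_in = w1 * x1 + w2 * x2 + wb
    return 1.0 / (1.0 + math.exp(-n_in))

# AND gate: weights 20, 20 and bias -30, as in the slide.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(neuron(x1, x2, 20, 20, -30)))
```

Only the (1, 1) input pushes the pre-activation above zero, so only that row rounds to 1.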
- 8. Binary Classification: OR Gate
x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 1
[Figure: (0,0) is labeled 0 and (0,1), (1,0), (1,1) are labeled 1; the same sigmoid neuron with weights 20, 20 but bias -10 separates them.]
- 10. Binary Classification: XOR Gate
[Figure: XOR is not linearly separable, so no single neuron can map (0,0) and (1,1) to 0 while mapping (0,1) and (1,0) to 1. Two hidden neurons solve it: n1 with weights 20, 20 and bias -30 (an AND gate) and n2 with weights 20, 20 and bias -10 (an OR gate), feeding an output neuron with weights -20 (from n1), 20 (from n2) and bias -10.]
x1 x2 n1 n2 y
0 0 0 0 0
0 1 0 1 1
1 0 0 1 1
1 1 1 1 0
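The whole construction in the truth table (AND and OR hidden neurons feeding an output neuron with weights -20, 20 and bias -10) can be sketched as:

```python
import math

def neuron(a, b, w1, w2, wb):
    """Sigmoid neuron on two inputs."""
    return 1.0 / (1.0 + math.exp(-(w1 * a + w2 * b + wb)))

def xor(x1, x2):
    n1 = neuron(x1, x2, 20, 20, -30)   # AND gate
    n2 = neuron(x1, x2, 20, 20, -10)   # OR gate
    # Output fires when OR is on but AND is off, i.e. XOR.
    return neuron(n1, n2, -20, 20, -10)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor(x1, x2)))
```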
- 11. Neural Network
[Figure: a 2-2-2 fully connected network. Input layer: x, y (plus bias b); hidden layer: n11, n12; output layer: n21, n22 with targets z1, z2. Each connection carries a weight, e.g. W11,x or W21,11, and each neuron has a bias weight, e.g. W11,b or W21,b.]
- 17. Initialization
• Set every weight W to a random value between -N and N
• Within a layer, the W values must not all be identical (otherwise the neurons stay symmetric)
[Figure: the same 2-2-2 network as slide 11, with all weights to be initialized.]
- 51. Read Operation
Memory M (each column M(i) is one memory location):
M = [[1, 2, ...], [1, 1, ...], [2, 4, ...]]
Head location: w = (0.9, 0.1, 0, ..., 0) over locations 0, 1, ..., i, ..., n
Read vector: r = sum_i w(i) M(i) = (1.1, 1.0, 2.2)
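The read is just a weighted sum over memory locations; a minimal NumPy sketch of this exact example (memory laid out column-per-location, matching the erase/add examples later in the notes):

```python
import numpy as np

# Memory: each column M[:, i] is one memory location.
M = np.array([[1.0, 2.0],
              [1.0, 1.0],
              [2.0, 4.0]])
w = np.array([0.9, 0.1])  # head location: attention weights over locations

r = M @ w  # r = sum_i w(i) * M(:, i)
print(r)
```

With w = (0.9, 0.1) this blends the two locations into r = (1.1, 1.0, 2.2), as on the slide.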
- 55. Addressing Pipeline
[Figure: the controller outputs (including the memory key, e.g. (0, 0, 1)), the memory, and the previous head location (e.g. (0, 0, 0, 1, 0, 0)) pass through four stages (Content Addressing, then Interpolation, then Convolutional Shift, then Sharpening) to produce the new head location (e.g. (.45, .05, .5, 0, 0, 0)).]
- 56. Content Addressing
[Figure: with memory key (0, 0, 1), a uniform head location (.16, .16, .16, .16, .16, .16) becomes (.15, .10, .47, .08, .13, .17), peaking at the location whose content best matches the key.]
Finds the locations whose memory contents are closest to the key.
Parameter β: controls how concentrated the resulting weights are.
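A sketch of content addressing as described: cosine similarity between the key and each memory column, sharpened by β through a softmax. The memory values and key below are illustrative:

```python
import numpy as np

def content_addressing(M, key, beta):
    """w(i) = softmax over i of beta * cosine_similarity(key, M(:, i))."""
    sims = np.array([
        M[:, i] @ key / (np.linalg.norm(M[:, i]) * np.linalg.norm(key))
        for i in range(M.shape[1])
    ])
    e = np.exp(beta * sims)
    return e / e.sum()

M = np.array([[1.0, 2.0, 0.0],
              [1.0, 1.0, 0.0],
              [2.0, 4.0, 1.0]])
key = np.array([0.0, 0.0, 1.0])
print(content_addressing(M, key, beta=5.0))  # peaks at column 2, which matches the key
```

Larger β makes the distribution more peaked; β = 0 gives uniform weights.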
- 58. Convolutional Shift
[Figure: shift kernels over offsets (-1, 0, +1): (0, 1, 0) leaves the head location (.45, .05, .5, 0, 0, 0) unchanged, (0, 0, 1) rotates it by one position, and (.5, 0, .5) spreads it to (.025, .475, .025, .25, 0, .225).]
Shifts the values inside w along the memory locations.
Parameter s: controls the direction (and spread) of the shift.
- 59. Sharpening
[Figure: e.g. (0, .37, 0, .62, 0, 0) sharpens toward (0, 0, 0, 1, 0, 0), while a uniform (.16, .16, .16, .16, .16, .16) stays uniform.]
Makes the values in w more concentrated (or, with γ < 1, more spread out).
Parameter γ: controls the degree of concentration.
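A sketch of sharpening (raise each weight to the power γ, then renormalize); the value of γ below is illustrative:

```python
import numpy as np

def sharpen(w, gamma):
    """w(i) <- w(i)^gamma / sum_j w(j)^gamma.

    gamma > 1 concentrates the weights, gamma < 1 flattens them.
    """
    p = w ** gamma
    return p / p.sum()

w = np.array([0.45, 0.05, 0.5, 0.0, 0.0, 0.0])
print(sharpen(w, 10.0))  # mass concentrates on the largest entries
print(sharpen(w, 1.0))   # gamma = 1 leaves w unchanged
```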
- 62. Evolution of Recurrent Neural Networks
Recurrent Neural Network: short-term memory
Long Short-Term Memory: can control reading and writing of its memory
Neural Turing Machine: can control the position of the memory read/write heads more flexibly
- 64. Further Reading
• Machine learning
– Logistic Regression
• http://cpmarkchang.logdown.com/posts/189069-logisti-regression-model
– Overfitting and Regularization
• http://cpmarkchang.logdown.com/posts/193261-machine-learning-overfitting-and-regularization
– Model Selection
• http://cpmarkchang.logdown.com/posts/193914-machine-learning-model-selection
• Neural networks
– Neural Network Backward Propagation
• http://cpmarkchang.logdown.com/posts/277349-neural-network-backward-propagation
– Recurrent Neural Network
• http://cpmarkchang.logdown.com/posts/278457-neural-network-recurrent-neural-network
– Long Short Term Memory
• http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf
• http://www.felixgers.de/papers/phd.pdf
– Neural Turing Machine
• http://arxiv.org/pdf/1410.5401.pdf
• http://awawfumin.blogspot.tw/2015/03/neural-turing-machines-implementation.html
- 67. Speaker Contact Information:
• Mark Chang
– facebook:
https://www.facebook.com/ckmarkoh.chang
– Github:http://github.com/ckmarkoh
– Blog:http://cpmarkchang.logdown.com
– email:ckmarkoh at gmail.com
• Fumin
– Github:https://github.com/fumin
– Email:awawfumin at gmail.com
Editor's Notes
- y = \frac{1}{ 1+e^{- ( w_{1} x_{1} + w_{2}x_{2}+w_{b} ) }}
& n_{in} = w_{1} x_{1} + w_{2}x_{2}+w_{b} \\
& n_{out} = \frac{1}{1+e^{-n_{in}}}
- w_{1}x_{1}+w_{2}x_{2}+w_{b} = 0
w_{1}x_{1}+w_{2}x_{2}+w_{b} < 0
w_{1}x_{1}+w_{2}x_{2}+w_{b} >0
- y = \frac{1}{1+e^{-(20x_{1}+20x_{2}-30)}}
20x_{1}+20x_{2}-30 = 0
- y = \frac{1}{1+e^{-(20x_{1}+20x_{2}-10)}}
20x_{1}+20x_{2}-10 = 0
- & J = -( z_{1} \log(n_{21(out)}) + (1-z_{1}) \log (1 -n_{21(out)} )) \\
&\mspace{30mu} -( z_{2} \log(n_{22(out)}) + (1-z_{2}) \log (1 -n_{22(out)} )) \\
& n_{out} \approx 0 \text{ and } z = 0 \Rightarrow J \approx 0 \\
& n_{out} \approx 1 \text{ and } z = 1 \Rightarrow J \approx 0 \\
& n_{out} \approx 0 \text{ and } z = 1 \Rightarrow J \approx \infty \\
& n_{out} \approx 1 \text{ and } z = 0 \Rightarrow J \approx \infty \\
- & w_{21,11} \leftarrow w_{21,11} - \eta \dfrac{\partial J}{\partial w_{21,11}} \\
& w_{21,12} \leftarrow w_{21,12} - \eta \dfrac{\partial J}{\partial w_{21,12}} \\
& w_{21,b} \leftarrow w_{21,b} - \eta \dfrac{\partial J}{\partial w_{21,b}} \\
& w_{22,11} \leftarrow w_{22,11} - \eta \dfrac{\partial J}{\partial w_{22,11}} \\
& w_{22,12} \leftarrow w_{22,12} - \eta \dfrac{\partial J}{\partial w_{22,12}} \\
& w_{22,b} \leftarrow w_{22,b} - \eta \dfrac{\partial J}{\partial w_{22,b}} \\
&w_{11,x} \leftarrow w_{11,x} - \eta \dfrac{\partial J}{\partial w_{11,x}} \\
&w_{11,y} \leftarrow w_{11,y} - \eta \dfrac{\partial J}{\partial w_{11,y}} \\
&w_{11,b} \leftarrow w_{11,b} - \eta \dfrac{\partial J}{\partial w_{11,b}} \\
&w_{12,x} \leftarrow w_{12,x} - \eta \dfrac{\partial J}{\partial w_{12,x}} \\
&w_{12,y} \leftarrow w_{12,y} - \eta \dfrac{\partial J}{\partial w_{12,y}} \\
&w_{12,b} \leftarrow w_{12,b} - \eta \dfrac{\partial J}{\partial w_{12,b}} \\
( - \dfrac{ \partial J}{\partial w_{0}} , - \dfrac{ \partial J}{\partial w_{1}} )
- \dfrac{\partial J}{\partial w_{21,11}} =
\dfrac{\partial J}{\partial n_{21(out)}}
\dfrac{\partial n_{21(out)}}{\partial n_{21(in)}}
\dfrac{\partial n_{21(in)}}{\partial w_{21,11}}
= (n_{21(out)}-z_{1}) n_{11(out)}
= \delta_{21(in)} n_{11(out)} \\
w_{21,11} \leftarrow w_{21,11} - \eta \delta_{21(in)} n_{11(out)}
- \dfrac{\partial J}{\partial w_{11,x}} =
\dfrac{\partial J}{\partial n_{11(in)}}
\dfrac{\partial n_{11(in)}}{\partial w_{11,x}}
= \delta_{11(in)} x \\
w_{11,x} \leftarrow w_{11,x} - \eta \delta_{11(in)} x
- & \delta_{11(in)}
= \dfrac{\partial J}{\partial n_{11(in)}}
= \dfrac{\partial J}{\partial n_{21(out)}} \dfrac{\partial n_{21(out)}}{\partial n_{11(in)}}
+ \dfrac{\partial J}{\partial n_{22(out)}} \dfrac{\partial n_{22(out)}}{\partial n_{11(in)}} \\
&= \dfrac{\partial J}{\partial n_{21(out)}}
\dfrac{\partial n_{21(out)}}{\partial n_{21(in)}}
\dfrac{\partial n_{21(in)}}{\partial n_{11(out)}}
\dfrac{\partial n_{11(out)}}{\partial n_{11(in)}}
+ \dfrac{\partial J}{\partial n_{22(out)}}
\dfrac{\partial n_{22(out)}}{\partial n_{22(in)}}
\dfrac{\partial n_{22(in)}}{\partial n_{11(out)}}
\dfrac{\partial n_{11(out)}}{\partial n_{11(in)}} \\
&= \Big( \dfrac{\partial J}{\partial n_{21(out)}}
\dfrac{\partial n_{21(out)}}{\partial n_{21(in)}}
\dfrac{\partial n_{21(in)}}{\partial n_{11(out)}}
+ \dfrac{\partial J}{\partial n_{22(out)}}
\dfrac{\partial n_{22(out)}}{\partial n_{22(in)}}
\dfrac{\partial n_{22(in)}}{\partial n_{11(out)}} \Big)
\dfrac{\partial n_{11(out)}}{\partial n_{11(in)}} \\
&= ( \delta_{21(in)} w_{21,11} + \delta_{22(in)} w_{22,11} )
\dfrac{\partial n_{11(out)}}{\partial n_{11(in)}}
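The δ recursions above can be checked numerically; a minimal NumPy sketch of the 2-2-2 sigmoid network with cross-entropy loss (random illustrative weights; the bias is folded in as an extra input fixed at 1):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
# 2-2-2 network: each weight matrix includes a bias column.
W1 = rng.uniform(-1, 1, (2, 3))  # hidden layer: n11, n12
W2 = rng.uniform(-1, 1, (2, 3))  # output layer: n21, n22

x = np.array([0.5, -0.3, 1.0])   # input (x, y, bias)
z = np.array([1.0, 0.0])         # targets z1, z2

# Forward pass.
h = sigmoid(W1 @ x)                   # n11(out), n12(out)
o = sigmoid(W2 @ np.append(h, 1.0))   # n21(out), n22(out)

# Backward pass: for sigmoid + cross-entropy, delta_in = out - z at the output,
# and delta_11(in) = (delta_21(in) w21,11 + delta_22(in) w22,11) * h * (1 - h).
delta_out = o - z                                  # delta_21(in), delta_22(in)
delta_hid = (W2[:, :2].T @ delta_out) * h * (1 - h)

grad_W2 = np.outer(delta_out, np.append(h, 1.0))   # dJ/dW2
grad_W1 = np.outer(delta_hid, x)                   # dJ/dW1
```

Each gradient entry can be compared against a finite-difference estimate of J to confirm the chain rule above.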
- & n_{in,t} = w_{c}x_{t}+ w_{p}n_{out,t-1} + w_{b} \\
& n_{out,t} = \frac{1}{1+e^{-n_{in,t}}} \\
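The recurrent neuron above, unrolled over a short sequence (the weights here are illustrative):

```python
import math

def rnn_step(x_t, n_out_prev, w_c, w_p, w_b):
    """One step: n_in,t = w_c*x_t + w_p*n_out,t-1 + w_b; n_out,t = sigmoid(n_in,t)."""
    n_in = w_c * x_t + w_p * n_out_prev + w_b
    return 1.0 / (1.0 + math.exp(-n_in))

# Each output feeds back in as the short-term memory of the next step.
n_out = 0.0
for x_t in [1.0, 0.0, 1.0]:
    n_out = rnn_step(x_t, n_out, w_c=2.0, w_p=1.5, w_b=-1.0)
    print(n_out)
```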
- & \delta_{in,0} = \dfrac{\partial J}{\partial n_{out,0}} \dfrac{\partial n_{out,0}}{\partial n_{in,0}}
= \delta_{out,0} \dfrac{\partial n_{out,0}}{\partial n_{in,0}} \\
& \delta_{in,0}
= \dfrac{\partial J}{\partial n_{out,1}} \dfrac{\partial n_{out,1}}{\partial n_{in,1}}
\dfrac{\partial n_{in,1}}{\partial n_{out,0}}
\dfrac{\partial n_{out,0}}{\partial n_{in,0}} \\
& = \delta_{out,1} \dfrac{\partial n_{out,1}}{\partial n_{in,1}}
\dfrac{\partial n_{in,1}}{\partial n_{out,0}}
\dfrac{\partial n_{out,0}}{\partial n_{in,0}} \\
& = \delta_{in,1} \dfrac{\partial n_{in,1}}{\partial n_{out,0}}
\dfrac{\partial n_{out,0}}{\partial n_{in,0}}
= \delta_{out,0} \dfrac{\partial n_{out,0}}{\partial n_{in,0}}
- \delta_{in,s}=
\begin{cases}
\dfrac{\partial J}{ \partial n_{out,s} }
\dfrac{ \partial n_{out,s}}{\partial n_{in,s} } & \text{if } s = t \\
\delta_{in,s+1}
\dfrac{ \partial n_{in,s+1}}{\partial n_{out,s} }
\dfrac{ \partial n_{out,s}}{\partial n_{in,s} }
& \text{otherwise}
\end{cases}
- \delta_{in,0} = \dfrac{\partial J}{\partial n_{in,0}} = \dfrac{\partial J}{\partial n_{out,t}} \dfrac{\partial n_{out,t} }{\partial n_{in,t}} \dfrac{\partial n_{in,t} }{\partial n_{out,t-1}} ...
\dfrac{\partial n_{in,1} }{\partial n_{out,0}} \dfrac{\partial n_{out,0} }{\partial n_{in,0}}
\delta_{in,0} = \delta_{out,t} \dfrac{\partial n_{out,t} }{\partial n_{in,t}} \dfrac{\partial n_{in,t} }{\partial n_{out,t-1}} ...
\dfrac{\partial n_{in,1} }{\partial n_{out,0}} \dfrac{\partial n_{out,0} }{\partial n_{in,0}}
- k_{out} = sigmoid(w_{k,x}x_{t}+w_{k,b}) \\
C_{write} = sigmoid(w_{cw,x}x_{t}+w_{cw,y}y_{t-1}+w_{cw,b}) \\
m_{in,t} = k_{out} \cdot C_{write}
- C_{forget}= sigmoid(w_{cf,x}x_{t} + w_{cf,y}y_{t-1} + w_{cf,b}) \\
m_{out,t} = m_{in,t} + C_{forget} \cdot m_{out,t-1}
- n_{out}=sigmoid(m_{out,t}) \\
C_{read}= sigmoid(w_{cr,x} x_{t} + w_{cr,y} y_{t-1} + w_{cr,b}) \\
C_{out} = n_{out} \cdot C_{read}
- \dfrac{\partial m_{out,t}}{\partial w_{k,x}} = \dfrac{\partial m_{in,t}}{\partial w_{k,x}} + C_{forget} \dfrac{\partial m_{out,t-1}}{\partial w_{k,x}}
- \begin{bmatrix}
r_{0} \\[0.3em]
r_{1} \\[0.3em]
r_{2} \\[0.3em]
\end{bmatrix}
=\begin{bmatrix}
1*0.9+2*0.1 \\[0.3em]
1*0.9+1*0.1 \\[0.3em]
2*0.9+4*0.1 \\[0.3em]
\end{bmatrix}
=
\begin{bmatrix}
1.1 \\[0.3em]
1.0 \\[0.3em]
2.2 \\[0.3em]
\end{bmatrix}
\textbf{r} \leftarrow \sum_{i}w(i)\textbf{M}(i)
&\sum_{i}w(i) = 1 \\
& 0 \leq w(i) \leq 1, \forall i \\
- \textbf{M}(i) \leftarrow (1-w(i) \textbf{e} ) \textbf{M}(i)
0 \leq e(j) \leq 1, \forall j
M=
\begin{bmatrix}
1(1-0.9) & 2(1-0.1) & 3 & ... \\[0.3em]
1 & 1 & 2 & ... \\[0.3em]
2(1-0.9) & 4(1-0.1) & 1 & ... \\[0.3em]
\end{bmatrix}
=\begin{bmatrix}
0.1 & 1.8 & 3 & ... \\[0.3em]
1 & 1 & 2 & ... \\[0.3em]
0.2 & 3.6 & 1 & ... \\[0.3em]
\end{bmatrix}
- \textbf{M}(i) \leftarrow \textbf{M}(i) + w(i) \textbf{a}
M=
\begin{bmatrix}
0.1+0.9 & 1.8+0.1 & 3 & ... \\[0.3em]
1.0+0.9 & 1.0+0.1 & 2 & ... \\[0.3em]
0.2 & 3.6 & 1 & ... \\[0.3em]
\end{bmatrix}
=\begin{bmatrix}
1.0 & 1.9 & 3 & ... \\[0.3em]
1.9 & 1.1 & 2 & ... \\[0.3em]
0.2 & 3.6 & 1 & ... \\[0.3em]
\end{bmatrix}
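Both write steps (erase, then add) can be sketched in NumPy; the memory, head location, erase vector e = (1, 0, 1) and add vector a = (1, 1, 0) below reproduce the numbers in the two examples above:

```python
import numpy as np

def write(M, w, e, a):
    """NTM-style write, with column i of M being memory location i.

    Erase: M(i) <- M(i) * (1 - w(i) * e), then add: M(i) <- M(i) + w(i) * a.
    """
    M = M * (1.0 - np.outer(e, w))   # erase step
    M = M + np.outer(a, w)           # add step
    return M

M = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 2.0],
              [2.0, 4.0, 1.0]])
w = np.array([0.9, 0.1, 0.0])  # head location
e = np.array([1.0, 0.0, 1.0])  # erase vector
a = np.array([1.0, 1.0, 0.0])  # add vector
print(write(M, w, e, a))
```

Locations with w(i) = 0 are left untouched, so the head location controls where the write lands.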
- w(i) \leftarrow \frac{e^{\beta K[\textbf{k},\textbf{M}(i)] } }{ \sum_{j} e^{ \beta K[\textbf{k},\textbf{M}(j)] } }
K[\textbf{u},\textbf{v} ] = \frac{ \textbf{u} \cdot \textbf{v} }{ |\textbf{u}| \cdot |\textbf{v}| }
- \textbf{w}_{t} \leftarrow g \textbf{w}_{t} + (1-g) \textbf{w}_{t-1}
- w(i) \leftarrow w(i-1) s(1) + w(i)s(0) + w(i+1)s(-1)
- w(i) \leftarrow \frac{w(i)^{\gamma}}{\sum_{j}w(j)^{\gamma}}