Recurrent Neural Networks
March 2019
김홍배
Outline
1. Sequence modeling
2. Feed-forward networks review
3. Vanilla RNN
4. Vanishing gradient
5. Gating methodology
6. Use cases
Sequence modeling
■ Language applications
• Language modeling (probability)
• Machine translation
• Speech recognition
■ Energy signal (price)
[Figure: energy price signal up to the current time, together with external signals (e.g. weather, load, generation).]
Feed-forward networks review
■ Where is the memory?
Given a sequence of samples, predict sample x[t+1] from the previous values {x[t], x[t-1], x[t-2], …, x[t-τ]}.
Feed-forward approach:
• use a static window of size L
• slide the window one time step at a time
[Figure: a window over the last L samples is used to predict x[t+1].]
■ Problems with the FNN + static-window approach
• Increasing L makes the number of parameters grow quickly.
• Decisions are independent between time steps: the network ignores what happened at the previous time step, and only the present window matters.
• It cannot work with variable sequence lengths.
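The static-window preparation can be sketched in plain Python. `make_windows` and `dense_params` are hypothetical helper names, and the parameter count assumes a single fully connected first layer with biases. For a fixed hidden size the count grows in proportion to L, and every window is processed with no knowledge of the previous one.

```python
def make_windows(series, L):
    """Pairs (input window, target) for a feed-forward net with a static window.

    The input is the L most recent samples x[t-L+1], ..., x[t]; the target is x[t+1].
    """
    return [(series[t - L + 1 : t + 1], series[t + 1]) for t in range(L - 1, len(series) - 1)]


def dense_params(L, hidden):
    """Weights plus biases of one fully connected layer with L inputs (hypothetical helper)."""
    return L * hidden + hidden


series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
pairs = make_windows(series, L=3)
print(pairs[0])                                      # ([0.0, 1.0, 2.0], 3.0)
print([dense_params(L, 16) for L in (3, 30, 300)])   # [64, 496, 4816]
```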
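Vanilla RNN
■ A recurrent neural network (RNN) adds "temporal" evolution: recurrent connections let the network capture history.
[Figure: RNN cell with input x, hidden state h and output y; U connects x to h, W connects h to itself across time, V connects h to y.]
$$h_t = f(U x_t + W h_{t-1}), \qquad \hat{y}_t = \mathrm{softmax}(V h_t)$$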
■ RNN: parameters
[Figure: the RNN cell (x, h, y) with its parameters U, W and V; the same parameters are used at every time step.]
■ RNN: unfolding
■ Beware: we now have extra depth! Every time step is an extra level of depth, just like a deeper stack of layers in a feed-forward network.
■ RNN: depth 1, forward propagation in space
■ RNN: depth 2, forward propagation in time
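The unfolded forward pass can be sketched in a few lines of NumPy. This minimal sketch assumes f = tanh and omits bias terms, as the slide equations do; the function name `rnn_forward` and the random sizes are illustrative choices, not part of the original material. Each loop iteration is one extra level of depth, and the same U, W and V are reused at every step.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def rnn_forward(U, W, V, xs, h0):
    """Unrolled vanilla RNN: h_t = tanh(U x_t + W h_{t-1}), y_hat_t = softmax(V h_t).

    The same U, W, V are shared by every time step; each loop iteration
    adds one level of depth to the unfolded network.
    """
    h = h0
    hs, ys = [], []
    for x in xs:
        h = np.tanh(U @ x + W @ h)
        hs.append(h)
        ys.append(softmax(V @ h))
    return hs, ys


rng = np.random.default_rng(0)
n_in, n_h, n_out, T = 3, 4, 2, 5          # illustrative sizes
U = rng.normal(size=(n_h, n_in))
W = rng.normal(size=(n_h, n_h))
V = rng.normal(size=(n_out, n_h))
xs = [rng.normal(size=n_in) for _ in range(T)]
hs, ys = rnn_forward(U, W, V, xs, np.zeros(n_h))
print(len(hs), [round(float(y.sum()), 6) for y in ys])  # one h_t and one distribution per step
```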
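■ Training an RNN: BPTT
■ Backpropagation through time (BPTT): the training algorithm that updates the network weights to minimize the error accumulated over all time steps.
■ Cross-entropy loss, with prediction $\hat{y}_t = \mathrm{softmax}(V h_t)$ and one-hot target $y_t$:
$$E_t(y_t, \hat{y}_t) = -y_t^{\top} \log \hat{y}_t, \qquad E(y, \hat{y}) = \sum_t E_t(y_t, \hat{y}_t)$$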
NOTE: our goal is to compute the gradients of the error with respect to the parameters U, W and V, and then learn good parameters with stochastic gradient descent. Just as we sum up the errors, we also sum up the gradients at each time step for one training example:
$$\frac{\partial E}{\partial W} = \sum_t \frac{\partial E_t}{\partial W}$$
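■ Example: computing the gradient of $E_3$
$$\frac{\partial E_3}{\partial W} = \frac{\partial E_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial h_3}\,\frac{\partial h_3}{\partial W}$$
Here $h_3$ depends on W directly and also through the earlier states:
$$h_3 = f(U x_3 + W h_2), \quad h_2 = f(U x_2 + W h_1), \quad h_1 = f(U x_1 + W h_0)$$
so the chain rule adds one term per earlier step k (with $\partial h_k / \partial W$ taken with $h_{k-1}$ held fixed):
$$\frac{\partial E_3}{\partial W} = \sum_{k=0}^{3} \frac{\partial E_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial h_3}\,\frac{\partial h_3}{\partial h_k}\,\frac{\partial h_k}{\partial W}$$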
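The BPTT sum can be checked numerically. The sketch below is a minimal NumPy implementation under the same assumptions as the equations (f = tanh, no biases, softmax output, cross-entropy loss summed over time, integer class targets); the names `forward`, `total_loss` and `bptt` are hypothetical. The backward loop walks from the last step to the first and carries dE/dh_{t-1} back through W, which accumulates the sum over k of the chains ∂h_t/∂h_k. A central finite difference on one weight serves as a check on the analytic gradient.

```python
import numpy as np


def forward(U, W, V, xs, h0):
    """Return hidden states [h_0, h_1, ..., h_T] and softmax outputs [y_hat_1, ..., y_hat_T]."""
    hs, ps = [h0], []
    for x in xs:
        hs.append(np.tanh(U @ x + W @ hs[-1]))
        z = V @ hs[-1]
        p = np.exp(z - z.max())
        ps.append(p / p.sum())
    return hs, ps


def total_loss(U, W, V, xs, targets, h0):
    """E = sum_t E_t with E_t = -log y_hat_t[target_t] (cross entropy, one-hot targets)."""
    _, ps = forward(U, W, V, xs, h0)
    return -sum(np.log(p[k]) for p, k in zip(ps, targets))


def bptt(U, W, V, xs, targets, h0):
    """Gradients dE/dU, dE/dW, dE/dV summed over all time steps."""
    hs, ps = forward(U, W, V, xs, h0)
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    dh_next = np.zeros_like(h0)                # dE/dh_t coming from step t+1
    for t in reversed(range(len(xs))):
        h_t, h_prev = hs[t + 1], hs[t]
        dz = ps[t].copy()
        dz[targets[t]] -= 1.0                  # dE_t/dz for softmax + cross entropy
        dV += np.outer(dz, h_t)
        dh = V.T @ dz + dh_next                # from the output and from later steps
        da = (1.0 - h_t ** 2) * dh             # back through tanh
        dU += np.outer(da, xs[t])
        dW += np.outer(da, h_prev)
        dh_next = W.T @ da                     # pass dE/dh_{t-1} one step back in time
    return dU, dW, dV


rng = np.random.default_rng(0)
n_in, n_h, n_out, T = 3, 4, 5, 6               # illustrative sizes
U = rng.normal(scale=0.5, size=(n_h, n_in))
W = rng.normal(scale=0.5, size=(n_h, n_h))
V = rng.normal(scale=0.5, size=(n_out, n_h))
xs = [rng.normal(size=n_in) for _ in range(T)]
targets = [int(k) for k in rng.integers(0, n_out, size=T)]
h0 = np.zeros(n_h)

dU, dW, dV = bptt(U, W, V, xs, targets, h0)

# Central finite difference on one entry of W
eps = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[1, 2] += eps
Wm[1, 2] -= eps
numeric = (total_loss(U, Wp, V, xs, targets, h0) - total_loss(U, Wm, V, xs, targets, h0)) / (2 * eps)
print(numeric, dW[1, 2])  # the two values should agree closely
```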
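■ Vanishing gradient
■ Because of the depth in time, gradients easily explode or vanish during training → exploding/vanishing gradients!
The factor $\partial h_3 / \partial h_k$ is itself a chain-rule product, e.g.
$$\frac{\partial h_3}{\partial h_1} = \frac{\partial h_3}{\partial h_2}\,\frac{\partial h_2}{\partial h_1}$$
so the gradient becomes
$$\frac{\partial E_3}{\partial W} = \sum_{k=0}^{3} \frac{\partial E_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial h_3}\left(\prod_{j=k+1}^{3} \frac{\partial h_j}{\partial h_{j-1}}\right)\frac{\partial h_k}{\partial W}$$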
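[Figure: tanh and its derivative. Source: http://nn.readthedocs.org/en/rtd/transfer/]
Each factor in the product is $\partial h_j / \partial h_{j-1} = \mathrm{diag}\big(f'(U x_j + W h_{j-1})\big)\, W$. For $f = \tanh$ the derivative $1 - \tanh^2$ is at most 1 and approaches 0 when the unit saturates, so a long product of such factors tends to shrink toward zero (vanishing gradient); with large recurrent weights it can instead grow without bound (exploding gradient).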
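The shrinking product of Jacobians can be observed directly. The NumPy sketch below assumes f = tanh and no biases; the function name `jacobian_norms` and all sizes are illustrative. It builds each factor ∂h_j/∂h_{j-1} = diag(1 - h_j²) W and records the spectral norm of ∂h_T/∂h_k as k moves further back in time. With W scaled to spectral norm 0.5, and since tanh' ≤ 1, every factor has norm at most 0.5, so the gradient reaching k steps back is bounded by 0.5^k.

```python
import numpy as np


def jacobian_norms(U, W, xs, h0):
    """Spectral norms of dh_T/dh_{T-1}, dh_T/dh_{T-2}, ..., dh_T/dh_0 for a tanh RNN."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(U @ x + W @ hs[-1]))
    J = np.eye(len(h0))
    norms = []
    for j in range(len(xs), 0, -1):
        # multiply in the next factor dh_j/dh_{j-1} = diag(1 - h_j**2) W on the right
        J = J @ (np.diag(1.0 - hs[j] ** 2) @ W)
        norms.append(np.linalg.norm(J, 2))
    return norms


rng = np.random.default_rng(0)
n_in, n_h, T = 3, 8, 20
Q, _ = np.linalg.qr(rng.normal(size=(n_h, n_h)))  # orthogonal matrix, spectral norm 1
W = 0.5 * Q                                        # recurrent weights with spectral norm 0.5
U = rng.normal(size=(n_h, n_in))
xs = [rng.normal(size=n_in) for _ in range(T)]
norms = jacobian_norms(U, W, xs, np.zeros(n_h))
for k in (1, 5, 10, 20):
    print(f"{k:2d} steps back: {norms[k - 1]:.2e}")
```

Scaling W above spectral norm 1 instead lets the same product grow, which is the exploding case.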
■ Vanishing gradient: standard solutions
• Proper initialization of the weight matrix
• Regularization of the outputs, or dropout
• ReLU activations, whose derivative is either 0 or 1
Gating method
■ Standard RNN
[Figure: standard RNN cell]
Long Short-Term Memory (LSTM)
1. Change the way past information is kept: introduce the cell state, a memory unit that keeps long-term information more safely by protecting it from the recursive operations.
2. Let every RNN unit decide whether the information at the current time step matters, and accept or discard it (optimized reading mechanism).
3. Let every RNN unit forget whatever is no longer useful by clearing that information from the cell state (optimized clearing mechanism).
4. Let every RNN unit output its decisions whenever it is ready to do so (optimized output mechanism).
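• Uses an internal memory (the cell state, or cell data)
• Uses the current input (the input and the output of the previous time step) to
 - partially add to or remove information from the internal memory
 - decide whether to store the current input in the internal memory
 - set the output value from the internal memory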
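■ Mathematical difference between the RNN and the LSTM
[Figure: RNN vs. LSTM laid out along depth (layers l-1, l) and time (t-1, t); both compute $h_t^{l}$ from the lower layer's output $h_t^{l-1}$ and the same layer's previous output $h_{t-1}^{l}$.]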
■ Each LSTM cell is built from several gates (f, i, g, o), as follows:
[Figure: LSTM cells at time t and t+1. Inputs to the cell at time t: $x_t$ (input or lower-layer output), $c_{t-1}$ (cell data from time t-1) and $h_{t-1}$ (output from time t-1); outputs: $h_t$ and $c_t$. The cell state carries valuable information worth keeping long term.]
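■ Understanding the LSTM gate functions
Sigmoid:
- The sigmoid output lies between 0 and 1.
- It sets the relative importance of the cell state or of the input/output values.
- "0" means not needed, so the value is removed; "1" means important, so it is kept.
Hyperbolic tangent (tanh):
- The tanh output lies between -1 and 1.
- It normalizes the cell state and the input/output values.
- It can therefore be ignored when first trying to understand the LSTM.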
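Forget gate
■ Controls whether past sequence data is used or discarded
$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1})$$
- The sigmoid output lies between 0 and 1:
  if $f_t$ = 1, the previous cell state is kept;
  if $f_t$ = 0, the previous cell state is deleted.
[Figure: $f_t$ multiplies $c_{t-1}$ element-wise ($\odot$: element-wise multiplication). Inputs: $x_t$ (input or lower-layer output), $c_{t-1}$ (cell data from t-1), $h_{t-1}$ (output from t-1). $W_{fx}$ and $W_{fh}$ are learned parameters.]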
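Input gate
■ Controls whether the input data is used or discarded
$$g_t = \tanh(W_{gx} x_t + W_{gh} h_{t-1})$$
- $g_t$ is a tanh output, so it lies between -1 and 1 (normalization of the input data).
$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1})$$
- $i_t$ is a sigmoid output, so it lies between 0 and 1.
$$y_t = g_t \odot i_t$$
$y_t$ is added to the forget-gated cell state to give the cell data at the current time t: $c_t = f_t \odot c_{t-1} + y_t$. $W_{gx}$, $W_{gh}$, $W_{ix}$ and $W_{ih}$ are learned parameters.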
Output gate
■ Controls whether the output data is used or discarded
$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1})$$
$$h_t = o_t \odot \tanh(c_t)$$
$W_{ox}$ and $W_{oh}$ are learned parameters; $h_t$ is the output at the current time t.
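Putting the forget, input and output gates together gives one LSTM time step. The NumPy sketch below follows the gate equations and, like them, omits bias terms; the name `lstm_step` and the dictionary layout of the weights are illustrative assumptions. With all weights set to zero every sigmoid gate outputs 0.5 and g_t = 0, so the cell state is simply halved, which makes the role of the forget gate easy to see.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def lstm_step(x_t, h_prev, c_prev, Wx, Wh):
    """One LSTM step. Wx[k] (n x d) and Wh[k] (n x n) are the weights of gate k."""
    pre = {k: Wx[k] @ x_t + Wh[k] @ h_prev for k in ("f", "i", "g", "o")}
    f_t = sigmoid(pre["f"])           # forget gate: 1 keeps c_{t-1}, 0 clears it
    i_t = sigmoid(pre["i"])           # input gate: how much of g_t to write
    g_t = np.tanh(pre["g"])           # candidate values in (-1, 1)
    o_t = sigmoid(pre["o"])           # output gate
    c_t = f_t * c_prev + i_t * g_t    # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


n, d = 3, 2
Wx = {k: np.zeros((n, d)) for k in "figo"}
Wh = {k: np.zeros((n, n)) for k in "figo"}
c_prev = np.array([1.0, -2.0, 0.0])
h_t, c_t = lstm_step(np.ones(d), np.zeros(n), c_prev, Wx, Wh)
print(c_t)  # every gate is 0.5 and g_t = 0, so c_t = 0.5 * c_prev
```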
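■ In compact matrix-vector form:
$$\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} \left( \begin{pmatrix} W_{ix} & W_{ih} \\ W_{fx} & W_{fh} \\ W_{ox} & W_{oh} \\ W_{gx} & W_{gh} \end{pmatrix} \begin{pmatrix} x_t \\ h_{t-1}^{l} \end{pmatrix} \right)$$
$$c_t = f \odot c_{t-1} + i \odot g, \qquad h_t = o \odot \tanh(c_t)$$
The stacked 4n × 2n matrix is the LSTM weight matrix to be identified (learned). $x_t$ (n × 1) is the lower-layer output or input vector and $h_{t-1}^{l}$ (n × 1) is the output vector from time t-1, so the stacked input has length 2n and the product gives the 4n gate pre-activations.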
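The stacked form can be sketched the same way: one matrix product on the concatenated vector [x_t; h_{t-1}] yields all 4n gate pre-activations, which are then split into i, f, o and g. This is a minimal NumPy sketch; it assumes, as the slide does, that x_t and h_{t-1} both have length n, omits biases, and uses the row order [i; f; o; g] (any fixed order works if used consistently). The name `lstm_step_stacked` is hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def lstm_step_stacked(x_t, h_prev, c_prev, W):
    """LSTM step with a single (4n x 2n) weight matrix, rows ordered [i; f; o; g]."""
    n = h_prev.shape[0]
    a = W @ np.concatenate([x_t, h_prev])   # all 4n pre-activations at once
    i = sigmoid(a[:n])
    f = sigmoid(a[n:2 * n])
    o = sigmoid(a[2 * n:3 * n])
    g = np.tanh(a[3 * n:])
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


rng = np.random.default_rng(1)
n = 3
W = rng.normal(size=(4 * n, 2 * n))          # W_ix | W_ih in rows 0..n-1, and so on
x_t, h_prev, c_prev = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
h_t, c_t = lstm_step_stacked(x_t, h_prev, c_prev, W)

# The forget gate alone, from its blocks W_fx = W[n:2n, :n] and W_fh = W[n:2n, n:]
f_check = sigmoid(W[n:2 * n, :n] @ x_t + W[n:2 * n, n:] @ h_prev)
```

Stacking the gates this way replaces eight small matrix products with one larger one, which is usually faster on hardware that favors big matrix multiplications.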
Design Patterns for RNN
RNN sequences (from the blog post by A. Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks" (2015))

Task                  | Input                | Output
Image classification  | fixed-size image     | fixed-size class
Image captioning      | image                | sentence of words
Sentiment analysis    | sentence             | positive or negative sentiment
Machine translation   | sentence in English  | sentence in French
Video classification  | video sequence       | label for each frame
RNN Implementation using TensorFlow
How do we design an RNN model for time-series prediction?
■ How do we shape our time-series data as RNN input?
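LAB-5) Connect the input and recurrent layers
import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 in TensorFlow 2.x)
# One cell object per layer: [rnn_cell] * depth would make every layer share one cell
rnn_cells = [tf.nn.rnn_cell.BasicLSTMCell(num_units) for _ in range(depth)]
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(rnn_cells)
# x_data: (batch_size, time_steps, input_dim) -> list of time_steps tensors (batch_size, input_dim)
x_split = tf.unstack(x_data, num=time_steps, axis=1)
output, state = tf.nn.static_rnn(stacked_lstm, x_split, dtype=tf.float32)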
[Figure: inputs $x_{t-9}, x_{t-8}, x_{t-7}, \dots, x_t$ feed a grid of stacked LSTM cells (layers unrolled over time), which produce outputs $o_{t-9}, o_{t-8}, o_{t-7}, \dots, o_t$.]
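One common way to turn a time series into RNN input, matching the $x_{t-9}, \dots, x_t$ window in the figure, is to cut it into overlapping windows of 10 steps and stack them into an array of shape (num_samples, time_steps, input_dim). The sketch below assumes a single univariate series, a one-step-ahead target x[t+1], and NumPy; the function name `to_rnn_input` is hypothetical. This batch-major layout is what Keras RNN layers expect by default.

```python
import numpy as np


def to_rnn_input(series, time_steps=10):
    """Cut a 1-D series into overlapping windows for an RNN.

    Returns X of shape (num_samples, time_steps, 1), where sample s holds
    series[s : s + time_steps], and y of shape (num_samples,) holding the
    next value series[s + time_steps].
    """
    series = np.asarray(series, dtype=np.float32)
    num_samples = len(series) - time_steps
    X = np.stack([series[s:s + time_steps] for s in range(num_samples)])
    y = series[time_steps:]
    return X[..., np.newaxis], y


X, y = to_rnn_input(np.arange(20.0), time_steps=10)
print(X.shape, y.shape)   # (10, 10, 1) (10,)
print(X[3, :, 0], y[3])   # the window of values 3 to 12, then the target 13
```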
Long Short-Term Memory Network for
Remaining Useful Life Estimation
Deep LSTM model for RUL estimation
NASA C-MAPSS (Commercial Modular Aero-Propulsion
System Simulation) data set (Turbofan Engine Degradation
Simulation Data Set)
[Figure: Deep LSTM model for RUL estimation]
Electricity Price Forecasting (EPF)
[Figure: energy signal (price) up to the current time, plus external signals (e.g. weather, load, generation).]
Experiment results
[Figure: LSTM + DNN + LinearRegression, predicted vs. test price (euro/MWh) by hour.]
Model                        | Mean absolute error (euro/MWh)
LinearRegression             | 4.04
RidgeRegression              | 4.04
LassoRegression              | 3.73
ElasticNet                   | 3.57
LeastAngleRegression         | 6.27
LSTM+DNN+LinearRegression    | 2.13
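The mean absolute error in the table is the average absolute difference between predicted and actual prices, so it carries the same unit (euro/MWh). A minimal plain-Python sketch; the three prices below are made-up values, not data from the experiment.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum_i |y_true[i] - y_pred[i]|, in the unit of the data."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and have the same length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


actual = [40.0, 42.0, 39.0]       # made-up prices in euro/MWh
predicted = [41.0, 40.0, 39.5]
print(f"{mean_absolute_error(actual, predicted):.4f}")   # 1.1667
```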
Show and Tell: A Neural Image Caption Generator
References
1. O. Vinyals, A. Toshev, S. Bengio, D. Erhan, "Show and Tell: A Neural Image Caption Generator"
2. 皆川卓也, presentation slides for the CV study group @ Kanto (CV勉強会@関東), "CVPR2015 reading session" (「CVPR2015読み会」)
3. Andrej Karpathy, "Recurrent Neural Networks", CS231n lecture notes
2017.
김홍배
Korea Aerospace Research Institute
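Overview
■ Generates a description (caption) from a single still photo
■ Feeds the image feature vector produced by a deep convolutional neural network into a recurrent neural network (RNN) of the kind used for machine translation
■ Neural Image Caption (NIC)
■ Accuracy far exceeding previous methods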
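Neural Image Caption (NIC)
■ Given an image I as input,
■ NIC uses training pairs (I, S)
■ to find the network parameters w
■ that maximize the probability of producing the correct caption S:
$$w^* = \arg\max_{w} \sum_{(I,S)} \log p(S \mid I; w)$$
(S: caption, I: image, w: parameters.) The negative log-probability $-\log p(S \mid I; w)$ is the loss; summed over the whole training set it gives the total loss, and training is the search for the parameters w* that minimize it.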
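■ Generating a caption from an image
$$p(S \mid I; w) = \prod_{t=0}^{N} p(S_t \mid I, S_0, S_1, \dots, S_{t-1}; w)$$
N: number of words. Each word is influenced by the sequence of words before it. $S = \{S_0, S_1, \dots\}$ are words, so the caption S is variable-length sequence data.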
■ The parameters w in this factorization are found by training on the data set (I, S).
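Because the caption probability factorizes into per-word terms, its logarithm is a sum, and the training loss for one (I, S) pair is the negative of that sum. A minimal plain-Python sketch; the per-word probabilities are made-up numbers standing in for the outputs of a trained model, and `caption_log_prob` is a hypothetical name.

```python
import math


def caption_log_prob(word_probs):
    """log p(S | I) = sum_t log p(S_t | I, S_0, ..., S_{t-1}).

    word_probs[t] is the model's probability of the correct word S_t.
    """
    return sum(math.log(p) for p in word_probs)


word_probs = [0.5, 0.25, 0.8]      # hypothetical per-word probabilities
log_p = caption_log_prob(word_probs)
loss = -log_p                      # the quantity minimized during training
print(round(math.exp(log_p), 6))   # 0.1, the product 0.5 * 0.25 * 0.8
```

Summing logarithms instead of multiplying probabilities also avoids numerical underflow for long captions.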
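■ Basic structure of the LSTM-based sentence generator
[Figure: at time t the word $S_t$ is converted by word embedding into the input $x_t = W_e S_t$; the LSTM combines it with its output from time t-1, $h_{t-1}$, to produce the output $h_t$.]
$$p_{t+1}(S_{t+1}) = \mathrm{softmax}(h_t)$$
computes the probability distribution over words, and $\log p_{t+1}(S_{t+1})$ is used to compute the loss.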
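■ Word embedding
Words are usually represented as one-hot vectors, whose length equals the size of the dictionary. The dictionary size changes easily, which makes LSTM modeling difficult, so the variable-length one-hot vector has to be transformed into a vector of fixed length.
[Figure: one-hot representation, e.g. dog = 0010000000, cat = 0000001000, versus word embedding representation, e.g. dog = (0.1, 0.3, 0.2, 0.1, 0.2, 0.3), cat = (0.2, 0.1, 0.2, 0.2, 0.1, 0.1); the embedding $W_e$ maps the word $S_t$ to the input $x_t$.]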
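The embedding step $x_t = W_e S_t$ is a matrix product with a one-hot vector, which simply selects one column of $W_e$; in practice it is implemented as a table lookup. A minimal NumPy sketch: the six-word vocabulary, the embedding size of 4 and the random $W_e$ (which would be learned in a real model) are all illustrative assumptions.

```python
import numpy as np

vocab = ["<start>", "<end>", "a", "dog", "cat", "runs"]   # hypothetical dictionary
word_to_id = {w: k for k, w in enumerate(vocab)}
embed_dim = 4                                            # illustrative size
rng = np.random.default_rng(0)
W_e = rng.normal(size=(embed_dim, len(vocab)))           # learned in a real model


def one_hot(word):
    v = np.zeros(len(vocab))
    v[word_to_id[word]] = 1.0
    return v


def embed(word):
    # W_e @ one_hot(word) picks out one column of W_e, so a lookup gives the same x_t
    return W_e[:, word_to_id[word]]


x_dog = embed("dog")
print(x_dog.shape)                                       # (4,)
print(np.allclose(W_e @ one_hot("dog"), x_dog))          # True
```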
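■ Loss function
The loss is defined as the cross entropy
$$J(w) = -\sum_i y'_i \log y_i$$
y: probabilities estimated by the classifier; y': correct answer (one-hot).
For the correct class ($y'_i = 1$), $J(w) = -\log y_i$: as $y_i$ approaches 1, J(w) approaches 0.
[Figure: $J(w) = -\log y_i$ plotted against $y_i$ on (0, 1].]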
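A minimal plain-Python sketch of this loss for a single prediction; `cross_entropy` is a hypothetical name and the probabilities are made-up values. With a one-hot y', only the term of the correct class survives, so the sum reduces to $-\log y_i$.

```python
import math


def cross_entropy(y_true, y_pred):
    """J = -sum_i y'_i * log(y_i); terms with y'_i = 0 contribute nothing and are skipped."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


y_true = [0.0, 1.0, 0.0]                           # correct answer: class 1 (one-hot)
print(cross_entropy(y_true, [0.2, 0.7, 0.1]))      # -log(0.7), about 0.357
print(cross_entropy(y_true, [0.01, 0.98, 0.01]))   # closer to 1, loss closer to 0
```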
■ The image feature vector is taken from the deep CNN and becomes the first input to the LSTM ($x_{-1}$).
■ The word $S_0$ is fed in, and the LSTM outputs the probability that the next word is $S_1$.
[Figure: initial LSTM state $h_0$, $c_0$.]
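NIC training process
■ Pretraining on ImageNet with dropout; parameters initialized randomly
■ Training data: a set of training images and their captions
[Figure: the predicted probabilities are compared with the training data through the loss function, and the error is backpropagated.]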
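Prediction with NIC (sampling)
■ An image is given, and its feature vector is taken from the deep CNN.
■ Starting from the special start word, select the most probable word $S_1$.
■ Feed the selected word $S_1$ back as the next input, and continue until the end-of-sentence token appears.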
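The sampling loop is greedy decoding. A minimal plain-Python sketch follows; the `next_word_probs` callable stands in for the trained CNN + LSTM (which would also use the image features), the toy probability table is invented purely for illustration, and the `max_len` cap is an added safeguard in case the end token never wins.

```python
def greedy_decode(next_word_probs, start="<start>", end="<end>", max_len=20):
    """Repeatedly pick the most probable next word until the end token appears.

    next_word_probs(words) returns a dict {word: probability} for the next word,
    given the words generated so far (a stand-in for softmax(h_t) of the model).
    """
    words = [start]
    while len(words) <= max_len:
        probs = next_word_probs(words)
        best = max(probs, key=probs.get)   # most probable next word
        if best == end:
            break
        words.append(best)                 # feed the chosen word back in
    return words[1:]                       # drop the start token


# Hypothetical toy "model": next-word probabilities depend only on the last word
TOY_TABLE = {
    "<start>": {"a": 0.6, "the": 0.3, "<end>": 0.1},
    "a": {"dog": 0.7, "cat": 0.2, "<end>": 0.1},
    "dog": {"runs": 0.5, "<end>": 0.4, "a": 0.1},
    "runs": {"<end>": 0.9, "a": 0.1},
}

caption = greedy_decode(lambda words: TOY_TABLE[words[-1]])
print(caption)  # ['a', 'dog', 'runs']
```

Greedy choice is the simplest option; keeping several candidate captions at each step (beam search) is a common alternative.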
