Gated Feedback Recurrent Neural Networks
Matsuo lab. paper reading session
Jul.17 2015
School of Engineering, The University of Tokyo
Hiroki Kurotaki
kurotaki@weblab.t.u-tokyo.ac.jp
Contents
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Paper Information
・Gated Feedback Recurrent Neural Networks
・Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
Dept. IRO, Universite de Montreal, CIFAR Senior Fellow
・Proceedings of The 32nd International Conference on Machine Learning, pp.
2067–2075, 2015 (ICML 2015)
・First submitted to arXiv.org on 9 Feb 2015
・Cited by 9 (Google Scholar, Jul 17 2015)
・http://jmlr.org/proceedings/papers/v37/chung15.html
Introduction 1/3
・They propose a novel recurrent neural network (RNN) architecture,
the Gated Feedback RNN (GF-RNN).
・GF-RNN allows connections from upper layers to lower layers,
and controls these signals with global gating units.
(Each circle represents a layer consisting of recurrent units, e.g. LSTM cells)
Introduction 2/3
・The proposed GF-RNN outperforms the baseline methods in these tasks.
1. Character-level Language Modeling
(given a subsequence of structured data, predict the rest of the characters)
Introduction 3/3
・The proposed GF-RNN outperforms the baseline methods in these tasks.
2. Python Program Evaluation
(predict the script's execution result from the input given as a raw character sequence)
( [Zaremba 2014] Figure 1)
Related works(Unit) : Long short-term memory
・An LSTM cell is just a neuron,
・but it decides when to memorize, forget, and expose its content value (minimal sketch below)
( [Zaremba 2014] Figure 1)
(The notation used in the figure is slightly different from this paper)
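Not on the slide: a minimal NumPy sketch of a single LSTM step, to make the gating concrete (names, shapes, and the fused-weight layout are my own assumptions, not the paper's code):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W: (4n, d), U: (4n, n), b: (4n,) hold all four gate blocks stacked.
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input / forget / output gates
    g = np.tanh(g)                                # candidate memory content
    c = f * c_prev + i * g                        # decide what to memorize or forget
    h = o * np.tanh(c)                            # decide what to expose
    return h, c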
Related works(Unit) : Gated recurrent unit
・Cho et al. 2014
・Like LSTM, it adaptively resets (forgets) or updates (inputs) its memory content
・But unlike LSTM, it has no output gate
・It adaptively balances between the previous and the new memory contents (update equations below)
( [Cho 2014] Figure 2)
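For reference, the GRU update written out (my transcription; the placement of z_t vs. (1 - z_t) varies between papers, this follows the convention I recall from the GF-RNN paper):

r_t = \sigma(W_r x_t + U_r h_{t-1})                      (reset gate)
z_t = \sigma(W_z x_t + U_z h_{t-1})                      (update gate)
\tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))       (new memory content)
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t    (adaptive balance)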
Related works(Architecture) : Conventional Stacked RNN
・Each circle represents a layer consisting of many recurrent units
・Several hidden recurrent layers are stacked to capture the hierarchical
structure of short- and long-term dependencies (update rule below)
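In symbols, each layer of a conventional stacked RNN is driven by the layer below at time t and by itself at time t-1 (standard formulation, with h_t^0 = x_t):

h_t^j = \phi\left( W^{j-1 \to j} h_t^{j-1} + U^{j \to j} h_{t-1}^{j} \right)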
Related works(Architecture) : Clockwork RNN
・The i-th hidden module is updated only every 2^(i-1) timesteps (schedule sketch below)
・Neurons in a faster module i are connected to neurons in a slower
module j only if their clock periods satisfy T_i < T_j.
( [Koutnik 2014] Figure 1)
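A tiny sketch of the Clockwork update schedule described above (illustrative only):

def clockwork_active_modules(t, n_modules):
    # Module i (1-indexed) has clock period T_i = 2**(i - 1) and is
    # updated only at timesteps t that are multiples of that period.
    return [i for i in range(1, n_modules + 1) if t % 2 ** (i - 1) == 0]

# e.g. clockwork_active_modules(4, 5) -> [1, 2, 3]: at t = 4 only the
# modules with periods 1, 2 and 4 fire; the slower ones keep their state.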
Proposed Method : Gated Feedback RNN
・Generalizes the Clockwork RNN in both its connections and its update rates
・Signals flow back from the upper recurrent layers into the lower layers
・Adaptively controls when to connect each pair of layers with "global reset gates"
(the small bullets on the edges; update rule sketched below)
h*_(t-1) : the concatenation of all the hidden states from the previous timestep (t-1)
g^(i→j) : the global reset gate on the connection from layer i at timestep (t-1) to layer j at timestep t
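A minimal NumPy sketch of one gated-feedback timestep with tanh units, following my reading of the paper (variable names and shapes are my own; the bottom layer takes the input x_t as its "layer below"):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gf_rnn_step(x, h_prev, W, U, w_g, u_g):
    # h_prev[i]: hidden state of layer i at t-1;  W[j]: weights from the
    # layer below;  U[j][i]: feedback weights i -> j;  w_g/u_g: gate params.
    L = len(h_prev)
    h_star = np.concatenate(h_prev)      # h*_(t-1): all states from t-1
    h_new, below = [], x
    for j in range(L):
        pre = W[j] @ below               # signal from the layer below at t
        for i in range(L):
            # global reset gate: g^(i->j) = sigmoid(w_g . h_t^(j-1) + u_g . h*)
            g = sigmoid(w_g[j][i] @ below + u_g[j][i] @ h_star)
            pre += g * (U[j][i] @ h_prev[i])
        below = np.tanh(pre)             # h_t^j, fed to the next layer up
        h_new.append(below)
    return h_new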
Proposed Method : GF-RNN with LSTM unit
・The global reset gates are used only when computing the new memory content (formula below)
( [Zaremba 2014] Figure 1)
(The notation used in the figure is slightly different from this paper)
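Written out, only the candidate memory content receives the gated feedback (my transcription; the input, forget, and output gates keep their usual form):

\tilde{c}_t^j = \tanh\left( W_c^{j-1 \to j} h_t^{j-1} + \sum_{i=1}^{L} g^{i \to j}\, U_c^{i \to j}\, h_{t-1}^{i} \right)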
Proposed Method : GF-RNN with GRU unit
・The global reset gates are used only when computing the candidate activation
( [Cho 2014] Figure 2)
Experiment : Tasks (Language Modeling)
・Given a subsequence of structured data, predict the rest of the characters.
Experiment : Tasks (Language Modeling)
・Hutter dataset
・English Wikipedia, contains 100 MBytes of characters which include Latin
alphabets, non-Latin alphabets, XML markups and special characters
・Training set : the first 90 MBytes
Validation set : the next 5 MBytes
Test set : the last 10 MBytes
・Performance measure :
the average number of bits-per-character (BPC; defined below)
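BPC is the standard per-character negative log-likelihood in base 2 (lower is better):

\mathrm{BPC} = -\frac{1}{T} \sum_{t=1}^{T} \log_2 p(x_t \mid x_{<t})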
Experiment : Models (Language Modeling)
・3 RNN architectures : single, (conventional) stacked, Gated-feedback
・3 recurrent units : tanh, LSTM, Gated Recurrent Unit (GRU)
・The number of parameters is constrained to be roughly 1000
・Detail
- RMSProp & momentum (update sketched below)
- 100 epochs
- learning rate : 0.001 (GRU, LSTM)
5×10^(-5) (tanh)
- momentum coef. : 0.9
- Each update is done using a
minibatch of 100 subsequences
of length 100 each.
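A sketch of the RMSProp-with-momentum update listed above (one common variant; the exact form used in the paper may differ):

def rmsprop_momentum_update(param, grad, ms, vel,
                            lr=0.001, rho=0.9, momentum=0.9, eps=1e-8):
    # Running average of squared gradients scales the step; classical
    # momentum then smooths it. Works elementwise on NumPy arrays.
    ms = rho * ms + (1.0 - rho) * grad ** 2
    vel = momentum * vel + lr * grad / (ms + eps) ** 0.5
    return param - vel, ms, vel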
Experiment : Results and Analysis (Language Modeling)
・GF-RNN performs well when used together with GRU and LSTM units
・But it fails to improve performance with tanh units
・GF-RNN with LSTM is also better than the non-gated variant (the bottom row)
Experiment : Results and Analysis (Language Modeling)
・The stacked LSTM failed to close the tags with </username> and
</contributor> in both trials
・However, the GF-LSTM succeeded in closing both of them,
which suggests that it learned the structure of XML tags
Experiment : Additional results (Language Modeling)
・They trained another GF-RNN with LSTM with a larger number of
parameters, and obtained comparable results.
・(They write that it beats the previously reported best results,
but a non-RNN work achieved 1.278)
Experiment : Tasks (Python Program Evaluation)
・Input : a Python program ending with a print statement, 41 symbols
Output : the result of the print statement, 13 symbols
・Scripts used in this task include addition, multiplication, subtraction,
for-loops, variable assignment, logical comparison, and if-else statements
(illustrative example below).
・Both the input and the output are sequences of characters.
・Nesting : [1, 5]
・Length : [1, 10^10]
( [Zaremba 2014] Figure 1)
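An illustrative example of such a script and its target, in the style of [Zaremba 2014] (constructed here, not copied from the slides):

# Input, presented to the model one character at a time:
j = 8584
for x in range(8):
    j += 920
b = (1500 + j)
print((b + 7567))
# Target, also a character sequence: 25011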
Experiment : Models (Python Program Evaluation)
・RNN encoder-decoder approach, previously used for machine translation
(wiring sketched below)
・Encoder RNN : the hidden state of the encoder RNN is unfolded for 50 timesteps.
・Decoder RNN : its initial hidden state is initialized with the last hidden state of the
encoder RNN.
・Detail
- GRU & LSTM units, with and without gated feedback
- 3 hidden layers for each of the
encoder & decoder RNNs
- each hidden layer contains : 230 units (GRU)
200 units (LSTM)
- mixed curriculum strategy [Zaremba 2014]
- Adam [Kingma 2014]
- minibatches of 128 sequences
- 30 epochs
( [Cho 2014] Figure 1)
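A schematic sketch of the encoder-decoder wiring above (rnn_step, embed, and readout are hypothetical stand-ins for the stacked recurrent layers, the character embedding, and the output softmax):

def encode_decode(src_chars, tgt_len, rnn_step, h0, embed, readout):
    h = h0
    for ch in src_chars:          # encoder, unfolded over the input program
        h = rnn_step(h, embed(ch))
    out, x = [], embed("<s>")     # decoder starts from the encoder's last
    for _ in range(tgt_len):      # state; "<s>" is an assumed start symbol
        h = rnn_step(h, x)
        y = readout(h)            # predict the next output character
        out.append(y)
        x = embed(y)
    return out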
Experiment : Results & Analysis (Python Program Evaluation)
・As the third column shows, GF-RNN is better on almost all target scripts.
(Result panels: GRU, LSTM)
Conclusion
・They proposed a novel architecture for deep stacked RNNs which uses gated-feedback
connections between different layers.
・The proposed method outperformed previous results in the tasks of character-level
language modeling and Python program evaluation.
・The gated-feedback architecture is faster and performs better than the
standard stacked RNN even with the same amount of capacity.
・More thorough investigation into the interaction between the gated-feedback
connections and the role of the recurrent activation function is required in the future
(because the proposed gated-feedback architecture works poorly with
the tanh activation function).
References
[Cho 2014] Cho, Kyunghyun, van Merrienboer, Bart, Gulcehre, Caglar, Bougares, Fethi, Schwenk, Holger,
and Bengio, Yoshua. Learning phrase representations using RNN encoder-decoder for statistical machine
translation. arXiv preprint arXiv:1406.1078, 2014.
[Kingma 2014] Kingma, Diederik P. and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv
preprint arXiv:1412.6980, 2014.
[Koutnik 2014] Koutnik, Jan, Greff, Klaus, Gomez, Faustino, and Schmidhuber, Jürgen. A clockwork RNN.
In Proceedings of the 31st International Conference on Machine Learning (ICML'14), 2014.
[Schmidhuber 1992] Schmidhuber, Jürgen. Learning complex, extended sequences using the principle of
history compression. Neural Computation, 4(2):234–242, 1992.
[Stollenga 2014] Stollenga, Marijn F., Masci, Jonathan, Gomez, Faustino, and Schmidhuber, Jürgen. Deep
networks with internal selective attention through feedback connections. In Advances in Neural
Information Processing Systems, pp. 3545–3553, 2014.
[Zaremba 2014] Zaremba, Wojciech and Sutskever, Ilya. Learning to execute. arXiv preprint arXiv:1410.4615,
2014.