A BRIEF INTRODUCTION ON
GATED RECURRENT UNIT
COMP7404 – GROUP U
Image: Illustration of gated recurrent unit. From Chung et al., 2014:
https://arxiv.org/abs/1412.3555
AGENDA
1) A Quick Recap on Deep Learning Architectures
① Standard Neural Network (NN)
② Recurrent Neural Network (RNN)
2) A Deep Dive into the Gated Recurrent Unit (GRU)
3) Rainfall Project Overview
① Competition Overview
② Source Data
③ Code and Demo: Rainfall Prediction
4) Q&A
LIMITATIONS OF STANDARD NEURAL NETWORKS
Source: A Critical Review of Recurrent Neural Networks for Sequence Learning, Lipton et al., 2015; Simulation of Neural Networks (1st ed.), Zell, Andreas, 1994 (in German)
Major constraints:
 Inputs and outputs are of fixed length, so the network is not efficient with sequential data
 Does not make good use of previously learned features
Image: Coursera: Sequence Models, Andrew Ng
Image: Wrist mounted device sleep graph, Lucid Dreaming App
Est-ce que vous êtes prêt? ("Are you ready?")
RECURRENT NEURAL NETWORK (RNN)
Source: Understanding LSTM Networks, Olah, 2015; A Critical Review of Recurrent Neural Networks for Sequence Learning, Lipton et al., 2015; The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy, 2015
 Has loops to deal with sequential data
 Can handle vectors of variable length
Source: Predicting E-commerce Consumer Behavior using Recurrent Neural Networks
Figure 2 shows some output h; we call this the hidden state of the cell.
So, what exactly are cells? Let's have a quick look at the two cell
architectures mentioned above, LSTM and GRU. We will be testing both
architectures when training the models.
Long Short-Term Memory (LSTM)
The Long Short-Term Memory (LSTM) cell, like the basic cell,
computes a new state and an output from an input and the previous
state. The (hidden) state of the LSTM is split into two vectors, one for the
short-term state and one for the long-term state. Figure 3 illustrates the
architecture of the LSTM cell. In Figure 3, c is the long-term state and h
is the short-term state. The LSTM receives both short-term and long-term
states from previous time steps [5].
Figure 2: Cell folded/unfolded. Inspired by Aurélien Géron.
Recurrent architecture
# Update the hidden state
h_t = f(W^(x) · x_t + V^(h) · h_{t-1})
# Compute the output vector
y_t = g(W^(y) · h_t)
Notations
x_t = input at time t
h_t = hidden state at time t
y_t = output at time t
f = activation function for the hidden state, e.g. tanh
g = activation function for the output, e.g. a sigmoid
W^(x), V^(h), W^(y) = parameters
(as in a standard RNN)
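As a concrete illustration, one step of the two equations above can be sketched in NumPy. The weight names (W_x, V_h, W_y) and the layer sizes are our own illustrative choices, not part of the slides:

```python
import numpy as np

# Illustrative sketch of one RNN step following the equations above.
# Weight names and sizes are assumptions for demonstration only.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 4, 2
W_x = rng.normal(size=(n_hidden, n_in))      # W^(x)
V_h = rng.normal(size=(n_hidden, n_hidden))  # V^(h)
W_y = rng.normal(size=(n_out, n_hidden))     # W^(y)

def rnn_step(x_t, h_prev):
    h_t = np.tanh(W_x @ x_t + V_h @ h_prev)       # f = tanh
    y_t = 1.0 / (1.0 + np.exp(-(W_y @ h_t)))      # g = sigmoid
    return h_t, y_t

h = np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):              # a length-5 input sequence
    h, y = rnn_step(x, h)
```

Note how the same hidden state h is threaded through every step, which is exactly the "loop" that lets the RNN handle sequences of variable length.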
Example applications:
Image classification
Sentiment analysis, video recognition
Text translation
Music generation
Limitations with RNNs
 Unable to handle "long-term dependencies" well in practice  the "vanishing gradient" problem
 Loops make the path longer and the derivative more complicated to calculate
Example of a long-term dependency: "It was a handwritten application from Steve Jobs for jobs at HP. We use online applications for jobs nowadays. What do jobs and careers mean to college graduates? Those who have read the biography of Jobs may have a different viewpoint …"
Solution:
Use a more sophisticated architecture which allows a shorter path and fewer multiplications to calculate the gradient.
 Learns how to keep memories for long-distance dependencies
 Avoids the vanishing gradient problem
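The vanishing gradient can be illustrated with a toy backpropagation calculation; the factor 0.5 stands in for a recurrent weight and 0.9 for a typical tanh derivative (both values are arbitrary assumptions):

```python
# Toy illustration of the vanishing gradient: backpropagating through
# T time steps multiplies the gradient by (weight * activation slope)
# at every step, so the product shrinks exponentially.
V = 0.5          # assumed recurrent weight, |V| < 1
slope = 0.9      # stand-in for a typical tanh derivative
T = 50           # sequence length

grad = 1.0
for _ in range(T):
    grad *= V * slope
# grad is now (0.45)**50, vanishingly small
```

The GRU's shorter gradient path avoids this long chain of multiplications.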
Equations
Different diagrams appear in the literature:
Update gate: z_t = f(W^(z) · x_t + V^(z) · h_{t-1})
Reset gate: r_t = f(W^(r) · x_t + V^(r) · h_{t-1})
Reset-gate memory (candidate state): h̃_t = g(W^(h) · x_t + V^(h) · (r_t ∗ h_{t-1}))
Memory to transmit: h_t = (1 − z_t) ∗ h_{t-1} + z_t ∗ h̃_t
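A minimal NumPy sketch of one GRU step, directly implementing the four equations above. The weight names and sizes are illustrative assumptions; f is a sigmoid and g is tanh:

```python
import numpy as np

# Sketch of one GRU step following the four equations above.
# Weight names and sizes are assumptions for demonstration only.
rng = np.random.default_rng(1)
n_in, n_h = 3, 4
W_z, W_r, W_h = (rng.normal(size=(n_h, n_in)) for _ in range(3))
V_z, V_r, V_h = (rng.normal(size=(n_h, n_h)) for _ in range(3))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev):
    z = sigmoid(W_z @ x_t + V_z @ h_prev)             # update gate
    r = sigmoid(W_r @ x_t + V_r @ h_prev)             # reset gate
    h_cand = np.tanh(W_h @ x_t + V_h @ (r * h_prev))  # reset-gate memory
    return (1 - z) * h_prev + z * h_cand              # memory to transmit

h = np.zeros(n_h)
for x in rng.normal(size=(6, n_in)):                  # a length-6 sequence
    h = gru_step(x, h)
```

When z_t is near 0, the old memory h_{t-1} passes through almost unchanged, which is how the GRU keeps long-distance dependencies alive.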
Gated Recurrent Unit (GRU)
Source: Understanding GRU Networks, Towards Data Science: https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be
First, let's introduce the notations:
If you are not familiar with the above terminology, I recommend
watching tutorials about the "sigmoid" and "tanh" functions and the
"Hadamard product" operation.
#1. Update gate
We start by calculating the update gate z_t for time step t using the
update-gate formula shown in the Equations slide.
Gated Recurrent Unit
Source: Predicting E-commerce Consumer Behavior using Recurrent Neural Networks: https://blog.nirida.ai/predicting-e-commerce-consumer-behavior-using-recurrent-neural-networks-36e37f1aed22
Figure 4 illustrates the GRU cell. In the GRU, both state vectors of the LSTM
have been merged into a single vector, h. Instead of three gate
controllers, the GRU uses two: one controls both the forget gate and the
input gate, and there is no output gate. The full state vector is the
output at every time step.
Figure 4: GRU cell. Inspired by Christopher Olah.
Jobs is the founder of a company, which company has its headquarters … . Jobs …
z_t = 1
h_t = 1 … … … … h_t = 1
Source: Cho et al. 2014, Coursera: Sequence Models, Andrew Ng
COMPETITION: HOW MUCH DID IT RAIN?
 https://www.kaggle.com/c/how-much-did-it-rain-ii/data
 Our solution
o Goal: showcase the application of GRU
o Methodology: train a GRU on the radar snapshot data to predict rainfall
o Tools: Keras
 Data preprocessing challenges
o Irregular radar measurement times
o Outliers
o Overfitting
 Training data and test data may not be fully independent
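A hedged sketch of the kind of Keras model this describes: a GRU reads the sequence of radar snapshots for one gauge-hour and predicts a single rainfall amount. The sequence length (19), feature count (22), layer width, and loss function are illustrative assumptions, not the group's actual configuration:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical shapes: 19 radar snapshots per hour, 22 features per snapshot.
model = keras.Sequential([
    keras.Input(shape=(19, 22)),  # (timesteps, radar features)
    layers.GRU(64),               # GRU summarizes the radar sequence
    layers.Dense(1),              # predicted rainfall for the hour
])
model.compile(optimizer="adam", loss="mae")
```

Mean absolute error is a natural choice here because the competition is scored on MAE, but the group's real loss and architecture may differ.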
Source: www.kaggle.com/c/how-much-did-it-rain-ii/data
TRAINING DATA
Training data sample:
Source: www.kaggle.com/c/how-much-did-it-rain-ii/data
DATA PREPROCESSING AND TESTING
Source: www.kaggle.com/c/how-much-did-it-rain-ii/data
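One common way to handle the irregular radar measurement times (our assumption, not necessarily the group's actual approach) is to zero-pad each gauge-hour's variable-length scan sequence to a fixed length before feeding it to the GRU:

```python
import numpy as np

# Hypothetical preprocessing sketch: each gauge-hour has a variable
# number of radar scans, so sequences are zero-padded to a fixed
# length. max_len and the feature count are illustrative assumptions.
def pad_sequence(scans, max_len):
    """Zero-pad a (n_scans, n_features) array to (max_len, n_features)."""
    n_scans, n_features = scans.shape
    out = np.zeros((max_len, n_features))
    k = min(n_scans, max_len)
    out[:k] = scans[:k]        # copy real scans; rest stays zero
    return out

hour = np.ones((3, 2))         # 3 radar scans, 2 features each
padded = pad_sequence(hour, max_len=5)
```

Padding keeps every training example the same shape, which a batched Keras GRU requires.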
CODE IMPLEMENTATION
DEMO
Items to be included in GitHub uploads:
• Demo with source and all dependencies and detailed instructions in
markdown format on how to run the demo to be uploaded as a single
zip file.
• Please ensure that the instructions on how to run the demo are
sufficiently detailed. You won't be able to get a good grade if we are
not able to run your demo.
Items to be included in this PPT:
• The presentation must include this live demo. If a live demo is not
possible, a video demo can be presented. The file size limit for this zip
file is 100MB.
• A link to GitHub version. The link should be included as a QR code in
this PPT.
Q & A

  • 1. A BRIEF INTRODUCTION ON GATED RECURRENT UNIT COMP7404 – GROUP U Image: Illustration of gated recurrent unit. From Chung et al., 2014: https://arxiv.org/abs/1412.3555
  • 2. AGENDA 1) A Quick Recap on Deep Learning Architectures ① Standard Neural Network (NN) ② Recurrent Neural Network (RNN) 2) A Deep Dive in Gated Recurrent Unit (GRU) 3) Rainfall Project Overview ① Competition Overview ② Source Data ③ Code and Demo: Rainfall Prediction 4) Q&A
  • 3. LIMITATIONS OF STANDARD NEURAL NETWORK
Source: A Critical Review of Recurrent Neural Networks for Sequence Learning, Lipton et al., 2015; Simulation of Neural Networks (1st ed., in German), Zell, Andreas, 1994
Major constraints:
 Fixed-length input and output; inefficient with sequential data
 Poor at exploiting features learned earlier in a sequence
Image: Coursera: Sequence Models, Andrew Ng
Image: Wrist-mounted device sleep graph, Lucid Dreaming App
Example translation pair: "Est-ce que vous êtes prêt?" ("Are you ready?")
  • 4. RECURRENT NEURAL NETWORK (RNN)
Source: Understanding LSTM Networks, Olah, 2015; A Critical Review of Recurrent Neural Networks for Sequence Learning, Lipton et al., 2015; The Unreasonable Effectiveness of Recurrent Neural Networks, Karpathy, 2015
 Has loops to deal with sequential data
 Can handle vectors of variable length
Figure: RNN cell, folded and unfolded (inspired by Aurélien Géron); screenshot from "Predicting E-commerce Consumer Behavior using Recurrent Neural Networks"
# Update the hidden state
h_t = f(W^(x) · x_t + V^(h) · h_{t−1})
# Compute the output vector
y_t = g(W^(y) · h_t)
Notation: x_t = input at time t; h_t = hidden state at time t; y_t = output at time t; f = activation function such as tanh; g = output activation such as a sigmoid; W^(x), V^(h), W^(y) = shared parameters (as in a standard RNN)
Example tasks: image classification; sentiment analysis, video recognition; text translation; music generation
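The recurrence above can be sketched in a few lines of NumPy. This is a toy illustration with made-up dimensions and random weights, not the deck's code:

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Vh, Wy):
    """One time step of a vanilla RNN:
    h_t = tanh(Wx @ x_t + Vh @ h_prev)   (update the hidden state)
    y_t = sigmoid(Wy @ h_t)              (compute the output vector)
    """
    h_t = np.tanh(Wx @ x_t + Vh @ h_prev)
    y_t = 1.0 / (1.0 + np.exp(-(Wy @ h_t)))
    return h_t, y_t

# Toy dimensions: 3-dim input, 4-dim hidden state, 2-dim output.
rng = np.random.default_rng(0)
Wx = rng.normal(size=(4, 3))   # input-to-hidden weights W^(x)
Vh = rng.normal(size=(4, 4))   # hidden-to-hidden weights V^(h)
Wy = rng.normal(size=(2, 4))   # hidden-to-output weights W^(y)

h = np.zeros(4)                # initial hidden state
for x in rng.normal(size=(5, 3)):        # unroll over 5 time steps
    h, y = rnn_step(x, h, Wx, Vh, Wy)    # same parameters reused each step
```

Note that the same three weight matrices are reused at every time step; this parameter sharing is what makes the network "recurrent".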
  • 5. LIMITATIONS OF RNNs
 Unable to handle "long-term dependencies" well in practice
 "Vanishing gradient": the loops make the path longer and the derivative more complicated to calculate
Example: "It was a handwritten job application from Steve Jobs to HP. We use online applications for jobs nowadays. What do jobs and careers mean for college graduates? Those who have read the biography of Jobs may have a different viewpoint …"
Solution: use a more sophisticated architecture that allows a shorter path and fewer multiplications when calculating the gradient.
  • 6. GATED RECURRENT UNIT (GRU)
 Learns how to keep memories for long-distance dependencies
 Avoids the vanishing gradient problem
Equations (different diagrams appear in the literature):
Update gate: z_t = f(W^(z) · x_t + V^(z) · h_{t−1})
Reset gate: r_t = f(W^(r) · x_t + V^(r) · h_{t−1})
Reset-gate memory: h̃_t = g(W^(h) · x_t + V^(h) · (r_t ∗ h_{t−1}))
Memory to transmit: h_t = (1 − z_t) ∗ h_{t−1} + z_t ∗ h̃_t
Figure: GRU cell (inspired by Christopher Olah); screenshots from "Understanding GRU networks", Towards Data Science, and "Predicting E-commerce Consumer Behavior using Recurrent Neural Networks". In the GRU, both state vectors of the LSTM are merged into a single vector h; instead of three gate controllers, the GRU uses two, and there is no output gate: the full state vector is the output at every time step.
Example: "Jobs is the founder of a company, which company has its headquarter …": at "Jobs" the update gate opens (z_t = 1) and writes h̃_t into memory; at later steps z_t ≈ 0, so h_t carries that information forward unchanged.
Source: Cho et al. 2014; Coursera: Sequence Models, Andrew Ng
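The four GRU equations map directly onto code. The following NumPy sketch uses toy dimensions and hypothetical parameter names (Wz, Vz, and so on, matching the W^(z), V^(z) notation on the slide):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step, following the slide's four equations."""
    z = sigmoid(p["Wz"] @ x_t + p["Vz"] @ h_prev)            # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Vr"] @ h_prev)            # reset gate
    h_hat = np.tanh(p["Wh"] @ x_t + p["Vh"] @ (r * h_prev))  # reset-gate memory
    return (1 - z) * h_prev + z * h_hat                      # memory to transmit

# Toy dimensions: 3-dim input, 4-dim hidden state.
rng = np.random.default_rng(1)
dim_x, dim_h = 3, 4
p = {}
for g in ("z", "r", "h"):
    p["W" + g] = rng.normal(size=(dim_h, dim_x))  # input weights
    p["V" + g] = rng.normal(size=(dim_h, dim_h))  # recurrent weights

h = np.zeros(dim_h)
for x in rng.normal(size=(5, dim_x)):    # unroll over 5 time steps
    h = gru_step(x, h, p)
```

Because h_t is a convex combination of h_{t−1} and the tanh-bounded candidate h̃_t, the state stays bounded; when z ≈ 0, the previous state passes through unchanged, which is the shorter gradient path the slide refers to.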
  • 7. COMPETITION: HOW MUCH DID IT RAIN?
 https://www.kaggle.com/c/how-much-did-it-rain-ii/data
 Our solution
o Goal: showcase an application of the GRU
o Methodology: train a GRU on the radar snapshot data to predict rainfall
o Tools: Keras
 Data preprocessing challenges
o Irregular radar measurement times
o Outliers
o Overfitting: training data and test data may not be fully independent
Source: www.kaggle.com/c/how-much-did-it-rain-ii/data; scripts and sources (incl. images, script quotations, etc.) to be updated / Danny
  • 8. TRAINING DATA Training data sample: Source: www.kaggle.com/c/how-much-did-it-rain-ii/data; scripts and sources (incl. images, script quotations, etc.) to be updated / Danny
  • 9. DATA PREPROCESSING AND TESTING Source: www.kaggle.com/c/how-much-did-it-rain-ii/data; scripts and sources (incl. images, script quotations, etc.) to be updated / Danny
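The two data challenges above (irregular measurement times and outliers) can be handled with simple preprocessing. The sketch below is a hypothetical illustration, not the team's script: the padded length MAX_STEPS, the RAIN_CAP threshold, and the helper names are all assumptions:

```python
import numpy as np

# Assumed constants (not from the deck): pad each hour's snapshots to a fixed
# length so they can be batched, and cap the target to reject sensor outliers.
MAX_STEPS = 19     # assumed maximum radar snapshots per hour
RAIN_CAP = 70.0    # assumed physically plausible hourly rainfall cap (mm)

def pad_sequence(snaps, max_steps=MAX_STEPS):
    """Zero-pad (or truncate) an (n_steps, n_features) array to max_steps rows,
    so hours with different numbers of radar snapshots share one shape."""
    snaps = np.asarray(snaps, dtype=float)[:max_steps]
    out = np.zeros((max_steps, snaps.shape[1]))
    out[: len(snaps)] = snaps
    return out

def clip_target(expected_mm, cap=RAIN_CAP):
    """Return the gauge reading, or None for readings above the cap
    (treated as outliers and dropped from training)."""
    return expected_mm if expected_mm <= cap else None
```

For example, an hour with only two snapshots of two radar features each becomes a fixed (19, 2) array, and an absurd gauge reading such as 32740 mm is rejected.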
  • 10. CODE IMPLEMENTATION Source: [] Slides, scripts and sources (incl. image, script quotation, codes, etc.) to be updated / Paul
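Since the deck names Keras as the tool, a model along these lines could be built. This is a hedged sketch, not the team's actual implementation: the layer sizes, the Masking layer, and the assumed input shape (19 padded snapshots of 22 radar features) are all illustrative choices:

```python
import numpy as np
from tensorflow import keras

# Assumed shapes (not from the deck): padded snapshot count and feature count.
MAX_STEPS, N_FEATURES = 19, 22

model = keras.Sequential([
    # Masking skips the zero-padded rows added during preprocessing.
    keras.layers.Masking(mask_value=0.0, input_shape=(MAX_STEPS, N_FEATURES)),
    keras.layers.GRU(32),                      # GRU summarises the hour of snapshots
    keras.layers.Dense(1, activation="relu"),  # non-negative rainfall (mm)
])
model.compile(optimizer="adam", loss="mae")    # the competition scores MAE

# Shape check on random data standing in for preprocessed radar snapshots.
x = np.random.rand(4, MAX_STEPS, N_FEATURES).astype("float32")
pred = model.predict(x, verbose=0)
```

Training would then call `model.fit` on the padded, outlier-filtered training records; the relu output keeps predictions non-negative, matching the physical quantity.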
  • 11. DEMO Source: []
Items to be included in the GitHub upload:
• The demo, with source, all dependencies, and detailed instructions (in Markdown format) on how to run it, uploaded as a single zip file.
• Please ensure that the instructions for running the demo are sufficiently detailed. You won't be able to get a good grade if we are not able to run your demo.
Items to be included in this PPT:
• The presentation must include this live demo. If a live demo is not possible, a video demo can be presented. The file-size limit for this zip file is 100 MB.
• A link to the GitHub version, included as a QR code in this PPT.
Slides, scripts and sources (incl. images, script quotations, code, etc.) to be updated / Paul
  • 12. Q & A

Speaker notes

  1. A major limitation of the standard neural network we learned in class is that, in general, it only works with fixed-length vectors: the sizes of the input and the output are fixed. This makes a standard network inefficient for variable-length data such as sequences of text, video, or sound, or the time series found in medical, financial, or industrial data.   Another limitation is that a standard neural network is poor at remembering features it learned previously. For example, it would not work well for a model that learns to recognize names in text: "Jobs" may have appeared as a last name earlier in the text, and you would like the model to recognize "Jobs" as a name when it appears again later.
  2. A recurrent neural network has a more sophisticated architecture designed to fix these limitations. The main idea is that it introduces loops, which allow information to persist between time steps. The flexibility provided by the loops makes it possible to work with sequential data of various lengths, in several configurations: one-to-one, one-to-many, many-to-one, and many-to-many.   This diagram helps us understand how a recurrent model works: when x_t enters the network unit, it is multiplied by its own weight W_x. This step should be familiar, since we learned this function in class already. The improvement from the loops appears in the second part of the equation: h_{t−1}, which holds the information from the previous t−1 units, multiplied by its own weight V_h. The two results are added together and squashed by an activation function. (The output of the network unit, y_t, is also calculated by a nonlinear function, usually a sigmoid; this simply keeps the result between 0 and 1 so it can be converted into a probability.)   Remark: unlike feed-forward neural networks, which have different parameters at each layer, an RNN shares the same parameters (W_x, V_h, W_y) across all steps.
  3. It is difficult to train an RNN model. A key problem is called the vanishing gradient. We learned in class how to minimize the loss function using gradient descent, but this method runs into trouble under the loops of an RNN: backpropagation now goes through the layers as well as through each time step, which makes the derivative calculation much more complex because of the compounding effect. As a result, long-term dependencies become dominated by short-term dependencies. We will not go into full detail given the complexity of this topic; instead, we have attached some links for further reference. Let's use the same "Jobs" example to illustrate the idea: "Jobs" as a last name could occur near the beginning of an extremely long sentence. Although an RNN should be able to identify the word "Jobs" in the latter part of the sentence, in practice it may fail due to the vanishing gradient, because the occurrence is too far from the beginning.   How do we deal with these problems? Many techniques have been proposed; we explain one of them today. The main idea is to change the architecture, using a more sophisticated activation unit, in order to create a shorter path and avoid a vanishing gradient.
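The compounding effect described in this note can be shown with a toy numerical experiment (illustrative numbers only, deliberately using small recurrent weights so the shrinking is visible): backpropagating through time multiplies the gradient by the recurrent Jacobian at every step, and with tanh activations each factor tends to shrink it.

```python
import numpy as np

rng = np.random.default_rng(42)
V = rng.normal(scale=0.1, size=(4, 4))   # small recurrent weight matrix (toy)
h = rng.normal(size=4)                   # random initial hidden state

grad = np.eye(4)                         # identity: d h_T / d h_T
norms = []
for _ in range(50):                      # walk 50 steps back through time
    h = np.tanh(V @ h)                   # forward recurrence h_t = tanh(V h_{t-1})
    J = np.diag(1.0 - h**2) @ V          # Jacobian d h_t / d h_{t-1}
    grad = grad @ J                      # accumulate the product of Jacobians
    norms.append(np.linalg.norm(grad))
```

The gradient norm collapses toward zero after a few dozen steps, which is exactly why the contribution of a word 50 steps back ("Jobs" at the start of a long sentence) gets lost in a plain RNN.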
  4. The new structure presented here is called the GRU (Gated Recurrent Unit); it is a less complex version of another architecture, the LSTM (Long Short-Term Memory), which was introduced by another group. Incidentally, it has been found empirically that both structures perform equally well in most cases.   Most papers we read take considerable space to explain the GRU with their own diagrams, and it is very unlikely that we can explain it in full detail in a few minutes, so we have attached links with further explanations and will focus on the equations. The main idea is to add two vectors to the standard RNN, called the update gate and the reset gate, which decide how much relevant information to keep and how much irrelevant information to skip or delete from the past. These gates help the network learn long-term dependencies and avoid vanishing gradients.   So how do the GRU and these gates work? The update gate and the reset gate each have an equation similar to the standard RNN: a nonlinear function (usually a sigmoid) of a linear combination of the new input x_t and the previous information h_{t−1}. The difference between the two gates lies in their weights and in how they are used: the update gate determines how much information from the previous time step is passed on to the next, and the reset gate determines how much information is skipped.   How do we do that? We first create a memory vector h̃_t to store the relevant information from the past through the reset gate; when the reset gate is close to 0, h_{t−1} is skipped. We then combine the memory vector h̃_t and the update gate to compute the vector h_t, which contains the information transmitted onward through the network.   Let's take our name example again: the relevant information for recognizing "Jobs" as a name was at the beginning of the sequence.
The model learns to open the update gate (z_t close to 1) at that point, writing h̃_t into memory; at later steps z_t stays close to 0, so (1 − z_t) ∗ h_{t−1} carries the previous information forward fully while the current irrelevant input contributes almost nothing.   If the irrelevant information is at the beginning, the model can instead, thanks to its learning capability, set the reset gate close to 0 in order to skip that irrelevant information from the past.   It is amazing that the model can learn what to transmit or reset using these two gates. The beauty of these simple equations is that they are extremely powerful yet not hard to implement.   Now it's time to apply these GRU ideas to our rain problem.
  5. In this competition, data from the U.S. National Weather Service are used. Rainfall measurements are very important in agriculture. In the old days, rain gauges were used to measure rainfall each hour; although they measure rainfall at a specific location accurately, rainfall differs from one location to another. To obtain widespread coverage, weather radars are now used to measure hourly rainfall, and the radar technology keeps improving: one type, the polarimetric radar, can provide higher-quality data than conventional Doppler radars because it transmits radio-wave pulses with both horizontal and vertical orientations.   We are asked to predict the hourly rain-gauge reading from a set of snapshots of radar values obtained in the same hour. The data were collected in the midwestern US corn-growing states during the five-month growing season from April to August 2014. The data from the first 20 days of each month are used for training our model; each record consists of the radar-value snapshots obtained in an hour and the corresponding hourly gauge reading. The test data come from the same radars for the remaining 10 or 11 days of each month, for which we must predict the "Expected" gauge observation at the end of each hour.
When analyzing the data source, we found three major challenges: irregular radar measurement times (the observation time series is not regular); outliers (noisy records in the training data); and overfitting (within the same year, we cannot say the rainfall in the first 20 days of a month is unrelated to the rainfall in the remaining days of that month, so the training data and test data may not be fully independent). Let's see the next slides.
  6. Let me say more about the training data. As we can see here, there are 24 columns in total. The first column, Id, identifies the hour of the gauge observation; the second column gives the minutes within that hour at which each observation was collected. (Here we can see the first challenge just mentioned, irregular radar measurement times: for hour one, observations are collected at minutes 3, 16, 25, 35, and so on, about 6 times, whereas for hour two they are collected at minutes 1, 6, 11, 16, and so on, about 12 times.) The third column is the distance from the radar to the gauge (km); the remaining columns are the various parameters collected by the polarimetric radar; and the last column is the rainfall volume (in mm) recorded by the gauge at the end of that hour. For example, in the row highlighted in red, the observation is at the first minute of the second hour, the distance from the radar to the observation point is 2 kilometers, and the rainfall volume at the end of the second hour is 1.016 mm. Let's go to the next slide to see the challenges and what we need to do.
  7. We are required to generate a file with two columns (see the snapshot of the sample solution provided by the organizer on the right-hand side): the column "Id", a unique number for the set of observations over an hour at a gauge, and the column "Expected", the actual gauge observation in mm at the end of the hour. In the snapshot of the training data on the left-hand side, there are many outliers; this is the second challenge we just mentioned. From the sample solution data, the acceptable or reasonable rainfall volume should be below 164 mm, yet the training data contain unexpected values such as 32740 mm, which do not make sense. The snapshot of the test data is shown below: it has the same format but is 'missing' the "Expected" column, and we are required to predict the gauge observation at the end of each hour for it. My teammate Paul will introduce our code and our results.
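The required submission format can be produced with the standard library alone. The Id-to-rainfall values below are made-up placeholders; only the two-column "Id,Expected" layout comes from the competition description:

```python
import csv
import io

# Hypothetical predictions: gauge-hour Id -> expected rainfall in mm.
predictions = {1: 0.254, 2: 1.016, 3: 0.0}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Id", "Expected"])          # required header row
for gauge_id, mm in sorted(predictions.items()):
    writer.writerow([gauge_id, mm])          # one row per gauge-hour
submission = buf.getvalue()
```

In a real run, `buf` would be replaced by an open file handle and the dictionary by the model's predictions over every test Id.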