Paper review: "HyperNetworks" by David Ha, Andrew Dai, Quoc V. Le (ICLR2017)
Presented at Tensorflow-KR paper review forum (#PR12) by Taesu Kim
Paper link: https://arxiv.org/abs/1609.09106
Video link: https://www.youtube.com/watch?v=-tUQXSdEsMk (in Korean)
http://www.neosapience.com
2. HyperNetworks overview
› An approach that uses one network (the hypernetwork) to generate the weights for another network
› Motivated by HyperNEAT (Stanley et al., 2009); the embedding/weight split is meant to resemble the genotype/phenotype
relationship in nature
› A HyperNetwork can be viewed as a relaxed form of weight sharing across layers.
› It generates non-shared weights for LSTMs and achieves near state-of-the-art
results
› It generates shared weights for CNNs and achieves respectable results with fewer
learnable parameters (see the sketch below)
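A minimal NumPy sketch of the static CNN case. This is not the authors' code: the sizes N_z, f, d and the exact two-layer generator shapes are illustrative assumptions standing in for the paper's layer-embedding scheme.

```python
import numpy as np

# Minimal sketch: a shared two-layer generator g maps each layer's small
# embedding z_j to that layer's full conv kernel, so only the embeddings
# and g are learned. All sizes below are illustrative assumptions.
N_z  = 64        # layer-embedding size
f, d = 7, 16     # kernel size and channel count (in = out = d)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((N_z, d * N_z)) * 0.01   # first projection of g
B1 = np.zeros((d, N_z))
W2 = rng.standard_normal((N_z, f * f * d)) * 0.01 # second projection of g
B2 = np.zeros(f * f * d)

def generate_kernel(z):
    """g(z): layer embedding -> conv kernel of shape (f, f, d, d)."""
    a = (z @ W1).reshape(d, N_z) + B1   # one intermediate row per output channel
    K = a @ W2 + B2                     # (d, f*f*d)
    return K.reshape(f, f, d, d)

z_layer = rng.standard_normal(N_z)      # learnable per-layer embedding
print(generate_kernel(z_layer).shape)   # (7, 7, 16, 16)
```

Sharing g across layers is what makes this a relaxed form of weight sharing: layers differ only through their small embeddings.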
8. Modified HyperRNN
› HyperRNN requires Nz times more memory than a basic RNN
› Goal: make it more scalable and memory-efficient
› Use an intermediate hidden vector to parameterize each weight matrix: d(z) is a
linear projection of z, and diag(d(z)) rescales the rows of a shared weight matrix
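A sketch of this weight-scaling trick. The d(z) = W_hz @ z idea is from the paper; the sizes and variable names are illustrative assumptions.

```python
import numpy as np

# Instead of generating a full (Nh x Nh) matrix from z, emit a small
# vector d(z) = W_hz @ z and use it to rescale the rows of one shared
# weight matrix: diag(d(z)) @ W_h. Sizes are illustrative assumptions.
N_h, N_z = 8, 4
rng = np.random.default_rng(0)

W_h  = rng.standard_normal((N_h, N_h)) * 0.1  # shared recurrent weights
W_hz = rng.standard_normal((N_h, N_z)) * 0.1  # defines d(z), a linear projection

def scaled_matvec(z, h_prev):
    """diag(d(z)) @ W_h @ h_prev without materializing the diagonal."""
    d = W_hz @ z                 # d(z): size N_h, one scale per row
    return d * (W_h @ h_prev)    # element-wise rescaling of the shared matvec

z      = rng.standard_normal(N_z)   # emitted by the hyper cell each timestep
h_prev = rng.standard_normal(N_h)
print(scaled_matvec(z, h_prev).shape)   # (8,)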
11. Character-level Penn Treebank Language Model
› 1000 units of MainLSTM & two versions of HyperLSTM
– 128 units of HyperLSTM cell & embedding size 4
– 128 units of HyperLSTM cell & embedding size 16, with dropout keep probability of 85%
› HyperLSTM outperforms the standard LSTM
› HyperLSTM achieves improvements similar to Layer Normalization; the combination of
Layer Normalization and HyperLSTM achieves the best test perplexity
12. Hutter Prize Wikipedia Language Model
› 1800 units of MainLSTM & 256 units of HyperLSTM cell with embedding size 64 & max sequence length 250
› 2048 units of MainLSTM & 256 units of HyperLSTM cell with embedding size 64 & max sequence length 300
› HyperLSTM achieves improvements similar to Layer Normalization; the combination of Layer Normalization and
HyperLSTM achieves the best test perplexity
› HyperLSTM converges more quickly than both LSTM and Layer Norm LSTM
13. Hutter Prize Wikipedia Language Model
› Visualizing how the weight scaling vectors of the main LSTM change during the character sampling process (a
reproduction sketch follows this list)
› In regions of low intensity, where the weights of the main LSTM are relatively static, the types of phrases
generated seem more deterministic
– For example, the weights do not change much during the words Europeans, possessions and reservation
› Regions of high intensity are where the HyperLSTM cell is making relatively large changes to the weights
of the main LSTM
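A small reproduction sketch of this intensity plot. The approach and the recorded array `d_t` are assumptions; the paper only shows the resulting visualization.

```python
import numpy as np

# Record the weight-scaling vectors d_t emitted at each sampled character,
# then measure how much they change step to step. Flat stretches (low
# intensity) correspond to the "deterministic" phrases noted above; peaks
# mark points where the HyperLSTM rewrites the main LSTM's weights.
d_t = np.random.default_rng(0).standard_normal((200, 128))  # stand-in for logged d(z_t)

intensity = np.linalg.norm(np.diff(d_t, axis=0), axis=1)    # per-step change magnitude
print(intensity.shape)   # (199,): one value per sampling-step transition
```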
14. Hutter Prize Wikipedia Language Model
› Normalized histogram plots of φ(c_t) for different models during sampling
– φ(c_t) is the hidden state of the LSTM before applying the output gate
› Layer Norm reduces the saturation effects compared to the vanilla LSTM
› In HyperLSTM, the cell is saturated most of the time (see the sketch after this list)
– The HyperLSTM cell's dynamic weight-adjustment policy appears to be doing something very different from statistical
normalization
– Yet the policy it came up with ends up providing performance similar to Layer Norm
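A sketch of the saturation diagnostic. The 0.9 threshold and the recorded-states array are assumptions; the paper only shows the histograms.

```python
import numpy as np

# Collect cell states c_t while sampling, then histogram phi(c_t) = tanh(c_t).
# Mass piled near +/-1 means the cell is saturated; mass near 0 means the
# pre-output-gate activation carries little signal.
cell_states = np.random.default_rng(0).standard_normal((1000, 128))  # stand-in for logged c_t

phi = np.tanh(cell_states)
hist, edges = np.histogram(phi.ravel(), bins=50, range=(-1, 1), density=True)
saturated = np.mean(np.abs(phi) > 0.9)   # saturation threshold is an assumption
print(f"fraction of saturated activations: {saturated:.2f}")
```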
15. Handwriting sequence generation
› 12179 handwritten lines from 221 writers
› The LSTM input is the (x, y) coordinate of the pen location plus a binary pen-up/pen-down indicator (see the sketch after this list)
› One can see that many of these weight changes occur at the boundaries between words and between characters
› Dynamically generating the generative model is one of the key advantages of HyperLSTM over a normal LSTM
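An illustrative encoding of one input sequence, following the slide's description; the layout is an assumption and the real preprocessing may differ (e.g. relative offsets instead of absolute coordinates).

```python
import numpy as np

# Each timestep is (x, y, pen): the pen location plus a binary flag
# that is 1 when the pen lifts at the end of a stroke.
strokes = np.array([
    [0.0,  0.0,  0.0],   # pen down, start of a stroke
    [1.2,  0.3,  0.0],   # pen moves while writing
    [0.8, -0.5,  1.0],   # stroke ends here, pen lifts
    [2.0,  1.0,  0.0],   # pen down again for the next stroke
])
print(strokes.shape)     # (4, 3): sequence length x input dimension
```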
16. Machine translation
› WMT'14 En→Fr using the same test/validation set split described in the GNMT paper
– The GNMT network has 8 layers in each of the encoder and decoder
› The HyperLSTM cell improves the performance of the existing GNMT model, achieving state-
of-the-art single-model results on this dataset
› This demonstrates the applicability of HyperNetworks to large-scale models used in
production systems