HyperNetworks
Presented by Taesu Kim
Oct 29, 2017
David Ha, Andrew Dai, Quoc V. Le
Google Brain
Published at ICLR 2017
HyperNetworks overview
› An approach that uses one network (the hypernetwork) to generate the weights for another network (the main network)
› Motivated by HyperNEAT (Stanley et al., 2009); it tries to mirror the relationship between genotype
and phenotype in nature
› A hypernetwork can be viewed as a relaxed form of weight sharing across layers.
› For LSTMs, it generates non-shared weights and achieves near state-of-the-art
results
› For CNNs, it generates shared weights and achieves respectable results with fewer
learnable parameters
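To make the weight-generation idea concrete, here is a minimal NumPy sketch of a static hypernetwork for a small feedforward main network. It is an illustration under stated assumptions, not the authors' implementation: the layer sizes, the single shared linear projection, and the names `layer_embeddings`, `proj`, and `generate_weight` are all hypothetical. Each layer owns only a small embedding, and one shared projection turns that embedding into the layer's full weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_layers, n_in, n_out, n_z = 6, 16, 16, 4

# Learnable parameters: one small embedding per layer, plus ONE projection
# shared by all layers (this sharing is the "relaxed weight sharing").
layer_embeddings = rng.normal(size=(n_layers, n_z))
proj = rng.normal(scale=0.1, size=(n_z, n_in * n_out))
proj_bias = np.zeros(n_in * n_out)

def generate_weight(z):
    """Map a layer embedding z of shape (n_z,) to a weight matrix (n_out, n_in)."""
    return (z @ proj + proj_bias).reshape(n_out, n_in)

def forward(x):
    """Run the main network; every layer's weights come from the hypernetwork."""
    h = x
    for z in layer_embeddings:
        h = np.tanh(generate_weight(z) @ h)
    return h

# Compare learnable parameter counts with and without the hypernetwork.
hyper_params = layer_embeddings.size + proj.size + proj_bias.size
direct_params = n_layers * n_in * n_out

y = forward(rng.normal(size=n_in))
print(hyper_params, direct_params, y.shape)
```

With these toy sizes the hypernetwork version stores fewer learnable parameters than six independent weight matrices, which is the trade-off the overview describes for CNNs.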
Conventional Networks
Feedforward Networks
Recurrent Networks
Static HyperNetworks
HyperCNN
Dynamic HyperNetworks
HyperRNN
Modified HyperRNN
› HyperRNN requires N_z times more memory than a basic RNN
› Goal: make it more scalable and memory efficient
› Use an intermediate hidden vector d(z) to parameterize the weight matrix, where d(z) is
a linear projection of z
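The memory saving can be sketched as follows. This is a hedged illustration, assuming that d(z) scales the rows of a single shared weight matrix; the slide only says that d(z) parameterizes the weight matrix. The sizes and the names `W_h`, `W_hz`, `b_d`, and `scaled_matvec` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_h, n_z = 8, 4  # hypothetical hidden size and embedding size

W_h = rng.normal(size=(n_h, n_h))   # single shared main weight matrix
W_hz = rng.normal(size=(n_h, n_z))  # linear projection that produces d(z)
b_d = np.ones(n_h)                  # bias of the projection

def d(z):
    """Linear projection of z into a per-row scaling vector of shape (n_h,)."""
    return W_hz @ z + b_d

def scaled_matvec(z, h):
    """Compute (diag(d(z)) @ W_h) @ h without building a new matrix."""
    return d(z) * (W_h @ h)

z = rng.normal(size=n_z)
h = rng.normal(size=n_h)

explicit = (np.diag(d(z)) @ W_h) @ h  # materializes the generated matrix
efficient = scaled_matvec(z, h)       # same result, elementwise scaling only

# Parameter counts: a full HyperRNN keeps one n_h x n_h matrix per component
# of z, while the modified version keeps one matrix plus the projection.
full_hyper_params = n_z * n_h * n_h
scaled_params = W_h.size + W_hz.size + b_d.size
print(full_hyper_params, scaled_params)
```

Both paths give the same vector, but the scaled form never stores a weight tensor that grows with N_z times the size of the main matrix.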
HyperLSTM
https://github.com/hardmaru/supercell/
LSTM implementation
MNIST and CIFAR-10
› Wide Residual Network 40-1: N=6, k=1
› Wide Residual Network 40-2: N=6, k=2
Character-level Penn Treebank Language Model
› Main LSTM with 1000 units and two versions of HyperLSTM:
– HyperLSTM cell with 128 units and embedding size 4
– HyperLSTM cell with 128 units and embedding size 16 → dropout keep probability of 85%
› HyperLSTM outperforms the standard LSTM
› HyperLSTM also achieves improvements similar to Layer Normalization → the combination of
Layer Normalization and HyperLSTM achieves the best test perplexity.
Hutter Prize Wikipedia Language Model
› Main LSTM with 1800 units, HyperLSTM cell with 256 units and embedding size 64, max sequence length 250
› Main LSTM with 2048 units, HyperLSTM cell with 256 units and embedding size 64, max sequence length 300
› HyperLSTM also achieves improvements similar to Layer Normalization → the combination of Layer Normalization and
HyperLSTM achieves the best test perplexity.
› HyperLSTM converges more quickly than both the LSTM and the Layer Norm LSTM
Hutter Prize Wikipedia Language Model
› Visualization of how the weight scaling vectors of the main LSTM change during the character sampling process.
› In regions of low intensity, where the weights of the main LSTM are relatively static, the generated phrases
seem more deterministic
– For example, the weights do not change much during the words "Europeans", "possessions" and "reservation".
› Regions of high intensity are where the HyperLSTM cell makes relatively large changes to the weights
of the main LSTM
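One way to turn the scaling vectors into a per-character intensity trace is sketched below. The slide does not state which measure was plotted, so the L2 norm of the step-to-step change is an assumption here, and `change_intensity` and the toy `trace` are hypothetical.

```python
import numpy as np

def change_intensity(scaling_vectors):
    """Return one intensity value per time step.

    scaling_vectors: array of shape (T, n_h), one weight scaling vector per
    sampled character. Entry t of the result is the L2 norm of the change
    from step t-1 to step t; the first entry is 0 because there is no
    previous step to compare against.
    """
    d = np.asarray(scaling_vectors, dtype=float)
    step_changes = np.diff(d, axis=0)             # shape (T-1, n_h)
    norms = np.linalg.norm(step_changes, axis=1)  # shape (T-1,)
    return np.concatenate([[0.0], norms])

# Toy trace: the scaling vector is static, jumps once, then is static again.
trace = np.array([[1.0, 1.0],
                  [1.0, 1.0],
                  [4.0, 5.0],
                  [4.0, 5.0]])
intensity = change_intensity(trace)
print(intensity)
```

In this toy trace only the third step has non-zero intensity, which corresponds to a high-intensity region in the visualization; the flat steps correspond to the low-intensity regions.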
Hutter Prize Wikipedia Language Model
› Normalized histogram plots of φ(c_t) for different models during sampling
– φ(c_t) is the hidden state of the LSTM before the output gate is applied.
› Layer Norm reduces the saturation effects compared to the vanilla LSTM.
› In HyperLSTM, the cell is saturated most of the time
– The HyperLSTM cell's dynamic weight adjustment policy appears to be doing something very different from statistical
normalization.
– Even so, the policy it came up with ended up providing performance similar to Layer Norm
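The histogram and the notion of saturation can be sketched as follows, assuming φ is tanh (the usual choice in an LSTM) and treating |φ(c_t)| > 0.9 as saturated. The threshold, the bin count, the synthetic cell states, and the function names are assumptions made for illustration; no real model output is used.

```python
import numpy as np

def phi_histogram(cell_states, bins=20):
    """Normalized histogram of tanh(c_t) over all steps and units."""
    phi = np.tanh(np.asarray(cell_states, dtype=float)).ravel()
    counts, edges = np.histogram(phi, bins=bins, range=(-1.0, 1.0))
    return counts / counts.sum(), edges

def saturated_fraction(cell_states, threshold=0.9):
    """Fraction of tanh(c_t) values whose magnitude exceeds the threshold."""
    phi = np.tanh(np.asarray(cell_states, dtype=float))
    return float(np.mean(np.abs(phi) > threshold))

# Synthetic stand-ins for cell states collected during sampling:
# small-magnitude states stay in tanh's linear region, large ones saturate.
rng = np.random.default_rng(0)
small_states = rng.normal(scale=0.5, size=(100, 32))
large_states = rng.normal(scale=5.0, size=(100, 32))

hist, edges = phi_histogram(large_states)
print(saturated_fraction(small_states), saturated_fraction(large_states))
```

A histogram with most of its mass in the outermost bins corresponds to the saturated behaviour noted for HyperLSTM, while mass concentrated near zero corresponds to the reduced saturation noted for Layer Norm.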
Handwriting sequence generation
› 12179 handwritten lines from 221 writers
› The LSTM input is the (x, y) coordinate of the pen location plus a binary pen-up/pen-down indicator
› Many of these weight changes occur at the boundaries between words and between characters
› Dynamically generating the generative model is one of the key advantages of HyperLSTM over a normal LSTM
Machine translation
› WMT'14 En→Fr, using the same test/validation set split described in the GNMT paper.
– The GNMT network has 8 layers in each of the encoder and decoder
› The HyperLSTM cell improves the performance of the existing GNMT model, achieving
state-of-the-art single-model results for this dataset.
› This demonstrates the applicability of HyperNetworks to large-scale models used in
production systems.
Contact us:
contact@neosapience.com
For more information:
http://www.neosapience.com

PR-043: HyperNetworks

  • 1. HyperNetworks Presented by Taesu Kim Oct 29, 2017 David Ha, Andrew Dai, Quoc V. Le Google Brain Published at ICLR 2017
  • 2. HyperNetworks overview › An approach that uses one network to generate the weights of another network › Motivated by HyperNEAT (Stanley et al. 2009); it tries to mirror the genotype-phenotype relationship in nature › A HyperNetwork can be viewed as a relaxed form of weight sharing across layers › It generates non-shared weights for an LSTM and achieves near state-of-the-art results › It generates shared weights for a CNN and achieves respectable results with fewer learnable parameters
  • 8. Modified HyperRNN › HyperRNN requires Nz times more memory than a basic RNN › The modification makes it more scalable and memory efficient › Use an intermediate hidden vector to parameterize a weight matrix: d(z) is a linear projection of z
  • 10. MNIST and CIFAR-10 › 40-1: N=6, k=1 › 40-2: N=6, k=2
  • 11. Character-level Penn Treebank Language Model › MainLSTM with 1000 units & two versions of HyperLSTM – HyperLSTM cell with 128 units & embedding size 4 – HyperLSTM cell with 128 units & embedding size 16 → dropout keep probability of 85% › HyperLSTM outperforms the standard LSTM › HyperLSTM also achieves improvements similar to Layer Normalization → the combination of Layer Normalization and HyperLSTM achieves the best test perp.
  • 12. Hutter Prize Wikipedia Language Model › MainLSTM with 1800 units & HyperLSTM cell with 256 units and embedding size 64 & max sequence length: 250 › MainLSTM with 2048 units & HyperLSTM cell with 256 units and embedding size 64 & max sequence length: 300 › HyperLSTM also achieves improvements similar to Layer Normalization → the combination of Layer Normalization and HyperLSTM achieves the best test perp. › HyperLSTM converges more quickly than LSTM and Layer Norm LSTM
  • 13. Hutter Prize Wikipedia Language Model › Visualizing how the weight scaling vectors of the main LSTM change during the character sampling process › In regions of low intensity, where the weights of the main LSTM are relatively static, the types of phrases generated seem more deterministic – For example, the weights do not change much during the words Europeans, possessions and reservation › Regions of high intensity are where the HyperLSTM cell is making relatively large changes to the weights of the main LSTM
  • 14. Hutter Prize Wikipedia Language Model › Normalized histogram plots of φ(c_t) for different models during sampling – φ(c_t) is the hidden state of the LSTM before applying the output gate › Layer Norm reduces the saturation effects compared to the vanilla LSTM › In HyperLSTM, the cell is saturated most of the time – The HyperLSTM cell’s dynamic weight adjustment policy appears to be doing something very different from statistical normalization – Nevertheless, the policy it came up with ended up providing performance similar to LayerNorm
  • 15. Handwriting sequence generation › 12179 handwritten lines from 221 writers › The LSTM input is the (x, y) coordinate of the pen location and a binary indicator of pen-up/pen-down › Many of the weight changes occur at the boundaries between words and between characters › Dynamically generating the generative model is one of the key advantages of HyperLSTM over a normal LSTM
  • 16. Machine translation › WMT’14 En→Fr using the same test/validation set split described in the GNMT paper – The GNMT network has 8 layers each in the encoder and decoder › The HyperLSTM cell improves the performance of the existing GNMT model, achieving state-of-the-art single-model results for this dataset › This demonstrates the applicability of HyperNetworks to large-scale models used in production systems
  • 17. Follow us: Contact us: contact@neosapience.com For more information: http://www.neosapience.com
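
The memory saving described on slide 8 (Modified HyperRNN) can be illustrated with a minimal NumPy sketch. It assumes, beyond what the slide states, that d(z) scales the rows of a single fixed weight matrix; the names W, P, z, h and all sizes are hypothetical and chosen only for illustration, not taken from the authors' implementation.

```python
import numpy as np

def scaled_weight_matvec(W, P, z, h):
    """Apply a z-dependent weight matrix to h without building it.

    Assumption for this sketch: the effective matrix is diag(d(z)) @ W,
    where d(z) = P @ z is a linear projection of the embedding z.
    """
    d = P @ z           # one scaling factor per output row, shape (n_h,)
    return d * (W @ h)  # elementwise scaling equals diag(d) @ W @ h

rng = np.random.default_rng(0)
n_h, n_z = 8, 4                      # hidden size and embedding size (hypothetical)
W = rng.standard_normal((n_h, n_h))  # fixed main-network weight matrix
P = rng.standard_normal((n_h, n_z))  # projection that defines d(z)
z = rng.standard_normal(n_z)         # embedding from the hypernetwork
h = rng.standard_normal(n_h)         # main-network hidden state

out = scaled_weight_matvec(W, P, z, h)
reference = (np.diag(P @ z) @ W) @ h  # same result via the explicit matrix

# Parameter counts: generating every entry of W from z needs an
# (n_z, n_h, n_h) tensor, i.e. n_z times the size of W itself.
full_params = n_z * n_h * n_h
# Row scaling keeps W and adds only the (n_h, n_z) projection.
scaled_params = W.size + P.size
print(full_params, scaled_params)
```

With these sizes the full generator holds N_z times as many parameters as the basic weight matrix, while the row-scaling variant adds only N_h × N_z parameters on top of it.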