SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
><
WHAT WOULD SHANNON DO?
BAYESIAN COMPRESSION FOR DL
KAREN ULLRICH

UNIVERSITY OF AMSTERDAM

DEEP LEARNING AND REINFORCEMENT LEARNING 

SUMMER SCHOOL MONTREAL 2017
1KAREN ULLRICH, JUN 2017
><
Motivation
2KAREN ULLRICH, JUN 201701 MOTIVATION
><
Motivation
3
- 1 Wh costs 0.0225 cent
- running a Titan X for 1h: 5.625 cent
KAREN ULLRICH, JUN 201701 MOTIVATION
- facebook has 1.86 billion active users
- VGG takes ~147ms/16 predictions
- making one prediction for all users costs
20 k€
><
Motivation - Summary
4
- mobile devices have limited hardware
- energy costs for predictions
KAREN ULLRICH, JUN 201701 MOTIVATION
- bandwidth transmitting models
- speeding up inference for real time
processing
- relation to privacy
><02 RECENT WORK
Practical view on compression
A : Sparsity learning
• Structured Pruning:

• CR:
5KAREN ULLRICH, MAR 2017
A = [ ]
IC = [0 1 3 0 1 2]
IR = [3, 0, 2, 3, …, 1]
|w|
|w6=0|
• (Unstructured) Pruning 

• CR:
.size = |w|
|w6=0|.size =
⇡ |w|
2|w6=0|
><02 RECENT WORK
Practical view on compression
B : Bit per weight reduction
• precision quantisation
6KAREN ULLRICH, MAR 2017
• Set quantisation by clustering
1 -3.5667
2 0.4545
3 0.8735
4 0.33546
• CR: 32/10 = 3

• PRO: fast inference

• CON: savings is not too big
?
• CR: 32/4 = 8

• PRO: extreme compressible
with e.g. further Hoffman
encoding 

• CON: inference?
><02 RECENT WORK
Practical view on compression
Summary - Properties
7KAREN ULLRICH, MAR 2017
Set quantisation Bit quantisation
Unstructured
pruning
Structured
pruning
Set quantisation Bit quantisation
Unstructured
pruning
- highest compression
- flop and energy savings
moderate
Structured
pruning
- lowest expected compression
- BUT will save considerable
amount of flops and thus energy
><02 RECENT WORK
Practical view on compression
Summary - Applications
8KAREN ULLRICH, MAR 2017
Set quantisation Bit quantisation
Unstructured
pruning
-“ZIP”-format for NN
- transmitting via limited channels
- save millions of nets efficiently
Structured
pruning
- inference at scale
- real time predictions
- hardware limited devices
>< 9
Variational lower bound
KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING
log p(D) L(q(w), w)) = Eq(w)[log p(D,w)
q(w) ]
= Eq(w)[log p(D|w)]] KL(q(w)||p(w))
Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks
simple by minimizing the description length of the weights." Proceedings of
the sixth annual conference on Computational learning theory. ACM, 1993.
><
MDL principle and Variational Learning
10
The best model is the one that compresses the data best.
There are two costs, one for transmitting a model and one
for reporting the data misfit.
KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING
Jorma Rissanen, 1978
>< 11
Variational lower bound
KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING
transmitting
the model
transmitting
data misfit
log p(D) L(q(w), w)) = Eq(w)[log p(D,w)
q(w) ]
= Eq(w)[log p(D|w)]] KL(q(w)||p(w))
Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks
simple by minimizing the description length of the weights." Proceedings of
the sixth annual conference on Computational learning theory. ACM, 1993.
><
Variational lower bound
12KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING
log p(D) L(q(w), w)) = Eq(w)[log p(D,w)
q(w) ]
= Eq(w)[log p(D|w)]] KL(q(w)||p(w))
Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks
simple by minimizing the description length of the weights." Proceedings of
the sixth annual conference on Computational learning theory. ACM, 1993.
><02 RECENT WORK
Practical view on compression
Summary - Properties
13KAREN ULLRICH, MAR 2017
Set quantisation Bit quantisation
Unstructured
pruning
- highest compression
- flop and energy savings
moderate
Structured
pruning
- lowest expected compression
- BUT will save considerable
amount of flops and thus energy
>< 14
• Solution: train a neural
network with gaussian
mixture model prior
Soft weight-sharing for NN compression
KAREN ULLRICH, EDWARD MEED & MAX WELLING

ICLR 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
q(w) =
Y
q(wi) = (wi|µi)
• Pruning by setting one
component to zero with
high mixing proportion
Nowlan, Steven J., and Geoffrey E. Hinton. "Simplifying neural networks by soft weight-sharing." Neural computation 4.4 (1992): 473-493.
>< 15
Soft weight-sharing for NN compression
KAREN ULLRICH, EDWARD MEED & MAX WELLING

ICLR 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
><02 RECENT WORK
Practical view on compression
Summary - Properties
16KAREN ULLRICH, MAR 2017
Set quantisation Bit quantisation
Unstructured
pruning
- highest compression
- flop and energy savings
moderate
Structured
pruning
- lowest expected compression
- BUT will save considerable
amount of flops and thus energy
><
Bayesian Compression for Deep Learning
17
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
• Idea: use dropout to learn architecture 

• the variational version of dropout learns the dropout rate 

• Solution: Learn dropout rate for each weight structure, when
weights have a high dropout rate we can safely ignore them 

• uncertainty in left over weights to compute bit precision
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
Kingma, Diederik P., Tim Salimans, and Max Welling. "Variational dropout and the local reparameterization trick." NIPS. 2015.
Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. "Variational Dropout Sparsifies Deep Neural Networks." arXiv preprint arXiv:1701.05369 (2017).
><
Bayesian Compression for Deep Learning
18
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
push to zero for high dropout rates
force high dropout rates
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
q(w|z) =
Y
q(wi|zi) = N(wi|ziµi, z2
i
2
i )
q(z) =
Y
q(zi) = N(zi|µz
i , ↵i)
><
Bayesian Compression for Deep Learning
19
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
p(w) =
Z
p(z)p(w|z)dz
q(w|z) =
Y
q(wi|zi) = N(wi|ziµi, z2
i
2
i )
q(z) =
Y
q(zi) = N(zi|µz
i , ↵i)
><
Bayesian Compression for Deep Learning
20
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
><
Bayesian Compression for Deep Learning
21
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
><
Bayesian Compression for Deep Learning
22
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
><
Warning: Don’t be too enthusiastic!
23
These algorithms are merely proposals, little can be realised by common
frameworks today.
• Architecture pruning
• Sparse matrix support (partially in big frameworks)
• Reduced bit precision (NVIDIA is starting)
• Clustering
KAREN ULLRICH, JUN 201705 WARNING
>< 24KAREN ULLRICH, JUN 2017
Thank you for your attention.
Any questions?
KARENULLRICH.INFO

_KARRRN_

Weitere ähnliche Inhalte

Ähnlich wie What Would Shannon Do?

K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Dataidescitation
 
Small Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their DesignSmall Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their DesignForrest Iandola
 
k means clustering-based data compression
k means clustering-based data compressionk means clustering-based data compression
k means clustering-based data compressionmohammed alrekabe
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data StreamIRJET Journal
 
Paper Reviews on Visual Attention
Paper Reviews on Visual AttentionPaper Reviews on Visual Attention
Paper Reviews on Visual Attention민기 정
 
Compressed sensing applications in image processing &amp; communication (1)
Compressed sensing applications in image processing &amp; communication (1)Compressed sensing applications in image processing &amp; communication (1)
Compressed sensing applications in image processing &amp; communication (1)Mayur Sevak
 
Density based methods
Density based methodsDensity based methods
Density based methodsSVijaylakshmi
 
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Mostafa G. M. Mostafa
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesDmytro Mishkin
 
Multiscale Granger Causality and Information Decomposition
Multiscale Granger Causality and Information Decomposition Multiscale Granger Causality and Information Decomposition
Multiscale Granger Causality and Information Decomposition danielemarinazzo
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methodsKrish_ver2
 
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning ChallengesIBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning ChallengesIBM France Lab
 
A hybrid security and compressive sensing based sensor data gathering scheme
A hybrid security and compressive sensing based sensor data gathering schemeA hybrid security and compressive sensing based sensor data gathering scheme
A hybrid security and compressive sensing based sensor data gathering schemeredpel dot com
 
DLD_WeightSharing_Slide
DLD_WeightSharing_SlideDLD_WeightSharing_Slide
DLD_WeightSharing_SlideKang-Ho Lee
 
West denmark short term load forecast_for smart grids
West denmark short term load forecast_for smart gridsWest denmark short term load forecast_for smart grids
West denmark short term load forecast_for smart gridsmoseifu
 
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al..."Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al...Edge AI and Vision Alliance
 

Ähnlich wie What Would Shannon Do? (20)

K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Data
 
Small Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their DesignSmall Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their Design
 
k means clustering-based data compression
k means clustering-based data compressionk means clustering-based data compression
k means clustering-based data compression
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data Stream
 
Paper Reviews on Visual Attention
Paper Reviews on Visual AttentionPaper Reviews on Visual Attention
Paper Reviews on Visual Attention
 
Compressed sensing applications in image processing &amp; communication (1)
Compressed sensing applications in image processing &amp; communication (1)Compressed sensing applications in image processing &amp; communication (1)
Compressed sensing applications in image processing &amp; communication (1)
 
Density based methods
Density based methodsDensity based methods
Density based methods
 
CNN
CNNCNN
CNN
 
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent Advances
 
Multiscale Granger Causality and Information Decomposition
Multiscale Granger Causality and Information Decomposition Multiscale Granger Causality and Information Decomposition
Multiscale Granger Causality and Information Decomposition
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methods
 
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning ChallengesIBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
 
A hybrid security and compressive sensing based sensor data gathering scheme
A hybrid security and compressive sensing based sensor data gathering schemeA hybrid security and compressive sensing based sensor data gathering scheme
A hybrid security and compressive sensing based sensor data gathering scheme
 
DLD_WeightSharing_Slide
DLD_WeightSharing_SlideDLD_WeightSharing_Slide
DLD_WeightSharing_Slide
 
Green computing
Green computingGreen computing
Green computing
 
West denmark short term load forecast_for smart grids
West denmark short term load forecast_for smart gridsWest denmark short term load forecast_for smart grids
West denmark short term load forecast_for smart grids
 
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al..."Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
 

Kürzlich hochgeladen

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 

Kürzlich hochgeladen (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 

What Would Shannon Do?

  • 1. >< WHAT WOULD SHANNON DO? BAYESIAN COMPRESSION FOR DL KAREN ULLRICH UNIVERSITY OF AMSTERDAM DEEP LEARNING AND REINFORCEMENT LEARNING SUMMER SCHOOL MONTREAL 2017 1KAREN ULLRICH, JUN 2017
  • 3. >< Motivation 3 - 1 Wh costs 0.0225 cent - running a Titan X for 1h: 5.625 cent KAREN ULLRICH, JUN 201701 MOTIVATION - facebook has 1.86 billion active users - VGG takes ~147ms/16 predictions - making one prediction for all users costs 20 k€
  • 4. >< Motivation - Summary 4 - mobile devices have limited hardware - energy costs for predictions KAREN ULLRICH, JUN 201701 MOTIVATION - bandwidth transmitting models - speeding up inference for real time processing - relation to privacy
  • 5. ><02 RECENT WORK Practical view on compression A : Sparsity learning • Structured Pruning: • CR: 5KAREN ULLRICH, MAR 2017 A = [ ] IC = [0 1 3 0 1 2] IR = [3, 0, 2, 3, …, 1] |w| |w6=0| • (Unstructured) Pruning • CR: .size = |w| |w6=0|.size = ⇡ |w| 2|w6=0|
  • 6. ><02 RECENT WORK Practical view on compression B : Bit per weight reduction • precision quantisation 6KAREN ULLRICH, MAR 2017 • Set quantisation by clustering 1 -3.5667 2 0.4545 3 0.8735 4 0.33546 • CR: 32/10 = 3 • PRO: fast inference • CON: savings is not too big ? • CR: 32/4 = 8 • PRO: extreme compressible with e.g. further Hoffman encoding • CON: inference?
  • 7. ><02 RECENT WORK Practical view on compression Summary - Properties 7KAREN ULLRICH, MAR 2017 Set quantisation Bit quantisation Unstructured pruning Structured pruning Set quantisation Bit quantisation Unstructured pruning - highest compression - flop and energy savings moderate Structured pruning - lowest expected compression - BUT will save considerable amount of flops and thus energy
  • 8. ><02 RECENT WORK Practical view on compression Summary - Applications 8KAREN ULLRICH, MAR 2017 Set quantisation Bit quantisation Unstructured pruning -“ZIP”-format for NN - transmitting via limited channels - save millions of nets efficiently Structured pruning - inference at scale - real time predictions - hardware limited devices
  • 9. >< 9 Variational lower bound KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING log p(D) L(q(w), w)) = Eq(w)[log p(D,w) q(w) ] = Eq(w)[log p(D|w)]] KL(q(w)||p(w)) Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks simple by minimizing the description length of the weights." Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993.
  • 10. >< MDL principle and Variational Learning 10 The best model is the one that compresses the data best. There are two costs, one for transmitting a model and one for reporting the data misfit. KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING Jorma Rissanen, 1978
  • 11. >< 11 Variational lower bound KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING transmitting the model transmitting data misfit log p(D) L(q(w), w)) = Eq(w)[log p(D,w) q(w) ] = Eq(w)[log p(D|w)]] KL(q(w)||p(w)) Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks simple by minimizing the description length of the weights." Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993.
  • 12. >< Variational lower bound 12KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING log p(D) L(q(w), w)) = Eq(w)[log p(D,w) q(w) ] = Eq(w)[log p(D|w)]] KL(q(w)||p(w)) Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks simple by minimizing the description length of the weights." Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993.
  • 13. ><02 RECENT WORK Practical view on compression Summary - Properties 13KAREN ULLRICH, MAR 2017 Set quantisation Bit quantisation Unstructured pruning - highest compression - flop and energy savings moderate Structured pruning - lowest expected compression - BUT will save considerable amount of flops and thus energy
  • 14. >< 14 • Solution: train a neural network with gaussian mixture model prior Soft weight-sharing for NN compression KAREN ULLRICH, EDWARD MEED & MAX WELLING ICLR 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK q(w) = Y q(wi) = (wi|µi) • Pruning by setting one component to zero with high mixing proportion Nowlan, Steven J., and Geoffrey E. Hinton. "Simplifying neural networks by soft weight-sharing." Neural computation 4.4 (1992): 473-493.
  • 15. >< 15 Soft weight-sharing for NN compression KAREN ULLRICH, EDWARD MEED & MAX WELLING ICLR 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
  • 16. ><02 RECENT WORK Practical view on compression Summary - Properties 16KAREN ULLRICH, MAR 2017 Set quantisation Bit quantisation Unstructured pruning - highest compression - flop and energy savings moderate Structured pruning - lowest expected compression - BUT will save considerable amount of flops and thus energy
  • 17. >< Bayesian Compression for Deep Learning 17 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 • Idea: use dropout to learn architecture • the variational version of dropout learns the dropout rate • Solution: Learn dropout rate for each weight structure, when weights have a high dropout rate we can safely ignore them • uncertainty in left over weights to compute bit precision KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK Kingma, Diederik P., Tim Salimans, and Max Welling. "Variational dropout and the local reparameterization trick." NIPS. 2015. Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. "Variational Dropout Sparsifies Deep Neural Networks." arXiv preprint arXiv:1701.05369 (2017).
  • 18. >< Bayesian Compression for Deep Learning 18 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 push to zero for high dropout rates force high dropout rates KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK q(w|z) = Y q(wi|zi) = N(wi|ziµi, z2 i 2 i ) q(z) = Y q(zi) = N(zi|µz i , ↵i)
  • 19. >< Bayesian Compression for Deep Learning 19 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK p(w) = Z p(z)p(w|z)dz q(w|z) = Y q(wi|zi) = N(wi|ziµi, z2 i 2 i ) q(z) = Y q(zi) = N(zi|µz i , ↵i)
  • 20. >< Bayesian Compression for Deep Learning 20 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
  • 21. >< Bayesian Compression for Deep Learning 21 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
  • 22. >< Bayesian Compression for Deep Learning 22 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
  • 23. >< Warning: Don’t be too enthusiastic! 23 These algorithms are merely proposals, little can be realised by common frameworks today. • Architecture pruning • Sparse matrix support (partially in big frameworks) • Reduced bit precision (NVIDIA is starting) • Clustering KAREN ULLRICH, JUN 201705 WARNING
  • 24. >< 24KAREN ULLRICH, JUN 2017 Thank you for your attention. Any questions? KARENULLRICH.INFO _KARRRN_