Review: Do wide and deep networks learn the same things? Uncovering how neural network representations vary with width and depth (Google Research, arXiv preprint)
- by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)
1. Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
Hwang Seung Hyun
Google Research | arXiv preprint
2020.11.04
3. Depth & Width
Introduction – Background
• Key factor in the success of deep neural networks → scaling models by varying “Depth” and “Width”.
• Limited understanding of how varying these properties affects the model beyond its performance.
• Investigating this question is critical, especially with continually increasing computing resources.
4. Depth & Width
Introduction – Questions
1. How do Depth & Width affect the final learned representations?
2. Do these different model architectures also learn different hidden layer features?
3. Are there discernible differences in the outputs?
5. Depth & Width
Introduction – Contribution
• Apply CKA (centered kernel alignment) to measure the similarity of the hidden representations of different NNs, finding that representations in wide or deep models exhibit a characteristic “block structure”.
• The block structure corresponds to hidden representations having a single principal component that explains most of the variance in the representation → possible pruning.
• Block structures are unique to each model, whereas the other parts remain similar across different networks.
• Found that wide and deep models make systematically different mistakes at the level of individual examples (wide networks are better at scenes, deep networks are better at objects).
6. Methods and Experiments
Experimental Settings
• Models: Family of ResNets
• Datasets: CIFAR-10, CIFAR-100, ImageNet
• Representational similarity measure: linear centered kernel alignment (CKA)
→ Compute CKA as a function of average HSIC scores computed over k mini-batches [1] (a minimal sketch follows below).
[Diagram: network width is scaled by multiplying the number of channels by 2]
[1] Kornblith, Simon, et al. "Similarity of neural network representations revisited." ICML (2019).
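A minimal NumPy sketch of mini-batch linear CKA, assuming the unbiased HSIC estimator of Song et al. (2012) accumulated over mini-batches, which is how the slide describes the computation; the function names and accumulation loop are my own illustration, not code from the paper:

```python
import numpy as np

def unbiased_hsic(K, L):
    """Unbiased HSIC estimator for two n x n Gram matrices (Song et al., 2012).

    Requires a batch size of at least 4.
    """
    n = K.shape[0]
    K, L = K.copy(), L.copy()
    np.fill_diagonal(K, 0.0)   # the estimator uses zero-diagonal Gram matrices
    np.fill_diagonal(L, 0.0)
    ones = np.ones(n)
    term1 = np.trace(K @ L)
    term2 = (ones @ K @ ones) * (ones @ L @ ones) / ((n - 1) * (n - 2))
    term3 = 2.0 * (ones @ K @ L @ ones) / (n - 2)
    return (term1 + term2 - term3) / (n * (n - 3))

def minibatch_linear_cka(x_batches, y_batches):
    """Linear CKA between two layers, accumulated over k mini-batches.

    x_batches, y_batches: iterables of (batch_size, features) activation
    matrices for the same mini-batches, taken at two different layers.
    """
    xy = xx = yy = 0.0
    for Xb, Yb in zip(x_batches, y_batches):
        K, L = Xb @ Xb.T, Yb @ Yb.T        # linear (dot-product) Gram matrices
        xy += unbiased_hsic(K, L)
        xx += unbiased_hsic(K, K)
        yy += unbiased_hsic(L, L)
    return xy / np.sqrt(xx * yy)            # the 1/k factors cancel in the ratio
```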
7. Methods and Experiments
Emergence of the block structure with increasing width or depth
The yellow square on the heatmap mostly appears in the later layers of the network.
8. Methods and Experiments
Emergence of the block structure with increasing width or depth
• The block structure also appears in a CNN with no residual connections.
• The block structure varies across random initializations.
9. Methods and Experiments
Block structure in narrower networks with less data
Block structure in the internal representations arises in models that are heavily overparameterized relative to the training dataset.
10. Methods and Experiments
Block structure and the first principal component
Block structure arises from preserving and propagating the first principal component across its constituent layers.
[Figure: deep model vs. wide model]
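To make this concrete, here is a minimal NumPy sketch (function name hypothetical) of the quantity the slide refers to: the fraction of a layer's activation variance explained by the first principal component, which is close to 1 for layers inside the block structure:

```python
import numpy as np

def first_pc_variance_fraction(activations):
    """Fraction of variance explained by the first principal component.

    activations: (n_examples, n_features) matrix of one layer's activations,
    e.g. pooled/flattened feature maps for a batch of inputs.
    """
    X = activations - activations.mean(axis=0, keepdims=True)   # center each feature
    s = np.linalg.svd(X, compute_uv=False)                      # singular values, descending
    var = s ** 2                                                 # variance along each PC (up to a constant)
    return var[0] / var.sum()
```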
11. Methods and Experiments
Linear probe accuracy
In models with the block structure, linear probe accuracy shows little improvement inside the block structure. Residual connections play an important role in preserving representations in the block structure.
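For reference, a linear probe simply fits a linear classifier on frozen activations from a single layer; a minimal scikit-learn sketch (names hypothetical, and the paper's exact probe setup may differ):

```python
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(train_acts, train_labels, test_acts, test_labels):
    """Fit a linear classifier on frozen activations from one layer and
    report its test accuracy; repeating this per layer gives the probe curve."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(train_acts, train_labels)
    return probe.score(test_acts, test_labels)
```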
12. Methods and Experiments
Effect of deleting blocks on accuracy for models with or without block structure
Block structure could be an indication of redundant modules in model design. The similarity of its constituent layer representations could be leveraged for model compression.
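A minimal PyTorch sketch of this kind of deletion experiment, assuming a torchvision-style ResNet whose stages are nn.Sequential containers of residual blocks (the helper name is hypothetical): replacing a block with an identity mapping and re-evaluating test accuracy shows how redundant that block is.

```python
import torch.nn as nn

def delete_residual_block(resnet, stage_name, block_idx):
    """Replace one residual block with an identity mapping.

    Assumes a torchvision-style ResNet whose stages (e.g. 'layer3') are
    nn.Sequential containers of residual blocks. Only blocks whose input and
    output shapes match (i.e. not a stage's first, downsampling block) can be
    swapped for nn.Identity without breaking the forward pass.
    """
    stage = getattr(resnet, stage_name)
    stage[block_idx] = nn.Identity()
    return resnet
```

Repeating this for each block and comparing the accuracy drop inside versus outside the block structure reproduces the kind of comparison this slide describes.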
13. Methods and Experiments
Per-example performance differences between Wide and Deep models
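One simple way to quantify "systematically different mistakes" at the example level is the fraction of test examples that exactly one of the two models classifies correctly; a minimal NumPy sketch (names hypothetical):

```python
import numpy as np

def example_level_disagreement(wide_preds, deep_preds, labels):
    """Fraction of examples where exactly one of the two models is correct.

    wide_preds, deep_preds, labels: 1-D integer arrays of predicted and true
    class indices over the same test set.
    """
    wide_correct = (wide_preds == labels)
    deep_correct = (deep_preds == labels)
    return float(np.mean(wide_correct != deep_correct))
```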
14. Methods and Experiments
Per-class performance differences between Wide and Deep models
Deep architectures perform better on consumer-goods classes; wide architectures perform better on scene classes.
15. Conclusion
• Studied the effects of width and depth on neural network representations.
• Emergence of a characteristic “block structure” that reflects the similarity of a dominant first principal component, propagated across many hidden layers of the network.
• While the block structure is unique to each model, other learned features are shared across different initializations and architectures.
• Width and depth have different effects on network predictions at the example and class levels.
16. Conclusion
Future Work
• How does the block structure arise through training?
• How can depth and width be controlled properly to optimize task-specific model design?
• How can depth and width be adjusted wisely in the medical domain?