The document discusses capsule networks, a type of neural network proposed by Geoff Hinton in 2017 as an alternative to convolutional neural networks (CNNs) for computer vision tasks. Capsule networks aim to address some limitations of CNNs, such as their inability to capture spatial relationships and pose information. The key concepts discussed include dynamic routing between capsules, which allows for parts-based representation, and equivariance, where capsules can learn transformation properties like position and orientation. The document provides an overview of a capsule network architecture and routing algorithm proposed in a 2017 paper by Sabour et al.
2. Today’s contents
Dynamic Routing Between Capsules
by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
Oct. 2017: https://arxiv.org/abs/1710.09829
NIPS 2017 Paper
3. Convolutional Neural Networks
What is the problem with CNNs?
Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
1) If the images have rotation, tilt or any other different
orientation then CNNs have poor performance.
2) In CNN each layer understands an image at a much
more granular level (slow increase in receptive field).
DATA AUGMENTATION,
MAX POOLING
4. Convolutional Neural Networks
What is the problem with CNNs?
Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
“Pooling helps in creating the positional invariance. Otherwise This invariance
also triggers false positive for images which have the components of a ship
but not in the correct order.”
5. Convolutional Neural Networks
What is the problem with CNNs?
Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
“Pooling helps in creating the positional invariance. Otherwise This invariance
also triggers false positive for images which have the components of a ship
but not in the correct order.”
This was never the intention of pooling layer!
6. Convolutional Neural Networks
What we need : EQUIVARIANCE (not invariance)
Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
“Equivariance makes a CNN understand the rotation or proportion change
and adapt itself accordingly so that the spatial positioning inside an image is
not lost.”
7. Capsules
“A capsule is a group of neurons whose activity
vector represents the instantiation parameters of a
specific type of entity such as an object or an object
part.”
8D capsule e.g.
Hue, Position, Size, Orientation, deformation, texture, etc.
23. Aurélien Géron, 2017
The rectangle and triangle
capsules should be routed to
the boat capsules.
Routing by
Agreement
=
=
Predicted Outputs
Primary Capsules
Strong agreement!
39. Aurélien Géron, 2017
Handling Crowded
Scenes
=
=
=
=
House
Thanks to routing by agreement,
the ambiguity is quickly resolved
(explaining away).
Boat
41. Aurélien Géron, 2017
Training
|| ℓ2 || Estimated Class Probability
To allow multiple classes,
minimize margin loss:
Lk = Tk max(0, m+ - ||vk||2)
+ λ (1 - Tk) max(0, ||vk||2 - m-)
Tk = 1 iff class k is present
In the paper:
m- = 0.1
m+ = 0.9
λ = 0.5
42. Aurélien Géron, 2017
Training
Translated to English:
“If an object of class k
is present, then ||vk||2
should be no less than
0.9. If not, then ||vk||2
should be no more
than 0.1.”
|| ℓ2 || Estimated Class Probability
To allow multiple classes,
minimize margin loss:
Lk = Tk max(0, m+ - ||vk||2)
+ λ (1 - Tk) max(0, ||vk||2 - m-)
Tk = 1 iff class k is present
In the paper:
m- = 0.1
m+ = 0.9
λ = 0.5
44. Aurélien Géron, 2017
Regularization by
Reconstruction
|| ℓ2 ||
Feedforward
Neural Network
Decoder
Reconstruction
Loss = margin loss + α reconstruction loss
The reconstruction loss is the squared difference
between the reconstructed image and the input image.
In the paper, α = 0.0005.
48. Aurélien Géron, 2017
Pros
● Reaches high accuracy on MNIST, and promising on CIFAR10
● Requires less training data
● Position and pose information are preserved (equivariance)
● This is promising for image segmentation and object detection
● Routing by agreement is great for overlapping objects (explaining away)
● Capsule activations nicely map the hierarchy of parts
● Offers robustness to affine transformations
● Activation vectors are easier to interpret (rotation, thickness, skew…)
● It’s Hinton! ;-)
49. Aurélien Géron, 2017
● Not state of the art on CIFAR10 (but it’s a good start)
● Not tested yet on larger images (e.g., ImageNet): will it work well?
● Slow training, due to the inner loop (in the routing by agreement algorithm)
● A CapsNet cannot see two very close identical objects
○ This is called “crowding”, and it has been observed as well in human vision
Cons
52. Questions Remained
Does capsules really work as the real neurons do?
perceptual illusions
Thompson, P. (1980). Margaret Thatcher: a new illusion. Perception, 38, (6). 483-484.