[PR12] Capsule Networks - Jaejun Yoo

Capsule Networks
PR12와 함께 이해하는
Jaejun Yoo
Ph.D. Candidate @KAIST
PR12
17th Dec, 2017

Today’s contents
Dynamic Routing Between Capsules
by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
Oct. 2017: https://arxiv.org/abs/1710.09829
NIPS 2017 Paper

Convolutional Neural Networks
What is the problem with CNNs?
Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
1) If the images have rotation, tilt or any other different
orientation then CNNs have poor performance.
2) In CNN each layer understands an image at a much
more granular level (slow increase in receptive field).
DATA AUGMENTATION,
MAX POOLING

“Pooling helps in creating the positional invariance. Otherwise This invariance
also triggers false positive for images which have the components of a ship
but not in the correct order.”

“Pooling helps in creating the positional invariance. Otherwise This invariance
also triggers false positive for images which have the components of a ship
but not in the correct order.”
This was never the intention of pooling layer!

What we need : EQUIVARIANCE (not invariance)
“Equivariance makes a CNN understand the rotation or proportion change
and adapt itself accordingly so that the spatial positioning inside an image is
not lost.”

Capsules
“A capsule is a group of neurons whose activity
vector represents the instantiation parameters of a
specific type of entity such as an object or an object
part.”
8D capsule e.g.
Hue, Position, Size, Orientation, deformation, texture, etc.

Contents from https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets
8D capsule e.g.
Capsules
8D vector
Inverse Rendering

8D capsule e.g.
Capsules
8D vector

8D capsule e.g.
Capsules
8D vector
Equivariance of Capsules

Contents from https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-iii-dynamic-routing-between-capsules-349f6d30418
Routing by Agreements

Aurélien Géron, 2017
Primary Capsules
=
=
Primary Capsules

Predict Next Layer’s
Output
=
=
Primary Capsules

Predict Next Layer’s
Output
=
=
One transformation matrix Wi,j
per part/whole pair (i, j).
ûj|i = Wi,j ui
Primary Capsules

Compute Next
Layer’s Output
=
=
Predicted Outputs
Primary Capsules

Routing by
Agreement
=
=
Predicted Outputs
Primary Capsules
Strong agreement!

The rectangle and triangle
capsules should be routed to
the boat capsules.
Routing by
Agreement
=
=
Predicted Outputs
Primary Capsules
Strong agreement!

Routing Weights
=
=
Predicted Outputs
Primary Capsules
bi,j=0 for all i, j

Routing Weights
=
=
Predicted Outputs
Primary Capsules
0.5
0.5
0.5
0.5
bi,j=0 for all i, j
ci = softmax(bi)

Compute Next
Layer’s Output
=
=
Predicted Outputs
sj = weighted sum
Primary Capsules
0.5
0.5
0.5
0.5

Compute Next
Layer’s Output
=
=
Predicted Outputs
Primary Capsules
0.5
0.5
0.5
0.5
sj = weighted sum
vj = squash(sj)

Actual outputs
of the next layer capsules
(round #1)
Compute Next
Layer’s Output
=
=
Predicted Outputs
Primary Capsules
0.5
0.5
0.5
0.5
sj = weighted sum
vj = squash(sj)

Actual outputs
(round #1)
Update Routing
Weights
=
=
Predicted Outputs
Primary Capsules
Agreement

Actual outputs
(round #1)
Update Routing
Weights
=
=
Predicted Outputs
Primary Capsules
Agreement bi,j += ûj|i . vj

Actual outputs
(round #1)
Update Routing
Weights
=
=
Predicted Outputs
Primary Capsules
Agreement bi,j += ûj|i . vj
Large

Actual outputs
(round #1)
Update Routing
Weights
=
=
Predicted Outputs
Primary Capsules
Disagreement bi,j += ûj|i . vj
Small

Compute Next
Layer’s Output
=
=
Predicted Outputs
Primary Capsules
0.2
0.1
0.8
0.9

Compute Next
Layer’s Output
=
=
Predicted Outputs
sj = weighted sum
Primary Capsules
0.2
0.1
0.8
0.9

Compute Next
Layer’s Output
=
=
Predicted Outputs
Primary Capsules
sj = weighted sum
vj = squash(sj)0.2
0.1
0.8
0.9

Actual outputs
(round #2)
Compute Next
Layer’s Output
=
=
Predicted Outputs
Primary Capsules
0.2
0.1
0.8
0.9

Handling Crowded
Scenes
=
=
=
=

Handling Crowded
Scenes
=
=
=
=
Is this an upside
down house?

Handling Crowded
Scenes
=
=
=
=
House
Thanks to routing by agreement,
the ambiguity is quickly resolved
(explaining away).
Boat

Classification
CapsNet
|| ℓ2 || Estimated Class Probability

Training
To allow multiple classes,
minimize margin loss:
Lk = Tk max(0, m+ - ||vk||2)
+ λ (1 - Tk) max(0, ||vk||2 - m-)
Tk = 1 iff class k is present
In the paper:
m- = 0.1
m+ = 0.9
λ = 0.5

Training
Translated to English:
“If an object of class k
is present, then ||vk||2
should be no less than
0.9. If not, then ||vk||2
should be no more
than 0.1.”
To allow multiple classes,
minimize margin loss:
Lk = Tk max(0, m+ - ||vk||2)
+ λ (1 - Tk) max(0, ||vk||2 - m-)
Tk = 1 iff class k is present
In the paper:
m- = 0.1
m+ = 0.9
λ = 0.5

Regularization by
Reconstruction
|| ℓ2 ||
Feedforward
Neural Network
Decoder
Reconstruction

Regularization by
Reconstruction
|| ℓ2 ||
Feedforward
Neural Network
Decoder
Reconstruction
Loss = margin loss + α reconstruction loss
The reconstruction loss is the squared difference
between the reconstructed image and the input image.
In the paper, α = 0.0005.

A CapsNet for
MNIST
(Figure 1 from the paper)

A CapsNet for
MNIST – Decoder

Interpretable
Activation Vectors

Pros
● Reaches high accuracy on MNIST, and promising on CIFAR10
● Requires less training data
● Position and pose information are preserved (equivariance)
● This is promising for image segmentation and object detection
● Routing by agreement is great for overlapping objects (explaining away)
● Capsule activations nicely map the hierarchy of parts
● Offers robustness to affine transformations
● Activation vectors are easier to interpret (rotation, thickness, skew…)
● It’s Hinton! ;-)

● Not state of the art on CIFAR10 (but it’s a good start)
● Not tested yet on larger images (e.g., ImageNet): will it work well?
● Slow training, due to the inner loop (in the routing by agreement algorithm)
● A CapsNet cannot see two very close identical objects
○ This is called “crowding”, and it has been observed as well in human vision
Cons

Results
What the individual dimensions of a capsule represent

Results
MultiMNIST
Segmenting Highly Overlapping Digits

Questions Remained
Does capsules really work as the real neurons do?
perceptual illusions
Thompson, P. (1980). Margaret Thatcher: a new illusion. Perception, 38, (6). 483-484.

• https://arxiv.org/abs/1710.09829 (paper)
• https://jhui.github.io/2017/11/03/Dynamic-Routing-Between-
Capsules/
• https://hackernoon.com/what-is-a-capsnet-or-capsule-network-
2bfbe48769cc
• https://medium.com/ai%C2%B3-theory-practice-
business/understanding-hintons-capsule-networks-part-i-intuition-
b4b559d1159b
• https://www.youtube.com/watch?v=pPN8d0E3900 (video)
• https://www.slideshare.net/aureliengeron/introduction-to-capsule-
networks-capsnets (video slides)
References

[PR12] Capsule Networks - Jaejun Yoo

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie [PR12] Capsule Networks - Jaejun Yoo

Ähnlich wie [PR12] Capsule Networks - Jaejun Yoo (8)

Mehr von JaeJun Yoo

Mehr von JaeJun Yoo (13)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

[PR12] Capsule Networks - Jaejun Yoo