5. c|c
(TM)
(TM)
5
calculation | consulting capsule networks
Where ConvNets come from: LeNet 5
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner,
Gradient-based learning applied to document recognition,
Proc. IEEE 86(11): 2278–2324, 1998.
6.
Convolutions usually w/ max pooling
we get gross spatial invariance by ignoring
exactly where a feature occurs
“A vision system needs to use the same
knowledge at all locations in the image” Hinton
ConvNet: share weights + max pooling
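A minimal sketch of why pooling gives this crude invariance, using a 1-D signal in place of a feature map. The helper `max_pool_1d` and the toy signals are illustrative assumptions, not code from the talk.

```python
def max_pool_1d(signal, width):
    """Non-overlapping max pooling over windows of `width` samples."""
    return [max(signal[i:i + width]) for i in range(0, len(signal), width)]

# The same feature response, first at position 1, then shifted to position 2.
a = [0, 9, 0, 0, 0, 0, 0, 0]
b = [0, 0, 9, 0, 0, 0, 0, 0]

pooled_a = max_pool_1d(a, 4)
pooled_b = max_pool_1d(b, 4)
print(pooled_a, pooled_b)  # [9, 0] [9, 0]
```

Both inputs pool to the same output: the feature is detected, but where it occurred inside the window is thrown away.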
7.
Hierarchical model of the visual system
HMax model, Riesenhuber and Poggio (1999)
dotted line selects max pooled features from lower layer
8.
Hierarchical model of the visual system
Pooling proposed by Hubel and Wiesel in 1962
A. Receptive field (RF) of simple cell
(green) formed by pooling over
(center-surround) cells (yellow) in
the same orientation row
B. RF of complex cell (green) formed by
pooling over simple cells.
here: (crude) translation invariance
9.
Hierarchical model of the visual system
ConvNets resemble hierarchical models (but notice the hyper-column)
HMax model, Riesenhuber and Poggio (1999)
10.
Hinton: why is max pooling bad?
(If) the brain embeds things in rectangular space, then
Translation is easy; Rotation is hard
Experiment: time for the mind to process a rotation ~ amount of rotation
Conv Nets:
Crude translation invariance
No explicit pose (orientation) information
Cannot distinguish left from right
(actually some people have stopped using pooling)
A vision system needs to use the same knowledge at all locations in the image
11.
2 streams hypothesis: what and where
Ventral: what objects are
Dorsal: where objects are in space
How do we know? Neurological disorders
Simultanagnosia: can only see one object at a time
idea dates back to 1968
lots of other evidence as well
https://www.youtube.com/watch?v=mCoYOFzSS9A
12.
Cortical Microcolumns
Capsules may encode
orientation, scale, velocity, color, …
Column through cortical layers of the brain
80-120 neurons (2X long in V1)
share the same receptive field
part of Hubel and Wiesel, Nobel Prize 1981
also see recent review: https://www.sciencedirect.com/science/article/pii/S0166223615001484
13.
Canonical object based frames of reference:
Hinton 1981
Hinton has been thinking about this a long time
A kind of inverse computer graphics
14.
Capsule networks: inverse computer graphics
computer graphics: rendering engine
capsule network: inverse graphics
matrix of pose information
Hinton proposes that our brain does a kind-of inverse computer graphics transformation.
15.
Invariance vs Equivariance
Max pooling provides spatial Invariance, but Hinton argues we need spatial Equivariance.
so use vectors and Affine transformations
Invariance: similar results if
image is shifted or rotated
Equivariance: invariance
under a Symmetry Transformation (S, A, …)
Group homomorphism: f(g*x) = g*f(x) = f(x)*g^-1
Geometric: e.g. triangle
centers invariant under Similarity (S)
centroid invariant under Affine (A)
Statistics:
mean: invariant under change of units
median: more generally invariant; a better statistic
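A small worked check of the triangle example, as a sketch: the centroid commutes with an affine map, i.e. centroid(A(points)) = A(centroid(points)), which is the f(g*x) = g*f(x) relation. The helpers `affine` and `centroid`, and the particular matrix and shift, are illustrative assumptions.

```python
def affine(point, matrix, shift):
    """Apply the 2-D affine map p -> matrix @ p + shift."""
    x, y = point
    (a, b), (c, d) = matrix
    return (a * x + b * y + shift[0], c * x + d * y + shift[1])

def centroid(points):
    """Arithmetic mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
matrix, shift = ((2.0, 1.0), (0.0, 3.0)), (5.0, -1.0)

# Transform the corners first, then take the centroid ...
lhs = centroid([affine(p, matrix, shift) for p in triangle])
# ... or take the centroid first, then transform it.
rhs = affine(centroid(triangle), matrix, shift)
print(lhs, rhs)
```

The two results agree up to floating-point rounding: the output moves with the input instead of staying fixed, which is the property Hinton argues pooling lacks.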
16.
Segmenting highly overlapping objects
Explaining away: Even if two hidden causes are independent, they can become
dependent when we observe an effect that they can both influence. Hinton
17.
Capsule networks: architecture
+ unsupervised | reconstruction loss
supervised | max norm loss
Sabour, Frosst, and Hinton, Dynamic Routing Between Capsules (2017)
19.
Capsule networks by Hinton
conv2D
Reshape conv2d into primary capsule vectors (red), and
replace max pooling with routing-by-agreement algo
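A rough sketch of the reshape step, assuming a feature map stored as nested lists indexed [row][column][channel] and assuming consecutive channels are grouped into one capsule. The function name `to_primary_capsules` and the channel grouping are illustrative choices, not the paper's exact layout.

```python
def to_primary_capsules(feature_map, capsule_dim):
    """Split the channels at every spatial position into vectors of length capsule_dim."""
    capsules = []
    for row in feature_map:
        for channels in row:
            if len(channels) % capsule_dim:
                raise ValueError("channel count must be a multiple of capsule_dim")
            for start in range(0, len(channels), capsule_dim):
                capsules.append(channels[start:start + capsule_dim])
    return capsules

# Toy 2x2 feature map with 4 channels; each value encodes (row, column, channel).
feature_map = [[[r * 100 + c * 10 + k for k in range(4)] for c in range(2)]
               for r in range(2)]
capsules = to_primary_capsules(feature_map, 2)
print(len(capsules), capsules[0], capsules[-1])  # 8 [0, 1] [112, 113]
```

Each position yields two 2-D capsule vectors here, so the 2x2 map becomes 8 primary capsules.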
20.
Capsule networks by Hinton
“Active capsules at one level (red) make predictions, via transformation matrices,
for the instantiation parameters of higher-level capsules (blue).
When multiple predictions agree, a higher level capsule (blue) becomes active”
conv2D
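The prediction step in the quote can be sketched as a matrix-vector product: a lower capsule output u_i is multiplied by one transformation matrix per parent capsule to give a prediction vector. The names `matvec`, `u_i`, and `W_i`, and the toy numbers, are assumptions for illustration; in a trained network the matrices are learned.

```python
def matvec(W, u):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]

# One 2-D lower-level capsule output.
u_i = [1.0, 2.0]

# One 3x2 transformation matrix per higher-level capsule (two parents here).
W_i = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, -1.0], [1.0, 0.0], [0.0, 0.0]],
]

# Prediction vectors: what capsule i expects each parent's pose to be.
u_hat = [matvec(W, u_i) for W in W_i]
print(u_hat)  # [[1.0, 2.0, 3.0], [-2.0, 1.0, 0.0]]
```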
22.
Capsule networks: encodes poses
Capsules can represent objects w/ different poses (3D orientations)
Latest results (matrix capsules, below) reduce the best reported test error on SmallNORB by 45%
23.
Capsules capture visual features
“A capsule is a group of neurons whose outputs represent different properties of the same entity.”
Capsules encode SIFT-like features
Perturbing an image causes specific capsules to activate
24.
Place-coding vs Rate-coding
Place-coding:
convNet w/out pooling
low-level features for small receptive fields
when a part moves, it may get a new capsule
position maps to active capsules (u) in primary layer
Rate-coding:
traditional neurological way of coding (1926)
stimulus info encoded in rate of firing
(as opposed to magnitude, population, timing, …)
when a part rotates or moves, the capsule values change
maps to real values of capsule output vectors (v)
rates encoded in vector values
aside: are ReLUs a kind of rate coding?
25.
Hierarchy of parts: coupled layers
A higher level entity is present if the lower / primary layer capsules
agree on their predictions for its pose.
26.
Routing algo: some pose prose
An effective way to implement the “explaining away”
that is needed for segmenting highly overlapping objects.
Like an Attention mechanism: The competition … is between the higher-level
capsules that a lower-level capsule might send its vote to.
stuff Hinton says…
A capsule is activated only if the transformed poses coming from the layer
below match each other. This is a more effective way to capture covariance
and leads to models with many fewer parameters that generalize better.
…a powerful segmentation principle that allows knowledge of familiar shapes to
drive segmentation, rather than just using low-level cues such as proximity or
agreement in color or velocity.
27.
Data-specific dynamic routes
squash
softmax
“c_ij are determined by an iterative dynamic routing process”
weighted sum, weighted mean prediction
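A minimal sketch of the softmax and weighted-sum steps named on this slide: routing logits b_ij become coupling coefficients c_ij via a softmax over the parents j, and each parent input s_j is the c-weighted sum of the prediction vectors. The helper names and toy values are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(c, u_hat, j):
    """s_j = sum_i c[i][j] * u_hat[i][j], computed per vector component."""
    dim = len(u_hat[0][j])
    return [sum(c[i][j] * u_hat[i][j][d] for i in range(len(u_hat)))
            for d in range(dim)]

# u_hat[i][j]: prediction from primary capsule i for parent capsule j.
u_hat = [[[1.0, 0.0], [0.0, 2.0]],
         [[3.0, 0.0], [0.0, 4.0]]]

# Before any routing iteration all logits are zero, so each capsule
# splits its vote evenly across the parents.
b = [[0.0, 0.0], [0.0, 0.0]]
c = [softmax(row) for row in b]
s_0 = weighted_sum(c, u_hat, 0)
s_1 = weighted_sum(c, u_hat, 1)
print(c, s_0, s_1)
```

With uniform coefficients the weighted sum is just a scaled mean of the predictions; routing then sharpens c_ij from the data.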
29.
Capsule: affine transformation
Primary rectangle and triangle capsules (prediction vectors) routed to
boat and house capsules (parent layer), and then routes pruned
“CapsNet is moderately robust to small affine transformations of the training data”
30.
Capsule: squashing function
https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66
length of the capsule vector ~ probability that the entity represented by the capsule is present
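A sketch of the squashing non-linearity from the dynamic routing paper, v = (|s|^2 / (1 + |s|^2)) * s / |s|, which keeps the direction of s and maps its length into [0, 1). The small `eps` guard against division by zero is an added assumption, not part of the formula.

```python
import math

def squash(s, eps=1e-9):
    """Shrink short vectors toward 0 and long vectors toward unit length."""
    sq = sum(x * x for x in s)
    norm = math.sqrt(sq)
    scale = sq / (1.0 + sq) / (norm + eps)
    return [scale * x for x in s]

def length(v):
    return math.sqrt(sum(x * x for x in v))

long_v = squash([3.0, 4.0])    # input length 5   -> output length 25/26
short_v = squash([0.1, 0.0])   # input length 0.1 -> output length about 0.0099
print(length(long_v), length(short_v))
```

A long input ends up with length near 1 and a short one near 0, which is what lets the length be read as a probability.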
31.
Routing by agreement
Algo selects data-specific routes b_ij by matching
primary outputs and squashed (secondary) outputs
first paper uses vector overlap / cosine distance to find cluster centers: ok, but cannot tell great from good
second paper (matrix capsules) uses a Free Energy cost function
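A pure-Python sketch of routing by agreement as described in the first paper: softmax the logits, form weighted sums, squash, then raise each logit b_ij by the dot product between the prediction and the parent output. The toy predictions are made up for illustration, with three primary capsules that agree about parent 0 and disagree about parent 1.

```python
import math

def squash(s, eps=1e-9):
    sq = sum(x * x for x in s)
    return [sq / (1.0 + sq) / (math.sqrt(sq) + eps) * x for x in s]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(u_hat, iterations=3):
    """u_hat[i][j] is the prediction vector from primary capsule i for parent j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]
    for _ in range(iterations):
        c = [softmax(row) for row in b]            # coupling coefficients
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))                  # parent outputs
        for i in range(n_in):
            for j in range(n_out):                 # agreement = dot product
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c

u_hat = [[[1.0, 0.0], [1.0, 0.0]],
         [[1.0, 0.0], [-1.0, 0.0]],
         [[1.0, 0.0], [0.0, 1.0]]]
v, c = route(u_hat)
lengths = [math.sqrt(sum(x * x for x in vj)) for vj in v]
print(lengths, c)
```

The parent whose predictions agree ends up with the longer output vector, and every primary capsule routes more than half of its vote to it.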
33.
Routing algo: EM fixed point equation
in forward pass of Backprop
(like an EM step)
must terminate to take dW
dot product ~ log likelihood (Energy*)
*Similar to fixed point equation for TAP Free Energy in the EMF RBM
**and in the later matrix capsule paper, a Free Energy is used explicitly
37.
Routing algo: matrix capsules
cluster score = [ log p(x_i | mixture) - log p(x_i | uniform) ]
cosine distance -> Free Energy cost:
EM to find mean, variance, and mixing proportion of Gaussians
“data-points that form a tight cluster from the perspective of one capsule
may be widely scattered from the perspective of another capsule”
p(x_i | mixture_h)
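A simplified 1-D illustration of the cluster score idea: compare how well a fitted Gaussian explains a set of votes against a uniform density. This assumes a single Gaussian rather than a full mixture and a uniform density over an interval of width `span`; it is a sketch of the principle, not the cost function of the matrix capsules paper.

```python
import math

def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def cluster_score(points, span):
    """Sum over points of log p(x | fitted Gaussian) - log p(x | uniform)."""
    n = len(points)
    mean = sum(points) / n
    var = max(sum((x - mean) ** 2 for x in points) / n, 1e-6)  # variance floor
    log_uniform = -math.log(span)
    return sum(gaussian_logpdf(x, mean, var) - log_uniform for x in points)

tight = [1.0, 1.1, 0.9, 1.05]      # votes that agree
scattered = [0.0, 3.0, 6.0, 9.0]   # votes that disagree

tight_score = cluster_score(tight, span=10.0)
scattered_score = cluster_score(scattered, span=10.0)
print(tight_score, scattered_score)
```

The tight cluster scores above zero (the Gaussian beats the uniform) and the scattered one scores below zero, and the size of the score separates a great cluster from a merely good one.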
39.
Capsule networks: architecture
+ unsupervised | reconstruction loss
supervised | multi-label max-norm loss
each digit capsule ~ single digit for MNIST data
|v| ~ Prob(digit)
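A sketch of the supervised loss, written as the margin loss from the dynamic routing paper with its stated constants (m+ = 0.9, m- = 0.1, lambda = 0.5). The function name `margin_loss` and the use of a set of target indices for the multi-label case are illustrative assumptions.

```python
def margin_loss(lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Sum of per-capsule margin losses.

    lengths: output vector length |v_k| of each digit capsule.
    targets: set of indices of the digits actually present.
    """
    total = 0.0
    for k, length in enumerate(lengths):
        if k in targets:
            # A present digit should have a long vector (>= m_plus).
            total += max(0.0, m_plus - length) ** 2
        else:
            # An absent digit should have a short vector (<= m_minus).
            total += lam * max(0.0, length - m_minus) ** 2
    return total

confident = margin_loss([0.95, 0.05, 0.05], {0})
confused = margin_loss([0.5, 0.6], {0})
print(confident, confused)
```

A confident, correct prediction costs nothing, while the confused case is penalised both for the short target capsule and for the long non-target one.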
40.
From max pool to max |vector|
mask selects (squashed) max vector (by length)
- does not throw away position information
- inputs vector into Fully Connected Net
- reconstructs the image from the vector
- similar to a variational auto-encoder
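A minimal sketch of the masking step: keep the longest capsule vector, zero the others, and flatten the result as input to the fully connected reconstruction net. The name `mask_longest` and the toy vectors are assumptions; the paper masks with the true label during training and uses the longest vector only at test time.

```python
import math

def mask_longest(capsules):
    """Zero every capsule except the longest one and flatten the result."""
    lengths = [math.sqrt(sum(x * x for x in v)) for v in capsules]
    winner = max(range(len(capsules)), key=lengths.__getitem__)
    masked = [x if k == winner else 0.0
              for k, v in enumerate(capsules) for x in v]
    return winner, masked

capsules = [[0.1, 0.0], [0.6, 0.7], [0.2, 0.1]]
winner, masked = mask_longest(capsules)
print(winner, masked)  # 1 [0.0, 0.0, 0.6, 0.7, 0.0, 0.0]
```

Unlike max pooling, the surviving entry is a whole vector in its original slot, so the decoder still sees the winning capsule's pose values.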
43. Reconstruction: overlapping images
individual (8, 6) reconstructed
after removing a specific capsule
and does not reconstruct absent (0, 1)
trained on overlapping
MNIST images
like (8,1) (6,7)
does have trouble with close images (like humans)
https://www.youtube.com/watch?v=gq-7HgzfDBM&t=62s
44.
Matrix capsules : Nov 2017
capsule vectors -> matrices
cosine distance -> Free Energy cost function (Gaussian mixtures)
+ convolutions between layers + lots more details … for another video