With the deep learning 'revolution' barely a decade old, the field of machine learning is accumulating a growing number of interesting research problems. The Amsterdam Machine Learning Laboratory (AMLAB), headed by Profs. Max Welling and Joris Mooij, has played a considerable part in creating many of these areas. Our research spans many subdisciplines, including approximate Bayesian methods, causal inference, equivariant representations, graph neural networks, spiking neural networks, neural compression, low-cost computation, reinforcement learning, explainable AI, medical imaging, generative modelling, flow models, and more. In this talk, Daniel Worrall (postdoc) will introduce and showcase some of the lab's recent advances.
Machine Learning Today: Current Research And Advances From AMLAB, UvA
1. MACHINE LEARNING TODAY: CURRENT RESEARCH AND ADVANCES FROM AMLAB, UVA
DANIEL WORRALL
2. WHO ARE WE?
~30 researchers working under Max Welling and Joris Mooij
- 4 industrially funded ‘labs’
- Everyone works in deep learning
- We do fundamental research in machine learning
3. WHAT IS MACHINE LEARNING?
Mauna Loa is one of five volcanoes that form the Island of Hawaii in the U.S. state of Hawaii in the Pacific Ocean. The largest subaerial volcano in both mass and volume, Mauna Loa has historically been considered the largest volcano on Earth, dwarfed only by Tamu Massif.
4. WHAT IS MACHINE LEARNING?
In machine learning we use past data to make predictions about the future.
p(y_* \mid x_*, \mathcal{D}) = \mathcal{N}\left(y_* \mid \mu(x_*), \sigma^2(x_*)\right)

Here \mathcal{D} is the data, x_* the test input, y_* the test output, and \mathcal{N} a Gaussian.
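To make this concrete, here is a minimal sketch (not from the slides) of Gaussian process regression, whose predictive distribution has exactly this form, with an input-dependent mean \mu(x_*) and variance \sigma^2(x_*). The RBF kernel, noise level, and toy data (a sine curve standing in for a series like the Mauna Loa CO2 record) are all illustrative assumptions:

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2)).
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.1):
    # Predictive distribution p(y* | x*, D) = N(y* | mu(x*), sigma^2(x*)).
    K = rbf(x_train, x_train) + noise**2 * np.eye(len(x_train))
    K_star = rbf(x_test, x_train)
    mu = K_star @ np.linalg.solve(K, y_train)
    cov = rbf(x_test, x_test) - K_star @ np.linalg.solve(K, K_star.T)
    var = np.diag(cov) + noise**2
    return mu, var

# Toy data: a noisy sine curve in place of a real time series.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(50)
mu, var = gp_predict(x, y, np.linspace(0, 12, 100))
```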
5. WHAT IS MACHINE LEARNING?
Predictions are probability distributions.
Our main tool, conditional distributions:
p(x \mid \theta)
x: data; \theta: parameters/models/unknowns.
How do we choose p?
- Symmetry constraints
- Domain choice
- Flexibility
How do we learn \theta?
- Approximations
- Computation
- Memory
6. WHAT IS MACHINE LEARNING?
Some terminology, in terms of p(x \mid \theta):
- Probability: given \theta, reason about data \{x_1, x_2, \ldots\}
- Statistics: given data \{x_1, x_2, \ldots\}, infer \theta
- Machine learning: given data \{x_1, x_2, \ldots\}, predict \{x_*\}
7. WHAT WE DO
- Variational methods
- Normalizing flows
- Graphs
- Symmetry
- Reinforcement learning
- Transfer learning
- Medical imaging
- Generative modelling
- Compression
- Low precision neural networks
- Spiking neural networks
- Semi-supervised learning
8. VARIATIONAL METHODS
Approximate inference

The posterior is intractable:

p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})} = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{\int p(\mathcal{D} \mid \theta)\, p(\theta)\, \mathrm{d}\theta}

So we fit an approximation q_\phi(\theta), minimising the KL divergence, a 'distance' between distributions:

q_\phi(\theta) = \arg\min_\phi D_{\mathrm{KL}}[q_\phi(\theta) \,\|\, p(\theta \mid \mathcal{D})]
= \arg\min_\phi \int q_\phi(\theta) \log \frac{q_\phi(\theta)}{p(\theta \mid \mathcal{D})}\, \mathrm{d}\theta
= \arg\min_\phi \int q_\phi(\theta) \log \frac{q_\phi(\theta)\, p(\mathcal{D})}{p(\mathcal{D} \mid \theta)\, p(\theta)}\, \mathrm{d}\theta
= \arg\max_\phi \mathbb{E}_{q_\phi(\theta)}[\log p(\mathcal{D} \mid \theta)] - D_{\mathrm{KL}}[q_\phi(\theta) \,\|\, p(\theta)]

The final line is the ELBO: a log-likelihood term minus a regulariser.
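As a concrete illustration of that final line (not from the slides), here is a Monte Carlo estimate of the ELBO for a toy model with a Gaussian prior, unit-variance Gaussian likelihood, and Gaussian q; the model and all hyperparameters are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=100)   # observations D
prior_mu, prior_sigma = 0.0, 5.0        # prior p(theta) = N(0, 5^2)

def log_likelihood(theta):
    # log p(D | theta): Gaussian observations with unit noise.
    return np.sum(-0.5 * (data[None, :] - theta[:, None])**2
                  - 0.5 * np.log(2 * np.pi), axis=1)

def kl_gaussians(q_mu, q_sigma):
    # Closed-form KL[N(q_mu, q_sigma^2) || N(prior_mu, prior_sigma^2)].
    return (np.log(prior_sigma / q_sigma)
            + (q_sigma**2 + (q_mu - prior_mu)**2) / (2 * prior_sigma**2)
            - 0.5)

def elbo(q_mu, q_sigma, n_samples=1000):
    # ELBO = E_q[log p(D|theta)] - KL[q || p], estimated by Monte Carlo
    # with reparameterised samples theta = mu + sigma * eps.
    theta = q_mu + q_sigma * rng.standard_normal(n_samples)
    return log_likelihood(theta).mean() - kl_gaussians(q_mu, q_sigma)

print(elbo(2.0, 0.1))   # a q near the posterior scores a high ELBO
print(elbo(-3.0, 0.1))  # a poor q scores much lower
```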
9. VARIATIONAL METHODS
Approximate inference
If we use latents (each x has a z), then we have a variational auto-encoder:

\arg\max \mathbb{E}_{p(x)}\big[\mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - D_{\mathrm{KL}}[q_\phi(z \mid x) \,\|\, p(z)]\big]

Both q_\phi(z \mid x) and p_\theta(x \mid z) are parameterised by neural networks.
Kingma and Welling (2013)
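A minimal sketch of this objective in PyTorch (the framework, architecture, and Bernoulli likelihood are assumptions, not from the slides), with a Gaussian encoder q(z|x), a logits decoder for p(x|z), and the reparameterisation trick:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h_dim=400):
        super().__init__()
        # Encoder q(z|x): Gaussian with diagonal covariance.
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        # Decoder p(x|z): Bernoulli logits over pixels.
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def neg_elbo(logits, x, mu, logvar):
    # -E_q[log p(x|z)] (reconstruction) + KL[q(z|x) || N(0, I)] (closed form).
    rec = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (rec + kl) / x.shape[0]
```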
11. NORMALIZING FLOWS
What is a flexible probability distribution?
e.g. p(x) = \mathcal{N}(x \mid \mu, \sigma^2)
e.g. p(x) = \sum_i \pi_i \mathcal{N}(x \mid \mu_i, \sigma_i^2)

Instead, implicitly define a distribution via a change of variables:

x = f_\theta(z), \quad z \sim p(z)
\implies p(x) = p(z) \left|\det \frac{\partial z}{\partial x}\right| = p(z) \left|\det \frac{\partial f}{\partial z}\right|^{-1}

The determinant is rather expensive in general. Goal: design flexible f with cheap determinants.
(Figure: target density vs. flow approximation.)
Rezende & Mohamed (2016)
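As one example of such an f, here is a sketch (an illustration, not the authors' code) of the planar flow from Rezende & Mohamed: by the matrix determinant lemma, its Jacobian determinant costs O(d) rather than the O(d^3) of a general Jacobian:

```python
import numpy as np

def planar_flow(z, u, w, b):
    # f(z) = z + u * tanh(w.z + b). The Jacobian determinant follows from
    # the matrix determinant lemma: det = 1 + u . (h'(w.z + b) w), an O(d)
    # computation. (Invertibility needs w.u >= -1, not enforced here.)
    a = z @ w + b                            # (n,) pre-activations
    x = z + np.outer(np.tanh(a), u)          # (n, d) transformed samples
    psi = (1 - np.tanh(a)**2)[:, None] * w   # h'(a) w, shape (n, d)
    log_det = np.log(np.abs(1 + psi @ u))    # (n,)
    return x, log_det

# Push base samples z ~ N(0, I) through the flow, tracking densities via
# log p(x) = log p(z) - log|det df/dz|.
rng = np.random.default_rng(0)
d = 2
z = rng.standard_normal((1000, d))
u, w, b = rng.standard_normal(d), rng.standard_normal(d), 0.5
x, log_det = planar_flow(z, u, w, b)
log_px = -0.5 * np.sum(z**2, axis=1) - 0.5 * d * np.log(2 * np.pi) - log_det
```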
17. SYMMETRY?
Symmetry is a property of functions/tasks, e.g. classification, disentangling (the cocktail party problem), signal discovery/detection.

Invariance: f(I) = f(T_\theta[I]), where I is an image, f a function/feature mapping, and T_\theta an image transformation. The symmetries of f are the set of input transformations leaving it invariant.

Notational aside:
T_\theta[I](x) = I(x - \theta)   e.g. geometric translation
T_\theta[I](x) = I(R_\theta^{-1} x)   e.g. geometric rotation
T[I] = (I - \mu)/\sigma   e.g. pixel normalisation
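A minimal sketch of these three transformations and an invariance check (illustrative; the pixel-sum feature f is an assumed toy example, not from the slides):

```python
import numpy as np

def translate(I, theta):
    # Geometric translation: T_theta[I](x) = I(x - theta), circular here.
    return np.roll(I, theta, axis=(0, 1))

def rotate(I, k):
    # Geometric rotation by k * 90 degrees: T_theta[I](x) = I(R_theta^{-1} x).
    return np.rot90(I, k)

def normalise(I):
    # Pixel normalisation: T[I] = (I - mu) / sigma.
    return (I - I.mean()) / I.std()

I = np.random.default_rng(0).standard_normal((8, 8))
f = lambda img: img.sum()  # toy feature mapping

# Invariance: f(I) == f(T_theta[I]) for translations and rotations...
assert np.isclose(f(I), f(translate(I, (2, 3))))
assert np.isclose(f(I), f(rotate(I, 1)))
# ...but not for normalisation: symmetry is a property of the function/task.
```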
18. EQUIVARIANCE
S_\theta[f](I) = f(T_\theta[I])

Here S_\theta is the transformation in feature space: the mapping f preserves the algebraic structure of the transformation, and S_\theta and T_\theta are different representations of the same transformation. The special case S_\theta = \mathrm{Id} is invariance.

Convolution (and correlation) is translation-equivariant:
[I \ast W](x - \theta) = [T_\theta[I] \ast W](x)

https://github.com/vdumoulin/conv_arithmetic
(Diagram: convolutions and symmetry.)
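A quick numeric check of this equivariance (illustrative, not from the slides), using circular boundary conditions so that translations commute exactly with filtering:

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
I = rng.standard_normal((16, 16))   # image
W = rng.standard_normal((3, 3))     # filter

def conv(img):
    # Cross-correlation with circular padding, output the same size as img.
    return correlate2d(img, W, mode='same', boundary='wrap')

shift = (2, 5)
# Equivariance: filtering commutes with translation.
lhs = conv(np.roll(I, shift, axis=(0, 1)))
rhs = np.roll(conv(I), shift, axis=(0, 1))
assert np.allclose(lhs, rhs)

# Global pooling on top yields invariance (the S_theta = Id case).
assert np.isclose(conv(I).sum(), lhs.sum())
```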
19. GROUP EXAMPLES
- Translation
- Rotation
- Roto-translation
- Reflections
- Scalings*
Non-example: occlusions.
*Current research direction: scalings are probably better modelled as semigroups, i.e. groups without the invertibility condition.
20. GROUP CONVOLUTIONS
"Convolution" examples:

Translation: [I \ast W](y) = \sum_{x \in \mathbb{Z}^2} I(x)\, W(x - y)
Rotation: [I \ast W](\theta) = \sum_{x \in \mathbb{Z}^2} I(x)\, W(R_\theta^{-1} x)
Roto-translation: [I \ast W](\theta, y) = \sum_{x \in \mathbb{Z}^2} I(x)\, W(R_\theta^{-1}(x - y))

Group convolution: [I \ast W](\theta) = \sum_{x \in \mathbb{Z}^2} I(x)\, T_\theta[W](x)
Semigroup convolution: [I \ast W](\theta) = \sum_{x \in \mathbb{Z}^2} T_\theta[I](x)\, W(x)
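A minimal sketch of the roto-translation case for 90-degree rotations (illustrative; this follows the 'lifting' correlation style of group-equivariant CNNs, an assumption here): correlating the image with each rotated copy of the filter yields one response map per rotation:

```python
import numpy as np
from scipy.signal import correlate2d

def p4_lift(I, W):
    # Lifting correlation for p4 (90-degree rotations + translations):
    # [I * W](theta, y) = sum_x I(x) W(R_theta^{-1}(x - y)), realised by
    # correlating I with each of the four rotated copies of the filter.
    return np.stack([
        correlate2d(I, np.rot90(W, k), mode='same', boundary='wrap')
        for k in range(4)
    ])

rng = np.random.default_rng(0)
I = rng.standard_normal((16, 16))
W = rng.standard_normal((3, 3))
out = p4_lift(I, W)   # shape (4, 16, 16): one response map per rotation

# Equivariance: rotating the input rotates each map and cyclically
# permutes the rotation channels.
out_rot = p4_lift(np.rot90(I), W)
for k in range(4):
    assert np.allclose(out_rot[k], np.rot90(out[(k - 1) % 4]))
```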