This talk, "Machine Learning and Common Sense", was given by Christian Bauckhage at the Cologne AI and Machine Learning Meetup on July 17, 2019 (https://www.meetup.com/de-DE/Cologne-AI-and-Machine-Learning-Meetup/events/259758559/)
Despite its undeniable success in the past couple of years, there are still situations where machine learning does not work well. Especially if training data are limited, not representative, or severely biased, current approaches cannot learn to generalize well; trained systems thus run the risk of making silly mistakes in deployment. This raises several questions: How can "common sense" be integrated into the machine learning pipeline? Are there algorithms or design principles that allow for informed learning and introspection? Are there algorithms or design principles that lead to white-box solutions whose computations are transparent and whose decisions are accountable? In short, are there approaches towards more explainable ML systems that could be deployed in situations where there are few data to learn from and traceable decisions are a necessity? These are the questions this presentation focuses on.
4. big data
+ affordable HPC
+ open source software
+ deep learning systems
= progress in AI
5. dramatic progress in cognitive computing
text analysis and understanding
image understanding
speech recognition
robotics
...
Socher et al., Proc. EMNLP, 2013
6. superhuman performance in medical diagnostics
Ciresan, Giusti, Gambardella, and Schmidhuber, Proc. MICCAI, 2013
Gulshan et al., JAMA, 316(22), 2016
Esteva et al., Nature, 542(7639), 2017
7. how is this possible ?
a human expert sees
100 images per day
500 images per week
25,000 images per year
1,000,000 images in 40 years
a neural network sees
10,000,000 images for training
a human expert gets
tired, distracted, . . .
a neural network never gets
tired, distracted, . . .
8. a frenzy in finance
considerable investments by
BlackRock, Bridgewater, Schroders,
MAN AHL, . . .
AI-based hedge funds / fintechs
Aidyia, Numerai, Sentient, . . .
research on predicting
stock momentum, volatility of futures,
insolvency risk, . . .
reported accuracies of 53%–60%
Krauss et al., Europ. J. Operational Research, 2016
Ding et al., Proc. IJCAI, 2015
19. state of affairs in 2019
big data
+ affordable HPC
+ open source software
+ deep learning systems
= progress in AI
problems in industry
1) VC theory demands that complex models are trained with massive data, but labeled data are scarce
2) even labeled data may be biased
3) (deep) neural networks are black boxes; connectionist architectures are not accountable
20. how to avoid silly mistakes ?
how to incorporate common sense ?
how to overcome thin data problems ?
22. further details
von Rueden, Mayer, Garcke, Bauckhage & Schuecker
Informed Machine Learning – Towards a Taxonomy of
Explicit Integration of Knowledge into Machine Learning
arXiv:1903.12394 [stat.ML], 2019
24. adjusting learned representations to semantic structures
Dong, Wang, Li, Bauckhage & Cremers
Triple Classification Using Regions and Fine-Grained Entity Typing
Proc. AAAI, 2019
Dong, Bauckhage, Jin, Li, Cremers, Speicher, Cremers & Zimmermann
Imposing Category Trees Onto Word-Embeddings Using A Geometric
Construction
Proc. ICLR, 2019
25. how to increase accountability ?
how to improve explainability ?
27. observe
a deep neural network computes a composite function
y(x) = f( W_L · · · f( W_2 f( W_1 x ) ) )
where, for instance
f(s)_i = tanh(s_i)
[figure: layered feed-forward network with weight matrices W_1, W_2, . . . , W_L between consecutive layers]
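A minimal NumPy sketch of this composite-function view; the layer sizes and random weights are chosen purely for illustration:

```python
# a minimal sketch of a deep net as the composite function
# y(x) = f(W_L ... f(W_2 f(W_1 x))) with f = element-wise tanh
import numpy as np

def forward(x, weights):
    """Apply f(W_l s) layer by layer."""
    s = x
    for W in weights:
        s = np.tanh(W @ s)
    return s

rng = np.random.default_rng(0)
sizes = [5, 4, 4, 3, 2]                      # layer widths (illustrative)
weights = [rng.standard_normal((m, n))       # W_l maps layer l-1 to layer l
           for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(rng.standard_normal(sizes[0]), weights)
```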
28. consider this . . .
let
s_0 = (x_1, x_2, . . . , x_n)   (all training data)
s_g = (y_1, y_2, . . . , y_n)   (all training targets)
s_l = f( W_l s_{l−1} )          (output of layer l)
then
f : S × A → S
where
S ⊆ R^(m_1 × m_in) ∪ . . . ∪ R^(m_L × m_out)   (quasi-continuous state space)
A = { W_1, W_2, . . . , W_N }                  (very large set of actions)
29. consider this . . .
seen from this point of view, training a neural network is to find a sequence
W_1 → W_2 → . . . → W_L ,   W_l ∈ A
that minimizes
E = ‖ s_L − s_g ‖²
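A hedged sketch of this state/action reading: the state is the current layer output, an action picks one matrix W from a finite pool A, and "training" searches for an action sequence minimizing E. The pool, toy data, and plain random search are illustrative assumptions, not the talk's actual procedure:

```python
# training as a search over action sequences, E = ||s_L - s_g||^2
import numpy as np

rng = np.random.default_rng(1)
s0 = rng.standard_normal((2, 50))            # all training inputs (columns)
sg = np.sign(s0[:1, :] * s0[1:, :])          # toy targets for illustration
A = [rng.standard_normal((2, 2)) for _ in range(20)]   # finite action set

def error(seq):
    """Roll out a sequence of actions and return E = ||s_L - s_g||^2."""
    s = s0
    for W in seq:
        s = np.tanh(W @ s)
    return float(np.sum((s[:1, :] - sg) ** 2))

# naive stochastic search over sequences of three actions
best = min(([A[i] for i in idx]
            for idx in rng.integers(len(A), size=(500, 3))), key=error)
```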
30. observe
we can generalize this idea
we can think of a data analytics system as a function
y(x) = f_L ◦ . . . ◦ f_2 ◦ f_1 (x)
and “just” need to find sequences of functions f_l where
f_l ∈ A = { f_1, f_2, . . . , f_N }
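A sketch of this generalized view, with a placeholder library A of my own choosing; learning would then amount to selecting the sequence of keys:

```python
# a pipeline y(x) = f_L ∘ ... ∘ f_1(x) assembled from a fixed library A
from functools import reduce
import numpy as np

A = {"neg": lambda x: -x, "abs": np.abs, "tanh": np.tanh, "sq": lambda x: x**2}

def compose(names):
    """Return the composite f_L ∘ ... ∘ f_1 for a sequence of library keys."""
    return lambda x: reduce(lambda s, name: A[name](s), names, x)

pipeline = compose(["sq", "neg", "tanh"])    # computes tanh(-(x^2))
print(pipeline(np.array([0.5, -2.0])))
```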
31. didactic example: the “noisy bi-polar XOR problem”
given labeled data { (x_i, y_i) }_{i=1}^n with x_i ∈ R² and y_i ∈ { −1, +1 },
train a classifier such that
y(x) = −1 if x_1 ≈ x_2, and +1 otherwise
[figure: scatter plot of the labeled training data over [−1, 1]²]
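One plausible way to generate such data (the cluster centers and noise level are my assumptions): points around (±1, ±1) with Gaussian jitter, labeled −1 whenever the two coordinates agree in sign:

```python
# toy generator for the noisy bi-polar XOR data
import numpy as np

rng = np.random.default_rng(2)
n = 200
centers = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
X = centers[rng.integers(4, size=n)] + 0.25 * rng.standard_normal((n, 2))
y = np.where(np.sign(X[:, 0]) == np.sign(X[:, 1]), -1, +1)
```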
32. operations provided by an expert

R² → R²                       R² → R                R → R
f_0(x) = id(x)                f_5(x) = Σ_i x_i      f_10(x) = |x|
f_1(x) = R x                  f_6(x) = Π_i x_i      f_11(x) = −x
f_2(x) = (w_a^T x) · w_a      f_7(x) = Σ_i x_i²     f_12(x) = x − 1
f_3(x) = (w_b^T x) · w_b      f_8(x) = w_a^T x      f_13(x) = sign(x)
f_4(x) = (w_a w_a^T − I) x    f_9(x) = w_b^T x      f_14(x) = tanh(x)
                                                    f_15(x) = 2 e^(−x²) − 1

where
R = [[0, 1], [−1, 0]],   w_a = (sin π/4, cos π/4)^T,   w_b = (−sin π/4, cos π/4)^T
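A direct NumPy transcription of this library might look as follows (a sketch; the grouping by signature is my choice):

```python
# the expert's operation library, indices as on the slide
import numpy as np

R  = np.array([[0.0, 1.0], [-1.0, 0.0]])             # 90 degree rotation
wa = np.array([np.sin(np.pi / 4),  np.cos(np.pi / 4)])
wb = np.array([-np.sin(np.pi / 4), np.cos(np.pi / 4)])
I2 = np.eye(2)

ops_R2_to_R2 = {
    "f0": lambda x: x,                                # identity
    "f1": lambda x: R @ x,                            # rotate by 90 degrees
    "f2": lambda x: (wa @ x) * wa,                    # project onto w_a
    "f3": lambda x: (wb @ x) * wb,                    # project onto w_b
    "f4": lambda x: (np.outer(wa, wa) - I2) @ x,
}
ops_R2_to_R = {
    "f5": np.sum,                                     # sum of coordinates
    "f6": np.prod,                                    # product of coordinates
    "f7": lambda x: np.sum(x ** 2),
    "f8": lambda x: wa @ x,
    "f9": lambda x: wb @ x,
}
ops_R_to_R = {
    "f10": np.abs,
    "f11": lambda x: -x,
    "f12": lambda x: x - 1,
    "f13": np.sign,
    "f14": np.tanh,
    "f15": lambda x: 2 * np.exp(-x ** 2) - 1,
}
```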
35. RL finds
f_13 ◦ f_6 ◦ f_1 (x) = sign( Π_i (R x)_i )
and a 2 × 3 × 1 net would do something like this
f_14( f_14( f_8(x) + 1 ) − f_14( f_8(x) − 1 ) − 1 )
[figure: decision surfaces of both solutions, color-coded from −1.00 to +1.00]
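A quick sanity check of the RL-found chain: rotating by 90° and multiplying the coordinates yields −x_1 · x_2, so its sign matches the target rule of the XOR example (rotation matrix as in the library sketch above):

```python
# f_13 ∘ f_6 ∘ f_1 evaluated on two prototypical points
import numpy as np

R = np.array([[0.0, 1.0], [-1.0, 0.0]])

def rl_solution(x):
    return np.sign(np.prod(R @ x))                    # sign(prod_i (Rx)_i)

print(rl_solution(np.array([1.0, -1.0])))             # +1: signs differ
print(rl_solution(np.array([1.0, 1.0])))              # -1: signs agree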
36. observe
given a library of trusted and tested modules / functions, learning becomes a sequencing rather than a parameter estimation problem
stochastic exploration (RL, MCTS) of the solution space is generally cumbersome but can be guided using expert knowledge (see the sketch below)
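As a toy illustration of this sequencing view, the following sketch explores random chains over a tiny library and keeps the best one; plain random search stands in for RL/MCTS, and the library, data, and chain length are simplified assumptions:

```python
# learning as sequencing: stochastic search over short function chains
from functools import reduce
import numpy as np

rng = np.random.default_rng(3)
R = np.array([[0.0, 1.0], [-1.0, 0.0]])
library = [lambda x: R @ x, np.prod, np.sum, np.sign, np.tanh]

# noisy bi-polar XOR sample as in the earlier sketch
X = np.sign(rng.standard_normal((200, 2))) + 0.25 * rng.standard_normal((200, 2))
y = np.where(np.sign(X[:, 0]) == np.sign(X[:, 1]), -1.0, 1.0)

def evaluate(chain):
    """Mean squared error of a chain; type-invalid chains score infinity."""
    try:
        preds = np.array([reduce(lambda s, f: f(s), chain, x) for x in X],
                         dtype=float)
    except (ValueError, TypeError):
        return np.inf
    return float(np.mean((preds - y) ** 2)) if preds.ndim == 1 else np.inf

best_err, best_chain = np.inf, None
for _ in range(1000):                                 # naive stochastic search
    chain = [library[i] for i in rng.integers(len(library), size=3)]
    err = evaluate(chain)
    if err < best_err:
        best_err, best_chain = err, chain             # finds e.g. sign∘prod∘R
```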