This talk, "Machine Learning and Common Sense", was given by Christian Bauckhage at the Cologne AI and Machine Learning Meetup on July 17, 2019 (https://www.meetup.com/de-DE/Cologne-AI-and-Machine-Learning-Meetup/events/259758559/)
Despite its undeniable success in the past couple of years, there are still situations where machine learning does not work well. Especially if training data are limited, not representative, or severely biased, current approaches cannot learn to generalize well; trained systems thus run the risk of making silly mistakes in deployment. This raises several questions: How can "common sense" be integrated into the machine learning pipeline? Are there algorithms or design principles that allow for informed learning and introspection? Are there algorithms or design principles that lead to white-box solutions whose computations are transparent and whose decisions are accountable? In short, are there approaches towards more explainable ML systems that could be deployed in situations where there are few data to learn from and traceable decisions are a necessity? These are the questions this presentation focuses on.
4. big data
+ affordable HPC
+ open source software
+ deep learning systems
= progress in AI
5. dramatic progress in cognitive computing
text analysis and understanding
image understanding
speech recognition
robotics
...
Socher et al., Proc. EMNLP, 2013
6. superhuman performance in medical diagnostics
Ciresan, Giusti, Gambardella, and Schmidhuber, Proc. MICCAI, 2013
Gulshan et al., JAMA, 316(22), 2016
Esteva et al., Nature, 542(7639), 2017
7. how is this possible ?
a human expert sees
100 images per day
500 images per week
25,000 images per year
1,000,000 images in 40 years
a neural network sees
10,000,000 images for training
a human expert gets
tired, distracted, . . .
a neural network never gets
tired, distracted, . . .
8. a frenzy in finance
considerable investments by
BlackRock, Bridgewater, Schroders,
MAN AHL, . . .
AI-based hedge funds / fintechs
Aidyia, Numerai, Sentient, . . .
research on predicting
stock momentum, volatility of futures,
insolvency risk, . . .
reported accuracies of 53%–60%
Krauss et al., Europ. J. Operational Research, 2016
Ding et al., Proc. IJCAI, 2015
19. state of affairs in 2019
big data
+ affordable HPC
+ open source software
+ deep learning systems
= progress in AI
problems in industry
1) VC theory demands that complex models are trained with massive data, but labeled data are scarce
2) even labeled data may be biased
3) (deep) neural networks are black boxes; connectionist architectures are not accountable
20. how to avoid silly mistakes ?
how to incorporate common sense ?
how to overcome thin data problems ?
22. further details
von Rueden, Mayer, Garcke, Bauckhage & Schuecker
Informed Machine Learning – Towards a Taxonomy of
Explicit Integration of Knowledge into Machine Learning
arXiv:1903.12394 [stat.ML], 2019
24. adjusting learned representations to semantic structures
Dong, Wang, Li, Bauckhage & Cremers
Triple Classification Using Regions and Fine-Grained Entity Typing
Proc. AAAI, 2019
Dong, Bauckhage, Jin, Li, Cremers, Speicher, Cremers & Zimmermann
Imposing Category Trees Onto Word-Embeddings Using A Geometric
Construction
Proc. ICLR, 2019
25. how to increase accountability ?
how to improve explainability ?
27. observe
a deep neural network computes a composite function
y(x) = f( W_L · · · f( W_2 f( W_1 x ) ) )
where, for instance
f(s)_i = tanh(s_i)
[figure: layered feed-forward network with weight matrices W_1, W_2, . . . , W_L between consecutive layers]
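A minimal NumPy sketch of this composite-function view; the layer sizes and random weights are chosen purely for illustration:

```python
# a minimal sketch of a deep net as the composite function
# y(x) = f(W_L ... f(W_2 f(W_1 x))) with f = element-wise tanh
import numpy as np

def forward(x, weights):
    """Apply f(W_l s) layer by layer."""
    s = x
    for W in weights:
        s = np.tanh(W @ s)
    return s

rng = np.random.default_rng(0)
sizes = [5, 4, 4, 3, 2]                      # layer widths (illustrative)
weights = [rng.standard_normal((m, n))       # W_l maps layer l-1 to layer l
           for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(rng.standard_normal(sizes[0]), weights)
```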
28. consider this . . .
let
s_0 = (x_1, x_2, . . . , x_n)   (all training data)
s_g = (y_1, y_2, . . . , y_n)   (all training targets)
s_l = f( W_l s_{l−1} )          (output of layer l)
then
f : S × A → S
where
S ⊆ R^(m_1 × m_in) ∪ . . . ∪ R^(m_L × m_out)   (quasi-continuous state space)
A = { W_1, W_2, . . . , W_N }                  (very large set of actions)
29. consider this . . .
seen from this point of view, training a neural network is to find a sequence
W_1 → W_2 → . . . → W_L ,   W_l ∈ A
that minimizes
E = ‖ s_L − s_g ‖²
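A hedged sketch of this state/action reading: the state is the current layer output, an action picks one matrix W from a finite pool A, and "training" searches for an action sequence minimizing E. The pool, toy data, and plain random search are illustrative assumptions, not the talk's actual procedure:

```python
# training as a search over action sequences, E = ||s_L - s_g||^2
import numpy as np

rng = np.random.default_rng(1)
s0 = rng.standard_normal((2, 50))            # all training inputs (columns)
sg = np.sign(s0[:1, :] * s0[1:, :])          # toy targets for illustration
A = [rng.standard_normal((2, 2)) for _ in range(20)]   # finite action set

def error(seq):
    """Roll out a sequence of actions and return E = ||s_L - s_g||^2."""
    s = s0
    for W in seq:
        s = np.tanh(W @ s)
    return float(np.sum((s[:1, :] - sg) ** 2))

# naive stochastic search over sequences of three actions
best = min(([A[i] for i in idx]
            for idx in rng.integers(len(A), size=(500, 3))), key=error)
```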
30. observe
we can generalize this idea
we can think of a data analytics system as a function
y(x) = f_L ◦ . . . ◦ f_2 ◦ f_1 (x)
and “just” need to find sequences of functions f_l where
f_l ∈ A = { f_1, f_2, . . . , f_N }
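A sketch of this generalized view, with a placeholder library A of my own choosing; learning would then amount to selecting the sequence of keys:

```python
# a pipeline y(x) = f_L ∘ ... ∘ f_1(x) assembled from a fixed library A
from functools import reduce
import numpy as np

A = {"neg": lambda x: -x, "abs": np.abs, "tanh": np.tanh, "sq": lambda x: x**2}

def compose(names):
    """Return the composite f_L ∘ ... ∘ f_1 for a sequence of library keys."""
    return lambda x: reduce(lambda s, name: A[name](s), names, x)

pipeline = compose(["sq", "neg", "tanh"])    # computes tanh(-(x^2))
print(pipeline(np.array([0.5, -2.0])))
```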
31. didactic example: the “noisy bi-polar XOR problem”
given labeled data { (x_i, y_i) }_{i=1}^n with x_i ∈ R² and y_i ∈ { −1, +1 },
train a classifier such that
y(x) = −1 if x_1 ≈ x_2, and +1 otherwise
[figure: scatter plot of the labeled training data over [−1, 1]²]
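One plausible way to generate such data (the cluster centers and noise level are my assumptions): points around (±1, ±1) with Gaussian jitter, labeled −1 whenever the two coordinates agree in sign:

```python
# toy generator for the noisy bi-polar XOR data
import numpy as np

rng = np.random.default_rng(2)
n = 200
centers = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
X = centers[rng.integers(4, size=n)] + 0.25 * rng.standard_normal((n, 2))
y = np.where(np.sign(X[:, 0]) == np.sign(X[:, 1]), -1, +1)
```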
32. operations provided by an expert

R² → R²                       R² → R                R → R
f_0(x) = id(x)                f_5(x) = Σ_i x_i      f_10(x) = |x|
f_1(x) = R x                  f_6(x) = Π_i x_i      f_11(x) = −x
f_2(x) = (w_a^T x) · w_a      f_7(x) = Σ_i x_i²     f_12(x) = x − 1
f_3(x) = (w_b^T x) · w_b      f_8(x) = w_a^T x      f_13(x) = sign(x)
f_4(x) = (w_a w_a^T − I) x    f_9(x) = w_b^T x      f_14(x) = tanh(x)
                                                    f_15(x) = 2 e^(−x²) − 1

where
R = [[0, 1], [−1, 0]],   w_a = (sin π/4, cos π/4)^T,   w_b = (−sin π/4, cos π/4)^T
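A direct NumPy transcription of this library might look as follows (a sketch; the grouping by signature is my choice):

```python
# the expert's operation library, indices as on the slide
import numpy as np

R  = np.array([[0.0, 1.0], [-1.0, 0.0]])             # 90 degree rotation
wa = np.array([np.sin(np.pi / 4),  np.cos(np.pi / 4)])
wb = np.array([-np.sin(np.pi / 4), np.cos(np.pi / 4)])
I2 = np.eye(2)

ops_R2_to_R2 = {
    "f0": lambda x: x,                                # identity
    "f1": lambda x: R @ x,                            # rotate by 90 degrees
    "f2": lambda x: (wa @ x) * wa,                    # project onto w_a
    "f3": lambda x: (wb @ x) * wb,                    # project onto w_b
    "f4": lambda x: (np.outer(wa, wa) - I2) @ x,
}
ops_R2_to_R = {
    "f5": np.sum,                                     # sum of coordinates
    "f6": np.prod,                                    # product of coordinates
    "f7": lambda x: np.sum(x ** 2),
    "f8": lambda x: wa @ x,
    "f9": lambda x: wb @ x,
}
ops_R_to_R = {
    "f10": np.abs,
    "f11": lambda x: -x,
    "f12": lambda x: x - 1,
    "f13": np.sign,
    "f14": np.tanh,
    "f15": lambda x: 2 * np.exp(-x ** 2) - 1,
}
```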
35. RL finds
f_13 ◦ f_6 ◦ f_1 (x) = sign( Π_i (R x)_i )
and a 2 × 3 × 1 net would do something like this
f_14( f_14( f_8(x) + 1 ) − f_14( f_8(x) − 1 ) − 1 )
[figure: decision surfaces of both solutions, color-coded from −1.00 to +1.00]
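A quick sanity check of the RL-found chain: rotating by 90° and multiplying the coordinates yields −x_1 · x_2, so its sign matches the target rule of the XOR example (rotation matrix as in the library sketch above):

```python
# f_13 ∘ f_6 ∘ f_1 evaluated on two prototypical points
import numpy as np

R = np.array([[0.0, 1.0], [-1.0, 0.0]])

def rl_solution(x):
    return np.sign(np.prod(R @ x))                    # sign(prod_i (Rx)_i)

print(rl_solution(np.array([1.0, -1.0])))             # +1: signs differ
print(rl_solution(np.array([1.0, 1.0])))              # -1: signs agree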
36. observe
given a library of trusted and tested modules / functions, learning becomes a sequencing rather than a parameter estimation problem
stochastic exploration (RL, MCTS) of the solution space is generally cumbersome but can be guided using expert knowledge (see the sketch below)
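As a toy illustration of this sequencing view, the following sketch explores random chains over a tiny library and keeps the best one; plain random search stands in for RL/MCTS, and the library, data, and chain length are simplified assumptions:

```python
# learning as sequencing: stochastic search over short function chains
from functools import reduce
import numpy as np

rng = np.random.default_rng(3)
R = np.array([[0.0, 1.0], [-1.0, 0.0]])
library = [lambda x: R @ x, np.prod, np.sum, np.sign, np.tanh]

# noisy bi-polar XOR sample as in the earlier sketch
X = np.sign(rng.standard_normal((200, 2))) + 0.25 * rng.standard_normal((200, 2))
y = np.where(np.sign(X[:, 0]) == np.sign(X[:, 1]), -1.0, 1.0)

def evaluate(chain):
    """Mean squared error of a chain; type-invalid chains score infinity."""
    try:
        preds = np.array([reduce(lambda s, f: f(s), chain, x) for x in X],
                         dtype=float)
    except (ValueError, TypeError):
        return np.inf
    return float(np.mean((preds - y) ** 2)) if preds.ndim == 1 else np.inf

best_err, best_chain = np.inf, None
for _ in range(1000):                                 # naive stochastic search
    chain = [library[i] for i in rng.integers(len(library), size=3)]
    err = evaluate(chain)
    if err < best_err:
        best_err, best_chain = err, chain             # finds e.g. sign∘prod∘R
```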