Deep learning has matured considerably in recent years. We still see astounding applications, with more and more pattern recognition use cases being solved, which fuels the current hype around AI. But we also see a growing body of theory that explains results, best practices that accelerate prototyping, and a general understanding of the methodology that facilitates real-world use cases.
This talk explores current research results in deep learning and their impact on daily (non-research) work.
Zürcher Fachhochschule
Speaker clustering – exploiting time information
Lukic, Vogt, Dürr & Stadelmann (2016). «Speaker Identification and Clustering using Convolutional Neural Networks». MLSP’2016.
Lukic, Vogt, Dürr & Stadelmann (2017). «Learning Embeddings for Speaker Clustering based on Voice Equality». MLSP’2017.
Stadelmann, Glinski-Haefeli, Gerber & Dürr (2018). «Capturing Suprasegmental Features of a Voice with RNNs for Improved Speaker Clustering». ANNPR’2018.
CNN (MLSP’16) → CNN & clustering loss (MLSP’17) → RNN & clustering loss (ANNPR’18)
Speaker clustering – learnings & future work
«Pure» voice modeling seems largely solved
• RNN embeddings work well (see t-SNE plot of single segments)
• RNN model robustly exhibits the predicted «sweet spot» for the amount of time information used
• Speaker clustering on clean & reasonably long input works an order of magnitude better (as predicted)
• Additionally, using a smarter clustering algorithm on top of the embeddings makes clustering on TIMIT as good as identification (see the ICPR’18 paper on dominant sets)
Future work
• Make models robust on more realistic data (noise, more speakers/segments)
• Exploit these findings for robust, reliable speaker diarization
• Learn embeddings and the clustering algorithm end to end
Hibraj, Vascon, Stadelmann & Pelillo (2018). «Speaker Clustering Using Dominant Sets». ICPR’2018.
Meier, Elezi, Amirian, Dürr & Stadelmann (2018). «Learning Neural Models for End-to-End Clustering». ANNPR’2018.
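The clustering step on top of the learned embeddings can be illustrated with a minimal, pure-Python sketch — average-linkage agglomerative clustering over cosine distances between fixed-length voice embeddings. This is a generic stand-in, not the dominant-sets method of the ICPR’18 paper; the 2-D «embeddings» and the merge threshold are toy assumptions:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def agglomerative_cluster(embeddings, threshold):
    """Average-linkage agglomerative clustering: repeatedly merge the two
    closest clusters until the smallest inter-cluster distance exceeds
    `threshold`; each remaining cluster is one hypothesized speaker."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        return sum(cosine_distance(embeddings[i], embeddings[j])
                   for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min((linkage(c1, c2), a, b)
                         for a, c1 in enumerate(clusters)
                         for b, c2 in enumerate(clusters) if a < b)
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy "embeddings": two well-separated speaker groups in 2-D.
segments = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
print(agglomerative_cluster(segments, threshold=0.5))  # [[0, 1], [2, 3]]
```

In a real pipeline the embeddings would come from the trained CNN/RNN, and the threshold would be tuned on held-out speakers.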
Lessons learned
Data is key.
• Many real-world projects lack the required quantity & quality of data,
even though «big data» is not needed
• Class imbalance needs careful handling
→ special losses, resampling (also in unorthodox ways)
Robustness is important.
• Training processes can be tricky
→ give hints via a custom loss, proper preprocessing and pretraining
• Minimize risk instead of error
→ detect all defects at the expense of lower precision
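The class-imbalance point can be made concrete with a small sketch of two standard remedies — inverse-frequency loss weights and minority-class oversampling. The function names and toy labels are illustrative, not taken from the projects above:

```python
import random
from collections import Counter

def class_weights(labels):
    """Inverse-frequency loss weights: rare classes get larger weights,
    so errors on them cost more during training."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

def oversample(samples, labels, seed=0):
    """Resample minority classes (with replacement) until every class is
    as frequent as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for s, l in zip(samples, labels):
        by_class.setdefault(l, []).append(s)
    out_s, out_l = [], []
    for c, items in by_class.items():
        out_s.extend(items)
        out_l.extend([c] * len(items))
        for _ in range(target - len(items)):
            out_s.append(rng.choice(items))
            out_l.append(c)
    return out_s, out_l

labels = ["ok"] * 9 + ["defect"] * 1
print(class_weights(labels))  # {'ok': 0.555..., 'defect': 5.0}
```

The «unorthodox» resampling mentioned above would replace the uniform `rng.choice` with domain-specific augmentation of the rare class.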
Lessons learned – model interpretability
Interpretability is required.
• Helps the developer with «debugging»; needed by the user to build trust
→ visualizations of learned features, the training process, learning curves etc. should be «always on»
Stadelmann, Amirian, Arabaci, Arnold, Duivesteijn, Elezi, Geiger, Lörwald, Meier, Rombach & Tuggener (2018). «Deep Learning in the Wild». ANNPR’2018.
Shwartz-Ziv & Tishby (2017). «Opening the Black Box of Deep Neural Networks via Information».
https://distill.pub/2017/feature-visualization/, https://stanfordmlgroup.github.io/competitions/mura/
[Figure: negative vs. positive X-ray; DNN training on the information plane; a learning curve; a feature visualization]
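The «always on» learning-curve idea can be sketched with a toy training loop that records the loss after every epoch, so the curve is available for inspection at any time. This is a deliberately minimal illustration (1-D linear regression by gradient descent), not one of the models above:

```python
def train_with_curve(xs, ys, lr=0.1, epochs=50):
    """Fit y ~ w*x by gradient descent on the mean squared error, logging
    the loss after every epoch so the learning curve can be plotted live."""
    w, curve = 0.0, []
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        curve.append(loss)
    return w, curve

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # underlying truth: y = 2x
w, curve = train_with_curve(xs, ys)
# A healthy run shows w near 2.0 and a monotonically decreasing curve;
# plateaus or spikes in the curve are the first debugging signal.
```

In practice the same habit — logging every epoch to a dashboard rather than checking only the final metric — is what makes training failures visible early.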
Goody – trace & detect adversarial attacks
…using the average local spatial entropy of feature response maps
Amirian, Schwenker & Stadelmann (2018). «Trace and Detect Adversarial Attacks on CNNs using Feature Response Maps». ANNPR’2018.
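The detection idea can be sketched roughly as follows — a simplified, pure-Python take on local spatial entropy over a 2-D feature response map, not the paper’s exact pipeline (the window size and the toy maps are illustrative assumptions):

```python
import math

def local_entropy(window):
    """Shannon entropy of the normalized activations in a local window."""
    total = sum(window)
    if total == 0:
        return 0.0
    probs = [v / total for v in window if v > 0]
    return -sum(p * math.log2(p) for p in probs)

def avg_local_spatial_entropy(fmap, k=2):
    """Average local spatial entropy over all k x k windows of a 2-D
    feature response map (non-negative activations, e.g. post-ReLU).
    The intuition: adversarial inputs tend to produce atypically
    concentrated or diffuse responses, shifting this statistic."""
    h, w = len(fmap), len(fmap[0])
    entropies = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            window = [fmap[i + di][j + dj]
                      for di in range(k) for dj in range(k)]
            entropies.append(local_entropy(window))
    return sum(entropies) / len(entropies)

# A uniform map yields maximal local entropy; a spiky one yields zero.
smooth = [[1.0, 1.0], [1.0, 1.0]]
spiky = [[4.0, 0.0], [0.0, 0.0]]
print(avg_local_spatial_entropy(smooth), avg_local_spatial_entropy(spiky))
# → 2.0 0.0
```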
Conclusions
• Deep learning is applied and deployed in «normal» businesses (non-AI SMEs)
• It does not need big data, but it does need some data (the effort is usually underestimated)
• DL/RL training for new use cases can be tricky (→ needs thorough experimentation)
• New theory and visualizations help to debug & understand
the training process
individual results
About me:
• Head ZHAW Datalab, vice president SGAICO, board Data+Service
• thilo.stadelmann@zhaw.ch
• 058 934 72 08
• https://stdm.github.io/
About the topics:
• AI: https://sgaico.swissinformatics.org/
• Data+Service Alliance: www.data-service-alliance.ch
• Collaboration: datalab@zhaw.ch
Happy to answer questions & requests.
Foundations
Inductive supervised learning
Assumption
• A model fitted to sufficiently many examples…
• …will also generalize to unseen data
Method
• Search for the parameters of a given function…
• …such that, for all examples, the input (image) is mapped to the output («car»)
Source: http://lear.inrialpes.fr/job/postdoc-large-scale-classif-11-img/attribs_patchwork.jpg
f(x) = y
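The assumption and method above can be illustrated with a deliberately tiny f: a nearest-centroid classifier whose parameters (one centroid per class) are fitted to labeled examples and then applied to unseen inputs. The data and names are toy illustrations:

```python
def fit_centroids(examples):
    """Inductive supervised learning in miniature: estimate one parameter
    vector (a centroid) per class from labeled (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Generalization step: map an unseen input to the label of the
    nearest fitted centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

train = [([0.9, 0.1], "car"), ([1.1, 0.0], "car"),
         ([0.0, 1.0], "tree"), ([0.1, 0.9], "tree")]
model = fit_centroids(train)
print(predict(model, [1.0, 0.2]))  # an unseen input → car
```

A deep network replaces the centroids with millions of parameters and the distance rule with learned nonlinear features, but the f(x) = y contract is the same.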
Game playing – challenges & solutions
• Large discrete action space → use a heuristic
(makes exploration difficult, elongates training time)
• Delayed and sparse reward → do reward shaping
(a sequence of actions is crucial to get a reward)
• Distance encoding → use reference points
• Transfer learning is difficult: a more complex environment needs a different action sequence
Stadelmann, Amirian, Arabaci, Arnold, Duivesteijn, Elezi, Geiger, Lörwald, Meier, Rombach & Tuggener (2018). «Deep Learning in the Wild». ANNPR’2018.
Reinforcement learning: deep Q network
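The reward-shaping remedy can be sketched with potential-based shaping (Ng et al., 1999), which densifies a sparse reward without changing the optimal policy. The goal-distance potential below is a hypothetical example, not the setup of the game-playing project above:

```python
def shaped_reward(base_reward, phi_s, phi_s_next, gamma=0.99):
    """Potential-based reward shaping: add the (discounted) change of a
    potential function to the sparse environment reward. Because the
    shaping term telescopes along trajectories, the optimal policy is
    provably unchanged."""
    return base_reward + gamma * phi_s_next - phi_s

# Hypothetical potential: negative distance to a goal at position 10.
def phi(position, goal=10):
    return -abs(goal - position)

# A step toward the goal earns a positive shaped reward even though the
# environment reward stays 0 until the goal is actually reached.
r = shaped_reward(0.0, phi(4), phi(5))
print(r > 0)  # True
```

In a DQN this shaped reward simply replaces the raw reward in the Q-learning target.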
Optical Music Recognition
Foundation of digitization in orchestras and music schools
Tuggener, Elezi, Schmidhuber, Pelillo & Stadelmann (2018). «DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects». ICPR’2018.
Tuggener, Elezi, Schmidhuber & Stadelmann (2018). «Deep Watershed Detector for Music Object Recognition». ISMIR’2018.
[Figure: input image → per-pixel labels (objectness, named symbols)]
Segmentation of newspaper articles
Semi-automatic print media monitoring
Meier, Stadelmann, Stampfli, Arnold & Cieliebak (2017). «Fully Convolutional Neural Networks for Newspaper Article Segmentation». ICDAR’2017.
Stadelmann, Tolkachev, Sick, Stampfli & Dürr (2018). «Beyond ImageNet - Deep Learning in Industrial Practice». In: Braschler et al., «Applied Data Science», Springer.
Condition monitoring
Maintaining machines on predicted failure only
We use machine learning approaches for anomaly detection, learning the normal state
of each machine and detecting deviations from it purely from observed sensor signals;
the approach combines classic, industry-proven features with e.g. deep learning autoencoders.
Stadelmann, Tolkachev, Sick, Stampfli & Dürr (2018). «Beyond ImageNet - Deep Learning in Industrial Practice». In: Braschler et al., «Applied Data Science», Springer.
[Figure: vibration sensors → feature extraction → e.g. an RNN autoencoder → early detection of faults]