A high-level introduction to the current buzz around "Deep Learning" (that it is famous, successful, and a continuation of neural network research; what is new since the last century, what the basic idea is, and what our outlook on its future is).
Followed by our stake in it and two use cases (face recognition, text analytics).
Deep Learning is…
…Continued Neural Network Research
What’s new?
• Novel architectures (wider, deeper)
• Faster and better training
(e.g., understanding of Backpropagation’s “vanishing gradient” problem, good initial weights)
• Better regularization (e.g., Dropout, Max-pooling etc.)
• Big Data (or augmentation) and corresponding computational power on GPUs
«Add as many parameters as possible for your hardware and train the hell out of
it with proper regularization» (Yann LeCun)
Deep Learning is…
… Successful
Areas of successful application:
• Computer Vision (detection, segmentation, recognition, OCR, video analysis)
• Speech Processing (Recognition, Siri etc.)
• Natural Language Processing (Translation, Sentiment Analysis)
• Metric Learning (distances, invariances, hashing)
• Prediction & Forecasting (financial, time series)
Red-titled slides by Jonathan Masci
Technical Idea
Learning Features, not just rules
Hand-engineering features is tedious
Let each layer learn a new representation of the data by itself
Actual learning is…
• governed by the learning target (input-output pairs & objective function),
• facilitated by constraints & regularization (e.g., sparsity to learn distributed codes),
• enforced by the Backpropagation algorithm (1970-1989)
What is learned?
• Highly non-linear functions purely from data
• Hierarchies of features, combinations of elements (distributed codes)
State of the Art
• CNNs (Convolutional Neural Networks) for vision tasks and beyond
Relatively easy to use, very successful, biologically inspired, broad user base
• RNNs (Recurrent Neural Networks) for sequences and hard tasks
Turing complete, hot research topic
Figure credits: Honglak Lee, University of Michigan; Yan et al., National University of Singapore
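To make "enforced by the Backpropagation algorithm" concrete, here is a minimal sketch (our illustration, not from the original slides): a two-layer network learns the non-linear function y = sin(x) purely from input-output pairs, with every gradient obtained via the chain rule, i.e. backpropagation.

```python
# Minimal sketch: a 2-layer network learns y = sin(x) from data alone,
# with gradients computed by backpropagation (plain NumPy, no framework).
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(256, 1))    # inputs
Y = np.sin(X)                                    # targets

H = 32                                           # hidden units
W1 = rng.normal(0, 0.5, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)
lr = 0.05

for step in range(5000):
    # forward pass: affine -> tanh -> affine
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - Y                               # mean-squared-error objective
    # backward pass (chain rule, i.e. backpropagation)
    d_pred = 2 * err / len(X)
    dW2 = h.T @ d_pred;  db2 = d_pred.sum(0)
    d_h = (d_pred @ W2.T) * (1 - h**2)           # tanh derivative
    dW1 = X.T @ d_h;     db1 = d_h.sum(0)
    # gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final MSE:", float((err**2).mean()))
```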
The Deep Learning Market
… and what we do about it!
Strategic relevance
• 3 years ago: <10 research groups at «ivy league» universities
• 01/2014: Google acquires DeepMind for $500 million (a startup out of IDSIA / Ticino)
• Currently:
• Courses / books / software frameworks are all «beta versions»
• Boundaries between research and application are strongly domain-specific
• Outlook: Could be a tool like «SVM» in 2-5 years
Deep Learning @ Datalab
• Hardware investments: 2 multi-GPU workstations
http://www.zhaw.ch/de/zhaw/institute-zentren/uebergreifende-institute-zentren/dlab/hardware.html
• Personnel investments: 13 researchers formed the Deep Learning Journal Club in 2014
deeplearning@downbirn.zhaw.ch
• Projects:
• 2 internal projects finished (see use cases later!)
• 2 CTI projects just got funded (start this summer)
• Several proposals pending
What is "Text Analytics"?
Goal: Turn text into information
• Sentiment Analysis
• Q&A
• Named Entity Extraction
• Text Summarization
• Machine Translation
• Spelling Correction
• Information Retrieval
Sample Features for Tweets
• Word ngrams: presence or absence of contiguous sequences of 1, 2, 3, and 4 tokens; non-contiguous ngrams
• POS: the number of occurrences of each part-of-speech tag
• Sentiment lexica: each word annotated with a tonality score (-1..0..+1)
• Negation: the number of negated contexts
• Punctuation: the number of contiguous sequences of exclamation marks, question marks, and both exclamation and question marks
• Emoticons: presence or absence; whether the last token is a positive or negative emoticon
• Hashtags: the number of hashtags
• Elongated words: the number of words with one character repeated (e.g., 'soooo')
from: Mohammad et al., SemEval 2013
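A few of these features written out in code. This is a simplified sketch of our own; whitespace tokenization and the toy emoticon lexicon are assumptions, not the SemEval system's actual resources:

```python
# Sketch: extracting a handful of the hand-engineered tweet features above.
import re
from collections import Counter

EMOTICONS_POS = {":)", ":-)", ":D", ";)"}   # assumed toy lexicon
EMOTICONS_NEG = {":(", ":-(", ":'("}

def tweet_features(tweet: str) -> dict:
    tokens = tweet.split()                  # simplified tokenization
    feats = Counter()
    # word ngrams (contiguous, n = 1..4), encoded as presence/absence
    for n in range(1, 5):
        for i in range(len(tokens) - n + 1):
            feats["ng:" + " ".join(tokens[i:i + n])] = 1
    # punctuation: runs of two or more !/? marks
    feats["punct_runs"] = len(re.findall(r"[!?]{2,}", tweet))
    # emoticons: presence, and polarity of the last token
    feats["has_pos_emo"] = int(any(t in EMOTICONS_POS for t in tokens))
    feats["last_is_neg_emo"] = int(tokens[-1] in EMOTICONS_NEG)
    # hashtags and elongated words ('soooo' = a char repeated 3+ times)
    feats["hashtags"] = sum(t.startswith("#") for t in tokens)
    feats["elongated"] = len(re.findall(r"(\w)\1{2,}", tweet))
    return dict(feats)

print(tweet_features("soooo happy today!!! #sunshine :)"))
```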
Word2Vec
• Huge set of text samples (billions of words)
• Extract dictionary
• Word matrix: a k-dimensional vector for each word (k typically 50-500)
• Word vectors initialized randomly
• Train word vectors to predict the next words, given a sequence of words from the sample text
Major contributions by Bengio et al. 2003, Collobert & Weston 2008, Socher et al. 2011, Mikolov et al. 2013
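In code, this boils down to a few lines. A minimal sketch assuming the gensim library; the corpus below is a toy stand-in for the billions of words used in practice:

```python
# Sketch: training word vectors with gensim's Word2Vec implementation.
from gensim.models import Word2Vec

corpus = [                      # toy stand-in for a huge text collection
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["a", "man", "and", "a", "woman"],
]
model = Word2Vec(
    sentences=corpus,
    vector_size=100,            # k-dimensional word vectors (k = 100 here)
    window=5,                   # context size around each word
    min_count=1,                # keep every word in this tiny dictionary
    sg=1,                       # skip-gram variant (Mikolov et al. 2013)
)
print(model.wv["king"].shape)   # -> (100,)
```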
The Magic of Word Vectors
King - Man + Woman ≈ Queen
Live demo on 100 billion words from the Google News dataset: http://radimrehurek.com/2014/02/word2vec-tutorial/
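The same analogy can be queried programmatically. A sketch assuming gensim and the pre-trained Google News vectors (the filename below is the one distributed by Google; the download is roughly 1.5 GB):

```python
# Sketch: querying the king - man + woman analogy on pre-trained vectors.
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# king - man + woman ~= queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# -> [('queen', 0.7...)]
```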
Using Word Vectors in NLP
Collobert et al., 2011:
• SENNA: a generic NLP system based on word vectors
• Solves many NLP tasks as well as benchmark systems
Deep Learning and Sentiment
• Maas et al., 2011: word vectors with sentiment context
• Socher et al., 2013: representing sentence structures as trees with sentiment annotation
• Le and Mikolov, 2014: "Paragraph Vectors"
(Figure: word vectors with sentiment information separate positive words such as "wonderful" and "amazing" from negative ones such as "terrible" and "awful")
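Gensim also ships an implementation of the Paragraph Vectors idea as Doc2Vec. A minimal sketch (the toy documents are our own invention): each document gets its own trainable vector that can serve as a feature for sentiment classification:

```python
# Sketch: document embeddings in the spirit of Le & Mikolov 2014.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["wonderful", "amazing", "film"], tags=["pos_0"]),
    TaggedDocument(words=["terrible", "awful", "film"], tags=["neg_0"]),
]
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)
print(model.dv["pos_0"].shape)   # -> (50,) document embedding
```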
What is face recognition?
Detection: Is this a face or not?
Verification: Are these two pictures showing the same face?
Identification: Is this Yves?
Baseline: Fisherfaces (OpenCV)
Pipeline (train & predict): Detect → Align → Filter (pre-processor) → Feature extractor → Recognizer (model)
• Detect: Viola & Jones
• Align: 2D similarity transform
• Filter: Gamma correction + DoG (Difference of Gaussians)
• Feature extractor: Principal Component Analysis
• Recognizer: Linear Discriminant Analysis
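OpenCV exposes this PCA + LDA recognizer directly. A minimal usage sketch assuming opencv-contrib-python; the random arrays below are stand-ins for real detected, aligned, and filtered face crops:

```python
# Sketch: the Fisherfaces recognizer (PCA + LDA) from OpenCV's face module.
import cv2
import numpy as np

# toy stand-in data: real usage loads pre-processed grayscale face crops,
# all of identical size
faces = [np.random.randint(0, 255, (100, 100), dtype=np.uint8)
         for _ in range(4)]
labels = np.array([0, 0, 1, 1])          # one integer label per person

recognizer = cv2.face.FisherFaceRecognizer_create()
recognizer.train(faces, labels)

label, confidence = recognizer.predict(faces[0])
print(label, confidence)
```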
Deep Learning
Pipeline (train & predict): Detect → Align → Filter (pre-processor) → Feature extractor → Recognizer (model)
• Detect: Viola & Jones
• Align: local binary patterns + ellipse
• Feature extractor + Recognizer: Convolutional Neural Network (the features are learned)
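What such a learned feature extractor might look like: a small CNN sketched in PyTorch (our illustration, not the project's actual network), mapping aligned face crops to identity logits so that features and recognizer are trained jointly:

```python
# Sketch: a small CNN whose convolutional layers play the role of the
# hand-designed feature extractor in the baseline pipeline.
import torch
import torch.nn as nn

class FaceCNN(nn.Module):
    def __init__(self, n_identities: int = 6):
        super().__init__()
        self.features = nn.Sequential(          # learned feature extractor
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 22 * 22, n_identities)

    def forward(self, x):                       # x: (batch, 1, 100, 100)
        h = self.features(x)
        return self.classifier(h.flatten(1))    # identity logits

logits = FaceCNN()(torch.randn(1, 1, 100, 100))
print(logits.shape)                             # -> torch.Size([1, 6])
```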
Experiment
Approx. 40 images of 6 individuals, acquired in 2 batches:
• Training: indoors (used for learning)
• Testing: outdoors (used exclusively for testing)
For CNN training, an augmented set was used, i.e. additional training images were created synthetically (a sketch follows below).
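One way to create such synthetic images, sketched with torchvision transforms; the specific perturbations below are assumptions for illustration, not the augmentations actually used in the project:

```python
# Sketch: synthetic training images via random geometric and photometric
# perturbations of each original face crop.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(10),                        # small random tilts
    transforms.RandomResizedCrop(100, scale=(0.8, 1.0)),  # shifts / zooms
    transforms.ColorJitter(brightness=0.3),               # lighting variation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
# applied to each PIL face crop, e.g.: x = augment(face_image)
```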
Interesting findings
• Alignment is crucial for the baseline algorithm, but time consuming
• The CNN needs to be trained on a desktop PC with a GPU
• Training data augmentation for the CNN can effectively replace the alignment step, saving time
• The CNN outperforms the baseline algorithm 99.6% vs. 96.9%, dropping fewer images and saving time
• Let's see it running: https://www.youtube.com/watch?v=oI1eJa-UWNU
Further Reading
• Very brief history with some links (2015)
http://dublin.zhaw.ch/~stdm/?p=241
• Comprehensive history & survey (2015)
Schmidhuber, “Deep Learning in Neural Networks: An Overview”
http://arxiv.org/abs/1404.7828
• Deep Learning Kick-off (2006, of historical interest)
Hinton et al., “A Fast Learning Algorithm for Deep Belief Nets”
http://www.cs.toronto.edu/~hinton/absps/ncfast.pdf
• Very practical overview of Convolutional Neural Networks (CNNs, 1998)
LeCun et al., “Gradient-Based Learning Applied to Document Recognition”
http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
• Cool application for which Google paid $500 million (2015)
Mnih et al., “Human-Level Control through Deep Reinforcement Learning”
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html