Why is Deep Learning hot right now, and how can we apply it in our day-to-day jobs?
1. Why is Deep Learning hot right now, and how can we apply it in our day-to-day jobs?
ISSAM A. AL-ZINATI
OUTREACH & TECHNICAL ADVISOR
UCAS TECHNOLOGY INCUBATOR
ISSAM A. AL-ZINATI - UCASTI 1
4. What is Deep Learning
It is a neural network.
Neuron: runs a small, specific mathematical task.
Edge: connects neurons and holds the weights that adjust the inputs.
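The neuron and edge described above can be sketched in a few lines. This is a minimal illustration, not something taken from the slides: the sigmoid activation, the bias term, and the function name `neuron` are assumptions chosen for the example.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of its inputs plus a bias, squashed by a sigmoid.

    Each weight sits on the edge that carries the matching input
    (sigmoid and bias are assumptions made for this sketch).
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Two inputs arriving over two weighted edges.
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)
```

Changing a weight changes how strongly that input influences the neuron, which is what training adjusts.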
6. What is Deep Learning
It is a neural network with more layers and more neurons.
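To make "more layers and more neurons" concrete, here is a small sketch of a forward pass through a stack of layers. The layer sizes, the tanh activation, the random weights, and the helper names (`make_layer`, `forward`, `count_parameters`) are all assumptions for illustration.

```python
import math
import random

def make_layer(n_inputs, n_neurons, rng):
    """Random weights (one row per neuron) and zero biases for one layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

def forward(layers, inputs):
    """Feed the inputs through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

rng = random.Random(0)
sizes = [4, 8, 8, 3]  # 4 inputs, two hidden layers of 8 neurons, 3 outputs
network = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
outputs = forward(network, [0.1, 0.2, 0.3, 0.4])
print(len(outputs), count_parameters(network))
```

Adding entries to `sizes` makes the network deeper, and larger entries make each layer wider; both increase the parameter count.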
9. Why Now – Scale
Data
[Chart: Performance Based on Data Size. x-axis: data size (Small, Medium, Large); y-axis: performance]
The more data you feed the model, the better the results you get.
11. Why Now – Scale
Model Size & GPU
[Chart: Performance Based on Model Size. x-axis: model size (Small, Medium, Large); y-axis: performance]
A bigger model can achieve better results.
GPUs help train those models much faster, by as much as 20x.
12. Why Now – vs Others
What about other kinds of machine learning algorithms, e.g. SVM, decision trees (DT), boosting, …?
Would they do better if they got more data and computing power?
13. Why Now – vs Others
[Chart: Performance of NN vs Others, Based on Model Size and Data Amount. x-axis: Small Data, Medium Data, Large Data; series: Others, Small NN, Medium NN, Large NN]
14. Why Now – End-To-End
The usual machine learning approach is a pipeline of stages that are
responsible for feature extraction.
Each stage passes on a set of engineered features that help the model better
understand the case it works on.
This approach is complex and prone to errors.
15. Why Now – End-To-End
Speech Recognition Pipeline
Data (Audio) → Audio Features → Phonemes → Language Model → Transcript
17. Why Now – End-To-End
Speech Recognition - DL
Data (Audio) → The Magic (in place of the Audio Features, Phonemes, and Language Model stages) → Transcript
18. How It Works – The Magic
19. How It Works – No Magic
A deep neural network is not magic, but it is very good at finding patterns.
“The hierarchy of concepts allows the computer to learn complicated concepts
by building them out of simpler ones. If we draw a graph showing how these
concepts are built on top of each other, the graph is deep, with many layers. For
this reason, we call this approach to AI deep learning”, Ian Goodfellow.
Deep Learning is Hierarchical Feature Learning.
21. Deep Learning Models
General model: FC (fully connected)
Sequence models: RNN, LSTM
Image model: CNN
Other models: Unsupervised, RL (reinforcement learning)
Hot Research Topic
22. Advanced Deep Learning Models – VGGNet - ResNet
VGGNet achieved a 7.3% top-5 error rate on the ImageNet 2014 classification challenge, coming in second place
in classification.
It used about 138 million parameters in its 16-layer configuration.
23. Advanced Deep Learning Models – Google Inception V3
Achieves a 5.64% top-5 error rate on the ImageNet-2015 classification challenge, coming in second place.
24. Advanced Deep Learning Models – Google Inception V3
It is based on the ConvNet concept with the addition of the inception module.
The network has a computational cost of 5 billion multiply-adds per inference
and uses fewer than 25 million parameters.
25. Deep Learning Applications – Deep Voice
Baidu Research presents Deep Voice, a production-quality text-to-speech system
constructed entirely from deep neural networks.
[Audio samples: Ground Truth, Generated Voice]
26. Deep Learning Applications – Image Captioning
A multimodal recurrent neural network architecture generates sentence descriptions from
images. Source.
Example captions: "man in black shirt is playing guitar." "two young girls are playing with lego toy."
27. Deep Learning Applications – Generating Videos
This approach uses an adversarial network to:
1) Generate videos
2) Generate videos conditioned on static images
Source
32. Applying Deep Learning – Bias/Variance
The goal is to build a model that is close to human-level performance.
Data split: Training Set – 70%, Val Set – 15%, Test Set – 15%
You need to know the following values:
1- Human-level error
2- Training-level error
3- Validation-level error
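The 70% / 15% / 15% split above could be produced along these lines. This is a sketch under assumptions: the function name `split_dataset`, the shuffling step, and the fixed seed are not from the slides.

```python
import random

def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the samples and split them into train / validation / test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains, about 15%
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The training error is measured on `train`, the validation error on `val`, and `test` is held back for a final check.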
35. Applying Deep Learning – Bias/Variance
Human-level error: 1%
Training-level error: 5%
Validation-level error: 6%
Diagnosis: high bias / underfitting (the training error is far above the human-level error).
Fixes:
1- Bigger model
2- Train longer
3- New model architecture
38. Applying Deep Learning – Bias/Variance
Human-level error: 1%
Training-level error: 2%
Validation-level error: 6%
Diagnosis: high variance / overfitting (the validation error is far above the training error).
Fixes:
1- More data
2- Early stopping
3- Regularization
4- New model architecture
41. Applying Deep Learning – Bias/Variance
Human-level error: 1%
Training-level error: 5%
Validation-level error: 10%
Diagnosis: both high bias / underfitting and high variance / overfitting.
Fixes for high bias / underfitting:
1- Bigger model
2- Train longer
3- New model architecture
Fixes for high variance / overfitting:
1- More data
2- Early stopping
3- Regularization
4- New model architecture
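The reasoning in these bias/variance slides can be written as a small helper that compares the three error values. It is a sketch only: the function name `diagnose`, the `FIXES` table layout, and the 2-percentage-point `threshold` for what counts as a large gap are assumptions, not values from the slides.

```python
FIXES = {
    "high bias / underfitting": ["Bigger model", "Train longer", "New model architecture"],
    "high variance / overfitting": ["More data", "Early stopping", "Regularization",
                                    "New model architecture"],
}

def diagnose(human_error, train_error, val_error, threshold=0.02):
    """Return the problems suggested by the three error rates.

    threshold is an assumed cut-off for what counts as a large gap.
    """
    problems = []
    if train_error - human_error > threshold:   # model cannot even fit the training set
        problems.append("high bias / underfitting")
    if val_error - train_error > threshold:     # model does not generalize to new data
        problems.append("high variance / overfitting")
    return problems

for errors in [(0.01, 0.05, 0.06), (0.01, 0.02, 0.06), (0.01, 0.05, 0.10)]:
    for problem in diagnose(*errors):
        print(errors, problem, FIXES[problem])
```

The three tuples are the three scenarios from the slides: 1% / 5% / 6%, 1% / 2% / 6%, and 1% / 5% / 10%.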
42. Applying Deep Learning – Data Synthesis
Usually, to overcome a bias/variance problem, we create a new set of
hand-engineered features and retrain the model to see whether it becomes more
accurate.
In deep learning, having more data can be a great solution in many scenarios.
But that data is not always ready to use.
So working with the data we have to create a new, hand-engineered data set can be the
solution.
43. Applying Deep Learning – Data Synthesis
1) OCR
Getting more data for an OCR model is easy. We can follow these steps to get
new data sets:
- Download images from the internet
- Generate text in MS Word in different fonts, sizes, colors, blur levels, …
- Combine these two steps and you get millions of new images for training.
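As a rough sketch of why combining the OCR steps multiplies the data, the snippet below only enumerates the rendering jobs; it does not draw any images, which would need an imaging library. The function name `ocr_render_jobs`, the dictionary layout, and all example values (words, file names, fonts) are hypothetical.

```python
import itertools

def ocr_render_jobs(words, backgrounds, fonts, sizes, colors, blur_levels):
    """Yield one rendering job per combination of text, style, and background."""
    combos = itertools.product(words, backgrounds, fonts, sizes, colors, blur_levels)
    for word, background, font, size, color, blur in combos:
        yield {
            "text": word,
            "label": word,             # the text we render is also the training label
            "background": background,  # an image downloaded from the internet
            "font": font, "size": size, "color": color, "blur": blur,
        }

jobs = list(ocr_render_jobs(
    words=["invoice", "total"],
    backgrounds=["paper.jpg", "wall.jpg"],   # hypothetical file names
    fonts=["Arial", "Times New Roman"],
    sizes=[12, 24],
    colors=["black", "blue"],
    blur_levels=[0, 1, 2],
))
print(len(jobs))
```

Even these tiny lists give 96 labeled examples; realistic vocabularies and image collections reach the millions the slide mentions.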
44. Applying Deep Learning – Data Synthesis
2) Speech Recognition
- Collect a set of clean audio files
- Collect random background sounds
- Combine these two steps (mix the background sounds into the clean audio) and you get millions of new audio files for training.
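A minimal sketch of the mixing step, treating audio as plain lists of samples. The function names `mix` and `synthesize`, the gain values, and the toy signals are assumptions for illustration; real audio would be loaded from files.

```python
import math
import random

def mix(clean, noise, noise_gain):
    """Add background noise to a clean signal, sample by sample."""
    # Loop the noise if it is shorter than the clean recording.
    return [c + noise_gain * noise[i % len(noise)] for i, c in enumerate(clean)]

def synthesize(clean_clips, noise_clips, gains=(0.1, 0.3, 0.5)):
    """Every clean clip x every background x every gain gives one new example."""
    return [
        (mix(clean, noise, gain), transcript)  # the transcript label is unchanged
        for clean, transcript in clean_clips
        for noise in noise_clips
        for gain in gains
    ]

rng = random.Random(0)
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(160)]  # stand-in for speech
clean_clips = [(tone, "hello")]
noise_clips = [[rng.uniform(-1, 1) for _ in range(50)],
               [rng.uniform(-1, 1) for _ in range(80)]]
dataset = synthesize(clean_clips, noise_clips)
print(len(dataset))
```

One clean clip, two backgrounds, and three gains already yield six training examples with the same transcript.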
45. Applying Deep Learning – Data Synthesis
3) NLP – Grammar correction
- Collect a set of correct sentences
- Randomly shuffle the words in each sentence
- The shuffled sentences are the new data set that we can feed to our model.
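The shuffling step might look like the sketch below, which pairs each shuffled sentence with its correct original so the model has a target to learn. The function name `make_training_pairs`, the whitespace tokenization, and the example sentences are assumptions.

```python
import random

def make_training_pairs(sentences, seed=0):
    """Return (shuffled sentence, correct sentence) pairs."""
    rng = random.Random(seed)
    pairs = []
    for sentence in sentences:
        words = sentence.split()
        shuffled = words[:]
        rng.shuffle(shuffled)
        pairs.append((" ".join(shuffled), sentence))  # input, target
    return pairs

pairs = make_training_pairs([
    "the cat sat on the mat",
    "deep learning needs a lot of data",
])
for corrupted, correct in pairs:
    print(corrupted, "->", correct)
```

Running it several times with different seeds gives many corrupted variants of each correct sentence.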
46. Applying Deep Learning – Data Synthesis
4) Image Recognition
- Start with a set of labeled images
- Randomly apply effects to those images, e.g. rotate, blur, flip, luminosity, …
- The new images are the new data set that we can feed to our model.
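A small sketch of such random effects, using grayscale images stored as nested lists of pixel values so that no imaging library is needed. The function names, the 50% probabilities, and the brightness range are assumptions; blur is left out to keep the example short.

```python
import random

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]

def adjust_brightness(image, factor):
    """Scale every pixel, clipping at the maximum value 255."""
    return [[min(255, int(pixel * factor)) for pixel in row] for row in image]

def augment(image, label, rng):
    """Apply a random subset of effects; the label stays the same."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    if rng.random() < 0.5:
        image = rotate_90(image)
    image = adjust_brightness(image, rng.uniform(0.7, 1.3))
    return image, label

rng = random.Random(0)
original = [[10, 20, 30], [40, 50, 60]]
new_examples = [augment(original, "cat", rng) for _ in range(5)]
print(len(new_examples))
```

Each call produces a new labeled image without any extra labeling work.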
47. Applying Deep Learning – Data Synthesis
Data synthesis has its limits: it does not always work, but it is a good place to start.
48. Applying Deep Learning – Transfer Learning
Another approach to overcome the problem of bias/variance is to have:
1- Larger Model
2- New Model Architecture
But these two approaches need a more powerful machine to do the training.
Also, sometimes you don’t have enough resources to train the model, i.e. data and GPUs.
49. Applying Deep Learning – Transfer Learning
Imagine if we could use Google Inception V3 as our model for image classification.
Wouldn't that be great!
Transfer learning allows us to use these popular models by replacing the last fully
connected layer (a 1000-class classifier) with our own classifier. Here we use the
other layers as feature extractors.
50. Applying Deep Learning – Transfer Learning
1) Fixed feature extractor
◦ Import one of the well-known models with its weights.
◦ Replace the last layer with a custom classifier. It could be another fully connected NN or
another ML model such as an SVM.
◦ Train the new classifier on the features that the network extracts, keeping the
pretrained weights fixed.
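The fixed-feature-extractor idea can be shown with a toy stand-in, without downloading a real pretrained network. Everything here is an assumption for illustration: `PRETRAINED_WEIGHTS` plays the role of the imported model, the new classifier is a logistic-regression head trained by gradient descent, and the data set is made up.

```python
import math

# Stand-in for an imported network: a frozen layer whose weights are never updated.
PRETRAINED_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def extract_features(x):
    """Frozen feature extractor: one ReLU layer with fixed weights."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_WEIGHTS]

def predict(head, x):
    """Probability of class 1 from the new classifier on top of the frozen features."""
    weights, bias = head
    z = sum(w * f for w, f in zip(weights, extract_features(x))) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_head(data, epochs=200, lr=0.5):
    """Train only the new classifier; PRETRAINED_WEIGHTS is read but never changed."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            error = predict((weights, bias), x) - y  # gradient of the log loss w.r.t. z
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias

# Toy task: label is 1 when the first input is larger than the second.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.9, 0.2], 1), ([0.1, 0.8], 0)]
head = train_head(data)
print([round(predict(head, x), 2) for x, _ in data])
```

With a real framework the same pattern applies: load the pretrained model, mark its layers as non-trainable, and train only the replacement classifier.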
51. Applying Deep Learning – Transfer Learning
2) Fine-Tuning
◦ Import one of the well-known models with its weights.
◦ Replace the last layer with a custom classifier. It could be another fully connected NN or
another ML model such as an SVM.
◦ Fine-tune the weights of the pretrained network by continuing backpropagation.
This also trains your new classifier at the same time.
52. Applying Deep Learning – Transfer Learning
3) Retraining
◦ Import one of the well-known models with its weights.
◦ Replace the last layer with a custom classifier. It could be another fully connected NN or
another ML model such as an SVM.
◦ Retrain the whole model.
53. Applying Deep Learning – Transfer Learning
Best practices
1- New dataset is small and similar to the original dataset: use the first approach.
2- New dataset is large and similar to the original dataset: use the second
approach.
3- New dataset is small but very different from the original dataset: use the
second approach, but only on the early activations in the network.
4- New dataset is large and very different from the original dataset: use the
third approach.
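The four best-practice rules above form a simple decision table, sketched below. The function name `choose_transfer_strategy`, the boolean parameters, and the returned strings are assumptions; deciding what counts as "large" or "similar" is left to the reader.

```python
def choose_transfer_strategy(dataset_is_large, similar_to_original):
    """Map the two questions from the best-practice list to an approach."""
    if similar_to_original:
        if dataset_is_large:
            return "2) fine-tuning"
        return "1) fixed feature extractor"
    if dataset_is_large:
        return "3) retraining"
    return "2) fine-tuning, only on early activations"

for large in (False, True):
    for similar in (True, False):
        print(large, similar, choose_transfer_strategy(large, similar))
```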
54. Applying Deep Learning – Use GPU/Cloud
The last point to consider is using GPUs in the cloud.
There are two well-known providers that let you configure a machine with a GPU at
good prices:
1- Amazon AWS – With its P2 instances you can train and run your model for under $1 per hour.
Another advantage of AWS is the ability to use preconfigured images that have everything
installed and configured for you.
2- Microsoft Azure – With its NC instances you can train and run your model for $1.05 per
hour.