Transfer Learning and Fine-tuning Deep Neural Networks
1. Anusua Trivedi, Data Scientist
Algorithm Data Science (ADS)
antriv@microsoft.com
Transfer Learning and Fine-tuning Deep Neural Networks
2. Talk Outline
1. Traditional Machine Learning (ML)
2. ML vs. Deep Learning
3. Why Deep Learning for Image Analysis
4. Deep Convolutional Neural Networks (DCNN)
5. Transfer Learning with DCNNs
6. Fine-tuning DCNNs
7. Recurrent Neural Networks (RNN)
8. Case Studies
3. What is ML?
Machine learning & predictive analytics are core capabilities that help drive business decisions.
Example application areas: vision analytics, recommendation engines, advertising analysis, weather forecasting for business planning, social network analysis, legal discovery and document archiving, pricing analysis, fraud detection, churn analysis, equipment monitoring, location-based tracking and services, and personalized insurance.
4. Traditional ML vs. Deep Learning
• Traditional ML requires manual feature extraction/engineering; feature extraction for unstructured data is very difficult.
• Deep learning can automatically learn features in data.
• Deep learning is largely a "black box" technique, updating learned weights at each layer.
5. Why use Deep Learning for Image Analysis?
1. Image data requires subject-matter expertise to extract key features.
2. Deep learning extracts features automatically from domain-specific images, without any manual feature engineering.
3. This makes the image analysis process much easier.
6. Early Work
1. Fukushima (1980) – Neocognitron
2. LeCun (1989) – Convolutional Neural Networks (CNN)
3. With the advent of GPUs, DCNN popularity grew
4. Most popular – AlexNet (trained on ImageNet images)
7. Deep Neural Network (DNN)
1. Train networks with many layers
2. Multiple layers work together to build an improved feature space
3. The first layer learns 1st-order features (e.g. edges)
4. The 2nd layer learns higher-order features
5. Finally, the last layer's features are fed into supervised layer(s)
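As a concrete illustration of such a layer stack, here is a minimal Keras sketch; the layer sizes, the 784-dimensional input, and the 10-class softmax output are illustrative assumptions, not taken from the talk.

from keras.models import Sequential
from keras.layers import Dense

# Each hidden layer builds a richer feature space on top of the previous one;
# the final softmax layer is the supervised classifier fed by those features.
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))  # 1st-order features
model.add(Dense(128, activation='relu'))                      # higher-order features
model.add(Dense(10, activation='softmax'))                    # supervised output layer
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])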
10. Convolution
• Conv layers consist of a rectangular grid of neurons.
• The weights are the same for every neuron in the conv layer.
• The conv layer weights specify the convolution filter.
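A minimal NumPy sketch of this weight sharing: one filter (one set of weights) is slid over every position of the image to produce a feature map. The 28x28 image and the vertical-edge filter are illustrative choices, not from the slides.

import numpy as np

def conv2d_single_filter(image, kernel):
    # Valid 2-D convolution (technically cross-correlation) with one shared filter.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Every output neuron applies the *same* weights (the filter)
            # to its own local patch of the input.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(28, 28)
vertical_edge_filter = np.array([[1., 0., -1.],
                                 [1., 0., -1.],
                                 [1., 0., -1.]])
feature_map = conv2d_single_filter(image, vertical_edge_filter)  # shape (26, 26)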
11. Pooling
The pooling layer takes small rectangular blocks from the convolutional layer and subsamples each block to produce a single output per block.
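A minimal NumPy sketch of max pooling under the same illustrative assumptions as the convolution sketch above (2x2 blocks are an arbitrary choice):

import numpy as np

def max_pool2d(feature_map, block=2):
    # Subsample each non-overlapping block x block region to its maximum value.
    h, w = feature_map.shape
    h, w = h - h % block, w - w % block              # drop any ragged border
    tiles = feature_map[:h, :w].reshape(h // block, block, w // block, block)
    return tiles.max(axis=(1, 3))                    # one output per block

pooled = max_pool2d(np.random.rand(26, 26), block=2)  # shape (13, 13)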
14. Deep Learning Frameworks
1. Non-symbolic (imperative) frameworks
• The main drawback of imperative frameworks (like Torch, Caffe, etc.) is that optimization has to be done manually.
• Most imperative frameworks are not easily modified.
2. Symbolic frameworks
• Symbolic frameworks (like Theano, TensorFlow, CNTK, MXNet, etc.) can infer optimizations automatically from the dependency graph.
• A symbolic framework can exploit much more memory reuse.
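To make the "dependency graph" idea concrete, here is a tiny Theano sketch (the toy squared-error loss and learning rate are illustrative): the whole expression graph is declared first, so the framework can optimize it and derive gradients before anything runs.

import numpy as np
import theano
import theano.tensor as T

# Declare a symbolic expression graph; nothing is computed yet.
x = T.dvector('x')
w = theano.shared(np.zeros(3), name='w')
loss = T.sum((T.dot(x, w) - 1.0) ** 2)   # toy squared-error loss

# Because Theano sees the whole dependency graph, it can optimize it,
# reuse memory, and derive the gradient automatically.
grad = T.grad(loss, w)
train_step = theano.function(inputs=[x], outputs=loss,
                             updates=[(w, w - 0.1 * grad)])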
15. Theano
1. Easy to implement new networks
2. Easy to modify existing networks using Lasagne/Keras
3. Very mature Python interface
4. Easy to customize with domain-specific data
5. Transfer learning and fine-tuning in Lasagne/Keras are very easy
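For illustration, a small network definition in Lasagne (on top of Theano); all layer sizes here are placeholder values, not the architecture used later in the talk.

import lasagne
from lasagne.layers import InputLayer, Conv2DLayer, MaxPool2DLayer, DenseLayer

# A small DCNN in a few lines of Lasagne.
net = InputLayer(shape=(None, 3, 224, 224))            # batch of RGB images
net = Conv2DLayer(net, num_filters=32, filter_size=3)
net = MaxPool2DLayer(net, pool_size=2)
net = DenseLayer(net, num_units=256)
net = DenseLayer(net, num_units=10,
                 nonlinearity=lasagne.nonlinearities.softmax)

# Layers are plain Python objects, so swapping, freezing, or re-initializing
# them (as in transfer learning and fine-tuning) is ordinary Python code.
prediction = lasagne.layers.get_output(net)
params = lasagne.layers.get_all_params(net, trainable=True)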
16. Case Study: Diabetic Retinopathy Prediction
1. Here we use labeled fluorescein angiography images of eyes to improve Diabetic Retinopathy (DR) prediction.
2. We use a DCNN to improve DR prediction.
19. Transfer Learning & Fine-tuning our DCNN Model
1. We use an ImageNet pre-trained DCNN.
2. We fine-tune that DCNN to transfer generic learned features to DR prediction.
3. Lower layers of the pre-trained DCNN contain generic features that can be used for the DR prediction task.
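A minimal sketch of this recipe in Keras, assuming VGG16 as a stand-in for the pre-trained DCNN (the slide does not name the exact network) and a 5-class DR output as an illustrative choice:

from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Dense, Flatten

# Start from an ImageNet pre-trained DCNN and keep its generic lower-layer features.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False            # freeze the generic lower layers

# Replace the ImageNet classifier with a new head for the DR classes.
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
predictions = Dense(5, activation='softmax')(x)   # 5 classes is an assumption

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Fine-tuning: once the new head has converged, unfreeze selected upper conv
# layers (layer.trainable = True), recompile, and continue training with a
# small learning rate so the transferred features are only gently adjusted.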
27. Re-usability of this DCNN Model
1. We fine-tune the ImageNet-trained DCNN for medical image analysis.
2. We can fine-tune the same ImageNet-trained DCNN model in a completely different domain, and for a completely different task.
28. Case Study: Fashion Image Caption Generation
1. We use the ImageNet-trained DCNN and learn Apparel Classification with Style (ACS) image features through transfer learning and fine-tuning.
2. Then we use a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) on the learned image features for image caption generation.
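A minimal Keras sketch of one common way to wire DCNN image features into an LSTM caption generator; the 4096-d feature size, 5000-word vocabulary, 20-token captions, and the "merge" architecture are illustrative assumptions, not necessarily the model used in the talk.

from keras.models import Model
from keras.layers import Input, Dense, Embedding, LSTM, add

image_feats = Input(shape=(4096,))                    # features from the fine-tuned DCNN
img = Dense(256, activation='relu')(image_feats)      # project image features

caption_in = Input(shape=(20,))                       # partial caption (word indices)
seq = Embedding(input_dim=5000, output_dim=256, mask_zero=True)(caption_in)
seq = LSTM(256)(seq)                                  # encode the partial caption

merged = add([img, seq])                              # combine image + text state
next_word = Dense(5000, activation='softmax')(merged) # predict the next word

caption_model = Model(inputs=[image_feats, caption_in], outputs=next_word)
caption_model.compile(optimizer='adam', loss='categorical_crossentropy')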
31. Recurrent Neural Network (RNN-LSTM)
• Recurrent neural networks (RNN) are networks with loops in them, allowing information to persist.
• Long Short-Term Memory (LSTM) networks are a special kind of RNN, capable of learning long-term dependencies.
• Good for a state-wise (step-by-step) caption generation task.
The lower layers are composed of alternating convolution and max-pooling layers.
The upper layers are fully connected and correspond to a traditional MLP (hidden layer + logistic regression).
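Put together, that layout looks roughly like the following Keras sketch; the input size, filter counts, and 10-class softmax (the multi-class form of logistic regression) are illustrative values only.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Lower layers: alternating convolution and max-pooling.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Upper layers: a traditional MLP (hidden layer + softmax classifier).
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])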