by Vikram Madan, Sr. Product Manager, AWS Deep Learning
In this workshop, we cover deep learning fundamentals and focus on the powerful and scalable Apache MXNet open source deep learning framework. By the end of this tutorial you'll be able to train your own deep neural network and fine-tune existing state-of-the-art models for image and object recognition. We'll also take a deep dive into setting up your deep learning infrastructure on AWS and deploying models on AWS Lambda.
10. Inputs: Data Preprocessing, Batches, Epochs
Preprocessing
§ Random separation of data into training, validation, and test sets
§ Necessary for measuring the accuracy of the model
Batch
§ The amount of data propagated through the network at each iteration
§ Enables faster optimization through shorter iteration cycles
Epoch
§ A complete pass through all the training data
§ Optimization will run for multiple epochs to reduce the error rate
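A minimal NumPy sketch of these three ideas; the 80/10/10 split, batch size of 32, and epoch count of 5 are illustrative choices, not values from the slides:

import numpy as np

data = np.random.rand(1000, 10)                      # 1,000 examples, 10 features

# Preprocessing: random separation into training / validation / test sets
idx = np.random.permutation(len(data))
train, val, test = np.split(data[idx], [800, 900])   # 80% / 10% / 10%

batch_size, epochs = 32, 5
for epoch in range(epochs):                          # one epoch = full pass over training data
    for i in range(0, len(train), batch_size):
        batch = train[i:i + batch_size]              # one batch per iteration
        ...                                          # forward pass, loss, parameter update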
12. Activation Functions
Activation functions add nonlinearity to a layer and are applied to the layer's output.
There are several options:
§ Rectified Linear Unit (ReLU)
§ Sigmoid
§ Hyperbolic Tangent (tanh)
§ Softplus
ReLU functions are the most commonly used today.
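A quick NumPy sketch of the four options listed above (the sample inputs are made up):

import numpy as np

def relu(x):     return np.maximum(0, x)        # max(0, x): zero for negatives, identity otherwise
def sigmoid(x):  return 1 / (1 + np.exp(-x))    # squashes output to (0, 1)
def tanh(x):     return np.tanh(x)              # squashes output to (-1, 1)
def softplus(x): return np.log1p(np.exp(x))     # smooth approximation of ReLU

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), tanh(x), softplus(x))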
13. Deep Neural Network
[Diagram: input layer → hidden layers → output layer]
The optimal size of a hidden layer (number of nodes) is typically between the size of the input layer and the size of the output layer.
14. The “Learning” in Deep Learning
[Diagram: an input with label X is fed forward through the network to produce a prediction X1; while X1 != X, backpropagation (gradient descent) nudges each weight by a small delta (e.g., 0.4 ± 𝛿), and the network is run again with the new weights.]
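A minimal sketch of that loop in NumPy for a single sigmoid neuron; the toy data and learning rate are made up, and only the starting weights (0.4, 0.3) come from the diagram:

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 2))                    # input
y = (X.sum(axis=1) > 1).astype(float)       # label

w = np.array([0.4, 0.3])                    # initial weights, as in the diagram
b, lr = 0.0, 0.1

for step in range(1000):
    pred = 1 / (1 + np.exp(-(X @ w + b)))   # forward pass through a sigmoid neuron
    err = pred - y                          # compare prediction with label
    # backpropagation: gradient of the loss with respect to each parameter
    w -= lr * (X.T @ err) / len(X)          # new weights: nudge each by a small delta
    b -= lr * err.mean()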
15. Classification with the Softmax Function
Softmax converts the output layer into probabilities – necessary for classification.
Source: https://stats.stackexchange.com/questions/273465/neural-network-softmax-activation
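Concretely, softmax(z)_i = exp(z_i) / Σ_j exp(z_j). A minimal NumPy sketch, with the usual max-subtraction trick for numerical stability (the logits are made up):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())     # subtracting the max avoids overflow; result is unchanged
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw output-layer values
print(softmax(logits))               # probabilities that sum to 1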
16. Loss Function
• An objective function that quantifies how successful the model was in its predictions
• A measure of the difference between a neural net's prediction and the actual value – that is, the error
• Typically we use cross-entropy loss, which adjusts the plain loss calculation to mitigate learning slowdown
• Backpropagation is performed to calculate the error contribution of each neuron after processing one batch
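A minimal sketch of cross-entropy loss for a single prediction; the softmax output and one-hot label are made up:

import numpy as np

def cross_entropy(pred, label):
    # -sum(label * log(pred)); clipping guards against log(0)
    return -np.sum(label * np.log(np.clip(pred, 1e-12, 1.0)))

pred  = np.array([0.7, 0.2, 0.1])    # softmax output
label = np.array([1.0, 0.0, 0.0])    # one-hot actual value
print(cross_entropy(pred, label))    # -log(0.7) ≈ 0.357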
18. Stochastic Gradient Descent
Gradient Descent
A single iteration of the parameter update runs through ALL of the training data.
Stochastic Gradient Descent
A single iteration of the parameter update runs through a BATCH of the training data.
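The difference in a NumPy sketch, using a linear model with mean-squared-error loss; the data, learning rate, and batch size are made up:

import numpy as np

rng = np.random.default_rng(0)
X, y = rng.random((1000, 5)), rng.random(1000)
w = np.zeros(5)
lr, batch_size = 0.1, 32

def grad(Xb, yb, w):
    # gradient of the MSE loss on the given subset of the data
    return 2 * Xb.T @ (Xb @ w - yb) / len(Xb)

# Gradient descent: one parameter update per pass over ALL the data
for epoch in range(10):
    w -= lr * grad(X, y, w)

# Stochastic gradient descent: one parameter update per BATCH
for epoch in range(10):
    for i in range(0, len(X), batch_size):
        w -= lr * grad(X[i:i + batch_size], y[i:i + batch_size], w)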
32. Apache MXNet
§ Programmable: simple syntax, multiple languages
§ Portable: highly efficient models for mobile and IoT (a ResNet 1,024-layer network is ~4 GB)
§ High performance: near-linear scaling across hundreds of GPUs (88% efficiency on 256 GPUs)
33. Scaling with MXNet
[Chart: speedup vs. number of GPUs (1, 2, 4, 8, 16, 32, 64, 128, 256) for Inception v3, ResNet, and AlexNet against the ideal line; ~88% efficiency]
• CloudFormation with the Deep Learning AMI
• 16x P2.16xlarge instances, mounted on EFS
• Inception and ResNet: batch size 32; AlexNet: batch size 512
• ImageNet: 1.2M images, 1K classes
• 152-layer ResNet: 5.4 days on 4x K80s (1.2 h per epoch), 0.22 top-1 error
35. Deep Learning AMIs
http://bit.ly/deepami
Deep learning any way you want on AWS
• A tool for data scientists and developers
• Setting up a DL system takes (install) time & skill
• Packages kept up to date and compiled (MXNet, TensorFlow, Caffe, Torch, Theano, Keras)
• Anaconda, Jupyter, Python 2 and 3
• NVIDIA drivers for G2 and P2 instances
• Intel MKL drivers for all other instances (C4, M4, …)
37. Imperative Programming
import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1
Easy to tweak with Python code
PROS
• Straightforward and flexible
• Takes advantage of native language features (loops, conditionals, the debugger)
• E.g., NumPy, Matlab, Torch, …
CONS
• Hard to optimize
38. Declarative Programming
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10), B=np.ones(10)*2)
C can share memory with D, because C is deleted later
[Diagram: computation graph with inputs A and B feeding a multiply node, followed by a "+1" node]
PROS
• More opportunities for optimization
• Works across different languages
• E.g., TensorFlow, Theano, Caffe
CONS
• Less flexible
39. MXNet: Mixed Programming Paradigm
IMPERATIVE: NDARRAY API
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)
DECLARATIVE: SYMBOLIC EXECUTOR
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)  # num_hidden value truncated in the source; 128 assumed
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()
An NDArray can be set as input to the graph.
40. Embed symbolic expressions into imperative programming
texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)
    texec.backward()
    # manual SGD update with learning rate 0.2
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad
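Pulling the pieces together, a hedged end-to-end sketch using the Module.fit API from MXNet 1.x; the toy data, layer sizes, learning rate, and epoch count are all illustrative assumptions:

import mxnet as mx
import numpy as np

# toy data: 1,000 samples, 100 features, 10 classes (made up)
X = np.random.rand(1000, 100).astype('float32')
y = np.random.randint(0, 10, (1000,))
train_iter = mx.io.NDArrayIter(X, y, batch_size=32, shuffle=True)

# declarative part: define the network symbolically
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=net, num_hidden=64)
net = mx.sym.Activation(data=net, act_type='relu')
net = mx.sym.FullyConnected(data=net, num_hidden=10)
net = mx.sym.SoftmaxOutput(data=net, name='softmax')

# imperative part: bind the graph to a module and train with SGD
mod = mx.mod.Module(symbol=net)
mod.fit(train_iter,
        optimizer='sgd',
        optimizer_params={'learning_rate': 0.1},
        num_epoch=5)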