Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer-friendly deep learning frameworks. During this workshop, we will provide a short background on deep learning, focusing on relevant application domains, and an introduction to the powerful and scalable deep learning framework Apache MXNet. At the end of this tutorial you’ll be able to train your own deep neural network and fine-tune existing state-of-the-art models for image and object recognition. We’ll also take a deep dive into setting up your deep learning infrastructure on AWS and deploying models on AWS Lambda.
4. BigDL on AWS
Github: github.com/intel-analytics/BigDL
http://software.intel.com/bigdl
§ BigDL: a distributed deep learning framework for Apache Spark*
§ Deploying BigDL on AWS is super easy!
§ Option 1: Install BigDL on Amazon EMR with a bootstrap action (a programmatic sketch follows below):
s3://aws-bigdata-blog/artifacts/aws-blog-emr-jupyter/install-jupyter-emr5-latest.sh --bigdl
§ Option 2: Launch the public AMI on EC2 (Xeon E5 v3 or v4):
https://github.com/intel-analytics/BigDL/wiki/Running-on-EC2
https://aws.amazon.com/blogs/ai/running-bigdl-deep-learning-for-apache-spark-on-aws/
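As a rough illustration of Option 1, the same bootstrap action can be supplied when the EMR cluster is created programmatically. The sketch below uses boto3; the cluster name, EMR release label, instance types and counts, key pair, and region are illustrative assumptions, not values from this deck.

import boto3

emr = boto3.client("emr", region_name="us-east-1")  # assumed region

# Create an EMR cluster and run the BigDL/Jupyter bootstrap action from above.
response = emr.run_job_flow(
    Name="bigdl-demo",                      # hypothetical cluster name
    ReleaseLabel="emr-5.8.0",               # assumed EMR 5.x release
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m4.xlarge",  # illustrative instance choices
        "SlaveInstanceType": "m4.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
        "Ec2KeyName": "my-key-pair",        # hypothetical key pair
    },
    BootstrapActions=[{
        "Name": "Install Jupyter and BigDL",
        "ScriptBootstrapAction": {
            "Path": "s3://aws-bigdata-blog/artifacts/aws-blog-emr-jupyter/install-jupyter-emr5-latest.sh",
            "Args": ["--bigdl"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])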
8. Biological Neuron
slide from http://cs231n.stanford.edu/
Neural Network basics: http://cs231n.github.io/neural-networks-1/
9. Artificial Neuron
[Diagram: inputs, synaptic weights, summation with nonlinearity, output]
• Input: a vector of training data x
• Output: a linear function of the inputs
• Nonlinearity: transforms the output into a desired range of values, e.g. for classification we need probabilities in [0, 1]
• Training: learn the weights w and the bias b
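A minimal NumPy sketch of this neuron, assuming a sigmoid nonlinearity to squash the output into [0, 1] (the slide does not commit to a particular nonlinearity; the input and weight values are made up):

import numpy as np

def sigmoid(z):
    # squashes any real number into (0, 1), usable as a probability
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # input: vector of training data x
w = np.array([0.4, 0.3, -0.9])   # synaptic weights (learned during training)
b = 0.1                          # bias (learned during training)

z = np.dot(w, x) + b             # output: linear function of the inputs
y = sigmoid(z)                   # nonlinearity: map into [0, 1]
print(y)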
10. Deep Neural Network
[Diagram: input layer, several hidden layers, output layer]
The optimal size of a hidden layer (number of neurons) is usually between the size of the input layer and the size of the output layer.
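To make the sizing rule concrete, here is a small NumPy forward pass through one hidden layer; the layer sizes (4 inputs, 3 hidden neurons, 2 outputs) are illustrative assumptions chosen to follow the rule of thumb above:

import numpy as np

def relu(z):
    # a common hidden-layer nonlinearity
    return np.maximum(0, z)

x = np.random.randn(4)        # input layer: 4 features
W1 = np.random.randn(3, 4)    # hidden layer: 3 neurons (between 4 and 2)
b1 = np.zeros(3)
W2 = np.random.randn(2, 3)    # output layer: 2 neurons
b2 = np.zeros(2)

h = relu(W1 @ x + b1)         # hidden-layer activations
y = W2 @ h + b2               # raw output scores
print(y)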
11. The “Learning” in Deep Learning
[Diagram: an input X and its label (e.g. 0 1 0 1 1 …) are fed forward through the network with the current weights (0.4, 0.3, 0.2, 0.9, …) to produce a prediction X1; while X1 != X, back propagation (gradient descent) nudges each weight by ±δ, and the forward pass repeats with the new weights]
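A minimal NumPy sketch of that loop, assuming a single sigmoid neuron trained with a squared-error loss so the gradient-descent weight update is easy to see; the toy data, learning rate, and loss choice are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0.2, 0.9], [0.4, 0.3], [0.8, 0.1]])  # toy inputs
labels = np.array([1.0, 0.0, 1.0])                   # toy 0/1 labels
w = np.array([0.4, 0.3])                             # initial weights
b = 0.0
lr = 0.1                                             # step size for each +/- delta update

for step in range(100):
    pred = sigmoid(X @ w + b)              # forward pass: prediction X1
    err = pred - labels                    # how far the guess is from the label X
    grad = err * pred * (1 - pred)         # back propagation through the sigmoid
    w -= lr * (X.T @ grad) / len(labels)   # new weights
    b -= lr * grad.mean()                  # new bias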
16. Apache MXNet
• Programmable: simple syntax, multiple languages
• Portable: highly efficient models for mobile and IoT; a ResNet 1,024-layer network is ~4 GB
• High performance: near-linear scaling across hundreds of GPUs; 88% efficiency on 256 GPUs
17. Scaling with MXNet
[Chart: training throughput vs. number of GPUs (1, 2, 4, 8, 16, 32, 64, 128, 256) for Inception v3, ResNet, and AlexNet against the ideal linear-scaling line; ~88% efficiency]
• CloudFormation with the Deep Learning AMI
• 16x P2.16xlarge instances, mounted on EFS
• Inception and ResNet: batch size 32; AlexNet: batch size 512
• ImageNet: 1.2M images, 1K classes
• 152-layer ResNet: 5.4 days on 4x K80s (1.2 h per epoch), 0.22 top-1 error
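A rough sketch of what this multi-GPU setup looks like with MXNet's Module API (introduced later in this deck); the network, dummy data, and hyperparameters are placeholder assumptions rather than the benchmark configuration, and running it as written would require an instance with 16 GPUs such as a P2.16xlarge:

import mxnet as mx

# Placeholder network and data; the benchmark trained Inception v3 / ResNet
# (batch size 32) and AlexNet (batch size 512) on ImageNet (1.2M images, 1K classes).
data = mx.sym.Variable("data")
net = mx.sym.FullyConnected(data=data, num_hidden=1000)   # 1K output classes
net = mx.sym.SoftmaxOutput(data=net, name="softmax")

train_iter = mx.io.NDArrayIter(data=mx.nd.ones((64, 1024)),  # dummy features
                               label=mx.nd.zeros((64,)),
                               batch_size=32)

# One Module spread across every GPU on the instance; MXNet splits each batch
# across the listed devices, which is what yields the near-linear scaling.
mod = mx.mod.Module(symbol=net, context=[mx.gpu(i) for i in range(16)])
mod.fit(train_iter, num_epoch=1, optimizer="sgd",
        optimizer_params={"learning_rate": 0.1})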
18. Deep Learning AMIs
http://bit.ly/deepami
• Deep learning any way you want on AWS
• A tool for data scientists and developers
• Setting up a DL system takes (install) time & skill
• Keeps packages up to date and compiled (MXNet, TensorFlow, Caffe, Torch, Theano, Keras)
• Anaconda, Jupyter, Python 2 and 3
• NVIDIA drivers for G2 and P2 instances
• Intel MKL drivers for all other instances (C4, M4, …)
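As a hedged illustration, the sketch below launches a GPU instance from the AMI with boto3; the AMI ID, key pair, and region are placeholders you would replace with the Deep Learning AMI ID for your region:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # assumed region

# Launch a single P2 GPU instance from the Deep Learning AMI.
# "ami-xxxxxxxx" and "my-key-pair" are placeholders, not real values.
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",      # Deep Learning AMI ID for your region
    InstanceType="p2.xlarge",    # GPU instance (NVIDIA drivers preinstalled)
    KeyName="my-key-pair",
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])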
20. Imperative Programming

import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1

Easy to tweak with Python code.

PROS
• Straightforward and flexible
• Takes advantage of language-native features (loops, conditionals, debugger)
• E.g. NumPy, MATLAB, Torch, …

CONS
• Hard to optimize
21. Declarative Programming

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
      B=np.ones(10)*2)

[Diagram: computation graph with inputs A and B feeding a multiply node, followed by a +1 node]

C can share memory with D, because C is deleted later.

PROS
• More chances for optimization
• Works across different languages
• E.g. TensorFlow, Theano, Caffe

CONS
• Less flexible
22. MXNet: Mixed programming paradigm

IMPERATIVE: NDARRAY API
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

DECLARATIVE: SYMBOLIC EXECUTOR
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=12)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()

An NDArray can be set as input to the graph.
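The symbolic snippet above is abbreviated; a more complete sketch of feeding an imperative NDArray into a compiled symbolic graph might look like the following (the shapes, the hidden size, the dummy labels, and the use of simple_bind are assumptions for illustration):

import mxnet as mx

# imperative NDArray, computed eagerly
c = mx.nd.ones((100, 50)) + 1

# declarative graph, built symbolically
data = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=data, num_hidden=12)
net = mx.symbol.SoftmaxOutput(data=net, name='softmax')

# bind the graph to concrete shapes, then run it with the NDArray c
# supplied as the graph's 'data' input (dummy zeros as labels)
exe = net.simple_bind(ctx=mx.cpu(), data=(100, 50))
exe.forward(is_train=True, data=c, softmax_label=mx.nd.zeros((100,)))
exe.backward()
print(exe.outputs[0].shape)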
23. MXNet: Mixed programming paradigm

Embed symbolic expressions into imperative programming:

texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)
    texec.backward()
    # plain SGD update of each parameter, learning rate 0.2
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad
26. Batch, Epoch
Batch:
• Number of samples propagated through the network at every iteration
• Helps utilize the GPU compute power
Epoch:
• An epoch is a complete pass through all the training data. A neural network is trained until the error rate is acceptable; this will often take multiple passes through the complete data set.
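A small worked example of how batches and epochs relate (the sample count, batch size, and epoch count are made up for illustration):

num_samples = 1000      # size of the training set
batch_size = 50         # samples propagated per iteration

iterations_per_epoch = num_samples // batch_size   # 20 iterations = 1 epoch
num_epochs = 10                                     # 10 complete passes over the data

total_iterations = iterations_per_epoch * num_epochs
print(iterations_per_epoch, total_iterations)       # 20 200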
27. Loss Function
• The objective function defines what success looks like when an algorithm learns.
• It is a measure of the difference between a neural net’s guess and the ground truth; that is, the error.
• The error resulting from the loss function is fed into backpropagation in order to update the weights & biases.
• Common loss functions:
  • Cross entropy
  • L1 (linear), L2 (quadratic)
  • Mean squared error (MSE)
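A minimal NumPy sketch of two of these losses on a toy prediction, to make "the difference between the guess and the ground truth" concrete (the values are made up):

import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])   # ground truth
y_pred = np.array([0.9, 0.2, 0.7, 0.6])   # the network's guesses

# Mean squared error
mse = np.mean((y_pred - y_true) ** 2)

# Binary cross entropy
eps = 1e-12                                # avoid log(0)
ce = -np.mean(y_true * np.log(y_pred + eps) +
              (1 - y_true) * np.log(1 - y_pred + eps))

print(mse, ce)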
29. Fully Connected Layer
A fully connected layer of a neural network: every output is a weighted sum of all the inputs plus a bias. If no activation is applied, you can imagine this to be just a linear regression on the input attributes.
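A NumPy sketch of the point above: with no activation, a fully connected layer is just y = Wx + b, i.e. a multi-output linear regression on the input attributes (this is what mx.symbol.FullyConnected computes); the shapes are illustrative assumptions:

import numpy as np

x = np.random.randn(5)      # 5 input attributes
W = np.random.randn(3, 5)   # weights: 3 output neurons, each connected to all 5 inputs
b = np.random.randn(3)      # one bias per output neuron

y = W @ x + b               # fully connected layer, no activation
print(y)                    # equivalent to 3 linear regressions on the inputs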