This document provides an overview and agenda for a Deep Learning with MXNet workshop. It begins with background on deep learning basics like biological and artificial neurons. It then introduces Apache MXNet and discusses its key features like scalability, efficiency, and programming models. The remainder of the document provides hands-on examples for attendees to train their first neural network using MXNet, including linear regression, MNIST digit classification using a multilayer perceptron, and convolutional neural networks.
2. Agenda
• Deep Learning motivation and basics
• Apache MXNet overview
• MXNet programming model deep dive
• Train our first neural network using MXNet
4. Biological Neuron
slide from http://cs231n.stanford.edu/
Neural Network basics: http://cs231n.github.io/neural-networks-1/
5. Artificial Neuron
(diagram: inputs scaled by synaptic weights, summed, passed through a nonlinearity to produce the output)
• Input: a vector of training data x
• Output: a linear function of the inputs (weighted sum plus bias)
• Nonlinearity: transforms the output into a desired range of values, e.g. for classification we need probabilities in [0, 1]
• Training: learn the weights w and bias b
6. Activation Functions
• Activation functions govern the behavior of neurons.
• Passing inputs through the network's layers is called forward propagation.
• Activations are the values passed on to the next layer from each previous layer; they are the output of the activation function of each artificial neuron.
• Some of the more popular activation functions (sketched in code after this list) include:
  • Linear
  • Sigmoid
  • Hyperbolic tangent
  • ReLU
  • Softmax
  • Step function
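A minimal NumPy sketch of several of these activations (the function names and comments are illustrative, not from the deck):

import numpy as np

def linear(x):
    return x                          # identity; output unbounded

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes output into (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes output into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # zeroes out negative inputs

def softmax(x):
    e = np.exp(x - np.max(x))         # subtract max for numerical stability
    return e / e.sum()                # probabilities that sum to 1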
7. Deep Neural Network
(diagram: input layer, several hidden layers, output layer)
The optimal size of a hidden layer (number of neurons) is usually between the size of the input layer and the size of the output layer.
8. The “Learning” in Deep Learning
(diagram: an input X and its label flow forward through the network; the prediction X1 != X, so back propagation (gradient descent) nudges each weight, e.g. 0.4 ± δ and 0.3 ± δ, to produce new weights)
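A minimal NumPy sketch of one such weight update via gradient descent (one linear neuron with squared-error loss; all values, including the learning rate, are illustrative):

import numpy as np

x = np.array([0.5, -1.0])   # input
label = 1.0                 # target output
w = np.array([0.4, 0.3])    # current weights, as in the diagram
b = 0.0                     # bias
lr = 0.1                    # learning rate

pred = x.dot(w) + b         # forward pass: prediction X1
err = pred - label          # X1 != X: the prediction misses the label

# back propagation: gradient of the squared error 0.5*err**2
grad_w = err * x
grad_b = err

# gradient descent: nudge each weight by a small delta
w -= lr * grad_w            # new weights
b -= lr * grad_b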
10. Apache MXNet
• Programmable: simple syntax, multiple languages
• Portable: highly efficient models for mobile and IoT (a 1024-layer ResNet is ~4 GB)
• High performance: near-linear scaling across hundreds of GPUs (88% efficiency on 256 GPUs)
11. Scaling with MXNet
(chart: speedup vs. no. of GPUs from 1 to 256 for Inception v3, ResNet, and AlexNet against the ideal linear curve; ~88% efficiency)
• CloudFormation with the Deep Learning AMI
• 16x P2.16xlarge instances, mounted on EFS
• Inception and ResNet: batch size 32; AlexNet: batch size 512
• ImageNet: 1.2M images, 1K classes
• 152-layer ResNet: 5.4 days on 4x K80s (1.2h per epoch), 0.22 top-1 error
13. Deep Learning AMIs
http://bit.ly/deepami
Deep learning any way you want on AWS
• A tool for data scientists and developers
• Setting up a DL system takes (install) time & skill
• Packages kept up to date and pre-compiled (MXNet, TensorFlow, Caffe, Torch, Theano, Keras)
• Anaconda, Jupyter, Python 2 and 3
• NVIDIA drivers for G2 and P2 instances
• Intel MKL drivers for all other instances (C4, M4, …)
15. Imperative Programming

import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1    # easy to tweak with Python code

PROS
• Straightforward and flexible
• Takes advantage of native language features (loops, conditionals, debugger)
• E.g. NumPy, Matlab, Torch, …
CONS
• Hard to optimize
16. Declarative Programming

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
      B=np.ones(10)*2)

(computation graph: A and B feed a multiply node whose output feeds a "+1" node)
C can share memory with D, because C is not needed later.

PROS
• More chances for optimization
• Works across different languages
• E.g. TensorFlow, Theano, Caffe
CONS
• Less flexible
17. MXNet: Mixed programming paradigm

IMPERATIVE: NDARRAY API
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

DECLARATIVE: SYMBOLIC EXECUTOR
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()

An NDArray can be set as input to the graph.
18. MXNet: Mixed programming paradigm
Embed symbolic expressions into imperative programming:

texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)
    texec.backward()
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad
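The loop above is the deck's schematic; with the actual Module API the same pattern reads roughly as follows (a sketch; the optimizer settings and the assumption that net ends in a loss symbol are mine, not the deck's):

import mxnet as mx

# assume net is a symbol ending in a loss (e.g. SoftmaxOutput)
# and train_data is a DataIter yielding DataBatch objects
mod = mx.mod.Module(symbol=net)
mod.bind(data_shapes=train_data.provide_data,
         label_shapes=train_data.provide_label)
mod.init_params()
mod.init_optimizer(optimizer='sgd', optimizer_params={'learning_rate': 0.2})

for batch in train_data:
    mod.forward(batch)   # run the declarative graph forward
    mod.backward()       # compute gradients
    mod.update()         # imperative parameter update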
21. Linear regression
train_data = np.array([[1,2],[3,4],[5,6],[3,2],[7,1],[6,9]])
We fit Y = aX + b such that the error between the predictions and the labels is minimized.
22. Defining the Model
Variables: a variable is a placeholder for future data
X = mx.sym.Variable('data')
Y = mx.sym.Variable('lin_reg_label')
Neural network layers: the layers of a network, or any other type of model, are also defined by symbols
fully_connected_layer = mx.sym.FullyConnected(data=X, name='fc1', num_hidden=1)
Output symbols: output symbols are MXNet's way of defining a loss
lro = mx.sym.LinearRegressionOutput(data=fully_connected_layer, label=Y, name="lro")
23. Layers: Fully Connected
A fully connected layer of a neural network (without any activation applied), which in essence is just a linear regression on the input attributes.
It takes the following parameters:
a. data: input to the layer
b. num_hidden: number of hidden dimensions; specifies the size of the output of the layer
24. Layers: Linear Regression Output
Output layers in MXNet implement a loss function. Here we apply an L2 loss (least-squares error).
The parameters to this layer are:
a. data: input to this layer (specify the symbol whose output should be fed here)
b. label: the training label against which we compare the input to the layer to compute the L2 loss
25. Defining the Model
model = mx.mod.Module(
    symbol = lro,                    # network structure
    data_names = ['data'],
    label_names = ['lin_reg_label']
)
model.fit(train_iter, eval_iter,
          optimizer_params={'learning_rate': 0.01, 'momentum': 0.9},
          num_epoch=1000,
          batch_end_callback=mx.callback.Speedometer(batch_size, 2))
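train_iter, eval_iter, and batch_size are used above but never defined in this extract; a minimal sketch, with hypothetical labels and an evaluation split added purely for illustration:

import numpy as np
import mxnet as mx

batch_size = 1

# the six training points from slide 21; labels below are illustrative only
train_data = np.array([[1,2],[3,4],[5,6],[3,2],[7,1],[6,9]], dtype=np.float32)
train_label = np.array([5, 11, 17, 7, 9, 21], dtype=np.float32)

eval_data = np.array([[7,2],[6,10],[12,2]], dtype=np.float32)  # illustrative only
eval_label = np.array([11, 26, 16], dtype=np.float32)

train_iter = mx.io.NDArrayIter(train_data, train_label, batch_size,
                               shuffle=True, label_name='lin_reg_label')
eval_iter = mx.io.NDArrayIter(eval_data, eval_label, batch_size,
                              shuffle=False, label_name='lin_reg_label')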
29. NDArray Data Iterator
import numpy as np
import mxnet as mx

def to4d(img):
    # scale pixel values to [0, 1] and add a channel axis
    return img.reshape(img.shape[0], 1, 28, 28).astype(np.float32)/255

batch_size = 100
train_iter = mx.io.NDArrayIter(to4d(train_img), train_lbl, batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(to4d(val_img), val_lbl, batch_size)

Each batch is a 4-D array with shape (batch_size, num_channels, width, height).
For the MNIST dataset there is only one color channel, and both width and height are 28.
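The deck never shows where train_img and train_lbl come from; one convenient source (an assumption, not from the deck) is MXNet's built-in MNIST fetcher, which returns arrays already reshaped to 4-D and scaled to [0, 1], so to4d is not needed on them:

import mxnet as mx

mnist = mx.test_utils.get_mnist()   # downloads MNIST on first use
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'],
                               batch_size=100, shuffle=True)
val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'],
                             batch_size=100)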
30. Feed Forward Network
• Input layer: how input data (vectors) are fed into the network. The number of neurons in the input layer is typically the same as the number of input features.
• Hidden layers: the weight values on the connections between layers are how an ANN encodes what it learns. Hidden layers are crucial for learning non-linear functions.
• Output layer: represents the predictions; the output can be a regression or a classification.
• Connections between layers: in a feed-forward network, connections link each layer to the next. Each connection has a weight, and the weights of the connections are the encoding of the knowledge of the network.
Neural Network basics: http://cs231n.github.io/neural-networks-1/
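The mlp symbol trained on the next slide is not defined in this extract; a minimal sketch of a two-hidden-layer perceptron for MNIST (the layer sizes and activation choices are assumptions):

import mxnet as mx

data = mx.sym.Variable('data')
data = mx.sym.Flatten(data=data)                         # (batch, 1, 28, 28) -> (batch, 784)
fc1  = mx.sym.FullyConnected(data=data, num_hidden=128)
act1 = mx.sym.Activation(data=fc1, act_type="relu")
fc2  = mx.sym.FullyConnected(data=act1, num_hidden=64)
act2 = mx.sym.Activation(data=fc2, act_type="relu")
fc3  = mx.sym.FullyConnected(data=act2, num_hidden=10)   # 10 digit classes
mlp  = mx.sym.SoftmaxOutput(data=fc3, name='softmax')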
33. Model
model = mx.model.FeedForward(
    symbol = mlp,        # network structure
    num_epoch = 10,      # number of data passes for training
    learning_rate = 0.1  # learning rate of SGD
)
model.fit(
    X = train_iter,        # training data
    eval_data = val_iter,  # validation data
    batch_end_callback = mx.callback.Speedometer(batch_size, 200)
                           # output progress for every 200 data batches
)
34. Predictions and Validation
# prediction on a single image
prob = model.predict(val_img[0:1].astype(np.float32)/255)[0]
# get the class with the highest probability
print('Classified as %d with probability %f' % (prob.argmax(), prob.max()))
# run the model on the validation set and calculate the score with eval_metric
valid_acc = model.score(val_iter)
39. Running the model
model = mx.model.FeedForward(
    ctx = mx.gpu(0),   # use GPU 0 for training; the rest is the same as before
    symbol = lenet,
    num_epoch = 10,
    learning_rate = 0.1)
model.fit(
    X = train_iter,
    eval_data = val_iter,
    batch_end_callback = mx.callback.Speedometer(batch_size, 200)
)
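The lenet symbol referenced above is not defined in this extract; a minimal LeNet-style sketch for MNIST (filter counts and layer sizes are assumptions):

import mxnet as mx

data  = mx.sym.Variable('data')
# first conv block: 5x5 convolutions, tanh, 2x2 max pooling
conv1 = mx.sym.Convolution(data=data, kernel=(5,5), num_filter=20)
tanh1 = mx.sym.Activation(data=conv1, act_type="tanh")
pool1 = mx.sym.Pooling(data=tanh1, pool_type="max", kernel=(2,2), stride=(2,2))
# second conv block
conv2 = mx.sym.Convolution(data=pool1, kernel=(5,5), num_filter=50)
tanh2 = mx.sym.Activation(data=conv2, act_type="tanh")
pool2 = mx.sym.Pooling(data=tanh2, pool_type="max", kernel=(2,2), stride=(2,2))
# fully connected classifier
flat  = mx.sym.Flatten(data=pool2)
fc1   = mx.sym.FullyConnected(data=flat, num_hidden=500)
tanh3 = mx.sym.Activation(data=fc1, act_type="tanh")
fc2   = mx.sym.FullyConnected(data=tanh3, num_hidden=10)
lenet = mx.sym.SoftmaxOutput(data=fc2, name='softmax')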
Learn about the features and benefits of Apache MXNet
Learn about the deep learning AMIs with the tools you need for DL
Learn how to train a neural network using MXNet
Cell body (soma); dendrites are appendages that listen to other neurons.
A single axon carries the output of the computation that the neuron performs.
The cell body receives multiple inputs; if the combined input is strong enough, the cell can spike, sending an action potential down the axon, which then branches out to other neurons.
Neurons are connected to one another through synapses.
Crude model: a neuron-to-neuron connection is made through a synapse. Each connection has a weight, which is a function of "how much does this neuron like the other neuron."
For computational efficiency, neurons are arranged in layers.
Hard to define the network: the definition of the Inception network takes >1k lines of code in Caffe.
Memory consumption is linear in the number of layers.
Imperative: executes operations step by step; c = b × a invokes a kernel operation. NumPy programs are imperative.
Declarative: declares the computation and compiles it into a function; C = B × A only specifies the requirement. SQL is declarative.