Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer-friendly deep learning frameworks. During this workshop, we will provide a short background on deep learning, focusing on relevant application domains, and an introduction to the powerful and scalable deep learning framework Apache MXNet. At the end of this tutorial you’ll be able to train your own deep neural network and fine-tune existing state-of-the-art models for image and object recognition. We’ll also take a deep dive into setting up your deep learning infrastructure on AWS and deploying models on AWS Lambda.
4. BigDL on AWS
Github: github.com/intel-analytics/BigDL
http://software.intel.com/bigdl
§ BigDL: a distributed deep learning framework for Apache Spark*
§ Deploying BigDL on AWS is super easy!
§ Option 1: Install BigDL on Amazon EMR with a bootstrap action (a programmatic sketch follows below):
s3://aws-bigdata-blog/artifacts/aws-blog-emr-jupyter/install-jupyter-emr5-latest.sh --bigdl
§ Option 2: Launch the public AMI on EC2 (Xeon E5 v3 or v4):
https://github.com/intel-analytics/BigDL/wiki/Running-on-EC2
https://aws.amazon.com/blogs/ai/running-bigdl-deep-learning-for-apache-spark-on-aws/
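As a rough illustration of Option 1, the same bootstrap action can be supplied when the EMR cluster is created programmatically. The sketch below uses boto3; the cluster name, EMR release label, instance types and counts, key pair, and region are illustrative assumptions, not values from this deck.

import boto3

emr = boto3.client("emr", region_name="us-east-1")  # assumed region

# Create an EMR cluster and run the BigDL/Jupyter bootstrap action from above.
response = emr.run_job_flow(
    Name="bigdl-demo",                      # hypothetical cluster name
    ReleaseLabel="emr-5.8.0",               # assumed EMR 5.x release
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m4.xlarge",  # illustrative instance choices
        "SlaveInstanceType": "m4.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
        "Ec2KeyName": "my-key-pair",        # hypothetical key pair
    },
    BootstrapActions=[{
        "Name": "Install Jupyter and BigDL",
        "ScriptBootstrapAction": {
            "Path": "s3://aws-bigdata-blog/artifacts/aws-blog-emr-jupyter/install-jupyter-emr5-latest.sh",
            "Args": ["--bigdl"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])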
8. Biological Neuron
slide from http://cs231n.stanford.edu/
Neural Network basics: http://cs231n.github.io/neural-networks-1/
9. Artificial Neuron
[Diagram: inputs, synaptic weights, summation with nonlinearity, output]
• Input: a vector of training data x
• Output: a linear function of the inputs
• Nonlinearity: transforms the output into a desired range of values, e.g. for classification we need probabilities in [0, 1]
• Training: learn the weights w and the bias b
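A minimal NumPy sketch of this neuron, assuming a sigmoid nonlinearity to squash the output into [0, 1] (the slide does not commit to a particular nonlinearity; the input and weight values are made up):

import numpy as np

def sigmoid(z):
    # squashes any real number into (0, 1), usable as a probability
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # input: vector of training data x
w = np.array([0.4, 0.3, -0.9])   # synaptic weights (learned during training)
b = 0.1                          # bias (learned during training)

z = np.dot(w, x) + b             # output: linear function of the inputs
y = sigmoid(z)                   # nonlinearity: map into [0, 1]
print(y)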
10. Deep Neural Network
[Diagram: input layer, several hidden layers, output layer]
The optimal size of a hidden layer (number of neurons) is usually between the size of the input layer and the size of the output layer.
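To make the sizing rule concrete, here is a small NumPy forward pass through one hidden layer; the layer sizes (4 inputs, 3 hidden neurons, 2 outputs) are illustrative assumptions chosen to follow the rule of thumb above:

import numpy as np

def relu(z):
    # a common hidden-layer nonlinearity
    return np.maximum(0, z)

x = np.random.randn(4)        # input layer: 4 features
W1 = np.random.randn(3, 4)    # hidden layer: 3 neurons (between 4 and 2)
b1 = np.zeros(3)
W2 = np.random.randn(2, 3)    # output layer: 2 neurons
b2 = np.zeros(2)

h = relu(W1 @ x + b1)         # hidden-layer activations
y = W2 @ h + b2               # raw output scores
print(y)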
11. The “Learning” in Deep Learning
[Diagram: an input X and its label (e.g. 0 1 0 1 1 …) are fed forward through the network with the current weights (0.4, 0.3, 0.2, 0.9, …) to produce a prediction X1; while X1 != X, back propagation (gradient descent) nudges each weight by ±δ, and the forward pass repeats with the new weights]
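A minimal NumPy sketch of that loop, assuming a single sigmoid neuron trained with a squared-error loss so the gradient-descent weight update is easy to see; the toy data, learning rate, and loss choice are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0.2, 0.9], [0.4, 0.3], [0.8, 0.1]])  # toy inputs
labels = np.array([1.0, 0.0, 1.0])                   # toy 0/1 labels
w = np.array([0.4, 0.3])                             # initial weights
b = 0.0
lr = 0.1                                             # step size for each +/- delta update

for step in range(100):
    pred = sigmoid(X @ w + b)              # forward pass: prediction X1
    err = pred - labels                    # how far the guess is from the label X
    grad = err * pred * (1 - pred)         # back propagation through the sigmoid
    w -= lr * (X.T @ grad) / len(labels)   # new weights
    b -= lr * grad.mean()                  # new bias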
16. Apache MXNet
• Programmable: simple syntax, multiple languages
• Portable: highly efficient models for mobile and IoT; a ResNet 1,024-layer network is ~4 GB
• High performance: near-linear scaling across hundreds of GPUs; 88% efficiency on 256 GPUs
17. Scaling with MXNet
[Chart: training throughput vs. number of GPUs (1, 2, 4, 8, 16, 32, 64, 128, 256) for Inception v3, ResNet, and AlexNet against the ideal linear-scaling line; ~88% efficiency]
• CloudFormation with the Deep Learning AMI
• 16x P2.16xlarge instances, mounted on EFS
• Inception and ResNet: batch size 32; AlexNet: batch size 512
• ImageNet: 1.2M images, 1K classes
• 152-layer ResNet: 5.4 days on 4x K80s (1.2 h per epoch), 0.22 top-1 error
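A rough sketch of what this multi-GPU setup looks like with MXNet's Module API (introduced later in this deck); the network, dummy data, and hyperparameters are placeholder assumptions rather than the benchmark configuration, and running it as written would require an instance with 16 GPUs such as a P2.16xlarge:

import mxnet as mx

# Placeholder network and data; the benchmark trained Inception v3 / ResNet
# (batch size 32) and AlexNet (batch size 512) on ImageNet (1.2M images, 1K classes).
data = mx.sym.Variable("data")
net = mx.sym.FullyConnected(data=data, num_hidden=1000)   # 1K output classes
net = mx.sym.SoftmaxOutput(data=net, name="softmax")

train_iter = mx.io.NDArrayIter(data=mx.nd.ones((64, 1024)),  # dummy features
                               label=mx.nd.zeros((64,)),
                               batch_size=32)

# One Module spread across every GPU on the instance; MXNet splits each batch
# across the listed devices, which is what yields the near-linear scaling.
mod = mx.mod.Module(symbol=net, context=[mx.gpu(i) for i in range(16)])
mod.fit(train_iter, num_epoch=1, optimizer="sgd",
        optimizer_params={"learning_rate": 0.1})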
18. Deep Learning AMIs
http://bit.ly/deepami
• Deep learning any way you want on AWS
• A tool for data scientists and developers
• Setting up a DL system takes (install) time & skill
• Keeps packages up to date and compiled (MXNet, TensorFlow, Caffe, Torch, Theano, Keras)
• Anaconda, Jupyter, Python 2 and 3
• NVIDIA drivers for G2 and P2 instances
• Intel MKL drivers for all other instances (C4, M4, …)
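As a hedged illustration, the sketch below launches a GPU instance from the AMI with boto3; the AMI ID, key pair, and region are placeholders you would replace with the Deep Learning AMI ID for your region:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # assumed region

# Launch a single P2 GPU instance from the Deep Learning AMI.
# "ami-xxxxxxxx" and "my-key-pair" are placeholders, not real values.
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",      # Deep Learning AMI ID for your region
    InstanceType="p2.xlarge",    # GPU instance (NVIDIA drivers preinstalled)
    KeyName="my-key-pair",
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])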
20. Imperative Programming

import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1

Easy to tweak with Python code.

PROS
• Straightforward and flexible
• Takes advantage of language-native features (loops, conditionals, debugger)
• E.g. NumPy, MATLAB, Torch, …

CONS
• Hard to optimize
21. Declarative Programming

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
      B=np.ones(10)*2)

[Diagram: computation graph with inputs A and B feeding a multiply node, followed by a +1 node]

C can share memory with D, because C is deleted later.

PROS
• More chances for optimization
• Works across different languages
• E.g. TensorFlow, Theano, Caffe

CONS
• Less flexible
22. MXNet: Mixed programming paradigm

IMPERATIVE: NDARRAY API
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

DECLARATIVE: SYMBOLIC EXECUTOR
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=12)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()

An NDArray can be set as input to the graph.
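The symbolic snippet above is abbreviated; a more complete sketch of feeding an imperative NDArray into a compiled symbolic graph might look like the following (the shapes, the hidden size, the dummy labels, and the use of simple_bind are assumptions for illustration):

import mxnet as mx

# imperative NDArray, computed eagerly
c = mx.nd.ones((100, 50)) + 1

# declarative graph, built symbolically
data = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=data, num_hidden=12)
net = mx.symbol.SoftmaxOutput(data=net, name='softmax')

# bind the graph to concrete shapes, then run it with the NDArray c
# supplied as the graph's 'data' input (dummy zeros as labels)
exe = net.simple_bind(ctx=mx.cpu(), data=(100, 50))
exe.forward(is_train=True, data=c, softmax_label=mx.nd.zeros((100,)))
exe.backward()
print(exe.outputs[0].shape)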
23. MXNet: Mixed programming paradigm

Embed symbolic expressions into imperative programming:

texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)
    texec.backward()
    # plain SGD update of each parameter, learning rate 0.2
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad
26. Batch, Epoch
Batch:
• Number of samples propagated through the network at every iteration
• Helps utilize the GPU compute power
Epoch:
• An epoch is a complete pass through all the training data. A neural network is trained until the error rate is acceptable; this will often take multiple passes through the complete data set.
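A small worked example of how batches and epochs relate (the sample count, batch size, and epoch count are made up for illustration):

num_samples = 1000      # size of the training set
batch_size = 50         # samples propagated per iteration

iterations_per_epoch = num_samples // batch_size   # 20 iterations = 1 epoch
num_epochs = 10                                     # 10 complete passes over the data

total_iterations = iterations_per_epoch * num_epochs
print(iterations_per_epoch, total_iterations)       # 20 200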
27. Loss Function
• The objective function defines what success looks like when an algorithm learns.
• It is a measure of the difference between a neural net’s guess and the ground truth; that is, the error.
• The error resulting from the loss function is fed into backpropagation in order to update the weights & biases.
• Common loss functions:
  • Cross entropy
  • L1 (linear), L2 (quadratic)
  • Mean squared error (MSE)
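A minimal NumPy sketch of two of these losses on a toy prediction, to make "the difference between the guess and the ground truth" concrete (the values are made up):

import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])   # ground truth
y_pred = np.array([0.9, 0.2, 0.7, 0.6])   # the network's guesses

# Mean squared error
mse = np.mean((y_pred - y_true) ** 2)

# Binary cross entropy
eps = 1e-12                                # avoid log(0)
ce = -np.mean(y_true * np.log(y_pred + eps) +
              (1 - y_true) * np.log(1 - y_pred + eps))

print(mse, ce)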
29. Fully Connected Layer
A fully connected layer of a neural network: every output is a weighted sum of all the inputs plus a bias. If no activation is applied, you can imagine this to be just a linear regression on the input attributes.
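A NumPy sketch of the point above: with no activation, a fully connected layer is just y = Wx + b, i.e. a multi-output linear regression on the input attributes (this is what mx.symbol.FullyConnected computes); the shapes are illustrative assumptions:

import numpy as np

x = np.random.randn(5)      # 5 input attributes
W = np.random.randn(3, 5)   # weights: 3 output neurons, each connected to all 5 inputs
b = np.random.randn(3)      # one bias per output neuron

y = W @ x + b               # fully connected layer, no activation
print(y)                    # equivalent to 3 linear regressions on the inputs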