Pop-up Loft
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Deep Learning with MXNet workshop
Vikram Madan
Sr. Product Manager, AWS Deep Learning
vikmadan@amazon.com
Agenda
• Deep Learning Basics
• Apache MXNet - Overview
• Apache MXNet – Programming Model
• Apache MXNet – MNIST Code Deep Dive
• Closing Demo
• Q & A
Deep Learning Basics
Machine Learning 101
Shallow Learning
• Extract clever features
(preprocessing)
• Map into feature space
(kernel methods)
• Set of rules
(decision tree)
• Combine multiple estimates
(boosting)
Deep Learning
• Many simple neurons
• Specialized layers
(images, text, audio, …)
• Stack layers
(hence deep learning)
• Optimization is difficult
§ Backpropagation
§ Stochastic gradient descent
Shallow learning is usually simple to learn; deep learning gives better accuracy
Anatomy of a Deep Learning Model
• Fully connected layer: mx.sym.FullyConnected(data, num_hidden=128)
• Activation: mx.sym.Activation(data, act_type="xxxx"), where act_type is "relu", "tanh", "sigmoid", or "softrelu"
• Convolution: mx.sym.Convolution(data, kernel=(5,5), num_filter=20)
• Pooling: mx.sym.Pooling(data, pool_type="max", kernel=(2,2), stride=(2,2))
  (figure: for the 2x2 window with values 4, 2, 2, 0, max pooling gives 4 and average pooling gives 2)
• Embedding: mx.symbol.Embedding(data, input_dim, output_dim = k)
  (figure: cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman))
• LSTM: lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed)
• Output and training: mx.sym.SoftmaxOutput, mx.model.FeedForward, model.fit
Deep Learning Models
• Inputs: Image, Video, Speech, Text, Events
• Applications: Neural Art, Face Search, Image Segmentation, Image Caption (“People Riding Bikes”), Image Labels (Bicycle, People, Road, Sport), Machine Translation (“People Riding Bikes” to “Οι άνθρωποι ιππασίας ποδήλατα”)
Biological & Artificial Neuron
slide from http://cs231n.stanford.edu/
Source: http://cs231n.github.io/neural-networks-1/
Converting Pictures into Data
Linear Algebra & Matrix Multiplication
Requirement
# of Columns in A
must equal
# of Rows in B
Output
# of Rows in A
# of Columns in B
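The shape rule above can be checked with a few lines of plain Python. This is a minimal sketch; the helper name `matmul` and the example matrices are illustrative assumptions, not part of the workshop material.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    # Requirement: number of columns in A must equal number of rows in B.
    if cols_a != rows_b:
        raise ValueError("columns in A must equal rows in B")
    # Output: (rows in A) x (columns in B).
    return [[sum(a[i][k] * b[k][j] for k in range(cols_a))
             for j in range(cols_b)]
            for i in range(rows_a)]

a = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
b = [[1, 0],
     [0, 1],
     [1, 1]]             # 3 x 2
product = matmul(a, b)   # 2 x 2
print(product)
```

Swapping in a `b` with two rows instead of three raises the `ValueError`, which is the requirement on the slide stated as code.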
Matrix Multiplication with Neural Networks
Inputs: Data Preprocessing, Batches, Epochs
Preprocessing
§ Random separation of data into
training, validation, and test sets
§ Necessary for measuring the
accuracy of the model
Batch
§ Amount of data propagated
through network at every iteration
§ Enables faster optimization
through shorter iteration cycles
Epoch
§ Complete pass through all the
training data
§ Optimization will have multiple
epochs to reduce error rate
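The three terms on this slide can be sketched in plain Python. The 80/10/10 split, the batch size of 32, and the helper names `split_dataset` and `iterate_batches` are assumptions chosen for illustration, not values from the workshop.

```python
import math
import random

def split_dataset(samples, train_fraction=0.8, validation_fraction=0.1, seed=0):
    """Randomly separate samples into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    train_end = int(len(shuffled) * train_fraction)
    validation_end = train_end + int(len(shuffled) * validation_fraction)
    return (shuffled[:train_end],
            shuffled[train_end:validation_end],
            shuffled[validation_end:])

def iterate_batches(samples, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

train_set, validation_set, test_set = split_dataset(range(100))

batch_size = 32
num_epochs = 2
iterations = 0
for epoch in range(num_epochs):                       # one epoch = one full pass
    for batch in iterate_batches(train_set, batch_size):
        iterations += 1                               # a real loop would train on `batch` here

batches_per_epoch = math.ceil(len(train_set) / batch_size)
print(len(train_set), len(validation_set), len(test_set), batches_per_epoch, iterations)
```

With 80 training samples and a batch size of 32, each epoch takes three iterations, so smaller batches mean more (and shorter) parameter updates per pass over the data.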
Fully Connected Layer
Each node (“neuron”) in a layer is connected to every node in the previous layer
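As a rough illustration of "connected to every node in the previous layer", each output below is a weighted sum over all inputs plus a bias. The function name and the numbers are made up for the example; in MXNet the corresponding symbol is `mx.sym.FullyConnected`.

```python
def fully_connected(inputs, weights, biases):
    """Each output neuron sums every input times its own weight, plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

inputs = [1.0, 2.0, 3.0]                 # 3 nodes in the previous layer
weights = [[0.5, 0.0, -0.5],             # one row of 3 weights per output neuron
           [1.0, 1.0, 1.0]]
biases = [0.0, 1.0]
outputs = fully_connected(inputs, weights, biases)   # 2 output neurons
print(outputs)
```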
Activation Functions
Add nonlinearity to a layer;
applied to the layer’s output
There are several options:
§ Rectified Linear Unit (ReLU)
§ Sigmoid
§ Hyperbolic Tangent (tanh)
§ Softplus
ReLU functions are the most
commonly used today
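The four options listed above have short closed forms. The sketch below uses only the Python standard library; the function names are chosen for the example (MXNet selects them through `act_type`, where softplus is called "softrelu").

```python
import math

def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes any real number into (-1, 1)."""
    return math.tanh(x)

def softplus(x):
    """Smooth approximation of ReLU: log(1 + e^x)."""
    return math.log1p(math.exp(x))

for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), sigmoid(value), tanh(value), softplus(value))
```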
Deep Neural Network
Hidden layers
Optimal size of a hidden layer
(number of nodes) is typically
between the size of the input
and size of the output layers
Input layer
Output
The “Learning” in Deep Learning
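[Figure: an input passes through the network weights (0.4, 0.3, 0.2, 0.9, ...) to produce an output X1, which is compared with the label X. While X1 != X, backpropagation (gradient descent) adjusts the weights (0.4 ± 𝛿, 0.3 ± 𝛿, ...) to give new weights.]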
Classification with the Softmax Function
Softmax Function
Source: https://stats.stackexchange.com/questions/273465/neural-network-softmax-activation
Softmax converts the output layer into probabilities – necessary for classification
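A minimal sketch of the softmax function follows. Subtracting the maximum score before exponentiating is a common numerical-stability choice assumed here, not something the slide specifies; the input scores are arbitrary.

```python
import math

def softmax(scores):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    shift = max(scores)                        # avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The largest score always receives the largest probability, so the predicted class is unchanged; softmax only rescales the outputs so they can be read as class probabilities.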
Loss Function
• It is an objective function that quantifies how successful
the model was in its predictions
• It is a measure of the difference between a neural net’s
prediction and the actual value – that is, the error
• Typically, we use Cross Entropy Loss, which adjusts
the plain loss calculation to mitigate learning slowdown
• Backpropagation is performed to calculate the error
contribution of each neuron after processing one batch
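For a single example with one correct class, cross-entropy loss reduces to the negative log of the probability the model assigned to that class. The sketch below assumes the predictions are already probabilities (for instance, softmax outputs); the numbers are illustrative only.

```python
import math

def cross_entropy(predicted, true_class):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(predicted[true_class])

confident_right = cross_entropy([0.9, 0.05, 0.05], true_class=0)
confident_wrong = cross_entropy([0.05, 0.9, 0.05], true_class=0)
print(confident_right, confident_wrong)
```

A confident correct prediction gives a loss near zero, while a confident wrong one is penalized heavily, which is what lets the loss act as a measure of error.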
Gradient Descent
Iteratively update parameters to approach the optimal value of the objective function
Stochastic Gradient Descent
Gradient Descent
A single iteration for the
parameter update runs through
ALL of the training data
Stochastic Gradient Descent
A single iteration for the
parameter update runs through
a BATCH of the training data
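The difference between the two update schemes can be sketched on a toy problem: fitting a single weight `w` so that `w * x` matches `y = 2x`. The data, learning rate, batch size, and iteration counts are all assumptions picked so the example converges quickly.

```python
import random

def gradient(w, samples):
    """Mean gradient of the squared error (w*x - y)^2 with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)

data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]   # true weight is 2.0
learning_rate = 0.02

# Gradient descent: every parameter update runs through ALL of the training data.
w_gd = 0.0
for _ in range(200):
    w_gd -= learning_rate * gradient(w_gd, data)

# Stochastic gradient descent: every parameter update runs through one BATCH.
rng = random.Random(0)
w_sgd = 0.0
batch_size = 2
for _ in range(200):                                  # epochs
    rng.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        w_sgd -= learning_rate * gradient(w_sgd, batch)

print(w_gd, w_sgd)
```

Both loops approach the same weight here; the stochastic version simply makes two cheaper updates per pass over the data instead of one.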
Optimizers and Learning Rates Visualization
http://imgur.com/a/Hqolp
Why do we need a Validation and Training Set?
Best model
When accuracy is evaluated only on the training set, overfitting goes undetected
Dropout
Srivastava, Nitish, et al. “Dropout: A Simple Way to Prevent Neural Networks from
Overfitting”, JMLR 2014
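A minimal sketch of dropout at training time is shown below. It uses the "inverted" convention (scaling the kept activations by 1 / keep probability), which is an assumption of this example rather than something the slide states; the function name and the 0.5 drop probability are also illustrative.

```python
import random

def dropout(activations, drop_probability, rng):
    """Zero each activation with probability drop_probability (training only).

    Kept activations are scaled up so the expected value stays unchanged
    (inverted dropout; an assumed convention for this sketch).
    """
    keep_probability = 1.0 - drop_probability
    return [a / keep_probability if rng.random() < keep_probability else 0.0
            for a in activations]

rng = random.Random(42)
activations = [1.0] * 10000
dropped = dropout(activations, drop_probability=0.5, rng=rng)

kept_fraction = sum(1 for a in dropped if a != 0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
print(kept_fraction, mean_activation)
```

Because a different random subset of neurons is silenced on every pass, no single neuron can be relied on too heavily, which is the mechanism the paper credits for reducing overfitting.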
Multilayer Perceptron (MLP)
Convolutional Neural Networks (CNN)
CNN Layers
Convolutional Layer
Pooling Layer
Activation
Fully-Connected Layer
Convolutional Neural Networks (CNN)
Convolutions
Activation Function
Pooling and Strides
Pooling Output
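Pooling with a stride can be sketched in plain Python. The 2x2 window with values 4, 2, 2, 0 is the one from the pooling figure earlier in the deck (4 = max, 2 = average); the function name `pool2d` and the 4x4 image are assumptions for illustration.

```python
def pool2d(image, size=2, stride=2, mode="max"):
    """Slide a size x size window over the image, moving `stride` cells each step."""
    pooled = []
    for r in range(0, len(image) - size + 1, stride):
        row = []
        for c in range(0, len(image[0]) - size + 1, stride):
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        pooled.append(row)
    return pooled

window = [[4, 2],
          [2, 0]]
max_pooled = pool2d(window, mode="max")
avg_pooled = pool2d(window, mode="avg")

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
downsampled = pool2d(image, size=2, stride=2, mode="max")   # 4x4 becomes 2x2
print(max_pooled, avg_pooled, downsampled)
```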
Full Convolutional Neural Network Structure
Recurrent Neural Networks (RNN)
Image Labeling, Image Caption, Sentiment Analysis, Machine Translation, Video Labeling
Apache MXNet – Overview
Apache MXNet
• Programmable: simple syntax, multiple languages
• Portable: highly efficient models for mobile and IoT (Resnet 1024 layer network is ~4GB)
• High Performance: near linear scaling across hundreds of GPUs (88% efficiency on 256 GPUs)
Scaling with MXNet
[Figure: scaling efficiency versus number of GPUs (1, 2, 4, 8, 16, 32, 64, 128, 256) for Inception v3, Resnet, and Alexnet against the ideal line; 88% efficiency at 256 GPUs]
• CloudFormation with Deep Learning AMI
• 16x P2.16xlarge, mounted on EFS
• Inception and Resnet: batch size 32; Alexnet: batch size 512
• ImageNet, 1.2M images, 1K classes
• 152-layer ResNet, 5.4d on 4x K80s (1.2h per epoch), 0.22 top-1 error
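The figures quoted above can be turned into a quick back-of-the-envelope calculation. The derived numbers are arithmetic on the slide's values, not results reported in the workshop, and the epoch estimate assumes that 5.4d is the total training time and that every epoch takes 1.2h.

```python
gpus = 256
efficiency = 0.88                      # scaling efficiency quoted on the slide
effective_speedup = efficiency * gpus  # speedup relative to a single GPU

training_days = 5.4                    # 152-layer ResNet on 4x K80s
hours_per_epoch = 1.2
estimated_epochs = training_days * 24 / hours_per_epoch

print(round(effective_speedup, 2), round(estimated_epochs))
```

Under these assumptions, 256 GPUs at 88% efficiency behave like roughly 225 ideal GPUs, and the quoted training time corresponds to about 108 epochs.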
MXNet Model Zoo
http://mxnet.io/model_zoo/
Deep Learning AMIs
Deep Learning any way you want on AWS
• Tool for data scientists and developers
• Setting up a DL system takes (install) time & skill
• Keeps packages up to date and compiled (MXNet, TensorFlow, Caffe, Torch, Theano, Keras)
• Anaconda, Jupyter, Python 2 and 3
• NVIDIA drivers for G2 and P2 instances
• Intel MKL drivers for all other instances (C4, M4, …)
http://bit.ly/deepami
MXNet – Programming Model
Imperative Programming
import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1
Easy to tweak with Python code
PROS
• Straightforward and flexible
• Takes advantage of language-native features (loops, conditions, debugger)
• E.g. NumPy, MATLAB, Torch, …
CONS
• Hard to optimize
Declarative Programming
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10), B=np.ones(10)*2)
[Figure: computation graph in which A and B feed a multiply node (X) and its result is added (+) to 1]
C can share memory with D because C is deleted later
PROS
• More chances for optimization
• Works across different languages
• E.g. TensorFlow, Theano, Caffe
CONS
• Less flexible
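The declarative style can be illustrated with a toy graph builder in plain Python. Everything here (`Node`, `Variable`, `compile_graph`) is a hypothetical stand-in written for this example and is not the MXNet, TensorFlow, or Theano API; scalars are used in place of arrays to keep the sketch dependency-free.

```python
class Node:
    """One vertex of the computation graph; building it performs no arithmetic."""
    def __init__(self, op, inputs=(), name=None, value=None):
        self.op, self.inputs, self.name, self.value = op, inputs, name, value

    def __mul__(self, other):
        return Node("mul", (self, wrap(other)))

    def __add__(self, other):
        return Node("add", (self, wrap(other)))

def wrap(value):
    """Turn plain numbers into constant nodes."""
    return value if isinstance(value, Node) else Node("const", value=value)

def Variable(name):
    return Node("var", name=name)

def compile_graph(output):
    """Return a function that evaluates the graph once inputs are bound."""
    def run(**bindings):
        def evaluate(node):
            if node.op == "var":
                return bindings[node.name]
            if node.op == "const":
                return node.value
            left, right = (evaluate(i) for i in node.inputs)
            return left * right if node.op == "mul" else left + right
        return evaluate(output)
    return run

A = Variable('A')
B = Variable('B')
C = B * A          # only records the operation
D = C + 1
f = compile_graph(D)
d = f(A=3, B=2)    # computation happens here
print(d)
```

Because the whole graph is known before any value is computed, a real framework can optimize it up front, for example by reusing the memory of `C` for `D`, which is the advantage the slide lists under PROS.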
MXNet: Mixed programming paradigm
IMPERATIVE NDARRAY API
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)
DECLARATIVE SYMBOLIC EXECUTOR
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidde
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()
NDArray can be set as input to the graph
MXNet: Mixed programming paradigm
Embed symbolic expressions into imperative programming
texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)
    texec.backward()
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad
Thank You
Vikram Madan
vikmadan@amazon.com

Distributed Deep Learning on AWS with Apache MXNet

  • 1. Pop-up Loft © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved Deep Learning with MXNet workshop Vikram Madan Sr. Product Manager, AWS Deep Learning vikmadan@amazon.com
  • 2. Agenda • Deep Learning Basics • Apache MXNet - Overview • Apache MXNet – Programming Model • Apache MXNet – MNIST Code Deep Dive • Closing Demo • Q & A
  • 3. Pop-up Loft © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved Deep Learning Basics
  • 4. Machine Learning 101 Shallow Learning • Extract clever features (preprocessing) • Map into feature space (kernel methods) • Set of rules (decision tree) • Combine multiple estimates (boosting) Deep Learning • Many simple neurons • Specialized layers (images, text, audio, …) • Stack layers (hence deep learning) • Optimization is difficult §Backpropagation §Stochastic gradient descent usually simple to learn better accuracy
  • 5. 0.2 -0.1 ... 0.7 Input Output 1 1 1 1 0 1 0 0 0 3 mx.sym.Pooling(data, pool_type="max", kernel=(2,2), stride=(2,2) lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed) 4 2 2 0 4=Max 1 3 ... 4 0.2 -0.1 ... 0.7 mx.sym.FullyConnected(data, num_hidden=128) 2 mx.symbol.Embedding(data, input_dim, output_dim = k) Queen 4 2 2 0 2=Avg Input Weights cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman) mx.sym.Activation(data, act_type="xxxx") "relu" "tanh" "sigmoid" "softrelu" Neural Art Face Search Image Segmentation Image Caption “People Riding Bikes” Bicycle, People, Road, Sport Image Labels Image Video Speech Text “People Riding Bikes” Machine Translation “Οι άνθρωποι ιππασίας ποδήλατα” Events mx.model.FeedForward model.fit mx.sym.SoftmaxOutput Anatomy of a Deep Learning Model mx.sym.Convolution(data, kernel=(5,5), num_filter=20) Deep Learning Models
  • 6. Biological & Artificial Neuron slide from http://cs231n.stanford.edu/ Source: http://cs231n.github.io/neural-networks-1/
  • 8. Linear Algebra & Matrix Multiplication. Requirement: the number of columns in A must equal the number of rows in B. Output shape: (number of rows in A) × (number of columns in B)
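The shape rule on this slide can be checked with a few lines of plain Python. This is only an illustrative sketch: the `matmul` helper and the matrices `A` and `B` are made up for the example, and a real workload would use NumPy or MXNet NDArrays instead of nested lists.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (illustrative helper)."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    # Requirement: number of columns in A must equal number of rows in B.
    if cols_a != rows_b:
        raise ValueError(f"shape mismatch: {rows_a}x{cols_a} @ {rows_b}x{cols_b}")
    # Output has (rows of A) x (columns of B) entries.
    return [
        [sum(a[i][k] * b[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]


A = [[1, 2, 3],
     [4, 5, 6]]      # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]         # 3 x 2
C = matmul(A, B)     # 2 x 2
print(C)
```

Multiplying `A` by itself would raise the `ValueError`, because a 2 x 3 matrix has three columns while the right-hand operand has only two rows.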
  • 9. Matrix Multiplication with Neural Networks
  • 10. Inputs: Data Preprocessing, Batches, Epochs. Preprocessing: § Random separation of data into training, validation, and test sets § Necessary for measuring the accuracy of the model. Batch: § Amount of data propagated through the network at every iteration § Enables faster optimization through shorter iteration cycles. Epoch: § Complete pass through all the training data § Optimization runs for multiple epochs to reduce the error rate
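The three terms on this slide fit together as sketched below. The helper names, the 80/10/10 split, the batch size of 16, and the three epochs are all assumptions chosen for illustration; the slide does not prescribe any of them.

```python
import random


def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly separate samples into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def iterate_batches(samples, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


data = list(range(100))            # stand-in for 100 labeled examples
train, val, test = split_dataset(data)

num_epochs = 3
batch_size = 16
iterations = 0
for epoch in range(num_epochs):                        # one epoch = one full pass
    for batch in iterate_batches(train, batch_size):   # one batch = one iteration
        iterations += 1            # a real loop would run forward/backward here

print(len(train), len(val), len(test), iterations)
```

With 80 training examples and a batch size of 16, each epoch consists of five iterations.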
  • 11. Fully Connected Layer: each node (“neuron”) in a layer is connected to every node in the previous layer
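Because every node sees every input, a fully connected layer reduces to one weighted sum per output node. The sketch below assumes a hypothetical `fully_connected` helper with made-up weights and biases; it is not the MXNet implementation, only the arithmetic behind it.

```python
def fully_connected(inputs, weights, biases):
    """Compute one output per node: a weighted sum of ALL inputs plus a bias.

    weights[j][i] is the connection from input i to output node j.
    """
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + bias
        for node_weights, bias in zip(weights, biases)
    ]


x = [1.0, 2.0, 3.0]                 # three inputs from the previous layer
W = [[0.5, -1.0, 0.0],              # node 0: one weight per input
     [1.0, 1.0, 1.0]]               # node 1: one weight per input
b = [0.0, -1.0]
y = fully_connected(x, W, b)
print(y)
```

Stacking the per-node weight lists into a matrix is what turns this layer into the matrix multiplication shown on the earlier slides.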
  • 12. Activation Functions: add nonlinearity to a layer; they are applied to the layer’s output. There are several options: § Rectified Linear Unit (ReLU) § Sigmoid § Hyperbolic Tangent (tanh) § Softplus. ReLU functions are the most commonly used today
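The four options listed here have short closed forms. The scalar versions below are a sketch for illustration only; frameworks apply them element-wise to whole arrays, and this `softplus` is the naive formula, which can overflow for very large inputs.

```python
import math


def relu(x):
    """Rectified Linear Unit: pass positives through, clamp negatives to 0."""
    return max(0.0, x)


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent: squash any real number into (-1, 1)."""
    return math.tanh(x)


def softplus(x):
    """Smooth approximation of ReLU: log(1 + e^x)."""
    return math.log1p(math.exp(x))


for name, fn in [("relu", relu), ("sigmoid", sigmoid),
                 ("tanh", tanh), ("softplus", softplus)]:
    print(name, [round(fn(v), 4) for v in (-2.0, 0.0, 2.0)])
```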
  • 13. Deep Neural Network: input layer, hidden layers, output layer. The optimal size of a hidden layer (number of nodes) is typically between the size of the input layer and the size of the output layer
  • 14. The “Learning” in Deep Learning [diagram: the network’s weights (0.4, 0.3, 0.2, 0.9, ...) map an input to an output X1; when X1 does not match the label X (X1 != X), backpropagation (gradient descent) nudges the weights (0.4 ± 𝛿, 0.3 ± 𝛿) to produce new weights]
  • 15. Classification with the Softmax Function. Softmax converts the output layer into probabilities, which is necessary for classification. Source: https://stats.stackexchange.com/questions/273465/neural-network-softmax-activation
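A minimal sketch of softmax, assuming the usual definition (exponentiate each score, then divide by the sum). The example scores are made up, and subtracting the maximum score first is a common numerical-stability trick rather than something the slide mentions.

```python
import math


def softmax(scores):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    shift = max(scores)                       # avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])              # hypothetical scores for 3 classes
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

The largest score always receives the largest probability, so the predicted class is unchanged; what softmax adds is a normalized confidence for every class.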
  • 16. Loss Function • It is an objective function that quantifies how successful the model was in its predictions • It is a measure of the difference between a neural net’s prediction and the actual value – that is, the error • Typically, we use Cross Entropy Loss, which adjusts the plain loss calculation to mitigate learning slowdown • Backpropagation is performed to calculate the error contribution of each neuron after processing one batch
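For a single example with one correct class, cross-entropy loss reduces to the negative log of the probability the model assigned to that class. The sketch below assumes that single-label form; the probability vectors are invented for illustration.

```python
import math


def cross_entropy(probs, label):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(probs[label])


# The true class is index 0 in both cases (hypothetical predictions).
confident_right = cross_entropy([0.90, 0.05, 0.05], 0)
confident_wrong = cross_entropy([0.05, 0.90, 0.05], 0)
print(confident_right, confident_wrong)
```

A confident wrong prediction is penalized far more heavily than a confident right one, which keeps the error signal large when the model is badly off.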
  • 17. Gradient Descent: iteratively update the parameters to approach the optimal value of the objective function
  • 18. Stochastic Gradient Descent. Gradient Descent: a single iteration of the parameter update runs through ALL of the training data. Stochastic Gradient Descent: a single iteration of the parameter update runs through a BATCH of the training data
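The difference between the two update rules is only which samples feed each gradient computation. The sketch below fits a single weight to toy data generated from y = 2x; the data, learning rate, batch size, and function names are all assumptions made for the example.

```python
import random

# Toy data generated from y = 2x, so the ideal weight is 2.0.
data = [(x, 2.0 * x) for x in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]]


def gradient(w, samples):
    """Gradient of the mean of 0.5 * (w*x - y)^2 with respect to w."""
    return sum((w * x - y) * x for x, y in samples) / len(samples)


def gradient_descent(w, lr, steps):
    for _ in range(steps):
        w -= lr * gradient(w, data)           # every update uses ALL the data
    return w


def stochastic_gradient_descent(w, lr, epochs, batch_size, seed=0):
    rng = random.Random(seed)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            w -= lr * gradient(w, batch)      # every update uses one BATCH
    return w


w_gd = gradient_descent(0.0, lr=0.1, steps=100)
w_sgd = stochastic_gradient_descent(0.0, lr=0.1, epochs=50, batch_size=2)
print(w_gd, w_sgd)
```

Both variants approach the same weight here; the batch version simply makes four cheaper updates per pass over the data instead of one expensive update.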
  • 19. Optimizers and Learning Rates Visualization http://imgur.com/a/Hqolp
  • 20. Why do we need a Validation and Training Set? When accuracy is evaluated using only the training set, we run into the overfitting problem [chart marker: “Best model”]
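One common way to act on this is to pick the model from the epoch with the lowest validation error rather than the lowest training error. The error values below are invented purely to illustrate the pattern (training error keeps falling while validation error turns back up); they are not results from this workshop.

```python
# Hypothetical per-epoch error rates, invented for illustration only.
train_error = [0.40, 0.25, 0.15, 0.08, 0.04, 0.02]
val_error = [0.42, 0.30, 0.22, 0.20, 0.23, 0.27]

# Judged on the training set alone, the last epoch always looks best.
best_by_training = min(range(len(train_error)), key=train_error.__getitem__)

# Judged on held-out data, the best model is where validation error bottoms out.
best_by_validation = min(range(len(val_error)), key=val_error.__getitem__)

print(best_by_training, best_by_validation)
```

The gap between the two choices is the overfitting that the training set alone cannot reveal.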
  • 21. Dropout. Srivastava, Nitish, et al. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” JMLR, 2014
  • 23. Convolutional Neural Networks (CNN). CNN layers: Convolutional Layer, Pooling Layer, Activation, Fully-Connected Layer
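Of these layers, pooling is the simplest to write out by hand. The sketch below is a plain-Python stand-in for a 2x2 pooling layer with stride 2, not MXNet's implementation; `pool2d` is a hypothetical helper, and the first window reuses the values 4 2 / 2 0 from slide 5.

```python
def pool2d(image, size=2, stride=2, mode="max"):
    """Slide a size x size window over a 2-D list and reduce each window."""
    pooled = []
    for r in range(0, len(image) - size + 1, stride):
        row = []
        for c in range(0, len(image[0]) - size + 1, stride):
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            else:                              # "avg"
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


window = [[4, 2],
          [2, 0]]
max_pooled = pool2d(window, mode="max")        # keeps the strongest response
avg_pooled = pool2d(window, mode="avg")        # keeps the mean response

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
downsampled = pool2d(feature_map)              # 4x4 input becomes 2x2
print(max_pooled, avg_pooled, downsampled)
```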
  • 29. Full Convolutional Neural Network Structure
  • 30. Recurrent Neural Networks (RNN): Image Caption, Sentiment Analysis, Machine Translation, Video Labeling, Image Labeling
  • 31. Apache MXNet – Overview
  • 32. Apache MXNet. Programmable: simple syntax, multiple languages. Portable: highly efficient models for mobile and IoT (a 1024-layer ResNet is ~4GB). High Performance: near linear scaling across hundreds of GPUs (88% efficiency on 256 GPUs)
  • 33. Scaling with MXNet [chart: Inception v3, ResNet, and AlexNet plotted against the ideal line; x-axis: No. of GPUs, 1 to 256; 88% efficiency] • CloudFormation with Deep Learning AMI • 16x P2.16xlarge, mounted on EFS • Inception and ResNet: batch size 32; AlexNet: batch size 512 • ImageNet, 1.2M images, 1K classes • 152-layer ResNet: 5.4d on 4x K80s (1.2h per epoch), 0.22 top-1 error
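The figures on this slide can be unpacked with a little arithmetic. The sketch assumes the usual definition of scaling efficiency (measured speedup divided by the ideal speedup, which equals the number of GPUs) and reads "5.4d" as 5.4 days; both are interpretations, not statements from the slide.

```python
def scaling_efficiency(measured_speedup, num_gpus):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return measured_speedup / num_gpus


def implied_speedup(efficiency, num_gpus):
    """Speedup over a single GPU implied by a given efficiency."""
    return efficiency * num_gpus


# 88% efficiency on 256 GPUs corresponds to roughly a 225x speedup.
speedup_256 = implied_speedup(0.88, 256)

# 5.4 days of training at 1.2 hours per epoch implies roughly 108 epochs.
epochs = (5.4 * 24) / 1.2

print(round(speedup_256, 2), round(epochs))
```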
  • 35. Deep Learning AMIs (http://bit.ly/deepami): Deep Learning any way you want on AWS • Tool for data scientists and developers • Setting up a DL system takes (install) time & skill • Keep packages up to date and compiled (MXNet, TensorFlow, Caffe, Torch, Theano, Keras) • Anaconda, Jupyter, Python 2 and 3 • NVIDIA Drivers for G2 and P2 instances • Intel MKL Drivers for all other instances (C4, M4, …)
  • 36. MXNet – Programming Model
  • 37. Imperative Programming. Easy to tweak with Python code:
        import numpy as np
        a = np.ones(10)
        b = np.ones(10) * 2
        c = b * a
        d = c + 1
    PROS: • Straightforward and flexible • Takes advantage of language-native features (loops, conditions, debugger) • E.g. NumPy, Matlab, Torch, … CONS: • Hard to optimize
  • 38. Declarative Programming [graph: A and B feed X; its result and 1 feed +]:
        A = Variable('A')
        B = Variable('B')
        C = B * A
        D = C + 1
        f = compile(D)
        d = f(A=np.ones(10), B=np.ones(10)*2)
    C can share memory with D because C is deleted later. PROS: • More chances for optimization • Works across different languages • E.g. TensorFlow, Theano, Caffe CONS: • Less flexible
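The `Variable` and `compile` names on this slide are pseudocode. A toy version can make the idea concrete: building the expression only records a graph, and nothing is computed until the compiled function is called. Everything below (`Node`, `Variable`, `compile_graph`) is a hypothetical sketch written for this example, not the API of MXNet or any other framework, and scalars stand in for the slide's `np.ones(10)` arrays.

```python
class Node:
    """A symbolic expression; constructing one performs no arithmetic."""

    def __init__(self, op, inputs=(), name=None, value=None):
        self.op = op
        self.inputs = inputs
        self.name = name
        self.value = value

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Node) else Node("const", value=other)

    def __add__(self, other):
        return Node("add", (self, self._wrap(other)))

    def __mul__(self, other):
        return Node("mul", (self, self._wrap(other)))


def Variable(name):
    """Placeholder whose value is supplied only at execution time."""
    return Node("var", name=name)


def compile_graph(output):
    """Return a function that evaluates the graph for given bindings."""
    def evaluate(node, bindings):
        if node.op == "var":
            return bindings[node.name]
        if node.op == "const":
            return node.value
        left, right = (evaluate(n, bindings) for n in node.inputs)
        return left + right if node.op == "add" else left * right

    return lambda **bindings: evaluate(output, bindings)


A = Variable("A")
B = Variable("B")
C = B * A                  # records a "mul" node; nothing is computed yet
D = C + 1                  # records an "add" node
f = compile_graph(D)
d = f(A=1.0, B=2.0)        # computation happens only here
print(d)
```

Because the whole graph is known before execution, a real framework can plan optimizations such as the memory sharing between C and D noted on the slide; this toy evaluator performs none of them.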
  • 39. MXNet: Mixed programming paradigm. NDArray can be set as input to the graph.
    IMPERATIVE NDARRAY API:
        >>> import mxnet as mx
        >>> a = mx.nd.zeros((100, 50))
        >>> b = mx.nd.ones((100, 50))
        >>> c = a + b
        >>> c += 1
        >>> print(c)
    DECLARATIVE SYMBOLIC EXECUTOR:
        >>> import mxnet as mx
        >>> net = mx.symbol.Variable('data')
        >>> net = mx.symbol.FullyConnected(data=net, num_hidde
        >>> net = mx.symbol.SoftmaxOutput(data=net)
        >>> texec = mx.module.Module(net)
        >>> texec.forward(data=c)
        >>> texec.backward()
  • 40. MXNet: Mixed programming paradigm. Embed symbolic expressions into imperative programming:
        texec = mx.module.Module(net)
        for batch in train_data:
            texec.forward(batch)
            texec.backward()
            for param, grad in zip(texec.get_params(), texec.get_grads()):
                param -= 0.2 * grad
  • 41. Thank You. Vikram Madan, vikmadan@amazon.com