In this session, we will share cutting-edge deep learning innovations and present emerging trends in the AI community. This session is for data scientists and developers who have a keen interest in getting started on an AI project and want to learn the tools of the trade. We will draw on practical experience from working on various AI projects and share the key learnings and pitfalls.
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
1. Discovering Your AI Super Powers
Tips and Tricks to Jumpstart Your AI Project
Wee Hyong Tok, PhD
Principal Data Science Manager
Microsoft
@weehyong
Global Artificial Intelligence Conference 2018, Seattle
3. How long does it take to train a deep learning model?
4. Training times for ResNet-50
• Before 2017: ResNet-50 on an NVIDIA M40 GPU (~10^18 single-precision operations) took 14 days.
• April 2017 (Facebook): ResNet-50 on 32 CPUs and 256 NVIDIA P100 GPUs in 1 hour.
• Sept 2017 (UC Berkeley, TACC, UC Davis): ResNet-50 on 1,600 CPUs in 31 minutes.
• Nov 2017 (Preferred Networks, ChainerMN): ResNet-50 on 1,024 P100 GPUs in 15 minutes.
14. Deep Learning Virtual Machine (DLVM)
• Requirements of agile data science
  • Elasticity
  • Efficiency
  • Cost-effectiveness
• Features of DLVM
  • Languages
  • Data platforms
  • ML and AI tools
  • Data exploration and visualization
  • Data ingestion tools
  • Development tools
15. 3 Jumpstart with Transfer Learning
Credits: Olah, et al., "Feature Visualization", Distill, 2017
https://distill.pub/2017/feature-visualization/
16. Types of Transfer Learning
• Standard DNN: featurization layers and output layer both initialized randomly; no transfer learning; train featurization and output jointly.
• Headless DNN: featurization layers learned using another task; the output layer is replaced by a separate ML algorithm; the features learned on a related task are used to train a separate classifier.
• Fine-Tune DNN: featurization layers learned using another task; output layer initialized randomly; the features learned on a related task are used and fine-tuned; train featurization and output jointly with a small learning rate.
• Multi-Task DNN: featurization layers and output layers initialized randomly; the learned features need to solve many related tasks; share a featurization network across the tasks and train all networks jointly with a loss function that is the sum of the individual task loss functions (see the sketch below).
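As a concrete illustration of the last row, here is a minimal Keras sketch of a multi-task DNN: a shared featurization network with one output head per task, trained jointly so that the overall loss is the sum of the per-task losses. The two tasks, layer sizes, and names are hypothetical, not taken from the deck.

```python
from tensorflow.keras import layers, Model

# Shared featurization network used by both tasks.
inputs = layers.Input(shape=(224, 224, 3))
x = layers.Conv2D(32, (3, 3), activation='relu')(inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.GlobalAveragePooling2D()(x)

# One randomly initialized output head per task, trained jointly.
task_a = layers.Dense(5, activation='softmax', name='task_a')(x)   # e.g. 5-way classification
task_b = layers.Dense(1, activation='sigmoid', name='task_b')(x)   # e.g. binary attribute

model = Model(inputs=inputs, outputs=[task_a, task_b])

# With one loss per output, Keras minimizes their (weighted) sum, i.e. the
# "sum of individual task loss functions" from the table above.
model.compile(optimizer='adam',
              loss={'task_a': 'categorical_crossentropy',
                    'task_b': 'binary_crossentropy'})
```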
17. Deep Neural Network for Computer Vision
[Diagram] An image flows through convolutional layers that learn progressively richer representations, then through fully connected layers that produce the predictions (cat? YES, dog? NO, car? NO):
• Low-level features: lines, edges, color fields, etc.
• High-level features: corners, contours, simple shapes
• Object parts: wheels, faces, windows, etc.
• Complex objects & scenes: people, animals, cars, beach scene, etc.
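To make that layer progression concrete, here is a minimal Keras sketch of such a network: stacked convolutional layers feeding fully connected layers that score the three classes from the diagram. All layer sizes are illustrative assumptions.

```python
from tensorflow.keras import layers, models

# Convolutional layers learn low-level features (edges), then higher-level
# features (shapes, object parts); fully connected layers combine them to
# recognize complex objects (cat / dog / car).
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),    # fully connected layers
    layers.Dense(3, activation='softmax'),   # cat? / dog? / car?
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```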
20. ImageNet Dataset
• Research dataset with >10 million images
• Images annotated with labels from an ontology (>22K labels)
• Generic images covering an extremely wide range of labels
21. Pre-Built CNN from a General Task on Millions of Images
[Diagram] The output layer of the ImageNet-trained CNN is stripped; its feature hierarchy (low-level features: lines, edges, color fields; high-level features: corners, contours, simple shapes; object parts: wheels, faces, windows; complex objects & scenes: people, animals, cars, beach scenes) feeds a separate classifier, e.g. an SVM, for the new task ("dotted?").
Outputs of the penultimate layer of an ImageNet-trained CNN provide excellent general-purpose image features.
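A minimal sketch of this headless-DNN approach, assuming Keras's pre-trained ResNet-50 and scikit-learn's LinearSVC; the `train_paths`/`train_labels` variables and the binary "dotted?" task are placeholders rather than part of the deck.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.svm import LinearSVC

# ResNet-50 trained on ImageNet, with the output layer stripped.
# pooling='avg' returns the penultimate-layer 2048-dim feature vector per image.
base = ResNet50(weights='imagenet', include_top=False, pooling='avg')

def featurize(paths):
    """Turn a list of image file paths into ResNet-50 feature vectors."""
    batch = np.stack([
        image.img_to_array(image.load_img(p, target_size=(224, 224)))
        for p in paths
    ])
    return base.predict(preprocess_input(batch))

# train_paths / train_labels are assumed to exist: a small labeled dataset
# for the new task (e.g. dotted? yes/no).
X_train = featurize(train_paths)
classifier = LinearSVC().fit(X_train, train_labels)
```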
22. Pre-Built CNN from a General Task on Millions of Images
[Diagram] The output layer is stripped and one or more layers are trained in the new network.
Using a pre-trained DNN, an accurate model can be achieved with thousands (or fewer) of labeled examples instead of millions.
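A corresponding sketch of the fine-tuning variant: strip the original output layer, attach a new randomly initialized head for the new task, and train featurization and output jointly with a small learning rate. The optimizer settings and the binary task are illustrative assumptions.

```python
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD

# Pre-trained featurization layers; the original 1000-class output layer is stripped.
base = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# New randomly initialized output layer for the new task (e.g. dotted? yes/no).
outputs = Dense(1, activation='sigmoid')(base.output)
model = Model(inputs=base.input, outputs=outputs)

# Fine-tune featurization and output jointly with a small learning rate,
# so the pre-trained weights are only gently adjusted.
model.compile(optimizer=SGD(learning_rate=1e-4, momentum=0.9),
              loss='binary_crossentropy', metrics=['accuracy'])

# X, y: a few thousand labeled, preprocessed images for the new task (assumed to exist).
model.fit(X, y, batch_size=32, epochs=5)
```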
26. Distributed Training Architecture
Data Parallelism
1. Parallel training on different machines.
2. Update the parameter server synchronously/asynchronously.
3. Refresh the local model with the new parameters, then go to step 1 and repeat (a toy sketch follows this slide).
Model Parallelism
1. The global model is partitioned into K sub-models without overlap.
2. The sub-models are distributed over K local workers and serve as their local models.
3. In each mini-batch, the local workers compute the gradients of the local weights by back-propagation.
Credits: Taifeng Wang, DMTK team
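A toy NumPy sketch of the data-parallelism loop: workers compute gradients on their own data shards, a central "parameter server" averages them synchronously, and the refreshed parameters are sent back to the workers. The linear model, shard count, and learning rate are stand-ins, not part of the original material.

```python
import numpy as np

def gradient(w, X, y):
    # Gradient of mean squared error for a linear model (stand-in for backprop).
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1024, 10)), rng.normal(size=1024)
shards = np.array_split(np.arange(1024), 4)   # 4 workers, disjoint data shards

w = np.zeros(10)                              # parameters held by the server
for step in range(100):
    # 1. Parallel training: each worker computes a gradient on its own shard.
    grads = [gradient(w, X[idx], y[idx]) for idx in shards]
    # 2. Synchronous update on the parameter server (average of worker gradients).
    w -= 0.01 * np.mean(grads, axis=0)
    # 3. Workers refresh their local model with the new parameters and repeat.
```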
27. Selecting Big Data and AI Infrastructure
• Desktop / Laptop
• Spark Clusters
• Kubernetes Clusters
• Cloud
29. What can we run on the deep learning infrastructure?
30. Using Spark for Machine Learning
• Users need to write lots of "glue" code to prepare features for ML algorithms (see the sketch after this list).
  • Coerce types and data layout to what's expected by the learner
  • Use different conventions for different learners
• Lack of domain-specific libraries: computer vision, text analytics, ...
  • Latest: Image Data Support in Apache Spark 2.3
    https://blogs.technet.microsoft.com/machinelearning/2018/03/05/image-data-support-in-apache-spark/
• Limited capabilities for model evaluation & model management
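To illustrate the "glue" code referenced above, here is a typical plain Spark ML pipeline in which a categorical column must be indexed and encoded, and all features assembled into a single vector column in the layout the learner expects. The column names and `train_df` are hypothetical.

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
from pyspark.ml.classification import LogisticRegression

# Glue code: index and encode the categorical column, then assemble all
# features into the single vector column the learner expects.
indexer = StringIndexer(inputCol="education", outputCol="education_idx")
encoder = OneHotEncoder(inputCol="education_idx", outputCol="education_vec")
assembler = VectorAssembler(
    inputCols=["age", "hours_per_week", "education_vec"],
    outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

# train_df is an assumed DataFrame with the columns named above.
model = Pipeline(stages=[indexer, encoder, assembler, lr]).fit(train_df)
```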
31. Microsoft Machine Learning Library for Apache Spark (MMLSpark)
GitHub Repo: https://github.com/Azure/mmlspark
Get started now using the Docker image:
docker run -it -p 8888:8888 -e ACCEPT_EULA=yes microsoft/mmlspark
Navigate to http://localhost:8888 to view the example Jupyter notebooks
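For contrast with the plain Spark ML pipeline above, a short sketch of how MMLSpark can reduce that glue code, assuming the TrainClassifier and ComputeModelStatistics helpers from the repo linked above; the DataFrames and label column are placeholders.

```python
from pyspark.ml.classification import LogisticRegression
from mmlspark import TrainClassifier, ComputeModelStatistics

# TrainClassifier handles featurization (type coercion, vector assembly)
# internally, so the raw DataFrame can be passed in directly.
model = TrainClassifier(model=LogisticRegression(), labelCol="label").fit(train_df)

# Score the held-out data and compute standard evaluation metrics.
metrics = ComputeModelStatistics().transform(model.transform(test_df))
metrics.show()
```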
34. Discovering Your AI Super Powers - Tips and Tricks to Jumpstart Your AI Project
Wee Hyong Tok, PhD
Principal Data Science Manager
Microsoft
@weehyong
Global Artificial Intelligence Conference 2018, Seattle
Thank You!