Women in Big Data Event Deep Learning Workflows and Use Cases
1. 10/20/2017 Women in Big Data Event Hashtags: #IamAI, #WiBD
Oct 18th AI Connect Speakers
• WiBD Introduction & DL Use Cases: Renee Yao, Product Marketing Manager, Deep Learning and Analytics, NVIDIA
• Deep Learning Workflows (w/ a demo): Kari Briski, Director of Deep Learning Software Product, NVIDIA
• Deep Learning in Enterprise: Nazanin Zaker, Data Scientist, SAP Innovation Center Network
3. Agenda
AI Connect
• 6:00-7:00pm – Registration and Networking
• 7:00-7:15pm – “WiBD Introduction & DL Use Cases”, Renee Yao, Product Marketing Manager, Deep Learning and Analytics, NVIDIA
• 7:15-7:45pm – “Deep Learning Workflows (with a live demo)”, Kari Briski, Director of Deep Learning Software Product, NVIDIA
• 7:45-8:15pm – “Deep Learning in Enterprise”, Nazanin Zaker, Data Scientist, SAP Innovation Center Network
• 8:15-8:30pm – Wrap-up & Giveaways
• February: Apache Hadoop Training @ Cloudera
• March: @ Strata+Hadoop World SJ
• May: Apache Drill and Apache Spark @ MapR
• June: Career Empowerment @ Andreessen Horowitz
• June: @ Spark Summit
• June: @ Hadoop Summit
4. Women in Big Data Forum
Be Part of The Solution
Become a member or a sponsor
• Website: womeninbigdata.org
• LinkedIn: “Women in Big Data Forum”
• Meetup: meetup.com/Women-in-Big-Data-Meetup/
• Twitter: @DataWomen
• Video: https://www.youtube.com/channel/UCOaMT7A9SVkeBdvYNxiITVA
Join us
7. AI APPLICATIONS
• COMPUTER VISION: Object Detection, Image Classification
• SPEECH & AUDIO: Voice Recognition, Language Translation
• NATURAL LANGUAGE PROCESSING: Recommendation Engines, Sentiment Analysis
8. AI APPLICATIONS
• COMPUTER VISION: Object Detection, Classification, Segmentation, Visual Q&A
• NATURAL LANGUAGE PROCESSING: Neural Machine Translation, Question & Answer, Sentiment Analysis, Search and recommendation engines
• SPEECH & AUDIO: ASR (automatic speech recognition), Generation, Processing, Audio-classification, Denoising
Example applications: Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
9. ACCELERATED DEEP LEARNING TRAINING STACK
AI Applications are Built on NVIDIA Hardware and Software End-to-End
• COMPUTER VISION, SPEECH AND AUDIO, NATURAL LANGUAGE PROCESSING
• Example applications: Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
10. NVIDIA TOOLS FOR DEEP LEARNING WORKFLOW
Accelerated Deep Learning Training Software Stack
DATA: GATHER AND LABEL
• Gather data
• Curate data sets
• Rapidly label data, guide training, get insights
TRAINING (NVIDIA DEEP LEARNING SDK)
• Data management: training data
• Training
• Model assessment
• Trained network
DEPLOY WITH TENSORRT
• Embedded: Jetson TX
• Automotive: Drive PX (Xavier)
• Data center: Tesla (Pascal, Volta)
15.
STEP 1: Project Setup (Project Manager)
• Project named
• Classifier types defined
• Labeling task settings defined
• Sequences added
STEP 2: Curation (Curator)
• Which pieces of data make the most sense to us
STEP 3: Labeling (Data Labeler)
• Labels created
• Attributes of labels selected
• Frames committed for QA
STEP 4: QA (Data Labeler)
• Frames accepted or rejected
• Rejection reason specified
STEP 5: Export (Data Labeler)
• Data set sent to training
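The labeling, QA, and export steps on this slide can be read as a small state machine per frame. The sketch below is only an illustration under that assumption: the names `Frame`, `FrameState`, `label_frame`, `review_frame`, and `export_for_training` are hypothetical and do not come from any NVIDIA tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class FrameState(Enum):
    UNLABELED = "unlabeled"
    COMMITTED_FOR_QA = "committed_for_qa"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Frame:
    # Hypothetical record for one frame of a sequence.
    frame_id: int
    labels: List[str] = field(default_factory=list)
    state: FrameState = FrameState.UNLABELED
    rejection_reason: Optional[str] = None


def label_frame(frame: Frame, labels: List[str]) -> None:
    """Labeling step: create labels and commit the frame for QA."""
    if not labels:
        raise ValueError("at least one label is required")
    frame.labels = list(labels)
    frame.state = FrameState.COMMITTED_FOR_QA
    frame.rejection_reason = None


def review_frame(frame: Frame, accept: bool, reason: Optional[str] = None) -> None:
    """QA step: accept or reject a committed frame; a rejection needs a reason."""
    if frame.state is not FrameState.COMMITTED_FOR_QA:
        raise ValueError("only frames committed for QA can be reviewed")
    if accept:
        frame.state = FrameState.ACCEPTED
    else:
        if not reason:
            raise ValueError("a rejection reason must be specified")
        frame.state = FrameState.REJECTED
        frame.rejection_reason = reason


def export_for_training(frames: List[Frame]) -> List[Tuple[int, List[str]]]:
    """Export step: only accepted frames are sent to training."""
    return [(f.frame_id, f.labels) for f in frames if f.state is FrameState.ACCEPTED]
```

A rejected frame keeps its reason and can be relabeled with `label_frame`, which commits it for QA again.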
17. NVIDIA DEEP LEARNING SOFTWARE TRAINING STACK
• Applications: COMPUTER VISION, SPEECH AND AUDIO, NATURAL LANGUAGE PROCESSING (Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis)
• UI / JOB MANAGEMENT / DATASET VERSIONING / VISUALIZATION: DIGITS, NVIDIA GPU Cloud, HumanLoop, MagLev, Keras
• Runs: At Your Desk, On-Prem, In-the-Cloud
18. ACCELERATED DEEP LEARNING TRAINING STACK
• Applications: COMPUTER VISION, SPEECH AND AUDIO, NATURAL LANGUAGE PROCESSING (Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis)
• Productivity (Workflow, Data and Job Management, Experiments): UI / JOB MANAGEMENT / DATASET VERSIONING / VISUALIZATION with DIGITS, NVIDIA GPU Cloud, HumanLoop, MagLev, Keras
• Deep Learning Software Libraries (AKA Frameworks): DEEP LEARNING FRAMEWORKS
• Architecture Specific Libraries: DEEP LEARNING (cuDNN), MATH LIBRARIES (cuBLAS, cuSPARSE, cuFFT), COMMUNICATION (NCCL)
• Runs: At Your Desk, On-Prem, In-the-Cloud
19. ACCELERATED DEEP LEARNING TRAINING STACK
• Applications: COMPUTER VISION, SPEECH AND AUDIO, NATURAL LANGUAGE PROCESSING (Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis)
• UI / JOB MANAGEMENT / DATASET VERSIONING / VISUALIZATION: DIGITS, NVIDIA GPU Cloud, NVDocker, Keras, Kubernetes
• Frameworks (NV OPTIMIZED, NV ACCELERATED): Paddle
• Libraries: DEEP LEARNING (cuDNN), MATH LIBRARIES (cuBLAS, cuSPARSE, cuFFT), COMMUNICATION (NCCL)
• Runs: At Your Desk, On-Prem, In-the-Cloud
20. GENERATIONAL GPU PERFORMANCE & TENSOR CORES
[Chart: Single GPU Generational Training Scaling, ResNet-50; 1, 4, 8 GPU training on DGX-1 Volta. Bars: K80, P100, V100, V100 TC; vertical axis 0 to 8.]
21. GENERATIONAL GPU PERFORMANCE & TENSOR CORES
[Chart: Single GPU Generational Training Scaling, ResNet-50; 1, 4, 8 GPU training on DGX-1 Volta. Bars: K80, P100, V100, V100 TC; vertical axis 0 to 8.]
3-3.5X CNN training over Pascal with Volta Tensor Core math
22. TIME TO SOLUTION (HOURS)
Convolutional Neural Networks: Training ImageNet to accuracy (90 epochs) with ResNet-50
[Chart: 8x K80, 8x P100, 8x V100; horizontal axis 0 to 50 hours]
Recurrent Neural Networks: Training OpenNMT to accuracy (13 epochs)
[Chart: K80, P100, V100; horizontal axis 0 to 40 hours]
Callouts: 1 weekend, 1 day, 1 afternoon
27. NVIDIA TENSORRT
Maximize inference throughput for latency-critical services
High-performance neural network inference optimizer and runtime engine for production deployment
Trained network (model) -> TensorRT Optimizer -> optimized network -> TensorRT Runtime Engine
• Embedded: Jetson TX
• Automotive: Drive PX (Xavier)
• Data center: Tesla (Pascal, Volta)
28. NVIDIA TENSORRT PROGRAMMABLE INFERENCING PLATFORM
TensorRT runs on:
• TESLA V100
• DRIVE PX 2
• TESLA P4
• JETSON TX2
• NVIDIA DLA
29. NVIDIA TensorRT
Programmable Inference Accelerator
• Maximize throughput and minimize latency
• Deploy reduced precision without retraining and without accuracy loss
• Train in any framework, deploy in TensorRT without overhead
Targets: Embedded (Jetson), Automotive (Drive PX), Data center (Tesla)
developer.nvidia.com/tensorrt
30. VOLTA ON A BUDGET: LATENCY BENCHMARKS
Throughput on a 200 ms latency budget
[Chart: Throughput (image/s) vs Latency (ms) for ResNet-50 (ImageNet) and OpenNMT (English to German). Series: CPU-Only, V100 + TensorFlow, V100 + TensorRT; throughput axis 0 to 6000; data labels: 19, 6, 7; callouts: 3X, 6X.]
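"Throughput on a latency budget" can be made concrete with a short calculation: among the batch sizes whose measured latency fits the budget, pick the one with the highest throughput. The sketch below assumes you already have per-batch latency measurements. The function name `best_throughput` and the numbers in the usage example are hypothetical, made up for illustration, and are not the benchmark results from this slide.

```python
from typing import Dict, Optional, Tuple


def best_throughput(latency_ms_by_batch: Dict[int, float],
                    budget_ms: float) -> Optional[Tuple[int, float]]:
    """Return (batch_size, items_per_second) for the highest-throughput
    batch size whose latency fits the budget, or None if none fits."""
    best = None
    for batch, latency_ms in latency_ms_by_batch.items():
        if latency_ms > budget_ms:
            continue  # this batch size misses the latency budget
        throughput = batch * 1000.0 / latency_ms  # items per second
        if best is None or throughput > best[1]:
            best = (batch, throughput)
    return best


if __name__ == "__main__":
    # Hypothetical latency measurements (ms) per batch size, for illustration only.
    measured = {1: 20.0, 8: 60.0, 32: 160.0, 64: 400.0}
    print(best_throughput(measured, budget_ms=200.0))
```

Larger batches usually raise throughput but also raise latency, which is why the budget is the binding constraint for latency-critical services.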
31. ENABLE INT8 INFERENCE
TensorRT is the enabler for entropy quantization
Training framework (fp32) -> TensorRT: calibrate & quantize, using 100's of samples of training data -> int8 inference

Network       FP32 TOP 1   INT8 TOP 1   DIFFERENCE
Alexnet       57.22%       56.96%       0.26%
Googlenet     68.87%       68.49%       0.38%
VGG           68.56%       68.45%       0.11%
Resnet-50     73.11%       72.54%       0.57%
Resnet-101    74.58%       74.14%       0.44%
Resnet-152    75.18%       74.56%       0.61%

Maintain accuracy without retraining
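The calibrate-then-quantize idea can be sketched in a few lines. Note the substitution: the slide describes entropy-based calibration, while this sketch uses a simpler percentile threshold to choose the saturation point, so it is not TensorRT's algorithm or API. All function names here (`calibrate_scale`, `quantize`, `dequantize`) are hypothetical.

```python
from typing import List, Sequence


def calibrate_scale(samples: Sequence[float], percentile: float = 99.9) -> float:
    """Pick a saturation threshold from calibration samples and return the
    scale that maps that threshold to the int8 value 127.
    Assumption: percentile threshold, NOT entropy calibration."""
    magnitudes = sorted(abs(x) for x in samples)
    if not magnitudes:
        raise ValueError("calibration needs at least one sample")
    index = min(len(magnitudes) - 1, int(len(magnitudes) * percentile / 100.0))
    threshold = magnitudes[index]
    if threshold == 0.0:
        raise ValueError("calibration samples are all zero")
    return threshold / 127.0


def quantize(values: Sequence[float], scale: float) -> List[int]:
    """Symmetric linear quantization: round to nearest, clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Map int8 values back to approximate fp32 values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    # Hypothetical calibration data standing in for "100's of samples".
    calibration = [i / 100.0 for i in range(-100, 101)]
    scale = calibrate_scale(calibration, percentile=100.0)
    print(quantize([1.0, -1.0, 0.0, 2.0], scale))
```

Values beyond the threshold saturate at 127 or -127; values inside it are recovered to within half a quantization step, which is the trade-off a calibrator tunes.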
32. NVIDIA TENSORRT
Maximize inference throughput for latency-critical services
• EMBEDDED (Jetson TX): low power, small footprint, multi-inference
• AUTOMOTIVE (Drive PX (Xavier)): real-time execution, high resolution, high throughput, small footprint
• DATA CENTER (Tesla (Pascal, Volta)): large batch, low latency, production-ready
33.
“On average TensorRT has doubled the speed of our inference which is pretty amazing!”
Source: Paul Kruszewski, CEO, WRNCH
“On average we see around 10x speedup, with between 3-70x speedups depending on the scenarios”
Source: Matthew Zeiler, CEO, Clarifai
“Self-driving car’s having real-time execution is obviously very important. With our ResNet101 network, TensorRT brought our inference time down from 250ms to 89ms.”
37. DL DATACENTER WORKFLOW
TensorRT improves productivity and shortens time to results
• MODEL ZOO
• TRAIN
• SCORE + OPTIMIZE, VISUALIZATION
• DEPLOY: tune, compile + runtime (automated with TensorRT)
• INFERENCE & MICROSERVICES: REST API
• RESULT: inference, prediction
• A/B testing, use data
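The "inference & microservices" stage with a REST API can be pictured as a handler that turns a JSON request into a JSON prediction. The sketch below is a minimal illustration, not a real deployment: the request field `features`, the function names, and the stub model are all hypothetical, and a production service would call an optimized inference engine instead of `stub_model`.

```python
import json
from typing import List, Tuple


def stub_model(features: List[float]) -> float:
    """Hypothetical stand-in for a deployed model: returns the mean as a score."""
    return sum(features) / len(features)


def handle_predict(request_body: str) -> Tuple[int, str]:
    """Parse a JSON request, run inference, and return
    (HTTP status code, JSON response body)."""
    try:
        payload = json.loads(request_body)
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("empty feature list")
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing field, or non-numeric input.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"prediction": stub_model(features)})


if __name__ == "__main__":
    print(handle_predict('{"features": [1, 2, 3]}'))
```

Keeping the handler a pure function of the request body makes it easy to test separately from whichever web framework serves it.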
41. WHO, WHAT, WHERE
RESEARCHERS
Explore the “next big thing” opportunity to fuel business
APPLIED DL / DATA SCIENTISTS
Retrain w/ data, productize models for consistency, focus on quality
APPLICATION DEVELOPER
Scale and deploy successful applications w/ great user experience
42. WHO, WHAT, WHERE
RESEARCHERS
Explore the “next big thing” opportunity to fuel business, and find ways to productize it
APPLIED DL / DATA SCIENTISTS
Retrain, productize models for consistency, quality, tuning with right data
APPLICATION DEVELOPER
Scale and deploy successful applications w/ great user experience
Example applications: Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
Paddle
43. WHO, WHAT, WHERE
RESEARCHERS
Explore the “next big thing” opportunity to fuel business, and find ways to productize it
DATA SCIENTISTS
Retrain, productize models for consistency, quality, tuning with right data
APPLICATION DEVELOPER
Scale and deploy successful applications w/ great user experience
Example applications: Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
Training or Deploying
Paddle
TensorRT