SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
Adel El Hallak – Director of Product Management
Phil Rogers – Chief Software Architect
NGC
March 2019
2
GRAND CHALLENGES REQUIRE MASSIVE COMPUTING
MEDICAL IMAGINGASTROPHYSICS NUCLEAR FUSIONAUTONOMOUS DRIVING WEATHERGENOMICS
3
DIFFERENT ROLES. SAME GOALS.
Driving Productivity and Faster Time-to-Solutions
Developers
Speed up development with
existing building blocks
Data Scientists and
Researchers
Eliminate mundane tasks, focus on
science and research
Sysadmins
Deploy to production
immediately
4©2018 VMware, Inc.
OPTIMIZATIONINSTALLATION
Complex, time
consuming, and error-
prone
Requires expertise to
optimize framework
performance
IT can’t keep up with
frequent software
upgrades
Users limited to older
features and lower
performance
CHALLENGES UTILIZING AI & HPC SOFTWARE
MAINTAINENCEPRODUCTIVITYEXPERTISE
Building AI-centric
solutions requires
expertise
5©2018 VMware, Inc.
OPTIMIZED
SOFTWARE
FASTER
DEPLOYMENTS
Eliminates installations.
Simply Pull & Run the
app
Key DL frameworks
updated monthly for perf
optimization
Empowers users to
deploy the latest versions
with IT support
Better Insights and faster
time-to-solution
NGC – SIMPLIFYING AI & HPC WORKFLOWS
ZERO
MAINTENANCE
HIGHER
PRODUCTIVITY
EMBEDDING
EXPERTISE
Deliver greater value,
faster
6©2018 VMware, Inc.
ANNOUNCING NEW NGC CAPABILITIES
Workflow/ValueChain
Containers Containers Models
Model Scripts
Classification | Object Detection | NLP/Translation
Text to Speech | Recommender
Industry Solutions
Smart Cities | Medical Imaging
Training SDK Deployment SDK
KubeFlow
Pipelines
Proficiency of Skills
Advanced ML/DL Practitioner Developers & Data Scientists
7
THE NEW NGC
GPU-optimized Software Hub. Simplifying DL, ML and HPC Workflows
NGC
50+ Containers
DL, ML, HPC
Pre-trained Models
NLP, Classification, Object Detection & more
Industry Workflows
Medical Imaging, Intelligent Video Analytics
Model Training Scripts
NLP, Image Classification, Object Detection & more
Innovate Faster
Deploy Anywhere
Simplify Deployments
8
CONTAINERS
9
CONTAINERS: SIMPLIFYING WORKFLOWS
Simplifies Deployments
- Eliminates complex, time-consuming builds and
installs
Get started in minutes
- Simply Pull & Run the app
Portable
- Deploy across various environments, from test to
production with minimal changes
WHY CONTAINERS
10
NGC CONTAINERS: ACCELERATING WORKFLOWS
Simplifies Deployments
- Eliminates complex, time-consuming builds and
installs
Get started in minutes
- Simply Pull & Run the app
Portable
- Deploy across various environments, from test to
production with minimal changes
WHY CONTAINERS
Optimized for Performance
- Monthly DL container releases offer latest features and
superior performance on NVIDIA GPUs
Scalable Performance
- Supports multi-GPU & multi-node systems for scale-up &
scale-out environments
Designed for Enterprise & HPC environments
- Supports Docker & Singularity runtimes
Run Anywhere
- Pascal/Volta/Turing-powered NVIDIA DGX, PCs,
workstations, servers and top cloud platforms
WHY NGC CONTAINERS
11
GPU-OPTIMIZED SOFTWARE CONTAINERS
Over 50 Containers on NGC
DEEP LEARNING
HPC
INFERENCE
NAMD | GROMACS | more
MACHINE LEARNING
RAPIDS | H2O | moreTensorRT | DeepStream | more
GENOMICS
Parabricks
VISUALIZATION
ParaView | IndeX | more
TensorFlow | PyTorch | more
12
DALI
CPU Bottleneck Waste GPU Cycles
- Complex I/O pipelines
- Multi-pipeline frameworks
- Decreasing CPU:GPU ratio
Eliminating CPU Bottleneck for DL Workflows
DALI Shifts Workloads to GPUs
- Full input pipeline acceleration including
data loading and augmentation
- Integrated in PyTorch, TF, MxNET
- Supports Resnet50 & SSD
Loader Decode Resize Augment Training Loader Decode Resize Augment Training
Image classification (ResNet-50)
13
FP32
Higher Precision
Range: +/- 3.402823x1038
TENSOR CORES BUILT FOR AI AND HPC
Mixed Precision Accelerator – Enabled by AMP
4x4 Product and Accumulate
FP32 = FP16 x FP16 + FP32
FP16
Reduced Precision
Higher Performance
Range: +/- 65,504
4x4 Matrix
16 FP16 values
4x4 Matrix
16 FP16 values
8X hardware throughput of FP32
2-4X gains seen in DL Training
D=A*B+C
Memory Savings
• Half Storage Requirements (larger
batch size)
• Half the memory traffic by reducing
size of gradient/activation tensors
14
MIXED PRECISION MAINTAINS ACCURACY
Benefit From Higher Throughput Without Compromise
ILSVRC12 classification top-1 accuracy.
(Sharan Narang, Paulius Micikevicius et al., "Mixed Precision Training“, ICLR 2018)
**Same hyperparameters and learning rate schedule as FP32.
0.00%
10.00%
20.00%
30.00%
40.00%
50.00%
60.00%
70.00%
80.00%
AlexNet VGG-D GoogleNet
(Inception v1)
Inception v2 Inception v3 Resnet50
ModelAccuracy
FP32 Mixed Precision**
15
CONTINUOUS PERFORMANCE IMPROVEMENT
Developers’ Software Optimizations Deliver Better Performance on the Same Hardware
Monthly DL Framework Updates & HPC Software Stack Optimizations Drive Performance
0
2000
4000
6000
8000
10000
12000
18.02 18.09 19.02
Images/Second
MxNet
Mixed Precision | 128 Batch Size | ResNet-50
Training | 8x V100
0
50000
100000
150000
200000
250000
300000
350000
400000
18.05 18.09 19.02
Tokens/Second PyTorch
0
1000
2000
3000
4000
5000
6000
7000
8000
18.02 18.09 19.02Images/Second
TensorFlow
Mixed Precision | 128 Batch Size | GNMT | 8x V100 Mixed Precision | 256 Batch Size | ResNet-50
Training | 8x V100
Speedup across Chroma, GROMACS, LAMMPS, QE, MILC, VASP,
SPECFEM3D, NAMD, AMBER, GTC, RTM | 4x V100 v. Dual-Skylake
| CUDA 9 for Mar '18 & Nov '18, CUDA 10 for Mar '19
x
2x
4x
6x
8x
10x
12x
14x
16x
18x
Mar '18 Nov '18 Mar '19
HPC Applications
16
MODEL REGISTRY &
MODEL SCRIPTS
17
ANNOUNCING THE NGC MODEL REGISTRY
Repository of Popular AI Models
Starting point to retrain, prototype or
benchmark against your own models
Use As-Is or easily customize
Private hosted registry for NGC Enterprise
accounts to upload, share and version
18
DOMAIN SPECIFIC | INFERENCE-READY
PRE-TRAINED MODELS
Domain specific for video analytics and
medical imaging
Use transfer learning and your own
data to quickly create accurate AI
Available models: Organ & tumor
segmentation, x-ray classification,
classification and object detection for
video analytics
TENSORRT MODELS
Ready for inference with Tensor Cores
Precision: INT8, FP16, FP32
Optimized for multiple GPU architectures
Available Models: ResNet50, VGG16,
InceptionV1, Mobilenet
19
20
LEARN | BUILD | OPTIMIZE | DEPLOY
MODEL SCRIPTS
Best practices for training models
Faster Performance with Optimized
Libraries and Tensor Cores
State-of-the-Art Accuracy
Scripts for Classification, Detection,
Recommendation, NLP, Segmentation,
Speech Synthesis, Translation
21
22
INDUSTRY SOLUTIONS
23
END-TO-END DEEP LEARNING WORKFLOW
Pre-Trained Models | Training & Adaptation | Ready to Integrate
Accelerate time to market
REMOVE PRODUCT NAMES
INFER ON
INPUT DATA
ANALYZE DATA.
DISPLAY RESULTS
OUTPUT
MODEL
RETRAINOPTIMIZETRAIN WITH
NEW DATA
PRETRAINED
MODEL
24
Transfer Learning Toolkit
PRE-TRAINED MODELS
PRUNETRAIN
Data
Converters
SAMPLE TRAINING PIPELINES
TUNE
AI
Inference
PERCEPTION GRAPH
TrackingCalibration
Processing
REFERENCE APPLICATIONS
Analytics
Visualize
DeepStream SDK
Decoding
NVIDIA METROPOLIS
Intelligent Video Analytics for Smart Cities
RETRAIN WITH
NEW DATA`
QUERY
VIDEO FEED
Frames detecting new class of objects
25
Clara Train SDK
PRE-TRAINED MODELS
TRANSFER
LEARNING
AI-ASSISTED
ANNOTATION
DICOM 2
NIFTI
TRAINING PIPELINES
TUNE
TRT INFERENCE
SERVER
PIPELINE MANAGER
STREAMING RENDER
DICOM
ADAPTER DEPLOYMENT PIPELINES WEBUI
Clara Deploy SDK
NVIDIA CLARA AI PLATFORM
Organ Segmentation for Medical Imaging
RETRAIN WITH
NEW DATA
CT SCANS OF
PATIENT’S LIVER
SEGMENTED LIVER
26
NGC-READY SYSTEMS & SUPPORT
SERVICES
27
NGC-READY SYSTEMS
VALIDATED FOR
FUNCTIONALITY &
PERFORMANCE OF NGC
SOFTWARE
T4 & V100-ACCELERATED
28
NVIDIA NGC SUPPORT SERVICES
Minimize Downtime And Maximize System Utilization
Availability
• Exclusively for V100 & T4
NGC-Ready systems
• Availability
Now: Cisco
Q2: Dell, HPE, Lenovo
• Agreement between NVIDIA
& end-customer
• Purchase from OEM
L1-L3 Support by NVIDIA’s
subject matter expert
• Live phone support during local
biz hours
• 24/7 phone, portal, email to
create support cases
Support Coverage
• NGC DL & ML containers
• NVIDIA drivers
• Kubernetes Device Plug-In
• NVIDIA Container Runtime
• CUDA
29©2018 VMware, Inc.
RUN ANYWHERE
NVIDIA Support Services
NGC
Containers
MODEL SCRIPTS
Models
NGC-Ready Systems
NVIDIA
DGX Systems
CLI
UI
All Major CSPs
Workflow/ValueChain
INDUSTRY SOLUTIONS
TRANSFER
LEARNING
TOOLKITS
KUBEFLOW
PIPELINES
DEPLOYMENT
SDKs
Medical ImagingSmart Cities
30
THE NEW NGC
GPU-optimized Software Hub. Simplifying DL, ML and HPC Workflows
NGC
50+ Containers
DL, ML, HPC
Pre-trained Models
NLP, Classification, Object Detection & more
Industry Workflows
Medical Imaging, Intelligent Video Analytics
Model Training Scripts
NLP, Image Classification, Object Detection & more
Innovate Faster
Deploy Anywhere
Simplify Deployments
31
GET STARTED WITH NGC
Deploy containers:
ngc.nvidia.com
Learn more about NGC offering:
nvidia.com/ngc
Technical information:
developer.nvidia.com
Explore the NGC Registry for DL, ML & HPC
32
GTC TALKS & RESOURCES
L9128 - High Performance Computing Using Containers WORKSHOP TU 10-12
S9525 - Containers Democratize HPC TU 1-2
S9500 - Latest Deep Learning Framework Container Optimizations W 9-10
SE285481 - NGC User Meetup W 7-9
Connect With the Experts
- NGC W 1-2
- NVIDIA Transfer Learning Toolkit for Industry Specific Solutions TU 1-2 & W 2-3
- DL Developer Tool for Network Optimization W 5-6

Weitere ähnliche Inhalte

Mehr von inside-BigData.com

Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...inside-BigData.com
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolversinside-BigData.com
 
Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architecturesinside-BigData.com
 
SW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computingSW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computinginside-BigData.com
 
Deep Learning State of the Art (2020)
Deep Learning State of the Art (2020)Deep Learning State of the Art (2020)
Deep Learning State of the Art (2020)inside-BigData.com
 

Mehr von inside-BigData.com (20)

Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 
Data Parallel Deep Learning
Data Parallel Deep LearningData Parallel Deep Learning
Data Parallel Deep Learning
 
Making Supernovae with Jets
Making Supernovae with JetsMaking Supernovae with Jets
Making Supernovae with Jets
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolvers
 
Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architectures
 
SW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computingSW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computing
 
FPGAs and Machine Learning
FPGAs and Machine LearningFPGAs and Machine Learning
FPGAs and Machine Learning
 
Deep Learning State of the Art (2020)
Deep Learning State of the Art (2020)Deep Learning State of the Art (2020)
Deep Learning State of the Art (2020)
 

Kürzlich hochgeladen

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Kürzlich hochgeladen (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Simplifying AI, Data Science, and HPC Workloads with NGC

  • 1. Adel El Hallak – Director of Product Management Phil Rogers – Chief Software Architect NGC March 2019
  • 2. 2 GRAND CHALLENGES REQUIRE MASSIVE COMPUTING MEDICAL IMAGINGASTROPHYSICS NUCLEAR FUSIONAUTONOMOUS DRIVING WEATHERGENOMICS
  • 3. 3 DIFFERENT ROLES. SAME GOALS. Driving Productivity and Faster Time-to-Solutions Developers Speed up development with existing building blocks Data Scientists and Researchers Eliminate mundane tasks, focus on science and research Sysadmins Deploy to production immediately
  • 4. 4©2018 VMware, Inc. OPTIMIZATIONINSTALLATION Complex, time consuming, and error- prone Requires expertise to optimize framework performance IT can’t keep up with frequent software upgrades Users limited to older features and lower performance CHALLENGES UTILIZING AI & HPC SOFTWARE MAINTAINENCEPRODUCTIVITYEXPERTISE Building AI-centric solutions requires expertise
  • 5. 5©2018 VMware, Inc. OPTIMIZED SOFTWARE FASTER DEPLOYMENTS Eliminates installations. Simply Pull & Run the app Key DL frameworks updated monthly for perf optimization Empowers users to deploy the latest versions with IT support Better Insights and faster time-to-solution NGC – SIMPLIFYING AI & HPC WORKFLOWS ZERO MAINTENANCE HIGHER PRODUCTIVITY EMBEDDING EXPERTISE Deliver greater value, faster
  • 6. 6©2018 VMware, Inc. ANNOUNCING NEW NGC CAPABILITIES Workflow/ValueChain Containers Containers Models Model Scripts Classification | Object Detection | NLP/Translation Text to Speech | Recommender Industry Solutions Smart Cities | Medical Imaging Training SDK Deployment SDK KubeFlow Pipelines Proficiency of Skills Advanced ML/DL Practitioner Developers & Data Scientists
  • 7. 7 THE NEW NGC GPU-optimized Software Hub. Simplifying DL, ML and HPC Workflows NGC 50+ Containers DL, ML, HPC Pre-trained Models NLP, Classification, Object Detection & more Industry Workflows Medical Imaging, Intelligent Video Analytics Model Training Scripts NLP, Image Classification, Object Detection & more Innovate Faster Deploy Anywhere Simplify Deployments
  • 9. 9 CONTAINERS: SIMPLIFYING WORKFLOWS Simplifies Deployments - Eliminates complex, time-consuming builds and installs Get started in minutes - Simply Pull & Run the app Portable - Deploy across various environments, from test to production with minimal changes WHY CONTAINERS
  • 10. 10 NGC CONTAINERS: ACCELERATING WORKFLOWS Simplifies Deployments - Eliminates complex, time-consuming builds and installs Get started in minutes - Simply Pull & Run the app Portable - Deploy across various environments, from test to production with minimal changes WHY CONTAINERS Optimized for Performance - Monthly DL container releases offer latest features and superior performance on NVIDIA GPUs Scalable Performance - Supports multi-GPU & multi-node systems for scale-up & scale-out environments Designed for Enterprise & HPC environments - Supports Docker & Singularity runtimes Run Anywhere - Pascal/Volta/Turing-powered NVIDIA DGX, PCs, workstations, servers and top cloud platforms WHY NGC CONTAINERS
  • 11. 11 GPU-OPTIMIZED SOFTWARE CONTAINERS Over 50 Containers on NGC DEEP LEARNING HPC INFERENCE NAMD | GROMACS | more MACHINE LEARNING RAPIDS | H2O | moreTensorRT | DeepStream | more GENOMICS Parabricks VISUALIZATION ParaView | IndeX | more TensorFlow | PyTorch | more
  • 12. 12 DALI CPU Bottleneck Waste GPU Cycles - Complex I/O pipelines - Multi-pipeline frameworks - Decreasing CPU:GPU ratio Eliminating CPU Bottleneck for DL Workflows DALI Shifts Workloads to GPUs - Full input pipeline acceleration including data loading and augmentation - Integrated in PyTorch, TF, MxNET - Supports Resnet50 & SSD Loader Decode Resize Augment Training Loader Decode Resize Augment Training Image classification (ResNet-50)
  • 13. 13 FP32 Higher Precision Range: +/- 3.402823x1038 TENSOR CORES BUILT FOR AI AND HPC Mixed Precision Accelerator – Enabled by AMP 4x4 Product and Accumulate FP32 = FP16 x FP16 + FP32 FP16 Reduced Precision Higher Performance Range: +/- 65,504 4x4 Matrix 16 FP16 values 4x4 Matrix 16 FP16 values 8X hardware throughput of FP32 2-4X gains seen in DL Training D=A*B+C Memory Savings • Half Storage Requirements (larger batch size) • Half the memory traffic by reducing size of gradient/activation tensors
  • 14. 14 MIXED PRECISION MAINTAINS ACCURACY Benefit From Higher Throughput Without Compromise ILSVRC12 classification top-1 accuracy. (Sharan Narang, Paulius Micikevicius et al., "Mixed Precision Training“, ICLR 2018) **Same hyperparameters and learning rate schedule as FP32. 0.00% 10.00% 20.00% 30.00% 40.00% 50.00% 60.00% 70.00% 80.00% AlexNet VGG-D GoogleNet (Inception v1) Inception v2 Inception v3 Resnet50 ModelAccuracy FP32 Mixed Precision**
  • 15. 15 CONTINUOUS PERFORMANCE IMPROVEMENT Developers’ Software Optimizations Deliver Better Performance on the Same Hardware Monthly DL Framework Updates & HPC Software Stack Optimizations Drive Performance 0 2000 4000 6000 8000 10000 12000 18.02 18.09 19.02 Images/Second MxNet Mixed Precision | 128 Batch Size | ResNet-50 Training | 8x V100 0 50000 100000 150000 200000 250000 300000 350000 400000 18.05 18.09 19.02 Tokens/Second PyTorch 0 1000 2000 3000 4000 5000 6000 7000 8000 18.02 18.09 19.02Images/Second TensorFlow Mixed Precision | 128 Batch Size | GNMT | 8x V100 Mixed Precision | 256 Batch Size | ResNet-50 Training | 8x V100 Speedup across Chroma, GROMACS, LAMMPS, QE, MILC, VASP, SPECFEM3D, NAMD, AMBER, GTC, RTM | 4x V100 v. Dual-Skylake | CUDA 9 for Mar '18 & Nov '18, CUDA 10 for Mar '19 x 2x 4x 6x 8x 10x 12x 14x 16x 18x Mar '18 Nov '18 Mar '19 HPC Applications
  • 17. 17 ANNOUNCING THE NGC MODEL REGISTRY Repository of Popular AI Models Starting point to retrain, prototype or benchmark against your own models Use As-Is or easily customize Private hosted registry for NGC Enterprise accounts to upload, share and version
  • 18. 18 DOMAIN SPECIFIC | INFERENCE-READY PRE-TRAINED MODELS Domain specific for video analytics and medical imaging Use transfer learning and your own data to quickly create accurate AI Available models: Organ & tumor segmentation, x-ray classification, classification and object detection for video analytics TENSORRT MODELS Ready for inference with Tensor Cores Precision: INT8, FP16, FP32 Optimized for multiple GPU architectures Available Models: ResNet50, VGG16, InceptionV1, Mobilenet
  • 19. 19
  • 20. 20 LEARN | BUILD | OPTIMIZE | DEPLOY MODEL SCRIPTS Best practices for training models Faster Performance with Optimized Libraries and Tensor Cores State-of-the-Art Accuracy Scripts for Classification, Detection, Recommendation, NLP, Segmentation, Speech Synthesis, Translation
  • 21. 21
  • 23. 23 END-TO-END DEEP LEARNING WORKFLOW Pre-Trained Models | Training & Adaptation | Ready to Integrate Accelerate time to market REMOVE PRODUCT NAMES INFER ON INPUT DATA ANALYZE DATA. DISPLAY RESULTS OUTPUT MODEL RETRAINOPTIMIZETRAIN WITH NEW DATA PRETRAINED MODEL
  • 24. 24 Transfer Learning Toolkit PRE-TRAINED MODELS PRUNETRAIN Data Converters SAMPLE TRAINING PIPELINES TUNE AI Inference PERCEPTION GRAPH TrackingCalibration Processing REFERENCE APPLICATIONS Analytics Visualize DeepStream SDK Decoding NVIDIA METROPOLIS Intelligent Video Analytics for Smart Cities RETRAIN WITH NEW DATA` QUERY VIDEO FEED Frames detecting new class of objects
  • 25. 25 Clara Train SDK PRE-TRAINED MODELS TRANSFER LEARNING AI-ASSISTED ANNOTATION DICOM 2 NIFTI TRAINING PIPELINES TUNE TRT INFERENCE SERVER PIPELINE MANAGER STREAMING RENDER DICOM ADAPTER DEPLOYMENT PIPELINES WEBUI Clara Deploy SDK NVIDIA CLARA AI PLATFORM Organ Segmentation for Medical Imaging RETRAIN WITH NEW DATA CT SCANS OF PATIENT’S LIVER SEGMENTED LIVER
  • 26. 26 NGC-READY SYSTEMS & SUPPORT SERVICES
  • 27. 27 NGC-READY SYSTEMS VALIDATED FOR FUNCTIONALITY & PERFORMANCE OF NGC SOFTWARE T4 & V100-ACCELERATED
  • 28. 28 NVIDIA NGC SUPPORT SERVICES Minimize Downtime And Maximize System Utilization Availability • Exclusively for V100 & T4 NGC-Ready systems • Availability Now: Cisco Q2: Dell, HPE, Lenovo • Agreement between NVIDIA & end-customer • Purchase from OEM L1-L3 Support by NVIDIA’s subject matter expert • Live phone support during local biz hours • 24/7 phone, portal, email to create support cases Support Coverage • NGC DL & ML containers • NVIDIA drivers • Kubernetes Device Plug-In • NVIDIA Container Runtime • CUDA
  • 29. 29©2018 VMware, Inc. RUN ANYWHERE NVIDIA Support Services NGC Containers MODEL SCRIPTS Models NGC-Ready Systems NVIDIA DGX Systems CLI UI All Major CSPs Workflow/ValueChain INDUSTRY SOLUTIONS TRANSFER LEARNING TOOLKITS KUBEFLOW PIPELINES DEPLOYMENT SDKs Medical ImagingSmart Cities
  • 30. 30 THE NEW NGC GPU-optimized Software Hub. Simplifying DL, ML and HPC Workflows NGC 50+ Containers DL, ML, HPC Pre-trained Models NLP, Classification, Object Detection & more Industry Workflows Medical Imaging, Intelligent Video Analytics Model Training Scripts NLP, Image Classification, Object Detection & more Innovate Faster Deploy Anywhere Simplify Deployments
  • 31. 31 GET STARTED WITH NGC Deploy containers: ngc.nvidia.com Learn more about NGC offering: nvidia.com/ngc Technical information: developer.nvidia.com Explore the NGC Registry for DL, ML & HPC
  • 32. 32 GTC TALKS & RESOURCES L9128 - High Performance Computing Using Containers WORKSHOP TU 10-12 S9525 - Containers Democratize HPC TU 1-2 S9500 - Latest Deep Learning Framework Container Optimizations W 9-10 SE285481 - NGC User Meetup W 7-9 Connect With the Experts - NGC W 1-2 - NVIDIA Transfer Learning Toolkit for Industry Specific Solutions TU 1-2 & W 2-3 - DL Developer Tool for Network Optimization W 5-6