Artificial intelligence is impacting all areas of society, from healthcare and transportation to smart cities and energy. Learn how NVIDIA invests in both internal pure research and accelerated computing to enable its diverse customer base across gaming and extended reality, graphics, AI, robotics, simulation, high-performance scientific computing, healthcare, and more. You will be introduced to the GPU computing platform and shown successfully deployed real-world applications, as well as a glimpse of the current state of the art across academia, enterprise, and startups.
7. 8
NVIDIA SELENE
Featuring NVIDIA DGX A100 640GB
4,480 A100 GPUs
560 DGX A100 systems
850 Mellanox 200G HDR switches
14 PB of high-performance storage
2.8 EFLOPS of AI peak performance
63 PFLOPS HPL @ 24GF/W
https://blogs.nvidia.com/blog/2020/12/18/nvidia-selene-busy/
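The headline figures above can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted on the slide; the derived values (GPUs per system, AI peak per GPU, and the power draw implied by the HPL result) are back-of-envelope estimates from rounded inputs, not published specifications.

```python
# Back-of-envelope checks on the Selene figures quoted on the slide.
GPUS = 4_480
SYSTEMS = 560
AI_PEAK_FLOPS = 2.8e18          # 2.8 EFLOPS of AI peak performance
HPL_FLOPS = 63e15               # 63 PFLOPS HPL
EFFICIENCY_FLOPS_PER_W = 24e9   # 24 GF/W

gpus_per_system = GPUS // SYSTEMS
ai_peak_per_gpu_tflops = AI_PEAK_FLOPS / GPUS / 1e12
# Derived estimate only: HPL rate divided by the quoted efficiency.
implied_hpl_power_mw = HPL_FLOPS / EFFICIENCY_FLOPS_PER_W / 1e6

print(f"GPUs per DGX A100 system: {gpus_per_system}")
print(f"AI peak per GPU: {ai_peak_per_gpu_tflops:.0f} TFLOPS")
print(f"Implied power during HPL: {implied_hpl_power_mw:.1f} MW")
```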
8. SINGLE A100 WITH MIG RUNS ALL MLPERF TESTS…
AT THE SAME TIME
Delivers 98% of Performance of a Single MIG Instance Running Alone
MLPerf v1.0 Inference Closed; Per-accelerator performance derived from the best MLPerf results for respective submissions using reported accelerator count in Data Center Offline and Server. 3D U-Net 99%, ResNet-50, SSD-Large, DLRM 99%, RNN-T, BERT 99%: 1.0-26. MLPerf name and logo are trademarks. See www.mlperf.org for more information.
[Chart: single A100 with 7 MIG instances enabled; workloads: ResNet-50 v1.5, 3D-UNet 99%, RNN-T, BERT-Large, SSD-Large, DLRM, ResNet-50 v1.5; 98% performance vs. a MIG instance running alone.]
9. TODAY’S AI
DATA CENTER
50 DGX-1 systems for AI training
600 CPU systems for AI inference
$11M
25 racks
630 kW
10. 5 DGX A100 systems for AI training
and inference
$1M
1 rack
28 kW
1/10th
COST
1/20th
POWER
$1M 28 kW
DGX A100
DATA CENTER
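The "1/10th cost" and "1/20th power" headlines follow from the figures on the two data center slides. The short sketch below recomputes the ratios from those quoted numbers; the dictionary keys are illustrative names, and the slide rounds the results.

```python
# Figures as quoted on the two data center slides.
legacy = {"cost_musd": 11, "racks": 25, "power_kw": 630}   # 50 DGX-1 + 600 CPU systems
dgx_a100 = {"cost_musd": 1, "racks": 1, "power_kw": 28}    # 5 DGX A100 systems

# Ratio of the legacy configuration to the DGX A100 configuration per metric.
ratios = {key: legacy[key] / dgx_a100[key] for key in legacy}

for key, ratio in ratios.items():
    print(f"{key}: 1/{ratio:.1f} of the legacy configuration")
```

The exact ratios are 1/11 for cost and 1/22.5 for power, which the slide rounds to 1/10th and 1/20th.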
12. 13
EXPANDING NGC
NEW CONTAINERS FOR A100 & ARM
Now
NGC-READY SYSTEMS FOR A100
Starting Q3
NGC Private Registry
NGC Container
Environment Modules
Higher HPC app
performance w/ NVTAGS
NEW FEATURES
Now
Multi-arch support for x86,
Arm and Power
Learn More – ngc.nvidia.com | NGC Private Registry | NVTAGS | NGC Container Environment Modules
HPC Simulation & Visualization
AI Frameworks (A100)
Chroma
AutoDock 4
VMD
* Available week of June 22 ** Available starting with v20.06
13. 14
ENABLING ENTERPRISE TRANSFORMATION WITH AI
End to End Application Frameworks
Desktop Development | Data Center Solutions | Accelerated Edge | Supercomputers | GPU-Accelerated Cloud
Jarvis: Conversational AI
Merlin: Recommender Systems
Metropolis: Smart Cities
Clara: Healthcare
Isaac: Robotics
Drive: Autonomous Vehicles
Aerial: Telecom
21. 23
BUILDING AN AI PRODUCT
SENSORS
PERCEIVE REASON
PLAN
DATA
DATA
ANALYTICS
MACHINE
LEARNING
AI MODEL
VALIDATION
ACTUATORS
AI MODEL
22. INGESTION STORAGE PROCESSING SERVING
BIG DATA PIPELINE
Ingredients:
• Lots of data
• Lots of compute
• Software tools
• Time and patience
Method:
1. Collect raw, massive sets of data.
2. Put the data in a Data Lake.
3. Grab the data that you need and sort through it.
4. Find patterns in the data.
5. Solve the problem.
1. Ingestion: obtaining and importing data
2. Storage: organizing and storing data for future use
3. Processing: manipulating and analyzing the data
4. Serving: operationalizing the solution
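The four pipeline stages above (ingestion, storage, processing, serving) can be illustrated with a toy, in-memory sketch. Everything here is an assumption made for illustration: the `sensor,value` record format, the function names, and the use of a dictionary as a stand-in for a data lake. A production pipeline would use dedicated ingestion, storage, and serving systems.

```python
from collections import defaultdict
from statistics import mean


def ingest(raw_lines):
    """Ingestion: parse raw 'sensor,value' records, skipping malformed ones."""
    records = []
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        try:
            records.append((parts[0], float(parts[1])))
        except ValueError:
            continue
    return records


def store(records):
    """Storage: group records by sensor (in-memory stand-in for a data lake)."""
    lake = defaultdict(list)
    for sensor, value in records:
        lake[sensor].append(value)
    return dict(lake)


def process(lake):
    """Processing: compute a simple per-sensor average."""
    return {sensor: mean(values) for sensor, values in lake.items()}


def serve(summary, sensor):
    """Serving: answer a query against the processed results."""
    return summary.get(sensor)


raw = ["temp,20.0", "temp,22.0", "pressure,1.0", "corrupt-line", "temp,not-a-number"]
summary = process(store(ingest(raw)))
print(serve(summary, "temp"))
```

Keeping each stage as a separate function mirrors the slide's separation of concerns: any stage can be swapped for a real system without changing the others.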
23. 25
HARNESSING
AI
Step I: Build data fabric for your organization
Step II: Define your objective
Step III: Hire the right talent
Step IV: Identify key processes to augment with AI
Step V: Create a sandbox lab environment
Step VI: Operationalize successful pilots
Step VII: Scale up for enterprise-wide adoption
Step VIII: Drive cultural change
26. 28
World Sense See, Understand Automation
AI Program
Computer
AI Program
Computer
AI Program
Computer
ARTIFICIAL INTELLIGENCE IS DOMAIN SPECIFIC
Self-Driving
Manufacturing
Radiology
27. 29
Image “Volvo XC90”
Image source: “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks” ICML 2009 & Comm. ACM 2011.
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng.
CONVOLUTIONAL NEURAL NETWORKS
29. RT DENOISING
VIDEO TO 3D
CHARACTER
LOCOMOTION
CHARACTER
CONCEPTING
AUDIO TO FACIAL
ANIMATION
PHYSICS SIMULATION
Clothing models from UC Berkeley Garment Library
THE MAGIC OF DEEP LEARNING
44. 46
PURPOSE BUILT PRE-TRAINED NETWORKS
Number of classes: 3
Dataset: 750k frames
Accuracy: 84%
Number of classes: 4
Dataset: 150k frames
Accuracy: 84%
Number of classes: 12
Dataset: 56k frames
Accuracy: 88%
Number of classes: 20
Dataset: 60k frames
Accuracy: 92%
Number of classes: 4
Dataset: 160k frames
Accuracy: 84%
Number of classes: 1
Dataset: 600k images
Accuracy: 95%
PeopleNet
TrafficCamNet
VehicleTypeNet
DashCamNet FaceDetect-IR
VehicleMakeNet
Highly Accurate | Re-Trainable | Out of Box Deployment
46. ANNOUNCING
JARVIS OPEN BETA
Integrated AI Skills with Pre-Trained Models
Fully Customizable Application Pipeline
Human Voice with Neural TTS
Superhuman NLU with Megatron-BERT
<300 ms Latency | 7X Throughput | 1/3rd Cost
Sign Up at developer.nvidia.com/nvidia-jarvis
State-of-the-Art Conversational AI
47. 49
LEARN MORE
Conversational AI
Developer Overview
NVIDIA Jarvis
Product Page
Conversational AI Demo Videos
"Misty" | "Mark" | In-car
Conversational AI Explainer Videos
YouTube Playlist
Jarvis Intro Blog
Conversational AI Corp Blogs
Intro to building Conversational AI Apps for Enterprise (Webinar)
48. RECOMMENDERS —
THE PERSONALIZATION ENGINE OF THE INTERNET
DIGITAL CONTENT
2.7 Billion
Monthly Active Users
E-COMMERCE
2 Billion
Digital Shoppers
SOCIAL MEDIA
3.8 Billion
Active Users
DIGITAL ADVERTISING
4.7 Billion
Internet Users
[Diagram: recommender pipeline. Items, O(10^9) → Candidate Generation → O(10^2) → Ranking → Recommended Items, O(10). Inputs: User, User Embedding, Item Embedding.]
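The funnel in the diagram above, where a very large catalogue is narrowed to a short candidate list and then to a handful of recommendations, can be sketched as a two-stage retrieve-then-rank procedure. This is a minimal illustration under stated assumptions: random embeddings, a catalogue of 10,000 items standing in for the billions on the slide, exact dot-product retrieval, and cosine similarity standing in for a learned ranking model. None of these choices come from the slide; real systems typically use approximate nearest-neighbour search and a trained ranker.

```python
import heapq
import random

random.seed(0)
DIM = 8
NUM_ITEMS = 10_000      # stand-in for the very large catalogue on the slide
NUM_CANDIDATES = 100    # size after candidate generation
NUM_RECOMMENDED = 10    # size after ranking


def random_embedding():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


item_embeddings = [random_embedding() for _ in range(NUM_ITEMS)]
user_embedding = random_embedding()


def generate_candidates(user, items, k):
    """Cheap retrieval: indices of the top-k items by embedding dot product."""
    return heapq.nlargest(k, range(len(items)), key=lambda i: dot(user, items[i]))


def rank(user, items, candidate_ids, k):
    """Stand-in for a heavier ranking model: order candidates by cosine similarity."""
    def score(i):
        norm = (dot(items[i], items[i]) * dot(user, user)) ** 0.5
        return dot(user, items[i]) / norm
    return sorted(candidate_ids, key=score, reverse=True)[:k]


candidates = generate_candidates(user_embedding, item_embeddings, NUM_CANDIDATES)
recommended = rank(user_embedding, item_embeddings, candidates, NUM_RECOMMENDED)
print(len(candidates), len(recommended))
```

The point of the split is cost: the cheap scoring function touches every item, while the expensive one only sees the short candidate list.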
49. 51
TRANSFER LEARNING TOOLKIT (TLT)
Zero Code Approach | Domain Adaptability
Purpose-Built
Pretrained Models
Quantization Aware
Training with TLT
Automatic Mixed
Precision with TLT
2X
Inference
Speedup
1.5X
Training time
Speedup
10X
Overall Development
Time Speedup
SmartCow is building turnkey AIoT solutions to optimize turnaround time at ports and dry docks. “By using TLT, we were able to reduce the training iterations by 9x and reduce the data collection and labeling effort by 5x which significantly reduces our training cost by 2x”
“Using NVIDIA’s TLT made training a real time car detector and license plate detector easy. It eliminated our need to build models from the ground up, resulting in faster development of models and ability to explore options”
Highly Accurate
51. 53
First and only workstation with 4-way NVIDIA A100
GPUs, NVLink, and MIG
Four A100 Tensor Core GPUs, 320 GB total HBM2E
Multi-Instance GPU (MIG) for up to 28 GPU instances
in a single DGX Station A100
3rd generation NVLink
200 GB/s bi-directional bandwidth between any GPU
pair, almost 3x compared to PCIe Gen4
New maintenance-free refrigerant cooling system
DGX STATION A100 320G
Workgroup Appliance for the Age of AI
CPU and Memory
64-core AMD® EPYC® CPU, PCIe Gen4
512 GB system memory
Internal Storage
1.92 TB NVMe M.2 SSD for OS
7.68 TB NVMe U.2 SSD for data cache
Connectivity
2x 10GbE (RJ45)
4x Mini DisplayPort for display out
Remote management 1GbE LAN port (RJ45)
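The "up to 28 GPU instances" and per-GPU memory figures follow from the numbers on this slide combined with the 7 MIG instances per A100 quoted earlier in the deck. A small arithmetic sketch, with variable names chosen only for illustration:

```python
GPUS_PER_STATION = 4               # four A100 Tensor Core GPUs
TOTAL_GPU_MEMORY_GB = 320          # 320 GB total HBM2E
MAX_MIG_INSTANCES_PER_A100 = 7     # "Single A100 with 7 MIG Instances Enabled"

memory_per_gpu_gb = TOTAL_GPU_MEMORY_GB // GPUS_PER_STATION
max_mig_instances = GPUS_PER_STATION * MAX_MIG_INSTANCES_PER_A100

print(f"Memory per GPU: {memory_per_gpu_gb} GB")
print(f"Maximum MIG instances per station: {max_mig_instances}")
```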
52. 54
NEW DGX A100
640GB SYSTEM
Speedups normalized to number of GPUs | Comparisons to A100 40GB | Measurements performed on DGX A100 servers.
AI Training: DLRM (Huge CTR) | DGX A100: 16x A100 40GB vs 8x A100 80GB | speedup = 1.4X; speedup normalized to number of GPUs = 2.8X.
AI Inference: RNN-T (MLPerf 0.7 single-stream latency) | DGX A100: A100 40GB vs A100 80GB on 1 MIG @ 10GB when configured for 7 MIGs.
Data Analytics: big data benchmark with RAPIDS (0.16), BlazingSQL (0.16), Dask (2.2.0) | 30 analytical retail queries, ETL, ML, NLP | 96x A100 40GB vs 48x A100 80GB.
HPC: Quantum Espresso, CNT10POR8 | 40x A100 40GB vs 24x A100 80GB | speedup normalized to number of GPUs = 1.8X.
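The footnote's DLRM numbers (1.4X raw, 2.8X normalized, 16 GPUs vs 8 GPUs) are consistent with scaling the raw speedup by the ratio of GPU counts. The sketch below assumes that definition of normalization. The raw Quantum Espresso speedup is not stated in the footnote; the value computed here is inferred under the same assumption and should be read as an estimate.

```python
def normalized_speedup(raw_speedup, baseline_gpus, new_gpus):
    """Scale a raw speedup by the ratio of GPU counts (baseline / new)."""
    return raw_speedup * baseline_gpus / new_gpus


def raw_speedup_from_normalized(normalized, baseline_gpus, new_gpus):
    """Invert the normalization to recover the raw speedup."""
    return normalized * new_gpus / baseline_gpus


# DLRM: 16x A100 40GB vs 8x A100 80GB, raw speedup 1.4X (from the footnote).
dlrm_normalized = normalized_speedup(1.4, baseline_gpus=16, new_gpus=8)

# Quantum Espresso: 40x A100 40GB vs 24x A100 80GB, normalized 1.8X.
# The raw figure below is inferred, not quoted.
qe_raw = raw_speedup_from_normalized(1.8, baseline_gpus=40, new_gpus=24)

print(f"DLRM normalized speedup: {dlrm_normalized:.1f}X")
print(f"Quantum Espresso inferred raw speedup: {qe_raw:.2f}X")
```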
640 GB of GPU memory per system to increase model accuracy and reduce time-to-solution
Up to 3X higher throughput for large-scale workloads
Double the GPU memory for MIG for more flexible AI development, analytics, and inference
Available individually or as part of the DGX SuperPOD Solution for Enterprise
Upgrade option for current DGX A100 customers
For the Largest AI Workloads
54. 56
NVIDIA GPUs IN THE CLOUD
AVAILABLE ON-DEMAND FROM THE TOP CLOUD SERVICE PROVIDERS
• Immediate access to NVIDIA GPU infrastructure for data science in the cloud
• Wide variety of deployment and management options using containers, Kubernetes, Kubeflow, support for cloud native services, and more
55. 57
RICH
CONTENT
PORTFOLIO
Fundamentals and advanced
hands-on training in key
technologies and application
domains
AI for Digital Content Creation
Deep Learning Fundamentals
AI for Healthcare
AI for Autonomous Vehicles
AI for Intelligent Video Analytics
Accelerated Computing Fundamentals
AI for Robotics
AI for Predictive Maintenance
Accelerated Data Science Fundamentals
Intro to AI in the Data Center
AI for Anomaly Detection
AI for Industrial Inspection
NVIDIA.com/dli
56. 58
PROFESSIONAL
SERVICES
NVIDIA works with a large network of service delivery partners to provide services on NVIDIA-accelerated platforms.
AI Service Delivery Partners
Contact us directly to start a dialogue about your
specific needs:
professionalservices@nvidia.com
57. NVIDIA INCEPTION
ACCELERATING 6K STARTUPS WORLDWIDE
EXPERTISE
NVIDIA Deep Learning Institute
Training in AI, accelerated computing, and
accelerated data science
TECHNOLOGY ASSISTANCE
Developer resources, preferred pricing on on-prem
GPUs, and cloud credits through our global partners
GO-TO-MARKET SUPPORT
Networking events and exposure opportunities
through NVIDIA
VENTURE CAPITAL FUNDING & ECOSYSTEM
NVIDIA Inception GPU Ventures
Investing in breakthrough startups and facilitating
engagements with the VC community
www.nvidia.com/inception