JENSEN HUANG, FOUNDER & CEO | GTC 2017
POWERING THE AI REVOLUTION
2
LIFE AFTER MOORE’S LAW
40 Years of Microprocessor Trend Data
[Chart: log scale from 10^2 to 10^7, years 1980 to 2020. Series: Transistors (thousands); Single-threaded perf, 1.5X per year, then 1.1X per year]
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
3
RISE OF GPU COMPUTING
[Chart: log scale from 10^2 to 10^7, years 1980 to 2020. Series: GPU-Computing perf, 1.5X per year, 1000X by 2025; Single-threaded perf, 1.5X per year, then 1.1X per year]
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
APPLICATIONS
SYSTEMS
ALGORITHMS
CUDA
ARCHITECTURE
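A quick way to sanity-check the growth rates quoted on the chart is to compound them. The sketch below is only illustrative arithmetic: the function names are made up for this example, and the slide does not state the baseline year from which "1000X by 2025" is measured, so no start year is assumed here.

```python
import math


def years_to_reach(rate_per_year: float, target_factor: float) -> int:
    """Smallest whole number of years of compound growth needed to reach target_factor."""
    return math.ceil(math.log(target_factor) / math.log(rate_per_year))


def growth_after(rate_per_year: float, years: int) -> float:
    """Total improvement factor after compounding rate_per_year for the given years."""
    return rate_per_year ** years


if __name__ == "__main__":
    # Rates taken from the chart labels: 1.5X per year and 1.1X per year.
    for rate in (1.5, 1.1):
        print(f"{rate}X per year reaches 1000X after about "
              f"{years_to_reach(rate, 1000)} years")
```

At 1.5X per year a 1000X gain takes under two decades, while at 1.1X per year it takes roughly seven decades, which is the gap the chart is illustrating.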
4
RISE OF GPU COMPUTING
GPU Developers: 511,000 in 2017, 11X in 5 Years (since 2012)
GTC Attendees: 7,000 in 2017, 3X in 5 Years (since 2012)
CUDA Downloads in 2016: 1M+
5
ANNOUNCING
PROJECT HOLODECK
Photorealistic models
Interactive physics
Collaboration
Early access in September
6
ERA OF MACHINE LEARNING
“A Quest for Intelligence”
— Fei-Fei Li
“The Master Algorithm”
— Pedro Domingos
7
BIG BANG OF MODERN AI
Auto Encoders
GAN
LSTM
IDSIA: CNN on GPU
Stanford & NVIDIA: Large-scale DNN on GPU
U Toronto: AlexNet on GPU
Captioning
NVIDIA BB8
Style Transfer
BRETT
ImageNet
Google Photo
Arterys FDA Approved
AlphaGo
Super Resolution
Deep Voice
Baidu DuLight
NMT
Superhuman ASR
Reinforcement Learning
Transfer Learning
8
DEEP LEARNING FOR RAY TRACING
9
IRAY WITH DEEP LEARNING
10
BIG BANG OF MODERN AI
Udacity Students in AI Programs: 20,000 in 2017, 100X in 2 Years (since 2015)
NIPS, ICML, CVPR, ICLR Attendees: 13,000 in 2016, 2X in 2 Years (since 2014)
AI Startups Investments: $5B in 2016, 9X in 4 Years (since 2012)
11
NVIDIA
DEEP LEARNING
SDK
GPU AAS
NVAIL
INCEPTION
INTERNET
SERVICES
ENTERPRISE
HEALTHCARE
GPU SYSTEMS
FRAMEWORKS
TESLA
HGX-1
DGX-1
NVIDIA
RESEARCH
AUTOMOTIVE
PLACES ROBOTS
NVIDIA
DEEP LEARNING
SDK
DRIVE PX
JETSON TX
POWERING THE AI REVOLUTION
12
NVIDIA INCEPTION —
1,300 DEEP LEARNING STARTUPS
HEALTHCARE
BUSINESS INTELLIGENCE & VISUALIZATION
DEVELOPMENT PLATFORMS
RETAIL, ETAIL
IOT & MANUFACTURING
PLATFORMS & APIs
DATA MANAGEMENT
AEC
FINANCIAL
SECURITY, IVA
CYBER
AUTONOMOUS MACHINES
13
SAP AI FOR THE
ENTERPRISE
First commercial AI offerings
from SAP
Brand Impact, Service Ticketing,
Invoice-to-Record applications
Powered by NVIDIA GPUs on
DGX-1 and AWS
14
MODEL COMPLEXITY IS EXPLODING
2015: Microsoft ResNet, 7 ExaFLOPS, 60 Million Parameters
2016: Baidu Deep Speech 2, 20 ExaFLOPS, 300 Million Parameters
2017: Google NMT, 105 ExaFLOPS, 8.7 Billion Parameters
15
ANNOUNCING TESLA V100
GIANT LEAP FOR AI & HPC
VOLTA WITH NEW TENSOR CORE
21B transistors | TSMC 12nm FFN | 815 mm²
5,120 CUDA cores
7.5 FP64 TFLOPS | 15 FP32 TFLOPS
NEW 120 Tensor TFLOPS
20MB SM RF | 16MB Cache
16GB HBM2 @ 900 GB/s
300 GB/s NVLink
16
NEW TENSOR CORE
New CUDA TensorOp instructions
& data formats
4x4 matrix processing array
D[FP32] = A[FP16] * B[FP16] + C[FP32]
Optimized for deep learning
[Diagram labels: Activation Inputs, Weights Inputs, Output Results]
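The mixed-precision operation D[FP32] = A[FP16] * B[FP16] + C[FP32] can be emulated in plain Python to see what the formats imply. This is a minimal sketch, not a description of the hardware: the helper names are invented for this example, rounding is emulated with the standard `struct` module, and the choice to round the accumulator to FP32 after every addition is an assumption.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def tensor_op_4x4(a, b, c):
    """Emulate D = A * B + C on 4x4 matrices: FP16 inputs, FP32 accumulation."""
    n = 4
    a16 = [[to_fp16(v) for v in row] for row in a]  # A is stored as FP16
    b16 = [[to_fp16(v) for v in row] for row in b]  # B is stored as FP16
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])  # C enters the accumulator as FP32
            for k in range(n):
                # Assumed behaviour: round the running sum to FP32 each step.
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            d[i][j] = acc
    return d


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    b = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
    c = [[0.5] * 4 for _ in range(4)]
    for row in tensor_op_4x4(identity, b, c):
        print(row)
```

Keeping the inputs in FP16 halves their storage and bandwidth, while the FP32 accumulator limits the rounding error that would build up if the sums were also kept in FP16.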
17
ANNOUNCING TESLA V100
GIANT LEAP FOR AI & HPC
VOLTA WITH NEW TENSOR CORE
Compared to Pascal
1.5X General-purpose FLOPS for HPC
12X Tensor FLOPS for DL training
6X Tensor FLOPS for DL inferencing
18
19
GALAXY
20
DEEP LEARNING
FOR STYLE TRANSFER
21
ANNOUNCING NEW FRAMEWORK RELEASES
FOR VOLTA
[Chart: CNN Training (ResNet-50), hours, axis 0 to 50. Bars: K80, P100, V100]
[Chart: LSTM Training (Neural Machine Translation), hours, axis 0 to 50. Bars: 8x K80, 8x P100, 8x V100]
[Chart: Multi-Node Training with NCCL 2.0 (ResNet-50), hours, axis 0 to 25. Bars: 8x P100, 8x V100, 64x V100]
23
ANNOUNCING
NVIDIA DGX-1 WITH TESLA V100
ESSENTIAL INSTRUMENT OF AI RESEARCH
960 Tensor TFLOPS | 8x Tesla V100 | NVLink Hybrid Cube
From 8 days on TITAN X to 8 hours
400 servers in a box
$149,000
Order today: nvidia.com/DGX-1
25
ANNOUNCING
NVIDIA DGX STATION
PERSONAL DGX
480 Tensor TFLOPS | 4x Tesla V100 16GB
NVLink Fully Connected | 3x DisplayPort
1500W | Water Cooled
$69,000
Order today: nvidia.com/DGX-Station
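The system-level Tensor TFLOPS figures on the DGX slides line up with the 120 Tensor TFLOPS quoted per Tesla V100. A small sketch of that arithmetic follows; the function names are made up for this example, the numbers are the peak figures and list prices from the slides, and the dollars-per-TFLOPS ratio is an illustrative derived figure that the deck itself does not state.

```python
# Peak Tensor TFLOPS per Tesla V100, as quoted on the V100 announcement slide.
V100_TENSOR_TFLOPS = 120


def system_tensor_tflops(num_gpus: int) -> int:
    """Peak Tensor TFLOPS of a system, assuming it scales linearly with GPU count."""
    return num_gpus * V100_TENSOR_TFLOPS


def dollars_per_tensor_tflops(price_usd: float, num_gpus: int) -> float:
    """List price divided by peak Tensor TFLOPS (derived here, not from the slides)."""
    return price_usd / system_tensor_tflops(num_gpus)


if __name__ == "__main__":
    systems = {"DGX-1 (8x V100)": (149_000, 8), "DGX Station (4x V100)": (69_000, 4)}
    for name, (price, gpus) in systems.items():
        print(f"{name}: {system_tensor_tflops(gpus)} Tensor TFLOPS, "
              f"${dollars_per_tensor_tflops(price, gpus):.2f} per Tensor TFLOPS")
```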
26
ANNOUNCING
HGX-1 WITH TESLA V100
VERSATILE GPU CLOUD COMPUTING
8x Tesla V100 with NVLINK Hybrid Cube
2C:8G | 2C:4G | 1C:2G Configurable
NVIDIA Deep Learning, GRID graphics,
CUDA HPC stacks
27
ANNOUNCING TENSORRT FOR TENSORFLOW
Compiler for Deep Learning Inferencing
Graph optimizations for vertical and horizontal layer fusion | GPU-specific optimizations
Import models from Caffe and TensorFlow
[Diagram: unoptimized network block between "previous layer" and "next layer". Nodes: four 1x1 conv, one 3x3 conv, one 5x5 conv, each followed by bias and Relu, plus one max pool]
28
ANNOUNCING TENSORRT FOR TENSORFLOW
[Diagram: vertical layer fusion. Each conv, bias, Relu sequence becomes a single CBR node. Result between "previous layer" and "next layer": four 1x1 CBR, one 3x3 CBR, one 5x5 CBR, one max pool]
29
ANNOUNCING TENSORRT FOR TENSORFLOW
[Diagram: horizontal layer fusion. Before: concat, four 1x1 CBR, one 3x3 CBR, one 5x5 CBR, one max pool. After: two 1x1 CBR, one 3x3 CBR, one 5x5 CBR, one max pool]
30
ANNOUNCING TENSORRT FOR TENSORFLOW
[Diagram: optimized graph between "previous layer" and "next layer". Nodes: 1x1 CBR, max pool, 3x3 CBR, 5x5 CBR, 1x1 CBR]
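The vertical and horizontal fusion steps shown in the diagrams can be sketched on a toy graph. This is not TensorRT's API or algorithm: the functions are hypothetical, the graph is just lists of node names, and the branch layout (three branches starting with a 1x1 conv, one starting with max pool) is an assumption read off the node labels on the slides.

```python
def fuse_vertical(ops):
    """Collapse each 'KxK conv', 'bias', 'Relu' run into one 'KxK CBR' node."""
    fused, i = [], 0
    while i < len(ops):
        if ops[i].endswith(" conv") and ops[i + 1:i + 3] == ["bias", "Relu"]:
            fused.append(ops[i].replace("conv", "CBR"))
            i += 3  # skip the bias and Relu that were absorbed
        else:
            fused.append(ops[i])
            i += 1
    return fused


def fuse_horizontal(branches):
    """Merge branches that read the same input and start with the same node type.

    Returns a dict mapping each shared head node to the list of remaining
    per-branch tails that consume its output.
    """
    merged = {}
    for branch in branches:
        head, tail = branch[0], branch[1:]
        merged.setdefault(head, []).append(tail)
    return merged


if __name__ == "__main__":
    # Assumed branch layout, all fed by "previous layer".
    raw_branches = [
        ["1x1 conv", "bias", "Relu"],
        ["1x1 conv", "bias", "Relu", "3x3 conv", "bias", "Relu"],
        ["1x1 conv", "bias", "Relu", "5x5 conv", "bias", "Relu"],
        ["max pool", "1x1 conv", "bias", "Relu"],
    ]
    vertical = [fuse_vertical(b) for b in raw_branches]
    print(vertical)
    print(fuse_horizontal(vertical))
```

In this toy version the three leading 1x1 CBR nodes collapse into one shared node, leaving 1x1 CBR, 3x3 CBR, 5x5 CBR, max pool and 1x1 CBR, which matches the node labels on the final diagram.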
32
ANNOUNCING TESLA V100
FOR HYPERSCALE INFERENCE
15-25X INFERENCE SPEED-UP VS SKYLAKE
150W | FHHL PCIE
33
THE CASE FOR GPU ACCELERATED
DATACENTERS
300K inf/s Datacenter
@300 inf/s > 1K CPUs
1K CPUs > 500 Nodes
500 Nodes CPU Servers: @$3K = $1.5M, @500W = 250KW
Tesla V100: Reduce 15X
33 Nodes GPU Accelerated Server
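The sizing argument on this slide is a short chain of divisions and multiplications. The sketch below reproduces it under stated assumptions: the function and parameter names are invented for this example, two CPUs per node is inferred from "1K CPUs > 500 Nodes" rather than stated, and the 15X reduction is applied directly to the node count.

```python
def size_datacenter(target_inf_per_s=300_000, inf_per_s_per_cpu=300,
                    cpus_per_node=2, node_cost_usd=3_000, node_power_w=500,
                    gpu_reduction=15):
    """Reproduce the slide's arithmetic for a CPU-only inference datacenter."""
    cpus = target_inf_per_s // inf_per_s_per_cpu   # 300K / 300 = 1K CPUs
    cpu_nodes = cpus // cpus_per_node              # assumed 2 CPUs per node
    return {
        "cpus": cpus,
        "cpu_nodes": cpu_nodes,
        "cost_usd": cpu_nodes * node_cost_usd,
        "power_kw": cpu_nodes * node_power_w / 1000,
        # The slide rounds 500 / 15 = 33.3 down to 33 GPU-accelerated nodes.
        "gpu_nodes": round(cpu_nodes / gpu_reduction),
    }


if __name__ == "__main__":
    for key, value in size_datacenter().items():
        print(f"{key}: {value}")
```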
34
NVIDIA
DEEP LEARNING STACK
DEEP LEARNING FRAMEWORKS
DEEP LEARNING LIBRARIES
NVIDIA cuDNN, NCCL,
cuBLAS, TensorRT
CUDA DRIVER
OPERATING SYSTEM
GPU
SYSTEM
35
Registry of
Containers, Datasets,
and Pre-trained models
NVIDIA
GPU CLOUD
CSPs
ANNOUNCING
NVIDIA GPU CLOUD
Containerized in NVDocker | Optimization across the full stack
Always up-to-date | Fully tested and maintained by NVIDIA | Beta in July
GPU-accelerated Cloud Platform Optimized for Deep Learning
36
POWER OF GPU COMPUTING
[Chart: AMBER Performance (ns/day), axis 0 to 40]
K20, 2013: AMBER 12, CUDA 4
K40, 2014: AMBER 14, CUDA 5
K80, 2015: AMBER 14, CUDA 6
P100, 2016: AMBER 16, CUDA 8
[Chart: GoogleNet Performance (i/s), axis 0 to 12000]
8x K80, 2014: cuDNN 2, CUDA 6
8x Maxwell, 2015: cuDNN 4, CUDA 7
DGX-1, 2016: cuDNN 6, CUDA 8, NCCL 1.6
DGX-1V, 2017: cuDNN 7, CUDA 9, NCCL 2
37
AI REVOLUTIONIZING TRANSPORTATION
Domino’s: 1M Pizzas Delivered per Day
800M Parking Spots for 250M Cars in U.S.
280B Miles per Year
38
NVIDIA DRIVE — AI CAR PLATFORM
Computer Vision Libraries
OS
Perception AI
CUDA, cuDNN, TensorRT
Localization Path Planning
1 TOPS
10 TOPS
100 TOPS
DRIVE PX 2 Parker
Level 2/3
DRIVE PX Xavier
Level 4/5
39
NVIDIA DRIVE
Guardian Angel
Co-Pilot
Mapping-to-Driving
40
ANNOUNCING
TOYOTA SELECTS NVIDIA DRIVE PX
FOR AUTONOMOUS VEHICLES
41
AI PROCESSOR FOR AUTONOMOUS MACHINES
XAVIER
30 TOPS DL
30W
Custom ARM64 CPU
512 Core Volta GPU
10 TOPS DL Accelerator
General
Purpose
Architectures
Domain
Specific
Accelerators
Energy Efficiency
CPU
FPGA
CUDA
GPU
DLA
Pascal
Volta
42
AI PROCESSOR FOR AUTONOMOUS MACHINES
XAVIER
30 TOPS DL
30W
Custom ARM64 CPU
512 Core Volta GPU
10 TOPS DL Accelerator
General
Purpose
Architectures
Domain
Specific
Accelerators
Energy Efficiency
CPU
CUDA
GPU
DLA
Volta
+
43
ANNOUNCING XAVIER DLA
NOW OPEN SOURCE
Early Access July | General Release September
[Block diagram]
Command Interface
Tensor Execution Micro-controller
Memory Interface
Input DMA (Activations and Weights)
Unified 512KB Input Buffer (Activations and Weights)
Sparse Weight Decompression
Native Winograd Input Transform
MAC Array: 2048 Int8 or 1024 Int16 or 1024 FP16
Output Accumulators
Output Postprocessor (Activation Function, Pooling etc.)
Output DMA
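The block diagram names a Winograd input transform without giving details. As background, the textbook Winograd F(2,3) algorithm computes two outputs of a 3-tap filter with four multiplications instead of six. The sketch below shows that standard algorithm only; it is not the DLA's implementation, and the function names are made up for this example.

```python
def direct_conv(d, g):
    """Two outputs of a 3-tap filter g over a 4-element input d (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3), using 4 element-wise multiplies."""
    # Filter transform: depends only on the weights, so it can be precomputed.
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Input transform: additions and subtractions only.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Element-wise products: the only multiplications on the data path.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Output transform.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


if __name__ == "__main__":
    d, g = [1, 2, 3, 4], [2, 4, 6]
    print(direct_conv(d, g), winograd_f23(d, g))
```

Because the filter transform is a function of the weights alone, only the input transform and the four multiplies are needed for each new tile of activations.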
44
ROBOTS
LEARN
& PLAN
SEE, HEAR,
TOUCH
ACT
45
ROBOTS
Credit: Yevgen Chebotar, Karol Hausman,
Marvin Zhang, Sergey Levine
47
ANNOUNCING ISAAC ROBOT SIMULATOR
NVIDIA GPU Computer
ISAAC
Robot Simulator
OpenAI
GYM
Robot &
Environment
Definition
Virtual Jetson
LEARN
& PLAN
SEE, HEAR,
TOUCH
ACT
48
NVIDIA POWERING THE AI REVOLUTION
NVIDIA GPU Cloud
ISAAC
NVIDIA GPU in Every Cloud
Xavier DLA
Open Source
DGX-1 and DGX Station
Tesla V100
TensorRT
Tensor Core
NVIDIA
GPU CLOUD
CSPs
GTC 2017: Powering the AI Revolution
