SlideShare ist ein Scribd-Unternehmen logo
1 von 15
Downloaden Sie, um offline zu lesen
GTC 2017
AI FOR INDUSTRY
2
A NEW ERA OF COMPUTING
PC INTERNET
WinTel, Yahoo!
1 billion PC users
1995 2005 2015
MOBILE-CLOUD
iPhone, Amazon AWS
2.5 billion mobile users
AI & IOT
Deep Learning, GPU
100s of billions of devices
3
4
HOW A DEEP NEURAL NETWORK SEES
Image “Audi A7”
Image source: “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks” ICML 2009 & Comm. ACM 2011.
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng.
5
Training
Device
Datacenter
GPU DEEP LEARNING
IS A NEW COMPUTING MODEL
TRAINING
Billions of Trillions of Operations
GPU train larger models, accelerate
time to market
6
Training
Device
Datacenter
GPU DEEP LEARNING
IS A NEW COMPUTING MODEL
DATACENTER INFERENCING
10s of billions of image, voice, video
queries per day
GPU inference for fast response,
maximize datacenter throughput
2016 – Baidu Deep Speech 2
Superhuman Voice Recognition
2015 – Microsoft ResNet
Superhuman Image Recognition
2017 – Google Neural Machine Translation
Near Human Language Translation
100 ExaFLOPS
8700 Million Parameters
20 ExaFLOPS
300 Million Parameters
7 ExaFLOPS
60 Million Parameters
To Tackle Increasingly Complex Challenges
NEURAL NETWORK COMPLEXITY IS EXPLODING
8
GEFORCE NOW
You listen to music on Spotify.
You watch movies on Netflix.
GeForce Now lets you play games
the same way.
Instantly stream the latest titles
from our powerful cloud-gaming
supercomputers. Think of it as your
game console in the sky.
Gaming is now easy and instant.
ROAD TO
EXASCALE
Volta to Fuel Most Powerful
US Supercomputers
1.8
1.5 1.6
1.5
cuFFT Physics
(QUDA)
Seismic
(RTM)
STREAM
V100PerformanceNormalizedtoP100
1.5X HPC Performance in 1 Year
Summit
Supercomputer
200+ PetaFlops
~3,400 Nodes
10 Megawatts
System Config Info: 2X Xeon E5-2690 v4, 2.6GHz, w/ 1X Tesla
P100 or V100. V100 measured on pre-production hardware.
9
NVIDIA SATURN V
124 DGX-1 Deep Learning Supercomputers
10
TESLA V100
THE MOST ADVANCED DATA CENTER GPU EVER BUILT
5,120 CUDA cores
640 NEW Tensor cores
7.5 FP64 TFLOPS | 15 FP32 TFLOPS
120 Tensor TFLOPS
20MB SM RF | 16MB Cache | 16GB HBM2 @ 900 GB/s
300 GB/s NVLink
11
NEW TENSOR CORE BUILT FOR AI
Delivering 120 TFLOPS of DL Performance
TENSOR CORE
ALL MAJOR FRAMEWORKSVOLTA-OPTIMIZED cuDNN
MATRIX DATA OPTIMIZATION:
Dense Matrix of Tensor Compute
TENSOR-OP CONVERSION:
FP32 to Tensor Op Data for
Frameworks
TENSOR CORE
VOLTA TENSOR CORE
4x4 matrix processing array
D[FP32] = A[FP16] * B[FP16] + C[FP32]
Optimized For Deep Learning
12
REVOLUTIONARY AI PERFORMANCE
3X Faster DL Training Performance
Over 80x DL Training
Performance in 3 Years
1x K80
cuDNN2
4x M40
cuDNN3
8x P100
cuDNN6
8x V100
cuDNN7
0x
20x
40x
60x
80x
100x
Q1
15
Q3
15
Q2
17
Q2
16
Googlenet Training Performance
(Speedup Vs K80)
SpeedupvsK80
85% Scale-Out Efficiency
Scales to 64 GPUs with Microsoft
Cognitive Toolkit
0 5 10 15
64X V100
8X V100
8X P100
Multi-Node Training with NCCL2.0
(ResNet-50)
ResNet50 Training for 90 Epochs with 1.28M images dataset | Using
Caffe2 | V100 performance measured on pre-production hardware.
1 Hour
7.4 Hours
18 Hours
3X Reduction in Time to Train
Over P100
0 10 20
1X
V100
1X
P100
2X
CPU
LSTM Training
(Neural Machine Translation)
Neural Machine Translation Training for 13 Epochs |German ->English,
WMT15 subset | CPU = 2x Xeon E5 2699 V4 | V100 performance
measured on pre-production hardware.
15 Days
18 Hours
6 Hours
13
VOLTA DELIVERS 3X MORE INFERENCE
THROUGHPUT
Low Latency performance with V100 and TensorRT
Fuse Layers
Compact
Optimize Precision
(FP32, FP16, INT8)
3x more throughput at 7ms latency with V100
(ResNet-50)
TensorRT Compiled
Real-time
Network
Trained
Neural
Network
0
1,000
2,000
3,000
4,000
5,000
CPU Tesla P100
(TensorFlow)
Tesla P100
(TensorRT)
Tesla V100
(TensorRT)Throughput@7ms(Images/Sec)
33ms
CPU Server: 2X Xeon E5-2660 V4; GPU: w/P100, w/V100 (@150W) | V100 performance measured on pre-production hardware.
3X
10ms
7ms
7ms
14
0
1,000
2,000
3,000
4,000
5,000
6,000
Throughput@7ms(Images/Sec)
NVIDIA TENSORRT 3
World’s Fastest Inference Platform
40X
ResNet-50 Throughput @ 7ms Latency
14ms
CPU-only
Tesla V100
TensorFlow
Tesla V100
TRT 3
(FP16)
Tesla P4
TRT 3
(INT8)
Workload ResNet-50| Data-set ImageNet | CPU: Skylake | GPU: Tesla P4 or Tesla V100
GTC Taiwan 2017 企業端深度學習與人工智慧應用

Weitere ähnliche Inhalte

Was ist angesagt?

Open Source RAPIDS GPU Platform to Accelerate Predictive Data Analytics
Open Source RAPIDS GPU Platform to Accelerate Predictive Data AnalyticsOpen Source RAPIDS GPU Platform to Accelerate Predictive Data Analytics
Open Source RAPIDS GPU Platform to Accelerate Predictive Data Analytics
inside-BigData.com
 
GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2
NVIDIA
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
inside-BigData.com
 
Affordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeAffordable AI Connects To A Better Life
Affordable AI Connects To A Better Life
NVIDIA Taiwan
 

Was ist angesagt? (20)

Orchestrate Your AI Workload with Cisco Hyperflex, Powered by NVIDIA GPUs
Orchestrate Your AI Workload with Cisco Hyperflex, Powered by NVIDIA GPUs Orchestrate Your AI Workload with Cisco Hyperflex, Powered by NVIDIA GPUs
Orchestrate Your AI Workload with Cisco Hyperflex, Powered by NVIDIA GPUs
 
Open Source RAPIDS GPU Platform to Accelerate Predictive Data Analytics
Open Source RAPIDS GPU Platform to Accelerate Predictive Data AnalyticsOpen Source RAPIDS GPU Platform to Accelerate Predictive Data Analytics
Open Source RAPIDS GPU Platform to Accelerate Predictive Data Analytics
 
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
 
HPE and NVIDIA empowering AI and IoT
HPE and NVIDIA empowering AI and IoTHPE and NVIDIA empowering AI and IoT
HPE and NVIDIA empowering AI and IoT
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
 
Accelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardAccelerated Computing: The Path Forward
Accelerated Computing: The Path Forward
 
Building the World's Largest GPU
Building the World's Largest GPUBuilding the World's Largest GPU
Building the World's Largest GPU
 
AI, A New Computing Model
AI, A New Computing ModelAI, A New Computing Model
AI, A New Computing Model
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
Implementing AI: High Performance Architectures: A Universal Accelerated Comp...
Implementing AI: High Performance Architectures: A Universal Accelerated Comp...Implementing AI: High Performance Architectures: A Universal Accelerated Comp...
Implementing AI: High Performance Architectures: A Universal Accelerated Comp...
 
OpenACC Month Highlights- October
OpenACC Month Highlights- OctoberOpenACC Month Highlights- October
OpenACC Month Highlights- October
 
Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)
 
GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
OpenPOWER Foundation Overview
OpenPOWER Foundation OverviewOpenPOWER Foundation Overview
OpenPOWER Foundation Overview
 
Affordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeAffordable AI Connects To A Better Life
Affordable AI Connects To A Better Life
 
Nvidia Deep Learning Solutions - Alex Sabatier
Nvidia Deep Learning Solutions - Alex SabatierNvidia Deep Learning Solutions - Alex Sabatier
Nvidia Deep Learning Solutions - Alex Sabatier
 
Nvidia SC16: The Greatest Challenges Can't Wait
Nvidia SC16: The Greatest Challenges Can't WaitNvidia SC16: The Greatest Challenges Can't Wait
Nvidia SC16: The Greatest Challenges Can't Wait
 

Ähnlich wie GTC Taiwan 2017 企業端深度學習與人工智慧應用

TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
Willy Marroquin (WillyDevNET)
 

Ähnlich wie GTC Taiwan 2017 企業端深度學習與人工智慧應用 (20)

HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. AvailabilityHPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
 
Enabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesEnabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. Lowndes
 
GTC 2022 Keynote
GTC 2022 KeynoteGTC 2022 Keynote
GTC 2022 Keynote
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
 
Dell NVIDIA AI Powered Transformation Webinar
Dell NVIDIA AI Powered Transformation WebinarDell NVIDIA AI Powered Transformation Webinar
Dell NVIDIA AI Powered Transformation Webinar
 
Deep Learning Update May 2016
Deep Learning Update May 2016Deep Learning Update May 2016
Deep Learning Update May 2016
 
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdfNVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
Ac922 watson 180208 v1
Ac922 watson 180208 v1Ac922 watson 180208 v1
Ac922 watson 180208 v1
 
GTC 2016 Opening Keynote
GTC 2016 Opening KeynoteGTC 2016 Opening Keynote
GTC 2016 Opening Keynote
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...
Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...
Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...
 
JETSON : AI at the EDGE
JETSON : AI at the EDGEJETSON : AI at the EDGE
JETSON : AI at the EDGE
 
Aplicações Potenciais de Deep Learning à Indústria do Petróleo
Aplicações Potenciais de Deep Learning à Indústria do PetróleoAplicações Potenciais de Deep Learning à Indústria do Petróleo
Aplicações Potenciais de Deep Learning à Indústria do Petróleo
 
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 Keynote
 

Mehr von NVIDIA Taiwan

GTC Taiwan 2017 從雲端到終端的瓶頸及解決之道
GTC Taiwan 2017 從雲端到終端的瓶頸及解決之道GTC Taiwan 2017 從雲端到終端的瓶頸及解決之道
GTC Taiwan 2017 從雲端到終端的瓶頸及解決之道
NVIDIA Taiwan
 
GTC Taiwan 2017 NVIDIA VRWorks SDK 加速性能與提升 VR 使用經驗
GTC Taiwan 2017 NVIDIA VRWorks SDK 加速性能與提升 VR 使用經驗GTC Taiwan 2017 NVIDIA VRWorks SDK 加速性能與提升 VR 使用經驗
GTC Taiwan 2017 NVIDIA VRWorks SDK 加速性能與提升 VR 使用經驗
NVIDIA Taiwan
 
GTC Taiwan 2017 NVIDIA Holodeck 與 Isaac VR 技術分享
GTC Taiwan 2017 NVIDIA Holodeck 與 Isaac VR 技術分享GTC Taiwan 2017 NVIDIA Holodeck 與 Isaac VR 技術分享
GTC Taiwan 2017 NVIDIA Holodeck 與 Isaac VR 技術分享
NVIDIA Taiwan
 
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
NVIDIA Taiwan
 
GTC Taiwan 2017 深度學習與該技術於視訊監控產業上之應用
GTC Taiwan 2017 深度學習與該技術於視訊監控產業上之應用GTC Taiwan 2017 深度學習與該技術於視訊監控產業上之應用
GTC Taiwan 2017 深度學習與該技術於視訊監控產業上之應用
NVIDIA Taiwan
 

Mehr von NVIDIA Taiwan (20)

GTC Taiwan 2017 基於 CNN 對易混淆中藥的手機辨識系統
GTC Taiwan 2017 基於 CNN 對易混淆中藥的手機辨識系統GTC Taiwan 2017 基於 CNN 對易混淆中藥的手機辨識系統
GTC Taiwan 2017 基於 CNN 對易混淆中藥的手機辨識系統
 
GTC Taiwan 2017 CUDA 加速先進影像分析技術與深度學習於臨床電腦斷層掃瞄肝細胞腫瘤輔助診斷
GTC Taiwan 2017 CUDA 加速先進影像分析技術與深度學習於臨床電腦斷層掃瞄肝細胞腫瘤輔助診斷GTC Taiwan 2017 CUDA 加速先進影像分析技術與深度學習於臨床電腦斷層掃瞄肝細胞腫瘤輔助診斷
GTC Taiwan 2017 CUDA 加速先進影像分析技術與深度學習於臨床電腦斷層掃瞄肝細胞腫瘤輔助診斷
 
GTC Taiwan 2017 人工智慧:保險科技的未來
GTC Taiwan 2017 人工智慧:保險科技的未來GTC Taiwan 2017 人工智慧:保險科技的未來
GTC Taiwan 2017 人工智慧:保險科技的未來
 
GTC Taiwan 2017 從雲端到終端的瓶頸及解決之道
GTC Taiwan 2017 從雲端到終端的瓶頸及解決之道GTC Taiwan 2017 從雲端到終端的瓶頸及解決之道
GTC Taiwan 2017 從雲端到終端的瓶頸及解決之道
 
GTC Taiwan 2017 用計算來凝視複雜的世界
GTC Taiwan 2017 用計算來凝視複雜的世界 GTC Taiwan 2017 用計算來凝視複雜的世界
GTC Taiwan 2017 用計算來凝視複雜的世界
 
GTC Taiwan 2017 NVIDIA VRWorks SDK 加速性能與提升 VR 使用經驗
GTC Taiwan 2017 NVIDIA VRWorks SDK 加速性能與提升 VR 使用經驗GTC Taiwan 2017 NVIDIA VRWorks SDK 加速性能與提升 VR 使用經驗
GTC Taiwan 2017 NVIDIA VRWorks SDK 加速性能與提升 VR 使用經驗
 
GTC Taiwan 2017 NVIDIA Holodeck 與 Isaac VR 技術分享
GTC Taiwan 2017 NVIDIA Holodeck 與 Isaac VR 技術分享GTC Taiwan 2017 NVIDIA Holodeck 與 Isaac VR 技術分享
GTC Taiwan 2017 NVIDIA Holodeck 與 Isaac VR 技術分享
 
GTC Taiwan 2017 深度學習於表面瑕疵檢測之應用
GTC Taiwan 2017 深度學習於表面瑕疵檢測之應用GTC Taiwan 2017 深度學習於表面瑕疵檢測之應用
GTC Taiwan 2017 深度學習於表面瑕疵檢測之應用
 
GTC Taiwan 2017 結合智能視覺系統之機械手臂
GTC Taiwan 2017 結合智能視覺系統之機械手臂GTC Taiwan 2017 結合智能視覺系統之機械手臂
GTC Taiwan 2017 結合智能視覺系統之機械手臂
 
GTC Taiwan 2017 以雲端 GPU 將傳統硬體人工智慧化
GTC Taiwan 2017 以雲端 GPU 將傳統硬體人工智慧化GTC Taiwan 2017 以雲端 GPU 將傳統硬體人工智慧化
GTC Taiwan 2017 以雲端 GPU 將傳統硬體人工智慧化
 
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
 
GTC Taiwan 2017 深度學習與該技術於視訊監控產業上之應用
GTC Taiwan 2017 深度學習與該技術於視訊監控產業上之應用GTC Taiwan 2017 深度學習與該技術於視訊監控產業上之應用
GTC Taiwan 2017 深度學習與該技術於視訊監控產業上之應用
 
GTC Taiwan 2017 應用智慧科技於傳染病防治
GTC Taiwan 2017 應用智慧科技於傳染病防治GTC Taiwan 2017 應用智慧科技於傳染病防治
GTC Taiwan 2017 應用智慧科技於傳染病防治
 
NVIDIA深度學習教育機構 (DLI): Deep Learning Institute
NVIDIA深度學習教育機構 (DLI): Deep Learning InstituteNVIDIA深度學習教育機構 (DLI): Deep Learning Institute
NVIDIA深度學習教育機構 (DLI): Deep Learning Institute
 
NVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetsonNVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetson
 
NVIDIA 深度學習教育機構 (DLI): Neural network deployment
NVIDIA 深度學習教育機構 (DLI): Neural network deploymentNVIDIA 深度學習教育機構 (DLI): Neural network deployment
NVIDIA 深度學習教育機構 (DLI): Neural network deployment
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digitsNVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
 
Aeroprobing A.I. Drone with TX1
Aeroprobing A.I. Drone with TX1Aeroprobing A.I. Drone with TX1
Aeroprobing A.I. Drone with TX1
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

GTC Taiwan 2017 企業端深度學習與人工智慧應用

  • 1. GTC 2017 AI FOR INDUSTRY
  • 2. 2 A NEW ERA OF COMPUTING PC INTERNET WinTel, Yahoo! 1 billion PC users 1995 2005 2015 MOBILE-CLOUD iPhone, Amazon AWS 2.5 billion mobile users AI & IOT Deep Learning, GPU 100s of billions of devices
  • 3. 3
  • 4. 4 HOW A DEEP NEURAL NETWORK SEES Image “Audi A7” Image source: “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks” ICML 2009 & Comm. ACM 2011. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng.
  • 5. 5 Training Device Datacenter GPU DEEP LEARNING IS A NEW COMPUTING MODEL TRAINING Billions of Trillions of Operations GPU train larger models, accelerate time to market
  • 6. 6 Training Device Datacenter GPU DEEP LEARNING IS A NEW COMPUTING MODEL DATACENTER INFERENCING 10s of billions of image, voice, video queries per day GPU inference for fast response, maximize datacenter throughput
  • 7. 2016 – Baidu Deep Speech 2 Superhuman Voice Recognition 2015 – Microsoft ResNet Superhuman Image Recognition 2017 – Google Neural Machine Translation Near Human Language Translation 100 ExaFLOPS 8700 Million Parameters 20 ExaFLOPS 300 Million Parameters 7 ExaFLOPS 60 Million Parameters To Tackle Increasingly Complex Challenges NEURAL NETWORK COMPLEXITY IS EXPLODING
  • 8. 8 GEFORCE NOW You listen to music on Spotify. You watch movies on Netflix. GeForce Now lets you play games the same way. Instantly stream the latest titles from our powerful cloud-gaming supercomputers. Think of it as your game console in the sky. Gaming is now easy and instant. ROAD TO EXASCALE Volta to Fuel Most Powerful US Supercomputers 1.8 1.5 1.6 1.5 cuFFT Physics (QUDA) Seismic (RTM) STREAM V100PerformanceNormalizedtoP100 1.5X HPC Performance in 1 Year Summit Supercomputer 200+ PetaFlops ~3,400 Nodes 10 Megawatts System Config Info: 2X Xeon E5-2690 v4, 2.6GHz, w/ 1X Tesla P100 or V100. V100 measured on pre-production hardware.
  • 9. 9 NVIDIA SATURN V 124 DGX-1 Deep Learning Supercomputers
  • 10. 10 TESLA V100 THE MOST ADVANCED DATA CENTER GPU EVER BUILT 5,120 CUDA cores 640 NEW Tensor cores 7.5 FP64 TFLOPS | 15 FP32 TFLOPS 120 Tensor TFLOPS 20MB SM RF | 16MB Cache | 16GB HBM2 @ 900 GB/s 300 GB/s NVLink
  • 11. 11 NEW TENSOR CORE BUILT FOR AI Delivering 120 TFLOPS of DL Performance TENSOR CORE ALL MAJOR FRAMEWORKSVOLTA-OPTIMIZED cuDNN MATRIX DATA OPTIMIZATION: Dense Matrix of Tensor Compute TENSOR-OP CONVERSION: FP32 to Tensor Op Data for Frameworks TENSOR CORE VOLTA TENSOR CORE 4x4 matrix processing array D[FP32] = A[FP16] * B[FP16] + C[FP32] Optimized For Deep Learning
  • 12. 12 REVOLUTIONARY AI PERFORMANCE 3X Faster DL Training Performance Over 80x DL Training Performance in 3 Years 1x K80 cuDNN2 4x M40 cuDNN3 8x P100 cuDNN6 8x V100 cuDNN7 0x 20x 40x 60x 80x 100x Q1 15 Q3 15 Q2 17 Q2 16 Googlenet Training Performance (Speedup Vs K80) SpeedupvsK80 85% Scale-Out Efficiency Scales to 64 GPUs with Microsoft Cognitive Toolkit 0 5 10 15 64X V100 8X V100 8X P100 Multi-Node Training with NCCL2.0 (ResNet-50) ResNet50 Training for 90 Epochs with 1.28M images dataset | Using Caffe2 | V100 performance measured on pre-production hardware. 1 Hour 7.4 Hours 18 Hours 3X Reduction in Time to Train Over P100 0 10 20 1X V100 1X P100 2X CPU LSTM Training (Neural Machine Translation) Neural Machine Translation Training for 13 Epochs |German ->English, WMT15 subset | CPU = 2x Xeon E5 2699 V4 | V100 performance measured on pre-production hardware. 15 Days 18 Hours 6 Hours
  • 13. 13 VOLTA DELIVERS 3X MORE INFERENCE THROUGHPUT Low Latency performance with V100 and TensorRT Fuse Layers Compact Optimize Precision (FP32, FP16, INT8) 3x more throughput at 7ms latency with V100 (ResNet-50) TensorRT Compiled Real-time Network Trained Neural Network 0 1,000 2,000 3,000 4,000 5,000 CPU Tesla P100 (TensorFlow) Tesla P100 (TensorRT) Tesla V100 (TensorRT)Throughput@7ms(Images/Sec) 33ms CPU Server: 2X Xeon E5-2660 V4; GPU: w/P100, w/V100 (@150W) | V100 performance measured on pre-production hardware. 3X 10ms 7ms 7ms
  • 14. 14 0 1,000 2,000 3,000 4,000 5,000 6,000 Throughput@7ms(Images/Sec) NVIDIA TENSORRT 3 World’s Fastest Inference Platform 40X ResNet-50 Throughput @ 7ms Latency 14ms CPU-only Tesla V100 TensorFlow Tesla V100 TRT 3 (FP16) Tesla P4 TRT 3 (INT8) Workload ResNet-50| Data-set ImageNet | CPU: Skylake | GPU: Tesla P4 or Tesla V100