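Time Evolution of FLOPS and Bandwidth, and Schematic Diagrams of GPU Architectures
Tomohiro Degawa, Institute of Materials and Systems for Sustainability, Nagoya University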
Trends in GPU performance (theoretical compute performance)
Product lines: GeForce (gaming), Quadro (computer graphics), Tesla (GPGPU)
[Figure: theoretical GFLOP/s versus year, 2001 to 2015, for NVIDIA GPUs and Intel CPUs. Based on material published at http://docs.nvidia.com/cuda/cuda-c-programming-guide/]
GPUs plotted: GeForce FX 5800, GeForce 6800 Ultra, GeForce 7800 GTX, GeForce 8800 GTX, GeForce GTX 280, GeForce GTX 480, GeForce GTX 580, GeForce GTX 680, GeForce GTX TITAN, GeForce 780 Ti, GeForce GTX Titan X (two entries), Tesla C1060, Tesla C2050, Tesla M2090, Tesla K20X, Tesla K40, Tesla K80, Tesla P100
CPUs plotted: Pentium 4, Woodcrest, Harpertown, Sandy Bridge, Ivy Bridge, Haswell, Broadwell
Architecture labels: Tesla*1, Fermi, Kepler, Maxwell, Pascal
*1 Code name
*2 Product family
Data: embedded Excel sheet
2017/4/1
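The "theoretical" figures plotted above are usually derived from hardware parameters rather than measured. A minimal sketch of that calculation follows, assuming the common convention that each core retires one fused multiply-add (2 FLOPs) per cycle. The function name `peak_gflops` and the 1.0 GHz clock are illustrative placeholders, not values taken from these slides; only the 3,584-core count appears in this deck (Tesla P100).

```python
def peak_gflops(cuda_cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak in GFLOP/s = cores x clock (GHz) x FLOPs per core per cycle.

    flops_per_cycle = 2 assumes one fused multiply-add per core per cycle;
    other architectures may warrant a different value.
    """
    return cuda_cores * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    # 3,584 cores is the Tesla P100 count quoted in this deck.
    # The 1.0 GHz clock is a placeholder, NOT a real product specification.
    print(peak_gflops(3584, 1.0))
```

Substituting a product's actual clock frequency for the placeholder gives the kind of value plotted in the figure.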
Trends in GPU performance (theoretical bandwidth)
Product lines: GeForce (gaming), Quadro (computer graphics), Tesla (GPGPU)
[Figure: theoretical memory bandwidth in GB/s versus year, 2003 to 2015, for NVIDIA GPUs and Intel CPUs. Based on material published at http://docs.nvidia.com/cuda/cuda-c-programming-guide/]
GPUs plotted: GeForce FX 5900, GeForce 6800 GT, GeForce 7800 GTX, GeForce 8800 GTX, GeForce GTX 280, GeForce GTX 480, GeForce GTX 680, GeForce 780 Ti, GeForce GTX Titan X (two entries), Tesla C1060, Tesla C2050, Tesla M2090, Tesla K20X, Tesla K40, Tesla K80, Tesla P100
CPUs plotted: Northwood, Prescott, Woodcrest, Harpertown, Bloomfield, Westmere, Sandy Bridge, Ivy Bridge, Haswell, Broadwell
Architecture labels: Tesla*1, Fermi, Kepler, Maxwell, Pascal
Data: embedded Excel sheet
Tesla architecture
• Tesla C1060 specifications
  • Number of SMs: 30
  • Number of CUDA cores: 240 (= 8 cores/SM × 30 SMs)
  • No cache
[Figure: schematic of the GPU (array of SMs) and of one Streaming Multiprocessor. Each SM contains 8 SPs, 2 SFUs, 16 KB of shared memory, and a register file (16384 × 32-bit). Based on images published at http://www.anandtech.com/show/2549/2]
Fermi architecture
• Tesla M2050 specifications
  • Number of SMs: 14
  • Number of CUDA cores: 448 (= 32 cores/SM × 14 SMs)
  • L1 and L2 caches
  • ECC (error-correcting code) support
For details, see http://www.nvidia.co.jp/docs/IO/81860/NVIDIA_Fermi_Architecture_Whitepaper_FINAL_J.pdf
[Figure: schematic of the GPU and of one SM. Chip level: PCI Express 3.0 Host Interface, GigaThread Engine, four GPCs (each with a Raster Engine and several SMs), L2 Cache, and six Memory Controllers. Each SM contains 32 cores, SFU × 4, a register file (16384 × 32-bit), and 64 KB of shared memory / L1 cache. Based on images published at http://www.anandtech.com/show/2849/3]
Kepler architecture
• Tesla K20c/m specifications
  • Number of SMXs: 13
  • Number of CUDA cores: 2,496 (= 192 cores/SMX × 13 SMXs)
For details, see https://www.nvidia.co.jp/content/apac/pdf/tesla/nvidia-kepler-gk110-architecture-whitepaper-jp.pdf
[Figure: schematic of the GPU and of one SMX. Chip level: PCI Express 3.0 Host Interface, GigaThread Engine, array of SMXs, L2 Cache, and six Memory Controllers. Each SMX contains 192 cores, 64 DP units, 32 SFUs, a register file (65536 × 32-bit), 64 KB of shared memory / L1 cache, and a 48 KB read-only data cache. Based on images published at https://library.creativecow.net/kaufman_debra/NVIDIA-VGX/1]
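The register file sizes in these diagrams are given as a count of 32-bit entries, which makes them hard to compare with the shared memory and cache sizes quoted in KB. A minimal sketch of the conversion follows, assuming 1 KB = 1024 bytes. The helper name `register_file_kb` is illustrative; the entry counts and the 64 KB and 48 KB figures are the ones on the slides.

```python
def register_file_kb(entries: int, bits_per_entry: int = 32) -> float:
    """Convert a register file given as entries x bits into KB (1 KB = 1024 bytes)."""
    return entries * bits_per_entry / 8 / 1024


if __name__ == "__main__":
    # Entry counts as labeled in the Tesla, Fermi, and Kepler diagrams.
    for name, entries in [("Tesla SM", 16384), ("Fermi SM", 16384), ("Kepler SMX", 65536)]:
        print(f"{name}: {register_file_kb(entries):.0f} KB register file")

    # On-chip storage per Kepler SMX: registers + shared memory/L1 + read-only cache.
    kepler_on_chip_kb = register_file_kb(65536) + 64 + 48
    print(f"Kepler SMX on-chip storage: {kepler_on_chip_kb:.0f} KB")
```

On these figures the Kepler register file (256 KB) is four times larger than the 64 KB shared memory / L1 cache in the same SMX.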
Maxwell architecture
• GeForce GTX TITAN X specifications
  • Number of SMMs: 24
  • Number of CUDA cores: 3,072 (= 128 cores/SMM × 24 SMMs)
  • No double-precision units
For details of the first generation, see https://www.nvidia.co.jp/content/product-detail-pages/geforce-gtx-750-ti/geforce-gtx-750ti-whitepaper.pdf
[Figure: schematic of the GPU and of one SMM. Chip level: PCI Express 3.0 Host Interface, GigaThread Engine, four GPCs (each with a Raster Engine and several SMMs), L2 Cache, and four Memory Controllers. Each SMM contains a PolyMorph Engine 3.0, 64 KB of shared memory, two L1 caches, and four partitions, each with a register file (16,384 × 32-bit), 32 cores, and 8 SFUs. Based on images published at http://www.itmedia.co.jp/pcuser/articles/1409/19/news051.html]
Pascal architecture
• Tesla P100 specifications
  • Number of SMs: 56
  • Number of CUDA cores: 3,584 (= 64 cores/SM × 56 SMs)
For details, see http://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
[Figure: schematic of the GPU and of one SM. Chip level: PCI Express 3.0 Host Interface, GigaThread Engine, array of SMs, L2 Cache, eight Memory Controllers, four High Bandwidth Memory 2 stacks, a High-Speed Hub, and four NVLink ports. Each SM contains 64 KB of shared memory / L1 cache, a 48 KB read-only data cache, and two partitions, each with a register file (32768 × 32-bit), 32 cores, 16 DP units, and 8 SFUs. Based on images published at http://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf]
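Each specification slide states the CUDA core count as the product of cores per multiprocessor and the number of multiprocessors. The sketch below simply re-checks that arithmetic for the five products named on the slides. The dictionary layout and variable names are illustrative; all numbers are copied from the slides.

```python
# (cores per multiprocessor, number of multiprocessors, core count quoted on the slide)
specs = {
    "Tesla C1060 (Tesla)": (8, 30, 240),
    "Tesla M2050 (Fermi)": (32, 14, 448),
    "Tesla K20c/m (Kepler)": (192, 13, 2496),
    "GeForce GTX TITAN X (Maxwell)": (128, 24, 3072),
    "Tesla P100 (Pascal)": (64, 56, 3584),
}

for name, (cores_per_sm, sm_count, quoted) in specs.items():
    total = cores_per_sm * sm_count
    status = "matches" if total == quoted else "DIFFERS from"
    print(f"{name}: {cores_per_sm} x {sm_count} = {total} ({status} the quoted {quoted})")
```

Cores per multiprocessor rise from 8 (Tesla) to 192 (Kepler) and then fall to 128 (Maxwell) and 64 (Pascal), while the multiprocessor count per chip moves in the opposite direction after Kepler.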
Theoretical compute performance (embedded Excel sheet)
• Data estimated from the figure in the programming guide
• Only the GPU double-precision values have been corrected to the true values
• The remaining values are approximations

Theoretical GFLOP/s by year ("-" marks no data point)
NVIDIA GPU SP        NVIDIA GPU DP        Intel CPU SP         Intel CPU DP
year      GFLOP/s    year      GFLOP/s    year      GFLOP/s    year      GFLOP/s
2003.000  0.00E+00   2008.462  7.80E+01   2003.000  7.60E+00   2003.000  3.80E+00
2004.248  7.72E+01   2009.751  5.15E+02   2005.413  2.66E+01   2005.413  1.33E+01
2005.413  1.54E+02   2011.369  6.66E+02   2006.825  5.12E+01   2006.825  2.66E+01
2006.832  5.17E+02   2012.864  1.31E+03   2008.456  9.00E+01   2008.456  4.24E+01
2008.462  9.28E+02   2013.877  1.43E+03   2009.233  1.10E+02   2009.233  5.26E+01
2009.751  1.34E+03   2014.872  1.87E+03   2010.204  1.68E+02   2010.204  6.29E+01
2010.846  1.52E+03   2016.594  5.30E+03   2011.151  4.26E+02   2011.151  2.16E+02
2012.224  3.07E+03   -         -          2013.688  5.32E+02   2013.688  2.66E+02
2013.137  4.50E+03   -         -          2014.871  9.90E+02   2014.871  4.95E+02
2013.855  5.36E+03   -         -          2016.160  1.32E+03   2016.160  6.68E+02
2015.203  6.14E+03   -         -          -         -          -         -
2016.594  1.02E+04   -         -          -         -          -         -

[Figure: theoretical GFLOP/s (0 to 11000) versus year (2001 to 2017). Series: NVIDIA GPU Double Precision, NVIDIA GPU Single Precision, Intel CPU Double Precision, Intel CPU Single Precision]
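One way to read the sheet above is to compare the most recent GPU and CPU data points in each precision. The sketch below does that with the last row of each series. The variable names are illustrative, and the values are the estimates from the sheet, so the resulting ratios are approximate as well.

```python
# Latest data point of each series, in GFLOP/s, copied from the sheet above.
gpu_sp, gpu_dp = 1.02e4, 5.30e3   # NVIDIA GPU, year 2016.594
cpu_sp, cpu_dp = 1.32e3, 6.68e2   # Intel CPU, year 2016.160

ratio_sp = gpu_sp / cpu_sp
ratio_dp = gpu_dp / cpu_dp

print(f"GPU/CPU single precision: {ratio_sp:.1f}x")
print(f"GPU/CPU double precision: {ratio_dp:.1f}x")
```

With these values the GPU leads by a factor of just under 8 in both precisions.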
Theoretical bandwidth* (embedded Excel sheet)
• Data estimated from the figure in the programming guide
• Only the Tesla GPU values have been corrected to the true values
• The remaining values are approximations

Theoretical GB/s by year ("-" marks no data point)
GeForce GPU          Tesla GPU            Intel CPU
year      GB/s       year      GB/s       year      GB/s
2003.000  1.26E+01   2008.000  1.02E+02   2003.000  6.29E+00
2004.000  3.08E+01   2009.000  1.49E+02   2005.000  8.81E+00
2005.000  5.35E+01   2010.000  1.78E+02   2006.000  1.07E+01
2006.000  8.56E+01   2012.000  2.50E+02   2007.000  1.32E+01
2008.000  1.42E+02   2013.000  2.88E+02   2009.000  3.21E+01
2009.000  1.77E+02   2014.884  4.80E+02   2010.000  3.21E+01
2012.000  1.92E+02   2016.351  7.32E+02   2012.000  5.10E+01
2013.000  3.36E+02   -         -          2013.000  5.98E+01
2015.196  3.36E+02   -         -          2014.879  6.81E+01
2016.604  4.80E+02   -         -          2016.189  7.77E+01

[Figure: theoretical GB/s (0 to 800) versus year (2001 to 2017). Series: GeForce GPU, Tesla GPU, Intel CPU]
*These data are also included in the Excel sheet on the previous slide, but are provided here as well just in case.
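The bandwidth and FLOPS sheets can be combined into a bytes-per-FLOP figure, a common rough indicator of how memory-bound a processor is. The sketch below pairs the latest Tesla GPU bandwidth with the latest GPU double-precision value, and the latest Intel CPU bandwidth with the latest CPU double-precision value. That pairing is an assumption made here for illustration: the sheets do not state that the two data points describe the same product.

```python
# Latest theoretical bandwidth in GB/s, from the bandwidth sheet.
tesla_bw, cpu_bw = 7.32e2, 7.77e1
# Latest double-precision GFLOP/s, from the compute performance sheet.
gpu_dp, cpu_dp = 5.30e3, 6.68e2

bw_ratio = tesla_bw / cpu_bw          # GPU bandwidth advantage
gpu_bytes_per_flop = tesla_bw / gpu_dp
cpu_bytes_per_flop = cpu_bw / cpu_dp

print(f"Tesla GPU / Intel CPU bandwidth: {bw_ratio:.1f}x")
print(f"GPU bytes per DP FLOP: {gpu_bytes_per_flop:.3f}")
print(f"CPU bytes per DP FLOP: {cpu_bytes_per_flop:.3f}")
```

Under that assumption both processors supply well under one byte per double-precision FLOP, so both depend heavily on data reuse in registers and caches.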

Schematic diagrams of GPUs' architecture and Time evolution of theoretical FLOPS and Bandwidth

  • 2. GPUの性能の遷移(理論演算性能) GeForce ゲーム用 Quadro CG用 Tesla GPGPU用 http://docs.nvidia.com/cuda/cuda-c-programming-guide/で公開されている資料を基に作成 TheoreticalGFLOP/s 2001 2003 2005 2007 2009 2011 2013 2015 *1コードネーム *2製品ファミリ GeForce FX 5800 GeForce 6800 Ultra Pentium 4 GeForce 7800 GTX GeForce 8800 GTX GeForce GTX 280 GeForce GTX 480 GeForce GTX 580 Tesla*1 Fermi Kepler GeForce GTX 680 Kepler Maxwell GeForce GTX TITAN GeForce 780 Ti Woodcrest Harpertown Tesla C1060 Tesla C2050 Tesla K40 Tesla K20X Tesla M2090 Sandy Bridge Ivy Bridge Tesla K80 Tesla P100 GeForce GTX Titan X GeForce GTX Titan X Pascal Haswell Broadwell Excel Sheet 2 2017/4/1
  • 3. GPUの性能の遷移(理論バンド幅) GeForce ゲーム用 Quadro CG用 Tesla GPGPU用 Tesla*1 Fermi Maxwell Kepler GeForce FX 5900 GeForce 6800 GT GeForce 7800 GTX GeForce 8800 GTX GeForce GTX 280 GeForce GTX 480 GeForce GTX 680 GeForce 780 Ti Tesla K40Tesla K20X Tesla M2090 Tesla C2050 Tesla C1060 Northwood Woodcrest Harpertown Sandy Bridge Ivy Bridge Westmere Bloomfield Prescott 2003 2005 2007 2009 2011 2013 TheoreticalGB/s 2015 Tesla P100 GeForce GTX Titan X Pascal GeForce GTX Titan X Tesla K80 Haswell Broadwell Excel Sheet 3 2017/4/1 http://docs.nvidia.com/cuda/cuda-c-programming-guide/で公開されている資料を基に作成
  • 4. Tesla architecture. Tesla C1060 specifications: 30 SMs; 240 CUDA cores (= 8 cores/SM × 30 SMs); no cache. [Diagram of a streaming multiprocessor (SM): 8 SPs, 2 SFUs, 16 KB shared memory, register file (16384 × 32-bit).] Created from the image published at http://www.anandtech.com/show/2549/2.
  • 5. Fermi architecture. Tesla M2050 specifications: 14 SMs; 448 CUDA cores (= 32 cores/SM × 14 SMs); L1/L2 caches; ECC (error-correcting code) support. For details, see http://www.nvidia.co.jp/docs/IO/81860/NVIDIA_Fermi_Architecture_Whitepaper_FINAL_J.pdf. [Diagram of an SM: 32 cores, 4 SFUs, register file (16384 × 32-bit), 64 KB shared memory / L1 cache. Chip-level diagram: four GPCs, each with a raster engine, an L2 cache, the GigaThread Engine, a PCI Express 3.0 host interface, and six memory controllers.] Created from the image published at http://www.anandtech.com/show/2849/3.
  • 6. Kepler architecture. Tesla K20c/m specifications: 13 SMXs; 2,496 CUDA cores (= 192 cores/SMX × 13 SMXs). For details, see https://www.nvidia.co.jp/content/apac/pdf/tesla/nvidia-kepler-gk110-architecture-whitepaper-jp.pdf. [Diagram of an SMX: 192 cores, 64 DP units, 32 SFUs, register file (65536 × 32-bit), 64 KB shared memory / L1 cache, 48 KB read-only data cache. Chip-level diagram: SMXs, an L2 cache, the GigaThread Engine, a PCI Express 3.0 host interface, and six memory controllers.] Created from the image published at https://library.creativecow.net/kaufman_debra/NVIDIA-VGX/1.
  • 7. Maxwell architecture. GeForce GTX TITAN X specifications: 24 SMMs; 3,072 CUDA cores (= 128 cores/SMM × 24 SMMs); no double-precision units. For details of the first generation, see https://www.nvidia.co.jp/content/product-detail-pages/geforce-gtx-750-ti/geforce-gtx-750ti-whitepaper.pdf. [Diagram of an SMM: PolyMorph Engine 3.0, 64 KB shared memory, L1 caches, and four processing blocks, each with a register file (16,384 × 32-bit), 32 cores, and 8 SFUs. Chip-level diagram: four GPCs, each with a raster engine, an L2 cache, the GigaThread Engine, a PCI Express 3.0 host interface, and four memory controllers.] Created from the image published at http://www.itmedia.co.jp/pcuser/articles/1409/19/news051.html.
  • 8. Pascal architecture. Tesla P100 specifications: 56 SMs; 3,584 CUDA cores (= 64 cores/SM × 56 SMs). For details, see http://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf. [Diagram of an SM: 64 KB shared memory / L1 cache, 48 KB read-only data cache, and two processing blocks, each with a register file (32768 × 32-bit), 32 cores, 16 DP units, and 8 SFUs. Chip-level diagram: an L2 cache, the GigaThread Engine, a PCI Express 3.0 host interface, eight memory controllers, four High Bandwidth Memory 2 stacks, a High-Speed Hub, and four NVLink ports.] Created from the image published at http://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf.
  • 9. Theoretical compute performance (embedded Excel sheet). The data were estimated from the figure in the programming guide. Only the GPU double-precision values were corrected to the exact figures; the rest are approximations. Values are in GFLOP/s (SP = single precision, DP = double precision); a dash marks an empty cell.
    NVIDIA GPU SP         NVIDIA GPU DP         Intel CPU SP          Intel CPU DP
    year      GFLOP/s     year      GFLOP/s     year      GFLOP/s     year      GFLOP/s
    2003.000  0.00E+00    2008.462  7.80E+01    2003.000  7.60E+00    2003.000  3.80E+00
    2004.248  7.72E+01    2009.751  5.15E+02    2005.413  2.66E+01    2005.413  1.33E+01
    2005.413  1.54E+02    2011.369  6.66E+02    2006.825  5.12E+01    2006.825  2.66E+01
    2006.832  5.17E+02    2012.864  1.31E+03    2008.456  9.00E+01    2008.456  4.24E+01
    2008.462  9.28E+02    2013.877  1.43E+03    2009.233  1.10E+02    2009.233  5.26E+01
    2009.751  1.34E+03    2014.872  1.87E+03    2010.204  1.68E+02    2010.204  6.29E+01
    2010.846  1.52E+03    2016.594  5.30E+03    2011.151  4.26E+02    2011.151  2.16E+02
    2012.224  3.07E+03    -         -           2013.688  5.32E+02    2013.688  2.66E+02
    2013.137  4.50E+03    -         -           2014.871  9.90E+02    2014.871  4.95E+02
    2013.855  5.36E+03    -         -           2016.160  1.32E+03    2016.160  6.68E+02
    2015.203  6.14E+03    -         -           -         -           -         -
    2016.594  1.02E+04    -         -           -         -           -         -
    [Chart: theoretical GFLOP/s (axis 0 to 11,000) versus year (2001 to 2017) for the four series above.]
  • 10. Theoretical bandwidth* (embedded Excel sheet). The data were estimated from the figure in the programming guide. Only the Tesla GPU values were corrected to the exact figures; the rest are approximations. Values are in GB/s; a dash marks an empty cell.
    GeForce GPU           Tesla GPU             Intel CPU
    year      GB/s        year      GB/s        year      GB/s
    2003.000  1.26E+01    2008.000  1.02E+02    2003.000  6.29E+00
    2004.000  3.08E+01    2009.000  1.49E+02    2005.000  8.81E+00
    2005.000  5.35E+01    2010.000  1.78E+02    2006.000  1.07E+01
    2006.000  8.56E+01    2012.000  2.50E+02    2007.000  1.32E+01
    2008.000  1.42E+02    2013.000  2.88E+02    2009.000  3.21E+01
    2009.000  1.77E+02    2014.884  4.80E+02    2010.000  3.21E+01
    2012.000  1.92E+02    2016.351  7.32E+02    2012.000  5.10E+01
    2013.000  3.36E+02    -         -           2013.000  5.98E+01
    2015.196  3.36E+02    -         -           2014.879  6.81E+01
    2016.604  4.80E+02    -         -           2016.189  7.77E+01
    [Chart: theoretical GB/s (axis 0 to 800) versus year (2001 to 2017) for the three series above.]
    * Also included in the Excel sheet on the previous slide; repeated here to be safe.
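
The two embedded sheets can be combined into a rough machine-balance estimate for the Tesla series: bytes of memory traffic available per double-precision FLOP. The sketch below is not part of the slides. It copies the Tesla values from slides 9 and 10 and assumes that the seven rows in each sheet correspond, in order, to the seven Tesla products named on the charts. The product labels and the helper name `bytes_per_flop` are illustrative assumptions.

```python
# Tesla-series values copied from the embedded Excel sheets on slides 9 and 10.
tesla_dp_gflops = [7.80e1, 5.15e2, 6.66e2, 1.31e3, 1.43e3, 1.87e3, 5.30e3]
tesla_bw_gbs = [1.02e2, 1.49e2, 1.78e2, 2.50e2, 2.88e2, 4.80e2, 7.32e2]

# ASSUMPTION: rows are paired by position, and these labels are inferred
# from the Tesla product names shown on the charts (not stated in the sheets).
labels = ["C1060", "C2050", "M2090", "K20X", "K40", "K80", "P100"]


def bytes_per_flop(bandwidth_gbs, gflops):
    """Theoretical bytes of memory traffic available per FLOP."""
    return bandwidth_gbs / gflops


balance = {
    name: bytes_per_flop(bw, fl)
    for name, bw, fl in zip(labels, tesla_bw_gbs, tesla_dp_gflops)
}

# Growth factors between the first and last entries of each series.
flops_growth = tesla_dp_gflops[-1] / tesla_dp_gflops[0]
bw_growth = tesla_bw_gbs[-1] / tesla_bw_gbs[0]

for name, b in balance.items():
    # 8.0 / b = FLOPs that must be done per 8-byte (double) load to stay
    # compute-bound at the theoretical peak.
    print(f"{name}: {b:.3f} bytes/FLOP, {8.0 / b:.1f} FLOPs per double loaded")

print(f"DP throughput grew {flops_growth:.1f}x, bandwidth grew {bw_growth:.1f}x")
```

Under these assumptions the ratio falls from above 1 byte/FLOP for the first entry to below 0.2 byte/FLOP for the last, because theoretical double-precision throughput grew by a much larger factor than theoretical bandwidth over the same period.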