Schematic diagrams of GPUs' architecture and Time evolution of theoretical FLOPS and Bandwidth
- 3. GPU performance over time (theoretical bandwidth)
[Chart: theoretical memory bandwidth (Theoretical GB/s) versus year, axis ticks 2003 to 2015, from the embedded Excel sheet.]
NVIDIA product lines: GeForce (gaming), Quadro (computer graphics), Tesla (GPGPU)
GPU architectures: Tesla*1, Fermi, Kepler, Maxwell, Pascal
GeForce data points: GeForce FX 5900, GeForce 6800 GT, GeForce 7800 GTX, GeForce 8800 GTX, GeForce GTX 280, GeForce GTX 480, GeForce GTX 680, GeForce 780 Ti, GeForce GTX Titan X (labelled twice)
Tesla data points: Tesla C1060, Tesla C2050, Tesla M2090, Tesla K20X, Tesla K40, Tesla K80, Tesla P100
Intel CPU data points: Northwood, Prescott, Woodcrest, Harpertown, Bloomfield, Westmere, Sandy Bridge, Ivy Bridge, Haswell, Broadwell
Created based on material published at http://docs.nvidia.com/cuda/cuda-c-programming-guide/
2017/4/1
- 4. Tesla architecture
Tesla C1060 specifications
Number of SMs: 30
Number of CUDA Cores: 240 (= 8 Cores/SM × 30 SMs)
No cache
[Diagram: a Streaming Multiprocessor (SM) containing 8 SPs, 2 SFUs, 16 KB Shared Memory, and a Register File (16384 × 32-bit); the chip is built from multiple SMs.]
Created based on the image published at http://www.anandtech.com/show/2549/2
- 5. Fermi architecture
Tesla M2050 specifications
Number of SMs: 14
Number of CUDA Cores: 448 (= 32 Cores/SM × 14 SMs)
L1 and L2 caches included
ECC (error correction) included
For details, see http://www.nvidia.co.jp/docs/IO/81860/NVIDIA_Fermi_Architecture_Whitepaper_FINAL_J.pdf
[Diagram: an SM containing 32 Cores, 4 SFUs, a Register File (16384 × 32-bit), and 64 KB Shared Memory / L1 Cache. Chip level: four GPCs, each with a Raster Engine and SMs; L2 Cache; GigaThread Engine; PCI Express 3.0 Host Interface; six Memory Controllers.]
Created based on the image published at http://www.anandtech.com/show/2849/3
- 6. Kepler architecture
Tesla K20c/m specifications
Number of SMXs: 13
Number of CUDA Cores: 2,496 (= 192 Cores/SMX × 13 SMXs)
For details, see https://www.nvidia.co.jp/content/apac/pdf/tesla/nvidia-kepler-gk110-architecture-whitepaper-jp.pdf
[Diagram: an SMX containing 192 Cores, 64 DP Units, 32 SFUs, a Register File (65536 × 32-bit), 64 KB Shared Memory / L1 Cache, and 48 KB Read-Only Data Cache. Chip level: SMXs; L2 Cache; GigaThread Engine; PCI Express 3.0 Host Interface; six Memory Controllers.]
Created based on the image published at https://library.creativecow.net/kaufman_debra/NVIDIA-VGX/1
- 7. Maxwell architecture
GeForce GTX TITAN X specifications
Number of SMMs: 24
Number of CUDA Cores: 3,072 (= 128 Cores/SMM × 24 SMMs)
No double-precision units
For details on the first generation, see https://www.nvidia.co.jp/content/product-detail-pages/geforce-gtx-750-ti/geforce-gtx-750ti-whitepaper.pdf
[Diagram: an SMM with PolyMorph Engine 3.0 and four partitions, each containing 32 Cores, 8 SFUs, and a Register File (16,384 × 32-bit); 64 KB Shared Memory; two L1 Cache blocks. Chip level: four GPCs, each with a Raster Engine and SMMs; L2 Cache; GigaThread Engine; PCI Express 3.0 Host Interface; four Memory Controllers.]
Created based on the image published at http://www.itmedia.co.jp/pcuser/articles/1409/19/news051.html
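The register-file labels in the diagrams on slides 4 to 7 are counts of 32-bit registers, so converting them to kilobytes makes them easier to compare with the shared-memory sizes on the same slides. The following is a minimal sketch that uses only the figures quoted on the slides. The table layout and the function name `register_file_kib` are illustrative, not from the original deck, and the Maxwell entry assumes that the four 16,384-register files drawn in the SMM diagram add up to the per-SMM total.

```python
# Register files as labelled in the slide diagrams:
# (register files per multiprocessor, 32-bit registers per file).
REGISTER_FILES = {
    "Tesla (SM)": (1, 16384),
    "Fermi (SM)": (1, 16384),
    "Kepler (SMX)": (1, 65536),
    "Maxwell (SMM)": (4, 16384),  # assumption: four partitions, summed
}

BYTES_PER_REGISTER = 4  # each register is 32 bits wide


def register_file_kib(files: int, registers_per_file: int) -> float:
    """Total register-file capacity per multiprocessor in KiB."""
    return files * registers_per_file * BYTES_PER_REGISTER / 1024


for name, (files, regs) in REGISTER_FILES.items():
    print(f"{name}: {register_file_kib(files, regs):.0f} KiB")
```

With these slide values, the Tesla and Fermi entries come to 64 KiB per multiprocessor, and the Kepler and Maxwell entries to 256 KiB.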
- 8. Pascal architecture
Tesla P100 specifications
Number of SMs: 56
Number of CUDA Cores: 3,584 (= 64 Cores/SM × 56 SMs)
For details, see http://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
[Diagram: an SM with two partitions, each containing 32 Cores, 16 DP Units, 8 SFUs, and a Register File (32768 × 32-bit); 64 KB Shared Memory / L1 Cache; 48 KB Read-Only Data Cache. Chip level: L2 Cache; GigaThread Engine; PCI Express 3.0 Host Interface; eight Memory Controllers; four High Bandwidth Memory 2 stacks; High-Speed Hub with four NVLink connections.]
Created based on the image published at http://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
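Each specification slide states the CUDA core count as cores per multiprocessor times the number of multiprocessors. As a quick cross-check, the sketch below recomputes that product for the five products named on slides 4 to 8. All numbers are taken from the slides; the tuple layout and the helper name `total_cores` are illustrative only.

```python
# (architecture, product, multiprocessors, cores per multiprocessor,
#  total CUDA cores as stated on the slide)
SPECS = [
    ("Tesla", "Tesla C1060", 30, 8, 240),
    ("Fermi", "Tesla M2050", 14, 32, 448),
    ("Kepler", "Tesla K20c/m", 13, 192, 2496),
    ("Maxwell", "GeForce GTX TITAN X", 24, 128, 3072),
    ("Pascal", "Tesla P100", 56, 64, 3584),
]


def total_cores(multiprocessors: int, cores_per_mp: int) -> int:
    """CUDA cores on the chip: multiprocessor count times cores in each."""
    return multiprocessors * cores_per_mp


for arch, product, mps, per_mp, stated in SPECS:
    computed = total_cores(mps, per_mp)
    status = "matches" if computed == stated else "differs from"
    print(f"{arch} ({product}): {mps} x {per_mp} = {computed}, {status} the slide")
```

The cores per multiprocessor go from 8 (Tesla) to 192 (Kepler) and back down to 64 (Pascal), while the number of multiprocessors per chip varies independently, so neither number alone indicates the total.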
- 9. Theoretical compute performance (Embedded Excel Sheet)
Data estimated from the figure in the programming guide. Only the GPU double-precision values were corrected to the exact values; the rest are approximations.
Theoretical GFLOP/s by year (SP = Single Precision, DP = Double Precision; "-" = no data point):
NVIDIA GPU SP         NVIDIA GPU DP         Intel CPU SP          Intel CPU DP
year      GFLOP/s     year      GFLOP/s     year      GFLOP/s     year      GFLOP/s
2003.000  0.00E+00    2008.462  7.80E+01    2003.000  7.60E+00    2003.000  3.80E+00
2004.248  7.72E+01    2009.751  5.15E+02    2005.413  2.66E+01    2005.413  1.33E+01
2005.413  1.54E+02    2011.369  6.66E+02    2006.825  5.12E+01    2006.825  2.66E+01
2006.832  5.17E+02    2012.864  1.31E+03    2008.456  9.00E+01    2008.456  4.24E+01
2008.462  9.28E+02    2013.877  1.43E+03    2009.233  1.10E+02    2009.233  5.26E+01
2009.751  1.34E+03    2014.872  1.87E+03    2010.204  1.68E+02    2010.204  6.29E+01
2010.846  1.52E+03    2016.594  5.30E+03    2011.151  4.26E+02    2011.151  2.16E+02
2012.224  3.07E+03    -         -           2013.688  5.32E+02    2013.688  2.66E+02
2013.137  4.50E+03    -         -           2014.871  9.90E+02    2014.871  4.95E+02
2013.855  5.36E+03    -         -           2016.160  1.32E+03    2016.160  6.68E+02
2015.203  6.14E+03    -         -           -         -           -         -
2016.594  1.02E+04    -         -           -         -           -         -
[Chart: Theoretical GFLOP/s (0 to 11000) versus year (2001 to 2017) for NVIDIA GPU Double Precision, NVIDIA GPU Single Precision, Intel CPU Double Precision, and Intel CPU Single Precision.]
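One way to read the table on this slide is as an average growth rate per year and as a GPU-to-CPU ratio at the most recent data points. The sketch below does both, using the first non-zero and the last point of each series as listed on the slide. The compound-growth formula and the names `annual_growth` and `SERIES` are choices made for this illustration, not part of the original deck, and most of the inputs are themselves approximations read off a figure.

```python
# First non-zero and last (year, GFLOP/s) points of each series on the slide.
SERIES = {
    "NVIDIA GPU SP": [(2004.248, 77.2), (2016.594, 10200.0)],
    "NVIDIA GPU DP": [(2008.462, 78.0), (2016.594, 5300.0)],
    "Intel CPU SP": [(2003.000, 7.6), (2016.160, 1320.0)],
    "Intel CPU DP": [(2003.000, 3.8), (2016.160, 668.0)],
}


def annual_growth(points):
    """Compound growth factor per year between the first and last point."""
    (year0, value0), (year1, value1) = points[0], points[-1]
    return (value1 / value0) ** (1.0 / (year1 - year0))


for name, points in SERIES.items():
    factor = annual_growth(points)
    print(f"{name}: about {100 * (factor - 1):.0f}% per year")

# GPU-to-CPU ratio at the most recent point of each series.
sp_ratio = SERIES["NVIDIA GPU SP"][-1][1] / SERIES["Intel CPU SP"][-1][1]
dp_ratio = SERIES["NVIDIA GPU DP"][-1][1] / SERIES["Intel CPU DP"][-1][1]
print(f"Latest GPU/CPU ratio: SP {sp_ratio:.1f}x, DP {dp_ratio:.1f}x")
```

The series start in different years, so the growth figures are only roughly comparable with one another.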
- 10. Theoretical bandwidth* (Embedded Excel Sheet)
Data estimated from the figure in the programming guide. Only the Tesla GPU values were corrected to the exact values; the rest are approximations.
Theoretical GB/s by year ("-" = no data point):
GeForce GPU           Tesla GPU             Intel CPU
year      GB/s        year      GB/s        year      GB/s
2003.000  1.26E+01    2008.000  1.02E+02    2003.000  6.29E+00
2004.000  3.08E+01    2009.000  1.49E+02    2005.000  8.81E+00
2005.000  5.35E+01    2010.000  1.78E+02    2006.000  1.07E+01
2006.000  8.56E+01    2012.000  2.50E+02    2007.000  1.32E+01
2008.000  1.42E+02    2013.000  2.88E+02    2009.000  3.21E+01
2009.000  1.77E+02    2014.884  4.80E+02    2010.000  3.21E+01
2012.000  1.92E+02    2016.351  7.32E+02    2012.000  5.10E+01
2013.000  3.36E+02    -         -           2013.000  5.98E+01
2015.196  3.36E+02    -         -           2014.879  6.81E+01
2016.604  4.80E+02    -         -           2016.189  7.77E+01
[Chart: Theoretical GB/s (0 to 800) versus year (2001 to 2017) for GeForce GPU, Tesla GPU, and Intel CPU.]
*Also included in the Excel sheet on the previous slide, but repeated here just in case.
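The compute and bandwidth tables on slides 9 and 10 can be combined into a rough "FLOP per byte" balance: how many floating-point operations a device can theoretically perform for every byte it can move from memory. The sketch below divides the latest double-precision GFLOP/s value by the latest GB/s value. It rests on an assumption that is not stated in the deck: that the last point of the NVIDIA GPU Double Precision series and the last point of the Tesla GPU bandwidth series describe the same device generation, and likewise for the two Intel CPU series. The helper name `flop_per_byte` is illustrative.

```python
# Latest points from the slide tables (2016): theoretical double-precision
# GFLOP/s from slide 9 and theoretical GB/s from slide 10.
# Assumption: each pair describes the same device generation.
LATEST = {
    "NVIDIA Tesla GPU": {"gflops_dp": 5300.0, "gb_per_s": 732.0},
    "Intel CPU": {"gflops_dp": 668.0, "gb_per_s": 77.7},
}


def flop_per_byte(gflops: float, gb_per_s: float) -> float:
    """Theoretical floating-point operations available per byte transferred."""
    return gflops / gb_per_s


for name, spec in LATEST.items():
    balance = flop_per_byte(spec["gflops_dp"], spec["gb_per_s"])
    print(f"{name}: {balance:.2f} DP FLOP per byte")

bandwidth_ratio = LATEST["NVIDIA Tesla GPU"]["gb_per_s"] / LATEST["Intel CPU"]["gb_per_s"]
print(f"GPU/CPU bandwidth ratio: {bandwidth_ratio:.1f}x")
```

Under that assumption, both devices can theoretically perform several double-precision operations per byte of memory traffic, so a kernel that does less arithmetic than that per byte would be limited by bandwidth rather than by the FLOPS figure.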