20161121 open hyperscale#6

エヌビディア
ディープラーニングソリューションアーキテクト兼 CUDAエンジニア村上真奈
NVIDIA GPUマネージメント入門
HyperScale 勉強会 #06

3
前回の復習
http://docs.aws.amazon.com/ja_jp/AWSEC2/latest/UserGuide/using_cluster_computing.html
P2インスタンスついに出た!

4
前回の復習
https://aws.amazon.com/jp/blogs/aws/new-p2-instance-type-for-amazon-ec2-up-to-16-gpus/
とりあえずこのコマンドを実行すると良いらしい

5
これって何をしているの?

6
GPU Boost For NVIDIA Tesla
GPU Boostという言葉を聞いた事がありますか?
https://www.nvidia.com/content/PDF/kepler/nvidia-gpu-boost-tesla-k40-06767-001-v02.pdf
Increase Performance with GPU Boost
ベースクロック(デフォルトクロック)とBoost
Clock(ブーストクロック)が存在。
ブースとクロックは複数ある場合もある
Tesla K40,GPU Boost whitepaperより引用

7
Increase Performance with GPU Boost and K80 Autoboost
• PARALLEL FORALL(NVIDIA技術ブログ)にTesla K80のパフォーマンス向上の為にという記事あり
• GPU BoostとAutoboostの説明あり
https://devblogs.nvidia.com/parallelforall/increase-performance-gpu-boost-k80-autoboost/
Increase Performance with GPU Boost

8
NVIDIA SMI & NVML
• NVIDIA Management Library (NVML SDK)
• http://developer.nvidia.com/tesla-deployment-kit
• Python NVMLバインド
• http://pypi.python.org/pypi/nvidia-ml-py/
• Perl NVMLバインド
• http://search.cpan.org/~nvbinding/nvidia-ml-pl/
Monitoring and Managing NVIDIA GPUs in Cluster Environments

9
いろいろ書いてあるが、、、

10
まずはやってみましょう

11
GPUパフォーマンス設定の手順
1. クラスタ内のGPU数を確認する
2. 現在のGPUのクロックを確認する
3. Persistenceモードを有効にする
4. SUPPORTED_CLOCKSの確認する
5. クロックの変更する
GPU Boost

12
1.クラスタ内のGPU数を確認する
• nvidia-smiを実行(GPU一覧の取得のみならばnvidia-smi –Lでも良い)
$ nvidia-smi
Sun Nov 20 08:00:34 2016
+-----------------------------------------------------------------------------------------------------------------------------------+
| NVIDIA-SMI 361.93.02 Driver Version: 361.93.02 |
|-------------------------------------------+-------------------------------------------+------------------------------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
|=============================+==============================+=============================|
| 0 Tesla P100-SXM2... On | 0000:06:00.0 Off | 0 |
| N/A 23C P0 29W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------------------+-------------------------------------------+------------------------------------------+
| 1 Tesla P100-SXM2... On | 0000:07:00.0 Off | 0 |
+-------------------------------------------+-------------------------------------------+------------------------------------------+
| 2 Tesla P100-SXM2... On | 0000:0A:00.0 Off | 0 |
+-------------------------------------------+-------------------------------------------+------------------------------------------+
| 3 Tesla P100-SXM2... On | 0000:0B:00.0 Off | 0 |
+-------------------------------------------+-------------------------------------------+------------------------------------------+
| 4 Tesla P100-SXM2... On | 0000:85:00.0 Off | 0 |
+-------------------------------------------+-------------------------------------------+------------------------------------------+
| 5 Tesla P100-SXM2... On | 0000:86:00.0 Off | 0 |
+-------------------------------------------+-------------------------------------------+------------------------------------------+
| 6 Tesla P100-SXM2... On | 0000:89:00.0 Off | 0 |
+-------------------------------------------+-------------------------------------------+------------------------------------------+
| 7 Tesla P100-SXM2... On | 0000:8A:00.0 Off | 0 |
+-------------------------------------------+-------------------------------------------+------------------------------------------+

13
2.現在のGPUのクロックを確認する
• nvidia-smi –q –i [GPU IDまたはBus ID] –d CLOCK
$ nvidia-smi -q -i 0 -d CLOCK
==============NVSMI LOG==============
Timestamp : Sun Nov 20 20:23:45 2016
Driver Version : 361.93.02
Attached GPUs : 8
GPU 0000:07:00.0
Clocks
Graphics : 405 MHz
SM : 405 MHz
Memory : 715 MHz
Video : 835 MHz
Applications Clocks
Graphics : 1328 MHz
Memory : 715 MHz
Default Applications Clocks
Graphics : 1328 MHz
Memory : 715 MHz
Max Clocks
Graphics : 1480 MHz
SM : 1480 MHz
Memory : 715 MHz
Video : 1480 MHz
SM Clock Samples
Duration : 4499.47 sec

14
3.Persistenceモードを有効にする
• Persistenceモードとは、CUDAアプリケーションやXアプリケーションが存在していない状態でも
NVIDIAドライバをロードし続ける為の設定
• もしドライバがロードされていない場合、ロードしてから計算が走るのでオーバーヘッドが発生してしまう
可能性がある
• sudo nvidia-smi –pm ENABLED –i [GPU IDまたはBus ID]
$ nvidia-smi -pm ENABLED -i 0
Persistence mode is already Enabled for GPU 0000:06:00.0.
All done.

15
4. SUPPORTED_CLOCKSの確認する
• nvidia-smi –q –i [GPU IDまたはBus ID] –d SUPPORTED_CLOCKS
Attached GPUs : 8
GPU 0000:06:00.0
Supported Clocks
Memory : 715 MHz
Graphics : 1480 MHz
Graphics : 1468 MHz
Graphics : 1455 MHz
Graphics : 1442 MHz
Graphics : 1430 MHz
Graphics : 1417 MHz
Graphics : 1404 MHz
Graphics : 1392 MHz
Graphics : 1379 MHz
Graphics : 1366 MHz
Graphics : 1354 MHz
Graphics : 1341 MHz
Graphics : 1328 MHz
Graphics : 1316 MHz
Graphics : 1303 MHz
Graphics : 1290 MHz
Graphics : 1278 MHz
Graphics : 1265 MHz
Graphics : 1252 MHz
Graphics : 1240 MHz

16
5.クロックの変更する
• graphics clock rateはmemory clock rateは依存しているので、両方を設定する必要がある
• sudo nvidia-smi –ac <memory,graphics> –i [GPU IDまたはBus ID]
$ sudo nvidia-smi -ac 715,1480 -i 0
Applications clocks set to "(MEM 715, SM 1480)" for GPU 0000:06:00.0
All done.
$ sudo nvidia-smi -q -i 0 -d CLOCK
==============NVSMI LOG==============
Timestamp : Sun Nov 20 21:16:54 2016
Driver Version : 361.93.02
Attached GPUs : 8
GPU 0000:06:00.0
Clocks
Graphics : 405 MHz
SM : 405 MHz
Memory : 715 MHz
Video : 835 MHz
Applications Clocks
Graphics : 1480 MHz
Memory : 715 MHz
Default Applications Clocks
Graphics : 1328 MHz
Memory : 715 MHz
Max Clocks
Graphics : 1480 MHz
SM : 1480 MHz
Memory : 715 MHz

17
ベンチマークをしてみよう

18
Graphics Clock = 1328 MHz
ベンチマーク
dgxuser@dgx-1:~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody$ ./nbody -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Tesla P100-SXM2-16GB" with compute capability 6.0
> Compute 6.0 CUDA device: [Tesla P100-SXM2-16GB]
57344 bodies, total time for 10 iterations: 110.661 ms
= 297.154 billion interactions per second
= 5943.080 single-precision GFLOP/s at 20 flops per interaction

19
Graphics Clock = 1480 MHz
ベンチマーク
dgxuser@dgx-1:~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody$ ./nbody -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Tesla P100-SXM2-16GB" with compute capability 6.0
> Compute 6.0 CUDA device: [Tesla P100-SXM2-16GB]
57344 bodies, total time for 10 iterations: 103.895 ms
= 316.505 billion interactions per second
= 6330.091 single-precision GFLOP/s at 20 flops per interaction

21
その他
• デフォルトクロックに戻す
• sudo nvidia-smi –rac –i [GPU IDまたはBus ID]
• クロック変更可能なユーザを設定する
• sudo nvidia-smi –acp [UNRESTRICTEDまたはRESTRICTED] –i [GPU IDまたはBus ID]
• などなど

20161121 open hyperscale#6

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Andere mochten auch

Andere mochten auch (12)

Ähnlich wie 20161121 open hyperscale#6

Ähnlich wie 20161121 open hyperscale#6 (20)

20161121 open hyperscale#6