IBM Systems Technical Events | ibm.com/training/events

Enabling POWER 8 advanced features on Linux

Discover some POWER8 advanced features and how to enable them under Linux on Power.
- POWER8 hardware in-core crypto acceleration, which improves performance when using specific encryption protocols such as HTTPS.
- Zswap memory compression accelerated by the POWER 842 hardware compression engine, which helps keep your system responsive in memory over-committed situations.
- POWER8 micro-threading support enabled by PowerKVM, which improves computational efficiency when the system's processors are over-committed.

  1. Enabling POWER 8 advanced features on Linux
     Sébastien Chabrolles, Julien Limodin, Fabrice Moyen
     PowerSystem Linux Center, IBM Montpellier
     © Copyright IBM Corporation 2016. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
  2. POWER8 Hardware Accelerator
     • On-chip accelerators (NX):
         • Symmetric crypto
         • Compression engine
         • Random number generator
         • One NX complex per chip
         • A given NX can access all memory in the SMP
         • A given NX can be accessed by any core
         • Can be accessed via a PowerVM hypervisor call
     • In-core accelerators:
         • Symmetric crypto
         • Private per core
         • Leverage the vector unit (VMX)
         • Direct access for guest/VM (including KVM)
     • IBM POWER8:
         • 12 cores per socket (from 3 to 4 GHz)
         • 8 HW threads per core (SMT technology)
         • Large cache (96 MB: 8 MB per core)
         • High memory bandwidth (~200 GB/s)
  3. Enable POWER 8 advanced features on Linux
     1. Transparent Memory Compression
     2. -
     3. Power8 Split-Core
  4. Transparent Memory Compression
     • Transparent memory compression is a feature provided by the operating system (kernel) that dynamically compresses process memory without the process being aware of it.
     • PowerVM with AIX offers this functionality through AME (Active Memory Expansion). Unfortunately, AME does not exist for Linux.
     • Linux has an alternative solution named zswap. Zswap is a feature that hooks into the read and write sides of the swap code and acts as a compressed cache for pages going to and from the swap device.
     • Like AME, zswap can use the Power NX compression accelerator (842) to improve compression performance.
     • Unlike AME, zswap has some restrictions:
         • Paging devices are needed, with enough space to store the uncompressed data.
         • … but still the real one.
         • Application processes must allow themselves to be swapped out.
  5. P8 NX (on-chip) block diagram
     • Second-generation Nest Accelerator complex*:
         • Encryption engine
         • Random number generator
         • Two 842 compression/decompression engines
     • 842 compression:
         • Proprietary IBM Research algorithm
         • SRAM-based dictionary compression
         • Used by AME
         • Good compression ratio at high bandwidth: 106% of LZO on 190+ benchmarks; 158% of the compression ratio of software DEFLATE with FHT on the Canterbury corpus
     • Only available via PowerVM or BareMetal Linux.
     * "…-chip accelerators for cryptography and active …", IBM J. Res. & Dev., vol. 57, no. 4, Nov./Dec. 2013.
     [Block diagram: on-chip SMP interconnect interface, DMA controller with channels 0 to 3, two 842 engines, RNG, AES/SHA engines with I/O buffers (IOB), ingress and egress arrays]
  6. Zswap!
     • Test method: a well-known Java benchmark (SPECjbb) is run several times while increasing the JVM heap size.
     • Test system: 1 POWER8 core, 10 GB physical memory, Ubuntu 16.04.
     • JVM heap size: 9 GB, 10 GB, … 18 GB.
     • Test runs:
         1- Baseline test with zswap deactivated
         2- Test with zswap and software compression (default)
         3- Test with zswap and Power HW compression (842)
  7. Memory Over-Allocation test with SPECjbb2005 (Baseline)
     [Chart: SPECjbb2005 performance and memory over-allocation. x-axis: JVM heap size (9 to 18); y-axis: % bops vs nominal (0 to 120). 1 P8 core, SMT8, 10 GB memory, zswap off.]
     • In the memory over-commitment region: 10% of nominal performance due to memory thrashing.
  8. SWAP / Paging Activity
     1- Swap Out / Page Out: when the memory is full, a process (LRUD) scans memory and moves pages to the swap device. Asynchronous background task => no impact on …
     2- Swap In / Page In: when a page fault occurs and the pages are located on the paging device, those pages must be moved back to memory. As physical disks are much slower => THIS HURTS PERFORMANCE!
     [Diagram: system memory and swap device, with swap out and swap in arrows]
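The swap-out and swap-in activity described on this slide can be watched through the kernel's cumulative counters. The following is a minimal sketch, not part of the original deck, assuming the `pswpin` and `pswpout` fields of `/proc/vmstat` (pages swapped in and out since boot); the function name `swap_pages` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): print the cumulative number of
# pages swapped in and out since boot as "IN OUT".
# Reads /proc/vmstat by default; another file can be passed as $1.
swap_pages() {
    vmstat_file="${1:-/proc/vmstat}"
    awk 'BEGIN { pin = 0; pout = 0 }
         $1 == "pswpin"  { pin = $2 }
         $1 == "pswpout" { pout = $2 }
         END { print pin, pout }' "$vmstat_file"
}
```

Calling `swap_pages` twice a few seconds apart and subtracting the values gives the number of pages moved in that interval; multiplying by the page size (`getconf PAGESIZE`) converts it to bytes.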
  9. Memory Over-Allocation test with SPECjbb2005 (Swap I/O)
     [Chart: swap I/O activity, SPECjbb2005 memory over-allocation. x-axis: JVM heap size (9 to 18); y-axis: swap I/O in MB/s (0 to 120). 1 P8 core, SMT8, 10 GB memory, zswap off.]
     • In the memory over-commitment region, the single SAS disk used as swap device reaches its limit at ~100 MB/s (50% read).
  10. In the memory thrashing case, the non-deterministic latency and performance degradation that I/O introduces could be fatal to your …
      … I/O storm could even prevent you from connecting to your system or starting any …
      We need a way to smooth out this I/O storm and performance cliff as memory demand meets memory capacity: zswap!
  11. ZSWAP requirement
      1. Zswap has been directly available in the Linux kernel since v3.11:
           • RedHat 7, CentOS 7, Fedora 19
           • Suse 12
           • Ubuntu 14.04
         Enable zswap at boot by adding the option zswap.enabled=1 to your boot loader configuration.
      2. Power NX (on-chip) acceleration (842) is only available for PowerVM and BareMetal Linux. It is not available today for PowerKVM guests.
           cat /proc/device-tree/ibm,platform-facilities/ibm,compression-v1/status
         should return okay.
         Note: Ubuntu needs kernel 4.2 or above to get access to the Power NX hardware (starting with Ubuntu 15.10): https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1488495
         Enable zswap HW compression with zswap.compressor=842 in your boot loader configuration.
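The device-tree check above can be wrapped in a small script. This is a sketch under assumptions rather than an official procedure: the function name `nx842_status_ok` is hypothetical, and the only path used is the one quoted on the slide. Device-tree string properties are NUL-terminated, so the sketch strips NUL bytes before comparing.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): succeed if the NX compression
# node in the device tree reports "okay".
# $1 optionally overrides the status file path quoted on the slide.
nx842_status_ok() {
    status_file="${1:-/proc/device-tree/ibm,platform-facilities/ibm,compression-v1/status}"
    [ -r "$status_file" ] || return 1
    # Device-tree strings end with a NUL byte; remove it before comparing.
    status=$(tr -d '\000' < "$status_file")
    [ "$status" = "okay" ]
}
```

A typical use would be `nx842_status_ok && echo "NX 842 available"`; on a PowerKVM guest or a non-Power system the file is expected to be absent and the function returns non-zero.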
  12. Enabling POWER HW compression engine (842) with zswap: RedHat
      1- Enable zswap with the 842 compressor at boot time:
           vi /etc/sysconfig/grub
         Add zswap.enabled=1 zswap.compressor=842 to GRUB_CMDLINE_LINUX.
      2- Regenerate your grub.cfg file:
           grub2-mkconfig > /boot/grub2/grub.cfg
      3- Add the 842 kernel modules to your ramdisk:
           echo 842 > /etc/modules-load.d/842.conf
           dracut -f
      4- Reboot and verify with:
           dmesg | grep zswap
           [ 1.064790] zswap: loaded using pool 842/zbud
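Step 1 can also be scripted instead of edited by hand. The sketch below is an illustration only: the function name `add_zswap_args` is hypothetical, and it assumes the file contains a single double-quoted `GRUB_CMDLINE_LINUX="..."` line. It appends the two parameters from the slide and does nothing if zswap is already configured.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): append the zswap boot parameters
# to the GRUB_CMDLINE_LINUX line of the grub defaults file given as $1.
# Assumes a single line of the form GRUB_CMDLINE_LINUX="...".
add_zswap_args() {
    grub_defaults="$1"
    zswap_args='zswap.enabled=1 zswap.compressor=842'
    # Leave the file alone if zswap is already configured.
    if grep -q '^GRUB_CMDLINE_LINUX=.*zswap\.enabled' "$grub_defaults"; then
        return 0
    fi
    _tmp=$(mktemp) || return 1
    sed "s/^GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 $zswap_args\"/" \
        "$grub_defaults" > "$_tmp" && cat "$_tmp" > "$grub_defaults"
    rc=$?
    rm -f "$_tmp"
    return $rc
}
```

It would be run as root, for example `add_zswap_args /etc/sysconfig/grub`, followed by steps 2 to 4 above. Working on a copy of the file first is a sensible precaution.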
  13. Enabling POWER HW compression engine (842) with zswap: Ubuntu
      1- Enable zswap with the 842 compressor at boot time:
           vi /etc/default/grub
         Add zswap.enabled=1 zswap.compressor=842 to GRUB_CMDLINE_LINUX.
      2- Regenerate your grub.cfg file:
           update-grub
      3- Add the 842 kernel modules to your ramdisk:
           echo 842 > /etc/modules-load.d/842.conf
           vi /usr/share/initramfs-tools/hooks/842
         Add the following lines:
           #!/bin/sh -e
           PREREQS=""
           case $1 in
               prereqs) echo "${PREREQS}"; exit 0;;
           esac
           . /usr/share/initramfs-tools/hook-functions
           force_load 842
         Make the hook executable and rebuild the ramdisk:
           chmod +x /usr/share/initramfs-tools/hooks/842
           update-initramfs -u
      4- Reboot and verify with:
           dmesg | grep zswap
           [ 1.064790] zswap: loaded using pool 842/zbud
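The verification in step 4 can be automated by parsing the kernel message. The following sketch assumes the message format shown on the slide (`zswap: loaded using pool <compressor>/<zpool>`); the function name `zswap_pool_from_dmesg` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): read kernel log lines on stdin
# and print "COMPRESSOR ZPOOL" for each "zswap: loaded using pool" message.
zswap_pool_from_dmesg() {
    sed -n 's/.*zswap: loaded using pool \([^/]*\)\/\(.*\)$/\1 \2/p'
}
```

For example, `dmesg | zswap_pool_from_dmesg` should print `842 zbud` for the log line shown above; any other compressor name would suggest that the 842 module was not available when zswap was initialized.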
  14. Zswap parameters and monitoring
      Zswap parameters are located in /sys/module/zswap/parameters. You can change:
        • compressor: [lzo or 842], default lzo. Compression algorithm to use.
        • enabled: [Y or N]. Enables zswap.
        • max_pool_percent: [1 to 100], default 20. Compressed pool size limit (in % of RAM).
        • zpool: [zbud or zsmalloc], default zbud. Compressed pool allocator.
      zbud:
        • stores 2 pages in one slot (compression ratio 2:1)
        • evicts the oldest pages to disk when full
      zsmalloc:
        • can store more pages per slot than zbud (compression ratio ~3:1)
        • but unlike zbud, redirects new allocations to the paging device when full (does not recycle old pages)
      You can monitor zswap activity by looking at the counters located in /sys/kernel/debug/zswap.
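A quick way to inspect these values is to print every file in the directory as `name=value`. This is a minimal sketch; the function name `show_zswap_values` is hypothetical, and the two directories are the ones named on the slide.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): print each file of a directory
# as "name=value". Defaults to the zswap parameter directory.
show_zswap_values() {
    values_dir="${1:-/sys/module/zswap/parameters}"
    for value_file in "$values_dir"/*; do
        [ -f "$value_file" ] || continue
        printf '%s=%s\n' "$(basename "$value_file")" "$(cat "$value_file")"
    done
}
```

`show_zswap_values` lists the current parameters, and `show_zswap_values /sys/kernel/debug/zswap` lists the activity counters (this usually requires root and a mounted debugfs). A parameter would be changed by writing to its file as root, for example `echo 40 > /sys/module/zswap/parameters/max_pool_percent`.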
  15. zswap
      1- Compress/Uncompress (zbud by default): scanning and compressing use extra CPU cycles, but when a page fault occurs, it is much faster to get pages from the compressed pool in memory than from disk.
      2- Swap Out / Page Out: when the compressed zpool is full, zbud moves the oldest compressed pages to the swap device.
      3- Swap In / Page In: when a page fault occurs and the pages are located on the paging device, those pages must be moved back to memory. THIS HURTS PERFORMANCE!
      [Diagram: uncompressed memory, zpool (zbud) and swap device]
  16. ZSWAP Memory Over-Allocation test with SPECjbb2005
      [Chart: testing zswap (zbud) with SPECjbb2005. x-axis: JVM heap size (9 to 18); y-axis: % bops vs nominal (0 to 120). Series: zswap off, zswap 842 (HW). 1 P8 core, SMT8, 10 GB memory, max_pool_percent=40. Regions: memory over-commitment, zpool over-commitment.]
      • 75% of nominal performance at 140% memory.
  17. ZSWAP HW vs software compression
      [Chart: testing zswap (zbud) with SPECjbb2005. x-axis: JVM heap size (9 to 18); y-axis: % bops vs nominal (0 to 120). Series: zswap off, zswap lzo, zswap 842 (HW). 1 P8 core, SMT8, 10 GB memory, max_pool_percent=40. Regions: memory over-commitment, zpool over-commitment.]
      • Chart annotation: x1.5.
  18. ZSWAP Memory Over-Allocation test with SPECjbb2005
      [Chart: testing zswap (zbud) with SPECjbb2005. x-axis: JVM heap size (9 to 18); y-axis: % bops vs nominal (0 to 120). Series: zswap 842 (HW). 1 P8 core, SMT8, 10 GB memory, max_pool_percent=40. Regions: memory over-commitment, zpool over-commitment. Markers 1, 2 and 3 identify the three cases.]
  19. Case 1: Zswap with memory not over-committed
      • Enough memory available for the application
      • No/little swap I/O occurring
      • Zswap is idle (no CPU overhead) => you can use almost all the memory before zswap starts working
      • 100% memory used (uncompressed)
      • 100% CPU user
      • Best performance for the application
      [Diagram: memory used (uncompressed), free memory and swap device]
  20. Case 2: Zswap with memory over-committed
      • The application needs more memory than available
      • Zswap starts working, compressing pages in/out of the zpool; the zpool grows
      • No/little swap I/O occurring
      • Below nominal performance due to memory scanning and unmapping; compression/decompression is offloaded to NX 842
      • 25% CPU system due to page scanning
      • 75% of nominal performance on a CPU-bound application (worst case)
      [Diagram: memory used (uncompressed), zpool (zbud) and swap device]
  21. Zswap with 842 (HW) vs LZO (soft)
      • Zswap HW compression (842), 10 GB RAM, 14 GB Java heap size:
          • 25% system CPU (overhead) due to memory page scanning; compression is offloaded to NX 842
          • 75% of nominal performance
      • Zswap software compression (LZO), 10 GB RAM, 14 GB Java heap size:
          • 50% system CPU (overhead) due to memory page scanning and compression
          • 50% of nominal performance
      • 50% better CPU usage with POWER HW compression
  22. ZSWAP Memory Over-Allocation (Swap I/O activity)
      [Chart: testing zswap (zbud) with SPECjbb2005. x-axis: JVM heap size (9 to 18); y-axis: swap I/O in kB/s (0 to 120). Series: zswap off, zswap on. 1 P8 core, SMT8, 10 GB memory, max_pool_percent=40. Regions: memory over-commitment, zpool over-commitment. Markers 1, 2 and 3 identify the three cases.]
      • Little or no paging when running …
  23. Case 3: Zswap with memory over-committed and zpool full
      • The application needs more memory than available
      • Zswap is working, compressing pages in/out of the zpool
      • The zpool reaches the max_pool_percent limit (max_pool_percent=40): the compressed pool is full
      • Space must be freed in the zpool => swapping in/out! Performance degradation
      • 75% CPU wait I/O; only 10% CPU user
      • 10% of nominal performance due to waiting for pages on the swap device (swap in)
      [Diagram: memory used (uncompressed), zpool (zbud) full, and swap device with swap in/out]
  24. Zswap Conclusion
      Zswap is not AME, but it can really help to reduce the impact of paging activity and secure your production system at no cost and with no penalty:
        • The Power8 NX 842 compression engines are available for PowerVM and BareMetal Linux.
        • No impact when memory demand is below the installed RAM capacity.
        • Can maintain your system at 75% performance in the 100% CPU case (the worst scenario).
        • Zswap with zbud gives a x1.4 memory expansion ratio (with max_pool_percent=40).
      Need more? Then you can try zswap with the zsmalloc allocator.
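One way to relate the x1.4 figure to the settings is a back-of-the-envelope estimate. It is an assumption of this sketch, not a statement from the deck, that the compressed pool fills up to max_pool_percent of RAM and that pages compress at the allocator's nominal ratio (2:1 for zbud, about 3:1 for zsmalloc, as listed on the parameters slide). The function name `expansion_percent` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): estimate effective memory
# capacity as a percentage of physical RAM.
#   $1 = max_pool_percent (share of RAM used by the compressed pool)
#   $2 = nominal compression ratio of the pool (2 for zbud, 3 for zsmalloc)
# Uncompressed share: (100 - pool); compressed share holds pool * ratio.
expansion_percent() {
    pool_percent="$1"
    ratio="$2"
    echo $(( (100 - pool_percent) + pool_percent * ratio ))
}
```

With these assumptions, `expansion_percent 40 2` gives 140, in line with the x1.4 quoted for zbud, and `expansion_percent 40 3` gives 180, in line with the x1.8 memory size reported for zsmalloc. Real workloads compress differently, so this is only a rough guide.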
  25. Zswap with zsmalloc compressed pool (vs zbud)
      1- Compress/Uncompress: scanning and compressing use extra CPU cycles, but when a page fault occurs, it is much faster to get pages from the compressed pool in memory than from disk.
      2- Swap In / Out: compared to zbud, zsmalloc has no page replacement algorithm. When the zpool is full, paging out occurs directly from main memory to the paging device.
      • zsmalloc can store more pages per slot than zbud (3:1 measured), resulting in a higher memory …
      [Diagram: uncompressed memory, zpool (zsmalloc) and swap device]
  26. ZSWAP (zsmalloc) Memory Over-Allocation test with SPECjbb2005
      [Chart: testing zswap (zbud vs zsmalloc) with SPECjbb2005. x-axis: JVM heap size (9 to 33); y-axis: % bops vs nominal (0 to 120). Series: zswap off, zswap zsmalloc 842 (HW), zswap 842 (HW). 1 P8 core, SMT8, 10 GB memory, max_pool_percent=40. Regions: memory over-commitment, zpool (zbud) limit, zpool (zsmalloc) limit.]
      • 75% of nominal performance at x1.8 memory size
      • 50% of nominal performance at x2 memory size
  27. Monitor zswap (zsmalloc) activity on a 10 GB VM with Grafana
      [Grafana dashboard screenshot; labels from 10 GB to 40 GB in 5 GB steps]
  28. Enable POWER 8 advanced features on Linux
      1. Transparent Memory Compression
      2. -
      3. Power8 Split-Core
  29. Symmetric vs asymmetric encryption
      • Asymmetric encryption:
          • SLOW / complex operation
          • Private key never distributed
          • Used to send the AES secret key
          • Not optimized by Power8
      • Symmetric encryption (AES):
          • FAST / simple operation
          • Secret key must be distributed
          • Optimized by Power8
  30. Anatomy of an SSL/HTTPS request
      • SSL handshake: executed only once; asymmetric encryption; secret key exchange
      • Data exchange: symmetric encryption
      • The majority of the exchange uses symmetric encryption
      [Diagram: client browser and server]
  31. POWER8 Hardware Accelerator
      • On-chip accelerators (NX):
          • Symmetric crypto: AES, SHA
          • True random number generator
          • Must be used through a hypervisor call for guest/VM
          • Better single-thread performance, larger bandwidth
          • Symmetric crypto currently not available for PowerKVM guests
      • In-core accelerators:
          • Symmetric crypto: AES, SHA
          • Cyclic Redundancy Check
          • Private per core
          • Leverage the vector unit (VMX)
          • Direct access for guest/VM
      • IBM POWER8:
          • 12 cores per socket (from 3 to 4 GHz)
          • 8 HW threads per core (SMT technology)
          • Large cache (96 MB: 8 MB per core)
          • High memory bandwidth (~200 GB/s)
  32. AES Symmetric Cryptography / SHA Hash Engine
      • AES key lengths: 128b, 192b, 256b
      • Combination AES-SHA / SHA-AES supported: move the data once to encrypt/decrypt and/then authenticate
      • I/O buffer (IOB) provides function
      • 8.9 Gbps throughput per engine for AES-128 CBC encrypt at 2.4 GHz, 256 B message
      • 7 Gbps engine throughput for SHA-512 at 2.4 GHz, 256 B message
      • Supports byte-aligned source and target data buffers, scatter/gather
      • AES modes supported:
          • Electronic Codebook (ECB)
          • Cipher Block Chaining (CBC)
          • Counter (CTR)
          • Counter with CBC-MAC (CCM)
          • Galois Counter Mode (GCM)
          • XCBC-MAC-96 (XMAC)
      • Hash modes supported:
          • SHA1
          • SHA2 SHA-256
          • SHA2 SHA-512
          • Keyed-hash MAC (HMAC)
          • MD5
  33. POWER8 Hardware Encryption
      Source: Performance Characteristics of the POWER8 Processor, Alex Mericas, IBM Corporation

      Algorithm   POWER7+ On-Chip   POWER8 On-Chip   POWER8 In-Core
      AES-GCM     X                 X                X
      AES-CTR     X                 X                X
      AES-CBC     X                 X                X
      AES-ECB     X                 X                X
      SHA-256     X                 X                X
      SHA-512     X                 X                X
      RNG         X                 X
      CRC                                            X

      Cycles per byte (1 core and in-core crypto):

      Algorithm     POWER7+ (SW)   POWER8 (HW) single thread   POWER8 (HW) multi thread
      SHA-512       35             10.7 (x3)                   2.6 (x13)
      AES-128-ENC   17             4 (x4)                      0.8 (x21)
      AES-256-ENC   21             5.5 (x3.8)                  1.1 (x19)

      • On-chip hardware accelerators were introduced with POWER7+; POWER8 has the same accelerators.
      • They offload encryption for OS-based large messages (encrypted file systems, etc.).
      • On a virtualized system, access to the on-chip (NX) hardware accelerators has to go through a hypervisor call.
      • In-core acceleration is directly accessible to virtualized guests (no hypervisor call needed).
      • … includes user-mode instructions to accelerate common algorithms.
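The cycles-per-byte figures can be turned into an idealized per-core throughput. The slide does not state the clock frequency of the measurement, so the 3 GHz used in the usage note is an assumption taken from the "from 3 to 4 GHz" range quoted elsewhere in the deck; the function name `throughput_mb_s` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): idealized throughput in MB/s
# (10^6 bytes per second) for one core.
#   $1 = clock frequency in GHz (assumed, not given with the table)
#   $2 = cost in cycles per byte
# bytes/s = (GHz * 10^9) / cpb, so MB/s = GHz * 1000 / cpb.
throughput_mb_s() {
    awk -v ghz="$1" -v cpb="$2" 'BEGIN { printf "%.0f\n", ghz * 1000 / cpb }'
}
```

At an assumed 3 GHz, AES-128 at 17 cycles per byte corresponds to about 176 MB/s (`throughput_mb_s 3 17`), while 0.8 cycles per byte corresponds to about 3750 MB/s (`throughput_mb_s 3 0.8`). These are upper bounds that ignore memory and I/O effects.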
  34. Linux on Power hypervisor compatibility matrix
      [Table: rows are accelerator features, grouped as On-chip (Compression (842), AES, RNG) and In-core (AES, SHA, CRC); columns are Baremetal, PowerVM guest and PowerKVM guest. The cell values were graphical marks.]
  35. P8 Hardware Encryption Acceleration
      • Combination of on-chip accelerators, for CPU offload of larger blocks of encryption work, and in-core instructions for small data sizes.
      • Exploitation is available transparently under OS services and APIs.
      [Diagram: hardware, hypervisor (H_COP calls), kernel and user-space layers, marking where acceleration can be exploited. Elements shown: on-chip crypto, in-core crypto, random number generation (/dev/random, /dev/urandom), physical TPM, IPsec, TCP/IP, encrypted file system, cryptographic library in C, GSkit, OpenSSL 1.0.2 libcrypto, standard crypto APIs, standard library, applications and custom application use/libs. Use cases shown: key generation (strong keys), encrypted data in flight, encrypted data at rest.]
  36. How to enable the in-core crypto accelerator
      • In Java, starting with IBM Java 7.1, AES is accelerated with the POWER8 in-core AES instructions by specifying -Dcom.ibm.crypto.provider.doAESInHardware=true on the JVM command line.
      • OpenSSL 1.0.2 and later uses the P8 in-core VMX instructions and optimizations for AES/SHA. All applications based on this version of OpenSSL benefit from P8 encryption acceleration.
          • Ubuntu: OpenSSL 1.0.2 in Ubuntu 15.10 and 16.04
          • RedHat: still on OpenSSL 1.0.1 => crypto not accelerated
          • Fedora 23: OpenSSL 1.0.2
          • Suse 12, OpenSuse 13: still on OpenSSL 1.0.1 => crypto not accelerated
      • What can you do if you do not have OpenSSL 1.0.2? Recompile your code with the Advance Toolchain (v9). The Advance Toolchain is a GCC-based compiler suite (provided by IBM for free) that provides POWER-optimized libraries (such as libcrypto). You can then enable HW crypto acceleration in your application even if your Linux distribution does not provide the latest libcrypto (OpenSSL 1.0.2).
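Whether the installed OpenSSL is recent enough can be checked from the output of `openssl version`. The sketch below only compares version numbers against 1.0.2; it is an assumption that the version alone is a sufficient indicator, since a distribution may build OpenSSL with different options. The function name `openssl_at_least_102` is hypothetical, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): succeed if the given
# "openssl version" output reports version 1.0.2 or later.
openssl_at_least_102() {
    # Keep the numeric part only, e.g. "OpenSSL 1.0.2g  1 Mar 2016" -> "1.0.2".
    ver=$(printf '%s\n' "$1" | sed -n 's/^OpenSSL \([0-9][0-9.]*\).*/\1/p')
    [ -n "$ver" ] || return 1
    # Version sort (GNU sort -V): the smaller of the two versions comes first.
    lowest=$(printf '%s\n%s\n' "1.0.2" "$ver" | sort -V | head -n 1)
    [ "$lowest" = "1.0.2" ]
}
```

A typical call would be `openssl_at_least_102 "$(openssl version)" && echo "in-core crypto path expected"`. Comparing `openssl speed -evp aes-128-cbc` results between the distribution build and an Advance Toolchain build is a more direct way to confirm the acceleration.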
  37. IBM Advance Toolchain for PowerLinux.
      URLs: "IBM Advance Toolchain for PowerLinux Documentation"; "Improving performance with IBM Advance Toolchain for PowerLinux".
      Description: The IBM Advance Toolchain for PowerLinux is a set of open source development tools and runtime libraries that allows users to take leading-edge advantage of IBM's latest POWER hardware features on Linux. Over time, these libraries and the latest compiler technologies are integrated into the shipping distributions. However, the IBM Advance Toolchain for PowerLinux contains the latest tested and supported GNU Compiler Collection (GCC) compiler versions, tailored for Power systems, and packaged together with an expanding set of processor-tuned libraries, allowing you to take advantage of the latest technology without waiting.
  38. Example of Apache and wget compiled with Advance Toolchain (1/3). The idea was to recompile Apache and wget with Advance Toolchain so that they use the Power8 hardware in-core cryptography and perform better.
      Recompile on PowerLinux:
      Get the source code of Apache and wget from the community.
      Install Advance Toolchain AT9.
      Recompile out of the box with the following flags; no source code changes are required:
      export CFLAGS="-O3 -m64 -mcpu=power8 -mtune=power8"
      export PATH=/opt/at9.0/bin/:$PATH
      Run configure, make and make install.
      Simple test: download a 10G file with wget from the Apache web server over HTTPS.
      Diagram: wget downloads the 10GB file from Apache (httpd) over SSL on the loopback interface.
  39. Example of Advance Toolchain with Apache and wget (2/3).
      Standard Apache and wget provided by the repo: transfer done in 3m10s.
      Apache and wget compiled with Advance Toolchain: transfer done in 23s.
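      The two timings can be turned into a speedup and an approximate throughput. This is a small arithmetic sketch; the only inputs are the 3m10s and 23s figures and the 10G file size from the slides, and reading "10G" as 10 GiB is an assumption.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair into seconds."""
    return minutes * 60 + seconds


# Timings reported on the slide.
baseline_s = to_seconds(3, 10)     # standard Apache and wget from the repo
accelerated_s = to_seconds(0, 23)  # both rebuilt with Advance Toolchain

# Assumption: the "10G" test file is 10 GiB, i.e. 10 * 1024 MiB.
FILE_SIZE_MIB = 10 * 1024

speedup = baseline_s / accelerated_s


def throughput_mib_per_s(elapsed_s):
    """Average transfer rate for the test file over the given duration."""
    return FILE_SIZE_MIB / elapsed_s


if __name__ == "__main__":
    print(f"speedup: {speedup:.1f}x")
    print(f"standard: {throughput_mib_per_s(baseline_s):.0f} MiB/s")
    print(f"Advance Toolchain: {throughput_mib_per_s(accelerated_s):.0f} MiB/s")
```

      Under these assumptions the rebuilt binaries are roughly eight times faster on this loopback HTTPS transfer.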
  40. Example of Advance Toolchain with Apache and wget (3/3). Profiling shows that the AT version uses the P8-accelerated versions of ghash and aes. (Figure: two profiles, labelled "Standard" and "Advance Toolchain".)
  41. Example 2: J2EE application benchmark (DayTrader application). 60% better CPU utilisation with Power in-core encryption. (Figure: two charts, "With P8 HW Crypto" and "Without P8 HW Crypto".)
  42. Enable POWER8 advanced features on Linux: 1. Transparent Memory Compression 2. - 3. Power8 Split-Core
  43. Enabling SMT on PowerKVM guests (1/2). How do we schedule guest VCPUs onto physical CPU cores? PowerKVM introduces the notion of a "virtual core" (vcore): VCPUs are allocated to vcores before the PowerKVM host dispatches them to a real core. By default 1 vcpu = 1 vcore. This can be changed to x vcpus = 1 vcore to enable SMT.
      Example on PowerKVM with 2 P8 cores:
      Guest1, 2 vcpus, default topology: 2 vcores, 1 thread each.
      Guest2, 4 vcpus, manually defined topology: 1 vcore, 4 threads (guest2.xml):
      <vcpu>4</vcpu> <cpu> <topology sockets='1' cores='1' threads='4'/> </cpu>
      When no free core is available, a vcore cannot be dispatched and waits for the next dispatch (time sharing). An SMT level other than 1 slows down guest dispatching.
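      The vcore accounting in this example can be sketched in a few lines. The function names are hypothetical and the model is deliberately simple: it only counts vcores against whole physical cores, as the slide does, and ignores split-core and piggybacking.

```python
import math


def vcores_needed(vcpus, threads_per_vcore=1):
    """Number of vcores a guest occupies; the default is 1 vcpu = 1 vcore."""
    return math.ceil(vcpus / threads_per_vcore)


def all_dispatchable(guests, physical_cores):
    """True if every vcore of every guest can run at the same time.

    guests is a list of (vcpus, threads_per_vcore) pairs. When this
    returns False, some vcore has to wait for the next dispatch.
    """
    total = sum(vcores_needed(vcpus, threads) for vcpus, threads in guests)
    return total <= physical_cores


if __name__ == "__main__":
    guest1 = (2, 1)  # 2 vcpus, default topology: 2 vcores
    guest2 = (4, 4)  # 4 vcpus, 1 vcore with 4 threads
    print(vcores_needed(*guest1), vcores_needed(*guest2))
    print(all_dispatchable([guest1, guest2], physical_cores=2))
```

      With the two guests from the slide, three vcores compete for two cores, so one vcore waits, which is the time-sharing situation the diagram illustrates.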
  44. Enabling SMT on PowerKVM guests (2/2). To configure a KVM guest, the number of VCPUs must be set to the product of the cores and the threads per core assigned to the guest, and the number of threads per core must be explicitly set:
      vcpu = sockets x cores x threads
      For example, with libvirt, the following settings give a guest with SMT=8 and 2 cores (16 vcpus in total):
      <vcpu>16</vcpu> <cpu> <topology sockets='1' cores='2' threads='8'/> </cpu>
      With that configuration, the guest OS can enable SMT=8 (the default) and use the 16 threads across the two assigned cores. It also allows the guest to control the SMT level dynamically, directly from the OS (ppc64_cpu --smt=x).
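      The rule vcpu = sockets x cores x threads can be applied mechanically when generating guest definitions. The sketch below builds the two libvirt elements with Python's standard xml.etree.ElementTree module; the helper name is hypothetical, and it produces only the fragment discussed here, not a complete domain XML.

```python
import xml.etree.ElementTree as ET


def guest_cpu_elements(sockets, cores, threads):
    """Build the <vcpu> and <cpu> elements for a guest topology.

    The vcpu count is derived from the topology so the two can never
    disagree (vcpu = sockets x cores x threads).
    """
    vcpu = ET.Element("vcpu")
    vcpu.text = str(sockets * cores * threads)

    cpu = ET.Element("cpu")
    ET.SubElement(
        cpu,
        "topology",
        sockets=str(sockets),
        cores=str(cores),
        threads=str(threads),
    )
    return vcpu, cpu


if __name__ == "__main__":
    # The slide's example: SMT=8 on 2 cores.
    vcpu, cpu = guest_cpu_elements(sockets=1, cores=2, threads=8)
    print(ET.tostring(vcpu, encoding="unicode"))
    print(ET.tostring(cpu, encoding="unicode"))
```

      Deriving the vcpu count instead of typing it separately avoids the most common mistake with this setting, a topology whose product does not match the declared number of vcpus.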
  45. Enabling SMT topology with Kimchi on PowerKVM 3.1
  46. The default guest SMT mode is 1 VCPU per vcore:
      Inefficient use of resources in whole-core mode (1 thread per core).
      Often chosen by users who are not familiar with POWER.
      Often chosen by management agents (e.g. OpenStack).
      Setting the topology is too complex in a big cloud environment.
      Up to now, the default core-split mode was whole-core:
      Good for single-thread performance.
      Allows users to run SMT1, SMT2, SMT4 and SMT8 guests.
      Hits over-commitment early, especially with SMT1 guests: with a 20-core P8, at most 20 vcpus are dispatched in parallel by default.
      PowerKVM 3.1 addresses these points with 2 features: 1. (sub)core sharing (piggybacking) 2. Dynamic micro-threading (split-core).
      (Diagram: guests with 2 vcpus running on PowerKVM with 2 P8 cores.)
  47. PowerKVM Micro-Threading (Split-Core).
      No split-core: 1 full core available with up to 8 parallel threads. Only 1 guest running at a time (the only mode available on PowerVM).
      Split-core by 2: 2 sub-cores available, each with up to 4 parallel threads. Up to 2 guests running at a time.
      Split-core by 4: 4 sub-cores available, each with up to 2 parallel threads. Up to 4 guests running at a time.
      (Diagram: one core of an IBM Power8 chip shown whole, split in 2 and split in 4.)
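      The three modes follow one rule: the 8 hardware threads of a core are divided evenly among its sub-cores, and each sub-core can host one guest at a time. A minimal sketch of that rule (the function name is hypothetical):

```python
CORE_THREADS = 8            # a Power8 core runs up to 8 threads
VALID_SPLITS = (1, 2, 4)    # whole core, split by 2, split by 4


def split_core_capacity(subcores):
    """Return (threads per sub-core, guests running at a time)."""
    if subcores not in VALID_SPLITS:
        raise ValueError("subcores per core must be 1, 2 or 4")
    return CORE_THREADS // subcores, subcores


if __name__ == "__main__":
    for subcores in VALID_SPLITS:
        threads, guests = split_core_capacity(subcores)
        print(f"split by {subcores}: {threads} threads per sub-core, "
              f"up to {guests} guest(s) at a time")
```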
  48. PowerKVM Micro-Threading (Split-Core). Power8 is an 8-thread processor. All threads share one MMU(1) context and therefore must be in the same partition. Guests in single-thread (SMT 1) mode cannot use the full core capacity.
      PowerKVM introduces the possibility of splitting a Power8 core into 2 or 4 subcores: Micro-Threading (static in PowerKVM 2.1, dynamic in PowerKVM 3.1). Each subcore has its own MMU(1) and can be dispatched independently to a different guest (VM).
      Micro-Threading benefits:
      Better use of CPU resources.
      More virtual machines per core.
      Reduced over-commitment overhead (context switches).
      Micro-Threading limitations:
      Guest SMT is limited to 2 or 4, depending on the split-core level (half core, quarter core).
      All threads run in SMT8 mode (lower single-thread performance).
      (1) The MMU (Memory Management Unit) is a hardware memory decoder that maps virtual addresses to physical addresses.
      (Figure: two timelines of threads thr1 to thr8 on a POWER8 core. On a full core, VM1 to VM4 run one after another, with context switching (hypervisor overhead) between them. With subcore1 to subcore4, VM1 to VM4 run at the same time, one per subcore.)
  49. PowerKVM 3.1 Dynamic Micro-Threading (SubCores). With PowerKVM 3.1, the hypervisor may dynamically choose to split each core by two or by four in order to match vcpu needs to the available hardware resources.
      Example 1, PowerKVM 3 with 1 P8 core (splitting by 2 is optimum):
      Guest1, 2 vcpus, manually defined as 1 vcore, 2 threads: <topology sockets='1' cores='1' threads='2'/>
      Guest2, 4 vcpus, manually defined as 1 vcore, 4 threads: <topology sockets='1' cores='1' threads='4'/>
      Example 2, PowerKVM 3 with 1 P8 core (splitting by 4 is optimum):
      Guest1, Guest2 and Guest3, 2 vcpus each, manually defined as 1 vcore, 2 threads: <topology sockets='1' cores='1' threads='2'/>
      To set the level of subcoring manually and statically, use these commands at the PowerKVM host level:
      ppc64_cpu --subcores-per-core      # Get number of subcores per core
      ppc64_cpu --subcores-per-core=X    # Set subcores per core to X (1, 2 or 4)
      ppc64_cpu --threads-per-core       # Get threads per core
      (All VMs need to be offline.)
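      The two "optimum" cases on this slide are consistent with a simple rule: split the core as finely as possible while the widest guest vcore still fits into one sub-core. The sketch below implements that rule as an illustration only. It is an assumed heuristic with a hypothetical function name, not the decision logic PowerKVM actually uses.

```python
def finest_usable_split(guest_threads, core_threads=8):
    """Pick the largest split (4, 2 or 1) whose sub-cores fit every guest.

    guest_threads lists the threads per vcore of each guest sharing the
    core. Illustrative heuristic only: the real hypervisor may weigh
    other factors.
    """
    widest = max(guest_threads)
    for subcores in (4, 2, 1):
        if widest <= core_threads // subcores:
            return subcores
    raise ValueError("a vcore cannot have more threads than the core")


if __name__ == "__main__":
    # Example 1 on the slide: one 2-thread and one 4-thread vcore.
    print(finest_usable_split([2, 4]))
    # Example 2 on the slide: three 2-thread vcores.
    print(finest_usable_split([2, 2, 2]))
```

      For the slide's examples the rule returns a split by 2 and a split by 4 respectively, matching the annotations on the diagram.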
  50. PowerKVM 3.1 Micro-Threading (Subcore) DEMO
  51. PowerKVM 3.1 Dynamic Micro-Threading (SubCores) DEMO. The demonstration is done with:
      4 guests (virtual machines), all pinned onto one single core of a 20-core S822L Power8 server.
      PowerKVM 3.1 virtualization.
      Each guest defined with a manual topology of 1 vcore and 2 threads.
      Guests split1, split2, split3 and split4 on PowerKVM 3 with 1 P8 core, 2 vcpus each: <topology sockets='1' cores='1' threads='2'/>
  52. PowerKVM 3.1 Dynamic Micro-Threading (SubCores) DEMO (guest topology is 1 vcore, 2 threads). (Figure: three charts plotting core threads 1 to 8 against time slices, showing how guests split1 to split4 are dispatched in three modes: no micro-threading allowed, micro-threading with 2 sub-cores max, and micro-threading with 4 sub-cores max.)
  53. 400 VMs on a (small) 20-core S822LC? Thanks to split-core (and piggybacking), even 400 VMs on a small but nevertheless powerful IBM S822LC is OK (even if definitely extreme). Each guest has 2 vcpus with the default topology: 2 vcores, 1 thread each.
      (Chart: pgbench PostgreSQL workload (tps) against number of VMs. Annotations: "No need to split (thanks to piggyback with 20 VMs)"; "Split-core helps optimize core utilization"; "Almost like PowerKVM 2.1 (piggyback not available with pKVM 2.1)"; "PowerKVM 3.1 split-core benefits".)
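      A rough count shows why split-core matters at this scale. The sketch below uses only the figures on the slide (400 VMs, 2 vcpus each with the default 1 vcpu = 1 vcore, 20 cores) and computes how many vcores compete for each dispatch slot, where a slot is a whole core or a sub-core. It deliberately ignores piggybacking, so it is an upper bound on contention rather than a model of the benchmark.

```python
def vcores_per_dispatch_slot(vms, vcpus_per_vm, cores, subcores_per_core):
    """Average number of vcores sharing one core or sub-core.

    Assumes the default topology (1 vcpu = 1 vcore) and no piggybacking.
    """
    vcores = vms * vcpus_per_vm
    slots = cores * subcores_per_core
    return vcores / slots


if __name__ == "__main__":
    for split in (1, 2, 4):
        ratio = vcores_per_dispatch_slot(400, 2, 20, split)
        print(f"split by {split}: {ratio:.0f} vcores per dispatch slot")
```

      Under these assumptions, splitting every core by 4 cuts the number of vcores queued per slot from 40 to 10, which means fewer context switches per vcore.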
  54. Session Evaluations. YOUR OPINION MATTERS! Submit four or more session evaluations by 5:30pm Wednesday to be eligible for drawings! *Winners will be notified Thursday morning. Prizes must be picked up at the registration desk, during operating hours, by the conclusion of the event.
  55. Continue growing your IBM skills. ibm.com/training provides a comprehensive portfolio of skills and career accelerators that are designed to meet all your training needs. If you cannot find training that is right for you with our Global Training Providers, we can help. Contact IBM Training at dpmc@us.ibm.com. (Global Skills Initiative)
