2. ABOUT ME
Oracle Consultant since 2001
Former developer (C, Java, perl, PL/SQL)
Blogger since 2004
http://laurent.leturgez.free.fr (in French, discontinued)
http://laurent-leturgez.com
Twitter : @lleturgez
OCM 11g
3. Agenda
SIMD Instructions, outside Oracle 12c
What is a SIMD instruction ?
Will my application use SIMD ?
Raw Performance
SIMD Instructions, inside Oracle 12c
How SIMD instructions are used inside Oracle 12c
Tracing SIMD in Oracle 12c
4. Caveats
Most of the topics are from
My own research
My past life as a developer
Some of the topics are about internals, so:
Analysis and conclusion may be incomplete
Future versions of Oracle may change the features
Tests have been done with Oracle 12.1.0.2, Oracle Enterprise Linux 7.1, VMware Fusion 7 (and VirtualBox)
5. Before we start …
Some fundamentals (from Dennis Yurichev’s book)
CPU register : […] The easiest way to understand a register is to
think of it as an untyped temporary variable. Imagine if you were
working with a high-level PL (programming language) and could only
use eight 32-bit (or 64-bit) variables. Yet a lot can be done using just these!
Instruction : A primitive CPU command. The simplest examples
include: moving data between registers, working with memory and
arithmetic primitives. As a rule, each CPU has its own instruction set
architecture (ISA).
Assembly language : Mnemonic code and some extensions like
macros which are intended to make a programmer’s life easier.
http://beginners.re/Reverse_Engineering_for_Beginners-en.pdf
7. SIMD instructions … outside Oracle 12c
SIMD stands for Single Instruction Multiple Data
Process multiple data
In one CPU instruction
Based on
Specific registers
Specific CPU instructions and sets of instructions
Not Oracle specific
CPU Architecture specific
Intel
IBM (Altivec)
Sparc (VIS)
This presentation is mainly about Intel architecture
8. SIMD instructions … outside Oracle 12c
What is a SIMD register ?
It’s a CPU register
Wider than traditional registers (RDI, RSI, R8, R9 etc.)
128 up to 512 bits wide
Holds several data elements at once
9. SIMD instructions … outside Oracle 12c
Scalar operation
an array of 4 integers {1,2,3,4}
add 1 to each value
[Figure: animation of the scalar operation. The CPU uses three registers (Reg1, Reg2, Reg3). The input array {1,2,3,4} is read from RAM one value at a time: each value is loaded, 1 is added to it, and the result is saved back to RAM, giving the output {2,3,4,5}.]
LOAD, ADD, SAVE for each value: 4 LOAD, 4 ADD, 4 SAVE
10. SIMD instructions … outside Oracle 12c
SIMD operation
an array of 4 integers {1,2,3,4}
add 1 to each value
[Figure: animation of the SIMD operation. The CPU uses three SIMD registers (SIMD Reg1, SIMD Reg2, SIMD Reg3). The whole input array {1,2,3,4} is loaded from RAM into one SIMD register, another SIMD register holds {1,1,1,1}, and one add produces {2,3,4,5}, which is saved back to RAM.]
LOAD, ADD, SAVE once for the four values
11. SIMD instructions … outside Oracle 12c
Instruction set        Register size  # Registers  Register names   Processors
MMX                    64 bits        8            MM0 to MM7       Pentium II
SSE                    128 bits       8            XMM0 to XMM7     Pentium III
SSE2/SSE3/SSSE3/SSE4   128 bits       16           XMM0 to XMM15    Pentium IV to Nehalem
AVX/AVX2               256 bits       16           YMM0 to YMM15    Sandy Bridge - Haswell
AVX3 or AVX512         512 bits       32           ZMM0 to ZMM31    Skylake
Other
SSE : only four 32-bit single precision floating point numbers
SSE2/SSE3/SSSE3/SSE4 : usage expansion (two 64-bit double precision, four 32-bit integers and up to sixteen 8-bit bytes)
AVX/AVX2 : three-operand instructions (non destructive) : C=A+B rather than A=A+B
AVX/AVX2 : alignment requirements relaxed
14. Will my application use SIMD registers and instructions ?
It depends on :
Hardware
Consult the processor datasheets to see which instruction set extensions are supported (if any)
http://ark.intel.com/#@Processors
Hypervisor
Some (old) hypervisors do not support modern extensions
VirtualBox versions <5.0 don’t support SSE4, AVX and AVX2
Hyper-V on W2008R2-SP1 needs a patch for specific processors to support AVX
15. Will my application use SIMD registers and instructions ?
It depends on the Operating System
AVX (256 bits) is supported from
Linux Kernel >= 2.6.30
Redhat EL5 : 2.6.18
Oracle EL5 w/UEK : 2.6.32
AVX needs the xsave kernel feature (xsave flag)
Solaris 10 upd 10 and Solaris 11
Windows 2008 R2 SP1
16. Will my application use SIMD registers and instructions ?
It depends on the compiler
GCC
> 4.6 for AVX support
Use of specific switches (-msse2, -msse4.1, -msse4.2, -mavx, -mavx2 …)
Intel C/C++ Compiler (ICC)
> 11.1 for AVX support and > 13.0 for AVX2 support
Use of specific switches (-xsse4.2, -xavx, -xcore-avx2 …)
Beware of optimization switches (-O1, -O2, -O3)
More … disassemble (if you are allowed to)
Registers
Assembler instructions
18. Raw performance
Based on a C program
Used CPU: Haswell microarchitecture (Core i7-4960HQ). AVX/AVX2 enabled
3 tests : No SIMD, SSE4, AVX
Input: one array containing 1 million values.
Goal: add 1 to each value; the pass over the 1 million values is repeated 4k, 8k, 16k and 32k times
CPU Time(s) = f(#rows)
“Quick and Dirty” sample code available here:
https://app.box.com/s/ibmnbblpho4xtbeq2x8ir60nrk37208v
19. Raw performance
RAW Performance (CPU) for SIMD Instructions : CPU time (sec)
                      4096 M. rows  8192 M. rows  16384 M. rows  32768 M. rows
NO SIMD               10.35         20.46         42.35          85.64
SSE4 (XMM registers)  3.3           6.81          13.73          25.58
AVX (YMM registers)   1.96          3.51          7.23           15.15
21. SIMD instructions … inside Oracle 12c
In Memory Data Structure
In Memory Compression Unit : IMCU
IMCU is the unit of column store allocation
Target size is 1M rows (controlled by _inmemory_imcu_target_rows)
One IMCU can contain more than one column
Each column in one IMCU is a column unit (CU)
22. SIMD instructions … inside Oracle 12c
In memory column store storage indexes
For each column unit, min and max values are maintained in a storage index
Storage Indexes provide CU pruning
Information about CU available in GV$IM_COL_CU (Undocumented. See Bug ID 19361690)
[Figure: IMCU pruning]
23. SIMD instructions … inside Oracle 12c
The way your data is sorted matters for best IMCU pruning
24. SIMD instructions … inside Oracle 12c
SIMD extensions are used with In Memory storage indexes for efficient filtering
1. IM Storage Indexes do IMCU pruning
2. SIMD instructions apply filter predicates efficiently
[Figure: IMCU pruning, followed by filtering with SIMD on a Prod-id column holding the values 10, 10, 14, 14, 10]
25. SIMD instructions … inside Oracle 12c
Oracle 12c uses specific libraries for SIMD (and compression)
Located in $ORACLE_HOME/lib
libshpksse4212.so for SSE4.2 extensions
Compiled with ICC v12 with the specific -xsse4.2 switch
libshpkavx12.so for AVX extensions
Compiled with ICC v12 with the specific -xavx switch
libshpkavx212.so for AVX2 extensions
Not fully implemented yet (only 8 functions implemented)
No ICC AVX2 switch used because ICC v12 doesn’t support AVX2
Thanks Tanel Pöder
26. SIMD instructions … inside Oracle 12c
Oracle SIMD related functions
Located in kdzk kernel module (HPK)
Part of Advanced Compression library (ADVCMP)
Easily tracked with systemtap
27. SIMD instructions … inside Oracle 12c
How does Oracle use SIMD extensions ?
It depends on many parameters
OS Level : /proc/cpuinfo
AVX and AVX2 support
SSE4 Support only
28. SIMD instructions … inside Oracle 12c
Which library am I using ?
pmap
AVX support
SSE4 support
29. SIMD instructions … inside Oracle 12c
Which compiler options have been used ?
Read “comment” section in ELF
Read the corresponding compiler documentation
[oracle@oel7 conf]$ readelf -p .comment $ORACLE_HOME/lib/libshpkavx12.so |
> egrep -i 'intel|gcc' | egrep 'xavx|mavx'
[ 2c] -?comment:Intel(R) C Intel(R) 64 Compiler XE for applications running on
Intel(R) 64, Version 12.0 Build 20120731
…/…
-DNTEV_USE_EPOLL -DNET_USE_LDAP -xavx
30. SIMD instructions … inside Oracle 12c
How are SIMD registers used by Oracle ?
GDB
To get the call stack (backtrace)
To set breakpoints on interesting functions
To view register contents (traditional and SIMD)
“info registers” for traditional registers
“info all-registers” for all registers (SIMD registers included)
(gdb) print $ymmX.<format>
Format can be v8_float, v4_double, v32_int8, v16_int16, v8_int32, v4_int64, or v2_int128
31. SIMD instructions … inside Oracle 12c
In red, register content has been modified
In blue, the upper half (128 bits) of the SIMD registers is empty
32. SIMD instructions … inside Oracle 12c
Oracle IM can use AVX or SSE4 extensions for SIMD operations
When AVX is used
It uses only 128 bits of the 256-bit wide registers
• AVX adds new register-state through the 256-bit wide YMM register file
• Explicit operating system support is required to properly save and restore AVX's expanded registers between context switches
• Without this, only AVX 128-bit is supported
33. SIMD instructions … inside Oracle 12c
The culprit
Oracle 12.1.0.2 is supported from EL5 onwards
EL5 Redhat Kernel is 2.6.18 and this flag (xsave) is supported from 2.6.30 kernels
For compatibility reasons, Oracle has to compile its code on 2.6.18 kernels
35. Tracing SIMD in Oracle 12c
Oradebug has 2 components related to IM
36. Tracing SIMD in Oracle 12c
Interesting components to trace for SIMD and/or IMCU Pruning are :
IM_optimizer
Gives information about CBO calculation related to IM
ADVCMP_DECOMP.*
ADVCMP_DECOMP_HPK : SIMD functions
ADVCMP_DECOMP_PCODE : Portable Code Machine (usually comparison functions and results)
37. Tracing SIMD in Oracle 12c
IM_optimizer
Information available in trace file
IMCU Pruning ratio
CU decompression costing (per IMCU)
Predicate evaluation costing (per row)
Statement has to be parsed to get results
38. Tracing SIMD in Oracle 12c
select prod_id,cust_id,time_id from laurent.s_capa_high where amount_sold=20;
39. Tracing SIMD in Oracle 12c
This information is available in the CBO trace file (10053 or SQL_costing event)
40. Tracing SIMD in Oracle 12c
ADVCMP_DECOMP
ADVCMP_DECOMP_HPK
Information is available in the trace file (for each IMCU
processed)
Used library and function
Number of rows and counting algorithm
Processing rate (comparison and decompression if relevant)
But nothing on the results of the processing
41. Tracing SIMD in Oracle 12c
ADVCMP_DECOMP
ADVCMP_DECOMP_HPK
Gives information about SIMD function usage and filtering (after
IMCU pruning)
Example: inmemory table with NO MEMCOMPRESS or DML
compression
42. Tracing SIMD in Oracle 12c
ADVCMP_DECOMP
ADVCMP_DECOMP_HPK
Example: inmemory compressed table
SIMD instructions are used only in the kdzk_eq_dict functions
43. Tracing SIMD in Oracle 12c
My thoughts about compression/decompression
NO MEMCOMPRESS / COMPRESS FOR DML
kdzk*dynp* functions (ex: kdzk_eq_dynp_16bit, kdzk_le_dynp_32bit
etc.)
FOR QUERY LOW / QUERY HIGH
Dictionary Encoding (LZW ?) : kdzk_*dict* functions (ex:
kdzk_eq_dict_7bit, kdzk_le_dict_4bit etc.)
Run Length Encoding: kdzk_burst_rle* functions (ex:
kdzk_burst_rle_8bit, kdzk_burst_rle_16bit …)
Bit packing compression: kdzk*fixed* functions (ex:
kdzk_ge_lt_fixed_32bit, kdzk_lt_fixed_8bit …)
44. Tracing SIMD in Oracle 12c
My thoughts about compression/decompression
FOR CAPACITY LOW
FOR QUERY LOW + additional proprietary compression (OZIP)
Functions: ozip_decode_dict*, kdzk_ozip_decode* (Ex:
kdzk_ozip_decode_dydi, ozip_decode_dict_9_bit etc.)
FOR CAPACITY HIGH
FOR QUERY HIGH + heavyweight compression algorithm
Compression/decompression method depends on:
Datatype
Column Compression Unit size
Column contents
Actual size depends on the row size and the compression factor
Updated by a background process
Triggered by IMCO
W00x : processes that populate the IM column store
Contains a list of rowids
Depends on how data are sorted inside the extents, because loading data into an IMCU reads the table extents sequentially
More than 1400 functions implemented in the AVX and SSE4.2 libraries
-xavx (unlike -mavx) has specific optimizations
HPK : High Performance Compression ?
/proc/cpuinfo gives information that depends on the hardware, the kernel and its options, and the hypervisor (if one is used)
For other OS, use tools that call the CPUID instruction and read the EAX, EBX, ECX and EDX registers
ELF : Executable and Linking Format
Decompression costing : columns used in filter predicates + columns in the select list
Predicate cost evaluation : /!\ cumulative values
Costs generated by columns in the SELECT clause are not reported in the 10053 event trace file, only those of the columns in the filter predicates