Performance Brief for the
HP DL980 (Database Server) and
DL380 (ION Data Accelerator™)
4.24.2013
Copyright Notice
The information contained in this document is subject to change without notice.
Fusion-io MAKES NO WARRANTY OF ANY KIND WITH REGARD TO THIS MATERIAL, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
Except to correct same after receipt of reasonable notice, Fusion-io shall not be liable for errors contained
herein or for incidental and/or consequential damages in connection with the furnishing, performance, or use
of this material.
The information contained in this document is protected by copyright.
© 2013, Fusion-io, Inc. All rights reserved.
Fusion-io, the Fusion-io logo and ioDrive are registered trademarks of Fusion-io in the United States and other
countries.
The names of other organizations and products referenced herein are the trademarks or service marks (as
applicable) of their respective owners. Unless otherwise stated herein, no association with any other
organization or product referenced herein is intended or should be inferred.
Fusion-io: 2855 E. Cottonwood Parkway, Box 100 Salt Lake City, UT 84121 USA
(801) 424-5500
CONTENTS
Introduction ............................................................................................................................................. 1
HARDWARE ................................................................................................................................... 2
ION Data Accelerator System ................................................................................................... 2
Initiator System........................................................................................................................ 2
Storage Configuration.............................................................................................................................. 3
INITIATOR HBA PLACEMENT........................................................................................................... 3
ION DATA ACCELERATOR STORAGE POOL CONFIGURATION ........................................................ 5
ION VOLUME CONFIGURATION...................................................................................................... 5
ION LUN CONFIGURATION............................................................................................................. 6
MULTIPATH VERIFICATION ............................................................................................................. 8
Initiator BIOS Tuning ..............................................................................................................................11
UPDATING THE BIOS FOR NUMA DETECTION...............................................................................12
POWER MANAGEMENT OPTIONS.................................................................................................12
SYSTEM OPTIONS.........................................................................................................................14
ADVANCED OPTIONS...................................................................................................................16
Setting the Addressing Mode.................................................................................................16
Disabling x2APIC....................................................................................................................17
Initiator Tuning on Linux ........................................................................................................................18
MULTIPATHING ............................................................................................................................18
DISABLING PROCESSOR C-STATES IN LINUX.................................................................................18
IONTUNER RPM............................................................................................................................19
Block Device Tuning with udev Rules .....................................................................................20
Disabling the cpuspeed Daemon............................................................................................21
Pinning interrupts ..................................................................................................................21
VERIFYING THREAD PINNING........................................................................................................22
Oracle Tuning.........................................................................................................................................25
HUGEPAGES.................................................................................................................................25
SYSCTL PARAMETERS ..................................................................................................................25
ORACLE INITIALIZATION PARAMETERS.........................................................................................26
fio Performance Testing .........................................................................................................................27
PRECONDITIONING FLASH STORAGE ...........................................................................................27
TESTING THREAD CPU AFFINITY...................................................................................................27
TEST COMMANDS .......................................................................................................................27
RESULTS.......................................................................................................................................30
SEQUENTIAL R/W THROUGHPUT AND IOPS..................................................................................31
RANDOM MIX R/W IOPS ..............................................................................................................32
RANDOM MIX R/W THROUGHPUT ...............................................................................................32
Oracle Performance Testing....................................................................................................................34
TEST SETUP ..................................................................................................................................34
TEST COMMANDS .......................................................................................................................36
RESULTS.......................................................................................................................................37
Oracle Database Testing.........................................................................................................................38
READ WORKLOAD TEST – QUEST BENCHMARK FACTORY...........................................................38
OLTP WORKLOAD TEST – HEAVY INSERT SCRIPT..........................................................................43
TRANSACTIONS TEST – SWINGBENCH .........................................................................................47
Conclusions............................................................................................................................................48
Glossary .................................................................................................................................................49
Appendix A: Tuning Checklist ................................................................................................................50
Appendix B: Speeding up Oracle Database Performance with ioMemory – an HP Session.......................52
ARCHITECTURE OVERVIEW ..........................................................................................................52
ABOUT ION DATA ACCELERATOR................................................................................................53
ION Data Accelerator Software ..............................................................................................53
Fusion-Powered Storage Stack...............................................................................................53
Why ION Data Accelerator? ...................................................................................................54
ABOUT ION DATA ACCELERATOR HA (HIGH AVAILABILITY) ........................................................54
PERFORMANCE TEST RESULTS: HP DL380 / HP DL980..................................................................55
OVERVIEW OF THE ION DATA ACCELERATOR GUI.......................................................................57
COMPARATIVE SOLUTIONS..........................................................................................................60
BEST PRACTICES ..........................................................................................................................61
BENCHMARK TEST CONFIGURATION ...........................................................................................62
RAW PERFORMANCE TEST RESULTS WITH FIO .............................................................................63
Total IOPS ..............................................................................................................................63
Average Completion Latency (Microseconds) .........................................................................64
Raw I/O Test: 70% Read, 30% Write.....................................................................................64
Raw I/O Test: 100% Read at 8KB...........................................................................................65
Raw I/O Test: Read Latency (Microseconds)............................................................................65
ORACLE WORKLOAD TESTS.........................................................................................................66
Introduction
________________________________________________________________________
This document describes methods used to maximize performance for Oracle Database Server running
on an HP DL980 and for ION Data Accelerator running on an HP DL380. These methods should
provide a foundation for tuning methods with a variety of tests and customer applications.
The non-uniform memory access (NUMA) architecture of the DL980 presents challenges in
minimizing data transfers between multiple processor nodes, while efficiently distributing I/O
processing across available resources. Without any tuning, a configuration capable of as much as
700,000 IOPS may instead achieve no more than 160,000 IOPS. Likewise, a system capable of
bandwidths of up to 7 GB/s may be limited to 3.5 GB/s. Testing performed with an un-tuned initiator
may reflect poorly on ION Data Accelerator performance, when in reality the ION Data Accelerator
software is not the problem.
The goals of this document are to
• Provide an example of what is possible with a specific configuration.
• Provide the tools necessary to improve performance on a variety of DL980 configurations, or
with other initiator servers used with ION Data Accelerator.
Depending on the ioDrives and HBAs used, as well as fabric connectivity, you may need to vary the
tuning described in this document. A script has been provided to perform the most complex tuning
operations, but the steps performed by the script are fully described so you can adapt them for a
variety of configurations.
These tuning methods were originally used to maximize performance at HP European Performance
Center in Böblingen. A similar configuration was recreated at Fusion-io in San Jose, and the
performance results described in this document are the results of that testing. Though there were
minor variations between the two configurations, similar performance was achieved.
For more details on the features and functionality of ION Data Accelerator, refer to the ION Data
Accelerator User Guide.
HARDWARE
This section describes the hardware components used in the performance testing of the ION Data
Accelerator appliance with its initiator.
ION Data Accelerator System
• DL380p Gen8 server
• 2 x Intel Xeon E5-2640 CPUs (6 cores each, 2.5 GHz)
• 64GB RAM
• 3 x 2.41TB ioDrive2 Duos
• 1 x QLogic 8Gbit Fibre Channel quad-port HBA
• 2 x QLogic 8Gbit Fibre Channel dual-port HBAs
• ION Data Accelerator 2.0.0 build 349 (VSL 3.2.3 build 950)
Initiator System
• HP DL980 Gen7 server
• 8 x Intel Xeon E7-4870 CPUs (10 cores each, 2.4 GHz)
• 256 GB RAM
• 3 x Emulex 8 Gbit Fibre Channel dual-port HBAs
• 1 x QLogic 8 Gbit Fibre Channel dual-port HBA
• Red Hat Enterprise Linux 6.3
• Oracle Database 11g Enterprise Edition 64-bit Release 11.2.0.3.0 with ASM
Storage Configuration
________________________________________________________________________
INITIATOR HBA PLACEMENT
The NUMA architecture of the DL980 must be considered when choosing where to place HBAs. PCIe
slots 7, 8, 9, 10, and 11 are attached to the I/O hub nearest to CPU sockets 0 and 1. PCIe slots 1, 2,
3, 4, 5, and 6 are attached to the I/O hub nearest to CPU sockets 2 and 3. PCIe slots 12, 13, 14, 15,
and 16 are attached to the I/O hub nearest to CPU sockets 4 and 5.
In the configurations used at HP Böblingen and Fusion-io San Jose, two HBAs were placed in slots
from 1 through 6, and two HBAs were placed in slots from 7 through 11. In that configuration, I/O
traffic is split between two I/O hubs. By using multiple I/O hubs, more CPU cores can access data
from the HBAs at a low cost, but there is a risk of transferring data between I/O hubs, which may
cause poor performance. It is important to configure volume access such that no single volume is
accessed from multiple I/O hubs. Note that even though a PCIe slot may be equidistant from two
nodes, there is still less latency between cores within a node than between CPU cores on separate
nodes attached to the same I/O hub.
Although the diagram above shows slots 12 through 16 attached to CPU sockets 6 and 7, other
documentation from HP suggests that these slots are attached to nodes 4 and 5. If using the
expansion slots, it is best to manually check the location of the PCIe slots.
You can use lspci to find the bus addresses of HBAs in the system:
# lspci | grep "Fibre Channel"
0b:00.0 Fibre Channel: ...
0b:00.1 Fibre Channel: ...
11:00.0 Fibre Channel: ...
11:00.1 Fibre Channel: ...
54:00.0 Fibre Channel: ...
54:00.1 Fibre Channel: ...
60:00.0 Fibre Channel: ...
60:00.1 Fibre Channel: ...
You can also use dmidecode to determine the PCI slot associated with each bus address:
# dmidecode -t slot
...
Handle 0x0908, DMI type 9, 17 bytes
System Slot Information
Designation: PCI-E Slot 9
Type: x8 PCI Express 2 x16
Current Usage: In Use
Length: Long
ID: 9
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:0b:00.0
...
Handle 0x090A, DMI type 9, 17 bytes
System Slot Information
Designation: PCI-E Slot11
Type: x8 PCI Express 2 x16
Current Usage: In Use
Length: Long
ID: 11
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:11:00.0
...
Handle 0x0901, DMI type 9, 17 bytes
System Slot Information
Designation: PCI-E Slot 2
Type: x8 PCI Express 2 x16
Current Usage: In Use
Length: Long
ID: 2
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:54:00.0
...
Handle 0x0905, DMI type 9, 17 bytes
System Slot Information
Designation: PCI-E Slot 6
Type: x8 PCI Express 2 x16
Current Usage: In Use
Length: Long
ID: 6
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:60:00.0
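On a system with several HBAs, correlating each lspci bus address with its dmidecode slot by hand is error-prone. As an illustrative aid (not part of the original procedure; the function name and the abbreviated sample text are made up for this sketch), the dmidecode -t slot output can be parsed into a bus-address-to-slot map:

```python
# Abbreviated, hypothetical excerpt of `dmidecode -t slot` output.
SAMPLE = """\
Handle 0x0908, DMI type 9, 17 bytes
System Slot Information
        Designation: PCI-E Slot 9
        Bus Address: 0000:0b:00.0
Handle 0x0901, DMI type 9, 17 bytes
System Slot Information
        Designation: PCI-E Slot 2
        Bus Address: 0000:54:00.0
"""

def slot_map(dmidecode_output):
    """Map each PCI bus address to the slot designation that precedes it."""
    mapping, designation = {}, None
    for line in dmidecode_output.splitlines():
        line = line.strip()
        if line.startswith("Designation:"):
            designation = line.split(":", 1)[1].strip()
        elif line.startswith("Bus Address:") and designation:
            mapping[line.split(":", 1)[1].strip()] = designation
    return mapping

print(slot_map(SAMPLE))
```

On a live system the same function could be fed the captured output of dmidecode, and the resulting map joined against the bus addresses reported by lspci.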
ION DATA ACCELERATOR STORAGE POOL CONFIGURATION
A RAID 0 set was created using all three ioDrive2 Duo cards present in the ION Data Accelerator
system. This was done by using the following CLI command to create a storage profile for maximum
performance:
admin@/> profile:create max_performance
ION VOLUME CONFIGURATION
Eight volumes of equal size were created from the storage pool, using the following CLI commands:
admin@/> volume:create volume0 841 pool_md0
admin@/> volume:create volume1 841 pool_md0
admin@/> volume:create volume2 841 pool_md0
admin@/> volume:create volume3 841 pool_md0
admin@/> volume:create volume4 841 pool_md0
admin@/> volume:create volume5 841 pool_md0
admin@/> volume:create volume6 841 pool_md0
admin@/> volume:create volume7 841 pool_md0
For ION Data Accelerator configurations with many ioDrives, it may be necessary to use 16 or more
volumes to achieve maximum performance.
ION LUN CONFIGURATION
To provide sufficient performance as well as redundancy, LUN access should be provided through
multiple ION Data Accelerator targets and multiple initiator cards. Additionally, because of the
NUMA architecture characteristics of the DL980, it may be best to localize access for each volume to
a single I/O hub. Volumes should be exposed so that traffic is distributed evenly across all ports.
The diagram below shows the link configuration that was used at HP Böblingen.
Figure 1. Link configuration used at HP Böblingen
Four ports on the ION Data Accelerator system were connected to eight ports on the DL980 initiator,
through a switch. On the initiator, two dual-port cards were placed in I/O hub 1 and in I/O hub 2.
Exports were created on the four ports of the ION Data Accelerator to the four ports on each I/O hub
of the initiator.
Each volume was exported on two links:
• Volume 0: t1 to i1, t4 to i4
• Volume 1: t2 to i2, t3 to i3
• Volume 2: t3 to i7, t2 to i6
• Volume 3: t1 to i5, t4 to i8
The same access pattern was repeated with every set of four subsequent volumes. Notice that access
to each volume is localized to a single I/O hub on the initiator.
The diagram below shows the link configuration that was used at Fusion-io San Jose.
Figure 2. Link configuration used at Fusion-io San Jose
Because a switch was unavailable, eight ports on the ION Data Accelerator system were directly
connected to eight ports on the initiator.
Each volume was exported on two links:
• Volume 0: t1 to i1, t6 to i4
• Volume 1: t3 to i5, t8 to i8
• Volume 2: t2 to i2, t5 to i3
• Volume 3: t4 to i6, t7 to i7
The same access pattern was repeated with every set of four subsequent volumes. Notice that access
to each volume is once again localized to a single I/O hub on the initiator.
The following CLI commands were used to create initiator groups and LUNs on the ION Data
Accelerator system at Fusion-io San Jose:
admin@/> inigroup:create i1 10:00:00:90:fa:14:a1:fc
admin@/> inigroup:create i2 10:00:00:90:fa:14:a1:fd
admin@/> inigroup:create i3 10:00:00:90:fa:14:f9:d4
admin@/> inigroup:create i4 10:00:00:90:fa:14:f9:d5
admin@/> inigroup:create i5 10:00:00:90:fa:1b:03:c8
admin@/> inigroup:create i6 10:00:00:90:fa:1b:03:c9
admin@/> inigroup:create i7 21:00:00:24:ff:46:bf:ca
admin@/> inigroup:create i8 21:00:00:24:ff:46:bf:cb
admin@/> lun:create -b 512 volume0 i1 21:00:00:24:ff:69:d3:4c
admin@/> lun:create -b 512 volume0 i6 21:00:00:24:ff:46:c0:b5
admin@/> lun:create -b 512 volume1 i3 21:00:00:24:ff:69:d3:4e
admin@/> lun:create -b 512 volume1 i8 21:00:00:24:ff:45:f4:ad
admin@/> lun:create -b 512 volume2 i2 21:00:00:24:ff:69:d3:4d
admin@/> lun:create -b 512 volume2 i5 21:00:00:24:ff:46:c0:b4
admin@/> lun:create -b 512 volume3 i4 21:00:00:24:ff:69:d3:4f
admin@/> lun:create -b 512 volume3 i7 21:00:00:24:ff:45:f4:ac
admin@/> lun:create -b 512 volume4 i1 21:00:00:24:ff:69:d3:4c
admin@/> lun:create -b 512 volume4 i6 21:00:00:24:ff:46:c0:b5
admin@/> lun:create -b 512 volume5 i3 21:00:00:24:ff:69:d3:4e
admin@/> lun:create -b 512 volume5 i8 21:00:00:24:ff:45:f4:ad
admin@/> lun:create -b 512 volume6 i2 21:00:00:24:ff:69:d3:4d
admin@/> lun:create -b 512 volume6 i5 21:00:00:24:ff:46:c0:b4
admin@/> lun:create -b 512 volume7 i4 21:00:00:24:ff:69:d3:4f
admin@/> lun:create -b 512 volume7 i7 21:00:00:24:ff:45:f4:ac
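Because the export pattern repeats every four volumes, the initiator-group pair for any volume can be derived from the volume number. The following sketch (the helper name is invented; the pairing table is copied from the San Jose commands above) reproduces that pattern:

```python
# Initiator-group pairs for volumes 0-3 at Fusion-io San Jose;
# the same pairing repeats for every subsequent set of four volumes.
PATTERN = {0: ("i1", "i6"), 1: ("i3", "i8"), 2: ("i2", "i5"), 3: ("i4", "i7")}

def lun_groups(volume_index):
    """Return the two initiator groups a given volume is exported to."""
    return PATTERN[volume_index % 4]

for v in range(8):
    g1, g2 = lun_groups(v)
    print(f"volume{v}: exported to {g1} and {g2}")
```

Keeping the pairing in one table makes it straightforward to scale the scheme to 16 or more volumes without breaking the per-I/O-hub locality described above.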
MULTIPATH VERIFICATION
When the steps above have been completed and dm-multipath has been started on the initiator,
the multipath command may be used to verify the configuration.
# multipath -ll
mpathhes (23337613362643333) dm-2 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0'
wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 1:0:0:0 sdd 8:48 active ready running
`- 2:0:0:0 sdf 8:80 active ready running
mpathhez (23330633436333064) dm-7 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0'
wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 4:0:0:1 sdk 8:160 active ready running
`- 7:0:0:1 sdq 65:0 active ready running
mpathhey (23437373930653063) dm-4 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0'
wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 0:0:0:1 sdc 8:32 active ready running
`- 3:0:0:1 sdi 8:128 active ready running
mpathhex (26433343437616137) dm-8 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0'
wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 5:0:0:1 sdm 8:192 active ready running
`- 6:0:0:1 sdo 8:224 active ready running
mpathhew (23061313364323662) dm-5 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0'
wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 1:0:0:1 sde 8:64 active ready running
`- 2:0:0:1 sdg 8:96 active ready running
mpathhev (26432353466383337) dm-6 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0'
wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 4:0:0:0 sdj 8:144 active ready running
`- 7:0:0:0 sdp 8:240 active ready running
mpathheu (23637366232363564) dm-3 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0'
wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 0:0:0:0 sdb 8:16 active ready running
`- 3:0:0:0 sdh 8:112 active ready running
mpathhet (23632393433663839) dm-9 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0'
wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 5:0:0:0 sdl 8:176 active ready running
`- 6:0:0:0 sdn 8:208 active ready running
Notice that there are eight multipath devices, each comprised of two LUNs. Each path has a number
associated with it, of the form <host>:0:0:<lun#>. The host numbers correspond to specific PCI
device ports. A PCI device address can be correlated to a host number by looking in sysfs:
# ls -d /sys/bus/pci/devices/*/host*
/sys/bus/pci/devices/0000:11:00.0/host0
/sys/bus/pci/devices/0000:11:00.1/host1
/sys/bus/pci/devices/0000:0b:00.0/host2
/sys/bus/pci/devices/0000:0b:00.1/host3
/sys/bus/pci/devices/0000:54:00.0/host4
/sys/bus/pci/devices/0000:54:00.1/host5
/sys/bus/pci/devices/0000:60:00.0/host6
/sys/bus/pci/devices/0000:60:00.1/host7
For example, multipath device mpathhet has paths through hosts 5 and 6 (shown by the numbers
5:0:0:0 and 6:0:0:0), which correspond to devices 0000:54:00.1 and 0000:60:00.0. The output from
the dmidecode command used in the Initiator HBA Placement section shows that this volume is
exposed through HBAs in slots 2 and 6, which are both in the same I/O hub. It is important that each
volume presented in multipath is accessed only through HBAs in the same I/O hub.
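This locality check can also be scripted. The sketch below is illustrative only (the function names are made up); the slot-to-hub grouping is the one given in the Initiator HBA Placement section:

```python
def io_hub_for_slot(slot):
    """DL980 PCIe slot-to-I/O-hub grouping, per the Initiator HBA Placement section."""
    if 7 <= slot <= 11:
        return "hub near sockets 0-1"
    if 1 <= slot <= 6:
        return "hub near sockets 2-3"
    if 12 <= slot <= 16:
        return "hub near sockets 4-5"
    raise ValueError(f"unknown PCIe slot: {slot}")

def paths_share_hub(slots):
    """True when every path of a multipath device uses HBAs on a single I/O hub."""
    return len({io_hub_for_slot(s) for s in slots}) == 1

# mpathhet is reached through the HBAs in slots 2 and 6 -- both on one I/O hub.
print(paths_share_hub([2, 6]))
```

A device whose paths span, say, slots 2 and 9 would fail this check and should have its LUN exports rearranged.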
Initiator BIOS Tuning
________________________________________________________________________
The following settings should be applied on the HP DL980 initiator, using the ROM-Based Setup
Utility (RBSU) on boot.
To enter the RBSU, press F9 during boot (when the F9 Setup option appears on the screen).
UPDATING THE BIOS FOR NUMA DETECTION
In the DL980 BIOS version dated 05/01/2012, a change was made to the SLIT node distances. This
may affect performance, so it is recommended that the latest version of the BIOS be used. Incorrect
SLIT node distances are a common issue with early BIOS revisions on many platforms.
The BIOS version can be determined from the main BIOS screen. Alternatively, numactl can be used
to verify that the node distances match the table below:
# numactl --hardware
...
node distances:
node 0 1 2 3 4 5 6 7
0: 10 12 17 17 19 19 19 19
1: 12 10 17 17 19 19 19 19
2: 17 17 10 12 19 19 19 19
3: 17 17 12 10 19 19 19 19
4: 19 19 19 19 10 12 17 17
5: 19 19 19 19 12 10 17 17
6: 19 19 19 19 17 17 10 12
7: 19 19 19 19 17 17 12 10
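A quick sanity check on the reported distances is to confirm that the matrix is symmetric with a local distance of 10 on the diagonal, as in the table above. The check below is an illustrative sketch, not a Fusion-io tool:

```python
# SLIT node distances from the table above (correct DL980 BIOS).
SLIT = [
    [10, 12, 17, 17, 19, 19, 19, 19],
    [12, 10, 17, 17, 19, 19, 19, 19],
    [17, 17, 10, 12, 19, 19, 19, 19],
    [17, 17, 12, 10, 19, 19, 19, 19],
    [19, 19, 19, 19, 10, 12, 17, 17],
    [19, 19, 19, 19, 12, 10, 17, 17],
    [19, 19, 19, 19, 17, 17, 10, 12],
    [19, 19, 19, 19, 17, 17, 12, 10],
]

def slit_sane(matrix):
    """Local distance must be 10 and distances must be symmetric."""
    n = len(matrix)
    return (all(matrix[i][i] == 10 for i in range(n))
            and all(matrix[i][j] == matrix[j][i]
                    for i in range(n) for j in range(n)))

print(slit_sane(SLIT))
```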
POWER MANAGEMENT OPTIONS
To enable maximum performance, disable the HP power management options.
1. Select Power Management Options > HP Power Profile > Maximum Performance.
2. Verify that C-states have been disabled by selecting Power Management Options > Advanced
Power Management Options > Minimum Processor Idle Power Core State.
“No C-states” should be highlighted in the menu.
C-states may also need to be disabled in Linux, as explained later in this document.
SYSTEM OPTIONS
Intel Hyperthreading may or may not be beneficial to ION Data Accelerator performance. In this test
setup, Hyperthreading was enabled. Other system options were set as described below.
1. Enable hyperthreading by selecting System Options > Processor Options > Intel
Hyperthreading Options > Enabled.
2. Disable Virtualization if it is not required, by selecting System Options > Processor
Options > Intel Virtualization Technology > Disabled.
3. Disable VT-d (Virtualization Technology for Directed I/O) by selecting System Options >
Processor Options > Intel VT-d > Disabled.
ADVANCED OPTIONS
Setting the Addressing Mode
The preferred addressing mode depends on the operating system and the amount of memory used.
For all RHEL 5.x installations, use 40-bit addressing. For RHEL 6.x installations, use 40-bit addressing
when 1TB or less memory is present; otherwise, 44-bit addressing must be used to take advantage of
all available memory. To disable 44-bit addressing, select Advanced Options > Advanced System
ROM Options > Address Mode 44-bit > Disabled.
For RHEL 6.x installations using greater than 1 TB of memory, use 44-bit addressing: Advanced
Options > Advanced System ROM Options > Address Mode 44-bit > Enabled.
At HP Böblingen, the DL980 contained 1TB of memory, so 40-bit addressing was sufficient.
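The decision above reduces to a small rule. The helper below merely restates this section's guidance (the function name is made up for illustration):

```python
def address_mode(rhel_major, memory_tb):
    """Addressing mode per the guidance above: RHEL 5.x always uses 40-bit;
    RHEL 6.x uses 40-bit with 1 TB or less of memory, 44-bit beyond that."""
    if rhel_major <= 5 or memory_tb <= 1:
        return "40-bit"
    return "44-bit"

# e.g. a RHEL 6.x DL980 with 1 TB of memory, as at HP Böblingen:
print(address_mode(6, 1))
```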
Disabling x2APIC
To verify that x2APIC is disabled, select Advanced Options > Advanced System ROM Options >
x2APIC Options. The “Disabled” option should be highlighted; select it if it is not.
Initiator Tuning on Linux
________________________________________________________________________
The following settings should be configured in Linux. In some cases, a reboot must be applied in
order for changes to take effect.
MULTIPATHING
Typically, the preferred queuing technique is to send I/O to the path with the least number of I/Os
currently queued. The following is an example of how the multipath.conf file can be configured,
using a path_selector of “queue-length 0”:
device {
vendor "FUSIONIO"
product "*"
path_selector "queue-length 0"
rr_min_io_rq 1
rr_weight uniform
no_path_retry 20
failback 60
path_grouping_policy multibus
path_checker tur
}
Another approach that may provide better results is setting path_selector to “round-robin”.
The round-robin value uses fewer CPU cycles, but it does not correct for unbalanced performance
characteristics of multiple paths, or any additional load from other devices that may be slowing down
one of the paths.
DISABLING PROCESSOR C-STATES IN LINUX
For newer Linux kernels (2.6.32 or later), disabling CPU idle power states can boost performance.
However, these must be disabled at boot time rather than in the BIOS.
To disable C-states, add the intel_idle.max_cstate=0 and processor.max_cstate=0 boot
parameters to the /boot/grub/grub.conf file as follows:
title Red Hat Enterprise Linux (2.6.32-279.el6.x86_64)
root (hd0,0)
kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/mapper/vg_rhel980-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_rhel980/lv_root rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=vg_rhel980/lv_swap KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet intel_idle.max_cstate=0 processor.max_cstate=0
initrd /initramfs-2.6.32-279.el6.x86_64.img
One way to confirm that C-states have been disabled entirely is to verify that the cpuidle
sysfs files do not exist:
# ls /sys/devices/system/cpu/cpu0/cpuidle
ls: cannot access /sys/devices/system/cpu/cpu0/cpuidle: No such file or
directory
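Another check (an illustrative one, not from the original procedure) is to confirm that both parameters actually reached the running kernel's command line, which Linux exposes in /proc/cmdline:

```python
def cstates_disabled(cmdline):
    """True when both C-state boot parameters appear on a kernel command line."""
    params = cmdline.split()
    return ("intel_idle.max_cstate=0" in params
            and "processor.max_cstate=0" in params)

# On a live system: cstates_disabled(open("/proc/cmdline").read())
print(cstates_disabled(
    "ro root=/dev/sda1 rhgb quiet intel_idle.max_cstate=0 processor.max_cstate=0"))
```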
IONTUNER RPM
The tuning suggestions in this section can be performed in one step by installing the iontuner
RPM. The RPM is made available on the Fusion-io internal network:
https://confluence.int.fusionio.com/display/ION/Documentation#Documentation-
IONPerformanceBrief,HPDL980(INTERNAL-ONLY)
The RPM can be installed with the following command (the RPM version may be different):
# rpm -Uvh iontuner-0.0.2-1.noarch.rpm
If ION LUNs have already been detected by the initiator, a reboot or reload of device drivers may be
necessary after the RPM install. This serves to complete the tuning that is performed upon device
discovery. If in doubt about LUN discovery, reboot.
The tuning described in the following sub-sections is done by the iontuner RPM, and it does not
need to be performed manually if the RPM has been installed. Detailed steps are provided here in
order to completely describe the RPM function and to assist those who may need to adjust the steps
for unsupported platforms.
Block Device Tuning with udev Rules
Note: The tuning in this section is performed by the iontuner RPM.
To improve I/O performance, you should tune the I/O scheduling queues on all devices in the data
path. This includes both the individual SCSI devices (/dev/sd*) and the device-mapper devices
(/dev/dm-*).
Three settings changes have been shown to provide a performance benefit under some workloads:
1) Always use the noop I/O scheduler with ION Data Accelerator devices:
# echo noop > /sys/block/<device>/queue/scheduler
2) Use strict block-request affinity. This forces the handling of I/O completion to occur on the
same CPU where the request was initiated.
# echo 2 > /sys/block/<device>/queue/rq_affinity
Strict block-request affinity is not available on RHEL 5, and on some kernels, group affinity
will be used where strict affinity is not supported. After setting the file to ‘2’, a read of the
file will return ‘1’ if only CPU group affinity is available.
3) To get more consistent performance results, disable entropy pool contribution:
# echo 0 > /sys/block/<device>/queue/add_random
The methods described above must be run after multipath devices are configured and detected by
the initiator, and they will not persist through a reboot. To make the settings persistent, Linux
provides the udev rules mechanism, which allows sysfs parameters to be set upon device discovery,
both at boot time and at run time.
The iontuner RPM installs the following rules in /etc/udev/rules.d/99-iontuner.rules:
ACTION=="add|change", SUBSYSTEM=="block", ATTR{device/vendor}=="FUSIONIO", ATTR{queue/scheduler}="noop", ATTR{queue/rq_affinity}="2", ATTR{queue/add_random}="0"
ACTION=="add|change", KERNEL=="dm-*", PROGRAM="/bin/bash -c 'cat /sys/block/$name/slaves/*/device/vendor | grep FUSIONIO'", ATTR{queue/scheduler}="noop", ATTR{queue/rq_affinity}="2", ATTR{queue/add_random}="0"
The first rule applies scheduler, rq_affinity, and add_random changes to all SCSI block
devices (/dev/sd*) whose vendor is FUSIONIO.
The second rule applies scheduler, rq_affinity, and add_random changes to all DM
multipath devices (/dev/dm-*) that are created on top of block devices whose vendor is FUSIONIO.
Disabling the cpuspeed Daemon
Note: The tuning in this section is performed by the iontuner RPM.
Disabling the cpuspeed daemon on Linux can boost overall performance. To disable the cpuspeed
daemon immediately, run this command:
# service cpuspeed stop
To prevent the cpuspeed daemon from running after a reboot, run this command:
# chkconfig cpuspeed off
Pinning interrupts
Note: The tuning in this section is performed by the iontuner RPM.
To minimize data transfer and synchronization throughout the system, I/O interrupts should be
handled on a socket close to the HBA’s I/O hub.
When manually configuring IRQs, the irqbalance daemon must first be disabled. To disable the
irqbalance daemon immediately, run this command:
# service irqbalance stop
To prevent the irqbalance daemon from running after a reboot, run this command:
# chkconfig irqbalance off
IRQs should be pinned for each driver that handles interrupts for ION device I/O. Typically, this is just
the HBA driver. Driver IRQs can be identified in /proc/interrupts by matching IRQ numbers
to the driver prefix listed in the same row. The following table shows some common drivers and the
prefix used to identify their IRQs:
Driver          Prefix
QLogic FC       qla
Brocade FC      bfa
Emulex FC       lpfc
Emulex iSCSI    beiscsi,eth
The iontuner RPM installs the iontuner service init script. This runs at boot time to distribute IRQs
across the CPU cores local to the HBA's I/O hub. Below is an example of the commands issued at
startup:
echo 00000000,00000000,00000000,00000000,00000001 > /proc/irq/114/smp_affinity
echo 00000000,00000000,00000000,00000000,00000002 > /proc/irq/115/smp_affinity
echo 00000000,00000000,00000000,00000000,00000004 > /proc/irq/116/smp_affinity
echo 00000000,00000000,00000000,00000000,00000008 > /proc/irq/117/smp_affinity
echo 00000000,00000000,00000000,00000000,00000010 > /proc/irq/118/smp_affinity
echo 00000000,00000000,00000000,00000000,00000020 > /proc/irq/119/smp_affinity
echo 00000000,00000000,00000000,00000000,00000040 > /proc/irq/120/smp_affinity
echo 00000000,00000000,00000000,00000000,00000080 > /proc/irq/121/smp_affinity
echo 00000000,00000000,00000000,00000000,00100000 > /proc/irq/134/smp_affinity
echo 00000000,00000000,00000000,00000000,00200000 > /proc/irq/135/smp_affinity
echo 00000000,00000000,00000000,00000000,00400000 > /proc/irq/136/smp_affinity
echo 00000000,00000000,00000000,00000000,00800000 > /proc/irq/137/smp_affinity
echo 00000000,00000000,00000000,00000000,01000000 > /proc/irq/122/smp_affinity
echo 00000000,00000000,00000000,00000000,02000000 > /proc/irq/123/smp_affinity
echo 00000000,00000000,00000000,00000000,04000000 > /proc/irq/124/smp_affinity
echo 00000000,00000000,00000000,00000000,08000000 > /proc/irq/125/smp_affinity
Affinity is set by writing to the /proc/irq/<irq#>/smp_affinity file for a given IRQ. Each IRQ
is assigned affinity to a different CPU core on a node nearest to the IRQ’s PCIe device. In
smp_affinity files, each core is represented by a single bit, starting with the least significant bit
mapping to CPU 0. The IRQs associated with each device driver can be found by reading the
/proc/interrupts file.
There are ten CPU cores per node. In the example above, eight interrupts (the first eight entries) for
the devices in slots 9 and 11 are mapped to node 0, and eight interrupts (the last eight entries) for
the devices in slots 2 and 6 are mapped to node 2. On the DL980, each PCIe slot can be efficiently
assigned to either of the nodes corresponding to its I/O hub. However, it is important that all
processes related to that device be assigned to the same node.
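The mask strings written above can be generated programmatically. The following sketch reproduces them; the five-word width matches the kernel's presentation on this DL980, and the helper name is illustrative:

```python
def smp_affinity_mask(cpu, words=5):
    """Build the comma-separated hex string written to
    /proc/irq/<irq#>/smp_affinity to pin an IRQ to a single CPU core.

    Each word is 32 bits, most significant word first; bit 0 of the
    last word maps to CPU 0."""
    mask = 1 << cpu
    return ",".join("%08x" % ((mask >> (32 * i)) & 0xFFFFFFFF)
                    for i in reversed(range(words)))

print(smp_affinity_mask(0))   # 00000000,00000000,00000000,00000000,00000001
print(smp_affinity_mask(20))  # 00000000,00000000,00000000,00000000,00100000
```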
Because these settings will not persist through a reboot, the iontuner service runs each time
the system is booted.
VERIFYING THREAD PINNING
Note: The tuning in this section was not necessary in the DL980/RHEL 6.3 testing. It is included
because it may be necessary on other platforms.
To further minimize data transfer and synchronization times throughout the system, it may be
beneficial to place critical I/O driver threads on the same socket as the interrupts and HBA. This may
only be necessary with some drivers. For instance, this is helpful with QLogic drivers but is not
necessary when using Emulex drivers because no critical work is performed in Emulex driver threads.
In the case of the DL980 running RHEL 6.3, the QLogic driver threads always ran on cores local to the
HBAs, even though they were not pinned.
To check where QLogic driver threads are executing, run the following command:
# ps -eo comm,psr | grep qla
qla2xxx_6_dpc 20
qla2xxx_7_dpc 20
The number beside each process indicates the core it is currently executing on.
The numbers “6” and “7” in the above example correspond to specific PCI device host numbers.
You can correlate a PCI device to a host number by looking in sysfs:
# ls -d /sys/bus/pci/devices/*/host*
/sys/bus/pci/devices/0000:11:00.0/host0
/sys/bus/pci/devices/0000:11:00.1/host1
/sys/bus/pci/devices/0000:0b:00.0/host2
/sys/bus/pci/devices/0000:0b:00.1/host3
/sys/bus/pci/devices/0000:54:00.0/host4
/sys/bus/pci/devices/0000:54:00.1/host5
/sys/bus/pci/devices/0000:60:00.0/host6
/sys/bus/pci/devices/0000:60:00.1/host7
The CPUs local to each PCI device can also be found in sysfs:
# cat /sys/bus/pci/devices/0000:54:00.0/local_cpulist
20-29,100-109
If the device thread is not executing on one of the listed cores, run the following command:
# /usr/sbin/iontuner.py --pinqladriver
The output from the script shows the commands it issued:
taskset -pc 20-29,100-109 947
taskset -pc 20-29,100-109 942
The script assigns CPU affinity to each discovered PID through the taskset command, using the
following syntax:
# taskset -pc <CPU list> <PID>
PIDs can be discovered through the ps command, but each driver has its own naming convention for
these processes. For example, the following command will show QLogic driver threads:
# ps -eo comm,pid | grep qla
qla2xxx_6_dpc 942
qla2xxx_7_dpc 947
The driver thread should be pinned to the set of cores listed in the device local_cpulist.
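Checking that a driver thread's current core (the psr column above) falls inside the device's local_cpulist can be scripted. This sketch expands the sysfs list format; the function name is illustrative:

```python
def parse_cpulist(cpulist):
    """Expand a sysfs local_cpulist string such as '20-29,100-109'
    into a set of CPU numbers."""
    cpus = set()
    for part in cpulist.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

# A thread executing on core 20 is correctly placed for this device:
print(20 in parse_cpulist("20-29,100-109"))  # True
print(35 in parse_cpulist("20-29,100-109"))  # False
```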
On the DL980, although every I/O hub is local to two NUMA nodes, only the CPU cores from the
lower-numbered node are shown as local to each PCI device. In this example, the first range (20-29)
corresponds to the CPU cores in NUMA node 2, and the second range (100-109) corresponds to the
hyper-threading cores for NUMA node 2. The second CPU core range will only be present if
hyper-threading is enabled. Though the device is also local to NUMA node 3, it is generally
sufficient to pin all devices to one of the two NUMA nodes, provided a single node has enough CPU
resources. Splitting pinning between the two nodes requires extreme care: pinning resources from
one device on two separate nodes can hurt performance because, though both nodes may be local
to the device, they are not local to each other.
These settings will not persist through a reboot.
Oracle Tuning
________________________________________________________________________
The following settings are specific to tuning for Oracle. A reboot is required for the system settings
to take effect.
HUGEPAGES
Configuring HugePages reduces the overhead of utilizing large amounts of memory by reducing the
page table size of the Oracle System Global Area (SGA). The default HugePage size is 2 MB,
compared with the typical page size of 4 KB. With a page size of 2 MB, a 10 GB SGA will have only
5120 pages compared to 2.6 million pages without HugePages.
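The page-count arithmetic above is straightforward to reproduce:

```python
HUGEPAGE = 2 * 1024 ** 2   # default HugePage size, 2 MB
BASE_PAGE = 4 * 1024       # typical page size, 4 KB

def pages_for_sga(sga_bytes, page_bytes):
    """Number of pages (rounded up) needed to back an SGA."""
    return -(-sga_bytes // page_bytes)

sga = 10 * 1024 ** 3  # the 10 GB SGA from the example above
print(pages_for_sga(sga, HUGEPAGE))   # 5120 HugePages
print(pages_for_sga(sga, BASE_PAGE))  # 2621440 (~2.6 million) 4 KB pages
```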
HugePages can be configured in /etc/sysctl.conf:
vm.nr_hugepages=55612
vm.hugetlb_shm_group=501
The number of HugePages used here is based on a recommendation from Oracle. The
hugetlb_shm_group value should be set to the group ID of the oracle user, which can be
determined with the id command:
# id -g oracle
501
After a reboot, the number of available HugePages can be verified.
# cat /proc/meminfo | grep HugePages_Total
HugePages_Total: 55612
SYSCTL PARAMETERS
The following parameters were configured for Oracle in /etc/sysctl.conf:
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.ipv4.ip_local_port_range = 9000 65500
fs.file-max = 6815744
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576
ORACLE INITIALIZATION PARAMETERS
The following parameters were set in the /opt/oracle/product/11.2.0/dbs/initorcl.ora
file:
*.db_block_size=8192
*.db_recovery_file_dest_size=2000G
*.processes=6000
*.db_writer_processes=16
*.dml_locks=80000
*.filesystemio_options='SETALL'
*.open_cursors=8192
*.optimizer_capture_sql_plan_baselines=FALSE
*.parallel_degree_policy='AUTO'
*.parallel_threads_per_cpu=2
*.pga_aggregate_target=8G
*.sga_max_size=50G
*.sga_target=50G
*.use_large_pages='only'
_enable_NUMA_support=TRUE
The _enable_NUMA_support parameter enables Oracle NUMA optimizations.
The use_large_pages parameter ensures that each NUMA segment will be backed by HugePages.
fio Performance Testing
________________________________________________________________________
After performing the configuration described in this document, the fio tool can be used to verify
the synthetic performance of the ION Data Accelerator configuration.
PRECONDITIONING FLASH STORAGE
Running tests immediately after a low-level format of the flash storage is not a meaningful test
for the ION Data Accelerator system or any other flash-based storage system.
It is always recommended that preconditioning be performed prior to measuring performance. When
comparing multiple flash storage solutions, it is necessary to perform the same preconditioning on
each system. Improper preconditioning can lead to extremely unrealistic performance comparisons.
Preconditioning can be performed by writing a random data pattern to the entire address range of
the device, using a consistent block size. A block size of 1MB is recommended.
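The preconditioning pass described above amounts to a full-range sequential write of random data in fixed-size blocks. The following is a minimal sketch of the idea, writing to a scratch file rather than a raw device; a real run would cover the device's entire address range with direct I/O (for example, via fio), and all existing data in the target is destroyed:

```python
import os

def precondition(path, size_bytes, block_size=1 << 20):
    """Write random data across the first `size_bytes` of `path`
    in 1 MB blocks (demo of the preconditioning pattern)."""
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(block_size, remaining)
            f.write(os.urandom(n))
            remaining -= n

precondition("/tmp/precondition-demo.bin", 4 * (1 << 20))
print(os.path.getsize("/tmp/precondition-demo.bin"))  # 4194304
```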
TESTING THREAD CPU AFFINITY
Earlier, this document described how to align all I/O to a given LUN on a single socket. This was done
by HBA placement, restricted LUN access, target-initiator connections, IRQ affinity, and driver thread
affinity. The final component is to force the test threads accessing that LUN onto the same NUMA
node as all of the other components. Configuring this will vary depending on the test used. For the
fio test, the cpus_allowed parameter can be used as shown in the examples below.
TEST COMMANDS
The iontuner RPM provides a script that may be used to generate fio job files with optimal NUMA
tuning parameters. The RPM is made available on the Fusion-io internal network in the same location
as this document:
https://confluence.int.fusionio.com/display/ION/Documentation#Documentation-IONPerformanceBrief,HPDL980(INTERNAL-ONLY)
A fio job file can be created using the following command format:
# /usr/sbin/iontuner.py --setupfio='<parameters>'
The script generates a job file using fio parameters that have been shown to provide optimal
performance results. They also provide efficient pinning for all test threads. In addition to the built-in
parameters, options specified in the <parameters> field as a comma-separated list are also added
to the job file. This option should be used to specify read/write balance, random vs. sequential I/O,
test length, and any other parameters specific to the workload being tested.
For example, the following command can be used to generate a random 4KB read test:
# /usr/sbin/iontuner.py --setupfio='rw=randrw,bs=4k,rwmixread=100,runtime=600,loops=10000,numjobs=1'
This command generates the following job file in /root/iontuner-fio.ini:
[global]
rw=randrw
bs=4k
rwmixread=100
runtime=600
loops=10000
numjobs=1
iodepth=256
group_reporting=1
thread=1
exitall=1
sync=0
direct=1
randrepeat=0
norandommap=1
ioengine=libaio
gtod_reduce=1
iodepth_batch=64
iodepth_batch_complete=64
iodepth_batch_submit=64
[dm-10]
filename=/dev/dm-10
offset=0
size=8409579520
cpus_allowed=20,21,22,23,24,25,26,27,28,29,100,101,102,103,104,105,106,107,108,109
[dm-8]
filename=/dev/dm-8
offset=0
size=8409579520
cpus_allowed=20,21,22,23,24,25,26,27,28,29,100,101,102,103,104,105,106,107,108,109
[dm-9]
filename=/dev/dm-9
offset=0
size=8409579520
cpus_allowed=0,1,2,3,4,5,6,7,8,9,80,81,82,83,84,85,86,87,88,89
[dm-6]
filename=/dev/dm-6
offset=0
size=8409579520
cpus_allowed=20,21,22,23,24,25,26,27,28,29,100,101,102,103,104,105,106,107,108,109
[dm-7]
filename=/dev/dm-7
offset=0
size=8409579520
cpus_allowed=0,1,2,3,4,5,6,7,8,9,80,81,82,83,84,85,86,87,88,89
[dm-4]
filename=/dev/dm-4
offset=0
size=8409579520
cpus_allowed=20,21,22,23,24,25,26,27,28,29,100,101,102,103,104,105,106,107,108,109
[dm-5]
filename=/dev/dm-5
offset=0
size=8409579520
cpus_allowed=0,1,2,3,4,5,6,7,8,9,80,81,82,83,84,85,86,87,88,89
[dm-3]
filename=/dev/dm-3
offset=0
size=8409579520
cpus_allowed=0,1,2,3,4,5,6,7,8,9,80,81,82,83,84,85,86,87,88,89
The numjobs parameter must be tuned specifically for each configuration. Though one job per
volume was optimal in this configuration, for ION Data Accelerator configurations with many
ioDrives it may be necessary to use four or more jobs per volume to achieve maximum performance.
The cpus_allowed parameter is used to specify a list of CPUs on which each test thread may run.
Earlier sections of this document described how to align all I/O to a given volume on a single socket
by HBA placement, restricted LUN access, target-initiator connections, IRQ affinity, and driver thread
affinity. This final component forces the test threads accessing that volume onto the same NUMA
node as all of the other components.
To manually determine which CPUs a multipath device should be pinned to, first the host number
must be obtained from the multipath command:
# multipath -l
mpathgzu (26364646430613766) dm-3 FUSIONIO,ION LUN
size=174G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0'
wp=rw
`-+- policy='queue-length 0' prio=0 status=active
|- 2:0:0:0 sdm 8:192 active undef running
`- 1:0:0:0 sdg 8:96 active undef running
...
The first number listed with each underlying sd* device indicates the host number. The host number
can be correlated to a PCI device by looking in sysfs:
# ls -d /sys/bus/pci/devices/*/host*
/sys/bus/pci/devices/0000:11:00.1/host1
/sys/bus/pci/devices/0000:0b:00.0/host2
...
The CPUs local to each PCI device can also be found in sysfs:
# cat /sys/bus/pci/devices/0000:11:00.1/local_cpulist
0-9,80-89
# cat /sys/bus/pci/devices/0000:0b:00.0/local_cpulist
0-9,80-89
If the devices are pathed properly, the local CPU list for each underlying device should be identical.
These CPUs should be listed in the cpus_allowed parameter of fio.
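The host-number extraction step can be scripted as well. This sketch parses the kind of multipath output shown above; the sample text and function name are illustrative:

```python
import re

SAMPLE = """\
`-+- policy='queue-length 0' prio=0 status=active
|- 2:0:0:0 sdm 8:192 active undef running
`- 1:0:0:0 sdg 8:96 active undef running
"""

def host_numbers(multipath_output):
    """Extract the SCSI host numbers (the H in H:C:T:L) for a device's
    underlying sd* paths from `multipath -l` output."""
    return sorted({int(m.group(1)) for m in
                   re.finditer(r"\b(\d+):\d+:\d+:\d+\s+sd\w+",
                               multipath_output)})

print(host_numbers(SAMPLE))  # [1, 2]
```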
Information on the other fio parameters used here is available in the fio man page.
In addition to creating a job file, the script will output the command that can be used to run a fio
test with the job file. To run the test, copy the output of the script onto the command line:
# fio ./iontuner-fio.ini
The fio test will execute and generate test results to the terminal.
RESULTS
The following fio test results are captured in this section, all on the HP DL980 initiator:
• Sequential R/W throughput and IOPS
• Random mix R/W IOPS
• Random mix R/W throughput
All tests were performed with the following elements:
• 3 x 2.41TB ioDrive2 Duos
• 1 x RAID 0 pool
• 8 ION volumes, 2 LUNs per volume
• 8 direct-connect FC8 target-initiator links, 2 LUNs per initiator-target link
• 1 dm-multipath device per volume
• 1 worker/device, queue depth=256/worker
Preconditioning was performed prior to the set of tests for each block size by using fio to write to
the entire range of the device with a 1 MB block size.
SEQUENTIAL R/W THROUGHPUT AND IOPS
RANDOM MIX R/W IOPS
RANDOM MIX R/W THROUGHPUT
The results above indicate performance measured and reported by fio; for selected tests, the
numbers were cross-checked against the output of the iostat command and found to be comparable.
Performance results can vary dramatically depending on the number of ION Data Accelerator
volumes used, the number of paths to each volume, and the number of test threads run per volume
(determined by the fio numjobs parameter). For this particular configuration, tests were run on a
variety of volume, path, and thread counts before determining that 8 volumes, 2 paths per volume,
and 1 thread per volume was optimal. This configuration was chosen because it provided the best
results for random read IOPS. Depending on the specifics of a configuration and the workload
chosen for optimization, other combinations may provide better results.
The tests above report peak random read performance at around 700,000 IOPS. However, to test
initiator capabilities, some benchmarks were performed immediately after formatting the ioDrives.
For example, this test achieved 800,000 IOPS:
# /usr/sbin/iontuner.py --setupfio='rw=randrw,bs=4k,rwmixread=100,runtime=600,loops=10000,numjobs=1'
Running immediately after a format is not a meaningful test for the ION Data Accelerator system
itself, as reads are not serviced out of flash. Still, this indicates that with more ioDrives in the ION
Data Accelerator, the DL980 could likely have achieved even higher performance numbers.
Similarly, the fastest reported combined read and write bandwidth is 6900 MB/s. Shortly after the
cards were formatted, greater throughput was possible from the initiator:
# /usr/sbin/iontuner.py --setupfio='rw=randrw,bs=1m,rwmixread=50,runtime=600,loops=10000,numjobs=1'
This test achieved 3740 MB/s read bandwidth and 3750 MB/s write bandwidth, for a total
bandwidth of 7490 MB/s.
A final indicator that performance was limited by the ioDrives is the reduced mixed-bandwidth
performance at some block sizes, which is comparable to test results seen with a single ioDrive in a
local server. Writing data to the full address range prior to testing is a necessary step to achieve
realistic results in an ION Data Accelerator test. Together, these final tests suggest that the NUMA
architecture of the DL980 was unlikely to be the limiting factor in the fio results; the DL980
appeared to fully exercise the performance capabilities of the ION Data Accelerator.
Oracle Performance Testing
________________________________________________________________________
Oracle Orion is a tool for predicting the performance of an Oracle database without having to install
Oracle or create a database. It simulates Oracle database I/O workloads using the same I/O software
stack as Oracle.
Tuning for Orion is very similar to tuning for fio. By running simultaneous copies of Orion’s
advanced test, it is possible to approximate workloads similar to fio. Alternatively, the Online
Transaction Processing (OLTP) and Data Warehouse (DSS) tests can be used to attempt to
synthetically approximate user workloads. Orion can also be used to test mixed large and small block
sizes.
TEST SETUP
The Orion tests were run as root, but it was necessary to set the ORACLE_HOME environment
variable. To find its value, run the following commands from an Oracle user shell:
# su - oracle
$ echo $ORACLE_HOME
/opt/oracle/product/11.2.0/db_1
$ exit
To make the variable permanent, run the following command in the terminal or add it to
~/.bashrc (the specific Oracle version will vary):
# export ORACLE_HOME=/opt/oracle/product/11.2.0/db_1
The iontuner RPM provides a script that can be used to generate Orion test commands with
optimal NUMA tuning parameters. The RPM is available on the Fusion-io internal network in the
same location as this document:
https://confluence.int.fusionio.com/display/ION/Documentation#Documentation-IONPerformanceBrief,HPDL980(INTERNAL-ONLY)
Orion .lun files can be created using the following command:
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='<parameters>'
The script generates commands that have been shown to provide optimal performance results and
efficient pinning for all test threads.
For example, the following command can be used to generate a 4KB read IOPS test:
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600'
The script generates .lun files saved in the current directory and outputs the following commands:
taskset -c 20-29,100-109 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-6 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 20-29,100-109 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-7 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 20-29,100-109 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-4 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 20-29,100-109 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-5 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 0-9,80-89 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-2 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 0-9,80-89 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-3 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 0-9,80-89 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-0 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 0-9,80-89 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-1 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
For this configuration, the best results were obtained by creating a separate .lun file for each volume
and running a single Orion test on each volume. Splitting the volumes into separate .lun files made it
possible for taskset to run each Orion test and assign it affinity to the CPUs local to the devices
being tested. The local CPUs can be determined with the multipath command using the same
method described in the fio Test Commands section earlier in this document.
You can copy and paste the taskset commands into the terminal to run them in parallel. Because
the output from Orion displays only the maximum performance of each instance (which may
individually occur at different times), the iostat command should be used to read performance as
viewed from the initiator devices:
# iostat -x /dev/dm-*
TEST COMMANDS
The fio tests used for 8KB IOPS were approximated with the following commands:
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type seq -num_large 0 -num_small 2048 -write 100 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 100 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 75 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 50 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 25 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 8 -duration 600'
The fio tests used for 512KB bandwidth were approximated with the following commands:
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type seq -num_large 2048 -num_small 0 -write 100 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 0 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 100 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 75 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 50 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 25 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 0 -size_large 512 -duration 600'
For running the DSS test, the iontuner.lun file was created with all eight volumes specified. The
DSS test was run with the following command:
# taskset -c 0-9,80-89,20-29,100-109 ./orion -testname iontuner -run dss
Because all devices were used in a single command, the CPUs local to all of the HBAs were specified
to taskset.
The OLTP test was run with the following command:
# taskset -c 0-9,80-89,20-29,100-109 ./orion -testname iontuner -run oltp
RESULTS
When running Orion advanced tests that approximated fio tests for 8KB and 512KB block sizes, the
results were almost identical to fio.
There was more variation between runs than between the two utilities. Because the previous state of
the ioDrives has a large impact on the performance of any test, it is necessary when comparing test
runs to sequence tests in a consistent order and begin with the same initial ioDrive conditioning.
Providing Orion results for these tests would only bring attention to minor variations that provide no
additional information about the tuning of the DL980.
The advanced tests also exposed an unexpected Orion behavior: for block sizes larger than 512KB,
Orion appears to always issue 512KB accesses to the devices.
The DSS test resulted in a maximum bandwidth of 6039 MB/s.
There are many variations to the Orion test that could be experimented with. To get an accurate
measurement of maximum performance, it is necessary to run multiple copies of the test and
evaluate the results from iostat. With any of the test options that run multiple test points
(advanced, OLTP, DSS) there is no guarantee that all of the test copies will synchronously run each
test point. This may invalidate results.
Oracle Database Testing
________________________________________________________________________
For Oracle database testing, a number of tools were used to show the maximum capabilities of the
system under a variety of workloads.
READ WORKLOAD TEST – QUEST BENCHMARK FACTORY
For a more realistic Oracle test, a Windows server was connected to the DL980 via an additional Fibre
Channel link. An Oracle disk group was created containing all of the ION Data Accelerator volumes.
Quest Benchmark Factory was used to create a database on the disk group with the following
configuration:
• Size: 300GB
• Logging Mode: ARCHIVELOG
The Oracle components below were placed in one ASM disk group, +DATA, which consisted of 8
LUNs (each 800 GB) enabled with multipathing:
• Redo – 20 redo log members, each 2048 MB in size
• Archivelogs – placed in the default FRA
• FRA – db_recovery_file_dest=’+DATA’, db_recovery_file_dest_size=’3000G’
• UNDO, data, and temporary tablespaces
The ASM +DATA disk group was created with external redundancy and with a default 1MB AU size.
SYS, SYSTEM, and second UNDO tablespaces were created in the ADMIN disk group. This was done
in order to easily drop and recreate the TEST data and disk groups without having to recreate the
database.
For a read workload test, Quest Benchmark Factory > Database Scalability Job > TPC-H Power
Test was used.
The test was configured for 50 users.
Performance was evaluated on the DL980 while TPC-H Power Test was running.
Oracle Enterprise Manager was used to show read bandwidth during the test.
During the test Oracle showed a read bandwidth of just over 6000 MB/s.
An Automated Workload Repository (AWR) report was generated during the test. The following
excerpts provide details on the I/O performed by the test.
The AWR report function summary shows a total read bandwidth of 5.8 GB/s averaged over the
length of the test.
The file statistics show the breakdown of I/O for each file.
Using 'iostat -mx /dev/dm-*', a snapshot of bandwidth from the ION volumes was verified.
An approximate read bandwidth of 755MB/s was seen on each of the eight volumes, for a total read
bandwidth of 6043MB/s from the ION Data Accelerator server. The avgrq-sz column shows that
the average request size was between 512 and 1024 sectors (256 KB and 512 KB). These results are
consistent with the bandwidth of approximately 6100MB/s seen from fio in this block size range.
However, it is important to recognize that Oracle performs data transfers of many sizes
simultaneously, so the synthetic fixed block size results of fio are not a direct comparison, only an
approximation of the capability at this workload.
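The sector-to-size conversion used in this analysis (iostat reports avgrq-sz in 512-byte sectors) is simple to express:

```python
SECTOR_BYTES = 512  # iostat's avgrq-sz is reported in 512-byte sectors

def avgrq_to_kb(sectors):
    """Convert an iostat avgrq-sz value to KiB."""
    return sectors * SECTOR_BYTES / 1024.0

print(avgrq_to_kb(512))   # 256.0 KiB
print(avgrq_to_kb(1024))  # 512.0 KiB
```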
OLTP WORKLOAD TEST – HEAVY INSERT SCRIPT
Performance was evaluated while running a custom OLTP load generated by a script running heavy
insert database transactions on the DL980.
Oracle Enterprise Manager was used to show bandwidth and IOPS during the test.
During the test Oracle showed a total bandwidth of approximately 4000 MB/s.
An AWR report was generated during the test. The following excerpts provide details on the I/O
performed by the test.
The AWR report function summary shows a total read bandwidth of 884 MB/s and write bandwidth
of 2.6 GB/s averaged over the length of the test, or 3.5 GB/s combined.
The file statistics show the breakdown of I/O for each file.
Using 'iostat -mx /dev/dm-*', a snapshot of bandwidth from the ION volumes was verified.
A read bandwidth of 952 MB/s and write bandwidth of 2505 MB/s was seen, for a total bandwidth
of 3457 MB/s from the ION Data Accelerator server. The workload is 22% read and 78% write I/O.
The avgrq-sz column shows that the average request size was around 123 sectors, or 61KB. The
result from the fio test for a 25% read workload and 64KB block size was 3705MB/s, which is
consistent with the results of this test. Once again, it is important to recognize that Oracle performs
data transfers of many sizes simultaneously, so the synthetic fixed block size results of fio are not a
direct comparison, only an approximation of the capability at this workload.
TRANSACTIONS TEST – SWINGBENCH
An Order Entry Sample OLTP Test was run in Swingbench on the DL980. The test was configured
with 100 users and transaction delay disabled. Because of some difficulties with Swingbench that
were not related to performance, hyper-threading was disabled for this test.
The test resulted in an average of 934,359 transactions per minute (TPM) and a maximum of
1,150,103 TPM.
Oracle transactions vary greatly in the I/O they produce on the backend storage. A specific TPM
number such as the one provided by Swingbench is only useful when compared to a number
produced by a Swingbench test with the same parameters.
Conclusions
________________________________________________________________________
Prior to tuning, it is possible that performance on a NUMA system such as the HP DL980 will appear
to be lower than that of systems with less complex architectures. The script used throughout this
document for NUMA-specific tuning will be made available to simplify and standardize this tuning
process.
Synthetic benchmarks such as fio or Orion provide direct measurement of ION Data Accelerator
storage capabilities. The flexibility of these tools is extremely useful when tuning storage
configurations and initiator system parameters. The comparable results achieved by fio and Orion
indicate that either of these tools is sufficient. The configuration used at Fusion-io in San Jose was
capable of sustaining 700,000 random IOPS and up to 7GB/s in bandwidth, but there were indicators
that the DL980 would have been capable of sustaining even greater numbers when used in
combination with more ioDrives in the ION Data Accelerator.
However, synthetic benchmark performance alone does not guarantee user application performance.
Additional system parameters must be tuned for Oracle, and appropriate tests must be used to
identify the maximum performance for each specific workload. Oracle produced a read bandwidth of
up to 6GB/s and a mixed bandwidth of nearly 3.5GB/s. While these numbers may seem to be lower
than those seen by fio, they are very comparable to the results of an fio test with a similar
read/write balance and average block size. The close proximity of the Oracle results to the fio
results indicates that Oracle has been tuned to take full advantage of the performance of the
storage. Tests in Swingbench were measured at up to 1,150,103 TPM, but this number is only useful
when compared to other Swingbench results.
NUMA support is an active topic in Linux development. As newer distributions become available and
their built-in tools improve, it is likely that less manual tuning will be necessary. While the tuning
applied by the provided script is not currently persistent, methods are being investigated to provide automatic
tuning at boot time as well as upon device discovery. When configured properly, the DL980 is a very
powerful Oracle initiator for use with the ION Data Accelerator.
Glossary
________________________________________________________________________
Initiator – An initiator of I/O is analogous to a client in a client/server system. Initiators use a SCSI
transport protocol to access block storage over a network. A database or mail server is an initiator,
for example.
LUN – Logical Unit Number. Targets furnish containers for I/O, each a contiguous array of blocks
identified by a logical unit number. A LUN is usually synonymous with a physical disk drive, since
initiators perceive it as such. For ION Data Accelerator, a LUN is a volume that has been exported to
one or more initiators.
Pool – An aggregation of ioMemory or RAIDset block devices. Block devices can be added to a pool.
Target – The opposite of an initiator: a receiver of I/O operations, analogous to a server in a
client/server system. The target for I/O is the provider of (network) storage; a SAN disk array is a
traditional target. ION Data Accelerator, by comparison, is an all-flash storage target.
Volume – A logical construct identifying a unit of data storage. A volume is allocated to allow for
expandability within the space constraints of a pool. For ION Data Accelerator, a volume is not
necessarily directly linked to a physical device.
Appendix A: Tuning Checklist
________________________________________________________________________
The following is a complete checklist of the tuning steps described in this document, for use as a
quick reference:
1. Check initiator HBA slot locations.
2. Check ION storage profile.
3. Verify that a sufficient number of ION volumes are used.
4. Verify that a sufficient number of LUN paths are used.
5. Verify that LUN paths are distributed so all fabric resources are balanced.
6. Verify that all LUNs for each volume are presented only to HBAs within one NUMA node.
7. Update the BIOS and verify that NUMA distances are detected properly.
8. Set the BIOS power profile to Maximum Performance.
9. Verify that cstates are disabled in the BIOS.
10. Enable Hyperthreading in the BIOS settings.
11. Disable virtualization and VT-d in the BIOS if not needed.
12. Check the addressing mode in the BIOS.
13. Disable x2APIC in the BIOS.
14. Verify that the multipath path_selector is queue-length.
15. Disable processor cstates with boot parameters.
16. Install the iontuner RPM (tunes block devices with udev rules, disables the cpuspeed
daemon, disables the irqbalance daemon, and pins IRQs).
17. Use fio or Orion commands generated by iontuner when testing baseline performance.
18. Configure HugePages for Oracle.
19. Configure sysctl parameters for Oracle.
20. Configure Oracle initialization parameters, including _enable_NUMA_support and
use_large_pages.
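For reference, checklist items 14, 15, and 18 typically reduce to configuration fragments like the following; the values shown are common starting points rather than settings verified on this test bed, and the HugePages count must be sized to your SGA.

```
# /etc/multipath.conf (item 14): select the queue-length path selector
defaults {
    path_selector    "queue-length 0"
}

# Kernel boot parameters (item 15): disable processor cstates
#   intel_idle.max_cstate=0 processor.max_cstate=1

# /etc/sysctl.conf (item 18): reserve 2MB HugePages for the Oracle SGA
#   vm.nr_hugepages = <SGA size in MB / 2>
```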
Appendix B: Speeding up Oracle Database
Performance with ioMemory – an HP Session
________________________________________________________________________
This appendix is adapted from a session presented at the HP ExpertOne Technology & Solutions
Summit, Dec. 2012 in Frankfurt, Germany.
ARCHITECTURE OVERVIEW
The diagram below shows the basic topology for shared NAND flash storage using the ION Data
Accelerator connected to database servers.
I/O bottlenecks in a shared storage system can be removed by strategically placing transaction logs,
the TempDB, hot (frequently accessed) tables, or the entire database on ioMemory in the ION Data
Accelerator.
ABOUT ION DATA ACCELERATOR
An ION Data Accelerator system consists of the following basic components:
• ION Data Accelerator Software – runs as a GUI or CLI, transforming tier 1 servers into an open
shared flash resource. Up to 20x performance improvement has been achieved, compared to
traditional disk-based shared storage systems.
• Fusion ioMemory – proven, tested, reliable, and fast, with thousands of satisfied customers
worldwide.
• Open System Platforms – ION Data Accelerator software runs on a variety of tier 1 servers,
providing industry-leading performance, reliability, and capacity. Hundreds of thousands of these
servers are deployed in enterprises today.
Supported network protocols include Fibre Channel, SRP/InfiniBand, and iSCSI.
ION Data Accelerator Software
The ION Data Accelerator software running on the host server
• Is optimized for ioMemory
• Works on industry-standard servers
• Supports JBOD, RAID 0, and RAID 10 modes (including spare drives)
• Provides GUI, CLI, SMIS, and SNMP access
• Is easy to configure
• Enables software-defined storage
Fusion-Powered Storage Stack
The following diagram shows the elements of a Fusion-powered software/hardware stack, from top
to bottom:
• Application – your application
• ION Software – transforms the server into a storage target
• VSL – Virtual Storage Layer, a purpose-built flash access layer
• ioMemory – fast, reliable, cost-effective flash memory in a PCIe form factor
• Server – a tier 1 server
Why ION Data Accelerator?
ION Data Accelerator provides the following advantages:
• It is a highly efficient shared storage target.
• With its low latency, high IOPS, and high bandwidth, it can accelerate writes and reads in a
variety of environments, including SAP, SQL, Navision, Oracle, VMware, and others.
• It outperforms even cache hits from storage array vendors.
Because of the increased performance that ION Data Accelerator achieves, customers can:
• Support more concurrent users
• Lower response times
• Run queries and reports faster
• Finish batch jobs in less time
• Increase application stability
ABOUT ION DATA ACCELERATOR HA (HIGH AVAILABILITY)
ION Data Accelerator enables a powerful and effective HA (High Availability) environment for your
shared storage, when HA licensing is enabled.
The diagram below shows basic LUN access (exported volumes) in an HA configuration.
PERFORMANCE TEST RESULTS: HP DL380 / HP DL980
The following charts show performance results for an HP DL380 target running ION Data
Accelerator, with an HP DL980 initiator.
OVERVIEW OF THE ION DATA ACCELERATOR GUI
Summary Screen:
Creating a Storage Profile for the storage pool:
Creating volumes from the storage pool:
Setting up an initiator group (LUN masking) to access volumes:
Managing initiators:
Editing initiator access:
Managing volumes:
COMPARATIVE SOLUTIONS
The diagram below shows a winning solution for ION Data Accelerator and Oracle, compared with
rival EMC. The database server is an HP DL980 running Red Hat 6, with 64 or 80 Intel E7 cores,
1 TB of memory, and a 700 GB Oracle SGA. A 3PAR T400 array holds the TempDB and other
applications and tablespaces, while redo logs and hot tables are placed on HP IO Accelerators.
The table below illustrates the competitive advantages of ION Data Accelerator:

Comparison Point | ION | Note
Open Systems Server Foundation | ✔ | Fusion-io relies on time-tested open systems server hardware, while competitors are proprietary
Fusion-io Adaptive Flashback vs. Competitor RAID | ✔ | VSL with Adaptive Flashback provides two orders of magnitude better media error rates
ION RAID vs. Competition | ✔ | ION provides more flexibility with JBOD, RAID 0, and RAID 10 vs. one static configuration option
Street Price ($/GB) | ✔ | Fusion-io delivers a solution estimated to be at least 30% lower cost/GB
Price/IOPS | ✔ | Fusion-io is the clear winner
Power | ✔ | Fusion-io draws less power
BEST PRACTICES
The following best practices are important to follow in order to achieve top performance for Oracle
testing.
• Present 16 to 32 LUNs to the host for maximum performance.
• Use the noop scheduler.
• Use round robin for multipath.conf.
• When using a DL980 as a load generator, make sure you pin the I/O-issuing processes. Which
nodes the processes are pinned to matters less than that they are pinned at all.
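As an illustration of the scheduler and pinning practices above, the fragment below sketches a udev rule and a pinned invocation; the vendor match string and node numbers are placeholders to adapt for your LUNs.

```
# /etc/udev/rules.d/99-ion.rules (hypothetical match): set the noop
# scheduler on ION-backed block devices as they appear
ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_VENDOR}=="FUSIONIO", ATTR{queue/scheduler}="noop"

# Pin an I/O-issuing process to one NUMA node with numactl (node 0 here):
#   numactl --cpunodebind=0 --membind=0 <load generator command>
```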
The maximum performance configuration shown below achieved about 700K IOPS.
(Diagram: the DL980's four CPUs drive four HBAs across two IOH hubs, connected through a switch
to two HBAs on the ION Data Accelerator.)
BENCHMARK TEST CONFIGURATION
Below is a proof-of-concept configuration that can be extended in any direction:
A single server can achieve 600K IOPS at a 4KB block size.
Below are system configurations for the storage server (ION Data Accelerator appliance) and the
database server.
Storage Server
• DL 380p Gen8, 2 socket, 2.9GHz
• 4 x 2.4TB HP IO Accelerator
• 2 x dual-port 8Gbit Fibre Channel
Database Server
• DL980 G7 8s /80c, 1TB RAM
• 4 x dual-port 8Gbit Fibre Channel
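The 600K IOPS figure above translates into bandwidth by simple arithmetic:

```shell
# Bandwidth implied by 600,000 IOPS at a 4KB block size.
iops=600000
bs_kb=4
mbs=$(( iops * bs_kb / 1024 ))
echo "${mbs} MB/s"   # about 2343 MB/s, i.e. roughly 2.3 GB/s
```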
RAW PERFORMANCE TEST RESULTS WITH FIO
Total IOPS
(Chart: IOPS as a function of queue depth, 1–128, and number of jobs, 1–64; the IOPS axis ranges
from 0 to 700,000.)
ION Data Accelerator with RAID 0, 2 RAIDSETS, 32 LUNs at 4KB block size, 100% read
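A fio job approximating the test in this chart might look like the sketch below; the device path is a placeholder, and iodepth and numjobs would be swept (1–128 and 1–64, respectively) to reproduce the surface.

```ini
; Hypothetical fio job for the 4KB, 100% random read test
[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=4
numjobs=32
group_reporting

[ion-luns]
filename=/dev/mapper/mpatha
```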
Average Completion Latency (Microseconds)
(Chart: latency in µs, 0–1,000, and IOPS, 0–700,000, plotted against number of jobs, 1–128.)
ION Data Accelerator with RAID 0, 2 RAIDSETS, 32 LUNs at 4KB block size, 100% read, Qdepth = 4
Raw I/O Test: 70% Read, 30% Write
ION Data Accelerator with RAID 0, 2 RAIDSETS, 16 LUNs at 4KB block size
Raw I/O Test: 100% Read at 8KB
(Chart: IOPS as a function of queue depth, 1–128, and number of jobs, 1–64; IOPS range from 0 to
500,000.)
ION Data Accelerator with RAID 0, 2 RAIDSETS, 32 LUNs at 8KB block size
Raw I/O Test: Read Latency (Microseconds)
(Chart: read latency in µs, 0–18,000, as a function of queue depth, 1–128, and number of jobs,
1–64.)
ION Data Accelerator with RAID 0, 2 RAIDSETS, 32 LUNs at 8KB block size
ORACLE WORKLOAD TESTS
The following configuration was used for Oracle workload testing:
Database
• 1TB of data
• Tables ranging from a million to a billion rows
Data Access Pattern
• Sequential write
• Data load (bulk load, real-time)
• Full table scan
• Select data via index
• Update data via index
(Charts: MB/sec and IOPS plotted against the number of processes; random read bandwidth reached
approximately 2.2 GB/sec.)
(Charts: IOPS vs. number of processes; writes reached up to 2.5 GB/sec and redo log traffic up to
300 MB/sec, with a maximum CPU load of 21%.)
Load generator: hammerora, from http://hammerora.sourceforge.net, against a 1 TB database with
80 users and a 10ms transaction delay. CPU load was 33%, with almost no I/O wait.

Weitere ähnliche Inhalte

Was ist angesagt?

Aspen polymersunitopsv8 2-usr
Aspen polymersunitopsv8 2-usrAspen polymersunitopsv8 2-usr
Aspen polymersunitopsv8 2-usrg_bumbac
 
E2027 a8v e-se
E2027 a8v e-seE2027 a8v e-se
E2027 a8v e-seyyc73
 
Dell Latitude 15 5000 Series (E5570) Service Manual PDF (English) / User ...
    Dell Latitude 15 5000 Series (E5570) Service Manual PDF (English) / User ...    Dell Latitude 15 5000 Series (E5570) Service Manual PDF (English) / User ...
Dell Latitude 15 5000 Series (E5570) Service Manual PDF (English) / User ...manualsheet
 
Dell latitude e5570 Service Manual PDF (English) / User Guide
Dell latitude e5570 Service Manual PDF (English) / User GuideDell latitude e5570 Service Manual PDF (English) / User Guide
Dell latitude e5570 Service Manual PDF (English) / User Guidemanualsheet
 
G31 m s motherboard pc
G31 m s motherboard pcG31 m s motherboard pc
G31 m s motherboard pceddyhuezo
 
G41 m vs2
G41 m vs2G41 m vs2
G41 m vs2ruisuv
 

Was ist angesagt? (13)

2dmanual v5.0
2dmanual v5.02dmanual v5.0
2dmanual v5.0
 
Aspen polymersunitopsv8 2-usr
Aspen polymersunitopsv8 2-usrAspen polymersunitopsv8 2-usr
Aspen polymersunitopsv8 2-usr
 
E2027 a8v e-se
E2027 a8v e-seE2027 a8v e-se
E2027 a8v e-se
 
Dell Latitude 15 5000 Series (E5570) Service Manual PDF (English) / User ...
    Dell Latitude 15 5000 Series (E5570) Service Manual PDF (English) / User ...    Dell Latitude 15 5000 Series (E5570) Service Manual PDF (English) / User ...
Dell Latitude 15 5000 Series (E5570) Service Manual PDF (English) / User ...
 
Dell latitude e5570 Service Manual PDF (English) / User Guide
Dell latitude e5570 Service Manual PDF (English) / User GuideDell latitude e5570 Service Manual PDF (English) / User Guide
Dell latitude e5570 Service Manual PDF (English) / User Guide
 
test
testtest
test
 
G31 m s r2.0
G31 m s r2.0G31 m s r2.0
G31 m s r2.0
 
Recovery oracle
Recovery oracleRecovery oracle
Recovery oracle
 
G31 m s motherboard pc
G31 m s motherboard pcG31 m s motherboard pc
G31 m s motherboard pc
 
9650 0801-01-sf a
9650 0801-01-sf a9650 0801-01-sf a
9650 0801-01-sf a
 
H61 m vs r2.0
H61 m vs r2.0H61 m vs r2.0
H61 m vs r2.0
 
G41 m vs2
G41 m vs2G41 m vs2
G41 m vs2
 
R44 2960
R44 2960R44 2960
R44 2960
 

Ähnlich wie ION performance brief hp dl980-8b

Web logic installation document
Web logic installation documentWeb logic installation document
Web logic installation documentTaoqir Hassan
 
oracle finance E13422
oracle finance E13422oracle finance E13422
oracle finance E13422Vijay Kumar
 
Cell management (e ran3.0 05)
Cell management (e ran3.0 05)Cell management (e ran3.0 05)
Cell management (e ran3.0 05)Danilo Silvestri
 
HP Micro Server remote access card user manual
HP Micro Server remote access card user manualHP Micro Server remote access card user manual
HP Micro Server remote access card user manualMark Rosenau
 
P6 analytics install_and_config_guide
P6 analytics install_and_config_guideP6 analytics install_and_config_guide
P6 analytics install_and_config_guidevishaalkumar11
 
Whats New In Change Auditor - 5.5
Whats New In Change Auditor - 5.5Whats New In Change Auditor - 5.5
Whats New In Change Auditor - 5.5Curtis Brenneman
 
Whats New In Change Auditor - 5.5
Whats New In Change Auditor - 5.5Whats New In Change Auditor - 5.5
Whats New In Change Auditor - 5.5Curtis Brenneman
 
Fortios v5.0-patch-release-7-release-notes
Fortios v5.0-patch-release-7-release-notesFortios v5.0-patch-release-7-release-notes
Fortios v5.0-patch-release-7-release-notesvenkadesh Prasath
 
2.oracle purchasing
2.oracle purchasing2.oracle purchasing
2.oracle purchasingTamir Taha
 
Microsoft OCSP LUNA SA PCI Integration Guide
Microsoft OCSP LUNA SA PCI Integration GuideMicrosoft OCSP LUNA SA PCI Integration Guide
Microsoft OCSP LUNA SA PCI Integration GuideChris x-MS
 
GoldenGate Fundamentals Student Guide Version 10.4
GoldenGate Fundamentals Student Guide Version 10.4 GoldenGate Fundamentals Student Guide Version 10.4
GoldenGate Fundamentals Student Guide Version 10.4 voyna
 
Installing and conf guide for hp sm connector
Installing and conf guide for hp sm connectorInstalling and conf guide for hp sm connector
Installing and conf guide for hp sm connectorTheEnferRimbaud
 
Fscm91sbil b1109
Fscm91sbil b1109Fscm91sbil b1109
Fscm91sbil b1109shivram2511
 
Mantle programming-guide-and-api-reference
Mantle programming-guide-and-api-referenceMantle programming-guide-and-api-reference
Mantle programming-guide-and-api-referencemistercteam
 
HPE Matrix Operating Environment 7.5 Recovery Management User Guide
HPE Matrix Operating Environment 7.5 Recovery Management User GuideHPE Matrix Operating Environment 7.5 Recovery Management User Guide
HPE Matrix Operating Environment 7.5 Recovery Management User GuideVictor Rocha
 
Getting Started on PeopleSoft InstallationJuly 2014.docx
Getting Started on PeopleSoft InstallationJuly 2014.docxGetting Started on PeopleSoft InstallationJuly 2014.docx
Getting Started on PeopleSoft InstallationJuly 2014.docxgilbertkpeters11344
 

Ähnlich wie ION performance brief hp dl980-8b (20)

Web logic installation document
Web logic installation documentWeb logic installation document
Web logic installation document
 
R12 gop
R12 gopR12 gop
R12 gop
 
oracle finance E13422
oracle finance E13422oracle finance E13422
oracle finance E13422
 
Cell management (e ran3.0 05)
Cell management (e ran3.0 05)Cell management (e ran3.0 05)
Cell management (e ran3.0 05)
 
HP Micro Server remote access card user manual
HP Micro Server remote access card user manualHP Micro Server remote access card user manual
HP Micro Server remote access card user manual
 
P6 analytics install_and_config_guide
P6 analytics install_and_config_guideP6 analytics install_and_config_guide
P6 analytics install_and_config_guide
 
Whats New In Change Auditor - 5.5
Whats New In Change Auditor - 5.5Whats New In Change Auditor - 5.5
Whats New In Change Auditor - 5.5
 
Whats New In Change Auditor - 5.5
Whats New In Change Auditor - 5.5Whats New In Change Auditor - 5.5
Whats New In Change Auditor - 5.5
 
Fortios v5.0-patch-release-7-release-notes
Fortios v5.0-patch-release-7-release-notesFortios v5.0-patch-release-7-release-notes
Fortios v5.0-patch-release-7-release-notes
 
2.oracle purchasing
2.oracle purchasing2.oracle purchasing
2.oracle purchasing
 
Microsoft OCSP LUNA SA PCI Integration Guide
Microsoft OCSP LUNA SA PCI Integration GuideMicrosoft OCSP LUNA SA PCI Integration Guide
Microsoft OCSP LUNA SA PCI Integration Guide
 
E29632
E29632E29632
E29632
 
GoldenGate Fundamentals Student Guide Version 10.4
GoldenGate Fundamentals Student Guide Version 10.4 GoldenGate Fundamentals Student Guide Version 10.4
GoldenGate Fundamentals Student Guide Version 10.4
 
Installing and conf guide for hp sm connector
Installing and conf guide for hp sm connectorInstalling and conf guide for hp sm connector
Installing and conf guide for hp sm connector
 
maxxforce-7-manual.pdf
maxxforce-7-manual.pdfmaxxforce-7-manual.pdf
maxxforce-7-manual.pdf
 
Fscm91sbil b1109
Fscm91sbil b1109Fscm91sbil b1109
Fscm91sbil b1109
 
Mantle programming-guide-and-api-reference
Mantle programming-guide-and-api-referenceMantle programming-guide-and-api-reference
Mantle programming-guide-and-api-reference
 
Fortigate ha-50
Fortigate ha-50Fortigate ha-50
Fortigate ha-50
 
HPE Matrix Operating Environment 7.5 Recovery Management User Guide
HPE Matrix Operating Environment 7.5 Recovery Management User GuideHPE Matrix Operating Environment 7.5 Recovery Management User Guide
HPE Matrix Operating Environment 7.5 Recovery Management User Guide
 
Getting Started on PeopleSoft InstallationJuly 2014.docx
Getting Started on PeopleSoft InstallationJuly 2014.docxGetting Started on PeopleSoft InstallationJuly 2014.docx
Getting Started on PeopleSoft InstallationJuly 2014.docx
 

Mehr von Louis liu

Tcpcopy benchmark
Tcpcopy benchmarkTcpcopy benchmark
Tcpcopy benchmarkLouis liu
 
JK Log-Center architect
JK Log-Center architectJK Log-Center architect
JK Log-Center architectLouis liu
 
JKDB BACKUP Introduction
JKDB BACKUP IntroductionJKDB BACKUP Introduction
JKDB BACKUP IntroductionLouis liu
 
Infiniflash benchmark
Infiniflash benchmarkInfiniflash benchmark
Infiniflash benchmarkLouis liu
 
MySQL Tokudb engine benchmark
MySQL Tokudb engine benchmarkMySQL Tokudb engine benchmark
MySQL Tokudb engine benchmarkLouis liu
 
Nvmfs benchmark
Nvmfs benchmarkNvmfs benchmark
Nvmfs benchmarkLouis liu
 
MySQL 5.7 milestone
MySQL 5.7 milestoneMySQL 5.7 milestone
MySQL 5.7 milestoneLouis liu
 
MySQL Oslayer performace optimization
MySQL  Oslayer performace optimizationMySQL  Oslayer performace optimization
MySQL Oslayer performace optimizationLouis liu
 
MySQL async message subscription platform
MySQL async message subscription platformMySQL async message subscription platform
MySQL async message subscription platformLouis liu
 
HBASE Performane Test
HBASE Performane TestHBASE Performane Test
HBASE Performane TestLouis liu
 
Jkcn MySQLDB 架构
Jkcn MySQLDB 架构Jkcn MySQLDB 架构
Jkcn MySQLDB 架构Louis liu
 
基于Mongodb的压力评测工具 ycsb的一些概括
基于Mongodb的压力评测工具 ycsb的一些概括基于Mongodb的压力评测工具 ycsb的一些概括
基于Mongodb的压力评测工具 ycsb的一些概括Louis liu
 
My sql fabric ha and sharding solutions
My sql fabric ha and sharding solutionsMy sql fabric ha and sharding solutions
My sql fabric ha and sharding solutionsLouis liu
 
NetApp ef540 SSD Storage Test
NetApp ef540 SSD Storage TestNetApp ef540 SSD Storage Test
NetApp ef540 SSD Storage TestLouis liu
 
Exadata best practice on E-commerce area
Exadata best practice on E-commerce area Exadata best practice on E-commerce area
Exadata best practice on E-commerce area Louis liu
 
MySQL 5.5&5.6 new features summary
MySQL 5.5&5.6 new features summaryMySQL 5.5&5.6 new features summary
MySQL 5.5&5.6 new features summaryLouis liu
 
Ssd gc review
Ssd gc reviewSsd gc review
Ssd gc reviewLouis liu
 
1号店数据库架构
1号店数据库架构1号店数据库架构
1号店数据库架构Louis liu
 
Architecture of YHD
Architecture of YHDArchitecture of YHD
Architecture of YHDLouis liu
 

Mehr von Louis liu (20)

Tcpcopy benchmark
Tcpcopy benchmarkTcpcopy benchmark
Tcpcopy benchmark
 
JK Log-Center architect
JK Log-Center architectJK Log-Center architect
JK Log-Center architect
 
Wdt Test
Wdt TestWdt Test
Wdt Test
 
JKDB BACKUP Introduction
JKDB BACKUP IntroductionJKDB BACKUP Introduction
JKDB BACKUP Introduction
 
Infiniflash benchmark
Infiniflash benchmarkInfiniflash benchmark
Infiniflash benchmark
 
MySQL Tokudb engine benchmark
MySQL Tokudb engine benchmarkMySQL Tokudb engine benchmark
MySQL Tokudb engine benchmark
 
Nvmfs benchmark
Nvmfs benchmarkNvmfs benchmark
Nvmfs benchmark
 
MySQL 5.7 milestone
MySQL 5.7 milestoneMySQL 5.7 milestone
MySQL 5.7 milestone
 
MySQL Oslayer performace optimization
MySQL  Oslayer performace optimizationMySQL  Oslayer performace optimization
MySQL Oslayer performace optimization
 
MySQL async message subscription platform
MySQL async message subscription platformMySQL async message subscription platform
MySQL async message subscription platform
 
HBASE Performane Test
HBASE Performane TestHBASE Performane Test
HBASE Performane Test
 
Jkcn MySQLDB 架构
Jkcn MySQLDB 架构Jkcn MySQLDB 架构
Jkcn MySQLDB 架构
 
基于Mongodb的压力评测工具 ycsb的一些概括
基于Mongodb的压力评测工具 ycsb的一些概括基于Mongodb的压力评测工具 ycsb的一些概括
基于Mongodb的压力评测工具 ycsb的一些概括
 
My sql fabric ha and sharding solutions
My sql fabric ha and sharding solutionsMy sql fabric ha and sharding solutions
My sql fabric ha and sharding solutions
 
NetApp ef540 SSD Storage Test
NetApp ef540 SSD Storage TestNetApp ef540 SSD Storage Test
NetApp ef540 SSD Storage Test
 
Exadata best practice on E-commerce area
Exadata best practice on E-commerce area Exadata best practice on E-commerce area
Exadata best practice on E-commerce area
 
MySQL 5.5&5.6 new features summary
MySQL 5.5&5.6 new features summaryMySQL 5.5&5.6 new features summary
MySQL 5.5&5.6 new features summary
 
Ssd gc review
Ssd gc reviewSsd gc review
Ssd gc review
 
1号店数据库架构
1号店数据库架构1号店数据库架构
1号店数据库架构
 
Architecture of YHD
Architecture of YHDArchitecture of YHD
Architecture of YHD
 

Kürzlich hochgeladen

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Kürzlich hochgeladen (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Partners Life - Insurer Innovation Award 2024

ION performance brief hp dl980-8b

  • 1. Performance Brief for the HP DL980 (Database Server) and DL380 (ION Data Accelerator™) 4.24.2013
  • 2. Copyright Notice The information contained in this document is subject to change without notice. Fusion-io MAKES NO WARRANTY OF ANY KIND WITH REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Except to correct same after receipt of reasonable notice, Fusion-io shall not be liable for errors contained herein or for incidental and/or consequential damages in connection with the furnishing, performance, or use of this material. The information contained in this document is protected by copyright. © 2013, Fusion-io, Inc. All rights reserved. Fusion-io, the Fusion-io logo and ioDrive are registered trademarks of Fusion-io in the United States and other countries. The names of other organizations and products referenced herein are the trademarks or service marks (as applicable) of their respective owners. Unless otherwise stated herein, no association with any other organization or product referenced herein is intended or should be inferred. Fusion-io: 2855 E. Cottonwood Parkway, Box 100 Salt Lake City, UT 84121 USA (801) 424-5500
  • 3. CONTENTS
    Introduction ..... 1
      HARDWARE ..... 2
        ION Data Accelerator System ..... 2
        Initiator System ..... 2
    Storage Configuration ..... 3
      INITIATOR HBA PLACEMENT ..... 3
      ION DATA ACCELERATOR STORAGE POOL CONFIGURATION ..... 5
      ION VOLUME CONFIGURATION ..... 5
      ION LUN CONFIGURATION ..... 6
      MULTIPATH VERIFICATION ..... 8
    Initiator BIOS Tuning ..... 11
      UPDATING THE BIOS FOR NUMA DETECTION ..... 12
      POWER MANAGEMENT OPTIONS ..... 12
      SYSTEM OPTIONS ..... 14
      ADVANCED OPTIONS ..... 16
        Setting the Addressing Mode ..... 16
        Disabling x2APIC ..... 17
    Initiator Tuning on Linux ..... 18
      MULTIPATHING ..... 18
      DISABLING PROCESSOR C-STATES IN LINUX ..... 18
      IONTUNER RPM ..... 19
        Block Device Tuning with udev Rules ..... 20
        Disabling the cpuspeed Daemon ..... 21
        Pinning interrupts ..... 21
      VERIFYING THREAD PINNING ..... 22
    Oracle Tuning ..... 25
      HUGEPAGES ..... 25
      SYSCTL PARAMETERS ..... 25
      ORACLE INITIALIZATION PARAMETERS ..... 26
    fio Performance Testing ..... 27
      PRECONDITIONING FLASH STORAGE ..... 27
      TESTING THREAD CPU AFFINITY ..... 27
  • 4.   TEST COMMANDS ..... 27
      RESULTS ..... 30
      SEQUENTIAL R/W THROUGHPUT AND IOPS ..... 31
      RANDOM MIX R/W IOPS ..... 32
      RANDOM MIX R/W THROUGHPUT ..... 32
    Oracle Performance Testing ..... 34
      TEST SETUP ..... 34
      TEST COMMANDS ..... 36
      RESULTS ..... 37
    Oracle Database Testing ..... 38
      READ WORKLOAD TEST – QUEST BENCHMARK FACTORY ..... 38
      OLTP WORKLOAD TEST – HEAVY INSERT SCRIPT ..... 43
      TRANSACTIONS TEST – SWINGBENCH ..... 47
    Conclusions ..... 48
    Glossary ..... 49
    Appendix A: Tuning Checklist ..... 50
    Appendix B: Speeding up Oracle Database Performance with ioMemory – an HP Session ..... 52
      ARCHITECTURE OVERVIEW ..... 52
      ABOUT ION DATA ACCELERATOR ..... 53
        ION Data Accelerator Software ..... 53
        Fusion-Powered Storage Stack ..... 53
        Why ION Data Accelerator? ..... 54
      ABOUT ION DATA ACCELERATOR HA (HIGH AVAILABILITY) ..... 54
      PERFORMANCE TEST RESULTS: HP DL380 / HP DL980 ..... 55
      OVERVIEW OF THE ION DATA ACCELERATOR GUI ..... 57
      COMPARATIVE SOLUTIONS ..... 60
      BEST PRACTICES ..... 61
      BENCHMARK TEST CONFIGURATION ..... 62
      RAW PERFORMANCE TEST RESULTS WITH FIO ..... 63
        Total IOPS ..... 63
        Average Completion Latency (Microseconds) ..... 64
        Raw I/O Test: 70% Read, 30% Write ..... 64
        Raw I/O Test: 100% Read at 8KB ..... 65
        Raw I/O Test: Read Latency (Microseconds) ..... 65
      ORACLE WORKLOAD TESTS ..... 66
  • 5. Introduction
________________________________________________________________________
This document describes methods used to maximize performance for Oracle Database Server running on an HP DL980 and for ION Data Accelerator running on an HP DL380. These methods should provide a foundation for tuning methods with a variety of tests and customer applications.
The non-uniform memory access (NUMA) architecture of the DL980 presents challenges in minimizing data transfers between multiple processor nodes, while efficiently distributing I/O processing across available resources. Without any tuning, a configuration capable of as much as 700,000 IOPS may instead achieve no more than 160,000 IOPS. Likewise, a system capable of bandwidths of up to 7 GB/s may be limited to 3.5 GB/s. Testing performed with an un-tuned initiator may reflect poorly on ION Data Accelerator performance, when in reality the ION Data Accelerator software is not the problem.
The goals of this document are to
• Provide an example of what is possible with a specific configuration.
• Provide the tools necessary to improve performance on a variety of DL980 configurations, or with other initiator servers used with ION Data Accelerator.
Depending on the ioDrives and HBAs used, as well as fabric connectivity, you may need to vary the tuning described in this document. A script has been provided to perform the most complex tuning operations, but the steps performed by the script are fully described so you can adapt them for a variety of configurations.
These tuning methods were originally used to maximize performance at HP European Performance Center in Böblingen. A similar configuration was recreated at Fusion-io in San Jose, and the performance results described in this document are the results of that testing. Though there were minor variations between the two configurations, similar performance was achieved.
For more details on the features and functionality of ION Data Accelerator, refer to the ION Data Accelerator User Guide. 1
  • 6. HARDWARE
This section describes the hardware components used in the performance testing of the ION Data Accelerator appliance with its initiator.

ION Data Accelerator System
• DL380p Gen8 server
• 2 x Intel Xeon E5-2640 CPUs (6 cores each, 2.5 GHz)
• 64GB RAM
• 3 x 2.41TB ioDrive2 Duos
• 1 x QLogic 8Gbit Fibre Channel quad-port HBA
• 2 x QLogic 8Gbit Fibre Channel dual-port HBAs
• ION Data Accelerator 2.0.0 build 349 (VSL 3.2.3 build 950)

Initiator System
• HP DL980 Gen7 server
• 8 x Intel Xeon E7-4870 CPUs (10 cores each, 2.4 GHz)
• 256 GB RAM
• 3 x Emulex 8 Gbit Fibre Channel dual-port HBAs
• 1 x QLogic 8 Gbit Fibre Channel dual-port HBA
• Red Hat Enterprise Linux 6.3
• Oracle Database 11g Enterprise Edition 64-bit Release 11.2.0.3.0 with ASM
  • 7. Storage Configuration
________________________________________________________________________
INITIATOR HBA PLACEMENT
The NUMA architecture of the DL980 must be considered when choosing where to place HBAs. PCIe slots 7, 8, 9, 10, and 11 are attached to the I/O hub nearest to CPU sockets 0 and 1. PCIe slots 1, 2, 3, 4, 5, and 6 are attached to the I/O hub nearest to CPU sockets 2 and 3. PCIe slots 12, 13, 14, 15, and 16 are attached to the I/O hub nearest to CPU sockets 4 and 5.
In the configurations used at HP Böblingen and Fusion-io San Jose, two HBAs were placed in slots from 1 through 6, and two HBAs were placed in slots from 7 through 11. In that configuration, I/O
  • 8. traffic is split between two I/O hubs. By using multiple I/O hubs, more CPU cores can access data from the HBAs at a low cost, but there is a risk of transferring data between I/O hubs, which may cause poor performance. It is important to configure volume access such that no single volume is accessed from multiple I/O hubs. Note that even though a PCIe slot may be equidistant from two nodes, there is still less latency between cores within a node than between CPU cores on separate nodes attached to the same I/O hub.
Although the diagram above shows slots 12 through 16 attached to CPU sockets 6 and 7, other documentation from HP suggests that these slots are attached to nodes 4 and 5. If using the expansion slots, it is best to manually check the location of the PCIe slots.
You can use lspci to find the bus addresses of HBAs in the system:

# lspci | grep "Fibre Channel"
0b:00.0 Fibre Channel: ...
0b:00.1 Fibre Channel: ...
11:00.0 Fibre Channel: ...
11:00.1 Fibre Channel: ...
54:00.0 Fibre Channel: ...
54:00.1 Fibre Channel: ...
60:00.0 Fibre Channel: ...
60:00.1 Fibre Channel: ...

You can also use dmidecode to determine the PCI slot associated with each bus address:

# dmidecode -t slot
...
Handle 0x0908, DMI type 9, 17 bytes
System Slot Information
        Designation: PCI-E Slot 9
        Type: x8 PCI Express 2 x16
        Current Usage: In Use
        Length: Long
        ID: 9
        Characteristics:
                3.3 V is provided
                PME signal is supported
        Bus Address: 0000:0b:00.0
...
Handle 0x090A, DMI type 9, 17 bytes
System Slot Information
        Designation: PCI-E Slot 11
        Type: x8 PCI Express 2 x16
        Current Usage: In Use
        Length: Long
  • 9.   ID: 11
        Characteristics:
                3.3 V is provided
                PME signal is supported
        Bus Address: 0000:11:00.0
...
Handle 0x0901, DMI type 9, 17 bytes
System Slot Information
        Designation: PCI-E Slot 2
        Type: x8 PCI Express 2 x16
        Current Usage: In Use
        Length: Long
        ID: 2
        Characteristics:
                3.3 V is provided
                PME signal is supported
        Bus Address: 0000:54:00.0
...
Handle 0x0905, DMI type 9, 17 bytes
System Slot Information
        Designation: PCI-E Slot 6
        Type: x8 PCI Express 2 x16
        Current Usage: In Use
        Length: Long
        ID: 6
        Characteristics:
                3.3 V is provided
                PME signal is supported
        Bus Address: 0000:60:00.0

ION DATA ACCELERATOR STORAGE POOL CONFIGURATION
A RAID 0 set was created using all three ioDrive2 Duo cards present in the ION Data Accelerator system. This was done by using the following CLI command to create a storage profile for maximum performance:

admin@/> profile:create max_performance

ION VOLUME CONFIGURATION
Eight volumes of equal size were created from the storage pool, using the following CLI commands:

admin@/> volume:create volume0 841 pool_md0
admin@/> volume:create volume1 841 pool_md0
  • 10. admin@/> volume:create volume2 841 pool_md0
admin@/> volume:create volume3 841 pool_md0
admin@/> volume:create volume4 841 pool_md0
admin@/> volume:create volume5 841 pool_md0
admin@/> volume:create volume6 841 pool_md0
admin@/> volume:create volume7 841 pool_md0

For ION Data Accelerator configurations with many ioDrives, it may be necessary to use 16 or more volumes to achieve maximum performance.

ION LUN CONFIGURATION
To provide sufficient performance as well as redundancy, LUN access should be provided through multiple ION Data Accelerator targets and multiple initiator cards. Additionally, because of the NUMA architecture characteristics of the DL980, it may be best to localize access for each volume to a single I/O hub. Volumes should be exposed so that traffic is distributed evenly across all ports. The diagram below shows the link configuration that was used at HP Böblingen.

Figure 1. Link configuration used at HP Böblingen

Four ports on the ION Data Accelerator system were connected to eight ports on the DL980 initiator, through a switch. On the initiator, two dual-port cards were placed in I/O hub 1 and in I/O hub 2. Exports were created on the four ports of the ION Data Accelerator to the four ports on each I/O hub of the initiator. Each volume was exported on two links:
  • 11. • Volume 0: t1 to i1, t4 to i4
• Volume 1: t2 to i2, t3 to i3
• Volume 2: t3 to i7, t2 to i6
• Volume 3: t1 to i5, t4 to i8
The same access pattern was repeated with every set of four subsequent volumes. Notice that access to each volume is localized to a single I/O hub on the initiator.
The diagram below shows the link configuration that was used at Fusion-io San Jose.

Figure 2. Link configuration used at Fusion-io San Jose

Because a switch was unavailable, eight ports on the ION Data Accelerator system were directly connected to eight ports on the initiator. Each volume was exported on two links:
• Volume 0: t1 to i1, t6 to i4
• Volume 1: t3 to i5, t8 to i8
• Volume 2: t2 to i2, t5 to i3
• Volume 3: t4 to i6, t7 to i7
The same access pattern was repeated with every set of four subsequent volumes. Notice that access to each volume is once again localized to a single I/O hub on the initiator.
The following CLI commands were used to create initiator groups and LUNs on the ION Data Accelerator system at Fusion-io San Jose:
  • 12. admin@/> inigroup:create i1 10:00:00:90:fa:14:a1:fc
admin@/> inigroup:create i2 10:00:00:90:fa:14:a1:fd
admin@/> inigroup:create i3 10:00:00:90:fa:14:f9:d4
admin@/> inigroup:create i4 10:00:00:90:fa:14:f9:d5
admin@/> inigroup:create i5 10:00:00:90:fa:1b:03:c8
admin@/> inigroup:create i6 10:00:00:90:fa:1b:03:c9
admin@/> inigroup:create i7 21:00:00:24:ff:46:bf:ca
admin@/> inigroup:create i8 21:00:00:24:ff:46:bf:cb
admin@/> lun:create -b 512 volume0 i1 21:00:00:24:ff:69:d3:4c
admin@/> lun:create -b 512 volume0 i6 21:00:00:24:ff:46:c0:b5
admin@/> lun:create -b 512 volume1 i3 21:00:00:24:ff:69:d3:4e
admin@/> lun:create -b 512 volume1 i8 21:00:00:24:ff:45:f4:ad
admin@/> lun:create -b 512 volume2 i2 21:00:00:24:ff:69:d3:4d
admin@/> lun:create -b 512 volume2 i5 21:00:00:24:ff:46:c0:b4
admin@/> lun:create -b 512 volume3 i4 21:00:00:24:ff:69:d3:4f
admin@/> lun:create -b 512 volume3 i7 21:00:00:24:ff:45:f4:ac
admin@/> lun:create -b 512 volume4 i1 21:00:00:24:ff:69:d3:4c
admin@/> lun:create -b 512 volume4 i6 21:00:00:24:ff:46:c0:b5
admin@/> lun:create -b 512 volume5 i3 21:00:00:24:ff:69:d3:4e
admin@/> lun:create -b 512 volume5 i8 21:00:00:24:ff:45:f4:ad
admin@/> lun:create -b 512 volume6 i2 21:00:00:24:ff:69:d3:4d
admin@/> lun:create -b 512 volume6 i5 21:00:00:24:ff:46:c0:b4
admin@/> lun:create -b 512 volume7 i4 21:00:00:24:ff:69:d3:4f
admin@/> lun:create -b 512 volume7 i7 21:00:00:24:ff:45:f4:ac

MULTIPATH VERIFICATION
When the steps above have been completed and dm-multipath has been started on the initiator, the multipath command may be used to verify the configuration.
# multipath -ll
mpathhes (23337613362643333) dm-2 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 1:0:0:0 sdd 8:48  active ready running
  `- 2:0:0:0 sdf 8:80  active ready running
mpathhez (23330633436333064) dm-7 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 4:0:0:1 sdk 8:160 active ready running
  `- 7:0:0:1 sdq 65:0  active ready running
  • 13.
mpathhey (23437373930653063) dm-4 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 0:0:0:1 sdc 8:32  active ready running
  `- 3:0:0:1 sdi 8:128 active ready running
mpathhex (26433343437616137) dm-8 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 5:0:0:1 sdm 8:192 active ready running
  `- 6:0:0:1 sdo 8:224 active ready running
mpathhew (23061313364323662) dm-5 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 1:0:0:1 sde 8:64  active ready running
  `- 2:0:0:1 sdg 8:96  active ready running
mpathhev (26432353466383337) dm-6 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 4:0:0:0 sdj 8:144 active ready running
  `- 7:0:0:0 sdp 8:240 active ready running
mpathheu (23637366232363564) dm-3 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 0:0:0:0 sdb 8:16  active ready running
  `- 3:0:0:0 sdh 8:112 active ready running
mpathhet (23632393433663839) dm-9 FUSIONIO,ION LUN
size=783G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 5:0:0:0 sdl 8:176 active ready running
  `- 6:0:0:0 sdn 8:208 active ready running

Notice that there are eight multipath devices, each comprised of two LUNs. Each path has a number associated with it, of the form <host>:0:0:<lun#>. The host numbers correspond to specific PCI device ports. A PCI device address can be correlated to a host number by looking in sysfs:

# ls -d /sys/bus/pci/devices/*/host*
/sys/bus/pci/devices/0000:11:00.0/host0
  • 14.
/sys/bus/pci/devices/0000:11:00.1/host1
/sys/bus/pci/devices/0000:0b:00.0/host2
/sys/bus/pci/devices/0000:0b:00.1/host3
/sys/bus/pci/devices/0000:54:00.0/host4
/sys/bus/pci/devices/0000:54:00.1/host5
/sys/bus/pci/devices/0000:60:00.0/host6
/sys/bus/pci/devices/0000:60:00.1/host7

For example, multipath device mpathhet has paths through hosts 5 and 6 (shown by the numbers 5:0:0:0 and 6:0:0:0), which correspond to devices 0000:54:00.1 and 0000:60:00.0. The output from the dmidecode command used in the Initiator HBA Placement section shows that this volume is exposed through HBAs in slots 2 and 6, which are both in the same I/O hub. It is important that each volume presented in multipath is accessed only through HBAs in the same I/O hub.
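The correlation above can also be scripted rather than read off by hand. The sketch below is not part of the original test procedure; it walks the same /sys/bus/pci/devices/<addr>/host<N> links and prints each SCSI host together with its PCI address and the NUMA node the kernel reports (the standard numa_node sysfs attribute), so a volume whose paths cross I/O hubs stands out at a glance. The sysfs root is a parameter only so the logic can be exercised without DL980 hardware:

```shell
#!/bin/bash
# Print "hostN -> PCI <addr> (node <n>)" for every SCSI host link in sysfs.
map_hosts() {
  local sysfs=${1:-/sys/bus/pci/devices}
  local d addr host node
  for d in "$sysfs"/*/host*; do
    [ -e "$d" ] || continue
    addr=$(basename "$(dirname "$d")")   # e.g. 0000:54:00.1
    host=$(basename "$d")                # e.g. host5
    node=$(cat "$sysfs/$addr/numa_node" 2>/dev/null || echo '?')
    echo "$host -> PCI $addr (node $node)"
  done
}

map_hosts
```

Both paths of each multipath device should resolve to hosts on the same node pair; a node mismatch within one mpath device indicates a LUN exported across I/O hubs.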
  • 15. Initiator BIOS Tuning
________________________________________________________________________
The following settings should be applied on the HP DL980 initiator, using the ROM-Based Setup Utility (RBSU) on boot. To enter the RBSU, press F9 during boot (when the F9 Setup option appears on the screen).
  • 16. UPDATING THE BIOS FOR NUMA DETECTION
In the DL980 BIOS version dated 05/01/2012, a change was made to the SLIT node distances. This may affect performance, so it is recommended that the latest version of the BIOS be used. Incorrect SLIT node distances are a common issue with early BIOS revisions on many platforms. The BIOS version can be determined from the main BIOS screen. Alternatively, numactl can be used to verify that the node distances match the table below:

# numactl --hardware
...
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  12  17  17  19  19  19  19
  1:  12  10  17  17  19  19  19  19
  2:  17  17  10  12  19  19  19  19
  3:  17  17  12  10  19  19  19  19
  4:  19  19  19  19  10  12  17  17
  5:  19  19  19  19  12  10  17  17
  6:  19  19  19  19  17  17  10  12
  7:  19  19  19  19  17  17  12  10

POWER MANAGEMENT OPTIONS
To enable maximum performance, disable the HP power management options.
1. Select Power Management Options > HP Power Profile > Maximum Performance.
  • 17. (BIOS screenshot)
  • 18. 2. Verify that C-states have been disabled by selecting Power Management Options > Advanced Power Management Options > Minimum Processor Idle Power Core State. “No C-states” should be highlighted in the menu.
C-states may also need to be disabled in Linux, as explained later in this document.

SYSTEM OPTIONS
Intel Hyperthreading may or may not be beneficial to ION Data Accelerator performance. In this test setup, Hyperthreading was enabled. Other system options were set as described below.
1. Enable hyperthreading by selecting System Options > Processor Options > Intel Hyperthreading Options > Enabled.
  • 19. 2. Disable Virtualization if it is not required, by selecting System Options > Processor Options > Intel Virtualization Technology > Disabled.
  • 20. 3. Disable VT-d (Virtualization Technology for Directed I/O) by selecting System Options > Processor Options > Intel VT-d > Disabled.

ADVANCED OPTIONS
Setting the Addressing Mode
The preferred addressing mode depends on the operating system and the amount of memory used. For all RHEL 5.x installations, use 40-bit addressing. For RHEL 6.x installations, use 40-bit addressing when 1TB or less memory is present; otherwise, 44-bit addressing must be used to take advantage of all available memory.
To disable 44-bit addressing, select Advanced Options > Advanced System ROM Options > Address Mode 44-bit > Disabled.
  • 21. For RHEL 6.x installations using greater than 1 TB of memory, use 44-bit addressing: Advanced Options > Advanced System ROM Options > Address Mode 44-bit > Enabled.
At HP Böblingen, the DL980 contained 1TB of memory, so 40-bit addressing was sufficient.

Disabling x2APIC
To verify that x2APIC is disabled, select Advanced Options > Advanced System ROM Options > x2APIC Options. The “Disabled” option should be highlighted; select it if it is not.
  • 22. Initiator Tuning on Linux
________________________________________________________________________
The following settings should be configured in Linux. In some cases, a reboot must be applied in order for changes to take effect.

MULTIPATHING
Typically, the preferred queuing technique is to send I/O to the path with the least number of I/Os currently queued. The following is an example of how the multipath.conf file can be configured, using a path_selector of “queue-length 0”:

device {
        vendor "FUSIONIO"
        product "*"
        path_selector "queue-length 0"
        rr_min_io_rq 1
        rr_weight uniform
        no_path_retry 20
        failback 60
        path_grouping_policy multibus
        path_checker tur
}

Another approach that may provide better results is setting path_selector to “round-robin”. The round-robin value uses fewer CPU cycles, but it does not correct for unbalanced performance characteristics of multiple paths, or any additional load from other devices that may be slowing down one of the paths.

DISABLING PROCESSOR C-STATES IN LINUX
For newer Linux kernels (2.6.32 or later) disabling CPU idle power states can boost performance.
  • 23. However, these must be disabled at boot time rather than in the BIOS. To disable CPU states, add the intel_idle.max_cstate=0 and processor.max_cstate=0 boot parameters to the /boot/grub/grub.conf file as follows:

title Red Hat Enterprise Linux (2.6.32-279.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/mapper/vg_rhel980-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_rhel980/lv_root rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=vg_rhel980/lv_swap KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet intel_idle.max_cstate=0 processor.max_cstate=0
        initrd /initramfs-2.6.32-279.el6.x86_64.img

One way to verify that the CPU states have been disabled entirely is to verify that the CPU state sysfs files do not exist:

# ls /sys/devices/system/cpu/cpu0/cpuidle
ls: cannot access /sys/devices/system/cpu/cpu0/cpuidle: No such file or directory

IONTUNER RPM
The tuning suggestions in this section can be performed in one step by installing the iontuner RPM. The RPM is made available on the Fusion-io internal network:
https://confluence.int.fusionio.com/display/ION/Documentation#Documentation-IONPerformanceBrief,HPDL980(INTERNAL-ONLY)
The RPM can be installed with the following command (the RPM version may be different):

# rpm -Uvh iontuner-0.0.2-1.noarch.rpm

If ION LUNs have already been detected by the initiator, a reboot or reload of device drivers may be necessary after the RPM install. This serves to complete the tuning that is performed upon device discovery. If in doubt about LUN discovery, reboot.
The tuning described in the following sub-sections is done by the iontuner RPM, and it does not need to be performed manually if the RPM has been installed. Detailed steps are provided here in order to completely describe the RPM function and to assist those who may need to adjust the steps for unsupported platforms.
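The boot parameters for disabling C-states can also be confirmed on the running system by inspecting the kernel command line. The helper below is an illustrative sketch, not part of the original procedure; the file path is a parameter only so the logic can be tested without rebooting:

```shell
#!/bin/bash
# Succeeds only when both idle-state parameters are on the kernel command line.
cstates_off() {
  local cmdline=${1:-/proc/cmdline}
  grep -q 'intel_idle\.max_cstate=0' "$cmdline" &&
    grep -q 'processor\.max_cstate=0' "$cmdline"
}

if cstates_off; then
  echo "C-states disabled at boot"
else
  echo "C-states still enabled"
fi
```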
  • 24. Block Device Tuning with udev Rules
The tuning in this section is performed by the iontuner RPM.
To improve I/O performance, you should tune the I/O scheduling queues on all devices in the data path. This includes both the individual SCSI devices (/dev/sd*) and the device-mapper devices (/dev/dm-*). Three settings changes have been proven to provide a performance benefit under some workloads:
1) Always use the noop I/O scheduler with ION Data Accelerator devices:

# echo noop > /sys/block/<device>/queue/scheduler

2) Use strict block-request affinity. This forces the handling of I/O completion to occur on the same CPU where the request was initiated.

# echo 2 > /sys/block/<device>/queue/rq_affinity

Strict block-request affinity is not available on RHEL 5, and on some kernels, group affinity will be used where strict affinity is not supported. After setting the file to ‘2’, a read of the file will return ‘1’ if only CPU group affinity is available.
3) To get more consistent performance results, disable entropy pool contribution:

# echo 0 > /sys/block/<device>/queue/add_random

The settings described above must be applied after multipath devices are configured and detected by the initiator, and they will not persist through a reboot. To make them persistent, Linux provides the udev rules mechanism, which allows for some sysfs parameters to be set upon device discovery, both at boot time and run time. The iontuner RPM installs the following rules in /etc/udev/rules.d/99-iontuner.rules:

ACTION=="add|change", SUBSYSTEM=="block", ATTR{device/vendor}=="FUSIONIO", ATTR{queue/scheduler}="noop", ATTR{queue/rq_affinity}="2", ATTR{queue/add_random}="0"
ACTION=="add|change", KERNEL=="dm-*", PROGRAM="/bin/bash -c 'cat /sys/block/$name/slaves/*/device/vendor | grep FUSIONIO'", ATTR{queue/scheduler}="noop", ATTR{queue/rq_affinity}="2", ATTR{queue/add_random}="0"

The first rule applies scheduler, rq_affinity, and add_random changes to all SCSI block devices (/dev/sd*) whose vendor is FUSIONIO.
The second rule applies the scheduler, rq_affinity, and add_random changes to all DM multipath devices (/dev/dm-*) that are created on top of block devices whose vendor is FUSIONIO.
Disabling the cpuspeed Daemon

The tuning in this section is performed by the iontuner RPM.

Disabling the cpuspeed daemon on Linux can boost overall performance. To stop the cpuspeed daemon immediately, run:

# service cpuspeed stop

To prevent the cpuspeed daemon from running after a reboot, run:

# chkconfig cpuspeed off

Pinning Interrupts

The tuning in this section is performed by the iontuner RPM.

To minimize data transfer and synchronization throughout the system, I/O interrupts should be handled on a socket close to the HBA's I/O hub. When manually configuring IRQs, the irqbalance daemon must first be disabled. To stop the irqbalance daemon immediately, run:

# service irqbalance stop

To prevent the irqbalance daemon from running after a reboot, run:

# chkconfig irqbalance off

IRQs should be pinned for each driver that handles interrupts for ION device I/O; typically, this is just the HBA driver. Driver IRQs can be identified in /proc/interrupts by matching the IRQ numbers to the driver prefix listed in the same row. The following table shows some common drivers and the prefix used to identify their IRQs:

Driver         Prefix
QLogic FC      qla
Brocade FC     bfa
Emulex FC      lpfc
Emulex iSCSI   beiscsi, eth

The iontuner RPM installs the iontuner service init script, which runs at boot time to distribute IRQs across the CPU cores local to each HBA's I/O hub. Below is an example of the commands issued at
startup:

echo 00000000,00000000,00000000,00000000,00000001 > /proc/irq/114/smp_affinity
echo 00000000,00000000,00000000,00000000,00000002 > /proc/irq/115/smp_affinity
echo 00000000,00000000,00000000,00000000,00000004 > /proc/irq/116/smp_affinity
echo 00000000,00000000,00000000,00000000,00000008 > /proc/irq/117/smp_affinity
echo 00000000,00000000,00000000,00000000,00000010 > /proc/irq/118/smp_affinity
echo 00000000,00000000,00000000,00000000,00000020 > /proc/irq/119/smp_affinity
echo 00000000,00000000,00000000,00000000,00000040 > /proc/irq/120/smp_affinity
echo 00000000,00000000,00000000,00000000,00000080 > /proc/irq/121/smp_affinity
echo 00000000,00000000,00000000,00000000,00100000 > /proc/irq/134/smp_affinity
echo 00000000,00000000,00000000,00000000,00200000 > /proc/irq/135/smp_affinity
echo 00000000,00000000,00000000,00000000,00400000 > /proc/irq/136/smp_affinity
echo 00000000,00000000,00000000,00000000,00800000 > /proc/irq/137/smp_affinity
echo 00000000,00000000,00000000,00000000,01000000 > /proc/irq/122/smp_affinity
echo 00000000,00000000,00000000,00000000,02000000 > /proc/irq/123/smp_affinity
echo 00000000,00000000,00000000,00000000,04000000 > /proc/irq/124/smp_affinity
echo 00000000,00000000,00000000,00000000,08000000 > /proc/irq/125/smp_affinity

Affinity is set by writing to the /proc/irq/<irq#>/smp_affinity file for a given IRQ. Each IRQ is assigned affinity to a different CPU core on the node nearest to the IRQ's PCIe device. In smp_affinity files, each core is represented by a single bit, with the least significant bit mapping to CPU 0. The IRQs associated with each device driver can be found by reading /proc/interrupts. There are ten CPU cores per node. In the example above, eight interrupts (the first eight entries) for the devices in slots 9 and 11 are mapped to node 0, and eight interrupts (the last eight entries) for the devices in slots 2 and 6 are mapped to node 2.
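Masks like those above can be derived mechanically from a core number. The following sketch is a hypothetical helper (not part of the iontuner service) that builds the comma-separated hex mask for a single CPU core; five 32-bit words cover the 160 logical CPUs of the DL980 with hyper-threading enabled, and the word count is a parameter for other systems:

```shell
#!/bin/sh
# Build an smp_affinity hex mask selecting one CPU core. The mask is printed
# as 32-bit words, most significant first, comma-separated, matching the
# /proc/irq/<irq#>/smp_affinity format. (Hypothetical helper.)
cpu_to_mask() {
    cpu=$1
    groups=${2:-5}          # 5 x 32 bits = 160 logical CPUs
    out=""
    i=$((groups - 1))
    while [ "$i" -ge 0 ]; do
        lo=$((i * 32))
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -lt $((lo + 32)) ]; then
            word=$(printf '%08x' $((1 << (cpu - lo))))
        else
            word=00000000
        fi
        out="$out$word"
        if [ "$i" -gt 0 ]; then out="$out,"; fi
        i=$((i - 1))
    done
    printf '%s\n' "$out"
}

cpu_to_mask 20    # prints 00000000,00000000,00000000,00000000,00100000
```

The example call reproduces the mask written to /proc/irq/134/smp_affinity above, which pins that IRQ to CPU core 20.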
On the DL980, each PCIe slot can be efficiently assigned to either of the two nodes corresponding to its I/O hub. However, it is important that all processes related to a device be assigned to the same node. Because these settings will not persist through a reboot, the iontuner service runs each time the system is booted.

VERIFYING THREAD PINNING

The tuning in this section was not necessary in the DL980/RHEL 6.3 testing. It is included because it may be necessary on other platforms.

To further minimize data transfer and synchronization times throughout the system, it may be beneficial to place critical I/O driver threads on the same socket as the interrupts and the HBA. This may only be necessary with some drivers. For instance, it is helpful with QLogic drivers but not with Emulex drivers, because no critical work is performed in Emulex driver threads. In the case of the DL980 running RHEL 6.3, the QLogic driver threads always ran on cores local to the HBAs, even though they were not pinned.
To check where QLogic driver threads are executing, run:

# ps -eo comm,psr | grep qla
qla2xxx_6_dpc 20
qla2xxx_7_dpc 20

The number beside each process indicates the core it is currently executing on. The numbers "6" and "7" in the example above correspond to specific SCSI host numbers. A host number can be correlated to a PCI device by looking in sysfs:

# ls -d /sys/bus/pci/devices/*/host*
/sys/bus/pci/devices/0000:11:00.0/host0
/sys/bus/pci/devices/0000:11:00.1/host1
/sys/bus/pci/devices/0000:0b:00.0/host2
/sys/bus/pci/devices/0000:0b:00.1/host3
/sys/bus/pci/devices/0000:54:00.0/host4
/sys/bus/pci/devices/0000:54:00.1/host5
/sys/bus/pci/devices/0000:60:00.0/host6
/sys/bus/pci/devices/0000:60:00.1/host7

The CPUs local to each PCI device can also be found in sysfs:

# cat /sys/bus/pci/devices/0000:54:00.0/local_cpulist
20-29,100-109

If a driver thread is not executing on one of the listed cores, run:

# /usr/sbin/iontuner.py --pinqladriver

The output from the script shows the commands it issued:

taskset -pc 20-29,100-109 947
taskset -pc 20-29,100-109 942

The script assigns CPU affinity to each discovered PID through the taskset command, using the following form:

# taskset -pc <CPU list> <PID>

PIDs can be discovered through the ps command, but each driver has its own naming convention for these threads. For example, the following command shows the QLogic driver threads:

# ps -eo comm,pid | grep qla
qla2xxx_6_dpc 942
qla2xxx_7_dpc 947

Each driver thread should be pinned to the set of cores listed in its device's local_cpulist.
On the DL980, although every I/O hub is local to two NUMA nodes, only the CPU cores from the lower-numbered node are shown as local to each PCI device. In this example, the first range (20-29) corresponds to the CPU cores in NUMA node 2, and the second range (100-109) corresponds to the hyper-threading cores for NUMA node 2; the second range is only present when hyper-threading is enabled. Though the device is also local to NUMA node 3, it is generally sufficient to pin all devices to one of the two nodes, provided a single node has enough CPU resources. Splitting pinning between the two nodes requires extreme precision: pinning resources for one device across two separate nodes can hurt performance, because although both nodes may be local to the device, they are not local to each other. These settings will not persist through a reboot.
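The local_cpulist ranges shown above (for example, 20-29,100-109) sometimes need to be expanded into an explicit per-CPU list; the fio job files generated later in this document enumerate every CPU this way, even though taskset and fio also accept the range syntax directly. A small hypothetical helper can perform the expansion:

```shell
#!/bin/sh
# Expand a sysfs cpulist such as "20-29,100-109" into the explicit
# comma-separated form ("20,21,...,109"). Hypothetical helper; relies on
# the standard seq and paste utilities.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        hi=${hi:-$lo}       # a bare "5" means the range 5-5
        seq "$lo" "$hi"
    done | paste -sd, -
}

expand_cpulist 20-29,100-109
```

Running the example prints the twenty-entry list matching the cpus_allowed values used for node-2-local devices later in this document.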
Oracle Tuning
________________________________________________________________________

The following settings are specific to tuning for Oracle. A reboot is required for the system settings to take effect.

HUGEPAGES

Configuring HugePages reduces the overhead of using large amounts of memory by shrinking the page table for the Oracle System Global Area (SGA). The default HugePage size is 2 MB, compared with the typical page size of 4 KB. With a 2 MB page size, a 10 GB SGA requires only 5120 pages, compared to roughly 2.6 million 4 KB pages. HugePages can be configured in /etc/sysctl.conf:

vm.nr_hugepages=55612
vm.hugetlb_shm_group=501

The number of HugePages used here is based on a recommendation from Oracle. The group should be set to the group ID of the oracle user, which can be determined using the id command:

# id -g oracle
501

After a reboot, the number of available HugePages can be verified:

# cat /proc/meminfo | grep HugePages_Total
HugePages_Total: 55612

SYSCTL PARAMETERS

The following parameters were configured for Oracle in /etc/sysctl.conf:

kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.ipv4.ip_local_port_range = 9000 65500
fs.file-max = 6815744
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576

ORACLE INITIALIZATION PARAMETERS

The following parameters were set in /opt/oracle/product/11.2.0/dbs/initorcl.ora:

*.db_block_size=8192
*.db_recovery_file_dest_size=2000G
*.processes=6000
*.db_writer_processes=16
*.dml_locks=80000
*.filesystemio_options='SETALL'
*.open_cursors=8192
*.optimizer_capture_sql_plan_baselines=FALSE
*.parallel_degree_policy='AUTO'
*.parallel_threads_per_cpu=2
*.pga_aggregate_target=8G
*.sga_max_size=50G
*.sga_target=50G
*.use_large_pages='only'
_enable_NUMA_support=TRUE

The _enable_NUMA_support parameter enables Oracle NUMA optimizations. The use_large_pages parameter ensures that each NUMA segment is backed by HugePages.
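The HugePages arithmetic from the previous section can be sanity-checked with a few lines of shell. This sketch assumes the default 2 MB HugePage size; real sizing (such as the vm.nr_hugepages=55612 value used here) should follow Oracle's own recommendation, which accounts for overhead beyond the raw SGA size:

```shell
#!/bin/sh
# Minimum HugePages needed to back an SGA of the given size in MB,
# assuming the default 2 MB HugePage size. A rough lower bound only;
# Oracle's sizing recommendation adds per-instance overhead on top.
sga_to_hugepages() {
    sga_mb=$1
    page_mb=2
    # Round up so a partially filled page is still counted.
    echo $(( (sga_mb + page_mb - 1) / page_mb ))
}

sga_to_hugepages 10240    # the 10 GB SGA example from the text; prints 5120
```

For the 50 GB sga_max_size configured above this gives 25600 pages, well under the 55612 provisioned, leaving headroom as Oracle's recommendation intends.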
fio Performance Testing
________________________________________________________________________

After performing the configuration described in this document, the fio tool can be used to verify the synthetic performance of the ION Data Accelerator configuration.

PRECONDITIONING FLASH STORAGE

Running tests immediately after a low-level format of the flash storage is not a meaningful test for the ION Data Accelerator or any other flash-based storage system. Preconditioning is always recommended before measuring performance. When comparing multiple flash storage solutions, the same preconditioning must be performed on each system; improper preconditioning can lead to extremely unrealistic comparisons. Preconditioning can be performed by writing a random data pattern to the entire address range of the device using a consistent block size; a block size of 1 MB is recommended.

TESTING THREAD CPU AFFINITY

Earlier, this document described how to align all I/O for a given LUN on a single socket through HBA placement, restricted LUN access, target-initiator connections, IRQ affinity, and driver thread affinity. The final component is to force the test threads accessing that LUN onto the same NUMA node as all of the other components. How this is configured varies with the test tool; for fio, the cpus_allowed parameter can be used as shown in the examples below.

TEST COMMANDS

The iontuner RPM provides a script that can be used to generate fio job files with optimal NUMA tuning parameters. The RPM is available on the Fusion-io internal network in the same location as this document:
https://confluence.int.fusionio.com/display/ION/Documentation#Documentation-IONPerformanceBrief,HPDL980(INTERNAL-ONLY)

A fio job file can be created using the following command format:

# /usr/sbin/iontuner.py --setupfio='<parameters>'

The script generates a job file using fio parameters that have been shown to provide optimal performance results and efficient pinning for all test threads. In addition to the built-in parameters, options specified in the <parameters> field as a comma-separated list are added to the job file. This option should be used to specify the read/write balance, random vs. sequential I/O, test length, and any other parameters specific to the workload being tested. For example, the following command generates a random 4 KB read test:

# /usr/sbin/iontuner.py --setupfio='rw=randrw,bs=4k,rwmixread=100,runtime=600,loops=10000,numjobs=1'

This command generates the following job file in /root/iontuner-fio.ini:

[global]
rw=randrw
bs=4k
rwmixread=100
runtime=600
loops=10000
numjobs=1
iodepth=256
group_reporting=1
thread=1
exitall=1
sync=0
direct=1
randrepeat=0
norandommap=1
ioengine=libaio
gtod_reduce=1
iodepth_batch=64
iodepth_batch_complete=64
iodepth_batch_submit=64

[dm-10]
filename=/dev/dm-10
offset=0
size=8409579520
cpus_allowed=20,21,22,23,24,25,26,27,28,29,100,101,102,103,104,105,106,107,108,109

[dm-8]
filename=/dev/dm-8
offset=0
size=8409579520
cpus_allowed=20,21,22,23,24,25,26,27,28,29,100,101,102,103,104,105,106,107,108,109
[dm-9]
filename=/dev/dm-9
offset=0
size=8409579520
cpus_allowed=0,1,2,3,4,5,6,7,8,9,80,81,82,83,84,85,86,87,88,89

[dm-6]
filename=/dev/dm-6
offset=0
size=8409579520
cpus_allowed=20,21,22,23,24,25,26,27,28,29,100,101,102,103,104,105,106,107,108,109

[dm-7]
filename=/dev/dm-7
offset=0
size=8409579520
cpus_allowed=0,1,2,3,4,5,6,7,8,9,80,81,82,83,84,85,86,87,88,89

[dm-4]
filename=/dev/dm-4
offset=0
size=8409579520
cpus_allowed=20,21,22,23,24,25,26,27,28,29,100,101,102,103,104,105,106,107,108,109

[dm-5]
filename=/dev/dm-5
offset=0
size=8409579520
cpus_allowed=0,1,2,3,4,5,6,7,8,9,80,81,82,83,84,85,86,87,88,89

[dm-3]
filename=/dev/dm-3
offset=0
size=8409579520
cpus_allowed=0,1,2,3,4,5,6,7,8,9,80,81,82,83,84,85,86,87,88,89

The numjobs parameter must be tuned for each configuration. Though one job per volume was optimal here, ION Data Accelerator configurations with many ioDrives may require four or more jobs per volume to reach maximum performance.

The cpus_allowed parameter specifies the list of CPUs on which each test thread may run. Earlier sections of this document described how to align all I/O for a given volume on a single socket through HBA placement, restricted LUN access, target-initiator connections, IRQ affinity, and driver thread affinity; this final component forces the test threads accessing the volume onto the same NUMA node as all of the other components.

To manually determine which CPUs a multipath device should be pinned to, first obtain the host numbers from the multipath command:

# multipath -l
mpathgzu (26364646430613766) dm-3 FUSIONIO,ION LUN
size=174G features='3 queue_if_no_path pg_init_retries 50' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
  |- 2:0:0:0 sdm 8:192 active undef running
  `- 1:0:0:0 sdg 8:96 active undef running
...

The first number listed with each underlying sd* device indicates the host number. The host number can be correlated to a PCI device by looking in sysfs:

# ls -d /sys/bus/pci/devices/*/host*
/sys/bus/pci/devices/0000:11:00.1/host1
/sys/bus/pci/devices/0000:0b:00.0/host2
...

The CPUs local to each PCI device can also be found in sysfs:

# cat /sys/bus/pci/devices/0000:11:00.1/local_cpulist
0-9,80-89
# cat /sys/bus/pci/devices/0000:0b:00.0/local_cpulist
0-9,80-89

If the devices are pathed properly, the local CPU list for each underlying device will be identical. These CPUs should be listed in the cpus_allowed parameter of the fio job. Information on the other fio parameters used here is available in the fio man page.

In addition to creating a job file, the script outputs the command that can be used to run a fio test with the job file. To run the test, copy the output of the script onto the command line:

# fio ./iontuner-fio.ini

The fio test will execute and print its results to the terminal.

RESULTS

The following fio test results are captured in this section, all on the HP DL980 initiator:

• Sequential R/W throughput and IOPS
• Random mix R/W IOPS
• Random mix R/W throughput

All tests were performed with the following elements:

• 3 x 2.41TB ioDrive2 Duos
• 1 x RAID 0 pool
• 8 ION volumes, 2 LUNs per volume
• 8 direct-connect FC8 target-initiator links, 2 LUNs per initiator-target link
• 1 dm-multipath device per volume
• 1 worker per device, queue depth of 256 per worker

Preconditioning was performed before the set of tests for each block size by using fio to write to the entire range of the device with a 1 MB block size.

SEQUENTIAL R/W THROUGHPUT AND IOPS
RANDOM MIX R/W IOPS

RANDOM MIX R/W THROUGHPUT
The results above reflect performance measured and reported by fio; for selected tests the numbers were compared with the output of the iostat command and found to be comparable.

Performance results can vary dramatically depending on the number of ION Data Accelerator volumes used, the number of paths to each volume, and the number of test threads run per volume (determined by the fio numjobs parameter). For this configuration, tests were run with a variety of volume, path, and thread counts before determining that 8 volumes, 2 paths per volume, and 1 thread per volume was optimal. This combination was chosen because it provided the best results for random read IOPS; depending on the specifics of a configuration and the workload chosen for optimization, other combinations may provide better results.

The tests above report the fastest random read rate at around 700,000 IOPS. However, to test initiator capabilities, some benchmarks were performed immediately after formatting the ioDrives. For example, this test achieved 800,000 IOPS:

# /usr/sbin/iontuner.py --setupfio='rw=randrw,bs=4k,rwmixread=100,runtime=600,loops=10000,numjobs=1'

Running immediately after a format is not a meaningful test for the ION Data Accelerator system itself, as reads are not serviced out of flash. Still, it indicates that with more ioDrives in the ION Data Accelerator, the DL980 could likely have achieved even higher performance numbers.

Similarly, the fastest reported combined read and write bandwidth is 6900 MB/s. Shortly after the cards were formatted, greater throughput was possible from the initiator:

# /usr/sbin/iontuner.py --setupfio='rw=randrw,bs=1m,rwmixread=50,runtime=600,loops=10000,numjobs=1'

This test achieved 3740 MB/s of read bandwidth and 3750 MB/s of write bandwidth, for a total of 7490 MB/s. A final indicator that performance was limited by the ioDrives is the reduced mixed-bandwidth performance at some block sizes.
This is comparable to test results seen with a single ioDrive in a local server. Writing data to the full address range prior to testing is a necessary step to achieve realistic results with an ION Data Accelerator test. These final tests make it unlikely that the NUMA architecture of the DL980 was the limiting factor in these fio results; the DL980 appeared to fully exercise the performance capabilities of the ION Data Accelerator.
Oracle Performance Testing
________________________________________________________________________

Oracle Orion is a tool for predicting the performance of an Oracle database without having to install Oracle or create a database. It simulates Oracle database I/O workloads using the same I/O software stack as Oracle. Tuning for Orion is very similar to tuning for fio. By running simultaneous copies of Orion's advanced test, it is possible to approximate workloads similar to fio's. Alternatively, the Online Transaction Processing (OLTP) and Decision Support System (DSS) tests can be used to synthetically approximate user workloads. Orion can also be used to test mixed large and small block sizes.

TEST SETUP

The Orion tests were run as root, but it was necessary to set the ORACLE_HOME environment variable. To find its value, run the following commands from an Oracle user shell:

# su - oracle
$ echo $ORACLE_HOME
/opt/oracle/product/11.2.0/db_1
$ exit

To make the variable permanent, run the following command in the terminal or add it to ~/.bashrc (the specific Oracle version will vary):

# ORACLE_HOME=/opt/oracle/product/11.2.0/db_1

The iontuner RPM provides a script that can be used to generate Orion test commands with optimal NUMA tuning parameters. The RPM is available on the Fusion-io internal network in the same location as this document:

https://confluence.int.fusionio.com/display/ION/Documentation#Documentation-IONPerformanceBrief,HPDL980(INTERNAL-ONLY)
Orion .lun files can be created using the following command:

# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='<parameters>'

The script generates commands that have been shown to provide optimal performance results and efficient pinning for all test threads. For example, the following command generates a 4 KB read IOPS test:

# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600'

The script generates .lun files saved in the current directory and outputs the following commands:

taskset -c 20-29,100-109 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-6 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 20-29,100-109 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-7 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 20-29,100-109 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-4 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 20-29,100-109 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-5 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 0-9,80-89 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-2 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 0-9,80-89 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-3 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 0-9,80-89 /opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-0 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &
taskset -c 0-9,80-89
/opt/oracle/product/11.2.0/bin/orion -testname iontuner-dm-1 -run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 4 -duration 600 &

For this configuration, the best results were obtained by creating a separate .lun file for each volume and running a single Orion test on each volume. Splitting the volumes into separate .lun files made it possible for taskset to run each Orion test with affinity to the CPUs local to the devices being tested. The local CPUs can be determined with the multipath command using the same method described in the fio Test Commands section earlier in this document. The taskset commands can be copied and pasted into the terminal to run them in parallel. Because the output from Orion displays only the maximum performance of each instance (which may occur at different times for each), the iostat command should be used to read performance as viewed from the initiator devices:
# iostat -x /dev/dm-*

TEST COMMANDS

The fio tests used for 8 KB IOPS were approximated with the following commands:

# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type seq -num_large 0 -num_small 2048 -write 100 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 100 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 75 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 50 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 25 -size_small 8 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 0 -num_small 2048 -write 0 -size_small 8 -duration 600'

The fio tests used for 512 KB bandwidth were approximated with the following commands:

# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type seq -num_large 2048 -num_small 0 -write 100 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 0 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 100 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run
advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 75 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 50 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 25 -size_large 512 -duration 600'
# /usr/sbin/iontuner.py --oraclehome=$ORACLE_HOME --setuporion='-run advanced -matrix point -type rand -num_large 2048 -num_small 0 -write 0 -size_large 512 -duration 600'

For running the DSS test, the iontuner.lun file was created with all eight volumes specified. The
DSS test was run with the following command:

# taskset -c 0-9,80-89,20-29,100-109 ./orion -testname iontuner -run dss

Because all devices were used in a single command, the CPUs local to all of the HBAs were specified to taskset. The OLTP test was run with the following command:

# taskset -c 0-9,80-89,20-29,100-109 ./orion -testname iontuner -run oltp

RESULTS

When running Orion advanced tests that approximated the fio tests at 8 KB and 512 KB block sizes, the results were almost identical to fio's; there was more variation between runs than between the two utilities. Because the prior state of the ioDrives has a large impact on the performance of any test, comparing test runs requires sequencing the tests in a consistent order and beginning with the same initial ioDrive conditioning. Providing the Orion results for these tests would only draw attention to minor variations that reveal nothing further about the tuning of the DL980.

The advanced tests also exposed an unexpected Orion behavior: for block sizes larger than 512 KB, it appears that 512 KB accesses are always issued to the devices.

The DSS test resulted in a maximum bandwidth of 6039 MB/s.

There are many variations of the Orion test that could be experimented with. To get an accurate measurement of maximum performance, it is necessary to run multiple copies of the test and evaluate the results from iostat. With any of the test options that run multiple test points (advanced, OLTP, DSS), there is no guarantee that all of the test copies will run each test point synchronously, which may invalidate results.
Oracle Database Testing
________________________________________________________________________

For Oracle database testing, a number of tools were used to show the maximum capabilities of the system under a variety of workloads.

READ WORKLOAD TEST – QUEST BENCHMARK FACTORY

For a more realistic Oracle test, a Windows server was connected to the DL980 via an additional Fibre Channel link. An Oracle disk group was created containing all of the ION Data Accelerator volumes, and Quest Benchmark Factory was used to create a database on the disk group with the following configuration:

• Size: 300 GB
• Logging Mode: ARCHIVELOG

The Oracle components below were placed in one ASM disk group, +DATA, which consisted of 8 LUNs (each 800 GB) enabled with multipathing:

• Redo – 20 redo log members, each 2048 MB in size
• Archive logs – placed in the default FRA
• FRA – db_recovery_file_dest='+DATA', db_recovery_file_dest_size='3000G'
• UNDO, data, and temporary tablespaces

The ASM +DATA disk group was created with external redundancy and the default 1 MB AU size. The SYS, SYSTEM, and a second UNDO tablespace were created in the ADMIN disk group, making it possible to drop and recreate the TEST data and disk groups without recreating the database.
For a read workload test, Quest Benchmark Factory > Database Scalability Job > TPC-H Power Test was used. The test was configured for 50 users.
Performance was evaluated on the DL980 while the TPC-H Power Test was running. Oracle Enterprise Manager was used to show read bandwidth during the test; Oracle showed a read bandwidth of just over 6000 MB/s. An Automatic Workload Repository (AWR) report was generated during the test. The following excerpts provide details on the I/O performed by the test.
The AWR report function summary shows a total read bandwidth of 5.8 GB/s averaged over the length of the test. The file statistics show the breakdown of I/O for each file.
Using 'iostat -mx /dev/dm-*', a snapshot of bandwidth from the ION volumes was verified. An approximate read bandwidth of 755 MB/s was seen on each of the eight volumes, for a total read bandwidth of 6043 MB/s from the ION Data Accelerator server. The avgrq-sz column shows that the average request size was between 512 and 1024 sectors (256 KB and 512 KB). These results are consistent with the bandwidth of approximately 6100 MB/s seen from fio in this block size range. However, it is important to recognize that Oracle performs data transfers of many sizes simultaneously, so the synthetic fixed-block-size results of fio are not a direct comparison, only an approximation of the capability at this workload.

OLTP WORKLOAD TEST – HEAVY INSERT SCRIPT

Performance was evaluated while running a custom OLTP load generated by a script issuing heavy insert transactions on the DL980. Oracle Enterprise Manager was used to show bandwidth and IOPS during the test.
During the test, Oracle showed a total bandwidth of approximately 4000 MB/s.
An AWR report was generated during the test. The following excerpts provide details on the I/O performed by the test. The AWR report function summary shows a total read bandwidth of 884 MB/s and a write bandwidth of 2.6 GB/s averaged over the length of the test, or 3.5 GB/s combined. The file statistics show the breakdown of I/O for each file.
Using 'iostat -mx /dev/dm-*', a snapshot of bandwidth from the ION volumes was verified. A read bandwidth of 952 MB/s and a write bandwidth of 2505 MB/s were seen, for a total bandwidth of 3457 MB/s from the ION Data Accelerator server. The workload is 22% read and 78% write I/O. The avgrq-sz column shows that the average request size was around 123 sectors, or about 61 KB. The result from the fio test with a 25% read workload and 64 KB block size was 3705 MB/s, which is consistent with the results of this test. Once again, it is important to recognize that Oracle performs data transfers of many sizes simultaneously, so the synthetic fixed-block-size results of fio are not a direct comparison, only an approximation of the capability at this workload.
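The per-volume iostat figures quoted above were aggregated by hand. A small filter can automate the summing; the sketch below is a hypothetical helper, and the column positions (rMB/s and wMB/s as the sixth and seventh fields of each device line) are an assumption based on the sysstat version used in this testing, so they should be verified against the local iostat header:

```shell
#!/bin/sh
# Sum rMB/s and wMB/s across all dm-* device lines of an `iostat -mx`
# sample read from stdin. Field positions are an assumption; check your
# iostat header row before relying on them.
sum_iostat() {
    awk '$1 ~ /^dm-/ { reads += $6; writes += $7 }
         END { printf "read=%.0f write=%.0f total=%.0f\n",
               reads, writes, reads + writes }'
}

# Example: iostat -mx /dev/dm-* | sum_iostat
```

Piping a snapshot through a filter like this gives the aggregate read, write, and combined MB/s figures in one step.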
TRANSACTIONS TEST – SWINGBENCH

An Order Entry sample OLTP test was run in Swingbench on the DL980, configured with 100 users and transaction delay disabled. Because of some difficulties with Swingbench that were unrelated to performance, hyper-threading was disabled for this test. The test resulted in an average of 934,359 transactions per minute (TPM) and a maximum of 1,150,103 TPM.

Oracle transactions vary greatly in the I/O they produce on the backend storage. A specific TPM number such as the one provided by Swingbench is only useful when compared to a number produced by a Swingbench test with the same parameters.
  • 52. Conclusions ________________________________________________________________________ Prior to tuning, performance on a NUMA system such as the HP DL980 may appear lower than that of systems with less complex architectures. The script used throughout this document for NUMA-specific tuning will be made available to simplify and standardize this tuning process. Synthetic benchmarks such as fio or Orion provide direct measurement of ION Data Accelerator storage capabilities. The flexibility of these tools is extremely useful when tuning storage configurations and initiator system parameters, and the comparable results achieved by fio and Orion indicate that either tool is sufficient. The configuration used at Fusion-io in San Jose was capable of sustaining 700,000 random IOPS and up to 7 GB/s of bandwidth, and there were indicators that the DL980 could have sustained even greater numbers in combination with more ioDrives in the ION Data Accelerator. However, synthetic benchmark performance alone does not guarantee user application performance. Additional system parameters must be tuned for Oracle, and appropriate tests must be used to identify the maximum performance for each specific workload. Oracle produced a read bandwidth of up to 6 GB/s and a mixed bandwidth of nearly 3.5 GB/s. While these numbers may seem lower than those seen with fio, they are very comparable to the results of an fio test with a similar read/write balance and average block size. This close agreement between the Oracle and fio results indicates that Oracle has been tuned to take full advantage of the performance of the storage. Tests in Swingbench measured up to 1,150,103 TPM, but this number is only useful when compared to other Swingbench results. NUMA support is an active topic in Linux development.
As newer distributions become available and their built-in tools improve, it is likely that less manual tuning will be necessary. While the tuning applied by the provided script is not currently persistent, methods are being investigated to apply it automatically at boot time as well as upon device discovery. When configured properly, the DL980 is a very powerful Oracle initiator for use with the ION Data Accelerator.
  • 53. Glossary ________________________________________________________________________ Initiator – an initiator of I/O is analogous to a client in a client/server system. Initiators use a SCSI transport protocol to access block storage over a network. A database or mail server is an example of an initiator. LUN – Logical Unit Number. Targets furnish containers for I/O that are contiguous arrays of blocks identified by logical unit number. A LUN is usually synonymous with a physical disk drive, since initiators perceive it as such. For ION Data Accelerator, a LUN is a volume that has been exported to one or more initiators. Pool – an aggregation of ioMemory or RAIDset block devices. Block devices can be added to a pool. Target – the opposite of an initiator; a receiver of I/O operations, analogous to a server in a client/server system. The target is the provider of (network) storage: a SAN disk array is a traditional target, whereas ION Data Accelerator is an all-flash storage target. Volume – a logical construct identifying a unit of data storage. A volume is allocated to allow for expandability within the space constraints of a pool. For ION Data Accelerator, a volume is not necessarily directly linked to a physical device.
  • 54. Appendix A: Tuning Checklist ________________________________________________________________________ The following is a complete checklist of the tuning steps described in this document, usable as a quick reference: 1. Check initiator HBA slot locations. 2. Check the ION storage profile. 3. Verify that a sufficient number of ION volumes are used. 4. Verify that a sufficient number of LUN paths are used. 5. Verify that LUN paths are distributed so all fabric resources are balanced. 6. Verify that all LUNs for each volume are presented only to HBAs within one NUMA node. 7. Update the BIOS and verify that NUMA distances are detected properly. 8. Set the BIOS power profile to Maximum Performance. 9. Verify that C-states are disabled in the BIOS. 10. Enable hyper-threading in the BIOS settings. 11. Disable virtualization and VT-d in the BIOS if not needed. 12. Check the addressing mode in the BIOS. 13. Disable x2APIC in the BIOS. 14. Verify that the multipath path_selector is queue-length. 15. Disable processor C-states with boot parameters. 16. Install the iontuner RPM (tunes block devices with udev rules, disables the cpuspeed
  • 55. daemon, disables the irqbalance daemon, and pins IRQs). 17. Use the fio or Orion commands generated by iontuner when testing baseline performance. 18. Configure HugePages for Oracle. 19. Configure sysctl parameters for Oracle. 20. Configure Oracle initialization parameters, including _enable_NUMA_support and use_large_pages.
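Checklist step 18 (HugePages for Oracle) comes down to sizing nr_hugepages to cover the SGA. A minimal sizing sketch, assuming a 700 GB SGA and the default 2 MB huge page size (both are assumptions; check Hugepagesize in /proc/meminfo on the actual host):

```shell
sga_gb=700                     # assumed SGA size (see the configuration later in this brief)
hp_kb=2048                     # assumed huge page size in KB (2 MB pages)
pages=$((sga_gb * 1024 * 1024 / hp_kb))
echo "$pages"                  # number of huge pages to reserve for the SGA
# Apply (as root) with, for example:
#   sysctl -w vm.nr_hugepages=$pages
# and persist the setting in /etc/sysctl.conf.
```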
  • 56. Appendix B: Speeding up Oracle Database Performance with ioMemory – an HP Session ________________________________________________________________________ This appendix is adapted from a session presented at the HP ExpertOne Technology & Solutions Summit, Dec. 2012, in Frankfurt, Germany. ARCHITECTURE OVERVIEW The diagram below shows the basic topology for shared NAND flash storage using the ION Data Accelerator connected to database servers. [Diagram: database server nodes 1 and 2 connected through a fabric to the ION Data Accelerator.] I/O bottlenecks in a shared storage system can be removed by strategically placing transaction logs, the TempDB, hot (frequently accessed) tables, or the entire database on ioMemory in the ION Data Accelerator.
  • 57. ABOUT ION DATA ACCELERATOR An ION Data Accelerator system consists of the following basic components: ION Data Accelerator Software – runs with a GUI or CLI, transforming tier 1 servers into an open, shared flash resource. Up to 20x performance improvement has been achieved compared to traditional disk-based shared storage systems. Fusion ioMemory – proven, tested, reliable, and fast, with thousands of satisfied customers worldwide. Open System Platforms – ION Data Accelerator software runs on a variety of tier 1 servers, providing industry-leading performance, reliability, and capacity. Hundreds of thousands of these servers are deployed in enterprises today. Supported network protocols include Fibre Channel, SRP/InfiniBand, and iSCSI. ION Data Accelerator Software The ION Data Accelerator software running on the host server • Is optimized for ioMemory • Works on industry-standard servers • Supports JBOD, RAID 0, and RAID 10 modes (including spare drives) • Provides GUI, CLI, SMI-S, and SNMP access • Is easy to configure • Enables software-defined storage Fusion-Powered Storage Stack The following diagram shows how the elements of a Fusion-powered software/hardware stack fit together. [Diagram, top to bottom: Application; ION Software, which transforms the server into a storage target; VSL (Virtual Storage Layer), a purpose-built flash access layer; ioMemory, fast, reliable, cost-effective flash memory in a PCIe form factor; all hosted on a tier 1 server.]
  • 58. Why ION Data Accelerator? ION Data Accelerator provides the following advantages: • It is a highly efficient shared storage target. • With its low latency, high IOPS, and high bandwidth, it can accelerate writes and reads in a variety of environments, including SAP, SQL, Navision, Oracle, VMware, etc. • It outperforms even cache hits from storage array vendors. Because of the increased performance that ION Data Accelerator achieves, customers can • Support more concurrent users. • Lower response times. • Run queries and reports faster. • Finish batch jobs in less time. • Increase application stability. ABOUT ION DATA ACCELERATOR HA (HIGH AVAILABILITY) ION Data Accelerator enables a powerful and effective HA (High Availability) environment for your shared storage when HA licensing is enabled.
  • 59. The diagram below shows basic LUN access (exported volumes) in an HA configuration. [Diagram: LUN 0 and LUN 1 presented through both HA nodes, with a 40Gb link between the nodes.] PERFORMANCE TEST RESULTS: HP DL380 / HP DL980 The following charts show performance results for an HP DL380 target running ION Data Accelerator, with an HP DL980 initiator.
  • 60. [Performance results charts, continued.]
  • 61. OVERVIEW OF THE ION DATA ACCELERATOR GUI Summary screen: Creating a Storage Profile for the storage pool:
  • 62. Creating volumes from the storage pool: Setting up an initiator group (LUN masking) to access volumes:
  • 64. Managing volumes: COMPARATIVE SOLUTIONS The diagram below shows a winning solution for ION Data Accelerator and Oracle, compared with rival EMC: [Diagram: 3PAR T400 storage alongside an HP DL980 running Red Hat 6 with 64 or 80 Intel E7 cores, 1 TB of memory, and a 700 GB Oracle SGA; redo logs, hot tables, the TempDB, and other apps and tablespaces placed on HP IO Accelerators.]
  • 65. The table below illustrates the competitive advantages of ION Data Accelerator: • Open systems server foundation – Fusion-io relies on time-tested open systems server hardware, while competitors are proprietary. • Fusion-io Adaptive Flashback vs. competitor RAID – VSL with Adaptive Flashback provides two orders of magnitude better media error rates. • ION RAID vs. the competition – ION provides more flexibility with JBOD, RAID 0, and RAID 10 vs. one static configuration option. • Street price ($/GB) – Fusion-io delivers a solution estimated to be at least 30% lower in cost/GB. • Price/IOPS – Fusion-io is the clear winner. • Power – Fusion-io draws less power. BEST PRACTICES The following best practices are important to follow in order to achieve top performance for Oracle testing: • Present 16 to 32 LUNs to the host for maximum performance. • Use the noop scheduler. • Use round robin for multipath.conf. • When using a DL980 as the load generator, make sure you pin the I/O-issuing processes. It doesn't matter much on which nodes the processes are pinned, as long as they are pinned.
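The scheduler and pinning recommendations above can be sketched as shell commands. The device name and CPU list below are placeholders; on a real system, pick CPUs from the NUMA node that owns the HBAs, and run the scheduler change as root (here the commands are only printed, not executed):

```shell
DEV=dm-0                       # example block device backing an ION LUN path
NODE_CPUS=0-9                  # example CPU list within one NUMA node
# Select the noop scheduler for the device (printed rather than executed here):
echo "echo noop > /sys/block/$DEV/queue/scheduler"
# Pin the I/O-issuing process (here, an fio job) to CPUs in that node:
PIN_CMD="taskset -c $NODE_CPUS fio --name=pinned --filename=/dev/$DEV --rw=randread --bs=4k"
echo "$PIN_CMD"
```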
  • 66. The maximum performance configuration shown below achieved about 700K IOPS. [Diagram: DL980 with HBAs 1–4 spread across CPUs 0–3 on IOH 1 and IOH 2, connected through a switch to two HBAs on the ION server.] BENCHMARK TEST CONFIGURATION Below is a proof-of-concept configuration that can be extended in any direction. A single server can achieve 600K IOPS at a 4KB block size. Below are the system configurations for the storage server (ION Data Accelerator appliance) and the database server. Storage Server • DL380p Gen8, 2-socket, 2.9GHz • 4 x 2.4TB HP IO Accelerator • 2 x dual-port 8Gbit Fibre Channel
  • 67. Database Server • DL980 G7, 8 sockets / 80 cores, 1TB RAM • 4 x dual-port 8Gbit Fibre Channel RAW PERFORMANCE TEST RESULTS WITH FIO [Chart: total IOPS vs. queue depth and number of jobs, peaking near 700,000 IOPS.] ION Data Accelerator with RAID 0, 2 RAIDSETS, 32 LUNs at 4KB block size, 100% read
  • 68. [Chart: average completion latency (microseconds) and IOPS vs. number of jobs.] ION Data Accelerator with RAID 0, 2 RAIDSETS, 32 LUNs at 4KB block size, 100% read, Qdepth = 4 Raw I/O Test: 70% Read, 30% Write ION Data Accelerator with RAID 0, 2 RAIDSETS, 16 LUNs at 4KB block size
  • 69. Raw I/O Test: 100% Read at 8KB [Chart: IOPS vs. queue depth and number of jobs, peaking between 400,000 and 500,000 IOPS.] ION Data Accelerator with RAID 0, 2 RAIDSETS, 32 LUNs at 8KB block size Raw I/O Test: Read Latency (Microseconds) [Chart: read latency vs. queue depth and number of jobs.] ION Data Accelerator with RAID 0, 2 RAIDSETS, 32 LUNs at 8KB block size
  • 70. ORACLE WORKLOAD TESTS The following configuration was used for Oracle workload testing: Database • 1TB of data • Tables ranging from millions to billions of rows Data Access Pattern • Sequential write • Data load (bulk load, real-time) • Full table scan • Select data via index • Update data via index
  • 72. [Chart: IOPS per process.] Up to 2.5 GB/sec write bandwidth, up to 300 MB/sec of redo log, and a maximum CPU load of 21%. Load generator: hammerora from http://hammerora.sourceforge.net, with a 1 TB database, 80 users, and a 10ms delay.
  • 73. CPU load: 33%, with almost no I/O wait.