5. The Problem: Software is becoming the bottleneck
The Opportunity: Use Intel software ingredients to unlock the potential of new media
[Figure: I/O performance and latency by media type: HDD <500 IO/s, SATA NAND SSD >25,000 IO/s, NVMe* NAND SSD >400,000 IO/s, Intel® Optane™ SSD; latency improving from >2 ms (HDD) to <100 µs]
6. Storage Performance Development Kit
Scalable and Efficient Software Ingredients
• User space, lockless, polled-mode components (see the sketch below)
• Up to millions of IOPS per core
• Designed for Intel Optane™ technology latencies
Intel® Platform Storage Reference Architecture
• Optimized for Intel platform characteristics
• Open source building blocks (BSD licensed)
• Available via spdk.io
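To make "user space, lockless, polled-mode" concrete, here is a minimal sketch (not from the slides) of issuing a single read through the SPDK user-space NVMe driver and polling for its completion. Error handling is omitted and exact API details vary between SPDK releases, so treat it as illustrative rather than version-exact.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include "spdk/env.h"
    #include "spdk/nvme.h"

    static struct spdk_nvme_ctrlr *g_ctrlr;
    static struct spdk_nvme_ns *g_ns;
    static volatile bool g_done;

    /* Claim the first NVMe controller found during enumeration. */
    static bool probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
                         struct spdk_nvme_ctrlr_opts *opts)
    {
        return g_ctrlr == NULL;
    }

    static void attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
                          struct spdk_nvme_ctrlr *ctrlr,
                          const struct spdk_nvme_ctrlr_opts *opts)
    {
        g_ctrlr = ctrlr;
        g_ns = spdk_nvme_ctrlr_get_ns(ctrlr, 1);    /* namespace 1 */
    }

    static void read_done(void *arg, const struct spdk_nvme_cpl *cpl)
    {
        g_done = true;
    }

    int main(void)
    {
        struct spdk_env_opts opts;

        spdk_env_opts_init(&opts);
        opts.name = "spdk_read_sketch";             /* hypothetical app name */
        if (spdk_env_init(&opts) < 0) {
            return 1;
        }

        /* Enumerate local PCIe NVMe devices; the driver runs entirely in user space. */
        if (spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL) != 0 || g_ns == NULL) {
            return 1;
        }

        struct spdk_nvme_qpair *qpair = spdk_nvme_ctrlr_alloc_io_qpair(g_ctrlr, NULL, 0);
        void *buf = spdk_dma_zmalloc(4096, 4096, NULL);    /* DMA-safe buffer */
        uint32_t lba_count = 4096 / spdk_nvme_ns_get_sector_size(g_ns);

        /* Submit one 4 KB read of LBA 0; completion is reported via read_done(). */
        spdk_nvme_ns_cmd_read(g_ns, qpair, buf, 0, lba_count, read_done, NULL, 0);

        /* Polled mode: the application reaps completions itself; no interrupts,
         * no system calls, no locks in the I/O path. */
        while (!g_done) {
            spdk_nvme_qpair_process_completions(qpair, 0);
        }

        printf("read completed\n");
        spdk_dma_free(buf);
        spdk_nvme_ctrlr_free_io_qpair(qpair);
        spdk_nvme_detach(g_ctrlr);
        return 0;
    }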
8. Benefits of using SPDK
• Provides more performance from Intel CPUs, non-volatile media, and networking
• Up to 10X more IOPS/core for NVMe-oF* vs. the Linux kernel
• Up to 8X more IOPS/core for NVMe vs. the Linux kernel
• Up to 350% better tail latency for RocksDB workloads
• Future proofing as NVM technologies increase in performance
• Faster TTM / fewer resources than developing components from scratch
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance
9. SPDK Updates: 17.03 Release (Mar 2017)
Blobstore
• Block allocator for applications
• Variable granularity, defaults to 4KB (a usage sketch follows at the end of this slide)
BlobFS
• Lightweight, non-POSIX filesystem
• Page caching & prefetch
• Initially limited to DB file semantic requirements (e.g., file name and size)
RocksDB SPDK Environment
• Runs RocksDB on top of BlobFS via an SPDK-provided environment (Env)
QEMU vhost-scsi Target
• Simplified I/O path to local QEMU guest VMs with unmodified apps
NVMe over Fabrics Improvements
• Read latency improvement
• NVMe-oF Host (Initiator) zero-copy
• Discovery code simplification
• Quality, performance & hardening fixes
New components: broader set of use cases for SPDK libraries & ingredients
Existing components: feature and hardening improvements
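To illustrate the Blobstore programming model introduced above, the sketch below formats a block device as a blobstore, creates one blob, and reserves space for it through the asynchronous callback chain the API uses. It is a sketch, not 17.03-exact code: function names follow current SPDK headers (the 17.03 release used slightly different names), error handling is omitted, and the spdk_bs_dev is assumed to have been created elsewhere.

    #include "spdk/blob.h"

    /* Assume 'bs_dev' was created from an SPDK block device elsewhere (e.g. with
     * spdk_bdev_create_bs_dev_ext() in current releases). */

    static void resize_done(void *cb_arg, int bserrno)
    {
        /* The blob now has 10 clusters of space reserved on the device. */
    }

    static void open_done(void *cb_arg, struct spdk_blob *blob, int bserrno)
    {
        /* Blobstore allocates space in whole clusters; reserve 10 of them. */
        spdk_blob_resize(blob, 10, resize_done, NULL);
    }

    static void create_done(void *cb_arg, spdk_blob_id blobid, int bserrno)
    {
        struct spdk_blob_store *bs = cb_arg;

        spdk_bs_open_blob(bs, blobid, open_done, NULL);
    }

    static void bs_init_done(void *cb_arg, struct spdk_blob_store *bs, int bserrno)
    {
        /* The device is now formatted as a blobstore; create the first blob. */
        spdk_bs_create_blob(bs, create_done, bs);
    }

    void blobstore_example(struct spdk_bs_dev *bs_dev)
    {
        /* Initialize (format) the blobstore with default options. */
        spdk_bs_init(bs_dev, NULL, bs_init_done, NULL);
    }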
10. Current status
Fully realizing new media performance requires software optimizations
SPDK positioned to enable developers to realize this performance
SPDK available today via http://spdk.io
Help us build SPDK as an open source community!
12. Current SPDK support in BlueStore
New features
Support for multiple threads issuing I/O to NVMe SSDs via the SPDK user-space NVMe driver
Support for running SPDK I/O threads on designated CPU cores, set in the configuration file (see the example below)
SPDK version upgrades in Ceph (currently 17.03)
Upgraded SPDK to 16.11 in December 2016
Upgraded SPDK to 17.03 in April 2017
Stability
Fixed several compilation issues and runtime bugs encountered while using SPDK.
In total, 16 SPDK-related patches have been merged into BlueStore (mainly in the NVMEDEVICE module).
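For reference, a minimal BlueStore-over-SPDK configuration looks roughly like the snippet below. The value after "spdk:" is the NVMe device serial number (the example value is the one used in the Ceph documentation), and the options that control SPDK hugepage memory and CPU-core selection have changed names across Ceph releases, so the commented names are illustrative only.

    [osd]
        # Hand the NVMe device with this serial number to the SPDK user-space
        # NVMe driver instead of the kernel driver (serial is an example value).
        bluestore_block_path = spdk:55cd2e404bd73932

        # Options for SPDK hugepage memory and the CPU cores used by the SPDK
        # I/O threads exist but are version-dependent (e.g. bluestore_spdk_mem,
        # bluestore_spdk_coremask in later releases); treat these names as
        # illustrative rather than exact.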
14. Block service exported by Ceph via iSCSI protocol
Cloud service providers that provision VMs can consume block storage over iSCSI.
If Ceph can export a block service with good performance, it becomes easy to attach those providers to a Ceph cluster solution.
[Diagram: the client application uses multipath (dm-1 over sdx/sdy) through an iSCSI initiator to two iSCSI gateway nodes, each running an iSCSI target on top of RBD, backed by the Ceph OSD cluster]
15. iSCSI + RBD Gateway
Ceph server
CPU: Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00 GHz
Four Intel P3700 SSDs
One OSD per SSD, 4 OSDs in total
4 pools with PG number 512, one 10G image per pool
iSCSI target server (librbd + SPDK / librbd + tgt)
CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz
Only one core enabled
iSCSI initiator
CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz
[Diagram: iSCSI initiator -> iSCSI target server (iSCSI target + librbd) -> Ceph server with OSD0-OSD3]
21. SPDK support for Ceph in 2017
To make SPDK genuinely useful in Ceph, we will continue the following work with partners:
Stability maintenance
– Version upgrades and fixes for compilation and runtime bugs.
Performance enhancement
– Continue optimizing the NVMEDEVICE module based on customer and partner feedback.
New feature development
– Pick up common requirements and feedback from the community and upstream those features in the NVMEDEVICE module.
22. Proposals/opportunities for better leveraging SPDK
Support multiple OSDs on the same NVMe device by using SPDK.
Leverage the multi-process feature of the SPDK user-space NVMe driver (see the sketch below).
Risk: same as with the kernel driver, i.e., all OSDs on the device fail if the device fails.
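The multi-process capability mentioned above relies on SPDK's shared-memory environment: processes that initialize the environment with the same shm_id attach to the same hugepage memory and can share NVMe controller state. A minimal sketch of that setup follows (the function name and application name are hypothetical; the OSD-side integration itself is only a proposal here).

    #include "spdk/env.h"

    /* Each cooperating process (e.g. each OSD sharing one NVMe device) calls this
     * with the same shm_id, so all of them attach to the same SPDK/DPDK shared
     * memory and can drive the device through the user-space NVMe driver. */
    int init_spdk_shared_env(int shm_id)
    {
        struct spdk_env_opts opts;

        spdk_env_opts_init(&opts);
        opts.name = "osd_spdk";   /* hypothetical application name */
        opts.shm_id = shm_id;     /* identical value in every participating process */
        return spdk_env_init(&opts);
    }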
Enhance cache support in NVMEDEVICE using SPDK.
A better cache/buffer strategy is needed to improve read/write performance.
Optimize RocksDB usage in BlueStore with SPDK's BlobFS/Blobstore.
Have RocksDB use SPDK's BlobFS/Blobstore instead of a kernel file system for metadata management.
23. Leverage SPDK to accelerate the block service exported by Ceph
Optimization in front of Ceph
Use an optimized block service daemon, e.g., the SPDK iSCSI target or NVMe-oF target (see the example configuration below).
Introduce a cache policy in the block service daemon.
Store optimization inside Ceph
Use SPDK's user-space NVMe driver instead of the kernel NVMe driver (already available).
Possibly replace "BlueRocksEnv + BlueFS" with "SPDK BlobFS env + BlobFS/Blobstore".
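As one concrete example of "optimization in front of Ceph", SPDK's iSCSI target can expose a Ceph RBD image as a LUN through its RBD bdev module. Below is a rough sketch of the legacy INI-style configuration used in that era; section and key names may differ between SPDK releases, so treat it as illustrative rather than authoritative.

    # Legacy INI-style configuration for the SPDK iSCSI target (iscsi_tgt),
    # exposing one Ceph RBD image as a LUN; key names vary by SPDK release.
    [Ceph]
        # Format: Ceph <rados pool name> <rbd image name> <block size>
        Ceph rbd rbd0 512

    [TargetNode1]
        TargetName disk1
        Mapping PortalGroup1 InitiatorGroup1
        LUN0 Ceph0        # bdev created by the [Ceph] section above
        QueueDepth 64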
24. Ceph RBD service
[Diagram: accelerating the block service exported by Ceph via SPDK. In front of the Ceph cluster sit an SPDK-optimized iSCSI target, an SPDK-optimized NVMe-oF target, an SPDK Ceph RBD bdev module (leveraging librbd/librados), and an SPDK cache module. Inside Ceph (FileStore / KVStore / BlueStore), the current metadata path (RocksDB -> BlueRocksEnv -> BlueFS -> kernel/SPDK driver -> NVMe device) is shown next to the proposed path (RocksDB -> SPDK BlobFS env -> BlobFS/Blobstore -> SPDK NVMe driver -> NVMe device). The legend distinguishes existing SPDK apps/modules, existing Ceph services/components, and optimized modules to be developed (TBD in the SPDK roadmap). Open question: even replace RocksDB?]
26. Summary
SPDK has proven useful for exploiting the capability of fast storage devices (e.g., NVMe SSDs).
However, considerable development work is still needed to make SPDK usable for BlueStore at production quality.
Call to action:
Contribute code in the SPDK community.
Leverage SPDK for Ceph optimization; you are welcome to contact the SPDK dev team for help and collaboration.
30. Vhost-scsi Performance
SPDK provides 1 million IOPS with 1 core and 8x VM performance vs. the kernel!
Features and realized benefits:
• High-performance storage virtualization -> increased VM density
• Reduced VM exits -> reduced tail latencies
System Configuration: Target system: 2x Intel® Xeon® E5-2695v4 (HT off), Intel® Speed Step enabled, Intel® Turbo Boost Technology enabled, 8x 8GB DDR4 2133 MT/s, 1 DIMM per channel, 8x Intel® P3700 NVMe SSD (800GB),
4x per CPU socket, FW 8DV10102, Network: Mellanox* ConnectX-4 100Gb RDMA, direct connection between initiator and target; Initiator OS: CentOS* Linux* 7.2, Linux kernel 4.7.0-rc2, Target OS (SPDK): CentOS Linux 7.2, Linux
kernel 3.10.0-327.el7.x86_64, Target OS (Linux kernel): CentOS Linux 7.2, Linux kernel 4.7.0-rc2 Performance as measured by: fio, 4KB Random Read I/O, 2 RDMA QP per remote SSD, Numjobs=4 per SSD, Queue Depth: 32/job
[Charts: number of VM cores vs. I/O processing cores for QEMU virtio-scsi, kernel vhost-scsi, and SPDK vhost-scsi, and I/Os handled per I/O processing core (scale up to 1,000,000) for the same three configurations]
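For context on how a guest attaches to the SPDK vhost-scsi target in a setup like the one above, the QEMU options below are a rough sketch (vhost-user-scsi syntax as documented for QEMU 2.10+; socket path, memory size, and disk image are illustrative). The guest's memory must be backed by shared hugepages so the SPDK target can access I/O buffers directly.

    qemu-system-x86_64 -machine accel=kvm -cpu host -smp 4 -m 4G \
        -object memory-backend-file,id=mem0,size=4G,mem-path=/dev/hugepages,share=on \
        -numa node,memdev=mem0 \
        -drive file=guest_os.img,if=none,id=disk0 -device ide-hd,drive=disk0,bootindex=0 \
        -chardev socket,id=spdk_vhost_scsi0,path=/var/tmp/vhost.0 \
        -device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0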
31. Alibaba* Cloud ECS Case Study: Write Performance
Source: http://mt.sohu.com/20170228/n481925423.shtml
* Other names and brands may be claimed as the property of others
Ali Cloud sees a 300% improvement in IOPS and latency using SPDK
[Charts: random write latency (usec) and random write 4K IOPS vs. queue depth (1 to 32), comparing the general virtualization infrastructure with the Ali Cloud high-performance storage infrastructure with SPDK]
32. Alibaba* Cloud ECS Case Study: MySQL Sysbench
Source: http://mt.sohu.com/20170228/n481925423.shtml
* Other names and brands may be claimed as the property of others
Sysbench Update sees 4.6X QPS at 10% of the latency!
[Charts: MySQL Sysbench latency (ms) and TPS/QPS for Select and Update workloads, comparing the general virtualization infrastructure with high-performance virtualization with SPDK]