From SUSECon 2015: Smooth integration of emerging Software Defined Storage technologies into the traditional data center, using Fibre Channel and iSCSI as key values for success.
TUT18972: Unleash the power of Ceph across the Data Center
1. Unleash the Power of Ceph
Across the Data Center
TUT18972: FC/iSCSI for Ceph
Ettore Simone
Senior Architect
Alchemy Solutions Lab
ettore.simone@alchemy.solutions
2. 2
Agenda
⢠Introduction
⢠The Bridge
⢠The Architecture
⢠Use Cases
⢠How It Works
⢠Some Benchmarks
⢠Some Optimizations
⢠Q&A
⢠Bonus Tracks
4. 4
About Ceph
"Ceph is a distributed object store and file system
designed to provide excellent performance, reliability
and scalability." (http://ceph.com/)
FUT19336 - SUSE Enterprise Storage Overview and Roadmap
TUT20074 - SUSE Enterprise Storage Design and Performance
6. 6
Some facts
Common data center storage solutions are built
mainly on top of Fibre Channel (yes, and NAS too).
Source: Wikibon Server SAN Research Project 2014
7. 7
Is the storage mindset changing?
New/Cloud
– Micro-services Composed Applications
– NoSQL and Distributed Database (lazy commit, replication)
– Object and Distributed Storage
SCALE-OUT
Classic
– Traditional Application → Relational DB → Traditional Storage
– Transactional Process → Commit on DB → Commit on Disk
SCALE-UP
8. 8
Is the storage mindset changing? No.
New/Cloud
– Micro-services Composed Applications
– NoSQL and Distributed Database (lazy commit, replication)
– Object and Distributed Storage
Natural playground of Ceph
Classic
– Traditional Application → Relational DB → Traditional Storage
– Transactional Process → Commit on DB → Commit on Disk
Where we want to introduce Ceph!
9. 9
Is the new kid on the block so noisy?
Ceph is cool but I cannot rearchitect my storage!
And what about my shiny big disk arrays?
I already have N protocols, why another one?
<Add your own fear here>
10. 10
Our goal
How to achieve a non-disruptive introduction of Ceph
into a traditional storage infrastructure?
[Diagram: SAN (SCSI over FC), NAS (NFS/SMB/iSCSI over Ethernet), Ceph (RBD over Ethernet)]
11. 11
How to let Ceph coexist happily in your
datacenter with the existing neighborhood
(traditional workloads, legacy servers, FC switches, etc.)
14. 14
Back to our goal
How to achieve a non-disruptive introduction of Ceph
into a traditional storage infrastructure?
[Diagram: RBD alongside SAN and NAS]
15. 15
Linux-IO Target (LIO™)
LIO is the most common open-source SCSI target in
modern GNU/Linux distros:
Fabric modules: FC, FCoE, FireWire, iSCSI, iSER, SRP, loop, vHost
Backstores: FILEIO, IBLOCK, RBD, pSCSI, RAMDISK, TCMU
[Diagram: fabric modules ↔ LIO core ↔ backstores, in kernel space]
26. 26
Smooth transition
Native migration of SAN LUNs to RBD volumes helps with
migration, conversion and coexistence:
[Diagram: traditional workloads reach Ceph through the SAN gateway,
while new private cloud workloads use RBD natively]
30. 30
Storage replacement
No drama at the end of life/support of a traditional
storage array:
[Diagram: the gateway in front of Ceph serves both the traditional
and the new workloads once the old array is retired]
33. 33
Ceph and Linux-IO
SCSI commands from the fabrics are handled by the LIO
core, configured using targetcli or directly via configfs,
and proxied to the target block device through the
corresponding backstore module.
[Diagram: clients → LIO (kernel space, /sys/kernel/config/target) → Ceph cluster]
â kernel space â
34. 34
Enable QLogic HBAs in target mode
# modprobe qla2xxx qlini_mode="disabled"
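To make the target-only mode survive reboots, the module option can be persisted; a minimal sketch, assuming the driver is loaded early at boot (the file name is illustrative):
echo 'options qla2xxx qlini_mode="disabled"' > /etc/modprobe.d/qla2xxx-target.conf
mkinitrd    # rebuild the initrd so the option is applied at boot (SUSE; other distros use dracut)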
35. 35
Identify and enable HBAs
# cat /sys/class/scsi_host/host*/device/fc_host/host*/port_name |
    sed -e 's/../:&/g' -e 's/:0x://'
# targetcli qla2xxx/ create ${WWPN}
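A hedged sketch that wraps the two steps above into one loop, creating a target for every local HBA port (the variable name is illustrative):
for WWPN in $(cat /sys/class/scsi_host/host*/device/fc_host/host*/port_name |
              sed -e 's/../:&/g' -e 's/:0x://'); do
    targetcli qla2xxx/ create ${WWPN}    # one qla2xxx target per HBA port
done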
36. 36
Map RBDs and create backstores
# rbd map -p ${POOL} ${VOL}
# targetcli backstores/rbd create name="${POOL}-${VOL}" dev="${DEV}"
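A hedged sketch that maps every image in a pool and registers each one as an rbd backstore (the pool name and the rbd showmapped column layout are assumptions):
POOL=rbd    # illustrative pool name
for VOL in $(rbd ls -p ${POOL}); do
    rbd map -p ${POOL} ${VOL}
    # Look up the resulting /dev/rbdX device for this pool/image pair
    DEV=$(rbd showmapped | awk -v p="${POOL}" -v i="${VOL}" '$2 == p && $3 == i {print $5}')
    targetcli backstores/rbd create name="${POOL}-${VOL}" dev="${DEV}"
done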
37. 37
Create LUNs connected to RBDs
# targetcli qla2xxx/${WWPN}/luns create /backstores/rbd/${POOL}-${VOL}
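The same loop can export every backstore as a LUN, and the result can then be reviewed from targetcli; a sketch assuming the objects created above:
for VOL in $(rbd ls -p ${POOL}); do
    targetcli qla2xxx/${WWPN}/luns create /backstores/rbd/${POOL}-${VOL}
done
targetcli ls qla2xxx/    # review the resulting target/LUN tree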
38. 38
"Zoning" to filter access with ACLs
# targetcli qla2xxx/${WWPN}/acls create ${INITIATOR} true
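On the client side, once its initiator WWPN is in the ACL, the new LUNs can be discovered and the paths verified; a hedged sketch (rescan-scsi-bus.sh ships with sg3_utils and may live elsewhere on some distros):
rescan-scsi-bus.sh    # rescan the FC bus for the newly exported LUNs
multipath -r          # rebuild the multipath maps
multipath -ll         # check that all paths to the RBD-backed LUN are active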
40. 40
First of all...
This solution is NOT a drop-in replacement for a SAN or a
NAS (at the moment, at least!).
The main focus is to identify how to minimize the
overhead from native RBD to FC/iSCSI.
41. 41
Raw performance estimate on 15K RPM disks
Physical disk IOPS → Ceph IOPS:
– 4K RND Read: 193 x 24 = 4,632
– 4K RND Write: 178 x 24 / 3 (replicas) = 1,424, / 3 (no SSD journal) ≈ 475
Physical disk throughput → Ceph throughput:
– 512K RND Read: 108 MB/s x 24 ≈ 2,600 MB/s
– 512K RND Write: 105 MB/s x 24 / 3 (replicas) = 840, / 2 (no SSD journal) = 420 MB/s
NOTE:
– 24 OSDs and 3 replicas per pool
– No SSD for journals (so ~1/3 of the IOPS and ~1/2 of the bandwidth for writes)
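The estimate reduces to a simple rule of thumb; a sketch of the write-IOPS arithmetic above (figures from the slide, penalties are rough approximations):
DISK_IOPS=178; OSDS=24; REPLICAS=3; JOURNAL_PENALTY=3
echo $(( DISK_IOPS * OSDS / REPLICAS / JOURNAL_PENALTY ))    # ~475 sustained 4K random write IOPS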
44. 46
What we are working on
Centralized management with GUI/CLI:
– Deploy MON/OSD/GW nodes
– Manage nodes/disks/pools/maps/LIO
– Monitor cluster and node status
Reacting to failures
Using librados/librbd with TCMU for the backstore
46. 48
More integration with existing tools
Extend lrbd to accept multiple fabrics:
– iSCSI (native support)
– FC
– FCoE
Linux-IO:
– Use of librados via TCMU
48. 50
I/O schedulers matter!
On OSD nodes:
– deadline on physical disks (cfq if the scrub thread is deprioritized with ionice)
– noop on RAID-backed disks
– read_ahead_kb=2048
On gateway nodes:
– noop on mapped RBD devices
On client nodes:
– noop or deadline on multipath devices
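A hedged sketch of applying these settings at runtime through sysfs (device names are illustrative; a udev rule or a tuned profile would make them persistent):
echo deadline > /sys/block/sdb/queue/scheduler      # OSD node: rotational data disk
echo 2048 > /sys/block/sdb/queue/read_ahead_kb
echo noop > /sys/block/rbd0/queue/scheduler         # gateway node: mapped RBD device
echo noop > /sys/block/dm-0/queue/scheduler         # client node: multipath device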
50. 52
Design optimizations
⢠SSD on monitor nodes for LevelDB: decrease CPU,
memory usage and time during recovery
⢠SSD Journal decrease I/O latency: 3x IOPS and better
throughput
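A hedged sketch of provisioning an OSD with its journal on a separate SSD, assuming the ceph-disk tool of that release and illustrative device names:
ceph-disk prepare /dev/sdd /dev/sdb    # data on the rotational disk, journal on the SSD
ceph-disk activate /dev/sdd1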
55. 57
Business Continuity architecture
Low-latency connected sites:
WARNING: To improve availability, a third site hosting a
quorum node is highly encouraged.
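For such a stretched cluster, replica placement across the two sites is expressed in the CRUSH map; a hedged sketch of modelling the sites, with illustrative bucket and host names:
ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default
ceph osd crush move node1 datacenter=dc1
ceph osd crush move node2 datacenter=dc2
A matching CRUSH rule then has to pick two datacenters first and hosts within them (the map can be edited with ceph osd getcrushmap and crushtool).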
56. 58
Disaster Recovery architecture
High latency or disconnected sites:
As in the OpenStack Ceph plug-in for Cinder Backup:
# rbd export-diff pool/image@end --from-snap start - |
    ssh -C remote rbd import-diff - pool/image
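A hedged sketch of a periodic incremental cycle built on the same primitives (pool, image, snapshot and host names are illustrative; the very first transfer is a plain full export/import):
POOL=rbd; IMG=vol1
PREV=backup-$(date -d yesterday +%F); CUR=backup-$(date +%F)
rbd snap create ${POOL}/${IMG}@${CUR}
rbd export-diff --from-snap ${PREV} ${POOL}/${IMG}@${CUR} - |
    ssh -C remote rbd import-diff - ${POOL}/${IMG}
rbd snap rm ${POOL}/${IMG}@${PREV}    # only the newest common snapshot is needed next time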
57. 59
KVM Gateways
⢠VT-x Physical passthrough of QLogic
⢠RBD Volumes as VirtIO devices
⢠Linux-IO iblock backstore
71. Unpublished Work of SUSE LLC. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC.
Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their
assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated,
abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE.
Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a
product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making
purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and
specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The
development, release, and timing of features or functionality described for SUSE products remains at the sole discretion
of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time,
without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this
presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-
party trademarks are the property of their respective owners.