
Ceph on RDMA


TCP vs RDMA implementation on Ceph


Slide 1 — CEPH Performance on XIO
Emerging Storage Solutions (EMS), SanDisk
Slide 2 — Setup
• 4 OSDs, one per SSD (4 TB)
• 4 pools, 4 rbd images (one per pool)
• 1 physical client box; 4 fio_rbd clients in total, each with 8 (num_jobs) × 32 (iodepth) = 256 QD (a fio job-file sketch follows this slide)
• Block size = 4K, 100% random read
• Working set ~4 TB
• Code base is the latest Ceph master
• Server has 40 cores and 64 GB RAM
• Shards : thread_per_shard = 25:1
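For concreteness, a minimal fio job file matching the setup above might look like the sketch below. This is an illustration, not the deck's actual job file: the cephx client name and the pool/image names are hypothetical, and it assumes a fio build with the librbd (rbd) ioengine:

```
# Hypothetical job file for one of the four fio_rbd clients; each client
# would point at its own pool/image. Names below are made up.
[global]
ioengine=rbd        # librbd engine: talks to the cluster directly
clientname=admin    # cephx user (assumed)
pool=rbdpool1       # one of the four pools (hypothetical name)
rbdname=image1      # the rbd image in that pool (hypothetical name)
rw=randread         # 100% random read
bs=4k               # 4K block size
iodepth=32          # 32 outstanding IOs per job
numjobs=8           # 8 jobs x 32 iodepth = 256 QD for this client
direct=1
group_reporting

[rbd_randread]
```

Each of the four clients would run one such file against its own pool, giving the 4 × 256 = 1024 aggregate QD implied by the setup.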
Slide 3 — Result

Transport | IOPS | BW | Read from disk (%) | User %cpu | Sys %cpu | %idle
TCP | ~50K | ~200 MB/s | ~99 | ~15 | ~12 | ~55
RDMA | ~130K | ~520 MB/s | ~99 | ~40 | ~19 | ~11

Summary:
• RDMA delivers ~2.6X the TCP throughput (~130K vs ~50K IOPS)
• TCP iops/core = 2777, XIO iops/core = 3651
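The iops/core figures quoted throughout the deck appear to be IOPS divided by busy cores, with busy cores = total cores × (1 − %idle/100); checking that assumption against the table above:

```
TCP : 50,000 / (40 × (1 − 0.55)) = 50,000 / 18.0 ≈ 2,777 iops/core
RDMA: 130,000 / (40 × (1 − 0.11)) = 130,000 / 35.6 ≈ 3,651 iops/core
```

Both match the quoted values, so the later iops/core numbers can be read the same way.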
Slide 4 — Setup
• 16 OSDs, one per SSD (4 TB)
• 4 pools, 4 rbd images (one per pool)
• 1 physical client box; 4 fio_rbd clients in total, each with 8 (num_jobs) × 32 (iodepth) = 256 QD
• Block size = 4K, 100% random read
• Working set ~4 TB
• Code base is the latest Ceph master
• Server has 40 cores and 64 GB RAM
• Shards : thread_per_shard = 25:1, 10:1 (see the ceph.conf sketch after this slide)
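The "Shards : thread_per_shard" ratios correspond to Ceph's sharded OSD op work queue, and the messenger transport is selected via ms_type. Below is a hedged ceph.conf sketch for the 25:1 XIO run; the option names (ms_type, osd_op_num_shards, osd_op_num_threads_per_shard) are assumed from the XIO-era Ceph master, so treat it as illustrative rather than the deck's actual config:

```
# Hedged sketch, assuming XIO-era Ceph master option names.
[global]
ms_type = xio                     # XIO/RDMA messenger; TCP runs would use the
                                  # default simple/async messenger instead

[osd]
osd_op_num_shards = 25            # "Shards : thread_per_shard = 25:1"
osd_op_num_threads_per_shard = 1  # the 10:1 variant would be 10 and 1
```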
Slide 5 — Result

Transport | IOPS | BW | Disk read (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~118K | ~470 MB/s | ~99 | ~3 | ~26 | ~16
RDMA | ~120K | ~480 MB/s | ~99 | ~7 | ~25 | ~28

Summary:
• TCP is catching up: TCP iops/core = 3041 vs XIO iops/core = 3225 on cluster nodes
• More memory consumed by XIO
Slide 6 — Setup
• 16 OSDs, one per SSD (4 TB)
• 2 hosts, 8 OSDs each
• 4 pools, 4 rbd images (one per pool)
• 1 physical client box; 4 fio_rbd clients in total, each with 8 (num_jobs) × 32 (iodepth) = 256 QD
• Block size = 4K, 100% random read
• Working set ~6 TB
• Code base is the latest Ceph master
• Each server has 40 cores and 64 GB RAM
• Shards : thread_per_shard = 25:1, 10:1
Slide 7 — Result

Transport | IOPS | BW | Disk read (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~175K | ~700 MB/s | ~99 | ~8 | ~18 | ~16
RDMA | ~238K | ~952 MB/s | ~99 | ~14 | ~20 | ~28

Summary:
• ~36% performance gain
• TCP iops/core = 4755, XIO iops/core = 6918 on cluster nodes
• RDMA uses over 10 percentage points more memory per cluster node (~28% vs ~16%)
Slide 8 — Setup
• 32 OSDs, one per SSD (4 TB)
• 2 hosts, 16 OSDs each
• 4 pools, 4 rbd images (one per pool)
• 1 physical client box; 4 fio_rbd clients in total, each with 8 (num_jobs) × 32 (iodepth) = 256 QD
• Block size = 4K, 100% random read
• Working set ~6 TB
• Code base is the latest Ceph master
• Each server has 40 cores and 64 GB RAM
• Shards : thread_per_shard = 25:1, 10:1, 15:1, 5:2
Slide 9 — Result

Transport | IOPS | BW | Disk read (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~214K | ~775 MB/s | ~99 | ~9 | ~12 | ~16
RDMA | ~230K | ~870 MB/s | ~99 | ~12 | ~18 | ~28

Summary:
• TCP is catching up again; not much of a gain
• TCP iops/core = 2939, XIO iops/core = 3267 on cluster nodes
• More memory usage per cluster node by RDMA
Slide 10 — Did some testing with a more powerful setup
• 8 OSDs, one per SSD (4 TB)
• 4 pools, 4 rbd images (one per pool)
• 1 physical client box; 4 fio_rbd clients in total, each with 8 (num_jobs) × 32 (iodepth) = 256 QD
• Block size = 4K, 100% random read
• Working set ~4 TB
• Code base is the latest Ceph master
• Server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
• Shards : thread_per_shard = 25:1
Slide 11 — Result

Transport | IOPS | BW | Read from disk (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~148K | ~505 MB/s | ~99 | ~15 | ~68 | ~11
RDMA | ~166K | ~665 MB/s | ~99 | ~18 | ~73 | ~19

Summary:
• ~12% performance gain
• TCP iops/core = 3109, XIO iops/core = 3616 on cluster nodes
• On the client node, TCP iops/core = 8258, XIO iops/core = 10978
• RDMA uses more than 8 percentage points more memory
Slide 12 — Result, no disk hit

Transport | IOPS | BW | Read from disk (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~265K | ~1037 MB/s | ~0 | ~35 | ~40 | ~11
RDMA | ~276K | ~1084 MB/s | ~0 | ~60 | ~63 | ~19

Summary:
• Not much difference throughput-wise
• But a significant efficiency difference: TCP iops/core = 7280 vs XIO iops/core = 12321 on cluster nodes
• RDMA uses more than 8 percentage points more memory
Slide 13 — Bumping up OSDs on the same setup
• 16 OSDs, one per SSD (4 TB)
• 4 pools, 4 rbd images (one per pool)
• 1 physical client box; 4 fio_rbd clients in total, each with 8 (num_jobs) × 32 (iodepth) = 256 QD
• Block size = 4K, 100% random read
• Working set ~4 TB
• Code base is the latest Ceph master
• Server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
• Shards : thread_per_shard = 10:1, 4:2, 25:1
• Some experimentation with the xio portal thread settings
Slide 14 — Result

Transport | IOPS | BW | Read from disk (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~142K | ~505 MB/s | ~99 | ~18 | ~68 | ~18
RDMA | ~166K | ~665 MB/s | ~99 | ~18 | ~73 | ~38

Summary:
• TCP iops/core = 3092, XIO iops/core = 3614 on cluster nodes
• On the client node, TCP iops/core = 7924, XIO iops/core = 10978
• More than 2X memory usage by RDMA
• Not much scaling between 8 and 16 OSDs for either TCP or RDMA, even though nothing is saturated at this point
Slide 15 — Result, no disk hit

Transport | IOPS | BW | Read from disk (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~268K | ~1049 MB/s | ~0 | ~37 | ~37 | ~17
RDMA | ~400K (OSD-side portal threads = 2, client side = 8) | ~1600 MB/s | ~0 | ~40 | ~42 | ~40

Summary:
• Suspecting some lock contention in the OSD layer, started playing with XIO portal threads
• With fewer portal threads (2) on the OSD node, the no-disk-hit performance jumped to 400K (see the config sketch after this slide)
• Increasing the number of XIO portal threads in the OSD layer decreases performance in this case
• Tried some shard options, but TCP stays close to the 8-OSD numbers; this seems to be a limit
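The 400K figure above came from tuning the XIO portal thread counts asymmetrically: fewer on the OSD side, more on the client side. Assuming the XIO-era xio_portal_threads option (an assumption; the deck doesn't name the exact knob), the tuning might look like:

```
# Hedged sketch: xio_portal_threads is assumed from XIO-era Ceph options.
# OSD nodes: fewer portal threads reduced contention in this test.
[osd]
xio_portal_threads = 2

# Client box running the fio_rbd processes: more portal threads.
[client]
xio_portal_threads = 8
```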
Slide 16 — Checking the scale-out nature
• 32 OSDs, one per SSD (4 TB)
• 2 nodes with 16 OSDs each
• 4 pools, 4 rbd images (one per pool)
• 1 physical client box; 4 fio_rbd clients in total, each with 8 (num_jobs) × 32 (iodepth) = 256 QD
• Block size = 4K, 100% random read
• Working set ~4 TB
• Code base is the latest Ceph master
• Each server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
• Shards : thread_per_shard = 10:1, 4:2, 25:1
• Some experimentation with the xio portal thread settings
Slide 17 — Result, no disk hit

Transport | IOPS | BW | Read from disk (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~323K | ~1263 MB/s | ~0 | ~40 | ~12 | ~18.7
RDMA | ~343K | ~1339 MB/s | ~0 | ~55 | ~30 | ~37.5

Summary:
• TCP is scaling, but XIO is not; in fact, XIO gives less throughput than the 16-OSD setup
• TCP iops/core = 4806, XIO iops/core = 6805 on cluster nodes
• TCP iops/core = 6565, XIO iops/core = 8750 on the client node; even more significant there
• XIO memory usage per node is again ~2X
Slide 18 — Result

Transport | IOPS | BW | Read from disk (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~249K | ~973 MB/s | ~99 | ~22 | ~18 | ~15.5
RDMA | ~258K | ~1006 MB/s | ~99 | ~24 | ~40 | ~38

Summary:
• TCP and XIO give similar throughput
• TCP iops/core = 5422, XIO iops/core = 7678: a significant gain with XIO on the client side
• XIO memory usage per node is again more than 2X
Slide 19 — Trying out bigger block sizes
• 32 OSDs, one per SSD (4 TB)
• 2 nodes with 16 OSDs each
• 4 pools, 4 rbd images (one per pool)
• 1 physical client box; 1 fio_rbd client with 8 (num_jobs) × 32 (iodepth) = 256 QD
• Couldn't run 4 clients in parallel with XIO
• Block size = 16K/64K, 100% random read (see the job-file change after this slide)
• Working set ~4 TB
• Code base is the latest Ceph master
• Each server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
• Shards : thread_per_shard = 10:1, 4:2, 25:1
• Some experimentation with the xio portal thread settings
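Relative to the hypothetical 4K job file shown after slide 2, only the block size changes for these runs (and only a single fio_rbd client instance is started):

```
# Same hypothetical job file as before, with only the block size changed.
bs=16k    # or bs=64k for the 64K run
```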
Slide 20 — Result (32 OSDs, 16K, 1 client)

Transport | IOPS | BW | Read from disk (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~150K | ~2354 MB/s | ~99 | ~35 | ~48 | ~15.5
RDMA | ~152K (spiky) | ~2355 MB/s | ~99 | ~40 | ~60 | ~38

Summary:
• TCP and XIO give similar throughput
• XIO is very spiky
• Couldn't run more than 1 client (8 num_jobs) with XIO
• But the CPU gain is visible
Slide 21 — Result (32 OSDs, 64K, 1 client)

Transport | IOPS | BW | Read from disk (%) | Cluster CPU (%idle) | Client CPU (%idle) | Cluster mem (%)
TCP | ~53K | ~3312 MB/s | ~99 | ~57 | ~74 | ~15.5
RDMA | ~55K (spiky) | ~3625 MB/s | ~99 | ~57 | ~82 | ~39

Summary:
• TCP and XIO give similar throughput
• XIO is very spiky
• Couldn't run more than 1 client (8 num_jobs) with XIO
• But the CPU gain is visible, especially on the client side
Slide 22 — Summary

Highlights:
• Definite improvement in iops/core
• A single client is much more efficient with the XIO messenger
• A lower number of OSDs can deliver high throughput
• If the internal XIO messenger contention can be fixed, XIO has the potential to outperform TCP in a big way

Lowlights:
• TCP catches up fast as the OSD count increases
• TCP also appears to scale out better than XIO
• XIO's present state is *unstable*: occasional crashes and peering problems
• Connection startup time is much higher for XIO
• XIO connections take time to stabilize at a steady throughput
• Memory requirements are considerably higher
