1. Designing for
High Performance Ceph at Scale
April 26, 2016
James Saint-Rossy - Principal Storage Engineer, Comcast
John Benton - Consulting Systems Engineer, WWT
2. Today’s Agenda
• Our Lab/Production Environment
• Holistic Architecture
• Strategies for Benchmarking
• Performance Bottlenecks/Lessons Learned
• Tuning Tips and Tricks
Designing for High Performance Ceph at Scale
3. Our Typical Node Configuration
Storage Node
• 72 × 6 TB SATA 7.2K RPM HDDs
• 3 × 1.6 TB PCIe NVMe drives (journals)
• 2 × Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (12 cores)
• 256 GB of RAM
• Dual-port 40GbE NIC
Mon/RGW Node
• 2 × Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
• 32 GB of RAM
• Dual-port 10GbE NIC
• ...Nothing Special
7. Strategies for Benchmarking
Tools
- fio for block
- COSBench for object
IOPS Isn’t Everything
-1000 workers may give you 30% more IOPS, but at the
cost of 600% higher latency
Verify Published Stats With Benchmarks
-… Always
Verify Scale-Out
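A block-level run with fio might look like the following; every flag, size, and path here is an illustrative assumption, not the presenters' actual job file. Sweep the queue depth and worker count while watching latency, not just IOPS:

```shell
# Hypothetical 4k random-write job against an RBD-backed file. All values
# (iodepth, numjobs, size, filename) are examples -- tune for your cluster.
FIO_OPTS="--name=randwrite-4k --ioengine=libaio --direct=1 --rw=randwrite \
--bs=4k --iodepth=32 --numjobs=8 --size=10G --runtime=60 --time_based \
--filename=/mnt/rbd/testfile --group_reporting"
# Print the command; run it for real only against a test cluster.
echo "fio $FIO_OPTS"
```

Repeating the same job at increasing --iodepth/--numjobs values is what exposes the kind of trade-off described above, where more workers buy a little throughput at a large latency cost.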
8. Performance - TCMalloc
• As cluster size increased, %SYS was increasingly taxed
• System profiling revealed up to 50% of CPU resources used by TCMalloc
• TCMalloc can be tuned to use a larger thread cache; this was good for nearly a
50% performance increase
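One common way to apply this (a sketch: the 128 MB value is a widely used setting for dense OSD nodes, not necessarily the presenters' exact number) is via the environment file Ceph's service scripts read:

```shell
# /etc/sysconfig/ceph (Debian/Ubuntu: /etc/default/ceph).
# Enlarge TCMalloc's aggregate thread cache; the stock default (32 MB)
# is far too small for a node running dozens of OSDs.
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
```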
11. OSD Data Workflow
"complicated situation" by bandinisonfire is licensed under CC BY-NC-SA 2.0
12. Performance - NUMA
• The bigger and faster the data node, the bigger the
bottleneck potential
• We tuned several areas to avoid unnecessary trips
across the QPI bus
• To map everything you must:
• Map CPU cores to sockets
• Map PCIE devices to sockets
• Map storage disks (and journals) to the associated
HBA
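On Linux, all three mappings can be read out of sysfs and lscpu; the PCI address and disk name below are placeholders, not real devices:

```shell
# Map CPU cores to sockets/NUMA nodes:
lscpu | grep -i "numa node"
# Map a PCIe device (HBA, NVMe journal, NIC) to its NUMA node.
# 0000:03:00.0 is an example address -- list yours with lspci:
cat /sys/bus/pci/devices/0000:03:00.0/numa_node 2>/dev/null \
  || echo "no such PCI device on this machine"
# Map a disk back to the PCI path (and hence the HBA) it hangs off:
readlink -f /sys/block/sda/device 2>/dev/null \
  || echo "no such disk on this machine"
```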
13. NUMA - IRQs
Pin the soft IRQs for each IO device to its associated NUMA
node
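A minimal sketch of the pinning, assuming a Mellanox NIC whose IRQs show up as "mlx4" in /proc/interrupts and a node-0 CPU list taken from lscpu (both are examples). Disable irqbalance first, or it will undo the pinning:

```shell
# CPUs on NUMA node 0 (check yours with "lscpu | grep -i numa"):
NODE0_CPUS="0-11,24-35"
# Steer every IRQ of the device to those CPUs (requires root):
for irq in $(grep mlx4 /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
    echo "$NODE0_CPUS" > "/proc/irq/$irq/smp_affinity_list"
done
```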
14. NUMA - Mount Points
Align mount points so that the OSD and journal are on the
same NUMA node
15. NUMA - OSD Processes
Pin OSD processes to the NUMA node associated with the
storage it controls
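A sketch of the pinning commands; the OSD id and CPU list are examples. The commands are echoed here so the sketch is safe to run anywhere; drop the echo to apply them as root on the OSD node:

```shell
# Start OSD 12 bound to NUMA node 0's CPUs and memory
# (pick the node its disks and journal NVMe map to):
echo numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 12
# Or re-pin every thread of an already-running OSD:
echo taskset -apc 0-11,24-35 '$(pidof -s ceph-osd)'
```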
16. Performance - General Tips
• Use the latest vendor drivers
-We have seen 30% improvements over stock drivers
• OS tuning focused on increasing threads, file handles,
etc.
• Jumbo frames help, particularly on the cluster network
• Watch for flow control issues with 40GbE network adapters
• Scan for failing (but perhaps not completely failed) disks
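The OS tuning above might look like the following fragment; every value is illustrative, not the presenters' exact setting:

```shell
# /etc/sysctl.d/90-ceph.conf -- apply with "sysctl --system".

# Headroom for the thread counts of ~72 OSDs per node:
kernel.pid_max = 4194303
# More file handles:
fs.file-max = 6553600
# Deeper async-IO queues for libaio-based OSD journals:
fs.aio-max-nr = 1048576

# Jumbo frames on the cluster network (per NIC, not a sysctl):
#   ip link set dev eth1 mtu 9000
# Scan for failing-but-not-dead disks:
#   smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrect'
```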
17. Designing for High Performance Ceph at Scale
"Question" by alphageek is licensed under CC BY-NC-SA 2.0
19. Performance - Mons
• Mons are generally a glorified TFTP server; you can
get away with 1+2 for redundancy
• That is, until they aren't...
• In certain situations, such as a large cluster rebalance or
deleting a pool with a lot of PGs, a single CPU core on *ALL*
mons can become jammed up. The mons then start evicting
each other and mayhem ensues.
• How to fix this: