Intel QLC: Cost-effective Ceph on NVMe
Ceph Month 06/11/2021
Anthony D’Atri, Solutions Architect anthony.datri@intel.com
Yuyang Sun, Product Marketing Manager yuyang.sun@intel.com
2
Ceph Month June 2021
Legal Disclaimers
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or
component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of
others.
3
Ceph Month June 2021
§ SSDs are too expensive
§ SSDs are too small
§ QLC is too slow and DWPD
are too low
§ HDDs are more reliable
SSD vs HDD:
The Reality
I’d like to use SSDs for Ceph
OSDs but they can’t compete
with HDDs
4
Ceph Month June 2021
§ SSDs are too expensive
§ SSDs are too small
§ QLC is too slow and DWPD
are too low
§ HDDs are more reliable
SSD vs HDD:
The Reality
The Myth
I’d like to use SSDs for Ceph
OSDs but they can’t compete
with HDDs
5
Ceph Month June 2021
§ Competitive now; subtle factors beyond calculators [1]
§ HDDs may be short-stroked or
capacity restricted: interface
bottleneck and recovery time
§ HDDs run out of IOPS before
capacity: extra drives are required
to meet IOPS needs
§ Expand clusters faster than data
inflow: priceless!
Cost
TCO crossover soon … or
today!
See appendix for footnotes.
6
Ceph Month June 2021
§ TB/chassis, TB/RU, TB/watt, OpEx, racks, cost of RMA [2] / crushing failed drives
§ Cluster maintenance without
prolonged and risky reduced
redundancy.
§ How much does degraded user/
customer experience cost?
Especially during recovery?
Cost
TCO crossover soon … or
today!
See appendix for footnotes.
7
Ceph Month June 2021
• 144-layer QLC NAND enables
high-capacity devices
• Intel® NVMe QLC SSDs are available in capacities up to 30 TB [3]
• Up to 1.5 PB raw per RU with E1.L EDSFF drives [4]
• Abundance of IOPS allows
flexible capacity provisioning
Capacity
Large capacity: fewer chassis, RUs,
and racks
See appendix for footnotes.
8
Ceph Month June 2021
§ Intel® SSD D5-P5316 NVMe QLC delivers up to 800K 4KB random read IOPS, a 38% increase gen over gen [3]
§ Up to 7000 MB/s sequential read, 2x+ gen over gen [3]
§ SATA saturates at ~550 MB/s [5]
§ PCIe Gen 4 NVMe crushes the SATA
bottleneck
§ Two or more OSDs per device improve throughput, IOPS, and tail latency [6] (provisioning sketch below)
Performance
Fast and wide
See appendix for footnotes. Results may vary.
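A minimal provisioning sketch for the multi-OSD-per-device point above, assuming ceph-volume's lvm batch mode; the device paths are placeholders and the right OSD count per drive depends on CPU headroom and drive capability:

    # Placeholder device paths; splits each NVMe device into two OSDs
    ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1 /dev/nvme1n1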
10
Ceph Month June 2021
§ RGW is prone to hotspots and QoS
events
§ One strategy to mitigate latency and IOPS bottlenecks is to cap HDD size, e.g. at 8 TB
§ Adjusting scrub intervals, adding a CDN front end, and throttling at the load balancer can help (sketch below), but upweighting even a single replacement HDD OSD can still take weeks.
§ OSD crashes can impact API availability
§ Replacing HDDs with Intel QLC SSDs for
bucket data can markedly improve QoS
and serviceability
Performance
Operational Advantages
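A hedged sketch of the mitigations mentioned above; the interval, sleep value, and OSD ID are illustrative placeholders, not recommendations:

    # Stretch the deep-scrub interval (default 7 days) and add a per-chunk scrub sleep
    ceph config set osd osd_deep_scrub_interval 1209600
    ceph config set osd osd_scrub_sleep 0.1
    # Bring a replacement HDD OSD in at partial CRUSH weight and raise it gradually
    ceph osd crush reweight osd.123 1.0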
11
Ceph Month June 2021
§ Most SSD failures are firmware – and fixable in situ [7]
§ 99% of SSDs never exceed 15% of rated endurance [7,8]
§ One RGW deployment projects
seven years of endurance using
previous gen Intel QLC
§ Current gen provides even
more
Reliability and
Endurance
Better than you think, and
more than you need!
See appendix for footnotes.
12
Ceph Month June 2021
§ 30 TB Intel® SSD D5-P5316 QLC SSD rated at ≥22 PB of IU-aligned random writes [9]
§ 1 DWPD 7.68 TB TLC SSD rated at <15 PB of 4K random writes [9]
§ Tunable endurance via overprovisioning [13]; see the worked example below
Reliability and
Endurance
Get with the program
[erase cycle]
See appendix for footnotes.
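A worked example of the overprovisioning point, using the figures from the endurance chart on the next slide:

    OP = (raw - usable) / raw = (30.72 - 24.58) / 30.72 ≈ 20%

Restricting the 30.72 TB D5-P5316 to 24.58 TB of usable capacity in this way raises its rated 64K random-write endurance from 22.93 PBW to 56.71 PBW.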
13
Ceph Month June 2021
§ 8 TB HDD: 0.44% AFR spec, 1-2% actual [9]
§ Intel DC QLC NAND SSD AFR <0.44% [9]
§ Greater temperature range [9]
§ Better UBER [9]
§ Cost to have hands replace a
failed drive? To RMA?
Reliability and
OpEx
Drive failures cost money
and QoS
See appendix for footnotes.
14
Ceph Month June 2021
Intel® QLC SSD delivers up to 104 PBW, significantly outperforming HDDs.

HDD and SSD endurance in Petabytes Written (PBW), higher is better:
Western Digital Ultrastar DC HC650 20 TB: 2.75
Seagate Exos X18 18 TB: 2.75
Intel® SSD D7-P5510 7.68 TB (64K random write): 14.016
Intel® SSD D5-P5316 30.72 TB (64K random write): 22.93
Intel® SSD D5-P5316 24.58 TB (64K random write, 20% OP): 56.71
Intel® SSD D5-P5316 30.72 TB (64K sequential write): 104.55

The HDDs allow only 2.75 PB of combined read/write IO before exceeding the AFR target.
See appendix for sources 8, 9, 11, 12. Results may vary.
15
Ceph Month June 2021
§ bluestore_min_alloc_size = 16k | 64k (config sketch below)
§ Writes aligned to IU multiples enhance performance and endurance
§ Metadata is a small percentage of the overall workload
Optimize endurance
and performance
Align to IU size
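A hedged configuration sketch for the IU-alignment point; note that bluestore_min_alloc_size is applied when an OSD is created and cannot be changed afterward, so it must be set before provisioning:

    # 65536 (64K) to match a 64K IU drive; use 16384 for a 16K IU
    ceph config set osd bluestore_min_alloc_size_ssd 65536
    # then create the OSDs, e.g.
    ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1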
16
Ceph Month June 2021
§ RGW: large objects
§ RBD: backup, archive, media
§ CephFS: 4 MB default object size, mostly used for larger files (sketch below)
§ Metadata and RocksDB are a small fraction of the overall write workload
Example
use cases
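A small sketch of keeping these use cases IU-friendly; the pool, image, and path names are placeholders:

    # RBD: 4 MiB object size is the default; it can also be set explicitly at creation
    rbd create backup-pool/archive-img --size 10T --object-size 4M
    # CephFS: pin a 4 MiB object size on a directory holding large files
    setfattr -n ceph.dir.layout.object_size -v 4194304 /mnt/cephfs/media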
17
Ceph Month June 2021
§ RocksDB block size aligned to IU
§ RocksDB universal compaction
§ Other RocksDB tuning
§ Optane acceleration of WAL+DB,
write shaping
§ Crimson, RocksDB successor
§ Separate pools for large/small objects: EC & replication, QLC & TLC (sketch below). Internal RGW enhancement? Lua script to change the storage class?
Additional
optimizations
To be explored, because
better is still better:
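A hedged sketch of the pool-separation idea; the device-class, rule, profile, and pool names are all placeholders, and wiring the pools into RGW storage classes still requires the usual zonegroup placement configuration:

    # Tag drives with custom device classes (clear any auto-assigned class first)
    ceph osd crush rm-device-class osd.0 osd.1
    ceph osd crush set-device-class qlc osd.0 osd.1
    ceph osd crush set-device-class tlc osd.2 osd.3
    # Replicated rule on TLC for small/hot objects
    ceph osd crush rule create-replicated rgw-small default host tlc
    # EC profile pinned to QLC for large bucket data
    ceph osd erasure-code-profile set rgw-large-ec k=4 m=2 crush-device-class=qlc
    ceph osd pool create rgw.buckets.data.large 256 256 erasure rgw-large-ec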
18
Ceph Month June 2021
Appendix
1. https://www.snia.org/forums/cmsi/ssd-endurance
2. Author’s professional experience: RMA cost not worth the effort for devices worth < USD 500
3. https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/Intel-D5-P5316_product_Brief-728323.pdf
https://www.intel.com/content/www/us/en/products/docs/memory-storage/solid-state-drives/data-center-ssds/d5-p5316-series-brief
4. https://echostreams.com/products/flachesan2n108m-un
https://www.supermicro.com/en/products/system/1U/1029/SSG-1029P-NES32R.cfm
5. https://www.isunshare.com/computer/why-the-max-sata-3-speed-is-550mbs-usually.html
6. https://ceph.io/community/part-4-rhcs-3-2-bluestore-advanced-performance-investigation
7. https://searchstorage.techtarget.com/post/Monitoring-the-Health-of-NVMe-SSDs
https://searchstorage.techtarget.com/tip/4-causes-of-SSD-failure-and-how-to-deal-with-them
8. https://www.usenix.org/system/files/fast20-maneas.pdf
9. https://www.intel.com/content/dam/www/central-libraries/us/en/documents/qlc-nand-ready-for-data-center-white-paper.pdf
10. https://searchstorage.techtarget.com/post/Monitoring-the-Health-of-NVMe-SSDs
https://searchstorage.techtarget.com/tip/4-causes-of-SSD-failure-and-how-to-deal-with-them
11. https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc600-series/data-sheet-ultrastar-dc-hc650.pdf
12. https://www.seagate.com/files/www-content/datasheets/pdfs/exos-x18-channel-DS2045-1-2007GB-en_SG.pdf
13. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/over-provisioning-nand-based-ssds-better-endurance-whitepaper.pdf