The Four Horsemen of
Storage System Performance
Stephen Foskett
stephen@fosketts.net
@SFoskett
Blog.Fosketts.net
© Foskett Services 1
Stephen Foskett
is the organizer of Tech Field Day,
proprietor of Gestalt IT,
strangely interested in storage,
baseball believer,
all-around nerd, car nut,
Microsoft MVP and VMware vExpert,
former first-chair bass clarinet player and punk rock frontman,
obsessive about lightbulbs, lover of a good Manhattan,
watch blogger, Apple blogger, vegetarian blogger,
dad to three kids with anagram names,
grammar obsessive, avid reader,
King of the Andals and the First Men,
humanist, frequent traveler,
and (apparently) lover of his own voice
© Foskett Services 2
© Foskett Services 3
The Rule of Spindles
© Foskett Services 4
The Nature of Disks
• Disks are mechanical – heat, vibration, rotation, seek
• Read/write heads can only access a single spot on the disk at once
• Sequential throughput is much higher than random
© Foskett Services 5
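To put rough numbers on that last bullet, here is a minimal Python sketch of a small random read on a spinning disk. The seek time, sequential rate, and RPM are assumed but typical figures, not measurements of any particular drive:

```python
# A minimal sketch of why random I/O is so much slower than sequential on a
# spinning disk. All figures are assumptions, typical of a 7200 RPM SATA drive.

RPM = 7200
AVG_SEEK_MS = 8.5        # assumed average seek time
SEQ_MBPS = 120.0         # assumed sequential transfer rate
IO_KB = 4                # small random read

rotational_latency_ms = 0.5 * (60_000 / RPM)  # wait half a revolution on average
access_ms = AVG_SEEK_MS + rotational_latency_ms
random_iops = 1_000 / access_ms
random_mbps = random_iops * IO_KB / 1024

print(f"access time {access_ms:.1f} ms -> ~{random_iops:.0f} IOPS")
print(f"random 4 KB: ~{random_mbps:.2f} MB/s vs ~{SEQ_MBPS:.0f} MB/s sequential")
```

Under these assumptions, a drive that streams 120 MB/s sequentially delivers well under 1 MB/s of small random reads – the gap the rest of this deck is fighting.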
Combining Spindles
• Spread data across drives to overcome disk performance limits
• RAID was invented for this
© Foskett Services 6
The Rule of Spindles
• Adding more spindles is usually faster than adding faster spindles
• Disks just can’t get much faster
• Slower disks are becoming the norm
© Foskett Services 7
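A quick sketch of why the rule holds, with assumed per-drive random IOPS figures (a 7200 RPM drive at ~75 IOPS, a 15k RPM drive at ~175):

```python
# A sketch of the rule of spindles, using assumed per-drive figures: spreading
# I/O over many slow drives beats one faster drive by a wide margin.

def aggregate_iops(spindles: int, iops_each: float) -> float:
    """Ideal random IOPS when load is striped evenly across all spindles."""
    return spindles * iops_each

print(aggregate_iops(12, 75))   # twelve 7200 RPM drives -> 900 IOPS
print(aggregate_iops(1, 175))   # one 15k RPM drive      -> 175 IOPS
```

Even in the ideal case, the faster spindle gains a factor of two or so; adding spindles scales roughly linearly.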
Never Enough Cache
© Foskett Services 8
Overcoming the Limits of Spindles
• Solid-state storage is much faster than disks – RAM, flash, etc.
• Most modern storage systems are tiered, with RAM, flash, and disk
• Solid-state is more expensive, but flash is getting cheaper all the time
© Foskett Services 9
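As an illustration of the tiering idea on this slide, here is a toy placement policy in Python. The access-rate thresholds and workload names are invented for the example; real arrays use far more sophisticated heuristics:

```python
# An illustrative sketch of tiered placement: hot blocks go to fast, expensive
# media, cold blocks to cheap disk. Thresholds are assumptions, not any
# product's actual policy.

def choose_tier(accesses_per_hour: float) -> str:
    if accesses_per_hour > 1_000:
        return "RAM"
    if accesses_per_hour > 10:
        return "flash"
    return "disk"

workload = {"database index": 5_000, "mail store": 150, "archived logs": 0.2}
for name, heat in workload.items():
    print(f"{name:15s} -> {choose_tier(heat)}")
```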
Five Uses for Disk Buffers
• Read cache – frequently requested data is read from memory rather than disk
• I/O matching – slower disks and faster interfaces work together
• Read-around (ahead or behind) – pre-fetch cache
• Read-after-write – saving recently written data to serve later read requests
• Command queue – writes are reordered
© Foskett Services 10
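As a concrete illustration of the first item in the list above, here is a toy read cache in Python. The fixed-size LRU eviction policy is an assumption for the sketch; actual drive firmware and array controllers vary:

```python
# A toy model of a read cache: frequently requested blocks are served from
# memory instead of triggering mechanical I/O. LRU eviction is assumed.
from collections import OrderedDict

class ReadCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()

    def read(self, lba: int, read_from_disk) -> bytes:
        if lba in self.blocks:               # hit: served from memory
            self.blocks.move_to_end(lba)
            return self.blocks[lba]
        data = read_from_disk(lba)           # miss: pay for the mechanical I/O
        self.blocks[lba] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

cache = ReadCache(capacity=2)
cache.read(7, lambda lba: b"x")   # miss: goes to "disk"
cache.read(7, lambda lba: b"x")   # hit: no disk access
```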
Write-Through and Write-Back Cache
© Foskett Services 11
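The editor's note for this slide describes the difference between the two policies; the hedged sketch below restates it in Python. The delay constant and data structures are stand-ins, not a model of any real controller:

```python
# Schematic contrast of the two cache write policies: write-through only
# acknowledges after the disk has the data; write-back acknowledges as soon
# as the cache does. Timing and structures are illustrative stand-ins.
import time

cache, dirty = {}, []

def disk_write(lba, data):
    time.sleep(0.005)           # pretend the platter is slow

def write_through(lba, data):
    cache[lba] = data
    disk_write(lba, data)       # ack only after the disk has committed
    return "ack"

def write_back(lba, data):
    cache[lba] = data           # must be non-volatile or battery-backed!
    dirty.append((lba, data))   # destaged to disk later, in efficient batches
    return "ack"                # ack immediately; the caller moves on
```

The catch, as the notes explain, is that write-back acknowledgment is only safe if the cache survives a power outage.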
I/O As a Chain of Bottlenecks
© Foskett Services 12
The Chain of Command
• Storage isn’t just disks and arrays; all that data has to go somewhere
• Most I/O travels through five or more buses or channels between CPU and disk drive
© Foskett Services 13
The Bottle Neck
• How long will it take to fill or empty a disk drive or array?
• Which is the slowest link?
• Can we bring storage closer to compute?
© Foskett Services 14
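The first question on this slide is simple arithmetic: divide capacity by the throughput of the slowest link. A sketch, using assumed round figures (the 25 MB/s USB 2.0 case mirrors the example in the editor's notes):

```python
# "How long to fill or empty a drive?" Time = capacity / the slowest link's
# throughput. Figures are assumed round numbers for illustration.

def fill_time_hours(capacity_gb: float, mb_per_s: float) -> float:
    return capacity_gb * 1024 / mb_per_s / 3600

print(f"1 TB over USB 2.0: {fill_time_hours(1000, 25):5.1f} h")   # ~11.4 h
print(f"1 TB over SATA:    {fill_time_hours(1000, 120):5.1f} h")  # ~2.4 h
```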
A Chain of Bottlenecks
© Foskett Services 15
A Lack of Intelligence
© Foskett Services 16
The Stack of Lies
• We have lots of compute power, but very little communication through the I/O stack
• Each layer simplifies for the next
• Disks “know” nothing about data (and neither do most arrays)
© Foskett Services 17
De-Multiplex and Communicate
• Generally, more communication through the stack gives a better result overall
• Removing the I/O blender will help
© Foskett Services 18
Building Better Storage
© Foskett Services 19
Defeating the Four Horsemen
1. Understand the nature of disks
2. Tier storage
3. Attack bottlenecks
4. Get integrated
© Foskett Services 20
Thank You!
Stephen Foskett
stephen@fosketts.net
@SFoskett
blog.fosketts.net
TechFieldDay.com
21
Editor's Notes
  1. Why do some data storage solutions perform better than others? What tradeoffs are made for economy and how do they affect the system as a whole? These questions can be puzzling, but there are core truths that are difficult to avoid. Mechanical disk drives can only move a certain amount of data. RAM caching can improve performance, but only until it runs out. I/O channels can be overwhelmed with data. And above all, a system must be smart to maximize the potential of these components. These are the four horsemen of storage system performance, and they cannot be denied.
2. Hard disk drives are getting faster all the time, but they are mechanical objects subject to the laws of physics. They spin, their heads move to seek data, they heat up, and they are sensitive to shock. Storage industry insiders recognize the physicality of hard disk drives in the name we apply to them: spindles. And there is no way to escape the bounds of a spindle.

The performance of a hard disk drive is constrained by both its physical limitations and how we use it. Physically, a hard disk drive must spin its platters under a moving arm with a read/write head at the tip. This arm slides across the media, creating a two-dimensional map of data across the disk. Hard disk drives spin at a constant speed, so data at the edge passes under the head more quickly than data at the center, creating a distinctive curve of performance.

Although they are random-access devices, hard disk drives cannot access multiple locations at once. Modern command queueing and processing lets the drive controller optimize access, but I/O operations are serialized before the drive can act on them. It takes a moment for the head to move (seek time) and the disk to spin (rotational latency) before data can be accessed, so sequential operations are much faster than random ones. Most operating systems lay data out sequentially, beginning at the edge of the disk and moving inward. Although modern file systems try to keep individual files contiguous and optimize placement to keep similar data together, seeking is inevitable. This is the nature of physical hard disk drives.
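One consequence described in this note – constant RPM means the outer tracks move faster under the head – can be sketched directly. The radii and linear density below are assumed round numbers, not a real drive's geometry:

```python
# Sketch of the performance curve described above: at constant RPM the media
# transfer rate scales with track radius. All figures are assumptions.
import math

RPM = 7200
BYTES_PER_MM = 8_000                     # assumed linear recording density

def track_rate_mbps(radius_mm: float) -> float:
    mm_per_s = 2 * math.pi * radius_mm * RPM / 60   # linear velocity of the track
    return mm_per_s * BYTES_PER_MM / 1e6

print(f"outer track (45 mm): {track_rate_mbps(45):.0f} MB/s")
print(f"inner track (20 mm): {track_rate_mbps(20):.0f} MB/s")
```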
3. Although they are quick, the mechanical limitations of hard disk drives make them the first suspect in cases of poor storage performance. A single modern hard disk drive can easily read and write over 100 MB per second, with the fastest drives pushing twice that much data. But most applications do not make this sort of demand. Instead, they ask the drive to seek a certain piece of data, introducing latency and reducing average performance by orders of magnitude.

Then there is the I/O blender of multitasking operating systems and virtualization. Just as each application requests data spread across a disk, multitasking operating systems allow multiple applications and process threads to request their own data at once. File system development has lagged behind the advent of multi-core and multi-thread CPUs, leading to frustrating slowdowns while the operating system waits for the hard disk drive. Virtualization magnifies this, allowing multiple operating systems running multiple applications with multiple threads to access storage all at once.

The key innovation in enterprise storage, redundant arrays of independent disks or RAID, was designed to overcome the limits of disk spindles. In their seminal paper on RAID, Patterson, Gibson, and Katz focus on “the I/O crisis” caused by accelerating CPU and memory performance. They suggest five methods of combining spindles (now called RAID levels) to accelerate I/O performance to meet this challenge. Many of today’s storage system developments are outgrowths of this insight, allowing many more spindles to share the I/O load or optimizing it between different drive types.
  4. This is the rule of spindles: Adding more disk spindles is generally more effective than using faster spindles. Today’s storage systems often spread I/O across dozens of hard disk drives using concepts of stacked RAID, large sets, subdisk RAID, and wide striping. Faster spindles can certainly help performance, and this is evident when one examines the varying performance of midrange storage systems. Those that rely on large, slow drives are much slower than the same systems packed with smaller, quicker drives. But the rule of spindles cannot be ignored. Systems that spread data across more spindles, regardless of the capabilities of each individual disk, are bound to be quicker than those that use fewer drives.
5. Perhaps the previous discussion of spindles left you exhausted, imagining a spindly-legged centipede of a storage system, trying and failing to run on stilts. The Rule of Spindles would be the end of the story were it not for the second horseman: cache. He stands in front of the spindles, quickly dispatching requests using solid state memory rather than spinning disks. Cache also acts as a buffer, allowing writes to queue up without forcing the requesters to wait in line.

Cache may be quick, but practical concerns limit its effectiveness. Solid state memory is available in many types, but all are far more expensive per gigabyte than magnetic hard disk media. DRAM has historically cost 400 times as much as disk capacity, and even NAND flash (the current darling of the industry) is more than 40 times as expensive. Practically speaking, this means that disk devices, from the drives themselves to large enterprise storage arrays, usually include a very small amount of cache relative to their total capacity.

When specifying a storage system, the mathematics of cache and spindles adhere to a simple rule: more is better for performance but worse for the budget. This leads to a trade-off, where a point of diminishing returns tells us to stop adding both spindles and cache and accept the storage system as it is. (The sketch below works through those cost ratios.)
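Working the note's cost ratios through makes the trade-off concrete. The disk price here is an assumed placeholder; only the 400x and 40x multiples come from the note:

```python
# The note's cost ratios, worked through: with disk at an assumed $0.05/GB,
# DRAM at ~400x and flash at ~40x, even a tiny cache dominates the budget.

DISK_PER_GB = 0.05                        # assumed price, illustration only
RATIO = {"disk": 1, "flash": 40, "DRAM": 400}

def cost(gb: float, medium: str) -> float:
    return gb * DISK_PER_GB * RATIO[medium]

print(f"100 TB of disk:     ${cost(100_000, 'disk'):>10,.0f}")
print(f"1 TB of DRAM cache: ${cost(1_000, 'DRAM'):>10,.0f}")  # a 1% cache, 4x the cost
```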
6. Hard disk drives today normally contain a small amount of RAM to use as a buffer for I/O requests. This serves the following needs, though not all are found on all drives:
• A read cache, allowing frequently requested data to be read from memory rather than involving mechanical disk operations
• An I/O-matching mechanism, allowing slower disks and faster interfaces to work together
• A read-around (ahead or behind) pre-fetch cache, saving a few blocks around any requested read on the assumption that they will also be requested soon
• A read-after-write cache, saving recently written data to serve later read requests
• A command queue, allowing write commands to be reordered, avoiding the “elevator seeking” common to early hard disk drives

Disk buffer size has expanded rapidly in recent years, with some devices including 64 MB or more of DRAM. Seagate’s Momentus XT drive even includes 4 GB of NAND flash as a massive read cache!
7. The earliest systems used read-only or write-through caches. All I/O requests pass through the cache, which usually saves the most recently accessed data and serves it up when a read is requested. These caches don’t buffer write requests at all, simply passing them through to the storage system to process. They are safe, since the storage device always has a consistent set of committed writes, but they do nothing to offset the RAID penalty.

Most modern storage systems use a write-back (also called “write-behind”) cache, which acknowledges writes before they are committed to disk. They use non-volatile RAM, battery-backed DRAM, or NAND flash to ensure that data is not lost in the event of a power outage. Though far more effective, this type of memory is also far more costly.

Just about every modern storage array uses caching, and most employ the write-back method to accelerate writes as well as reads. Some have very smart controllers that perform other tricks, but Smart is another Horseman for another day. As mentioned before, RAID systems would be nearly unusable without write-back cache allowing the disks to catch up with random writes.
8. It is tempting to think of storage as a game of hard disk drives, and consider only The Rule of Spindles. But RAM cache can compensate for the mechanical limitations of hard disk drives, and Moore’s Law continues to allow for ever-greater RAM-based storage, including cache, DRAM, and flash.

But storage does not exist in a vacuum. All that data must go somewhere, and this is the job of the I/O channel. To be useful, storage capacity must connect to some sort of endpoint. This could be the CPU in a personal computer or an embedded processor in an industrial device. Indeed, there are endpoints and I/O channels throughout modern systems, with potential bottlenecks, caches, and smarts at each point. “Storage people” like me tend to think too small – imagining that the I/O channel ends at the disk drive, the “front end” of the array, or the storage network. But data must travel further, all the way to its final useful point in the core of the CPU.

Once we consider I/O as a long chain of interconnected endpoints, we begin to see that I/O constraints at any point can strangle overall system performance. This is not merely an academic exercise: optimizing the I/O channel is a consuming passion for most practitioners of enterprise IT, including architects, engineers, and system developers. And, like a good game of Whack-a-Mole, increasing the speed of one link causes another chokepoint to rear its head.
  9. Most English speakers have encountered the French term, “cul de sac”, meaning “bottom of the bag” or dead end. But hard disk drives have plenty of “bottom end”, or storage capacity. When it comes to disks, the issue is usually at the neck of the bag: Data just can’t be pulled out of a hard disk drive fast enough. The density of modern hard disk drives (the capacity of our barrel) has been growing much more rapidly than the I/O channels serving them (the spigot). Where once a hard disk drive could be filled or emptied in an hour or two, modern drives take days or weeks! I once called this “flush time“, but I think the wine metaphor is much more appetizing! This “bottle neck” has serious implications beyond basic storage performance. Data protection is impacted, since ever-larger storage systems can no longer be backed up by dumping their content; system reliability is reduced, since week-long RAID rebuilds increase the risk of multiple drive failures; and cost containment efforts are also impacted, since adding spindles drives up prices. Nowhere is this bottleneck more evident than in portable devices. Modern drives (like the 1 TB Seagate USB drive I recently reviewed) have massive capacity and pathetic performance. The USB 2.0 interface just can’t keep up, and this creates a limit to the expansion of capacity. It would take half a day to fill that drive under perfect conditions at 25 MB/s, reducing its value as a massive data movement peripheral. The emerging USB 3.0 standard promises to alleviate this performance issue for now, as illustrated with Iomega’s new external SSD. Cache and solid state storage can help, but they have their own bottlenecks. Storage arrays typically use Fibre Channel or SAS SSDs, and their front-end interface remains the same. The best-performing SSDs use the PCI Express bus directly rather than emulating hard disk drives over SCSI interfaces. And even PCI Express might not be enough to handle the massive I/O of NAND flash or DRAM. In each case, the bottleneck moves down the chain.
10. Let’s follow a typical I/O operation from the disk to the CPU core and count the I/O channels:
1. A read head senses the state of a bit of magnetic material on the surface of a disk
2. The head transmits this signal to a buffer on the disk controller board
3. The data is picked up by the disk controller CPU and transmitted over a SATA or SAS connection
4. The storage array or RAID controller receives the data and moves it over an internal bus to another buffer or cache
5. The data is picked up by another CPU in the array controller and sent out another interface using Fibre Channel or Ethernet
6. The data is buffered and retransmitted by one or more switches in the storage network
7. The host bus adapter (HBA) on the server side receives the data and buffers it again before sending it over a local PCI Express bus to system memory
8. The server memory controller pulls the data out of system memory and sends it via a local bus to the CPU core

There are actually many more steps than this, but the picture should be clear by now. There are many, many I/O channels to consider when it comes to storage, and the drive interface is just one potential bottleneck. The sketch below reduces this chain to its weakest-link arithmetic.
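```python
# The weakest-link arithmetic promised above: end-to-end throughput is the
# minimum across the hops. Every per-hop figure is an assumption for
# illustration, not a measurement of any real chain.

chain_mbps = {
    "disk media":          150,
    "SATA/SAS link":       600,
    "array internal bus": 2000,
    "FC/Ethernet port":    800,
    "network switch":     1200,
    "HBA and PCIe":       1600,
}

slowest = min(chain_mbps, key=chain_mbps.get)
print(f"effective: {chain_mbps[slowest]} MB/s, limited by the {slowest}")
```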
11. Disks can be made faster (and more added), solid-state storage and cache can be added, and I/O bottlenecks can be removed, but what then? How can storage performance keep up with Moore’s Law over the decades? The answer is intelligence: storage systems must adapt and tune themselves to changing workloads.

It’s far simpler to slap the label “intelligent” on the storage system than it is to add real smarts to the box. The biggest hurdle has always been a lack of communication between clients and applications (at the extreme top of the stack) and storage devices (at the extreme bottom). I’ve called virtualization “a stack of lies”, and in many ways that’s exactly what it is. At each point in the I/O chain, information is lost that would have helped a truly intelligent storage array to make better decisions.

Consider a very simple case: your laptop. It probably contains a SATA hard disk drive connected to a basic controller on the PCIe bus addressed by the CPU. An operating system (probably Windows or Mac OS X) runs on the system, and it relies on a file system (NTFS or HFS+, respectively) to organize and access the hard disk drive. But it also has a volume manager (currently unnamed by Microsoft, though Apple internally calls theirs CoreStorage) that virtualizes storage and adds features like encryption and compression. The files seen by the operating system pass through “filter drivers”, then the file system (which chops them into blocks), the volume manager (which organizes these blocks), the laptop’s SATA controller, the disk drive’s own controller (which decides where to place these blocks) and cache, and finally to the magnetic media. Even in this very simple scenario, the operating system has no idea where data is stored, and the disk has no idea what it is storing.

But applications don’t really “care” about files. Each application has its own semantics for storage and retrieval of data, and the file is simply a universal and convenient metaphor for application data storage. Most applications use a proprietary container format which includes metadata and scratch data along with the actual content. The characteristic pattern of reads and writes to this subfile information varies widely by application. This is why a storage device that excels for video editing may be totally inappropriate for databases or e-mail storage.

Enterprise servers add more layers of translation, with Fibre Channel HBAs, network switches, redundant RAID controllers, and separate caches all performing their magic and discarding valuable meta-information. Many enterprise systems also include independent caching devices in the server, network, or as a gateway to the storage array. Everything in the stack is valuable in one way or another, adding reliability, recoverability, and performance. But the machinations of the stack obscure what goes on above, blocking the ability to add intelligence to the array.

Higher-level applications and server virtualization further obfuscate the storage stack. An operating system may run only a small component of a large enterprise application, so related I/O may come from multiple directions at once. And each operating system may run on a virtual machine, with a hypervisor adding its own file system, volume manager, and storage abstractions. This so-called “I/O blender” purées and randomizes all storage access before it gets anywhere near the array.

12. The only way truly to add intelligence to a storage system, from a lowly hard drive to a high-end enterprise array, is to de-multiplex data and add a communications channel through the stack. If the array can untangle the randomized I/O coming from above, and can accept and act on information about that data stream, many things become possible.

Data layout is an often-overlooked topic, but it can have a massive impact on system performance. As we pointed out when discussing spindles, the physical placement of data on a disk can have a dramatic impact on I/O performance. But data placement is also critical for RAID systems and those that use automated tiered storage. Depending on system parameters, it may be better to keep data “together” or “apart” to improve performance, but this cannot be accomplished unless the array “knows” which I/O blocks belong together.

As discussed previously, pre-fetch caching can be extremely valuable to accelerate I/O performance. But pre-fetching information is almost impossible on the wrong side of the I/O blender. If an array could de-multiplex the data stream and tag each access by application, pre-fetch algorithms could be much more effective. An array could even work with a cache in the network or the server to pre-fill buffers with the data that would be needed next.

A storage system that intelligently manages caches all through the I/O chain is something of a Holy Grail in enterprise storage. Time and again, pundits and system architects have suggested moving data closer to the CPU to improve performance. At the same time, others recommend maintaining a distance to improve manageability, availability, and flexibility. Intelligently managing a set of caches in multiple locations is the ideal solution, but the inherent obfuscation of the current I/O paradigm makes this extremely difficult.
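As a closing illustration, here is a toy of the de-multiplexing idea in this note: if each request carried an origin tag, the array could regroup the blended stream and spot sequential runs worth pre-fetching. The tags, addresses, and workloads are entirely hypothetical:

```python
# A toy illustration of de-multiplexing the I/O blender: regroup an
# interleaved request stream by its (hypothetical) origin tag, then flag
# streams that turn out to be sequential as pre-fetch candidates.
from collections import defaultdict

blended = [("vm1-db", 100), ("vm2-mail", 7000), ("vm1-db", 101),
           ("vm3-video", 52000), ("vm1-db", 102), ("vm3-video", 52001)]

streams = defaultdict(list)
for origin, lba in blended:                # untangle by origin tag
    streams[origin].append(lba)

for origin, lbas in streams.items():
    seq = len(lbas) > 1 and all(b == a + 1 for a, b in zip(lbas, lbas[1:]))
    print(origin, lbas, "-> sequential: pre-fetch ahead" if seq else "-> random: leave alone")
```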
  12. The only way truly to add intelligence to a storage system, from a lowly hard drive to high-end enterprise array, is to de-multiplex data and add a communications channel through the stack. If the array can untangle the randomized I/O coming from above, and can accept and act on information about that data stream, many things become possible. Data layout is an often-overlooked topic, but can have a massive impact on system performance. As we pointed out when discussing spindles, the physical placement of data on a disk can have a dramatic impact on I/O performance. But data placement is also critical for RAID systems and those that use automated tiered storage. Depending on system parameters, it may be better to keep data “together” or “apart” to improve performance, but this cannot be accomplished unless the array “knows” which I/O blocks belong together. As discussed previously, pre-fetch caching can be extremely valuable to accelerate I/O performance. But pre-fetching information is almost impossible on the wrong side of the I/O blender. If an array could de-multiplex the data stream and tag each access by application, pre-fetch algorithms could be much more effective. An array could even work with a cache in the network or the server to pre-fill buffers with the data that would be needed next. A storage system that intelligently manages caches all through the I/O chain is something of a Holy Grail in enterprise storage. Time and again, pundits and system architects have suggested moving data closer to the CPU to improve performance. At the same time, others recommend maintaining a distance to improve manageability, availability, and flexibility. Intelligently managing a set of caches in multiple locations is the ideal solution, but the inherent obfuscation of the current I/O paradigm makes this extremely difficult.