Why do some data storage solutions perform better than others? What tradeoffs are made for economy and how do they affect the system as a whole? These questions can be puzzling, but there are core truths that are difficult to avoid. Mechanical disk drives can only move a certain amount of data. RAM caching can improve performance, but only until it runs out. I/O channels can be overwhelmed with data. And above all, a system must be smart to maximize the potential of these components. These are the four horsemen of storage system performance, and they cannot be denied.
Hard disk drives are getting faster all the time, but they are mechanical objects subject to the laws of physics. They spin, their heads move to seek data, they heat up and are sensitive to shock. Storage industry insiders recognize the physicality of hard disk drives in the name we apply to them: Spindles. And there is no way to escape the bounds of a spindle.
The performance of a hard disk drive is constrained by both its physical limitations and how we use it. Physically, a hard disk drive must spin its platters under a moving arm with a read/write head at the tip. This arm slides across the media, creating a two-dimensional map of data across the disk. Hard disk drives spin at a constant speed, so data at the edge passes under the head quicker than data at the center, creating a distinctive curve of performance.
Although they are random-access devices, hard disk drives cannot access multiple locations at once. Modern command queueing and processing lets the drive controller optimize access, but I/O operations are serialized before the drive can act on them. It takes a moment for the head to move (seek time) and the disk to spin (rotational latency) before data can be accessed, so sequential operations are much faster than random ones.
Most operating systems lay data out sequentially, beginning at the edge of the disk and moving inward. Although modern file systems try to keep individual files contiguous and optimize placement to keep similar data together, seeking is inevitable. This is the nature of physical hard disk drives.
Quick as they are, the mechanical limitations of hard disk drives make them the first suspect in cases of poor storage performance. A single modern hard disk drive can easily read and write over 100 MB per second, with the fastest drives pushing twice that much data. But most applications do not make this sort of demand. Instead, they ask the drive to seek to a particular piece of data, introducing latency and reducing average performance by orders of magnitude.
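To put rough numbers on that claim, here is a back-of-the-envelope sketch in Python. The figures are assumptions (a 7,200 RPM drive, an 8.5 ms average seek, 4 KB random reads), not measurements of any particular drive, but the shape of the result holds for any spindle.

```python
# Rough estimate of why random I/O collapses a hard drive's throughput.
# All figures are assumptions: a 7,200 RPM drive, ~8.5 ms average seek, 4 KB reads.

avg_seek_ms = 8.5                               # head movement
rotational_latency_ms = 0.5 * 60_000 / 7_200    # half a revolution on average
access_time_ms = avg_seek_ms + rotational_latency_ms

iops = 1_000 / access_time_ms                   # one random request per access
random_mb_per_s = iops * 4 / 1_024              # 4 KB per request

print(f"Average access time: {access_time_ms:.1f} ms")
print(f"Random 4 KB reads:   ~{iops:.0f} IOPS, ~{random_mb_per_s:.2f} MB/s")
print("Sequential reads on the same drive: ~100 MB/s or more")
```

A drive that streams 100 MB/s or more sequentially delivers well under 1 MB/s when every request requires a seek.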
Then there is the I/O blender of multitasking operating systems and virtualization. Just as each application requests data spread across a disk, multitasking operating systems allow multiple applications and process threads to request their own data at once. File system development has lagged behind the advent of multi-core and multi-thread CPUs, leading to frustrating slowdowns while the operating system waits for the hard disk drive. Virtualization magnifies this, allowing multiple operating systems running multiple applications with multiple threads to access storage all at once.
The key innovation in enterprise storage, redundant arrays of independent disks or RAID, was designed to overcome the limits of disk spindles. In their seminal paper on RAID, Patterson, Gibson, and Katz focus on “the I/O crisis” caused by accelerating CPU and memory performance. They suggest five methods of combining spindles (now called RAID levels) to accelerate I/O performance to meet this challenge. Many of today’s storage system developments are outgrowths of this insight, allowing many more spindles to share the I/O load or optimizing it between different drive types.
This is the rule of spindles: Adding more disk spindles is generally more effective than using faster spindles. Today’s storage systems often spread I/O across dozens of hard disk drives using concepts of stacked RAID, large sets, subdisk RAID, and wide striping.
Faster spindles can certainly help performance, and this is evident when one examines the varying performance of midrange storage systems. Those that rely on large, slow drives are much slower than the same systems packed with smaller, quicker drives. But the rule of spindles cannot be ignored. Systems that spread data across more spindles, regardless of the capabilities of each individual disk, are bound to be quicker than those that use fewer drives.
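A minimal comparison makes the point. The per-drive IOPS figures below are rough rules of thumb rather than vendor specifications, and RAID overhead is ignored entirely:

```python
# Illustrative rule-of-thumb figures only; RAID overhead is ignored.

fast_drive_iops = 175    # assumed 15K RPM SAS/FC spindle
slow_drive_iops = 75     # assumed 7,200 RPM SATA spindle

few_fast = 8 * fast_drive_iops     # a small set of fast spindles
many_slow = 48 * slow_drive_iops   # a wide stripe of slower spindles

print(f"8 fast spindles:  ~{few_fast} random IOPS")
print(f"48 slow spindles: ~{many_slow} random IOPS")
```

Even granting each fast spindle more than twice the I/O rate, the wider set wins simply because there are more arms seeking at once.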
Perhaps the previous discussion of spindles left you exhausted, imagining a spindly-legged centipede of a storage system, trying and failing to run on stilts. The Rule of Spindles would be the end of the story were it not for the second horseman: Cache. He stands in front of the spindles, quickly dispatching requests using solid state memory rather than spinning disks. Cache also acts as a buffer, allowing writes to queue up without forcing the requesters to wait in line.
Cache may be quick, but practical concerns limit its effectiveness. Solid state memory is available in many types, but all are far more expensive per gigabyte than magnetic hard disk media. DRAM has historically cost 400 times as much as disk capacity, and even NAND flash (the current darling of the industry) is more than 40 times as expensive. Practically speaking, this means that disk devices, from the drives themselves to large enterprise storage arrays, usually include a very small amount of cache relative to their total capacity.
When specifying a storage system, the mathematics of cache and spindles adhere to a simple rule: more is better for performance but worse for the budget. This leads to a trade-off, where a point of diminishing returns tells us to stop adding spindles and cache and accept the storage system as it is.
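A quick sketch shows why the budget pushes back so hard. The dollar figures below are placeholders; only the cost ratios quoted above (roughly 400:1 for DRAM and 40:1 for flash versus disk) matter:

```python
# Placeholder prices; only the ratios from the text (400x DRAM, 40x flash) matter.

disk_cost_per_gb = 0.10
dram_cost_per_gb = disk_cost_per_gb * 400
flash_cost_per_gb = disk_cost_per_gb * 40

capacity_gb = 100_000       # a hypothetical 100 TB system
cache_fraction = 0.01       # cache just 1% of capacity
cache_gb = capacity_gb * cache_fraction

print(f"Disk capacity:  ${capacity_gb * disk_cost_per_gb:,.0f}")
print(f"1% DRAM cache:  ${cache_gb * dram_cost_per_gb:,.0f}")
print(f"1% flash cache: ${cache_gb * flash_cost_per_gb:,.0f}")
```

Caching even 1% of a system's capacity in DRAM can cost several times as much as the disk capacity itself, which is why cache stays a small fraction of the total.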
Hard disk drives today normally contain a small amount of RAM to use as a buffer for I/O requests. This serves the following needs, though not all are found on all drives:
A read cache, allowing frequently-requested data to be read from memory rather than involving mechanical disk operations
An I/O-matching mechanism, allowing slower disks and faster interfaces to work together
A read-around (ahead or behind) pre-fetch cache, saving a few blocks around any requested read on the assumption that they will also be requested soon (see the sketch after this list)
A read-after-write cache, saving recently-written data to serve later read requests
A command queue, allowing pending commands to be reordered to minimize head movement (the "elevator seek" optimization that the earliest hard disk drives lacked)
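The pre-fetch item above is simple enough to sketch. This is a toy illustration of the idea, not how any real drive firmware works: on a miss, grab the requested block plus the next few in a single sequential pass, on the bet that they will be wanted soon.

```python
# Toy read-ahead (pre-fetch) cache; the class and interface are hypothetical.

PREFETCH = 4   # how many extra blocks to pull in around a miss

class ReadAheadCache:
    def __init__(self, backing_store):
        self.backing = backing_store      # any dict-like block-device stand-in
        self.cache = {}

    def read(self, block):
        if block in self.cache:
            return self.cache[block]      # served from RAM, no mechanical work
        # Miss: read the block and the next few in one sequential pass.
        for b in range(block, block + 1 + PREFETCH):
            if b in self.backing:
                self.cache[b] = self.backing[b]
        return self.cache.get(block)

# Usage: after the first read of block 10, blocks 11-14 hit in cache.
disk = {n: f"data-{n}" for n in range(100)}
cache = ReadAheadCache(disk)
cache.read(10)
print(11 in cache.cache)   # True
```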
Disk buffer size has expanded rapidly in recent years, with some devices including 64 MB or more of DRAM. Seagate’s Momentus XT drive even includes 4 GB of NAND flash as a massive read cache!
The earliest systems used read-only or write-through caches. All I/O requests pass through the cache, which typically retains the most recently accessed data and serves it up when a read is requested. Writes are not held back at all; they simply pass straight through to the storage device before being acknowledged. This is safe, since the storage device always has a consistent set of committed writes, but it does nothing to offset the RAID penalty. Most modern storage systems instead use a write-back (also called “write-behind”) cache, which acknowledges writes before they are committed to disk. These systems use non-volatile RAM, battery-backed DRAM, or NAND flash to ensure that data is not lost in the event of a power outage. Though far more effective, this type of memory is also far more costly.
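The difference between the two policies is easier to see in a toy sketch. The classes below are hypothetical stand-ins; a real array adds NVRAM protection, destaging logic, and far more bookkeeping.

```python
# Toy contrast between write-through and write-back caching. In the write-back
# case the caller is acknowledged before the disk has the data, which is why
# real arrays keep the pending writes in memory that survives a power failure.

class WriteThroughCache:
    def __init__(self, disk):
        self.disk, self.cache = disk, {}

    def write(self, block, data):
        self.disk[block] = data      # committed before acknowledging
        self.cache[block] = data
        return "ack"                 # caller waited for the disk

class WriteBackCache:
    def __init__(self, disk):
        self.disk, self.cache, self.dirty = disk, {}, set()

    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)
        return "ack"                 # acknowledged immediately; disk lags behind

    def flush(self):
        for block in self.dirty:     # destage pending writes later, in bulk
            self.disk[block] = self.cache[block]
        self.dirty.clear()
```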
Just about every modern storage array uses caching, and most employ the write-back method to accelerate writes as well as reads. Some have very smart controllers that perform other tricks, but Smart is another Horseman for another day. As mentioned before, RAID systems would be nearly unusable without write-back cache allowing the disks to catch up with random writes.
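For a sense of scale, the "RAID penalty" mentioned above is usually the small-write penalty of parity RAID. Here is the classic RAID-5 version of that math, assuming no cache absorbs the writes and using an illustrative 75 IOPS per spindle:

```python
# Classic RAID-5 small-write penalty: each random host write becomes
# read old data + read old parity + write new data + write new parity.

spindle_iops = 75       # assumed per-drive random IOPS
spindles = 12
write_penalty = 4       # back-end I/Os per front-end random write

raw_iops = spindles * spindle_iops
uncached_write_iops = raw_iops / write_penalty

print(f"Raw back-end capability:     ~{raw_iops} IOPS")
print(f"Random writes with no cache: ~{uncached_write_iops:.0f} IOPS")
```

Each random host write turns into four back-end operations, so a write-back cache that coalesces and reorders those writes is what keeps the array usable.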
It is tempting to think of storage as a game of hard disk drives and to consider only The Rule of Spindles. But RAM cache can compensate for the mechanical limitations of hard disk drives, and Moore’s Law continues to allow ever-greater solid-state capacity for caching, whether DRAM or NAND flash. But storage does not exist in a vacuum. All that data must go somewhere, and this is the job of the I/O channel.
To be useful, storage capacity must connect to some sort of endpoint. This could be the CPU in a personal computer or an embedded processor in an industrial device. Indeed, there are endpoints and I/O channels throughout modern systems, with potential bottlenecks, caches, and smarts at each point. “Storage people” like me tend to think too small – imagining that the I/O channel ends at the disk drive, the “front end” of the array, or the storage network. But data must travel further, all the way to its final useful point in the core of the CPU.
Once we consider I/O as a long chain of interconnected endpoints, we begin to see that I/O constraints at any point can strangle overall system performance. This is not merely an academic exercise: optimizing the I/O channel is a consuming passion for most practitioners of enterprise IT, including architects, engineers, and system developers. And, like a good game of Whack-a-Mole, increasing the speed of one link causes another chokepoint to rear its head.
Most English speakers have encountered the French term, “cul de sac”, meaning “bottom of the bag” or dead end. But hard disk drives have plenty of “bottom end”, or storage capacity. When it comes to disks, the issue is usually at the neck of the bag: Data just can’t be pulled out of a hard disk drive fast enough.
The density of modern hard disk drives (the capacity of our barrel) has been growing much more rapidly than the I/O channels serving them (the spigot). Where once a hard disk drive could be filled or emptied in an hour or two, modern drives take days or weeks!
I once called this “flush time”, but I think the wine metaphor is much more appetizing!
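The arithmetic is sobering. The numbers below are illustrative (a 2 TB drive, 100 MB/s streaming, 75 IOPS at an average of 64 KB per random request), but the gap between capacity and channel is the point:

```python
# Illustrative numbers: a 2 TB drive, 100 MB/s streaming, 75 IOPS at 64 KB each.

capacity_gb = 2_000
sequential_mb_s = 100
random_iops, io_kb = 75, 64

seq_hours = capacity_gb * 1_024 / sequential_mb_s / 3_600
rand_days = capacity_gb * 1_024 * 1_024 / (random_iops * io_kb) / 86_400

print(f"Sequential drain:    ~{seq_hours:.1f} hours")
print(f"Random-access drain: ~{rand_days:.1f} days")
```

Streaming the whole drive already takes hours; drain it with small random I/O, as a backup of many small files or a degraded rebuild effectively does, and the job stretches into days.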
This “bottle neck” has serious implications beyond basic storage performance. Data protection is impacted, since ever-larger storage systems can no longer be backed up by dumping their content; system reliability is reduced, since week-long RAID rebuilds increase the risk of multiple drive failures; and cost containment efforts are also impacted, since adding spindles drives up prices.
Nowhere is this bottleneck more evident than in portable devices. Modern drives (like the 1 TB Seagate USB drive I recently reviewed) have massive capacity and pathetic performance. The USB 2.0 interface just can’t keep up, and this creates a limit to the expansion of capacity. It would take half a day to fill that drive under perfect conditions at 25 MB/s, reducing its value as a massive data movement peripheral. The emerging USB 3.0 standard promises to alleviate this performance issue for now, as illustrated with Iomega’s new external SSD.
Cache and solid state storage can help, but they have their own bottlenecks. Storage arrays typically use Fibre Channel or SAS SSDs, and their front-end interface remains the same. The best-performing SSDs use the PCI Express bus directly rather than emulating hard disk drives over SCSI interfaces. And even PCI Express might not be enough to handle the massive I/O of NAND flash or DRAM. In each case, the bottleneck moves down the chain.
Let’s follow a typical I/O operation from the disk to the CPU core and count the I/O channels:
A read head senses the state of a bit of magnetic material on the surface of a disk
The head transmits this signal to a buffer on the disk controller board
The data is picked up by the disk controller CPU and transmitted over a SATA or SAS connection
The storage array or RAID controller receives the data and moves it over an internal bus to another buffer or cache
The data is picked up by another CPU in the array controller and sent out another interface using Fibre Channel or Ethernet
The data is buffered and retransmitted by one or more switches in the storage network
The host bus adapter (HBA) on the server side receives the data and buffers it again before sending it over a local PCI Express bus to system memory
The server memory controller pulls the data out of system memory and sends it via a local bus to the CPU core
There are actually many more steps than this, but the picture should be clear by now. There are many, many I/O channels to consider when it comes to storage, and the drive interface is just one potential bottleneck.
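A toy model makes the point about chained channels. Every latency figure below is a placeholder rather than a measurement; what matters is that the hops add up and the slowest link dominates:

```python
# Every figure is a placeholder; the point is that latency (and any bandwidth
# ceiling) accumulates across the whole chain, not just at the drive.

hops_us = {
    "disk media + head":          6_000,   # mechanical access dominates
    "drive buffer + SAS link":       50,
    "array controller + cache":     100,
    "FC/Ethernet fabric":            60,
    "HBA + PCIe to host RAM":        20,
    "memory controller to CPU":       1,
}

total = sum(hops_us.values())
worst = max(hops_us, key=hops_us.get)
print(f"End-to-end: ~{total / 1000:.1f} ms, dominated by '{worst}'")
```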
The only way to truly add intelligence to a storage system, from a lowly hard drive to a high-end enterprise array, is to de-multiplex data and add a communications channel through the stack. If the array can untangle the randomized I/O coming from above, and can accept and act on information about that data stream, many things become possible.
Data layout is an often-overlooked topic, but it can have a massive effect on system performance. As we pointed out when discussing spindles, the physical placement of data on a disk can have a dramatic impact on I/O performance. But data placement is also critical for RAID systems and those that use automated tiered storage. Depending on system parameters, it may be better to keep data “together” or “apart” to improve performance, but this cannot be accomplished unless the array “knows” which I/O blocks belong together.
As discussed previously, pre-fetch caching can be extremely valuable to accelerate I/O performance. But pre-fetching information is almost impossible on the wrong side of the I/O blender. If an array could de-multiplex the data stream and tag each access by application, pre-fetch algorithms could be much more effective. An array could even work with a cache in the network or the server to pre-fill buffers with the data that would be needed next.
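Here is a hypothetical sketch of what that de-multiplexing buys. Nothing below corresponds to a real array's interface; it simply assumes each request arrives tagged with the stream (virtual machine, application, or file) it belongs to:

```python
# Hypothetical: assume each request carries a stream tag. Per-stream state
# makes a sequential pattern visible again, even though the blended arrival
# order coming out of the I/O blender looks random.

from collections import defaultdict

last_block = defaultdict(lambda: None)   # per-stream last block seen

def next_prefetch(stream_id, block, depth=4):
    """Return blocks worth prefetching if this stream looks sequential."""
    prev = last_block[stream_id]
    last_block[stream_id] = block
    if prev is not None and block == prev + 1:
        return list(range(block + 1, block + 1 + depth))
    return []

# The blended stream below looks random, but each tagged stream is sequential.
blended = [("vm1", 100), ("vm2", 900), ("vm1", 101), ("vm2", 901), ("vm1", 102)]
for stream, blk in blended:
    print(stream, blk, "->", next_prefetch(stream, blk))
```

The blended arrival order looks random, but per-stream state makes the sequential patterns visible again, and pre-fetching becomes worthwhile.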
A storage system that intelligently manages caches all through the I/O chain is something of a Holy Grail in enterprise storage. Time and again, pundits and system architects have suggested moving data closer to the CPU to improve performance. At the same time, others recommend maintaining a distance to improve manageability, availability, and flexibility. Intelligently managing a set of caches in multiple locations is the ideal solution, but the inherent obfuscation of the current I/O paradigm makes this extremely difficult.