White Paper

Unstructured Data Efficiency and Cost Savings in Virtualized Server Environments

By Terri McClure, Senior Analyst
July 2013

This ESG White Paper was commissioned by Hitachi Data Systems (HDS) and is distributed under license from ESG.

© 2013 by The Enterprise Strategy Group, Inc. All Rights Reserved.
Contents

Overview
Virtualization’s Impact on the Storage Environment
  The Shift toward NAS for Virtualized Environments
  Storage Challenges in Virtualized Environments
Consolidation: Driving Efficiency in Virtualized Environments
  Automated Storage Tiering and Migration
  Primary Deduplication
  Efficient Data Protection
The Bigger Truth
All trademark names are property of their respective companies. Information contained in this publication has been obtained from sources The
Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are
subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of
this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the
express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and,
if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.
Overview
Virtual server initiatives are straining already overwhelmed file storage environments. Unstructured data is growing
faster than ever, thanks largely to the proliferation of endpoint capture devices and advances in hardware and
software that allow bigger, richer files to be created. Virtualization is exacerbating the problem because many
organizations are underpinning their virtualized environments for both server and desktop virtualization with file
servers, since it can be much easier to manage the virtual server storage environment using NAS. This further
accelerates the growth rate of unstructured data.
This paper looks at the challenges associated with managing unstructured data in virtualized environments at scale,
and how to get unstructured data under control through file server consolidation. It provides guidelines to help
organizations understand what to look for in their consolidated file storage environments in order to make them as
efficient as possible through deduplication, tiering, and migration, while keeping data protected and meeting SLAs.
Virtualization’s Impact on the Storage Environment
Server virtualization—in other words, using software to divide a single physical server into multiple isolated virtual
environments—is driving significant technology and process change across storage, disaster recovery, and
management environments in enterprise organizations and small/medium size businesses alike. Server
virtualization technology is driving demand for networked storage solutions due to the net increase in storage
capacity requirements brought about by server virtualization initiatives. More importantly, the ability to realize
many of the key benefits of server virtualization—such as the mobility of virtual machines between physical servers
for load balancing, high availability, and maximum utilization of resources—fundamentally requires an underlying
networked storage infrastructure.
But supporting a virtual server environment introduces a number of storage challenges. First, with multiple virtual
machines hosted on a single physical server, chances are good that the associated applications have differing
storage policies. This can lead to some pretty complex storage provisioning exercises as storage is logically mapped
and provisioned to each virtual machine. And then there is the performance aspect. The storage infrastructure
must provide predictable performance scalability for the wide variety of mixed application workloads the virtual
machines will drive, with a variety of I/O patterns—for example small, large, sequential, or random operations.
Consider that virtual server data protection methods—which are often radically different from traditional physical server methods—need to be designed and tested. And consider the implications of supporting backup and recovery on a single physical machine that hosts multiple virtual machines—kicking off backup for one virtual machine can spike CPU usage and starve the other machines of resources. And when routine maintenance is performed, instead of impacting a single application environment, multiple application environments are affected. ESG has seen instances in which ten or twenty (and in a few edge cases even more) virtual machines share a single physical server, all of which would need to be taken down or moved just to perform routine maintenance. This is really
where the importance of networked storage comes in: keeping applications available during everything from
routine maintenance to disaster handling by enabling virtual machines to move from physical server to physical
server without losing access to data.
The Shift toward NAS for Virtualized Environments
In fact, many storage challenges associated with server virtualization can be mitigated by leveraging network-attached storage technologies. At their core, virtual machine and desktop images are files. Storing image files on
NAS systems simplifies image management significantly: It removes multiple layers of storage management
required in a block-based environment.
Take the example of provisioning capacity in a Fibre Channel SAN environment. For a Fibre Channel SAN, a storage
administrator needs to carve out and assign LUNs to each virtual machine hosted in the physical server; establish
and manage switch ports and zones; map HBAs; set up multi-pathing; and cross-mount the appropriate LUNs and
zones to multiple physical servers to allow for virtual machine portability. There is more to the process, but
describing everything involved would make this a much longer and more technical paper. The point is: That’s a
pretty complex and error-prone manual process. In these types of environments, all the mapping and zoning is
typically tracked in spreadsheets. It can become an even more complex, time-consuming, and error-prone task as
more virtual servers come online or as storage capacity is added and the environment needs to scale. Each time
capacity is added, the whole process needs to be repeated. And when you consider the implications of each virtual
machine having different protection requirements and performance characteristics, figuring out what LUNs are
supporting which virtual machine to ensure appropriate timeliness of snapshots or to perform load balancing can
become nearly impossible, especially at scale.
In an NFS environment, once a file system is exported to a virtual machine and mounted, it travels with the virtual
machine across physical servers, maintaining the relationship. And to add capacity, file system sizes can be
expanded on the fly, with no downtime. And because users are managing information—a file system for each virtual machine rather than a collection of HBAs, LUNs, and worldwide names—overall management is simplified.
So, provisioning capacity is much simpler when you treat VMDK files as files!
When it comes to NFS for data protection, the snapshot and remote replication capabilities of file systems are often
used for improved recoverability, space efficiency, and speed (more on that later). With networked storage,
multiple copies of virtual machines can be quickly created, efficiently stored and accessed for replication and
disaster recovery purposes, and used to more efficiently perform bare-metal restores. To alleviate the issue with
backing up the virtual machine, the backup load can be shifted from the physical server to the file server, leveraging
snapshot copies to meet recovery point and time objectives.
Storage Challenges in Virtualized Environments
When ESG surveyed just over 400 North American IT professionals concerning their organizations’ current data
storage environments, including current storage resources, challenges, purchase criteria, and forward-looking data
storage plans, participants were asked about their “significant” storage challenges related to their virtual server
environments. Almost half, or 43%, of participants indicated that the capital cost of a new storage infrastructure is
a significant challenge, and more than one in four (28%) cited operational cost of storage related to server
virtualization as a significant challenge (see Figure 1).1

1 Source: ESG Research Report, 2012 Storage Market Survey, November 2012.
Figure 1. Storage Challenges Stemming from Server Virtualization Usage

From a storage infrastructure perspective, which of the following would you consider to be significant challenges related to your organization’s server virtualization usage? (Percent of respondents, N=418, multiple responses accepted)

  Capital cost of new storage infrastructure: 43%
  Disaster recovery strategy: 42%
  Sizing true capacity (storage) required to support virtual server environment: 36%
  Limited I/O bandwidth, especially when workload spikes occur: 29%
  Operational cost of new storage infrastructure: 28%
  Impact on overall volume of storage capacity: 24%
  Poor application response times: 22%
  Sizing IOPS requirements to support virtual server environments: 19%
  Lack of scalability: 14%
  We have not encountered any challenges: 5%

Source: Enterprise Strategy Group, 2013.

In that same survey, respondents were asked about their biggest storage challenges in general, and their primary storage challenge in particular. Rapid growth and management of unstructured data was cited by 40% of respondents as a challenge and as the primary challenge by 15% of respondents. Data protection was close behind, with 39% of respondents citing it as a challenge and 11% citing it as their primary challenge. Also in the top five responses (out of 19 possible) were hardware costs, running out of physical space, and supporting a growing virtual server environment (see Figure 2).2 The influx of unstructured data associated with virtualized environments is certain to continue to strain the unstructured data storage environment as IT organizations struggle to scale and meet these varied and unpredictable workload requirements.

2 Ibid.
Figure 2. Top Ten Storage Environment Challenges, by 2012 Storage Budget

In general, what would you say are your organization’s biggest challenges in terms of its storage environment? Which would you characterize as the primary storage challenge for your organization? (Percent of respondents, N=418; all storage challenges / primary storage challenge)

  Rapid growth and management of unstructured data: 40% / 15%
  Data protection (e.g., backup/recovery, etc.): 39% / 11%
  Supporting a growing virtual server environment: 39% / 10%
  Hardware costs: 25% / 7%
  Running out of physical space: 25% / 5%
  Data migration: 25% / 4%
  Staff costs: 20% / 5%
  Management, optimization & automation of…: 19% / 5%
  Lack of skilled staff resources: 19% / 6%
  Discovery, analysis and reporting of storage…: 17% / 5%

Source: Enterprise Strategy Group, 2013.
Of course, all of this brings up a question for users: How do I rein in unstructured data growth, cost-effectively
protect my data, and reduce my overall footprint while still maintaining service levels for my virtual server
environment? Undertaking a comprehensive file server consolidation exercise can be an answer—but only if it is
built on the right core principles.
Consolidation: Driving Efficiency in Virtualized Environments
Consolidation is the process of identifying and eliminating legacy storage silos that are the result of the way IT has
managed data growth to date, and then putting in place best practices for managing the storage environment in a
holistic manner that reduces the overall physical footprint (and costs) of data.
Before diving into consolidation, let’s look at how we’ve arrived here. Why is rapid growth and management of unstructured data the most-cited storage challenge among IT organizations surveyed? This unrelenting increase
of data stems from natural application growth and from the new workloads being generated by social media; web
2.0 applications; and the creation of video, audio, photos, and similar content. Endpoint capture devices have
proliferated hugely: A smartphone is in almost everyone’s pocket. A tablet computer (business as much as
personal) is in many people’s laps. The ability to create and consume content requires nothing more than the press
of a button. Websites and barcode readers collect more data each second—data that organizations slice and dice to
identify what their customers need, or more accurately, what their customers will buy.
Big data is everywhere, and the rampant copying of data sets for analytics is only one reason for it. Other data-
growth culprits include snapshots and remote replication to increase uptime and availability, and programs or
initiatives to improve data protection and regulatory compliance. Those are good things, of course, but they
certainly accelerate overall data growth rates.
Historically, the most common way to address the growth problem has been to toss even more storage capacity at
it:
• You want copies for testing and development? Here’s a server and some storage.
• You’d like offsite replication? We’ll build another infrastructure stack.
• You need backup? We’ll build another.
• You need an application server? We’ll carve out a VM for you, and somehow we’ll find the storage to provision for your virtual machine image and the data it is going to need.
That strategy results in ever-expanding, unsharable silos of storage that are usually poorly utilized. They cost more
to buy; they take up more data center floor space; they use more energy to power and cool; and they require more
staff to manage. All these things are pretty much the opposite of efficiency, which is what most IT organizations are
after; yet too often, it was easier to continue to pour money into a suboptimal solution than “bite the bullet” and
make things right for the longer term. But in this era of changing consumption models, throwing capacity at
everything just won’t work.3 It’s also an ineffective way to spend money.

3 A portion of the text in the previous three paragraphs of this section is from the ESG White Paper, Hitachi Data Systems Storage: Improving Storage Efficiency Can Be a Catalyst for IT Innovation, June 2013.
The first step in a comprehensive file storage consolidation strategy is to identify and eliminate these silos. This is
not an easy task—in fact, many organizations attempt this effort and at the end of the day, they just create bigger
silos, albeit fewer of them. But without the right underlying technology, this is only a Band-Aid that will provide
short term relief—the inefficient silo problem still exists, and IT organizations pay more than they need to for their
storage from both a CAPEX and OPEX standpoint. The underlying technology in any comprehensive consolidation
strategy must be seamless, scalable, and efficient in order to truly eliminate silos altogether, but it also needs to
support sufficient performance to maintain SLAs in unpredictable virtualized environments. It can’t trade off
performance for efficiency because too many workloads could be affected. That means seamless tiering, both
within (as the classic definition of tiering has been) and between (which is required to eliminate silos) systems. It
also means efficient deduplication of primary data without a major performance impact, and the ability to
maintain performance as the environment scales. But most importantly, to maintain SLAs in virtual server
environments, it means tight integration into the tools of those environments. Hitachi Data Systems offers such
technology and can help IT organizations accomplish this.
Automated Storage Tiering and Migration
Automated storage tiering has been the topic of much discussion in the industry. Typically, when vendors discuss
this capability, they mean tiering within an array and using some combination of flash or solid-state storage for
highly active data with (possibly) some serial-attached SCSI (SAS) drives and (likely) the bulk of data on slower
rotating, high capacity, nearline SAS (NL-SAS) drives. This makes sense as most data is only active within 30 days of
creation, and afterwards is retained yet rarely accessed (this is often called long tail data). In a traditional single tier
architecture, this would mean buying an array full of SAS drives to support the active data, and storing long tail data
on the same expensive drives. Even worse, it means buying a high-performance, highly available tier-1 flash array to support the highly active data, and storing the long tail data on that same tier-1 system—but more on that later.
Using a small amount of solid-state storage for active data, with a tier of high capacity, slower rotating (hence less
power-consuming) NL-SAS disks is a highly effective way to reduce the overall storage footprint as well as cut power
and cooling costs.
In virtualized environments, where workloads are typically somewhat write-heavy (due to the way virtual servers cache data and stage I/Os) and heavily random, accessing many VMDKs creates a lot of metadata activity. In fact,
metadata operations can be very disk intensive and can make up as much as half of all file operations.
Automatically moving metadata to a flash or SSD tier, as HDS does, can significantly improve performance in virtual
server environments by speeding metadata lookups.
Even with automated storage tiering within storage systems, IT organizations still find themselves with the
challenge of having the bulk of their expensive, tier-1 storage arrays taken up by long tail data. The need is for
automated tiering, based on user-defined policies, between storage systems. This is rarely discussed by storage
vendors because (a) storage vendors like to sell lots of tier-1 storage and when it fills up, the IT organizations need
to buy more, and (b) most storage vendors just don’t have a good story when it comes to automatically migrating
data off of tier-1 arrays and onto secondary or tertiary tiers. But HDS does.
HDS offers intelligent file tiering that allows IT organizations to search across the file environment and set policies
that will trigger automated migration between arrays or even to a cloud tier such as Amazon S3. IT can set policies
based on parameters like age, activity, or content type.
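To make the policy idea concrete, here is a minimal Python sketch of an age-based tiering pass. It is illustrative only, not HDS’s policy engine; the 30-day threshold and the mount-point paths are assumptions.

```python
import os
import shutil
import time

# Assumed policy for this sketch: demote files untouched for 30+ days.
AGE_THRESHOLD_SECONDS = 30 * 24 * 60 * 60

def demote_cold_files(primary_tier: str, archive_tier: str) -> int:
    """Scan the primary tier and move "long tail" files to a cheaper tier.

    A real migration engine would also weigh activity and content type,
    and leave a stub or link behind so clients still see one namespace.
    Returns the number of files moved.
    """
    now = time.time()
    moved = 0
    for root, _dirs, files in os.walk(primary_tier):
        for name in files:
            path = os.path.join(root, name)
            # Last-access time stands in for "activity" in this sketch.
            if now - os.path.getatime(path) > AGE_THRESHOLD_SECONDS:
                dest = os.path.join(archive_tier, os.path.relpath(path, primary_tier))
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(path, dest)
                moved += 1
    return moved

if __name__ == "__main__":
    # Hypothetical mount points for a tier-1 array and an archive or cloud gateway.
    print(demote_cold_files("/mnt/tier1", "/mnt/archive"), "files demoted")
```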
Think of the power such functionality could have in virtual desktop environments, where users are creating many
versions of documents that are rarely, if ever, accessed after 30 days. High-performing, highly available tier-1 storage systems need to be deployed to meet the demands of virtual desktop environments. Moving user
documents off of tier-1 storage as they age or their activity tails off allows IT organizations to reclaim tier-1 storage
capacity to service active use cases. HDS claims users can reclaim up to 60% of primary storage capacity via
automated migration in the virtual desktop use case.
Primary Deduplication
Deduplication is the process of identifying duplicate data written to the file system and storing it just once, instead of every time the same data is written. In most cases, a “virtual” file is created that just has pointers to the
original copy of the data. Deduplication has largely been deployed in backup environments to reduce storage
capacity associated with keeping backup data, which is by nature highly duplicative. Deduplication can be
performed at the source file system (which requires server CPU and can drain performance in volatile virtualized
environments), inline as data is written (which often drains performance because the process happens during the
write, which cannot be committed until the operation is complete), or in a post process (which is often a scheduled,
batch-oriented process done off hours) in which the pointers are created. Space needs to be reserved to perform
the deduplication process, and the space that the duplicate data resided in needs to be reclaimed after the
deduplication process completes. Many IT organizations are hesitant to use deduplication in primary storage
environments because of the overhead associated with identifying duplicate data and the negative impact that may
have on the system’s file serving performance.
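For intuition, the following Python sketch shows a post-process pass in its simplest whole-file form: compute a SHA-256 content hash per file, keep the first copy of each unique hash, and replace later duplicates with a pointer (a hard link here). It is a simplification under assumed conditions, and the scan path is hypothetical; production engines such as the block-level approach described below deduplicate sub-file chunks rather than whole files.

```python
import hashlib
import os

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in 1MB chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def dedupe_tree(root: str) -> int:
    """Post-process pass: keep the first copy of each unique file and replace
    later duplicates with hard links (the 'pointer' to the original).
    Returns bytes reclaimed."""
    first_seen: dict[str, str] = {}  # content hash -> path of original copy
    reclaimed = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            key = sha256_of(path)
            if key in first_seen:
                size = os.path.getsize(path)
                os.remove(path)                 # the duplicate data goes away...
                os.link(first_seen[key], path)  # ...and a pointer remains
                reclaimed += size
            else:
                first_seen[key] = path
    return reclaimed

if __name__ == "__main__":
    print("Reclaimed", dedupe_tree("/mnt/vm_images"), "bytes")  # hypothetical path
```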
HDS has developed deduplication technology that mitigates much of the associated overhead and makes it viable to
use deduplication in a primary storage environment. Hitachi NAS hardware acceleration, inherent in its “Hybrid-core” architecture, helps calculate secure hash algorithm (SHA-256) values to speed dedupe comparisons without interfering with file sharing workflows (whether through NFS or SMB/CIFS). It also has intelligence that knows when new data is added and automatically starts up to four parallel deduplication engines if needed to eliminate redundant data. When file serving load reaches 50% of available IOPS, the deduplication engines throttle back to prevent impacting user performance, then automatically resume when the system is less busy. This unique and
patented approach to deduplication enables customers to enjoy the benefits of increased capacity efficiency and
reduced total cost of ownership provided by deduplication without compromising performance or scalability.
The HDS approach features data in-place deduplication. Data is stored as it normally would be. The deduplication
process then combs through that data, eliminating redundancy. Data in-place deduplication eliminates the need to
set aside capacity to be used as temporary deduplication “workspace,” minimizes the space needed to track
deduplicated data, and delivers greater ROI.
Deduplication can be as highly effective in virtual server environments as it is in backup environments because
virtual machines often have many of the same files, such as operating system images. In virtualized environments
(server and desktop), IT organizations can see as much as 90% capacity reduction through the use of deduplication.
Deduplication provides a big “bang for the buck” and offers one of the best ways to reduce the overall storage
footprint. HDS makes it a viable choice for primary storage.
Efficient Data Protection
Data protection can really pile on to storage management challenges, and the challenges are magnified in
virtualized environments. To manage copies efficiently, bidirectional block-level replication technologies that can
utilize the deduplicated storage pool should be used. By doing that, only the unique data elements are transmitted
to the other appropriate repositories. In this case, efficiency shows up in how little space is consumed (regardless of the number of copies) and how little network bandwidth is used (due to smarter discernment of what should be replicated). But it requires a management tier, such as HDS offers, that understands all of the storage assets across the enterprise.
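The sketch below shows that idea in miniature, assuming fixed-size chunks and an in-memory hash map standing in for the deduplicated pool; it is not HDS’s replication protocol. The source checks which chunk hashes the target already holds and ships only the missing ones.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size for this sketch

def chunks(data: bytes):
    for i in range(0, len(data), CHUNK):
        yield data[i:i + CHUNK]

def replicate(source: bytes, target_store: dict) -> list:
    """Send only chunks the target does not already hold.

    target_store maps chunk hash -> chunk bytes (the deduplicated pool).
    Returns the ordered hash list from which the target rebuilds the object.
    """
    recipe = []
    for chunk in chunks(source):
        h = hashlib.sha256(chunk).hexdigest()
        if h not in target_store:  # only unique data crosses the wire
            target_store[h] = chunk
        recipe.append(h)
    return recipe

# Two VM images sharing the same guest OS blocks but different application data.
pool: dict = {}
vm1 = b"A" * 8192 + b"app-data-1"
vm2 = b"A" * 8192 + b"app-data-2"
r1 = replicate(vm1, pool)
r2 = replicate(vm2, pool)
print(len(pool), "unique chunks stored for", len(r1) + len(r2), "logical chunks")
```

Replicating the second image ships only its unique application data, since its OS chunks already exist in the target pool.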
In virtualized environments, it is important that snapshots are performed at the VM level (as opposed to the LUN,
file, or file system level, in which IT administrators could risk cloning the wrong LUN or file thinking it was
associated with the VM). This level of granularity is not only efficient, but also effective. It allows for rapid virtual
machine and application cloning, with no additional scrubbing operations to get up and running. A highly efficient
approach, such as that taken by HDS, only stores pointers to the original data, and only unique data is added to the
clone. HDS supports a highly scalable model with up to 100 million snapshots per file system and 100 million clones
per file system.
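The pointer mechanics can be pictured with a toy copy-on-write model in Python; the block contents and clone API here are hypothetical, for intuition only, and do not represent Hitachi’s on-disk format.

```python
class PointerClone:
    """Toy copy-on-write clone: blocks are shared via pointers until rewritten."""

    def __init__(self, block_map=None):
        # Maps logical block number -> immutable data; copying the map copies
        # pointers, not block contents.
        self.block_map = dict(block_map or {})

    def clone(self):
        # Near-instant regardless of data size: only the pointer table is copied.
        return PointerClone(self.block_map)

    def write(self, block: int, data: bytes):
        # Only newly written (unique) data lands in this clone's map.
        self.block_map[block] = data

golden = PointerClone({0: b"guest OS image", 1: b"base config"})
vm_clone = golden.clone()            # shares every block with the parent
vm_clone.write(1, b"per-VM config")  # diverges only where written
print(golden.block_map[1], vm_clone.block_map[1])  # b'base config' b'per-VM config'
```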
An effective data protection strategy in a virtualized environment must be tightly integrated into the virtual
environment management tools to ensure the storage administrator and virtualization administrator are working in
concert, rather than at odds. Hitachi NAS Virtual Infrastructure Integrator (Virtual V2I) is a VMware vCenter plugin
plus associated software that addresses virtual machine backup, recovery, and cloning services. It allows users to create storage-based snapshots at intervals ranging from hours down to minutes between backups, resulting in improved
recovery point objectives. Because restores are pointer-based, recovery time can be near instantaneous (a matter
of seconds) regardless of size. Virtual V2I allows users to schedule and monitor VM backups to ensure they have an
application-consistent recoverable environment.
Leveraging space-saving snapshot and clone technology can significantly reduce the storage and network overhead
associated with data protection and copy management. But it isn’t just about data protection. Having an efficient
copy management engine can speed test and development as well as provisioning. In a dynamic virtual server
world, where new servers can be spun up easily and quickly, speeding provisioning or the deployment of new
applications or patches can provide businesses the high-tech edge they need to stay ahead of the pack in an
increasingly competitive world.
The Bigger Truth
Over the last decade, almost all areas of IT have been forced to adapt to transformations. Server virtualization is
now ubiquitous. Leading-edge IT organizations are now beginning to realize a much broader spectrum of benefits
from server virtualization initiatives, such as expanding virtualization to the next tier of applications, automating
manual tasks, and streamlining access to IT resources. All of these advantages, in turn, drive hard savings, such as
reduced OPEX and CAPEX (from deferred procurement as well as waste reduction), and soft savings from simplified
management, reduced downtime, and performance gains.
Server virtualization has spawned a need for change in other areas of IT infrastructure, perhaps most significantly in
storage. As noted in Figure 1, the biggest storage challenge associated with server virtualization among respondent
organizations is the capital cost of the storage infrastructure to support it. Storage costs can quickly eat away at any
CAPEX and OPEX savings achieved from virtualization initiatives. As we’ve observed for the past decade, server
virtualization accelerates storage growth. But we are only just beginning to see the impact of desktop virtualization
on storage, and the emerging picture does not bode well for storage administrators. When ESG surveyed storage
administrators who said desktop virtualization presented a storage challenge, 77% of them said that desktop
virtualization significantly increased storage capacity requirements, and 51% said it had a negative impact on
performance.
Taking a holistic view and consolidating the storage environment can help mitigate the storage costs associated
with supporting virtualized environments. But consolidation alone is not enough. For many storage vendors,
consolidation just means putting everything on a tier-1 storage system that tiers internally. A truly efficient
consolidation strategy ensures data is stored on the right tier (within a system to meet performance needs, or on a
separate long term archive tier for long tail data) at the right costs at the right time. And it means storing only one
copy of data, while creating space efficient copies to use as a basis for backup and restore operations. Combined,
this can significantly reduce the overall storage footprint and not only help organizations maintain the cost savings associated with virtualization initiatives, but also attain significant savings on the storage front. Not all users
will see a 90% reduction in capacity associated with deduplication, but a 20, 30, or 40% reduction would pay off
handsomely in the primary storage environment. Add that to the reclamation of tier-1 storage from migrating data
between tiers, and the savings multiply quickly.
20 Asylum Street | Milford, MA 01757 | Tel: 508.482.0188 Fax: 508.482.0218 | www.esg-global.com

Weitere ähnliche Inhalte

Andere mochten auch

IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
Hitachi Vantara
 

Andere mochten auch (7)

Accelerate the Business Value of Enterprise Storage
Accelerate the Business Value of Enterprise StorageAccelerate the Business Value of Enterprise Storage
Accelerate the Business Value of Enterprise Storage
 
Step 2: Back Up Less Datasheet
Step 2: Back Up Less DatasheetStep 2: Back Up Less Datasheet
Step 2: Back Up Less Datasheet
 
Cloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards InfographicCloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards Infographic
 
HDS Cloud Solutions Infographic
HDS Cloud Solutions Infographic HDS Cloud Solutions Infographic
HDS Cloud Solutions Infographic
 
Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...
Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...
Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...
 
Storage Analytics: Transform Storage Infrastructure Into a Business Enabler
Storage Analytics: Transform Storage Infrastructure Into a Business EnablerStorage Analytics: Transform Storage Infrastructure Into a Business Enabler
Storage Analytics: Transform Storage Infrastructure Into a Business Enabler
 
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
 

Mehr von Hitachi Vantara

Redefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud InfrastructureRedefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud Infrastructure
Hitachi Vantara
 
Hu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On World
Hitachi Vantara
 
Define Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist InfographicDefine Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist Infographic
Hitachi Vantara
 
Solve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperSolve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White Paper
Hitachi Vantara
 
HitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution ProfileHitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution Profile
Hitachi Vantara
 
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Hitachi Vantara
 
The Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White PaperThe Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White Paper
Hitachi Vantara
 
The Future of Convergence Paper
The Future of Convergence PaperThe Future of Convergence Paper
The Future of Convergence Paper
Hitachi Vantara
 
Hitachi white-paper-storage-virtualization
Hitachi white-paper-storage-virtualizationHitachi white-paper-storage-virtualization
Hitachi white-paper-storage-virtualization
Hitachi Vantara
 
Hitachi white-paper-ibm-mainframe-storage-compatibility-and-innovation-quick-...
Hitachi white-paper-ibm-mainframe-storage-compatibility-and-innovation-quick-...Hitachi white-paper-ibm-mainframe-storage-compatibility-and-innovation-quick-...
Hitachi white-paper-ibm-mainframe-storage-compatibility-and-innovation-quick-...
Hitachi Vantara
 

Mehr von Hitachi Vantara (20)

Webinar: What Makes a Smart City Smart
Webinar: What Makes a Smart City SmartWebinar: What Makes a Smart City Smart
Webinar: What Makes a Smart City Smart
 
Hyperconverged Systems for Digital Transformation
Hyperconverged Systems for Digital TransformationHyperconverged Systems for Digital Transformation
Hyperconverged Systems for Digital Transformation
 
Powering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data SystemsPowering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data Systems
 
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
 
Virtual Infrastructure Integrator Overview Presentation
Virtual Infrastructure Integrator Overview PresentationVirtual Infrastructure Integrator Overview Presentation
Virtual Infrastructure Integrator Overview Presentation
 
HDS and VMware vSphere Virtual Volumes (VVol)
HDS and VMware vSphere Virtual Volumes (VVol) HDS and VMware vSphere Virtual Volumes (VVol)
HDS and VMware vSphere Virtual Volumes (VVol)
 
Five Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud ExperienceFive Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud Experience
 
Economist Intelligence Unit: Preparing for Next-Generation Cloud
Economist Intelligence Unit: Preparing for Next-Generation CloudEconomist Intelligence Unit: Preparing for Next-Generation Cloud
Economist Intelligence Unit: Preparing for Next-Generation Cloud
 
HDS Influencer Summit 2014: Innovating with Information to Address Business N...
HDS Influencer Summit 2014: Innovating with Information to Address Business N...HDS Influencer Summit 2014: Innovating with Information to Address Business N...
HDS Influencer Summit 2014: Innovating with Information to Address Business N...
 
Information Innovation Index 2014 UK Research Results
Information Innovation Index 2014 UK Research ResultsInformation Innovation Index 2014 UK Research Results
Information Innovation Index 2014 UK Research Results
 
Redefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud InfrastructureRedefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud Infrastructure
 
Hu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On World
 
Define Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist InfographicDefine Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist Infographic
 
Solve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperSolve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White Paper
 
HitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution ProfileHitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution Profile
 
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
 
The Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White PaperThe Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White Paper
 
The Future of Convergence Paper
The Future of Convergence PaperThe Future of Convergence Paper
The Future of Convergence Paper
 
Hitachi white-paper-storage-virtualization
Hitachi white-paper-storage-virtualizationHitachi white-paper-storage-virtualization
Hitachi white-paper-storage-virtualization
 
Hitachi white-paper-ibm-mainframe-storage-compatibility-and-innovation-quick-...
Hitachi white-paper-ibm-mainframe-storage-compatibility-and-innovation-quick-...Hitachi white-paper-ibm-mainframe-storage-compatibility-and-innovation-quick-...
Hitachi white-paper-ibm-mainframe-storage-compatibility-and-innovation-quick-...
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

ESG Unstructured Data Efficiency and Cost Savings in Virtualized Server Environments Analyst Report

  • 1. White Paper Unstructured Data Efficiency and Cost Savings in Virtualized Server Environments By Terri McClure, Senior Analyst July 2013 This ESG White Paper was commissioned by Hitachi Data Systems (HDS) and is distributed under license from ESG. © 2013 by The Enterprise Strategy Group, Inc. All Rights Reserved.
  • 2. White Paper: Unstructured Data Efficiency and Cost Savings in Virtualized Server Environments 2 © 2013 by The Enterprise Strategy Group, Inc. All Rights Reserved. Contents Overview.......................................................................................................................................................3 Virtualization’s Impact on the Storage Environment ...................................................................................3 The Shift toward NAS for Virtualized Environments............................................................................................... 3 Storage Challenges in Virtualized Environments .................................................................................................... 4 Consolidation: Driving Efficiency in Virtualized Environments ....................................................................6 Automated Storage Tiering and Migration............................................................................................................. 7 Primary Deduplication............................................................................................................................................ 8 Efficient Data Protection ........................................................................................................................................ 9 The Bigger Truth .........................................................................................................................................10 All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.
  • 3. White Paper: Unstructured Data Efficiency and Cost Savings in Virtualized Server Environments 3 © 2013 by The Enterprise Strategy Group, Inc. All Rights Reserved. Overview Virtual server initiatives are straining already overwhelmed file storage environments. Unstructured data is growing faster than ever, thanks largely to the proliferation of endpoint capture devices and advances in hardware and software that allow bigger, richer files to be created. Virtualization is exacerbating the problem because many organizations are underpinning their virtualized environments for both server and desktop virtualization with file servers, since it can be much easier to manage the virtual server storage environment using NAS. This further accelerates the growth rate of unstructured data. This paper looks at the challenges associated with managing unstructured data in virtualized environments at scale, and how to get unstructured data under control through file server consolidation. It provides guidelines to help organizations understand what to look for in their consolidated file storage environments in order to make them as efficient as possible through deduplication, tiering, and migration while efficiently keeping data protected and meeting SLAs. Virtualization’s Impact on the Storage Environment Server virtualization—in other words, using software to divide a single physical server into multiple isolated virtual environments—is driving significant technology and process change across storage, disaster recovery, and management environments in enterprise organizations and small/medium size businesses alike. Server virtualization technology is driving demand for networked storage solutions due to the net increase in storage capacity requirements brought about by server virtualization initiatives. More importantly, the ability to realize many of the key benefits of server virtualization—such as the mobility of virtual machines between physical servers for load balancing, high availability, and maximum utilization of resources—fundamentally requires an underlying networked storage infrastructure. But supporting a virtual server environment introduces a number of storage challenges. First, with multiple virtual machines hosted on a single physical server, chances are good that the associated applications have differing storage policies. This can lead to some pretty complex storage provisioning exercises as storage is logically mapped and provisioned to each virtual machine. And then there is the performance aspect. The storage infrastructure must provide predictable performance scalability for the wide variety of mixed application workloads the virtual machines will drive, with a variety of I/O patterns—for example small, large, sequential, or random operations. Consider that virtual server data protection methods—which are often radically different than traditional physical server methods—need to be designed and tested. And consider the implications of supporting backup and recovery on a single physical machine that supports multiple virtual machines—kicking off backup for one physical server can spike CPU usage and starve the other machines of resources. And when routine maintenance is performed, instead of impacting a single application environment, multiple application environments are affected. 
ESG has seen instances in which ten or twenty (and in a few edge cases even more), virtual machines share a single physical server, all of which would need to be taken down or moved just to perform routine maintenance. This is really where the importance of networked storage comes in: keeping applications available during everything from routine maintenance to disaster handling by enabling virtual machines to move from physical server to physical server without losing access to data. The Shift toward NAS for Virtualized Environments In fact, many storage challenges associated with server virtualization can be mitigated by leveraging networked attached storage technologies. At their core, virtual machine and desktop images are files. Storing image files on NAS systems simplifies image management significantly: It removes multiple layers of storage management required in a block-based environment. Take the example of provisioning capacity in a Fibre Channel SAN environment. For a Fibre Channel SAN, a storage administrator needs to carve out and assign LUNs to each virtual machine hosted in the physical server; establish and manage switch ports and zones; map HBAs; set up multi-pathing; and cross-mount the appropriate LUNs and
  • 4. White Paper: Unstructured Data Efficiency and Cost Savings in Virtualized Server Environments 4 © 2013 by The Enterprise Strategy Group, Inc. All Rights Reserved. zones to multiple physical servers to allow for virtual machine portability. There is more to the process, but describing everything involved would make this a much longer and more technical paper. The point is: That’s a pretty complex and error-prone manual process. In these types of environments, all the mapping and zoning is typically tracked in spreadsheets. It can become an even more complex, time-consuming, and error-prone task as more virtual servers come online or as storage capacity is added and the environment needs to scale. Each time capacity is added, the whole process needs to be repeated. And when you consider the implications of each virtual machine having different protection requirements and performance characteristics, figuring out what LUNs are supporting which virtual machine to ensure appropriate timeliness of snapshots or to perform load balancing can become nearly impossible, especially at scale. In an NFS environment, once a file system is exported to a virtual machine and mounted, it travels with the virtual machine across physical servers, maintaining the relationship. And to add capacity, file system sizes can be expanded on the fly, with no downtime. And because users are managing information,—a file system for each virtual machine rather than a collection of HBAs, LUNs, and worldwide names—overall management is simplified. So, provisioning capacity is much simpler when you treat VMDK files as files! When it comes to NFS for data protection, the snapshot and remote replication capabilities of file systems are often used for improved recoverability, space efficiency, and speed (more on that later). With networked storage, multiple copies of virtual machines can be quickly created, efficiently stored and accessed for replication and disaster recovery purposes, and used to more efficiently perform bare-metal restores. To alleviate the issue with backing up the virtual machine, the backup load can be shifted from the physical server to the file server, leveraging snapshot copies to meet recovery point and time objectives. Storage Challenges in Virtualized Environments When ESG surveyed just over 400 North American IT professionals concerning their organizations’ current data storage environments, including current storage resources, challenges, purchase criteria, and forward-looking data storage plans, participants were asked about their “significant” storage challenges related to their virtual server environments. Almost half, or 43%, of participants indicated that the capital cost of a new storage infrastructure is a significant challenge, and more than one in four (28%) cited operational cost of storage related to server virtualization as a significant challenge (see Figure 1).1 1 Source: ESG Research Report, 2012 Storage Market Survey, November 2012.
  • 5. White Paper: Unstructured Data Efficiency and Cost Savings in Virtualized Server Environments 5 © 2013 by The Enterprise Strategy Group, Inc. All Rights Reserved. Figure 1. Storage Challenges Stemming from Server Virtualization Usage Source: Enterprise Strategy Group, 2013. In that same survey, respondents were asked about their biggest storage challenges in general, and their primary storage challenge in particular. Rapid growth and management of unstructured data was cited by 40% of respondents as a challenge and as the primary challenge by 15% of respondents. Data protection was close behind, with 39% of respondents citing it as a challenge and 11% citing it as their primary challenge. Also in the top five responses (out of 19 possible) were hardware costs, running out of physical space, and supporting a growing virtual server environment (see Figure 2).2 The influx of unstructured data associated with virtualized environments is certain to continue to strain the unstructured data storage environment as IT organizations struggle to scale and meet these varied and unpredictable workload requirements. 2 Ibid. 5% 14% 19% 22% 24% 28% 29% 36% 42% 43% 0% 10% 20% 30% 40% 50% We have not encountered any challenges Lack of scalability Sizing IOPS requirements to support virtual server environments Poor application response times Impact on overall volume of storage capacity Operational cost of new storage infrastructure Limited I/O bandwidth, especially when workload spikes occur Sizing true capacity (storage) required to support virtual server environment Disaster recovery strategy Capital cost of new storage infrastructure From a storage infrastructure perspective, which of the following would you consider to be significant challenges related to your organization’s server virtualization usage? (Percent of respondents, N=418, multiple responses accepted)
Figure 2. Top Ten Storage Environment Challenges, by 2012 Storage Budget

In general, what would you say are your organization's biggest challenges in terms of its storage environment? Which would you characterize as the primary storage challenge for your organization? (Percent of respondents, N=418; shown as all storage challenges / primary storage challenge)

Rapid growth and management of unstructured…: 40% / 15%
Supporting a growing virtual server environment: 39% / 10%
Data protection (e.g., backup/recovery, etc.): 39% / 11%
Hardware costs: 25% / 7%
Running out of physical space: 25% / 5%
Data migration: 25% / 4%
Staff costs: 20% / 5%
Management, optimization & automation of…: 19% / 5%
Lack of skilled staff resources: 19% / 6%
Discovery, analysis and reporting of storage…: 17% / 5%

Source: Enterprise Strategy Group, 2013.

Of course, all of this brings up a question for users: How do I rein in unstructured data growth, cost-effectively protect my data, and reduce my overall footprint while still maintaining service levels for my virtual server environment? Undertaking a comprehensive file server consolidation exercise can be an answer, but only if it is built on the right core principles.

Consolidation: Driving Efficiency in Virtualized Environments

Consolidation is the process of identifying and eliminating the legacy storage silos that have resulted from the way IT has managed data growth to date, and then putting in place best practices for managing the storage environment in a holistic manner that reduces the overall physical footprint (and cost) of data. Before diving into consolidation, let's look at how we arrived here. Why is rapid growth and management of unstructured data the top storage challenge for two in five of the IT organizations surveyed?

This unrelenting increase of data stems from natural application growth and from the new workloads being generated by social media; web 2.0 applications; and the creation of video, audio, photos, and similar content. Endpoint capture devices have proliferated hugely: A smartphone is in almost everyone's pocket. A tablet computer (business as much as personal) is in many people's laps. The ability to create and consume content requires nothing more than the press of a button. Websites and barcode readers collect more data each second, data that organizations slice and dice to identify what their customers need or, more accurately, what their customers will buy.

Big data is everywhere, and the rampant copying of data sets for analytics is only one reason for it. Other data-growth culprits include snapshots and remote replication to increase uptime and availability, and programs or initiatives to improve data protection and regulatory compliance. Those are good things, of course, but they certainly accelerate overall data growth rates.
Historically, the most common way to address the growth problem has been to throw even more storage capacity at it:

• You want copies for testing and development? Here's a server and some storage.
• You'd like offsite replication? We'll build another infrastructure stack.
• You need backup? We'll build another.
• You need an application server? We'll carve out a VM for you, and somehow we'll find the storage to provision for your virtual machine image and the data it is going to need.

That strategy results in ever-expanding, unsharable silos of storage that are usually poorly utilized. They cost more to buy; they take up more data center floor space; they use more energy to power and cool; and they require more staff to manage. All of these things are pretty much the opposite of the efficiency most IT organizations are after; yet too often, it has been easier to continue pouring money into a suboptimal solution than to "bite the bullet" and make things right for the longer term. But in this era of changing consumption models, throwing capacity at everything just won't work.3 It is also an ineffective way to spend money.

The first step in a comprehensive file storage consolidation strategy is to identify and eliminate these silos. This is not an easy task; in fact, many organizations attempt the effort and, at the end of the day, simply create bigger silos, albeit fewer of them. Without the right underlying technology, consolidation is only a Band-Aid that provides short-term relief: the inefficient silo problem still exists, and IT organizations pay more than they need to for their storage from both a CAPEX and an OPEX standpoint.

The underlying technology in any comprehensive consolidation strategy must be seamless, scalable, and efficient in order to truly eliminate silos altogether, but it also needs to deliver sufficient performance to maintain SLAs in unpredictable virtualized environments. It cannot trade off performance for efficiency, because too many workloads could be affected. That means seamless tiering, both within systems (the classic definition of tiering) and between them (which is required to eliminate silos). It also means efficient deduplication of primary data without a major performance impact, and the ability to maintain performance as the environment scales. Most importantly, to maintain SLAs in virtual server environments, it means tight integration with the management tools of those environments. Hitachi Data Systems offers such technology and can help IT organizations accomplish this.

Automated Storage Tiering and Migration

Automated storage tiering has been the topic of much discussion in the industry. Typically, when vendors discuss this capability, they mean tiering within an array: some combination of flash or solid-state storage for highly active data, (possibly) some serial-attached SCSI (SAS) drives, and (likely) the bulk of data on slower rotating, high-capacity, nearline SAS (NL-SAS) drives. This makes sense, as most data is active only within 30 days of creation and afterwards is retained yet rarely accessed (often called long tail data). In a traditional single-tier architecture, this would mean buying an array full of SAS drives to support the active data, and storing long tail data on the same expensive drives.
Even worse, it can mean buying a high-performance, highly available tier-1 flash array to support the highly active data, and storing the long tail data on that same tier-1 system (more on that later). Using a small amount of solid-state storage for active data, with a tier of high-capacity, slower rotating (hence less power-consuming) NL-SAS disks, is a highly effective way to reduce the overall storage footprint as well as cut power and cooling costs. In virtualized environments, where the workload is typically somewhat write-heavy (due to the way virtual servers cache data and stage I/Os) and heavily random, accessing many VMDKs creates a great deal of metadata activity. In fact, metadata operations can be very disk intensive and can make up as much as half of all file operations. Automatically moving metadata to a flash or SSD tier, as HDS does, can significantly improve performance in virtual server environments by speeding metadata lookups.

3 A portion of the text in the previous three paragraphs of this section is from the ESG White Paper, Hitachi Data Systems Storage: Improving Storage Efficiency Can Be a Catalyst for IT Innovation, June 2013.
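Tiering decisions of this kind, whether within an array or (as discussed next) between systems, reduce to evaluating simple policies over per-file metadata. The sketch below is illustrative only: the thresholds, tier names, and the use of os.stat() as the metadata source are assumptions, not how any array actually implements the feature.

```python
# Sketch: policy evaluation behind automated tiering placement decisions.
import os
import time

DAY = 86400  # seconds

def choose_tier(path: str) -> str:
    st = os.stat(path)
    age_days = (time.time() - st.st_atime) / DAY
    if age_days > 30:
        return "nl-sas"   # long tail data: rarely accessed, high-capacity tier
    if st.st_size <= 4096:
        return "ssd"      # small, metadata-like objects gain the most from flash
    return "sas"          # remaining active data on the middle tier

# Example: classify everything under a hypothetical export
for root, _dirs, files in os.walk("/exports/vm_datastore01"):
    for name in files:
        full = os.path.join(root, name)
        print(full, "->", choose_tier(full))
```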
Even with automated storage tiering within storage systems, IT organizations still find the bulk of their expensive tier-1 storage arrays taken up by long tail data. What is needed is automated tiering, based on user-defined policies, between storage systems. Storage vendors rarely discuss this because (a) vendors like to sell lots of tier-1 storage, and when it fills up, IT organizations need to buy more, and (b) most storage vendors simply don't have a good story when it comes to automatically migrating data off tier-1 arrays and onto secondary or tertiary tiers. But HDS does. HDS offers intelligent file tiering that allows IT organizations to search across the file environment and set policies that trigger automated migration between arrays, or even to a cloud tier such as Amazon S3. IT can set policies based on parameters like age, activity, or content type. Think of the power such functionality could have in virtual desktop environments, where users create many versions of documents that are rarely, if ever, accessed after 30 days. High-performing, highly available tier-1 storage systems need to be deployed to meet the demands of virtual desktop environments; moving user documents off tier-1 storage as they age or as their activity tails off allows IT organizations to reclaim tier-1 capacity to service active use cases. HDS claims users can reclaim up to 60% of primary storage capacity via automated migration in the virtual desktop use case.

Primary Deduplication

Deduplication is the process of identifying duplicate data as it is written to the file system and storing it just once, instead of every time the same data is written. In most cases, a "virtual" file is created that holds only pointers to the original copy of the data. Deduplication has largely been deployed in backup environments to reduce the storage capacity associated with keeping backup data, which is by nature highly duplicative. Deduplication can be performed at the source file system (which requires server CPU and can drain performance in volatile virtualized environments); inline, as data is written (which often drains performance because the process happens during the write, which cannot be committed until the operation is complete); or as a post-process (often a scheduled, batch-oriented job run off hours) in which the pointers are created. Space must be reserved to perform the deduplication process, and the space that the duplicate data occupied must be reclaimed after the process completes.

Many IT organizations are hesitant to use deduplication in primary storage environments because of the overhead associated with identifying duplicate data and the negative impact that may have on the system's file serving performance. HDS has developed deduplication technology that mitigates much of that overhead and makes deduplication viable in a primary storage environment. Hitachi NAS hardware acceleration, inherent in its "hybrid-core" architecture, helps calculate secure hash algorithm (SHA-256) values to speed dedupe comparisons without interfering with the file sharing workflow (whether through NFS or SMB/CIFS).
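The core mechanics just described (hash each piece of data, compare, keep one copy, and point to it) can be sketched in a few lines. The chunk size and the in-memory index below are illustrative stand-ins for the on-disk structures a real file server maintains; only the use of SHA-256 as the comparison hash mirrors the description above.

```python
# Sketch: block-level deduplication with SHA-256 content hashes.
import hashlib

CHUNK = 32 * 1024            # illustrative block size
store = {}                   # digest -> unique chunk, stored exactly once

def write_file(data: bytes) -> list[str]:
    """Store a file; return it as a list of pointers into the dedupe store."""
    pointers = []
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # keep only the first copy seen
        pointers.append(digest)
    return pointers

def read_file(pointers: list[str]) -> bytes:
    """Reassemble a file by following its pointers."""
    return b"".join(store[d] for d in pointers)

# Two identical "VM images" consume the space of one:
vm1 = write_file(b"guest OS image" * 10_000)
vm2 = write_file(b"guest OS image" * 10_000)
assert read_file(vm1) == read_file(vm2)
assert len(store) == len(vm1)   # no extra chunks stored for the second copy
```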
Hitachi NAS also has intelligence that detects when new data has been added and automatically starts up to four parallel deduplication engines as needed to eliminate redundant data. When the file serving load reaches 50% of available IOPS, the deduplication engines throttle back to prevent impacting user performance, then automatically resume when the system is less busy. This unique, patented approach to deduplication enables customers to enjoy the increased capacity efficiency and reduced total cost of ownership that deduplication provides, without compromising performance or scalability.

The HDS approach features data-in-place deduplication. Data is stored as it normally would be; the deduplication process then combs through that data, eliminating redundancy. Data-in-place deduplication eliminates the need to set aside capacity as a temporary deduplication "workspace," minimizes the space needed to track deduplicated data, and delivers greater ROI.

Deduplication can be as effective in virtual server environments as it is in backup environments because virtual machines often share many of the same files, such as operating system images. In virtualized environments (server and desktop), IT organizations can see as much as a 90% capacity reduction through the use of deduplication. Deduplication provides a big "bang for the buck" and offers one of the best ways to reduce the overall storage footprint; HDS makes it a viable choice for primary storage.
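The load-aware behavior described above can be sketched as a small scheduling function. The 50% ceiling and the four-engine maximum come from the description; the linear scaling of engine count with remaining headroom is an assumption made for illustration.

```python
# Sketch: throttling background deduplication engines under user load.
MAX_ENGINES = 4       # up to four parallel engines, per the description above
LOAD_CEILING = 0.50   # pause deduplication at 50% of available IOPS

def engines_to_run(current_iops: float, max_iops: float) -> int:
    load = current_iops / max_iops
    if load >= LOAD_CEILING:
        return 0      # yield entirely to user I/O; resume when load drops
    # Assumption: scale engine count with the headroom that remains.
    headroom = (LOAD_CEILING - load) / LOAD_CEILING
    return max(1, round(MAX_ENGINES * headroom))

print(engines_to_run(5_000, 100_000))   # light load    -> 4 engines
print(engines_to_run(45_000, 100_000))  # near ceiling  -> 1 engine
print(engines_to_run(60_000, 100_000))  # busy system   -> 0 engines
```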
Efficient Data Protection

Data protection can really pile onto storage management challenges, and those challenges are magnified in virtualized environments. To manage copies efficiently, bidirectional block-level replication technologies that can utilize the deduplicated storage pool should be used; that way, only the unique data elements are transmitted to the other appropriate repositories. Efficiency here means both how little space is consumed (regardless of the number of copies) and how little network bandwidth is consumed (due to smarter discernment of what should be replicated). But it requires a management tier, such as the one HDS provides, that understands all of the storage assets across the enterprise.

In virtualized environments, it is important that snapshots be performed at the VM level (as opposed to the LUN, file, or file system level, where IT administrators risk cloning the wrong LUN or file in the belief that it is associated with the VM). This level of granularity is not only efficient, but also effective: it allows for rapid virtual machine and application cloning, with no additional scrubbing operations required to get up and running. A highly efficient approach, such as that taken by HDS, stores only pointers to the original data, and only unique data is added to the clone. HDS supports a highly scalable model, with up to 100 million snapshots and 100 million clones per file system.

An effective data protection strategy in a virtualized environment must be tightly integrated with the virtual environment management tools to ensure that the storage administrator and the virtualization administrator work in concert rather than at odds. Hitachi NAS Virtual Infrastructure Integrator (V2I) is a VMware vCenter plugin, plus associated software, that addresses virtual machine backup, recovery, and cloning services. It allows users to create storage-based snapshots at intervals ranging from hours down to minutes between backups, improving recovery point objectives. Because restores are pointer-based, recovery time can be near instantaneous (a matter of seconds) regardless of size. V2I also allows users to schedule and monitor VM backups to ensure they have an application-consistent, recoverable environment.

Leveraging space-saving snapshot and clone technology can significantly reduce the storage and network overhead associated with data protection and copy management. But it isn't just about data protection: an efficient copy management engine can speed test and development as well as provisioning. In a dynamic virtual server world, where new servers can be spun up easily and quickly, faster provisioning and faster deployment of new applications or patches can give businesses the high-tech edge they need to stay ahead of the pack in an increasingly competitive world.
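A toy model shows why pointer-based snapshots and restores complete in near-constant time regardless of data size: taking or restoring a snapshot copies only the block map, never the blocks themselves. This is a conceptual sketch; no real product API is implied.

```python
# Sketch: snapshots as frozen pointer maps over shared, immutable blocks.
class Volume:
    def __init__(self):
        self.blocks = {}      # block number -> data object (shared, not copied)
        self.snapshots = {}   # snapshot name -> frozen pointer map

    def write(self, blkno: int, data: bytes):
        self.blocks[blkno] = data   # new data object; the old one stays
                                    # referenced by any snapshot pointing to it

    def snapshot(self, name: str):
        # Copy pointers, not data: cost tracks metadata size, not capacity.
        self.snapshots[name] = dict(self.blocks)

    def restore(self, name: str):
        # Near-instant rollback: swap the live pointer map for the frozen one.
        self.blocks = dict(self.snapshots[name])

vol = Volume()
vol.write(0, b"golden OS image")
vol.snapshot("golden")             # no data copied
vol.write(0, b"patched OS image")  # snapshot still references the old block
vol.restore("golden")
assert vol.blocks[0] == b"golden OS image"
```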
The Bigger Truth

Over the last decade, almost all areas of IT have been forced to adapt to transformation. Server virtualization is now ubiquitous, and leading-edge IT organizations are beginning to realize a much broader spectrum of benefits from server virtualization initiatives, such as expanding virtualization to the next tier of applications, automating manual tasks, and streamlining access to IT resources. All of these advantages, in turn, drive hard savings, such as reduced OPEX and CAPEX (from deferred procurement as well as waste reduction), and soft savings from simplified management, reduced downtime, and performance gains.

Server virtualization has spawned a need for change in other areas of IT infrastructure, perhaps most significantly in storage. As noted in Figure 1, the biggest storage challenge associated with server virtualization among respondent organizations is the capital cost of the storage infrastructure to support it, and storage costs can quickly eat away at any CAPEX and OPEX savings achieved from virtualization initiatives. As we have observed for the past decade, server virtualization accelerates storage growth. But we are only just beginning to see the impact of desktop virtualization on storage, and the emerging picture does not bode well for storage administrators: among the storage administrators ESG surveyed who said desktop virtualization presented a storage challenge, 77% said that it significantly increased storage capacity requirements, and 51% said it had a negative impact on performance.

Taking a holistic view and consolidating the storage environment can help mitigate the storage costs associated with supporting virtualized environments. But consolidation alone is not enough. For many storage vendors, consolidation just means putting everything on a tier-1 storage system that tiers internally. A truly efficient consolidation strategy ensures that data is stored on the right tier (within a system to meet performance needs, or on a separate long-term archive tier for long tail data) at the right cost at the right time. And it means storing only one copy of data, while creating space-efficient copies to use as a basis for backup and restore operations. Combined, these measures can significantly reduce the overall storage footprint and not only help organizations preserve the cost savings associated with virtualization initiatives, but also attain significant cost savings on the storage front. Not all users will see a 90% reduction in capacity from deduplication, but a 20, 30, or 40% reduction would pay off handsomely in the primary storage environment. Add to that the tier-1 capacity reclaimed by migrating data between tiers, and the savings multiply quickly.
20 Asylum Street | Milford, MA 01757 | Tel: 508.482.0188 Fax: 508.482.0218 | www.esg-global.com