White Paper
Unstructured Data Efficiency and Cost
Savings in Virtualized Server
Environments
By Terri McClure, Senior Analyst
July 2013
This ESG White Paper was commissioned by Hitachi Data Systems (HDS)
and is distributed under license from ESG.
© 2013 by The Enterprise Strategy Group, Inc. All Rights Reserved.
Contents
Overview
Virtualization’s Impact on the Storage Environment
The Shift toward NAS for Virtualized Environments
Storage Challenges in Virtualized Environments
Consolidation: Driving Efficiency in Virtualized Environments
Automated Storage Tiering and Migration
Primary Deduplication
Efficient Data Protection
The Bigger Truth
All trademark names are property of their respective companies. Information contained in this publication has been obtained from sources The
Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are
subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of
this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the
express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and,
if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.
Overview
Virtual server initiatives are straining already overwhelmed file storage environments. Unstructured data is growing
faster than ever, thanks largely to the proliferation of endpoint capture devices and advances in hardware and
software that allow bigger, richer files to be created. Virtualization exacerbates the problem because many
organizations underpin both their server and desktop virtualization environments with file servers, since it can be
much easier to manage virtual server storage using NAS. This further accelerates the growth rate of unstructured
data.
This paper looks at the challenges associated with managing unstructured data in virtualized environments at scale,
and how to get unstructured data under control through file server consolidation. It provides guidelines to help
organizations understand what to look for in their consolidated file storage environments in order to make them as
efficient as possible through deduplication, tiering, and migration while efficiently keeping data protected and
meeting SLAs.
Virtualization’s Impact on the Storage Environment
Server virtualization—in other words, using software to divide a single physical server into multiple isolated virtual
environments—is driving significant technology and process change across storage, disaster recovery, and
management environments in enterprise organizations and small/medium size businesses alike. Server
virtualization technology is driving demand for networked storage solutions due to the net increase in storage
capacity requirements brought about by server virtualization initiatives. More importantly, the ability to realize
many of the key benefits of server virtualization—such as the mobility of virtual machines between physical servers
for load balancing, high availability, and maximum utilization of resources—fundamentally requires an underlying
networked storage infrastructure.
But supporting a virtual server environment introduces a number of storage challenges. First, with multiple virtual
machines hosted on a single physical server, chances are good that the associated applications have differing
storage policies. This can lead to some pretty complex storage provisioning exercises as storage is logically mapped
and provisioned to each virtual machine. And then there is the performance aspect. The storage infrastructure
must provide predictable performance scalability for the wide variety of mixed application workloads the virtual
machines will drive, with a variety of I/O patterns—for example small, large, sequential, or random operations.
Consider that virtual server data protection methods—which are often radically different than traditional physical
server methods—need to be designed and tested. And consider the implications of supporting backup and recovery
on a single physical machine that supports multiple virtual machines—kicking off backup for one physical server can
spike CPU usage and starve the other machines of resources. And when routine maintenance is performed, instead
of impacting a single application environment, multiple application environments are affected. ESG has seen
instances in which ten or twenty (and in a few edge cases even more) virtual machines share a single physical
server, all of which would need to be taken down or moved just to perform routine maintenance. This is really
where the importance of networked storage comes in: keeping applications available during everything from
routine maintenance to disaster handling by enabling virtual machines to move from physical server to physical
server without losing access to data.
The Shift toward NAS for Virtualized Environments
In fact, many storage challenges associated with server virtualization can be mitigated by leveraging network-attached
storage technologies. At their core, virtual machine and desktop images are files. Storing image files on
NAS systems simplifies image management significantly: It removes multiple layers of storage management
required in a block-based environment.
Take the example of provisioning capacity in a Fibre Channel SAN environment. For a Fibre Channel SAN, a storage
administrator needs to carve out and assign LUNs to each virtual machine hosted in the physical server; establish
and manage switch ports and zones; map HBAs; set up multi-pathing; and cross-mount the appropriate LUNs and
zones to multiple physical servers to allow for virtual machine portability. There is more to the process, but
describing everything involved would make this a much longer and more technical paper. The point is: That’s a
pretty complex and error-prone manual process. In these types of environments, all the mapping and zoning is
typically tracked in spreadsheets. It can become an even more complex, time-consuming, and error-prone task as
more virtual servers come online or as storage capacity is added and the environment needs to scale. Each time
capacity is added, the whole process needs to be repeated. And when you consider the implications of each virtual
machine having different protection requirements and performance characteristics, figuring out what LUNs are
supporting which virtual machine to ensure appropriate timeliness of snapshots or to perform load balancing can
become nearly impossible, especially at scale.
In an NFS environment, once a file system is exported to a virtual machine and mounted, it travels with the virtual
machine across physical servers, maintaining the relationship. And to add capacity, file system sizes can be
expanded on the fly, with no downtime. And because users are managing information (a file system for each
virtual machine rather than a collection of HBAs, LUNs, and worldwide names), overall management is simplified.
So, provisioning capacity is much simpler when you treat VMDK files as files!
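The bookkeeping gap described above can be sketched in a toy Python model. Everything here is illustrative (the record names and host names are invented, and real environments involve far more state); the point is simply that the SAN-side records an administrator must track multiply with every physical server a virtual machine might move to, while an NFS export does not.

```python
# Toy model (not any vendor's API) contrasting the per-VM bookkeeping
# each approach requires. All record and host names are illustrative.

def fc_san_provision(vm, hosts):
    """Return the records an admin must create and keep in sync for a
    Fibre Channel SAN: a LUN and a zone per VM, plus an HBA mapping and
    a cross-mount entry for every physical server the VM may move to."""
    records = [f"lun:{vm}", f"zone:{vm}"]
    for host in hosts:  # VM portability => one pair of entries per host
        records += [f"hba-map:{vm}@{host}", f"mount:{vm}@{host}"]
    return records

def nfs_provision(vm, hosts):
    """For NFS, a single exported file system follows the VM across
    physical servers, so bookkeeping does not grow with host count."""
    return [f"export:{vm}"]

hosts = ["esx1", "esx2", "esx3"]
print(len(fc_san_provision("vm01", hosts)))  # 2 + 2*3 = 8 records to track
print(len(nfs_provision("vm01", hosts)))     # 1 record, regardless of hosts
```

Each added host or capacity expansion grows the first list (and the spreadsheet tracking it); the second stays a single entry.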
When it comes to NFS for data protection, the snapshot and remote replication capabilities of file systems are often
used for improved recoverability, space efficiency, and speed (more on that later). With networked storage,
multiple copies of virtual machines can be quickly created, efficiently stored and accessed for replication and
disaster recovery purposes, and used to more efficiently perform bare-metal restores. To alleviate the issue with
backing up the virtual machine, the backup load can be shifted from the physical server to the file server, leveraging
snapshot copies to meet recovery point and time objectives.
Storage Challenges in Virtualized Environments
When ESG surveyed just over 400 North American IT professionals concerning their organizations’ current data
storage environments, including current storage resources, challenges, purchase criteria, and forward-looking data
storage plans, participants were asked about their “significant” storage challenges related to their virtual server
environments. More than two in five participants (43%) indicated that the capital cost of a new storage infrastructure is
a significant challenge, and more than one in four (28%) cited operational cost of storage related to server
virtualization as a significant challenge (see Figure 1).[1]

[1] Source: ESG Research Report, 2012 Storage Market Survey, November 2012.
Figure 1. Storage Challenges Stemming from Server Virtualization Usage
Source: Enterprise Strategy Group, 2013.
In that same survey, respondents were asked about their biggest storage challenges in general, and their primary
storage challenge in particular. Rapid growth and management of unstructured data was cited by 40% of
respondents as a challenge and as the primary challenge by 15% of respondents. Data protection was close behind,
with 39% of respondents citing it as a challenge and 11% citing it as their primary challenge. Also in the top five
responses (out of 19 possible) were hardware costs, running out of physical space, and supporting a growing virtual
server environment (see Figure 2).[2]
The influx of unstructured data associated with virtualized environments is
certain to continue to strain the unstructured data storage environment as IT organizations struggle to scale and
meet these varied and unpredictable workload requirements.
[2] Ibid.
From a storage infrastructure perspective, which of the following would you consider to
be significant challenges related to your organization’s server virtualization usage?
(Percent of respondents, N=418, multiple responses accepted)

Capital cost of new storage infrastructure: 43%
Disaster recovery strategy: 42%
Sizing true capacity (storage) required to support virtual server environment: 36%
Limited I/O bandwidth, especially when workload spikes occur: 29%
Operational cost of new storage infrastructure: 28%
Impact on overall volume of storage capacity: 24%
Poor application response times: 22%
Sizing IOPS requirements to support virtual server environments: 19%
Lack of scalability: 14%
We have not encountered any challenges: 5%
Figure 2. Top Ten Storage Environment Challenges, by 2012 Storage Budget
Source: Enterprise Strategy Group, 2013.
Of course, all of this brings up a question for users: How do I rein in unstructured data growth, cost-effectively
protect my data, and reduce my overall footprint while still maintaining service levels for my virtual server
environment? Undertaking a comprehensive file server consolidation exercise can be an answer—but only if it is
built on the right core principles.
Consolidation: Driving Efficiency in Virtualized Environments
Consolidation is the process of identifying and eliminating legacy storage silos that are the result of the way IT has
managed data growth to date, and then putting in place best practices for managing the storage environment in a
holistic manner that reduces the overall physical footprint (and costs) of data.
Before diving into consolidation, let’s look at how we’ve arrived here. Why is rapid growth and management of
unstructured data the top storage challenge for two in five IT organizations surveyed? This unrelenting increase
of data stems from natural application growth and from the new workloads being generated by social media; web
2.0 applications; and the creation of video, audio, photos, and similar content. Endpoint capture devices have
proliferated hugely: A smartphone is in almost everyone’s pocket. A tablet computer (business as much as
personal) is in many people’s laps. The ability to create and consume content requires nothing more than the press
of a button. Websites and barcode readers collect more data each second—data that organizations slice and dice to
identify what their customers need, or more accurately, what their customers will buy.
Big data is everywhere, and the rampant copying of data sets for analytics is only one reason for it. Other data-
growth culprits include snapshots and remote replication to increase uptime and availability, and programs or
initiatives to improve data protection and regulatory compliance. Those are good things, of course, but they
certainly accelerate overall data growth rates.
In general, what would you say are your organization’s biggest challenges in terms of its
storage environment? Which would you characterize as the primary storage challenge for
your organization? (Percent of respondents, N=418)

All storage challenges / Primary storage challenge:
Rapid growth and management of unstructured… : 40% / 15%
Data protection (e.g., backup/recovery, etc.): 39% / 11%
Supporting a growing virtual server environment: 39% / 10%
Hardware costs: 25% / 7%
Running out of physical space: 25% / 5%
Data migration: 25% / 4%
Staff costs: 20% / 5%
Management, optimization & automation of… : 19% / 5%
Lack of skilled staff resources: 19% / 6%
Discovery, analysis and reporting of storage… : 17% / 5%
Historically, the most common way to address the growth problem has been to toss even more storage capacity at
it:
You want copies for testing and development? Here’s a server and some storage.
You’d like offsite replication? We’ll build another infrastructure stack.
You need backup? We’ll build another.
You need an application server? We’ll carve out a VM for you, and somehow we’ll find the storage to
provision for your virtual machine image and the data it is going to need.
That strategy results in ever-expanding, unsharable silos of storage that are usually poorly utilized. They cost more
to buy; they take up more data center floor space; they use more energy to power and cool; and they require more
staff to manage. All these things are pretty much the opposite of efficiency, which is what most IT organizations are
after; yet too often, it was easier to continue to pour money into a suboptimal solution than “bite the bullet” and
make things right for the longer term. But in this era of changing consumption models, throwing capacity at
everything just won’t work.[3] It’s also an ineffective way to spend money.
The first step in a comprehensive file storage consolidation strategy is to identify and eliminate these silos. This is
not an easy task—in fact, many organizations attempt this effort and at the end of the day, they just create bigger
silos, albeit fewer of them. But without the right underlying technology, this is only a Band-Aid that will provide
short-term relief—the inefficient silo problem still exists, and IT organizations pay more than they need to for their
storage from both a CAPEX and OPEX standpoint. The underlying technology in any comprehensive consolidation
strategy must be seamless, scalable, and efficient in order to truly eliminate silos altogether, but it also needs to
support sufficient performance to maintain SLAs in unpredictable virtualized environments. It can’t trade off
performance for efficiency because too many workloads could be affected. That means seamless tiering, both
within (as the classic definition of tiering has been) and between (which is required to eliminate silos) systems. It
also means efficient deduplication of primary data without a major performance hit, and the ability to
maintain performance as the environment scales. But most importantly, to maintain SLAs in virtual server
environments, it means tight integration into the tools of those environments. Hitachi Data Systems offers such
technology and can help IT organizations accomplish this.
Automated Storage Tiering and Migration
Automated storage tiering has been the topic of much discussion in the industry. Typically, when vendors discuss
this capability, they mean tiering within an array and using some combination of flash or solid-state storage for
highly active data with (possibly) some serial-attached SCSI (SAS) drives and (likely) the bulk of data on slower
rotating, high capacity, nearline SAS (NL-SAS) drives. This makes sense as most data is only active within 30 days of
creation, and afterwards is retained yet rarely accessed (this is often called long tail data). In a traditional single tier
architecture, this would mean buying an array full of SAS drives to support the active data, and storing long tail data
on the same expensive drives. Even worse, it means buying a high-performance, highly available tier-1 flash array
to support the highly active data and storing the long tail data on that same tier-1 system—but more on that later.
Using a small amount of solid-state storage for active data, with a tier of high capacity, slower rotating (hence less
power-consuming) NL-SAS disks is a highly effective way to reduce the overall storage footprint as well as cut power
and cooling costs.
In virtualized environments, where workloads are typically somewhat write-heavy (due to the way virtual servers
cache data and stage I/Os) and highly random, accessing many VMDKs creates a lot of metadata activity. In fact,
metadata operations can be very disk intensive and can make up as much as half of all file operations.
Automatically moving metadata to a flash or SSD tier, as HDS does, can significantly improve performance in virtual
server environments by speeding metadata lookups.
[3] A portion of the text in the previous three paragraphs of this section is from the ESG White Paper, Hitachi Data Systems Storage: Improving Storage Efficiency Can Be a Catalyst for IT Innovation, June 2013.
Even with automated storage tiering within storage systems, IT organizations still find themselves with the
challenge of having the bulk of their expensive, tier-1 storage arrays taken up by long tail data. The need is for
automated tiering, based on user-defined policies, between storage systems. This is rarely discussed by storage
vendors because (a) storage vendors like to sell lots of tier-1 storage and when it fills up, the IT organizations need
to buy more, and (b) most storage vendors just don’t have a good story when it comes to automatically migrating
data off of tier-1 arrays and onto secondary or tertiary tiers. But HDS does.
HDS offers intelligent file tiering that allows IT organizations to search across the file environment and set policies
that will trigger automated migration between arrays or even to a cloud tier such as Amazon S3. IT can set policies
based on parameters like age, activity, or content type.
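A policy engine of this general shape can be sketched in a few lines of Python. The thresholds, field names, and tier names below are invented for illustration; they are not HDS parameters, just one way to express "age, activity, or content type" rules as code.

```python
import time

# Hypothetical tiering-policy sketch: recently active data stays on
# tier 1, aging data moves to a capacity tier, and cold archival
# content types go to a cloud tier. All names/thresholds are invented.

DAY = 86400  # seconds

def select_tier(f, now=None):
    """Pick a target tier for one file record f, a dict with a
    'last_access' timestamp (seconds) and a content 'type' string."""
    now = now or time.time()
    idle_days = (now - f["last_access"]) / DAY
    if f["type"] in ("video", "audio") and idle_days > 90:
        return "cloud"        # e.g., an S3-style object tier
    if idle_days > 30:
        return "tier2-nlsas"  # long tail data moves off tier-1
    return "tier1"

now = 1_000_000_000
hot  = {"last_access": now - 1 * DAY,   "type": "doc"}
warm = {"last_access": now - 45 * DAY,  "type": "doc"}
cold = {"last_access": now - 180 * DAY, "type": "video"}
print(select_tier(hot, now), select_tier(warm, now), select_tier(cold, now))
# tier1 tier2-nlsas cloud
```

A real implementation would evaluate such rules continuously across the file environment and trigger migrations between arrays or to the cloud tier; the sketch only shows the policy decision itself.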
Think of the power such functionality could have in virtual desktop environments, where users are creating many
versions of documents that are rarely, if ever, accessed after 30 days. High-performing, highly available tier-1
storage systems need to be deployed to meet the demands of virtual desktop environments. Moving user
documents off of tier-1 storage as they age or their activity tails off allows IT organizations to reclaim tier-1 storage
capacity to service active use cases. HDS claims users can reclaim up to 60% of primary storage capacity via
automated migration in the virtual desktop use case.
Primary Deduplication
Deduplication is the process of identifying duplicate data as it is written to the file system and storing it just once,
instead of every time the same data is written. In most cases, a “virtual” file is created that just has pointers to the
original copy of the data. Deduplication has largely been deployed in backup environments to reduce storage
capacity associated with keeping backup data, which is by nature highly duplicative. Deduplication can be
performed at the source file system (which requires server CPU and can drain performance in volatile virtualized
environments), inline as data is written (which often drains performance because the process happens during the
write, which cannot be committed until the operation is complete), or as a post-process (often a scheduled,
batch-oriented job run off hours) in which the pointers are created. Space needs to be reserved to perform
the deduplication process, and the space that the duplicate data resided in needs to be reclaimed after the
deduplication process completes. Many IT organizations are hesitant to use deduplication in primary storage
environments because of the overhead associated with identifying duplicate data and the negative impact that may
have on the system’s file serving performance.
HDS has developed deduplication technology that mitigates much of the associated overhead and makes it viable to
use deduplication in a primary storage environment. Hitachi NAS hardware acceleration, inherent with its “Hybrid-
core” architecture, helps calculate secure hash algorithm (SHA-256) values to speed dedupe comparisons without
interfering with file sharing workflow (whether through NFS or SMB/CIFS). It also has intelligence that knows when
new data is added and automatically starts up to four parallel deduplication engines if needed to eliminate
redundant data. When file serving load reaches 50% of available IOPS, the deduplication engines throttle back to
prevent impacting user performance, then automatically resume when the system is less busy. This unique and
patented approach to deduplication enables customers to enjoy the benefits of increased capacity efficiency and
reduced total cost of ownership provided by deduplication without compromising performance or scalability.
The HDS approach features data in-place deduplication. Data is stored as it normally would be. The deduplication
process then combs through that data, eliminating redundancy. Data in-place deduplication eliminates the need to
set aside capacity to be used as temporary deduplication “workspace,” minimizes the space needed to track
deduplicated data, and delivers greater ROI.
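A minimal sketch of this post-process, data-in-place approach follows, assuming a simple block list in place of a real file system: blocks are hashed with SHA-256 (as in the Hitachi NAS description above) and duplicates collapse into pointers to the first copy. The store layout and the 50%-load throttle check are simplifications for illustration, not HDS's implementation.

```python
import hashlib

# Sketch of post-process, data-in-place deduplication: data has already
# landed normally; a background pass hashes each block and replaces
# duplicates with pointers to the first copy. Simplified, not HDS code.

def dedupe_pass(blocks, load_pct=0):
    """blocks: list of bytes. Returns (unique_store, pointer_map) where
    pointer_map[i] is the index in unique_store holding block i's data.
    Returns None when file-serving load is too high (throttle)."""
    if load_pct >= 50:            # yield to user I/O, resume when idle
        return None
    seen, store, pointers = {}, [], []
    for blk in blocks:
        digest = hashlib.sha256(blk).hexdigest()
        if digest not in seen:    # first copy: keep the actual data
            seen[digest] = len(store)
            store.append(blk)
        pointers.append(seen[digest])  # duplicates become pointers
    return store, pointers

blocks = [b"os-image", b"app-data", b"os-image", b"os-image"]
store, ptrs = dedupe_pass(blocks)
print(len(store), ptrs)  # 2 [0, 1, 0, 0] -- four blocks stored as two
```

The space the duplicates occupied would then be reclaimed; because the data was already in place, no temporary "workspace" capacity had to be reserved up front.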
Deduplication can be as highly effective in virtual server environments as it is in backup environments because
virtual machines often have many of the same files, such as operating system images. In virtualized environments
(server and desktop), IT organizations can see as much as 90% capacity reduction through the use of deduplication.
Deduplication provides a big “bang for the buck” and offers one of the best ways to reduce the overall storage
footprint. HDS makes it a viable choice for primary storage.
Efficient Data Protection
Data protection can really pile on to storage management challenges, and the challenges are magnified in
virtualized environments. To manage copies efficiently, bidirectional block-level replication technologies that can
utilize the deduplicated storage pool should be used. By doing that, only the unique data elements are transmitted
to the other appropriate repositories. Efficiency then shows up in how little space is consumed (regardless of the
number of copies) and how little network bandwidth is used (because only data that truly needs replicating is sent).
But this approach requires a management tier that understands all of the storage assets across the enterprise, such
as the one HDS provides.
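The idea of transmitting only unique elements out of a deduplicated pool can be sketched as follows. This is a hypothetical illustration, not a real replication API: the target repository is modeled as the set of SHA-256 digests it already holds, and only blocks missing from that set cross the wire.

```python
import hashlib

# Sketch of dedupe-aware block replication: exchange hashes first, then
# transmit only the blocks the target lacks. Names are illustrative.

def replicate(source_blocks, target_hashes):
    """Return the blocks that actually need to be transmitted, given
    the set of SHA-256 digests the target repository already holds."""
    to_send = []
    for blk in source_blocks:
        h = hashlib.sha256(blk).hexdigest()
        if h not in target_hashes:
            to_send.append(blk)
            target_hashes.add(h)  # target will hold it after transfer
    return to_send

source = [b"base-image", b"vm1-delta", b"base-image", b"vm2-delta"]
target = {hashlib.sha256(b"base-image").hexdigest()}  # already replicated
print(len(replicate(source, target)))  # 2 -- only the unique deltas move
```

However many copies reference the shared base image, it is transmitted (and stored) once; only the per-VM deltas consume replication bandwidth.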
In virtualized environments, it is important that snapshots are performed at the VM level (as opposed to the LUN,
file, or file system level, in which IT administrators could risk cloning the wrong LUN or file thinking it was
associated with the VM). This level of granularity is not only efficient, but also effective. It allows for rapid virtual
machine and application cloning, with no additional scrubbing operations to get up and running. A highly efficient
approach, such as that taken by HDS, only stores pointers to the original data, and only unique data is added to the
clone. HDS supports a highly scalable model with up to 100 million snapshots per file system and 100 million clones
per file system.
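The pointer-based clone model can be illustrated with a toy copy-on-write volume in Python. This is a sketch of the general technique, not HDS's file-system code: a clone starts as nothing but a pointer to its parent and stores only the blocks that are later overwritten.

```python
# Toy copy-on-write clone: a clone begins as pointers to the parent's
# blocks and consumes space only for blocks it overwrites. Illustration
# of the pointer-based approach described above, not file-system code.

class Volume:
    def __init__(self, blocks=None, parent=None):
        self.blocks = dict(blocks or {})  # block_id -> data (unique data only)
        self.parent = parent              # where unmodified reads resolve

    def clone(self):
        return Volume(parent=self)        # instant: no data is copied

    def read(self, block_id):
        if block_id in self.blocks:       # block diverged from parent
            return self.blocks[block_id]
        return self.parent.read(block_id) if self.parent else None

    def write(self, block_id, data):
        self.blocks[block_id] = data      # only changed blocks use space

base = Volume({0: b"kernel", 1: b"config"})
vm_clone = base.clone()                   # near-instant snapshot/clone
vm_clone.write(1, b"config-v2")           # diverge a single block
print(vm_clone.read(0), vm_clone.read(1), len(vm_clone.blocks))
# b'kernel' b'config-v2' 1
```

Creation cost and initial footprint are near zero regardless of volume size, which is why snapshot-based restores can be near instantaneous and why millions of clones per file system become feasible.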
An effective data protection strategy in a virtualized environment must be tightly integrated into the virtual
environment management tools to ensure the storage administrator and virtualization administrator are working in
concert, rather than at odds. Hitachi NAS Virtual Infrastructure Integrator (V2I) is a VMware vCenter plugin
plus associated software that provides virtual machine backup, recovery, and cloning services. It allows users to
create storage-based snapshots at intervals ranging from hours down to minutes between backups, improving
recovery point objectives. Because restores are pointer-based, recovery time can be near instantaneous (a matter
of seconds) regardless of size. V2I allows users to schedule and monitor VM backups to ensure they have an
application-consistent, recoverable environment.
Leveraging space-saving snapshot and clone technology can significantly reduce the storage and network overhead
associated with data protection and copy management. But it isn’t just about data protection. Having an efficient
copy management engine can speed test and development as well as provisioning. In a dynamic virtual server
world, where new servers can be spun up easily and quickly, speeding provisioning or the deployment of new
applications or patches can provide businesses the high-tech edge they need to stay ahead of the pack in an
increasingly competitive world.
The Bigger Truth
Over the last decade, almost all areas of IT have been forced to adapt to transformations. Server virtualization is
now ubiquitous. Leading-edge IT organizations are now beginning to realize a much broader spectrum of benefits
from server virtualization initiatives, such as expanding virtualization to the next tier of applications, automating
manual tasks, and streamlining access to IT resources. All of these advantages, in turn, drive hard savings, such as
reduced OPEX and CAPEX (from deferred procurement as well as waste reduction), and soft savings from simplified
management, reduced downtime, and performance gains.
Server virtualization has spawned a need for change in other areas of IT infrastructure, perhaps most significantly in
storage. As noted in Figure 1, the biggest storage challenge associated with server virtualization among respondent
organizations is the capital cost of the storage infrastructure to support it. Storage costs can quickly eat away at any
CAPEX and OPEX savings achieved from virtualization initiatives. As we’ve observed for the past decade, server
virtualization accelerates storage growth. But we are only just beginning to see the impact of desktop virtualization
on storage, and the emerging picture does not bode well for storage administrators. When ESG surveyed storage
administrators that said desktop virtualization presented a storage challenge, 77% of them said that desktop
virtualization significantly increased storage capacity requirements, and 51% said it had a negative impact on
performance.
Taking a holistic view and consolidating the storage environment can help mitigate the storage costs associated
with supporting virtualized environments. But consolidation alone is not enough. For many storage vendors,
consolidation just means putting everything on a tier-1 storage system that tiers internally. A truly efficient
consolidation strategy ensures data is stored on the right tier (within a system to meet performance needs, or on a
separate long term archive tier for long tail data) at the right costs at the right time. And it means storing only one
copy of data, while creating space efficient copies to use as a basis for backup and restore operations. Combined,
this can significantly reduce the overall storage footprint and not only help organizations maintain the cost savings
associated with virtualization initiatives, but also attain significant cost savings on the storage front. Not all users
will see a 90% reduction in capacity associated with deduplication, but a 20, 30, or 40% reduction would pay off
handsomely in the primary storage environment. Add that to the reclamation of tier-1 storage from migrating data
between tiers, and the savings multiply quickly.
20 Asylum Street | Milford, MA 01757 | Tel: 508.482.0188 Fax: 508.482.0218 | www.esg-global.com