The buyers' guide to virtual + physical data protection
by Greg Shields, MS MVP & VMware vExpert
Chapter 1: Business goals
for data protection
Your data center is constantly becoming more complicated. Today,
the proliferation of virtualization—for both virtual server and virtual
desktop projects—has added an all-new layer of complexity to the picture.
What used to be simple processes and technologies—such as data protection—
have become bifurcated and chaotic. Now, on top of the tools and techniques
you use to protect your physical data assets, you’re being forced to adopt an
entirely new set of solutions to protect your virtual data assets.
Making the situation worse is that data protection techniques have never done a
particularly good job of actually meeting your business goals. Unreliable backup media,
awkward backup management processes, difficult-to-test disaster recovery
techniques, and time-consuming recovery workflows have always presented
business risks. Toss in the new challenges presented by virtualization, and you’re
looking at a recipe for business disaster.

But there’s a better way: New techniques, new technologies, and new solutions
exist not only to make data protection easier, faster, and more reliable but
also to better integrate the independent tasks of physical and virtual machine
protection. This guide will help you sort through the new as well as the more
traditional approaches that you’re probably already familiar with, providing the
background you need to select the right approach for your environment. This
chapter focuses on defining the true operational and business goals for data
protection, then using that definition to create a template for comparison and
evaluation of the various approaches and solutions.
Fast recovery: Obtaining the shortest possible RTO
The Recovery Time Objective, or RTO, is your organization’s stated tolerance
for downtime while a restoration is in process. Most organizations will define
several RTOs for different scenarios: Restoring a single file, for example, might
have a shorter expected time than restoring an entire server, which might be
shorter than restoring an entire data center.
Regardless of the operation, you always want the shortest possible RTO.
The RTO is often heavily weighted by “overhead”: the amount
of time it takes to retrieve tapes, load the tapes, spin through the tapes to find
the needed data, and so on. Only a fraction of the total restoration time is spent
actually copying data from the backups, although the physical copy time can
also, with some approaches, be a major component of the total RTO.

Shortening the RTO, then, will require you to reduce or eliminate overhead
as much as possible as well as speed the physical copy operation as much as
possible. For example:

• Your organization might rely more heavily on online and near-line
storage, such as disk arrays, for backup data. Disks are faster than tape,
providing a shorter restoration time.

• You might utilize data protection solutions that provide direct access to
the backed-up data without requiring it to be restored. For simple files
and folders, this process is pretty straightforward; for database-oriented
data, such as Exchange messages or SQL Server tables, this approach
might require a data protection solution that can natively mount
databases without requiring them to be restored to a live server.

• Maintaining a full, online index of backed-up data makes it faster to locate
the data you need to restore. As finding the desired data is often one of the
most time-consuming parts of a restore, such an index can significantly
lower your RTO (see the sketch at the end of this section).

• A data protection approach that embraces and leverages automation as
much as possible will also help reduce the RTO. Manual labor is always
slower, more error-prone, and less consistent; automation will always
speed up the process and reduce or eliminate wasted effort.

The more you can reduce the overhead associated with a restore, the faster
an RTO you can set. Moving away from slower, sequential media such as
tape and moving toward random-access media such as disk or SSD will also
help shorten the RTO by reducing the amount of time the physical data copy
requires to complete.
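To make the index idea concrete, here is a minimal sketch of what a backup catalog might look like, assuming a simple SQLite table; the schema and field names are illustrative inventions, not any particular product’s format. The point is that locating an item becomes an index lookup rather than a scan through backup media.

```python
import sqlite3

# Hypothetical backup catalog: finding an item becomes a quick index
# lookup instead of spinning through tapes or scanning archives.
conn = sqlite3.connect("catalog.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS backup_items (
        path       TEXT,    -- original location of the item
        backed_up  TEXT,    -- ISO-8601 timestamp of the backup
        media_id   TEXT,    -- which tape/disk archive holds it
        offset     INTEGER  -- where in that archive it lives
    )
""")
conn.execute("INSERT INTO backup_items VALUES (?, ?, ?, ?)",
             (r"\\fileserver\sales\forecast.xlsx",
              "2012-05-01T22:00:00", "TAPE-0042", 1048576))

def locate(path):
    """Return (media_id, offset) for the most recent copy of a file."""
    return conn.execute(
        "SELECT media_id, offset FROM backup_items "
        "WHERE path = ? ORDER BY backed_up DESC LIMIT 1",
        (path,),
    ).fetchone()

# Find the newest copy of a document without touching any backup media.
print(locate(r"\\fileserver\sales\forecast.xlsx"))  # ('TAPE-0042', 1048576)
```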
The least data at risk: Achieving the shortest possible RPO
The Recovery Point Objective, or RPO, is essentially a business’s statement of
how much data it is willing to have at risk. Many organizations routinely
accept an entire day’s worth of data as their RPO, meaning that if a disaster
struck, the point they could recover to would usually be the previous evening’s
backup. That’s a pretty long RPO, and it’s largely based on what traditional
data protection approaches can deliver rather than what businesses are really
comfortable with.
Frankly, your pie-in-the-sky value for the RPO should be “zero,” or very close to
it. In other words, your business doesn’t want to lose any data, ever. You want
to be able to restore your environment, or any given piece of data, to the way it
was immediately before a failure or unintended change. This point is important:
When defining business goals, don’t worry about what technology is capable of.
Accept the fact that you might not be able to fully achieve your business goal—
which should definitely be “zero data at risk, ever”—but certainly aim high.

Zero at-risk data isn’t easy to achieve, but there are definitely approaches to
consider to help your organization come close:
• Continuous data protection is going to be required, meaning you
dispense with traditional backup windows and instead back up
everything, all the time, as changes occur. You might still have to tolerate
a few minutes of at-risk data, but that’s far better than an entire day or
more. For some tasks, organizations might already be achieving this
goal. For example, data clustering and mirroring technologies can often
provide zero at-risk data by having a redundant data set available at all
times. Although it can be expensive, it’s certainly an existing option.
• Because continuous data protection will necessarily involve online storage
and some kind of server, which is a potential failure point, you’ll need to
build in redundancy so that your backups are protected.

Low overhead: Enabling easy day-to-day restoration

Organizations tend to build data protection approaches with disaster in mind,
meaning they’re often constructed primarily to recover entire systems. But
day-to-day restoration of individual files, email messages, and other data is
what most organizations deal with most of the time. Much of the overhead and
inefficiency in existing data protection approaches comes from the fact that
you’re using what is essentially a disaster recovery system to restore individual
files, emails, and so on. It’s like buying a new car every time you get a chip in the
paint—not terribly efficient.

A modern data protection approach should make those day-to-day restorations
as easy as possible. For some organizations, that might even extend to end-user
self-service restoration mechanisms, but most organizations will prefer to
retain control over data restoration. Restoration does, after all, raise potential
operational, data integrity, and security concerns, so having the IT staff handle
restores helps to centralize the effort and avoid human error.

That said, the IT staff needs to be able to easily and quickly perform day-to-day
restores. Many of the approaches that can help shorten the RTO can also ease
day-to-day restore tasks:
• A data protection solution that provides direct access to the backed-up
data without requiring it to be restored will also ease day-to-day restore
tasks. Again, this approach might simply entail mounting the backed-up
data as a browse-able disk volume that doesn’t require actually restoring
the data. In other words, this method enables IT staff to access the data
directly from the backup archive, even if the data is in a database file such
as an Exchange mail store or SQL Server database.
• Backed-up data would ideally be time-stamped, allowing you to recover
not only the most recent version but also any specific version of that
data from the recent past. This functionality becomes important when
you’re relying on continuous data protection because if someone makes
an unwanted change that is immediately backed up, you obviously don’t
want to rely on that most recent version of the backup to undo the
change.
• An index of the backed-up data would, again, make it faster to locate the
data to be restored.
• Storing backups in an online or near-line fashion, rather than offline on
magnetic tape, would also speed day-to-day restores by simply making
the backup archive more readily available.
The more that can be done to reduce the overhead and time of day-to-day
restores, the better.
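To illustrate the time-stamped, versioned capture described in the list above, here is a minimal sketch; the class and method names are invented, and an in-memory dictionary stands in for a real backup repository. Every change is recorded as it occurs, and a restore can ask for the state as of any moment—not just the latest copy.

```python
import bisect
from datetime import datetime

class VersionedStore:
    """Toy continuous-protection store: keeps every version, time-stamped."""

    def __init__(self):
        self._versions = {}  # path -> sorted list of (timestamp, content)

    def capture(self, path, content, when=None):
        """Record a change as it occurs (continuous data protection)."""
        when = when or datetime.utcnow()
        self._versions.setdefault(path, []).append((when, content))

    def restore(self, path, as_of=None):
        """Return the newest version at or before `as_of` (default: latest)."""
        versions = self._versions.get(path, [])
        if not versions:
            return None
        if as_of is None:
            return versions[-1][1]
        # Find the last version whose timestamp is <= as_of.
        idx = bisect.bisect_right([ts for ts, _ in versions], as_of)
        return versions[idx - 1][1] if idx else None

store = VersionedStore()
store.capture("report.docx", b"good draft", datetime(2012, 5, 1, 9, 0))
store.capture("report.docx", b"accidental overwrite", datetime(2012, 5, 1, 9, 5))
# The unwanted change was backed up too, so restore to just before it:
print(store.restore("report.docx", as_of=datetime(2012, 5, 1, 9, 4)))
```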
Reliability: Ensuring easy, fast, testable disaster recovery
Having easier day-to-day single-item recovery doesn’t mean that whole-system
disaster recovery is any less important. This task also needs to be made as
easy and as automated as possible. Most important, whole-system disaster
recovery needs to be easily testable, something that traditional disaster recovery
processes rarely offer. Organizations shouldn’t have to send half the IT team off-site to test the disaster recovery process; you should be able to push a couple of
buttons and bring up selected servers in a test environment—possibly a virtual
one—right within the data center. Yes, testing off-site recovery is important if it’s
a component of your overall recovery plan, but you should be able to frequently
test your ability to bring critical systems back online without having to wait for
the once-a-year trip to the off-site recovery facility.
It’s worth putting some parameters around “easy,” because different people
define it differently. Ideally, recovering an entire system to
its most recent state should be as straightforward as selecting the system in
the data protection solution, specifying a restoration target (such as a virtual
machine in a test environment), and clicking “OK.” The data protection system
should take over entirely at that point, bringing the selected system back online
on the designated target.
And of course disaster recovery should be fast. Most businesses, accustomed to
the limitations of traditional data protection approaches, might accept “several
hours” as a reasonable RTO for full system recovery. They shouldn’t: It should be
possible to have a system back online in minutes.
For example, a recovery solution that is able to directly mount backed-up data
as a usable volume can simply provide such a volume to a virtual machine,
enabling the virtual machine to come online almost instantly, albeit possibly
with a somewhat degraded level of performance. Other techniques might
enable a data protection system to bring a system back online while the data
is still being streamed to the recovery target. Smart algorithms can allow the
system to prioritize data that users are attempting to access, enabling users to
be productive even while the recovery is still technically underway. That is a
short RTO, and is a goal that business decision makers should push for.
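As a rough sketch of the prioritized streaming described above—assuming an invented restore loop where `demand_queue` holds the blocks users are actively requesting—blocks in demand jump ahead of the background sweep:

```python
def prioritized_restore(all_blocks, demand_queue, fetch, write):
    """Stream blocks to the recovery target, serving user-demanded blocks first.

    all_blocks:   iterable of block IDs still to restore (background order)
    demand_queue: block IDs users are actively trying to access
    fetch/write:  callables that read a block from backup and write it out
    """
    pending = set(all_blocks)
    while pending:
        # User-requested blocks take priority over the sequential sweep.
        if demand_queue:
            block = demand_queue.pop(0)
            if block not in pending:
                continue  # already restored
        else:
            block = next(iter(pending))
        write(block, fetch(block))
        pending.discard(block)

# Toy stand-ins for the backup archive and the recovery target:
archive = {i: f"data-{i}" for i in range(5)}
target = {}
prioritized_restore(range(5), [3], archive.get, target.__setitem__)
print(target)  # block 3 was restored first; the rest followed
```

In a real solution the demand queue would be fed live from the I/O requests users generate during the restore, rather than being fixed up front as it is in this sketch.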
Creating options: Designing flexible disaster recovery

Organizations should demand much more flexibility in recovery options, too.
Today, recovery approaches tend to focus on restoring a system to its original
platform, whether physical or virtual. IT should push to break down that wall, and
instead create a variety of scenarios to choose from:

• Physical server restored to virtual machine
• Physical server restored to dissimilar physical hardware
• Virtual machine restored to physical server
• Physical server restored to similar physical hardware
• Virtual machine restored to virtual machine—running under the same, or
even a different, hypervisor technology
There’s no reason to settle for anything less; the technologies to support
these scenarios all exist. They’re just rarely incorporated in traditional recovery
solutions and processes.
Think of the business flexibility these options give you. If a critical server dies,
you can quickly bring it back up in a virtual machine while the hardware is
repaired. If your entire site becomes inaccessible or unusable, you can spin up
critical servers in someone else’s virtualized data center—potentially even in a
cloud environment. With this kind of flexibility, your disaster recovery plan can
include numerous options that fit a variety of circumstances, giving your IT team
and management real choices based on the specific situation at hand.
Less overhead: Minimizing backup storage and management
Many of the goals outlined so far in this chapter will likely involve disk-based
backup storage, which is not always inexpensive. That means organizations
also have to set goals for controlling storage costs and reducing the potential
administrative overhead related to that storage, for example:
• Minimize storage consumption by automatically collecting incremental
and differential backup data into a periodic “full backup” image.
• Further minimize storage consumption by de-duplicating data on a per-
byte basis.
• Further minimize storage consumption by compressing data using a
variety of advanced compression algorithms.
• Reduce management overhead by automatically cycling older backup
data out of the system, based on configurable top-down policies.
By aggressively minimizing the storage used by the data protection solution, and
by automating the removal of older backup data, you should be able to make
such a system both affordable and easily manageable over the long term.
How can you meet these goals?
In the next chapter, you’ll get a concise primer on the different data protection
approaches that currently exist in the marketplace. The idea is to familiarize you
with the technologies, on their own merit, without immediately worrying about
how well they meet business objectives. Chapter 3 looks at different scenarios
these approaches have to fit into, which will help reinforce your business
goals and put them into context. Chapter 4 compares each approach to those
scenarios and objectives.
Chapter 2: Overview of
technical approaches to
data protection
As you begin to evaluate potential solutions for data protection and try to
integrate protection services for both virtual and physical machines, it’s useful
to understand the underlying details of various approaches. That’s what this
chapter helps you do.
Approach 1: Backing up files
The most time-tested and traditional approach to data protection is to simply
back up the files. Often, those backups are written to magnetic tape, which can
be conveniently rotated off-site to provide a higher level of survivability in the
event of a total facility disaster.
It might seem like remedial education, but it’s worth spending a few
moments discussing exactly how this kind of backup works. Consider this
functional diagram:

Starting at the bottom of the figure, there is the server, its disk storage, and
the files on disk. Backups are made by backup software, which is often a
small agent that communicates with an external backup server. That software
must request file access from the operating system in order to obtain the
files’ contents, and then either write those contents to the backup storage or
transmit them to the external server for storage.
The point of this explanation is that the operating system provides access to the
files, and it does so at a fairly high level of abstraction. The backup software, in
this approach, is largely ignorant of how the file is stored, what the file contains,
and so on. The software is also dependent upon the operating system’s ability
to actually provide access to the file—something that, with certain kinds of
files, the operating system may not be able to do. Database files are the most
common downfall of this approach, as the database software—say, SQL Server,
Exchange Server, or even Access—already has the file open with an exclusive
lock, preventing the operating system from giving the backup software
simultaneous access to the file.
Approach 2: Backing up data
The difficulty with backing up databases and other always-on data using the
traditional file-based backup approach has led many software vendors to add
built-in backup provisions to their software. Consider the following diagram:
Here, the operating system continues to provide access to the low-level files on
disk—only the operating system can do so, after all. But that access is provided
exclusively to the application software, which again might be software such as
SQL Server or Exchange Server. The application software itself then provides
access to the data within the file(s) to any compliant backup software.
The practical upshot of this approach has been the proliferation of a vast array
of application-specific backup agents, and a revenue opportunity for backup
software vendors. You must not only purchase the base backup software but
also obtain a backup agent for each kind of application you want to back up!
For always-on applications for which an agent isn’t available… well, you might
just be out of luck for backing them up.
This limitation includes virtual machine files. From the perspective of the
virtualization host machine, a virtual machine’s virtual hard disk file is really just
a kind of database file, held exclusively open by the virtualization software (the
hypervisor) whenever the virtual machine is running. Without a specialized agent,
and a hypervisor capable of providing the data to the agent, you can’t back up
these files very well. Unfortunately, due to the incredibly dynamic nature of
virtual machines, hypervisors can’t readily provide a “backup data stream” to a
backup agent in the way that software such as Microsoft Exchange can.
Approach 3: Dealing with open files
Aside from specialized software hooks and backup agents, the industry has
made other attempts at enabling backups for always-open files. One of those
methods, which is specific to the Microsoft Windows platform, is the Volume
Shadow Copy Service, or VSS. Its functionality is illustrated here:
Here, the operating system is—via the VSS functionality—responsible for
making a “shadow copy,” or independent snapshot, of a requested file. The
backup software then backs up that snapshot.
In most cases, support for VSS must be built into the application software. For
simple types of files, VSS is capable of creating a snapshot without application-specific support. However, for more complex data structures—again, think SQL
Server and Microsoft Exchange—the application must be informed by VSS that
a snapshot is required. The application can then ensure that its disk files are in
a consistent state, tell VSS to make the snapshot (which is a quick process in
most cases), and then continue working. VSS can then provide the snapshot for
however long the backup software requires it, while the application continues
working.
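The following is an illustrative model of that sequence—not the actual Windows VSS API—showing a writer application reaching a consistent state, a quick point-in-time copy being taken, and the application resuming while the backup reads the frozen copy:

```python
import copy

class Application:
    """Stand-in for a VSS-aware writer such as a database server."""
    def __init__(self):
        self.disk_state = {"table": [1, 2, 3]}
        self.dirty_cache = [4]

    def flush_to_consistent_state(self):
        # The writer is told a snapshot is coming: flush caches to disk.
        self.disk_state["table"].extend(self.dirty_cache)
        self.dirty_cache = []

def take_snapshot(app):
    """Model of the shadow-copy sequence: quiesce, copy, resume."""
    app.flush_to_consistent_state()            # 1. writer reaches consistency
    snapshot = copy.deepcopy(app.disk_state)   # 2. quick point-in-time copy
    return snapshot                            # 3. application keeps working

app = Application()
snap = take_snapshot(app)
app.disk_state["table"].append(99)  # live data keeps changing...
print(snap)  # {'table': [1, 2, 3, 4]} — consistent, unaffected by later writes
```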
Approach 4: Backing up virtual machines
The VSS approach—or something like it—is often used to back up entire virtual
machine disk files from the virtualization host. In other words, the hypervisor
is informed by VSS that a snapshot is required. The hypervisor gets its files
into a consistent state, which often simply means dumping any in-memory
caches and getting the files up to date. VSS takes the snapshot, and the backup
solution backs it up.
VSS isn’t the only game in town for this approach, of course. VSS is used for
Microsoft’s Hyper-V hypervisor but it isn’t supported in VMware’s vSphere
product; instead, VMware provides its own technology that more or less
follows the same functional outline. The point is that the hypervisor is asked
to “quiesce” its files, a snapshot is taken, and the backup software backs up the
snapshot instead of the actual “live” file.
This process isn’t entirely automated, although some administrators
automate it by using scripts or other techniques. There are also software
backup vendors whose solutions “plug in” to this native technology, providing
high-level automation to generate and back up the snapshots on whatever
schedule you set.
Approach 5: Backing up disk blocks
One thing the first four approaches have in common is that they
all rely on operating system file-level access. As you’ve seen, that reliance can
create problems with concurrent access to always-open files. Those problems
then result in what are essentially workarounds, such as VSS. None of these
approaches provides a good basis for continuous data protection. Although
you could, in theory, use an approach such as VSS to take backups every few
minutes, in practice such an approach would consume a high level of server
overhead. Simply copying the snapshot files would take enough time to create
a large window of time between backups.
An alternative approach is to back up disk blocks. This process is a bit more
complex but represents a re-thinking of data protection. Consider this
functional diagram:

The process starts with the application software making a change. That change
must always be handed off to the operating system, which is the sole arbiter of
disk access. The operating system breaks that change down into one or more
changes to specific, small disk blocks (typically less than 2 kilobytes apiece).
The backup software works by registering itself as a volume-level filter with
the file system, meaning it sees a copy of every disk block–level change.
Those changes are then replicated to a backup server, where they are
time-stamped, compressed, de-duplicated, and saved to disk. They might also be
replicated off-site for safekeeping.

With this approach, there’s no such thing as an “open file.” The backup software
never asks the operating system for file-level access; it simply captures changes
to disk blocks as they occur. This method works on a physical machine or within
a virtual machine, meaning the same approach lends itself equally to physical
and virtual assets.
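Here is a minimal sketch of that capture path, with invented names and an in-memory “volume” standing in for a real volume filter driver: the filter sees each block write, and the replica can rebuild the disk image without ever opening a file.

```python
from datetime import datetime

class FilteredVolume:
    """Toy volume with a write filter: every block change is also handed
    to the backup replicator, with file semantics never involved."""

    def __init__(self, replicator):
        self.blocks = {}          # block number -> bytes
        self.replicator = replicator

    def write_block(self, number, data):
        self.blocks[number] = data
        # Volume-level filter: forward a copy of the changed block.
        self.replicator.on_block_changed(number, data)

class Replicator:
    def __init__(self):
        self.journal = []  # time-stamped stream of block changes

    def on_block_changed(self, number, data):
        self.journal.append((datetime.utcnow(), number, data))

    def rebuild(self):
        """Replay the journal to reconstruct the latest disk image."""
        image = {}
        for _, number, data in self.journal:
            image[number] = data
        return image

replicator = Replicator()
volume = FilteredVolume(replicator)
volume.write_block(7, b"hello")   # an application (or a VM) writes...
volume.write_block(7, b"world")   # ...and writes again
assert replicator.rebuild() == volume.blocks  # replica matches the disk
```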
Data protection scenarios
With these data protection approaches in mind, and with your business goals
clearly stated, it’s time to consider actual data protection scenarios. Business
goals are obviously important, but sometimes the nitty-gritty real-world details
of actual situations can help further refine and evolve business needs to truly
identify the best data protection solution for the organization. That’s what we’ll
attempt in the next chapter.
Chapter 3: Data protection
and recovery scenarios
The first chapter spends time reviewing business goals for data recovery. That’s
obviously an important step if you’re evaluating data protection approaches
and solutions, but it isn’t the only step. The business cares about very specific
considerations, such as uptime, reliability, data loss, and so forth, but there’s
also the operational side of the equation. That drives a number of more subtle
needs, such as ease of use, ease of achieving accuracy, and so on. Too often,
organizations overlook these extremely important needs and find themselves
using solutions that—although technically doing everything the business needs
them to do—require too much overhead or miss some of the more subtle needs
that aren’t well articulated in requirements lists.
This chapter identifies those operational needs, mainly by examining actual
data protection scenarios that your IT team could find themselves handling. By
adding these operational needs to the business requirements, your organization
can create a much more complete “shopping list” of evaluation criteria.
Remember: Physical + virtual
There’s an enormous danger looming in our increasingly virtualized data
centers: inconsistency. When a disaster strikes or a failure occurs, you need
your IT team to be able to respond quickly and accurately. The fewer
processes and moving parts they have to deal with, the more likely they are to
respond quickly and correctly. Start throwing a million “what if” questions
into the response process, and you not only slow things down but also open up
more opportunities for inaccuracy and inconsistency. When things are moving
fast, you don’t want your team to have to make a lot of decisions; you want
them to implement as quickly as possible.
You explicitly do not want a disaster recovery process with phrases like, “If it’s
a virtual machine, do this… and if it’s a physical machine, do this….” That’s not
only a decision that the IT team has to evaluate, it’s a decision that can quickly
change. Organizations today are migrating physical machines to virtual ones
in record numbers, and in some cases, actually going from virtual to physical
for a variety of reasons. It’s a constantly changing landscape. For this reason,
one major operational requirement that you have to identify up front is the
need for a single data protection solution that protects both virtual and physical
machines in exactly the same way.
This need is also driven by something other than response times: service
consistency. “Hi, this is Bob the User down in Sales. I accidentally deleted a file.
Can you restore it for me?”
“Well, Bob, if it had been on a physical file server, it would be easy—I could just
go grab the backup tapes. But it was on a virtual file server, so I have to restore
the whole virtual machine, boot it up, and then go get your file out of it.”
Why should your users be concerned about what happens in the data center?
And why should a server’s hosting platform—virtual or physical—have any
bearing on the level of recovery service you can provide to your users? It
shouldn’t, so again, a data protection solution needs to protect both physical
and virtual assets in exactly the same way. You don’t want two stacks of tools,
you don’t want two recovery processes, you don’t want two sets of service
capabilities—you want one.
Total site disaster in a multi-site organization
This first scenario is one of the worst. It’s the one you truly hope never happens,
the one that’s most difficult (and often expensive) to prepare for, and the one
that’s the most risky to implement data protection for. This scenario is where
your entire data center is down and inaccessible, perhaps due to a fire, natural
disaster, utility failure, or other out-of-your-control situation. First, consider this
scenario from the perspective of an organization that has other sites that could
conceivably take over from the failed one.
In addition to the basic business requirements:

• Shortest possible RTO
• Shortest possible RPO
• Minimized backup storage and management
• Easy, fast, testable disaster recovery
• Flexible disaster recovery
what are some of the more specific operational goals? Those last two business
requirements actually hint at some of the underlying operational needs.
The first, and perhaps most obvious, is the ability to quickly get data off-site on
a more-or-less continuous basis. Having such a capability in place allows you
to replicate data to an alternate data center, where the data could lie in wait
for the day when a total site failure occurred. With that up-to-date data in the
alternate site (or with two sites replicating to each other for redundancy), key
servers could be quickly brought back online—perhaps as virtual machines—and
accessed by users with very little downtime to the organization.
This segues nicely into the flexibility requirements. Whether a backed-up
server was originally virtual or physical isn’t important; the odds are that in this
situation you’ll want to recover to virtual machines, at least for a certain period
of time. Doing so lets you pack more recovery capability into your alternate site
at a much lower cost.
There are also specific operational needs around disaster recovery testing—
something most people know they don’t do often enough, due in large part
to the expense and inconvenience. Therefore, a good data protection solution
will need to make disaster recovery testing inexpensive and convenient so
that you’ll run your tests often. Flexibility is once again the easiest way to make
that happen: If your disaster recovery tests can take place entirely in a
“sidelined” virtual environment, then you can test to your heart’s content without
interrupting any production resources. So the ability to restore both physical and
virtual machines to an alternate virtualization host is crucial.

Additional operational requirements come to mind:

• No thinking required. You need the actual restoration process to
be incredibly easy. A total site disaster brings enough pressures and
difficulties; you shouldn’t have to think very much when it comes to
recovering both physical and virtual servers. As much of the process as
possible should be configurable in advance so that when disaster strikes,
you simply click a few buttons and let the software do its job.

• Fast recovery. You need to be able to get data off the backup solution
and on to the restoration target as quickly as possible, getting servers
back up and running as quickly as possible. This requirement might, for
example, involve the backup solution actually presenting the backed-up
data as a virtual machine hard disk image, which can be mounted to a
configured virtual machine in just minutes. The goal is not maximum
production performance at this stage; most organizations are happy to
work in a degraded performance environment in the initial phases of
disaster recovery and are instead intent on getting key services running.

• Prioritization. For recoveries where a lot of data needs to be moved from
the backup to a restoration target, a data protection solution that can
prioritize that data—copying based on what users are trying to access—
can give you the semblance of being fully functional without that being
the reality. Users get what they need more quickly, without needing to
wait for the entire body of data to be copied.

Total site disaster in a single-site organization
A total site disaster in a single-site organization is much like the multi-site
scenario just discussed but with the added complication of there not being a
dedicated alternate site to recover to. Organizations in this situation might have a
shared offsite recovery location (e.g., a rented disaster recovery facility) or might
in fact have nothing to fall back on.
The additional operational capability of cloud replication then becomes
important. Rather than replicating backup data to an alternate data center
owned by the organization, data can be replicated to someone else’s data
center—the cloud. In the event of a failure, that cloud-based data can be utilized
in one of two ways:
• At the least, the cloud data could be used to restore servers at any
alternate location with Internet access. Given the bulk of data involved,
organizations might select a cloud storage provider that could—using
overnight shipping or a courier—deliver the data physically on removable
storage devices.

• A better solution is to replicate data to a cloud storage provider that can
also utilize that data within its own data centers, restoring the backed-up
data to virtual machines in the cloud. The cloud, then, becomes the
“alternate data center site,” enabling the organization to get at least its
most critical services up and running quickly.
Organizations looking at this scenario will almost always have to make triage
decisions about their servers and services because it might not be practical to
have every server receive this level of protection. But that’s fine: in a total disaster
scenario, you can almost always identify servers that you can live without.
Knowing that only a subset of servers might be available in a disaster also helps
you make smarter design and architecture decisions when implementing
services. “Hey, this application is crucial to us—let’s keep its data on the same
SQL Server that’s already flagged for cloud-based recovery rather than putting it
on another server that would also have to be recovered.”
Single-server failure
Single-server failure is potentially a more common scenario than the whole-site
disaster, but they share a number of important criteria. From a business
perspective, you obviously want:
• Shortest possible RTO
• Shortest possible RPO
• Minimized backup storage and management
• Easy, fast, testable disaster recovery
• Flexible disaster recovery
Although the overall scope of the problem is of course smaller because you’re
dealing with only a single failed server, for the most part, your operational needs
will mirror those of the total-failure scenario.
Above all, you want simplicity. Whether the failed server is virtual or physical, you
pretty much want to be able to push a button and restore it either to its original
location or to an alternate physical or virtual machine. That’s the flexibility
business requirement—P2V, V2P, P2P, V2V—whatever works at the moment is
what you want to do.
A solution that can get a virtual machine up and running really, really quickly is
obviously of benefit also. That might involve providing a mountable disk image
that a virtual machine can immediately boot from, or streaming prioritized data
to a restoration target to get key services up and running as quickly as possible—
if not with the highest level of performance, initially.
Server failures will almost always be something you want to restore to the
most recent point in time possible, meaning you want to put the server back to
exactly the way it was before it died, with little or no data missing. A continuous
backup solution can provide minimum data at risk because data is being
continuously captured and backed up.
Single data item change/deletion

Single data item recovery is a much more common scenario than any of the
“failure” scenarios discussed to this point. In this situation, you’re not recovering
from failed sites or servers; you’re restoring:

• Files that were deleted or changed by accident
• E-mails that were lost or deleted
• SQL Server tables or databases that were changed incorrectly
• Documents out of a SharePoint site that were deleted by accident
This scenario presents the bulk of recovery work an IT team deals with, and a
data protection solution can go a long way toward making this task easier and
less time consuming.
Probably the biggest operational need is the ability to recover without having to
restore. That is, you don’t want to have to restore an entire Exchange database—
or worse, an entire server—just to dig up a single e-mail message. You want a
data protection solution that can mount data stores and file systems directly
from its backup archive without the need to put that data somewhere first, such
as on a restored server. If you can just pop open some kind of management
user interface, browse to the data you want, and then copy the single desired
item to a file share or whatever, you’ve achieved the lowest-impact way of
getting the job done.
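As a loose analogy for that mount-and-copy style of recovery—a real solution would mount an Exchange store or SQL Server database, but a simple ZIP archive stands in here—you browse the archive in place and pull out the one item you need:

```python
import io
import zipfile

# Build a toy "backup archive" in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("mailbox/inbox/msg-0001.eml", "Subject: Q2 forecast...")
    z.writestr("mailbox/inbox/msg-0002.eml", "Subject: lunch?")
    z.writestr("shares/sales/forecast.xlsx", "spreadsheet contents")

# Recovery: browse the archive directly and copy out a single item —
# no full restore of the mailbox or the server required.
with zipfile.ZipFile(buf) as z:
    print(z.namelist())                            # browse the contents
    wanted = z.read("mailbox/inbox/msg-0001.eml")  # copy just one item
print(wanted.decode())
```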
You also need the ability to get a version other than the most recent version of
the backed-up data. Remember, the industry is sort of trending in the direction
of a continuous data protection scheme of some kind, meaning data is backed
up as it is changed. If some file is improperly changed, it still gets backed up. You
don’t want that most recent version, though, so the data protection solution will
need to store, and let you retrieve, earlier versions. How long a period you retain
earlier versions is up to you: You might decide to keep a week, or a month, of
time-stamped recovery before rolling everything into a “monthly full backup
image” past which more granular retrieval isn’t possible. The point is that you get
to make the decision, not some programmer or software vendor.
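That retention decision lends itself to a small sketch; the policy below (invented names, with an in-memory list standing in for the archive) keeps every time-stamped version inside a configurable window and rolls everything older into monthly images:

```python
from datetime import datetime, timedelta

def apply_retention(versions, now, granular_window=timedelta(days=30)):
    """Keep every version inside the window; outside it, keep only the
    last version of each month (the "monthly full backup image").

    versions: list of (timestamp, label) tuples, oldest first.
    """
    keep, monthly_latest = [], {}
    cutoff = now - granular_window
    for ts, label in versions:
        if ts >= cutoff:
            keep.append((ts, label))                           # granular
        else:
            monthly_latest[(ts.year, ts.month)] = (ts, label)  # roll up
    return sorted(monthly_latest.values()) + keep

now = datetime(2012, 6, 15)
versions = [
    (datetime(2012, 3, 10), "v1"), (datetime(2012, 3, 20), "v2"),
    (datetime(2012, 6, 1), "v3"),  (datetime(2012, 6, 14), "v4"),
]
for ts, label in apply_retention(versions, now):
    print(ts.date(), label)  # March keeps only v2; June keeps v3 and v4
```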
Can you meet these needs?
Given both your business requirements and the identified operational needs,
how do the various data protection approaches stack up? Chapter 2 reviews
several approaches. The next chapter examines each approach’s ability to get
the job done.
About the Author
Greg Shields is a Senior Partner with
Concentrated Technology. With fifteen years
of IT experience, Greg is one of the world’s
leading experts on virtualization, cloud, and
systems management technologies. He
is a Contributing Editor and columnist for
Microsoft TechNet Magazine and Redmond
Magazine, has written over fourteen
books, and contributes regularly to online
publications like TechTarget and MCPMag.com. He is also a highly sought-after and top-ranked speaker for both live
and recorded events, and is seen regularly at conferences like TechMentor,
the Microsoft Management Summit, Microsoft Tech Ed, VMworld,
and Connections, among others. Greg is a multiple-year recipient of Microsoft’s
Most Valuable Professional (MVP) award and VMware’s vExpert award.