A solid continuity solution begins with well-defined goals and success criteria. The technology to get the job done falls into place after that, but usually requires a blend of tools and techniques to get there. This is the presentation I delivered at the 2015 SIM event in Hartford, CT.
2. The Dream
“Send it to the cloud!”
• Recover to someone else’s infrastructure
• Only pay for what you use, when you use it!
3. The Harsh Reality
• Operational Impact
o Who will be providing the daily care and feeding of the DR solution?
• Consistency
o Configuration Drift
o Does your IT staff now need to make changes in 2 places for every move in production?
• Costs
o Is the solution a Ferrari? Ford? Fiero?
o What does it actually cost to declare? Do you know?
4. The Challenge: Complexity
Lots of VMs…
Lots of storage…
Intricate WAN topology…
Physical appliances…
Mainframes…
Users need access to input data…
5. Goals for this Talk
Learn common BCP pitfalls
Strategies to focus your continuity planning
Streamline your needs analysis
Map tools and techniques to meet those needs best
6. Why listen to me?
• Solutions Architect at TierPoint
• I design managed DR and IaaS solutions every day
• I talk to all walks of IT, across many verticals
o People who have implemented their own solutions
o What’s working? What’s not?
o Where’s the mind share?
8. 6 Common Blunders
1. Picking a product or solution before defining requirements fully
2. Designing and pricing solutions before talking to your internal stakeholders
3. Designing for the “check-box” rather than function
4. Neglecting to test regularly
5. Neglecting to test fail-back
6. Settling for less because of past experiences
9. Boring….
“I just spun up 10 AWS instances while you talked. I’m in the cloud already!”
• You feel like you’ve just made PROGRESS…
• …but how does it address your project’s Success Criteria?
16. What are your priorities?
How up to date must the data be? (RPO)
How quickly must it be made available? (RTO)
Can employees/clients access the data?
How near or far must this be from [critical site]?
What operational overhead does this create for my team?
How does the cost of DR align with changes to production? Is it predictable?
Can I live with a subset of production running at declaration time?
Does DR need to have full performance capabilities of production?
22. Solution Approach
1. Easy Wins – native replication or clustering
2. RTOs & RPOs
3. VMs – what can be protected as a portable unit?
4. Physicals
1. Are any pending virtualization?
2. Does it support recovering into a VM?
3. Does it require a matching physical system?
5. Networking
1. What carriers are required?
2. What does the data change rate look like?
6. “Where are the eyeballs?”
1. Is workspace needed to ensure consistent data flow?
7. Security
1. What frameworks must be adhered to?
23. RTO/RPO Technology Drivers
[Diagram: recovery technologies by RTO/RPO bracket]
Backup & Recovery: 24+ hr RTO; 24 hr RPO
Replication: < 1-4 hr RTO; ~0-15 min RPO
Load Balanced / High Availability: ~0 hr RTO; ~0 hr RPO
Recovery steps shown: Configure hardware (p or v), Install OS, Configure OS, Install backup agent, Start recovery, Restore VM, Power on VM
25. DR Responsibility Matrix
Process Client Provider
DR Declaration Primary
Replication Health Primary
Backup Success Primary
DR Testing Primary Assist
DR Runbook Assist Primary
DR Infrastructure Primary
27. Hybrid Cloud DR
Infrastructure
• Colocation
• Cloud
o Private
o Public
• Managed Hosting
Continuity Approach
• Workspace Recovery
• High-Availability
• Replication
o VM
o OS
o SAN
• Backup and Recovery
29. Summary
1. Focus first on your Applications and People
2. Derive RTOs and RPOs
3. Collect your data
4. Map requirements to infrastructure and technologies to best fit those individual needs
Mitigate Risk with Hybrid Disaster Recovery in the Cloud
Pragmatic Business Continuity for Non-Trivial Environments
When I talk to people about Disaster Recovery solutions, it usually starts with a tenor similar to this:
The ideal sounds great, but it’s not that easy.
These are questions that you must ask and think about
Operational Impact
Is your team now doing double duty for the second environment?
Consistency
Is it a point-in-time solution or sustainable?
Costs
What does it cost to declare?
Predictable?
I’ll share strategies for quickly performing a needs assessment and then mapping technologies and solutions to find the best comprehensive approach.
Modern tools and techniques greatly enhance recovery capabilities, but one size rarely fits all.
You need to ensure your business can keep running under any scenario with the least impact and friction possible.
I work as a Solutions Architect at TierPoint, a national cloud, managed services, colocation and DR service provider.
We’re the smaller, easy-to-do-business-with alternative to Terremark, Rackspace, Savvis, and the like.
I design managed DR and IaaS solutions every day
I talk to all walks of IT across multiple verticals
I hear what’s working, what’s not and can relay what I’m seeing and hearing to you!
Kick it off with some of the common missteps I see (and have made myself)
If you walk into your DR planning with a point product or tool as your guiding principle, you’re in for a tough time.
I’ve seen clients have to go back to the drawing board because they hadn’t consulted internally first to level-set. Security is a big one here.
Are you looking to do “good enough” to pass an audit? Or do you need this to actually WORK?
Testing takes two to tango. Like patching, it’s best to keep this as a scheduled routine (annually works for most).
When you do your DR tests, are you also testing the fail-back?
Don’t settle for less because replication was expensive when you last looked 5 years ago. Things have changed!
If they laugh here then they hate you
To create a functional and sustainable business continuity plan, you must start at the beginning with the basics
Timing: quick
Before you make any blunders you need to plan!
Timing: quick
In my experience, these are the key drivers in an IT business continuity plan
Your Applications
Drives IT infrastructure needs (network, compute, storage, security)
Consumed by employees and clients
People
“Where are the eyeballs”
RTOs & RPOs
Reflect on those key drivers, and then discuss with your internal stakeholders
Internal DR teams
Special interest groups
task forces
A business impact analysis (BIA) predicts the consequences of disrupting a business function or process and gathers the information needed to develop recovery strategies.
A set of requirements should emerge from the BIA:
RTOs & RPOs
Distance Requirements for DR
The BIA should identify the operational and financial impacts resulting from the disruption of business functions and processes. Impacts to consider include:
Lost sales and income
Delayed sales or income
Increased expenses (e.g., overtime labor, outsourcing, expediting costs, etc.)
Regulatory fines
Contractual penalties or loss of contractual bonuses
Customer dissatisfaction or defection
Delay of new business plans
You’ve talked internally
You’ve done the BIA
Now prioritize the details
Walk through the 8 points
(Halfway through preso)
Go quick
Planning Complete, time to start collecting the Data!
Spreadsheet of systems – cpu/ram/storage/p or v/OS/Apps/RTOs RPOs
Network Diagram – how does everything connect together?
Data flow Diagram – how does data flow between the systems?
Network Utilization – LAN and WAN utilization metrics
Backups – what does a nightly backup run look like? How much protected data?
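As a rough illustration of the "spreadsheet of systems" item above, the same columns can be held as records and sorted by recovery priority. This is only a sketch: the class name, field names, helper functions, and the three sample systems are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SystemRecord:
    """One row of the systems spreadsheet (field names are illustrative)."""
    name: str
    cpu_cores: int
    ram_gb: int
    storage_gb: int
    is_virtual: bool      # the "p or v" column: True for a VM, False for physical
    os: str
    apps: List[str]
    rto_hours: float
    rpo_hours: float


def recovery_order(systems: List[SystemRecord]) -> List[SystemRecord]:
    """Sort so the tightest RTO comes first, breaking ties on RPO."""
    return sorted(systems, key=lambda s: (s.rto_hours, s.rpo_hours))


def totals(systems: List[SystemRecord]) -> Dict[str, int]:
    """Sum the capacity columns to size the recovery environment."""
    return {
        "cpu_cores": sum(s.cpu_cores for s in systems),
        "ram_gb": sum(s.ram_gb for s in systems),
        "storage_gb": sum(s.storage_gb for s in systems),
    }


# Made-up sample rows, purely for demonstration.
inventory = [
    SystemRecord("erp-db", 8, 64, 2000, False, "Linux", ["ERP database"], 4, 0.25),
    SystemRecord("file-01", 4, 16, 4000, True, "Windows Server", ["File shares"], 24, 24),
    SystemRecord("dc-01", 2, 8, 100, True, "Windows Server", ["Active Directory"], 0, 0),
]

if __name__ == "__main__":
    for system in recovery_order(inventory):
        print(system.name, system.rto_hours, system.rpo_hours)
    print(totals(inventory))
```

Sorting by RTO and RPO gives a first-pass recovery sequence, and the totals give a starting point for sizing compute and storage at the recovery site.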
Let’s take a look at how needs can map to solutions
Timing Quick
Final Step!
Timing: quick
Now that needs have been established, let’s look at mapping to solutions
First, some applications are best served as live replicas, such as Active Directory. These take precedence.
Next, look at the RTOs and RPOs for order of magnitude for the solution’s design.
VMs can offer portability and should be examined next
Physical – includes appliances, PBXes, DB servers, etc.
Networking – are we using an existing network? MPLS? Look at backup data in prod to understand the daily change rate and impact to WAN
Eyeballs – how do users interact with this environment? Do you need workspace to ensure local proximity?
Security – what frameworks must be adhered to? Is the org comfortable with shared resources?
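The networking point above (using production backup data to understand the daily change rate and its WAN impact) comes down to simple arithmetic. The sketch below is one way to estimate it, assuming the changed data is spread evenly over a replication window. The function name, the decimal gigabyte convention, and the overhead factor are assumptions for illustration, and real traffic is burstier than an average suggests.

```python
def required_mbps(daily_change_gb: float,
                  replication_window_hours: float = 24.0,
                  overhead: float = 1.0) -> float:
    """Average WAN bandwidth (megabits per second) needed to ship one day's
    changed data within the given window.

    Assumes decimal units (1 GB = 10**9 bytes) and an even spread of traffic.
    `overhead` is a hypothetical multiplier for protocol or retransmit overhead.
    """
    if daily_change_gb < 0 or replication_window_hours <= 0 or overhead <= 0:
        raise ValueError("inputs must be positive")
    bits = daily_change_gb * 8 * 10**9
    seconds = replication_window_hours * 3600
    return bits / seconds / 10**6 * overhead


if __name__ == "__main__":
    # Example: 500 GB of daily change replicated continuously over 24 hours.
    print(round(required_mbps(500), 1))
    # Same change squeezed into an 8-hour overnight window.
    print(round(required_mbps(500, replication_window_hours=8), 1))
```

Comparing the result against the measured WAN utilization shows whether the existing circuit has headroom or whether a dedicated link is worth pricing.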
The RTOs and RPOs will impact the technology selected by nature of design
While some technologies can fit other brackets, results may vary.
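To make the bracket idea concrete, here is a minimal sketch that maps a required RTO and RPO to the least aggressive bracket from the "RTO/RPO Technology Drivers" slide that still meets both. The numeric limits are my reading of that slide (24 hours for backup, 4 hours and 15 minutes for replication, zero for high availability), and the cheapest-first ordering is an assumption. As the note above says, real products can land outside these brackets.

```python
from typing import List, NamedTuple


class Tier(NamedTuple):
    name: str
    rto_hours: float   # recovery time this tier is assumed to achieve
    rpo_hours: float   # data loss window this tier is assumed to achieve


# Ordered from least to most aggressive (assumed to track cost).
TIERS: List[Tier] = [
    Tier("Backup & Recovery", 24.0, 24.0),
    Tier("Replication", 4.0, 0.25),
    Tier("Load Balanced / High Availability", 0.0, 0.0),
]


def suggest_tier(required_rto_hours: float, required_rpo_hours: float) -> str:
    """Return the first tier whose assumed RTO and RPO both meet the requirement."""
    if required_rto_hours < 0 or required_rpo_hours < 0:
        raise ValueError("RTO and RPO must be non-negative")
    for tier in TIERS:
        if (tier.rto_hours <= required_rto_hours
                and tier.rpo_hours <= required_rpo_hours):
            return tier.name
    return TIERS[-1].name


if __name__ == "__main__":
    print(suggest_tier(48, 24))   # relaxed requirement
    print(suggest_tier(4, 1))     # same-day recovery, about an hour of data loss
    print(suggest_tier(0.5, 0))   # near-zero tolerance
```

Running each system's RTO and RPO through a function like this gives an order-of-magnitude starting point per application, which is then refined against the specific tools in each layer.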
Talk to the different approaches for each layer
Workspace
App Repl and HA (SQL, AD, Exchange)
OS Repl (Double-Take, Arcserve)
VM Replication (CBT tools like vSphere Repl, Veeam, Zerto)
SAN Replication (RecoverPoint/EMC, SnapMirror, etc.)
Backup and Recovery – Commvault etc
An example of a DRaaS Resp. Matrix
What responsibilities for DR do you want?
What are we saying when we talk about “Hybrid Disaster Recovery?”
It can be a blend of infrastructures and continuity approaches
These should be RESULTANT from the foundational planning!
You may very well find your “hybrid” approach to look like this