Elastic DR: a solution architecture that aims to optimally balance cost and recovery time via three core principles that are germane to the cloud world:
On-Demand: The disaster recovery cloud can be provisioned on any availability zone, region, or public/private cloud through Cloudify's cloud-agnostic bootstrapping mechanism.
Elastic: The ability to automatically provision resources in the recovery cloud in case of disaster while eliminating the need for idle resources in normal scenarios, thereby fully profiting from the pay-per-use pricing model of clouds.
Flexible RTO/RPO: The architecture can be easily extended from a warm DR to a hot DR pattern by enabling or disabling application recipes. This allows us to exploit the economies of scale that the cloud provides by matching the number of recipes/tiers to provision (in the recovery cloud) against the recovery time/point objective for our disaster recovery strategy.
2. RTO (Recovery Time Objective)
RPO (Recovery Point Objective)
Cost
DISASTER RECOVERY, THE CONTEXT
Disaster Recovery Key Drivers
Accomplishing high levels of redundancy is expensive
The cold, hard reality for most businesses: the cost of losing 24 hours of data is less than the cost of maintaining another active data center.
Determining an appropriate RPO and RTO is ultimately a financial calculation: at what point does the cost of data loss and downtime exceed the cost of a backup strategy that will prevent that level of data loss and downtime?
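The financial calculation can be sketched roughly as follows. This is a minimal illustration, not a pricing model: the `Strategy` fields, the per-hour cost parameters, and the assumption that each incident costs exactly the RTO in downtime plus the RPO in lost data are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    """A candidate backup/DR strategy (all fields are illustrative assumptions)."""
    name: str
    rto_hours: float    # downtime suffered per incident
    rpo_hours: float    # hours of data lost per incident
    annual_cost: float  # yearly cost of running the strategy


def loss_per_incident(s, downtime_cost_per_hour, data_loss_cost_per_hour):
    # Assumed model: an incident costs RTO hours of downtime plus RPO hours of lost data.
    return s.rto_hours * downtime_cost_per_hour + s.rpo_hours * data_loss_cost_per_hour


def total_annual_cost(s, incidents_per_year, downtime_cost_per_hour, data_loss_cost_per_hour):
    # Cost of the strategy itself plus the expected losses it still lets through.
    expected_loss = incidents_per_year * loss_per_incident(
        s, downtime_cost_per_hour, data_loss_cost_per_hour
    )
    return s.annual_cost + expected_loss


def cheapest(strategies, incidents_per_year, downtime_cost_per_hour, data_loss_cost_per_hour):
    # Pick the strategy with the lowest combined yearly cost.
    return min(
        strategies,
        key=lambda s: total_annual_cost(
            s, incidents_per_year, downtime_cost_per_hour, data_loss_cost_per_hour
        ),
    )
```

With made-up inputs, a cheap strategy with a 24-hour RTO/RPO wins while downtime is inexpensive, and the more expensive low-RTO strategy wins once the hourly downtime cost rises far enough. That crossover is the break-even point the question above asks about.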
Disaster Recovery Constraints
Copyright 2013 Gigaspaces. All Rights Reserved
3. COST OF AN RTO = 1 HOUR
Average cost per hour of downtime by industry

Industry                                  Cost/Hour
Finance (Brokerage Operations)            $5.15 million
Finance (Credit Card Authorizations)      $3.10 million
Telecom                                   $2.00 million
Manufacturing                             $1.60 million
Online Retail                             $613,000
Communications (ISP)                      $90,000
Media (Ticket Sales)                      $90,000
Transportation                            $89,000
Transportation (Packaging and Shipping)   $28,000

Source: “The Meta Group & Contingency Planning Research”
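To make the slide title concrete, the hourly figures can be multiplied by a chosen RTO. The sketch below copies the per-hour numbers from the table; the function name `outage_cost` and the assumption that cost scales linearly with outage duration are illustrative only.

```python
# Hourly downtime figures copied from the table above (USD per hour).
DOWNTIME_COST_PER_HOUR = {
    "Finance (Brokerage Operations)": 5_150_000,
    "Finance (Credit Card Authorizations)": 3_100_000,
    "Telecom": 2_000_000,
    "Manufacturing": 1_600_000,
    "Online Retail": 613_000,
    "Communications (ISP)": 90_000,
    "Media (Ticket Sales)": 90_000,
    "Transportation": 89_000,
    "Transportation (Packaging and Shipping)": 28_000,
}


def outage_cost(industry: str, rto_hours: float) -> float:
    """Estimate the cost of an outage that lasts exactly as long as the RTO.

    Assumes cost grows linearly with outage duration, which is a simplification.
    """
    return DOWNTIME_COST_PER_HOUR[industry] * rto_hours
```

For example, `outage_cost("Online Retail", 1)` gives 613,000, which is the table's one-hour figure.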
4. COST OF TESTING DR PROCEDURES
In cloud computing, there is even a third time objective metric:
TTO (Testing Time Objective): the time required to test recovery plans to ensure a successful failover in case of disaster
5. Deployable on any cloud any time
Public, Private, Bare-metal
Multi-zone, Multi-region out of the box
Fail over and provision applications
Automatically (polling master cloud)
Ad-hoc: Shell command, REST API
TTO
Easily provision a test environment for failover on a micro cloud (laptop)
Test failure frequently to ensure the highest resiliency (similar to Netflix)
WHAT IS ELASTIC ON-DEMAND DISASTER RECOVERY?
On-Demand
RTO / RPO
Easily configurable recipes to increase/decrease RTO and RPO
Cost
Pay only for failures without compromising RTO/RPO
Leverage cloud economics (any cloud)
Elastic
7. CLOUD HIGH AVAILABILITY: MATURITY MODEL
Single server instance, same data center
Same geographical region
Same operational procedures, provider
Single points of failure
8. CLOUD HIGH AVAILABILITY: THE REALITY
Consistent deployment
Cross zone configuration
Machine images, security groups, keys
Different API, zone/region hierarchies
Accidental complexity: the higher we move up the HA scale, the less manageable the deployments become
9. CLOUD HIGH AVAILABILITY: THROUGH CLOUDIFY
Across clouds (AWS, Rackspace, Azure, etc.): 1 application + overrides, several cloud drivers
Across AWS regions: 1 application + overrides, 1 cloud driver
Across AWS zones: 1 application + overrides, 1 cloud driver
Availability
Same application and service recipe
Single recipe, deployable on-demand on any data center, zone, region, or cloud
12. ELASTIC ON-DEMAND DISASTER RECOVERY
Problem
Can we eliminate the RTO vs. cost trade-off in the cloud?
Solution (Elastic DR)
A hybrid between Hot and Warm DR
Switch to the active site in a matter of seconds through cloud-agnostic lifecycle automation recipes
13. ELASTIC ON-DEMAND DISASTER RECOVERY: CONTEXT
Cold/Warm Disaster Recovery: High RTO, Low Cost
Hot Disaster Recovery: Low RTO, High Cost
Elastic DR
Recovery time objective (RTO): the duration of time and the service level to which a business process must be restored after a disaster (or disruption) to avoid unacceptable consequences associated with a break in business continuity.
Applying the cloud principle of elasticity to disaster recovery
14. ELASTICITY VS CRITICALITY CONTINUUM
Cold/Warm Disaster Recovery: High RTO, Low Cost
Hot Disaster Recovery: Low RTO, High Cost
Elastic DR
Operationally Critical
Business Critical
Mission Critical
XAP WAN Gateway
16. CASE STUDY: CLOUDIFY CUSTOMER
Problem
Technology-based concrete process control and information service
Deployments across North America
Bi-directional messaging and data transfer from web UI, mobile devices
NoSQL and relational data stores for reporting/analytics
Lacking disaster recovery and high availability aspects
Solution
High Availability: auto scaling, self healing, cross-zone, region, and cloud redundancy
Data Replication: automated lifecycle management of PostgreSQL + Cassandra replication
Disaster Recovery: Elastic Disaster Recovery pattern
17. SAMPLE (INITIAL) ARCHITECTURE
Diagram: availability region (US-West: Oregon) reached from the Internet, with EC2 instances for mod_cluster, JBoss, PostgreSQL, and Cassandra, plus data volumes
4 recipes
18. EXTENDED ARCHITECTURE: CLOUDIFY DR SCENARIO
Before failure:
Cloud #1, Region (US-West Oregon): App Servers, PostgreSQL
Cloud #2, Region (US-East Virginia): PostgreSQL, liveness poll
Upon initial deployment, the primary deployment of the application is bootstrapped onto cloud #1. Another, slightly modified application recipe is bootstrapped as cloud #2, which polls cloud #1 for failure and acts as a PostgreSQL database slave.
Region failure occurs:
Turn the Postgres slave into master, start app server instances*
Bootstrap another cloud in a different region using the same application recipe used to bootstrap cloud #2 above*
After failover:
Cloud #2, Region (US-East Virginia): App Servers, PostgreSQL
Cloud #3, Region (US-West California): PostgreSQL, liveness poll
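The liveness poll and failover steps described on this slide can be sketched as a small loop. This is a hypothetical illustration and not Cloudify's actual recipe or API: `is_alive`, `fail_over`, `max_failures`, and `interval_seconds` are names invented here, and requiring several consecutive failed polls before failing over is an added assumption to avoid reacting to a single missed poll.

```python
import time
from typing import Callable


def watch_primary(
    is_alive: Callable[[], bool],
    fail_over: Callable[[], None],
    max_failures: int = 3,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll the primary cloud and trigger failover after repeated failed polls.

    is_alive:  hypothetical check against the primary (cloud #1).
    fail_over: hypothetical action, e.g. promote the PostgreSQL slave to
               master and then start the app server instances.
    """
    failures = 0
    while True:
        if is_alive():
            failures = 0  # primary answered, reset the counter
        else:
            failures += 1
            if failures >= max_failures:
                fail_over()  # region considered failed, switch sites
                return
        sleep(interval_seconds)
```

Passing `sleep` as a parameter keeps the loop testable without waiting in real time; in a real deployment the polling and promotion logic would live in the application recipe.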
20. ELASTIC ON-DEMAND DR: COSTS*

        Main Site (US-West)   Warm DR Site (US-East)   Hot DR Site
Cost    $82,068               $12,625                  $82,068

Main Site: 1 load balancer, 2 JBoss instances, 1 PostgreSQL master, 3 Cassandra
DR Site: 1 PostgreSQL slave; all other instances start on demand upon failover
What if we deploy on different clouds?
*Costs calculated using http://planforcloud.com
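A quick calculation on the figures above shows what the warm pattern saves. The dollar amounts are taken from the slide; the variable names are illustrative, and the billing period behind the figures is not stated in the source, so no period is assumed here.

```python
# Figures from the cost slide (USD; billing period not stated in the source).
MAIN_SITE_COST = 82_068
WARM_DR_COST = 12_625
HOT_DR_COST = 82_068


def total_cost(dr_site_cost: int) -> int:
    """Combined cost of the main site plus one DR site."""
    return MAIN_SITE_COST + dr_site_cost


warm_total = total_cost(WARM_DR_COST)     # 94,693
hot_total = total_cost(HOT_DR_COST)       # 164,136
savings = hot_total - warm_total          # 69,443
warm_share = WARM_DR_COST / HOT_DR_COST   # roughly 0.154

print(f"Warm DR total: ${warm_total:,}")
print(f"Hot DR total:  ${hot_total:,}")
print(f"Savings with warm DR: ${savings:,}")
print(f"Warm DR site costs {warm_share:.1%} of a hot DR site")
```

On these numbers the warm DR site costs about 15% of a hot DR site, because only the PostgreSQL slave runs until a failover starts the remaining instances.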
21. ELASTIC DR: WARM DR COST, CLOUD PORTABILITY
4 recipes
DR Site: $12k
Same recipe: $14k, $6k, $5k, $9k
22. ELASTIC DR: HOT DR COST
4 recipes
DR Site: $82k
Same recipe: $79k, $115k, $68k, $91k
23. Demo – AWS DR across regions
http://www.youtube.com/watch?v=U-PdZe1g_yw
Editor's notes
The tolerance for RTO and RPO varies from industry to industry. Financial institutions, for example, require services back online in minutes rather than hours. Even more critically, healthcare providers require emergency response immediately. Other industries can afford to be down 24 hours without access to IT. Organizations that cannot afford to lose more than a single minute’s worth of transactional data must have strategies that include clustering or high availability, where online data is captured in real time in both the production and backup environments. Other organizations might find that tape backup programs supply ample data protection.
Servers store a wide variety of data types from different applications. Server data can be classified by its impact on business operations:
- Mission critical: producing revenue or customer-facing
- Business critical: supporting cross-organization functions
- Operationally critical: important to individual departments