Cloudifying High Availability and Disaster Recovery

Cloudifying High Availability
The Case for Elastic Disaster Recovery
Ali Hodroj
Senior Solutions Architect

DISASTER RECOVERY, THE CONTEXT
Disaster Recovery Key Drivers
• RTO (Recovery Time Objective)
• RPO (Recovery Point Objective)
• Cost
Disaster Recovery Constraints
• Accomplishing high levels of redundancy is expensive
• The hard cold reality for most businesses:
  - The cost of losing 24 hours of data is less than the cost of maintaining another active data center.
• Determining an appropriate RPO and RTO is ultimately a financial calculation:
  - At what point does the cost of data loss and downtime exceed the cost of a backup strategy that will prevent that level of data loss and downtime?
Copyright 2013 Gigaspaces. All Rights Reserved

COST OF AN RTO = 1 HOUR
Average cost per hour of downtime by industry

Industry                                   Cost/Hour
Finance (Brokerage Operations)             $5.15 Million
Finance (Credit Card Authorizations)       $3.10 Million
Telecom                                    $2.00 Million
Manufacturing                              $1.60 Million
Online Retail                              $613,000
Communications (ISP)                       $90,000
Media (Ticket Sales)                       $90,000
Transportation                             $89,000
Transportation (Packaging and Shipping)    $28,000

Source: "The Meta Group & Contingency Planning Research"
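
To make the table concrete, the exposure for a given outage can be estimated as the hourly figure multiplied by the outage length. The sketch below assumes cost scales linearly with downtime, which the slide does not claim; the function name and the sample RTO values are illustrative only.

```python
# Hourly downtime cost figures copied from the table above (USD per hour).
COST_PER_HOUR = {
    "Finance (Brokerage Operations)": 5_150_000,
    "Online Retail": 613_000,
    "Transportation (Packaging and Shipping)": 28_000,
}


def downtime_cost(industry: str, rto_hours: float) -> float:
    """Estimated loss for an outage lasting `rto_hours`.

    Assumption (not from the slides): cost grows linearly with downtime.
    """
    if rto_hours < 0:
        raise ValueError("rto_hours must be non-negative")
    return COST_PER_HOUR[industry] * rto_hours


if __name__ == "__main__":
    for industry in COST_PER_HOUR:
        for rto in (0.25, 1, 4):  # hypothetical RTO targets, in hours
            cost = downtime_cost(industry, rto)
            print(f"{industry}: RTO {rto} h -> ${cost:,.0f}")
```

Comparing such an estimate against the cost of a DR strategy is the financial calculation the constraints slide refers to.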

COST OF TESTING DR PROCEDURES
In cloud computing, there is also a third time objective metric:
TTO (Testing Time Objective): the time required to test recovery plans to ensure a successful failover in case of disaster

WHAT IS ELASTIC ON-DEMAND DISASTER RECOVERY?
On-Demand
• Deployable on any cloud, any time
  - Public, private, bare-metal
  - Multi-zone, multi-region out of the box
• Fail over and provision applications
  - Automatically (polling master cloud)
  - Ad hoc: shell command, REST API
• TTO
  - Easily provision a test environment for failover on a micro cloud (laptop)
  - Test failure frequently, and test often, ensuring the highest resiliency (similar to Netflix)
Elastic
• RTO / RPO
  - Easily configurable recipes to increase/decrease RTO and RPO
• Cost
  - Pay only for failures without compromising RTO/RPO
  - Leverage cloud economics (any cloud)

CLOUD HIGH AVAILABILITY: MATURITY MODEL
Single points of failure:
• Single server instance, same data center
• Same geographical region
• Same operational procedures, provider

CLOUD HIGH AVAILABILITY: THE REALITY
• Consistent deployment
• Cross-zone configuration
• Machine images, security groups, keys
• Different API, zone/region hierarchies
Accidental complexity: the higher we move in the HA scale, the less manageable the deployments become

CLOUD HIGH AVAILABILITY: THROUGH CLOUDIFY

Availability scope                             Recipe                       Cloud drivers
Across clouds (AWS, Rackspace, Azure, etc.)    1 application + overrides    Several cloud drivers
Across AWS regions                             1 application + overrides    1 cloud driver
Across AWS zones                               1 application + overrides    1 cloud driver

Same application and service recipe
Single recipe, deployable on-demand on any data center, zone, region, or cloud
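
The "1 application + overrides" idea can be illustrated as a base definition plus a small per-target override that is merged on top. This is a minimal sketch in plain Python: the dictionary keys, the `apply_overrides` helper, and the region and driver names are all hypothetical and are not Cloudify's actual recipe format.

```python
from copy import deepcopy

# Hypothetical base recipe: these keys are illustrative and are NOT
# Cloudify's real recipe schema.
BASE_RECIPE = {
    "application": "webapp",
    "services": {"jboss": {"instances": 2}, "postgresql": {"instances": 1}},
    "cloud": {"driver": "aws", "region": "us-west-2"},
}


def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Same application, different region: only the override changes.
east = apply_overrides(BASE_RECIPE, {"cloud": {"region": "us-east-1"}})

# Same application, different cloud: the driver changes as well.
other = apply_overrides(
    BASE_RECIPE, {"cloud": {"driver": "rackspace", "region": "ord"}}
)

if __name__ == "__main__":
    print(east["cloud"])
    print(other["cloud"])
```

The service definitions are untouched in both variants, which is the point of keeping a single recipe and varying only the overrides.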

ON-DEMAND, ON ANY CLOUD

ELASTIC ON-DEMAND DISASTER RECOVERY
• Problem
  - Can we eliminate the RTO vs. cost trade-off in the cloud?
• Solution (Elastic DR)
  - A hybrid between hot and warm DR
  - Switch to the active site in a matter of seconds through cloud-agnostic lifecycle automation recipes

ELASTIC ON-DEMAND DISASTER RECOVERY: CONTEXT
Applying the cloud principle of elasticity to disaster recovery
• Cold/Warm Disaster Recovery: high RTO, low cost
• Hot Disaster Recovery: low RTO, high cost
• Elastic DR: positioned between the two
Recovery time objective (RTO): the duration of time and the service level to which a business process must be restored after a disaster (or disruption) to avoid unacceptable consequences associated with a break in business continuity.

ELASTICITY VS CRITICALITY CONTINUUM
• Cold/Warm Disaster Recovery: high RTO, low cost
• Elastic DR
• Hot Disaster Recovery: low RTO, high cost
Criticality levels: Operationally Critical, Business Critical, Mission Critical
XAP WAN Gateway

CASE STUDY: CLOUDIFY CUSTOMER
Problem
• Technology-based concrete process control and information service
• Deployments across North America
• Bi-directional messaging and data transfer from web UI and mobile devices
• NoSQL and relational data stores for reporting/analytics
• Lacking disaster recovery and high availability aspects
Solution
• High Availability: auto scaling, self healing, cross-zone, region, and cloud redundancy
• Data Replication: automated lifecycle management of PostgreSQL + Cassandra replication
• Disaster Recovery: Elastic Disaster Recovery pattern

SAMPLE (INITIAL) ARCHITECTURE
Availability region (US-West: Oregon)
[Architecture diagram: Internet; EC2 instances running mod_cluster, JBoss, PostgreSQL, and Cassandra; two data volumes]
4 recipes

EXTENDED ARCHITECTURE: CLOUDIFY DR SCENARIO
Initial deployment
• Cloud #1, Region (US-West Oregon): app servers, PostgreSQL
• Cloud #2, Region (US-East Virginia): PostgreSQL, liveness poll of cloud #1
Upon initial deployment, the primary deployment of the application is bootstrapped onto cloud #1. Another, slightly modified application recipe is bootstrapped as cloud #2, which polls cloud #1 for failure and acts as a PostgreSQL database slave.
Region failure occurs
• Cloud #2, Region (US-East Virginia): turn the Postgres slave into the master, start app server instances*
• Cloud #3, Region (US-West California): PostgreSQL, liveness poll. Bootstrap another cloud in a different region using the same application recipe used to bootstrap cloud #2 above*
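
The failover sequence described on this slide (poll the primary, then promote the database and start the app servers) can be sketched as a small loop. This is an illustration, not Cloudify's implementation: `watch_primary`, its parameters, and the rule of three consecutive missed polls are assumptions, and the actual liveness check, database promotion, and app server start are passed in as callables.

```python
import time
from typing import Callable


def watch_primary(
    is_alive: Callable[[], bool],
    promote_db: Callable[[], None],
    start_app_servers: Callable[[], None],
    max_missed: int = 3,          # hypothetical threshold
    interval_s: float = 5.0,      # hypothetical poll interval
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the primary until `max_missed` consecutive polls fail, then fail over.

    Returns the total number of polls performed before failover.
    """
    missed = 0
    polls = 0
    while missed < max_missed:
        polls += 1
        try:
            alive = is_alive()
        except Exception:
            alive = False  # treat an unreachable primary as a failed poll
        missed = 0 if alive else missed + 1
        if missed < max_missed:
            sleep(interval_s)
    promote_db()         # turn the PostgreSQL slave into the master
    start_app_servers()  # start app server instances on demand
    return polls


if __name__ == "__main__":
    responses = iter([True, True, False, False, False])
    events = []
    polls = watch_primary(
        is_alive=lambda: next(responses),
        promote_db=lambda: events.append("promote"),
        start_app_servers=lambda: events.append("start-app"),
        sleep=lambda _s: None,  # skip real waiting in this demo
    )
    print(polls, events)
```

Requiring several consecutive misses before failing over trades a slightly higher RTO for fewer false failovers caused by a single dropped poll.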

ELASTIC ON-DEMAND DR: COSTS*

        Main Site (US-West)    Warm DR Site (US-East)    Hot DR Site
Cost    $82,068                $12,625                   $82,068

• Main Site
  - 1 load balancer, 2 JBoss instances, 1 PostgreSQL master, 3 Cassandra
• DR Site
  - 1 PostgreSQL slave; all other instances start on demand upon failover
What if we deploy on different clouds?
*Costs calculated using http://planforcloud.com
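
The arithmetic behind the comparison can be worked through directly from the three figures on this slide. The sketch below uses only those numbers; the variable names are illustrative, the billing period is not stated in the source, and the warm figure does not include the cost of instances started on demand during an actual failover.

```python
# Figures copied from the cost slide (USD; billing period not stated there).
MAIN_SITE = 82_068
WARM_DR_SITE = 12_625
HOT_DR_SITE = 82_068


def total_cost(dr_site_cost: int) -> int:
    """Main site plus the chosen DR site."""
    return MAIN_SITE + dr_site_cost


warm_total = total_cost(WARM_DR_SITE)
hot_total = total_cost(HOT_DR_SITE)
savings = hot_total - warm_total

# DR overhead expressed as a fraction of the main site cost.
warm_overhead = WARM_DR_SITE / MAIN_SITE
hot_overhead = HOT_DR_SITE / MAIN_SITE

if __name__ == "__main__":
    print(f"Main + warm DR: ${warm_total:,}")
    print(f"Main + hot DR:  ${hot_total:,}")
    print(f"Savings with warm DR: ${savings:,}")
    print(f"Warm DR overhead: {warm_overhead:.1%}")
    print(f"Hot DR overhead:  {hot_overhead:.1%}")
```

On these figures the warm DR site adds roughly 15% to the main site cost, while a hot DR site doubles it.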

ELASTIC DR: WARM DR COST, CLOUD PORTABILITY
4 recipes
DR site: $12k
Same recipe on other clouds: $14k, $6k, $5k, $9k

ELASTIC DR: HOT DR COST
4 recipes
DR site: $82k
Same recipe on other clouds: $79k, $115k, $68k, $91k

Demo – AWS DR across regions
http://www.youtube.com/watch?v=U-PdZe1g_yw
