Application-level Disaster Recovery on OpenStack

Cloudifying High Availability
Ali Hodroj
Director, Solution Architecture

AGENDA
• Context and Concepts
• Regions, Zones, and Single Points of Failure
• Challenges and Trade-Offs
• Architecting HA/DR Solutions with Cloudify
• Case Studies
• Resources and Q&A
Copyright 2014 Gigaspaces. All Rights Reserved

FOCUS OF THIS SESSION
HA/DR Layers
• Application DR, fault isolation strategies, deployment patterns
• Pacemaker, Corosync messaging, HAProxy, Galera MySQL replication
• Power, air conditioning, fire protection, etc.

CONCEPTS
High Availability
• Fault Tolerance
  • Ability to withstand failure and operate with normal or degraded performance
  • Redundancy and Replication
• High Availability
  • “The nines”: 99.99% = about 53 minutes of downtime per year
  • Minutes/hours of downtime per year
• Single Point of Failure
  • Part of a system that, if it fails, brings down the entire system
Disaster Recovery
• RTO (Recovery Time Objective)
  • How much downtime are you willing to tolerate?
• RPO (Recovery Point Objective)
  • How much data are you willing to lose?
• Cost
  • Development effort
  • Redundant environments

CONCEPTS…IN THE REAL WORLD
High Availability and Disaster Recovery
• Availability includes both planned and unplanned outages
• “Everything fails, all the time”
• Cloud vendor SLAs take effect only for multi-zone deployments and multi-zone outages
Combined availability of four dependent components, with downtime per 30-day month:
99.95% + 99.90% + 99.90% + 99.99% = 99.74%
21.6 minutes + 43.2 minutes + 43.2 minutes + 4.3 minutes = 112.3 minutes
• Accomplishing high levels of redundancy in the cloud is expensive
• Determining an appropriate RPO and RTO is ultimately a financial calculation
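
The availability arithmetic above can be checked with a short sketch. It assumes that the four components depend on each other in series, so their availabilities multiply, and that the downtime figures refer to a 30-day month; the function names are illustrative only.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes, assuming a 30-day month


def downtime_minutes(availability_pct, period_minutes=MINUTES_PER_MONTH):
    """Downtime allowed in the period for a given availability percentage."""
    return (1 - availability_pct / 100) * period_minutes


def serial_availability(percentages):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for pct in percentages:
        result *= pct / 100
    return result * 100


layers = [99.95, 99.90, 99.90, 99.99]
combined = serial_availability(layers)
print(f"combined availability: {combined:.2f}%")
print(f"downtime per month: {downtime_minutes(combined):.1f} minutes")
```

Multiplying the availabilities gives slightly less downtime than summing the per-component minutes, because simultaneous outages are not counted twice; at these levels the difference is a fraction of a minute.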

Regions, Zones, and Single Points of Failure

CLOUD HIGH AVAILABILITY: MATURITY MODEL
Single Points of Failure
• Single server instance, same data center
• Same geographical region
• Same operational procedures, provider

MULTI-ZONE ARCHITECTURE
• Physically separated data centers within a region
• Each availability zone
  • Independent power feeds from separate substations
  • Redundant power on each rack and diverse cabling
• Shared images, security groups, and floating IPs

MULTI-REGION ARCHITECTURE
• Characteristics
  • Geographically dispersed architecture
• Disaster Recovery Patterns
  • Replicate stateful tiers, orchestrate stateless tiers upon failure
• Challenges
  • Data replication costs and performance
  • Network flow
  • Orchestrating recovery
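
The "replicate stateful, orchestrate stateless" pattern can be sketched as a recovery plan. This is a minimal illustration, not Cloudify's API: the `Tier` class, the `plan_recovery` function, and the step wording are all hypothetical. It assumes stateful tiers already have a replica in the standby region, while stateless tiers are provisioned only when a failure occurs.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    stateful: bool


def plan_recovery(tiers):
    """Return ordered recovery steps for the standby region."""
    steps = []
    # Stateful tiers are continuously replicated, so recovery promotes the replica.
    for tier in tiers:
        if tier.stateful:
            steps.append(f"promote replica: {tier.name}")
    # Stateless tiers hold no data, so they are provisioned on demand.
    for tier in tiers:
        if not tier.stateful:
            steps.append(f"provision and deploy: {tier.name}")
    steps.append("redirect traffic to standby region")
    return steps


for step in plan_recovery([Tier("web", False), Tier("database", True)]):
    print(step)
```

Promoting the data tier first means the stateless tiers come up against a store that is already writable.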

MULTI-CLOUD ARCHITECTURE
• Characteristics
  • Leverages cloud economics
  • Workload migration (“Own the base, rent the spike”)
  • Fewest single points of failure
• Disaster Recovery Patterns
  • Replicate stateful tiers, orchestrate stateless tiers upon failure
• Challenges
  • Bootstrapping data for stateful services (snapshot or async replication?)
  • Data replication challenges over WAN
  • Complex setup

DEPLOYMENT (ACCIDENTAL) COMPLEXITY
• Consistent deployment
• Cross-zone configuration
• Machine images, security groups, keys
• Different APIs, zone/region hierarchies
Accidental complexity: the higher we move up the HA scale, the less manageable the deployments become.
Replication in itself is useless; it is the recovery orchestration that counts.

COST OF REDUNDANCY
Cost
• Compute, storage cost
• Bandwidth cost
RTO/RPO Impacting
• VM startup time / instance acquisition
• Latency/bandwidth across regions
• General performance (IOPS, SSD)
http://www.slideshare.net/mingtemp/a-performance-study-on-the-vm-startup-time-in-the-cloud
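
As a rough illustration of how these factors feed into RTO, the sketch below adds up detection time, VM startup time, data transfer time, and application start time. The formula and every parameter name are assumptions made for illustration; a real estimate would be measured, not computed.

```python
def estimate_rto_seconds(detect_s, vm_startup_s, restore_gb, bandwidth_mbps, app_start_s):
    """Back-of-the-envelope RTO: each phase runs after the previous one."""
    # Convert decimal gigabytes to megabits, then divide by link speed.
    transfer_s = restore_gb * 8 * 1000 / bandwidth_mbps
    return detect_s + vm_startup_s + transfer_s + app_start_s


# Hypothetical cold-DR figures: 1 min to detect, 2 min VM startup,
# 50 GB to restore over a 1 Gbit/s cross-region link, 30 s app start.
print(estimate_rto_seconds(60, 120, 50, 1000, 30))
```

Halving the cross-region bandwidth in this example doubles the transfer phase, which is why replication bandwidth appears both as a cost and as an RTO factor.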

Architecting HA/DR Solutions with Cloudify

APP CENTRIC DEVOPS
Cloudify provides the equivalent of AWS OpsWorks on OpenStack
http://appcatalog.cloudifysource.org/
• Nova, Cinder, Neutron, Heat
• OpenShift, Cloud Foundry

ORCHESTRATORS, RECIPES, AND “CLOUDS”
Cloud: a set of shared compute, storage, and network resources behind an OpenStack API, e.g. resources in an:
• Availability zone
• Region
• Public cloud
Cloud Driver
• Existing Data Center
  • Bare metal or virtual environment
• OpenStack Private Cloud
• OpenStack Public Cloud
  • HP Cloud US-West / AZ1
  • RackSpace Chicago (ORD) region
• OpenStack Micro Cloud
  • DevStack, Vagrant
  • Recipe development & DR testing

KEY PRINCIPLES
• Automation first (operational processes)
• Decouple the application from the infrastructure (design for failure)
• Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)
• Aggressive monitoring across the app stack
KEY PRINCIPLES
• Automation first (operational processes)
  • Provision
  • Install
  • Configure
  • Deploy
  • Monitor
  • Scale
https://github.com/CloudifySource/cloudify-recipes/

KEY PRINCIPLES
• Decouple the application from the infrastructure (design for failure)
  • Cloud Templates
  • Compute
  • Storage
  • Network

KEY PRINCIPLES
• Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)

KEY PRINCIPLES
• Aggressive monitoring across the app stack
  • Scaling rules
  • Automatic failover
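
One common way to turn monitoring into automatic failover is to require several consecutive failed health checks before acting, so that a single missed probe does not trigger recovery. The sketch below shows that idea only; the `FailoverMonitor` class and its threshold of three are hypothetical and do not represent Cloudify's monitoring API.

```python
class FailoverMonitor:
    """Signals failover after a run of consecutive failed health checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # assumed value, tune per application
        self.failures = 0
        self.failed_over = False

    def record(self, healthy):
        """Record one probe result; return True when failover should start."""
        if healthy:
            self.failures = 0  # any success resets the streak
        else:
            self.failures += 1
        if self.failures >= self.threshold and not self.failed_over:
            self.failed_over = True
            return True
        return False


monitor = FailoverMonitor(threshold=3)
for probe in [True, False, False, True, False, False, False]:
    if monitor.record(probe):
        print("starting failover")
```

The `failed_over` flag keeps the signal from firing again while recovery is already in progress.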

Case Studies (putting it all together)

DR ELASTICITY CONTINUUM
• Cold/Warm Disaster Recovery: higher RTO, lower cost
• Hot Disaster Recovery: lower RTO, higher cost
Operationally Critical, Business Critical, Mission Critical

OPERATIONALLY CRITICAL: COLD DR
Characteristics
• Financial services customer, post-trade processing application
Design / Recipe Implementation
• Cold Disaster Recovery (clone your recipe on another cloud in case of disaster)
• Recipes used for Disaster Recovery planning trade-off analysis

BUSINESS CRITICAL: CROSS-REGION DR
Transportation/Logistics Big Data / Realtime Analytics
• Autoscaling JBoss
• 4 service recipes deployed across both regions
• Recipes orchestrate setup, snapshot, and provisioning of PostgreSQL and Cassandra replication
• Federated data between cloud controllers (failover, polling, SQL master/slave promotion)

MISSION CRITICAL: IN-MEMORY WAN REPLICATION
Transportation/Logistics
• Replication as a Service
  https://github.com/dfilppi/repl-service
• Low-latency asynchronous replication across regions using in-memory replication technology (GigaSpaces XAP)
• Topologies: Master-Slave, Master-Master, Hub/Spoke, Ring
• Reference data, HTTP session sharing
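
The four topologies differ only in which sites replicate to which. The sketch below expresses each as a list of directed (source, target) links. It is a generic reading of the topology names, not the GigaSpaces XAP or repl-service API, and it assumes the first site in the list acts as master or hub.

```python
def replication_links(topology, sites):
    """Return directed (source, target) replication links between sites."""
    if topology == "master-slave":
        master, *slaves = sites
        return [(master, s) for s in slaves]
    if topology == "master-master":
        # Every site replicates to every other site.
        return [(a, b) for a in sites for b in sites if a != b]
    if topology == "hub-spoke":
        hub, *spokes = sites
        # Spokes exchange data only through the hub, in both directions.
        return [(hub, s) for s in spokes] + [(s, hub) for s in spokes]
    if topology == "ring":
        # Each site replicates to its successor; the last wraps to the first.
        return [(sites[i], sites[(i + 1) % len(sites)]) for i in range(len(sites))]
    raise ValueError(f"unknown topology: {topology}")


for name in ["master-slave", "master-master", "hub-spoke", "ring"]:
    print(name, replication_links(name, ["us-west", "us-east", "eu"]))
```

With n sites, master-master needs n(n-1) links while ring needs only n, which is one reason topology choice affects WAN replication cost.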

TRY IT OUT TODAY
Join the community
http://www.cloudifysource.org
https://github.com/CloudifySource/cloudify-recipes
Try out and contribute some recipes
