2. AGENDA
Context and Concepts
Regions, Zones, and Single Points of Failure
Challenges and Trade-Offs
Architecting HA/DR Solutions with Cloudify
Case Studies
Resources and Q&A
Copyright 2014 Gigaspaces. All Rights Reserved
4. FOCUS OF THIS SESSION
HA/DR Layers
Application DR, fault isolation strategies, deployment patterns
• Pacemaker
• Corosync messaging
• HAProxy
• Galera MySQL replication
Power, air conditioning, fire protection, etc.
5. Fault Tolerance
Ability to withstand failure and operate
with normal or degraded performance
Redundancy and Replication
High Availability
“The nines”: 99.99% availability = about 53 minutes of downtime per year
Measured as minutes/hours of downtime per year
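The downtime budget behind "the nines" can be worked out directly. The sketch below assumes a 365-day year; the function name is illustrative and not taken from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Print the yearly downtime budget for a few common availability targets.
for target in (99.0, 99.9, 99.95, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.1f} minutes/year")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why every additional nine tends to cost far more than the previous one.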
Single Point of Failure
Part of a system that, if it fails,
brings down the entire system
CONCEPTS
High Availability
RTO
How much downtime are you willing to tolerate?
RPO
How much data are you willing to lose?
Cost
Development Effort
Redundant environments
Disaster Recovery
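One way to make the RTO and RPO questions concrete is to compare candidate DR strategies against target values. This is a minimal sketch: the strategy names and all numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DrStrategy:
    name: str
    data_loss_window_min: float  # worst-case age of the newest replicated data
    recovery_time_min: float     # time to detect the failure and restore service


def meets_objectives(s: DrStrategy, rpo_min: float, rto_min: float) -> bool:
    """True when worst-case data loss fits the RPO and recovery time fits the RTO."""
    return s.data_loss_window_min <= rpo_min and s.recovery_time_min <= rto_min


# Hypothetical strategies and targets, for illustration only.
strategies = [
    DrStrategy("nightly backup, cold standby", 24 * 60, 8 * 60),
    DrStrategy("async replication, warm standby", 5, 30),
]

for s in strategies:
    verdict = "meets" if meets_objectives(s, rpo_min=15, rto_min=60) else "misses"
    print(f"{s.name}: {verdict} RPO=15 min / RTO=60 min")
```

Tighter objectives rule out the cheaper strategies, which is where the cost, development effort, and redundant environments listed above come in.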
6. Availability includes both planned and
unplanned outages
“Everything fails, all the time”
Cloud vendor SLAs require multi-zone deployments,
and a multi-zone outage, to be effective
CONCEPTS…IN THE REAL WORLD
High Availability Disaster Recovery
Combined SLA of four components in series (downtime per 30-day month):
99.95% + 99.90% + 99.90% + 99.99%
21.6 minutes + 43.2 minutes + 43.2 minutes + 4.3 minutes
= 99.74%, 112.3 minutes
Accomplishing high levels of redundancy in
the cloud is expensive
Determining an appropriate RPO and RTO is
ultimately a financial calculation
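The combined SLA figures on this slide can be checked with a few lines of arithmetic. The sketch assumes the four services depend on each other in series (so their availabilities multiply) and a 30-day month; the function names are illustrative.

```python
from math import prod

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes, assuming a 30-day month


def composite_availability(slas_percent):
    """Availability (in percent) of components that must all be up at once."""
    return prod(s / 100 for s in slas_percent) * 100


def downtime_minutes_per_month(availability_percent):
    """Allowed downtime per 30-day month, in minutes."""
    return MINUTES_PER_MONTH * (1 - availability_percent / 100)


slas = [99.95, 99.90, 99.90, 99.99]
total = composite_availability(slas)

for sla in slas:
    print(f"{sla}% -> {downtime_minutes_per_month(sla):.1f} minutes/month")
print(f"combined: {total:.2f}% -> {downtime_minutes_per_month(total):.1f} minutes/month")
```

Adding the individual downtimes gives roughly 112.3 minutes, while multiplying the availabilities gives roughly 112.2. The two agree closely because the overlap between outages is tiny at these levels.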
8. CLOUD HIGH AVAILABILITY: MATURITY MODEL
Single server instance,
same data center
Same geographical
region
Same operational
procedures, provider
Single Points of
Failure
9. MULTI-ZONE ARCHITECTURE
Physically separated data
centers within a region
Each availability zone has:
Independent power feeds from
separate substations
Redundant power on each rack and
diverse cabling
Shared images, security groups,
and floating IPs
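The benefit of spreading a deployment across zones can be estimated with the standard redundancy formula. This sketch assumes zone failures are fully independent; shared components, such as the region-wide services listed above, would make the real figure lower. The 99% per-zone value is hypothetical.

```python
def redundant_availability(zone_availability_percent: float, zones: int) -> float:
    """Availability (in percent) when the service survives as long as one zone is up.

    Assumes independent zone failures, which is optimistic in practice.
    """
    unavailability = 1 - zone_availability_percent / 100
    return (1 - unavailability ** zones) * 100


# Hypothetical per-zone availability of 99%.
for n in (1, 2, 3):
    print(f"{n} zone(s): {redundant_availability(99.0, n):.4f}%")
```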
10. MULTI-REGION ARCHITECTURE
Characteristics
Geographically dispersed
architecture
Disaster Recovery Patterns
Replicate stateful tiers, orchestrate stateless
tiers upon failure
Challenges
Data replication costs and
performance
Network flow
Orchestrating recovery
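The "replicate stateful, orchestrate stateless" pattern can be sketched as a recovery plan. This is an illustration under stated assumptions, not Cloudify's implementation: the Tier type, tier names, and step wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    stateful: bool


def recovery_plan(tiers):
    """Return ordered recovery steps for the surviving region."""
    steps = []
    # Stateful tiers were replicated ahead of time, so recovery promotes the replica.
    for tier in tiers:
        if tier.stateful:
            steps.append(f"promote replica of {tier.name}")
    # Stateless tiers hold no data, so they are provisioned only when a failure occurs.
    for tier in tiers:
        if not tier.stateful:
            steps.append(f"provision {tier.name}")
    steps.append("redirect traffic to DR region")
    return steps


tiers = [Tier("web", False), Tier("database", True), Tier("app server", False)]
for step in recovery_plan(tiers):
    print(step)
```

Only the stateful tier pays the ongoing replication cost; the stateless tiers cost nothing until recovery starts, at the price of a longer RTO.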
11. MULTI-CLOUD ARCHITECTURE
Characteristics
Leverages cloud economics
Workload migration
(“Own the base, rent the spike”)
Fewest single points of failure
Disaster Recovery Patterns
Replicate stateful tiers, orchestrate
stateless tiers upon failure
Challenges
Bootstrapping data for stateful
services (snapshot or async
replication?)
Data replication challenges over
WAN
Complex setup
13. DEPLOYMENT (ACCIDENTAL) COMPLEXITY
Consistent deployment
Cross zone configuration
Machine images, security groups, keys
Different API, zone/region hierarchies
Accidental Complexity: The higher we
move in the HA scale, the less manageable
the deployments become
Replication in itself is useless; it’s the
recovery orchestration that counts
14. COST OF REDUNDANCY
Cost
Compute, Storage Cost
Bandwidth Cost
RTO/RPO Impacting
VM Startup time / Instance Acquisition
Latency/Bandwidth across regions
General performance (IOPS, SSD)
http://www.slideshare.net/mingtemp/a-performance-study-on-the-vm-startup-time-in-the-cloud
16. Cloudify provides the equivalent of
Amazon OpsWorks on OpenStack
APP CENTRIC DEVOPS
http://appcatalog.cloudifysource.org/
Nova, Cinder, Neutron, Heat
OpenShift, CloudFoundry
17. ORCHESTRATORS, RECIPES, AND “CLOUDS”
Existing Data Center
OpenStack Private Cloud
OpenStack Public Cloud
OpenStack Micro Cloud
Cloud Driver
Cloud: a set of shared compute, storage, and network resources behind an OpenStack API, e.g. resources in:
• Availability zone
• Region
• Public cloud
• HP Cloud US-West / AZ1
• Rackspace Chicago (ORD) region
• DevStack, Vagrant
• Recipe development & DR testing
• Bare metal or virtual environment
18. KEY PRINCIPLES
• Automation first (operational processes)
• Decouple the application from the infrastructure (design for failure)
• Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)
• Aggressive monitoring across the app stack
19. KEY PRINCIPLES
• Automation first (operational processes)
Provision
Install
Configure
Deploy
Monitor
Scale
https://github.com/CloudifySource/cloudify-recipes/
20. KEY PRINCIPLES
• Decouple the application from the infrastructure (design for failure)
Storage
Network
Cloud Templates
Compute
21. KEY PRINCIPLES
• Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)
22. KEY PRINCIPLES
• Aggressive monitoring across the app stack
Scaling rules
Automatic failover
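A scaling rule driven by monitoring data can be as simple as a pair of thresholds. The sketch below is plain Python for illustration only; it is not Cloudify's recipe syntax, and the function name, metric, and thresholds are hypothetical.

```python
def scaling_decision(metric_value, low, high, instances, min_instances, max_instances):
    """Return the new instance count for a threshold-based scaling rule."""
    if metric_value > high and instances < max_instances:
        return instances + 1  # scale out under load, up to the configured ceiling
    if metric_value < low and instances > min_instances:
        return instances - 1  # scale in when idle, down to the configured floor
    return instances          # within thresholds, or already at a limit


# Hypothetical rule: keep average CPU between 20% and 80%, with 2 to 6 instances.
print(scaling_decision(92, low=20, high=80, instances=2, min_instances=2, max_instances=6))
print(scaling_decision(10, low=20, high=80, instances=2, min_instances=2, max_instances=6))
```

Keeping the minimum above one instance is what lets the same rule double as a basic availability guarantee.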
25. OPERATIONALLY CRITICAL: COLD DR
Characteristics
Financial Services customer, post-trade processing application
Design / Recipe Implementation
• Cold Disaster Recovery (clone your recipe on another cloud in case of disaster)
• Recipes used for Disaster Recovery planning trade-off analysis
26. BUSINESS CRITICAL: CROSS-REGION DR
Characteristics
Transportation/Logistics Big Data / Realtime Analytics
Design / Recipe Implementation
• Autoscaling JBoss
• 4 service recipes deployed across both regions
• Recipes orchestrate setup, snapshot, and provisioning of PostgreSQL, Cassandra replication
• Federated data between cloud controllers (failover, polling, SQL master/slave promotion)
27. MISSION CRITICAL: IN-MEMORY WAN REPLICATION
Characteristics
Transportation/Logistics
Design / Recipe Implementation
• Replication as a Service
https://github.com/dfilppi/repl-service
• Low-latency asynchronous replication across regions using in-memory replication technology (GigaSpaces XAP)
• Topologies: Master-Slave, Master-Master, Hub/Spoke, Ring
• Reference data, HTTP session sharing
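The four replication topologies differ only in which sites send data to which. The sketch below models them as directed links. It is illustrative and does not use the GigaSpaces XAP API; the site names are hypothetical, and the assumption that hub/spoke links run in both directions is mine.

```python
from itertools import permutations


def replication_links(topology, sites):
    """Return directed (source, target) replication links between sites."""
    if topology == "master-slave":
        master, *slaves = sites  # first site is treated as the master
        return [(master, s) for s in slaves]
    if topology == "master-master":
        return list(permutations(sites, 2))  # every site replicates to every other
    if topology == "hub-spoke":
        hub, *spokes = sites  # first site is the hub; links assumed bidirectional
        return [link for s in spokes for link in ((hub, s), (s, hub))]
    if topology == "ring":
        return [(sites[i], sites[(i + 1) % len(sites)]) for i in range(len(sites))]
    raise ValueError(f"unknown topology: {topology}")


sites = ["us-west", "us-east", "eu-west"]  # hypothetical site names
for topology in ("master-slave", "master-master", "hub-spoke", "ring"):
    print(topology, replication_links(topology, sites))
```

The link count is a rough proxy for WAN replication cost: master-master grows quadratically with the number of sites, while the other three grow linearly.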
29. TRY IT OUT TODAY
Join the community
http://www.cloudifysource.org
Try out and contribute some recipes
https://github.com/CloudifySource/cloudify-recipes