AWS 201: Advanced Breakout Track on HA and DR

AWS 201: Breakout Track Singapore
“Design for Failure”
HA and DR Best Practices
Harish Ganesan
Co-founder & CTO
8KMiles
www.twitter.com/harish11g
http://www.linkedin.com/in/harishganesan

Agenda

• Explain HA architecture with a real customer case
• Understand how to architect a web app in AWS with
– High Availability
– DR
– Scalability
• Why AWS?

About the Customer

• Online ecommerce company
• NASDAQ listed
• Application consumed by online users, mobile clients and web services

Requirements

• High availability on all tiers with no SPOF
• Auto-scalable and elastic infrastructure
• Ability to serve millions of requests per day
• Serve peak HTTP traffic of 8000+ reqs/sec
• Serve peak HTTPS traffic of 2500+ reqs/sec
• 65% of the business is done during the holiday season, so no downtime is affordable
• Ease of monitoring, backup and deployment
• Optimal DR setup (cost vs. RTO/RPO)

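The peak targets above translate into a rough size for the Web/App tier. The sketch below is a back-of-the-envelope calculation under hypothetical assumptions: one instance sustains 500 req/s (`REQS_PER_INSTANCE`), the HTTP and HTTPS peaks coincide, and the tier must still carry the full peak after losing one of its AZs. Replace these numbers with your own load-test results.

```python
import math

# Peak targets from the requirements slide.
PEAK_HTTP_RPS = 8000
PEAK_HTTPS_RPS = 2500

# Hypothetical assumptions for illustration only.
REQS_PER_INSTANCE = 500  # sustained req/s one Web/App instance can serve
NUM_AZS = 2              # instances are spread evenly across this many AZs


def instances_needed(peak_rps: int, per_instance: int, num_azs: int) -> int:
    """Total instances so the peak is still served after one whole AZ fails."""
    if num_azs < 2:
        raise ValueError("at least two AZs are needed to survive an AZ failure")
    base = math.ceil(peak_rps / per_instance)    # instances to carry the full peak
    per_az = math.ceil(base / (num_azs - 1))     # surviving AZs must carry it alone
    return per_az * num_azs                      # provision that much in every AZ


if __name__ == "__main__":
    peak = PEAK_HTTP_RPS + PEAK_HTTPS_RPS  # assumes both peaks coincide
    print("Combined peak req/s:", peak)
    print("Requests/day if the peak were sustained:", peak * 86_400)
    print("Instances needed:", instances_needed(peak, REQS_PER_INSTANCE, NUM_AZS))
```

With these assumptions the combined peak of 10,500 req/s needs 21 instances, and surviving the loss of one of two AZs doubles that to 42. Spreading the same load across three AZs lowers the total to 33.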
Technology and Tiers

• Multi-tiered Linux, Apache, Java web site on AWS
• Database tier using MySQL
• Cache tier
• Integration tier with queues and background programs
• HTTP and HTTPS protocols

What did 8KMiles do?

• Consulting: architected the entire website infrastructure on AWS
• Implementation:
– Configured the infrastructure on AWS
– Developed custom DevOps scripts on AWS
• Supported the site during Thanksgiving and the holiday season
• Cloud Development Partner:
– Currently re-engineering the customer app to leverage more AWS services

A simple LAMJ Architecture

[Diagram: a single Web/App/Cache Server and Integration Services with a MySQL DB, inside AWS Security Groups in US-EAST-1a, monitored by CloudWatch]

1 Web/App Server interacts with MySQL for queries and transactions

What is the problem in this architecture?

A simple LAMJ Architecture

[Diagram: the same single-server layout inside AWS Security Groups in US-EAST-1a, monitored by CloudWatch]

Single Point of Failure at multiple tiers

Not a Highly Available Architecture

How to avoid SPOF and build a robust architecture?

Step 1: Distribute the Application to Multiple Tiers

[Diagram: Web/App Server, Integration Service tier and MySQL DB on separate EC2 instances inside AWS Security Groups in US-EAST-1a, monitored by CloudWatch]

1 Separate out the individual tiers into separate EC2 instances

Step 2: Add Multiple Servers in Each Layer

[Diagram: multiple EC2 instances in the Web/App Server, Integration Service and MySQL DB tiers inside AWS Security Groups in US-EAST-1a, monitored by CloudWatch]

1 Add multiple EC2 instances in every tier

Building HA @ Load Balancing Tier

Load Balancing Tier

• Load Balancing Options
– ELB
– HAProxy
– Nginx

Why AWS ELB?

• AWS ELB provides a load balancing service that can front thousands of EC2 servers
• AWS ELB automatically scales its back-end load balancing servers up and down
• ELB has no fixed theoretical maximum request rate; its capacity grows as it scales out
• It can easily handle 20000+ concurrent requests (RightScale benchmark)
• AWS ELB works seamlessly with AWS Auto Scaling

Why AWS ELB?

• AWS ELB is well integrated with other AWS services
• No maintenance
• Pay as you go

Load Balancing Layer

[Diagram: Online / Web / Mobile clients → AWS Elastic Load Balancer → Web/App Servers → MySQL DB, inside AWS Security Groups in US-EAST-1a]

1 Simple round-robin algorithm
2 Health checks, SSL termination
3 ELB is a highly available service with no SPOF

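The round-robin and health-check callouts above can be illustrated with a toy balancer. This is a minimal sketch of the idea, not ELB's actual implementation; `Backend` and `RoundRobinBalancer` are hypothetical names, and the `healthy` flag stands in for the result of the load balancer's health checks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Backend:
    name: str
    healthy: bool = True  # flipped to False when health checks fail


@dataclass
class RoundRobinBalancer:
    """Toy round-robin balancer that skips back ends failing health checks."""
    backends: List[Backend]
    _next: int = field(default=0, init=False)

    def pick(self) -> Optional[Backend]:
        n = len(self.backends)
        for _ in range(n):  # try each back end at most once per request
            backend = self.backends[self._next % n]
            self._next = (self._next + 1) % n
            if backend.healthy:
                return backend
        return None  # no healthy back end (or none registered)


if __name__ == "__main__":
    lb = RoundRobinBalancer([Backend("web-1"), Backend("web-2"), Backend("web-3")])
    lb.backends[1].healthy = False  # web-2 fails its health check
    print([lb.pick().name for _ in range(4)])  # ['web-1', 'web-3', 'web-1', 'web-3']
```

Traffic simply flows around the failed instance; once it passes health checks again it rejoins the rotation without any client-side change.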
High Availability @ Web/App tier

[Diagram: an Auto Scaling group of Web/App Servers with S3 and Puppet, an Integration Service Tier and a MySQL DB, inside AWS Security Groups in US-EAST-1a]

1 Add AWS Auto Scaling to the Web/App tier
2 Tie AWS Auto Scaling to AWS ELB
3 Deploy the app using Puppet

Designing HA @ Web/App Tier

• AWS Auto Scaling replaces unhealthy EC2 instances
• AWS Auto Scaling ensures that a minimum number of Web/App EC2 instances is always running
• In the event of failure, new instances are launched automatically within 30-120 seconds
• Auto Scaled EC2 instances are attached to the ELB automatically, so traffic reaches them seamlessly

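The "replace unhealthy, keep a minimum running" behaviour described above is a control loop. The sketch below shows one simplified pass of such a loop; `launch_instance` is a hypothetical stand-in for launching EC2 from the group's launch configuration, and the real service also registers replacements with the ELB, which is omitted here.

```python
import itertools
from dataclasses import dataclass
from typing import List

_instance_ids = itertools.count(1)


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


def launch_instance() -> Instance:
    """Hypothetical stand-in for launching EC2 from the group's launch configuration."""
    return Instance(f"i-{next(_instance_ids):04d}")


def reconcile(group: List[Instance], min_size: int) -> List[Instance]:
    """One pass of a simplified Auto Scaling-style control loop.

    Unhealthy instances are dropped (terminated) and replacements are
    launched until the group is back at its minimum size.
    """
    survivors = [inst for inst in group if inst.healthy]
    while len(survivors) < min_size:
        survivors.append(launch_instance())
    return survivors


if __name__ == "__main__":
    group = [Instance("web-a"), Instance("web-b", healthy=False), Instance("web-c")]
    print([i.instance_id for i in reconcile(group, min_size=3)])
```

Running the pass repeatedly is what keeps the tier at its minimum size without operator action; instances above the minimum are left alone.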
Designing HA @ Web/App Tier

• Deploy the application / patches in the Auto Scaling environment using Puppet / S3 scripts
• Choose the right EC2 instance type
– Large (less CPU-intensive workloads, 5.5 GB heap)
– High-CPU Extra Large (more CPU-intensive workloads, 5.5 GB heap, concurrent GC)
• Points to remember
– Do not store sessions in the web/app server's memory
– Rotate and move the log files to S3 periodically
– Move uploaded data files and images to S3 or GlusterFS

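Keeping sessions out of the web/app server's memory is what lets Auto Scaling terminate any instance without logging users out. The sketch below assumes a shared store reachable by every instance (in this architecture that would be the cache tier); `InMemorySharedStore` only simulates such a store inside one process, and `AppServer` and its cart methods are hypothetical.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    def get(self, session_id: str) -> Optional[Dict[str, str]]: ...
    def put(self, session_id: str, data: Dict[str, str]) -> None: ...


class InMemorySharedStore:
    """Stand-in for a shared cache tier (e.g. ElastiCache); an assumption for the demo."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, str]] = {}

    def get(self, session_id: str) -> Optional[Dict[str, str]]:
        data = self._sessions.get(session_id)
        return dict(data) if data is not None else None

    def put(self, session_id: str, data: Dict[str, str]) -> None:
        self._sessions[session_id] = dict(data)


class AppServer:
    """Hypothetical stateless web/app server: session state lives only in the store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> None:
        session = self.store.get(session_id) or {}
        cart = session.get("cart")
        session["cart"] = f"{cart},{item}" if cart else item
        self.store.put(session_id, session)

    def cart(self, session_id: str) -> str:
        return (self.store.get(session_id) or {}).get("cart", "")


if __name__ == "__main__":
    store = InMemorySharedStore()
    web1, web2 = AppServer("web-1", store), AppServer("web-2", store)
    web1.add_to_cart("s-42", "book")  # first request lands on web-1
    del web1                          # web-1 is terminated by Auto Scaling
    web2.add_to_cart("s-42", "pen")   # next request lands on web-2
    print(web2.cart("s-42"))          # book,pen
```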
What happens when the US-EAST-1a AZ fails?

Solution: Leverage AWS Multi-AZ architecture

[Diagram: browsers and mobile devices → AWS Elastic Load Balancer → Web/App EC2 Auto Scaling groups in AZ US-EAST-1a and AZ US-EAST-1b, inside AWS Security Groups]

HTTP/S requests from the browser or mobile devices hit the AWS Elastic Load Balancer
1 Infrastructure is spread across multiple AZs of AWS inside a Region
2 AWS Elastic Load Balancer directs requests to EC2 instances across multiple AZs
3 Amazon Auto Scaling automatically launches new EC2 instances across multiple AZs
4 No code changes are required to leverage Multi-AZ

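Auto Scaling tries to keep a group evenly spread across its AZs, and when an AZ is lost it launches replacements in the remaining ones. The sketch below models that with a simple "least-populated healthy AZ" rule; the rule is an assumption about the policy rather than AWS's exact algorithm, and the replacement instance ids are made up.

```python
from collections import Counter
from typing import Dict, List


def pick_az(group: Dict[str, str], azs: List[str]) -> str:
    """Return the AZ with the fewest instances; ties go to the first AZ listed."""
    counts = Counter(group.values())
    return min(azs, key=lambda az: counts[az])


def restore_capacity(group: Dict[str, str], healthy_azs: List[str], desired: int) -> Dict[str, str]:
    """Drop instances in failed AZs, then launch replacements in the least-used healthy AZ.

    `group` maps instance id -> AZ. Ids for replacements are made up for the demo.
    """
    if not healthy_azs:
        raise ValueError("no healthy AZ left in the region")
    survivors = {iid: az for iid, az in group.items() if az in healthy_azs}
    launched = 0
    while len(survivors) < desired:
        launched += 1
        survivors[f"replacement-{launched}"] = pick_az(survivors, healthy_azs)
    return survivors


if __name__ == "__main__":
    group = {"i-1": "us-east-1a", "i-2": "us-east-1a", "i-3": "us-east-1b", "i-4": "us-east-1b"}
    after = restore_capacity(group, ["us-east-1b"], desired=4)  # us-east-1a is down
    print(Counter(after.values()))  # Counter({'us-east-1b': 4})
```

The application code is unaware of any of this, which is why the slide can claim that no code changes are needed for Multi-AZ.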
High Availability @ Web/App/DEX layer

• AZs are connected by a low-latency network
• AZs are insulated from failures in other Availability Zones *
• AWS Auto Scaling can manage EC2 instances across AZs
• AWS ELB can direct load to EC2 instances across AZs
• AWS CloudWatch can monitor EC2 instance availability across AZs

Database Tier

• Options
– MySQL master-slave replication
– MySQL NDB Cluster
– RDS MySQL Master-Standby
– RDS MySQL Master-Standby + Read Replicas

High Availability @ DB Layer

[Diagram: AWS Elastic Load Balancer → Web/App EC2 instances in US-EAST-1a and US-EAST-1b, inside AWS Security Groups → RDS Master and RDS Standby, with a Read Replica in each AZ; S3 and CloudWatch alongside]

1 Read Replicas are launched in multiple AZs for HA
2 The RDS Standby is launched in a different AZ from the RDS Master for HA
3 Web/App servers hosted on Amazon EC2 transact with the RDS Master and read from the Read Replicas


• RDS Master and RDS Standby in multiple AZs for HA
• Read Replicas in multiple AZs for HA
• No SPOF at the AZ level
• Read Replicas can be launched/terminated without affecting RDS Master availability
• In the event of an RDS Master failure, the RDS Standby is automatically promoted
• Promotion takes <180 seconds and requires no changes in the application

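Promotion needs no application change because RDS Multi-AZ failover repoints the DB instance's DNS endpoint at the promoted standby; the application only ever knows the endpoint name. The toy model below illustrates that indirection; `Dns`, `MultiAzDb`, `App` and all hostnames are hypothetical. In practice the application must re-resolve the name when it reconnects and must not cache the address indefinitely (for Java, the JVM's DNS cache TTL matters).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Dns:
    """Toy DNS zone: endpoint name -> host currently serving it."""
    records: Dict[str, str]

    def resolve(self, name: str) -> str:
        return self.records[name]


@dataclass
class MultiAzDb:
    endpoint: str
    primary: str  # host in one AZ
    standby: str  # synchronously updated host in another AZ
    dns: Dns

    def fail_over(self) -> None:
        # Promote the standby, then repoint the endpoint record at it.
        # The old primary becomes the new standby once it is repaired.
        self.primary, self.standby = self.standby, self.primary
        self.dns.records[self.endpoint] = self.primary


@dataclass
class App:
    """The app is configured with the endpoint name only, never a host."""
    endpoint: str
    dns: Dns

    def db_host(self) -> str:
        return self.dns.resolve(self.endpoint)  # re-resolved on each (re)connect


if __name__ == "__main__":
    dns = Dns({"shopdb.example.internal": "db-us-east-1a"})
    db = MultiAzDb("shopdb.example.internal", "db-us-east-1a", "db-us-east-1b", dns)
    app = App("shopdb.example.internal", dns)
    db.fail_over()
    print(app.db_host())  # db-us-east-1b
```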

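• DB snapshots and MySQL dumps are available
• Automatic full backups during the configured backup window
• Point-in-time recovery up to the latest restorable time (typically the last five minutes)
• Recovery may require app-layer configuration changes, because a restore creates a new DB instance with a new endpoint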


• Points to remember
– Use the InnoDB engine: RDS snapshot restore and point-in-time recovery are supported only for InnoDB
– Give more memory to the RDS Master
• Use Extra Large or High-Memory instance types
– Keep your Read Replicas the same size as the RDS Master
– Multiple Read Replicas can be load balanced using HAProxy

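The read/write split on the DB slides (transact with the RDS Master, read from Read Replicas) can live in a proxy such as HAProxy or in the application. Below is a minimal application-side sketch; the hostnames are hypothetical and classifying statements by keyword is a simplification. Because replication to Read Replicas is asynchronous, a read issued right after a write may not see it yet, so reads that must be current should go to the master.

```python
import itertools
import re
from typing import List


class ReadWriteRouter:
    """Route writes to the RDS Master and spread reads round-robin over Read Replicas."""

    _READ = re.compile(r"^\s*(SELECT|SHOW)\b", re.IGNORECASE)
    _LOCKING = re.compile(r"\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b", re.IGNORECASE)

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        is_plain_read = bool(self._READ.match(sql)) and not self._LOCKING.search(sql)
        if is_plain_read and self._replicas is not None:
            return next(self._replicas)
        return self.master  # writes, locking reads, or no replicas configured


if __name__ == "__main__":
    router = ReadWriteRouter("rds-master", ["replica-1a", "replica-1b"])
    for q in ["INSERT INTO orders VALUES (1)", "SELECT * FROM items",
              "SELECT * FROM items", "SELECT * FROM stock WHERE id = 1 FOR UPDATE"]:
        print(router.route(q))  # rds-master, replica-1a, replica-1b, rds-master
```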
Use AWS Building blocks in your architecture

Use AWS Building blocks

• AWS building blocks come with built-in
– Fault tolerance
– HA and scalability

• The following building blocks were used
– S3, CloudFront, Route 53, CloudWatch, SNS, SQS, SES, ELB, EIP, EBS

Application Architecture in AWS

[Diagram: Browser / Web Services / Mobile clients → Route 53 → AWS Elastic Load Balancer and AWS CloudFront CDN → Amazon EC2 servers in AZ US-EAST-1a and AZ US-EAST-1b, inside AWS Security Groups, with ElastiCache, S3, Read Slaves 1 and 2, DB Master and DB Standby; AWS Simple Email Service, AWS Simple Notification Service (alerts), CloudWatch, Puppet and SQS alongside]

How is it used in the project?

• ELB – load balancing
• Route 53 – DNS mappings, round-robin routing
• CloudFront – assets, HTML, CSS, JS, images
• S3 – logs, snapshots, images
• CloudWatch – monitoring of CPU, ELB, RDS and custom metrics
• SNS – system alerts
• SES – emails (password, activation, app alerts)
• EBS – EBS-backed AMIs for the Web/App tier
• EIP – Elastic IP for the Puppet server

What happens if the entire AWS region is affected?

Solution: Design HA/DR across Regions

High Availability across AWS Regions

Main web site is hosted in the AWS Singapore region
DR web site is hosted in AWS Tokyo

DR / HA Options in AWS

• Hot Active: no downtime ($$$$)
• Hot DR: recovery in minutes ($$$)
• Warm DR: recovery in > 1-2 hours ($$)
• Cold DR: recovery in > a few hours ($)

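Choosing among these options is a cost-vs-RTO trade-off: pick the cheapest tier that still meets the required recovery time. The sketch below encodes the four options with rough minute values read off the slide; those numbers are illustrative assumptions, not AWS guarantees.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DrOption:
    name: str
    typical_rto_minutes: float  # rough reading of the slide (assumption)
    relative_cost: int          # 1 = $ ... 4 = $$$$


# Minute values are illustrative assumptions, not AWS guarantees.
DR_OPTIONS = (
    DrOption("Cold DR", 4 * 60, 1),   # "> few hours"
    DrOption("Warm DR", 2 * 60, 2),   # "> 1-2 hours"
    DrOption("Hot DR", 15, 3),        # "in minutes"
    DrOption("Hot Active", 1, 4),     # "no downtime" (DNS failover in about a minute)
)


def cheapest_option(required_rto_minutes: float,
                    options: Sequence[DrOption] = DR_OPTIONS) -> Optional[DrOption]:
    """Lowest-cost option whose typical RTO meets the required RTO, or None."""
    eligible = [o for o in options if o.typical_rto_minutes <= required_rto_minutes]
    return min(eligible, key=lambda o: o.relative_cost, default=None)


if __name__ == "__main__":
    for rto in (8 * 60, 3 * 60, 30, 5):
        print(rto, cheapest_option(rto).name)
```

RPO deserves the same treatment: a tier that meets the RTO but loses too much data still fails the requirement.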
Cold DR

[Diagram: active site in AWS Singapore and passive site in AWS Tokyo, each with an ELB, Web/App EC2 instances and a Database Layer (Master, Standby), behind Amazon Route 53; a Puppet server; DB snapshots / dumps synced every X hours]

Cold DR

• In Cold DR, when the primary is down, the entire secondary site is activated manually
• RTO: more than a few hours to get the secondary site up and running
• RPO: data loss is acceptable
• CloudFormation templates can be configured for the primary and secondary sites
• AMIs, app and DB data are synced periodically

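With snapshots or dumps shipped every X hours, the worst-case data loss is longer than X, because a snapshot only helps once its cross-region copy has landed. The sketch below assumes snapshots start on a fixed schedule from t=0 and that each copy takes a fixed time; both are simplifying assumptions.

```python
import math


def data_loss_hours(failure_time: float, interval: float, copy_time: float) -> float:
    """Hours of writes lost if the primary fails at `failure_time` (hours since t=0).

    Assumes a snapshot/dump starts every `interval` hours from t=0 and becomes
    usable in the DR region `copy_time` hours later, once the copy finishes.
    """
    if interval <= 0 or copy_time < 0 or failure_time < 0:
        raise ValueError("interval must be positive; times must be non-negative")
    # Index of the newest snapshot whose copy had finished by the failure.
    k = math.floor((failure_time - copy_time) / interval)
    if k < 0:
        return failure_time  # nothing has reached the DR site yet
    return failure_time - k * interval


def worst_case_rpo_hours(interval: float, copy_time: float) -> float:
    """Upper bound on data_loss_hours: failure just before the next copy lands."""
    return interval + copy_time


if __name__ == "__main__":
    print(data_loss_hours(10, interval=4, copy_time=1))   # 2
    print(worst_case_rpo_hours(interval=4, copy_time=1))  # 5
```

Shipping every 4 hours with a 1-hour copy therefore means planning for up to about 5 hours of lost writes, not 4.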
Cold DR

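• EIP problem: Elastic IPs belong to a single region, so integration services (FTP, web services) that rely on the primary site's IPs must be repointed after failover
• Cost effective
• Most common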

Warm DR

[Diagram: active site in AWS Singapore and passive site in AWS Tokyo, each with an ELB, Web/App EC2 instances, a Database Layer and a Puppet server, behind Amazon Route 53; asynchronous replication of databases between AWS regions]

Warm DR

• In Warm DR, when the primary is down, the secondary site is activated manually
• RTO: more than 1 hour to get the secondary site up and running
• RPO: minimal data loss between the primary and secondary sites is acceptable
• DB data is replicated asynchronously
• Only the DB and Puppet servers are up and running at the secondary site

Warm DR

• AMIs, application patches and deployments are managed through Puppet
• EIP problem: integration services (FTP, web services)
• Costlier than Cold DR
• Recommended in many use cases

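Because Warm DR activation is manual, its RTO is the length of the activation runbook, including the human decision to fail over. A rough estimate sums the stages, letting steps within a stage overlap. Every step name and duration below is a hypothetical assumption.

```python
from typing import Dict, List

# Hypothetical activation runbook for a Warm DR site, in minutes.
# Steps inside one stage run in parallel; stages run one after another.
WARM_DR_RUNBOOK: List[Dict[str, float]] = [
    {"detect the outage and decide to fail over": 15},
    {"promote the replicated DB to master": 10,
     "launch Web/App EC2 instances from AMIs": 10},
    {"deploy the application via Puppet": 20},
    {"smoke-test the secondary site": 15},
    {"switch DNS and wait out the record TTL": 5},
]


def estimated_rto_minutes(runbook: List[Dict[str, float]]) -> float:
    """Sum of each stage's longest step (parallel steps overlap)."""
    return sum(max(stage.values()) for stage in runbook if stage)


if __name__ == "__main__":
    print(estimated_rto_minutes(WARM_DR_RUNBOOK))  # 65
```

Rehearsing the runbook is the only way to replace these guesses with real numbers.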
Hot DR

[Diagram: active site in AWS Singapore and passive site in AWS Tokyo, each with an ELB, multiple Web/App EC2 instances and a Puppet server, behind Amazon Route 53; asynchronous replication of databases between AWS regions]

Hot DR

• In Hot DR, when the primary is down, the secondary site is activated
• RTO: a few minutes to get the secondary site up and running
• RPO: very minimal data loss between the primary and secondary sites is acceptable
• All tiers at the secondary site are up and running, but do not serve live transactions

Hot DR

• DB data is replicated asynchronously
• AMIs, application patches and deployments are managed through Puppet
• EIP problem: integration services (FTP, web services)
• Costlier than Warm DR
• Rare usage

Hot Active

[Diagram: two active sites, AWS Singapore and AWS Tokyo, each with an ELB, multiple Web/App EC2 instances and a Puppet server; Amazon Route 53 provides directional DNS / traffic routing; 2-way asynchronous replication of databases between AWS regions]

Hot Active-Active

• Both the primary and secondary sites are active
• RTO: a few seconds to direct traffic from the primary to the secondary site
• RPO: negligible data loss
• A managed DNS service provides automatic DNS-level failover in case of an outage at the primary website location
• Transparent switch between the websites hosted in AWS Singapore and AWS Tokyo within 30-60 seconds during an outage

Hot Active-Active

• Automatic traffic diversion to the nearest site
• Managed/directional DNS is a globally distributed, highly available service
• Persistent data is replicated asynchronously (2-way)
• AMIs, application patches and deployments are managed through distributed Puppet
• EIP problem: integration services (FTP, web services)
• Use case specific

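Directional DNS combines two decisions: which healthy region is closest to the client, and how quickly an unhealthy region drops out of the answers. Route 53 health checks run at a 30-second (or 10-second fast) interval and mark an endpoint unhealthy after a configurable number of consecutive failures. The sketch below is a rough model; the latency figures are hypothetical and the failover formula is an approximation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Region:
    name: str
    healthy: bool = True  # result of the DNS service's health checks


def answer_for(client_latency_ms: Dict[str, float], regions: List[Region]) -> str:
    """Directional DNS: send the client to the lowest-latency healthy region."""
    candidates = [r for r in regions if r.healthy and r.name in client_latency_ms]
    if not candidates:
        raise RuntimeError("no healthy region to answer with")
    return min(candidates, key=lambda r: client_latency_ms[r.name]).name


def worst_case_failover_seconds(check_interval_s: float, failure_threshold: int,
                                dns_ttl_s: float) -> float:
    """Approximate time to detect a failed region plus time for cached answers to expire."""
    return check_interval_s * failure_threshold + dns_ttl_s


if __name__ == "__main__":
    regions = [Region("ap-southeast-1"), Region("ap-northeast-1")]  # Singapore, Tokyo
    latency = {"ap-southeast-1": 20, "ap-northeast-1": 80}          # hypothetical client
    print(answer_for(latency, regions))             # ap-southeast-1
    regions[0].healthy = False                      # Singapore outage
    print(answer_for(latency, regions))             # ap-northeast-1
    print(worst_case_failover_seconds(10, 3, 30))   # 60
```

With 10-second checks, a threshold of 3 and a 30-second TTL the estimate is 60 seconds, the upper end of the 30-60 seconds quoted above; resolvers or clients that ignore TTLs will take longer.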
Hot Active-Active

• The website deployed in both regions can scale out and shrink according to load
• Cost effective for large server farm deployments
• Low latency is achieved through directional traffic routing
• No customers are lost because of load or availability problems. Ops are happy!

Hot Active-Active

• Technically complex and intricate setup
• Costlier to build and operate (sophistication comes at a cost)
• No unified infrastructure management currently exists for this architecture; each layer has its own console, for example:
– Directional DNS console
– AWS Console
– Puppet Console

Summary

• Understood how to architect HA on AWS for a LAMJ website
• Understood AWS building blocks for HA and fault tolerance
• How to achieve high availability across AWS Availability Zones (AZs)?
• How to achieve high availability across AWS regions?

Need help architecting high availability solutions on AWS?

Leave it to the experts; we will handle this

Cloud Architecture Consulting
Cloud Application Development
Cloud Migration & Implementation
Cloud Adoption Strategy

“Let's get the job done”

Q&A

“All you need is an idea and the cloud will execute it for you.” (Structure 2010 event)
- Dr Werner Vogels, CTO of Amazon, on 8KMiles

Contact :

cloud@8KMiles.com

harish@8KMiles.com

www.twitter.com/harish11g

http://www.linkedin.com/in/harishganesan
