This presentation explains how to optimize your application architecture on the AWS cloud and build a fault-tolerant architecture that aims for zero downtime. It covers best practices for a fault-tolerant web application.
4. How Often Do You See This?
Blazeclan
Cloud IT Better
5. Cost of Downtime
A report published in 2010 on the top 412 eCommerce sites says:
• The median length of downtime was 840 minutes
• On average, each of them saw 3291 minutes of downtime
Lost Revenue
• On average, each of them lost $800,099 in revenue due to downtime
• The total amount of revenue lost due to downtime across all 412 companies was $329,640,928!
6. Online Business & Downtime Facts
The average hourly loss caused by data center downtime in 2012
Source: http://www.techrepublic.com/blog/data-center/infographic-the-outrageous-costs-of-data-center-downtime
7. How to Build a HIGHLY AVAILABLE, SCALABLE, DURABLE AND RESILIENT Web Application
8. High Availability
99.999% uptime
• Up Time of an Application
• Planned or Unplanned Outage or Downtime
• Offline, Unreachable, or Partially Available
• Slow to Use
• Goal
• No Downtime
• Always Available
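To make a target such as 99.999% concrete, it helps to convert it into a yearly downtime budget. The sketch below does only that arithmetic; the function name and the 365-day year are assumptions made for illustration, not part of the original slides.

```python
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.95, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, "five nines" leaves a little over five minutes of downtime per year, which is why planned outages count against the goal just as much as unplanned ones.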
9. Scalability
• Ability of an application to accommodate change in traffic without architectural changes
• Availability may be impacted if the application cannot scale
• Scalability doesn’t guarantee availability
[Chart: Resources and Demand over Time]
10. Fault Tolerance
• Built-in redundancy so applications can continue functioning when components fail
• Fault tolerance is crucial to High Availability
Image courtesy: Gigamone.com
12. AWS democratizes High Availability
• Multiple Servers
• Isolated Redundant Data Centers
• Regions across the Globe
• Availability Zones within Regions
Source: http://aws.amazon.com/about-aws/globalinfrastructure/#reglink-sa
17. Everything fails, all the time
– Werner Vogels, CTO, Amazon
• Avoid single points of failure
• Application should continue to function
• Assume everything fails, and work backwards
• Avoid impact on business
Image: Obama’s prized limo after it broke down during his Israel visit!
18. Ask Questions for Right Architecture
• What kind of scenarios do I have to plan for?
• What are my single points of failure?
• If there are master and slaves in your architecture, what if the master node fails?
• If a load balancer is sitting in front of an array of application servers, what if that load balancer fails?
• What happens if a node in your system fails?
19. Lots of Questions
• How do you recognize that failure?
• How do I replace that node?
• What if the cache keys grow beyond the memory limit of an instance?
• How does the failover occur, and how is a new slave instantiated and brought into sync with the master?
• What if a downstream service times out or returns an exception?
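One common answer to the last question is to retry the downstream call with exponential backoff and fall back to a safe default if it keeps failing. The following is a minimal sketch of that idea, not a method taken from the slides; `DownstreamError`, `call_with_retry`, and `make_flaky` are hypothetical names introduced for illustration.

```python
import time


class DownstreamError(Exception):
    """Hypothetical error type for a failed downstream call."""


def call_with_retry(operation, attempts=3, base_delay=0.1, fallback=None,
                    sleep=time.sleep):
    """Call operation(); retry with exponential backoff, then return fallback."""
    for attempt in range(attempts):
        try:
            return operation()
        except (DownstreamError, TimeoutError):
            # Wait longer after each failure, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return fallback


def make_flaky(failures):
    """Build a fake downstream call that times out `failures` times, then succeeds."""
    state = {"remaining": failures}

    def operation():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TimeoutError("downstream timed out")
        return "ok"

    return operation


if __name__ == "__main__":
    print(call_with_retry(make_flaky(2)))                     # recovers after retries
    print(call_with_retry(make_flaky(5), fallback="cached"))  # gives up, uses fallback
```

The `sleep` parameter is injectable so the backoff can be skipped in tests; whether a stale or default value is an acceptable fallback depends on the application.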
20. Build Mechanisms to Handle Failure
• Build process threads that resume on reboot
• Allow the state of the system to re-sync by reloading messages from queues
• Keep pre-configured and pre-optimized virtual images to support the points above on launch/boot
• Avoid in-memory sessions or stateful user context; move that to data stores
• Have a coherent backup and restore strategy for your data and automate it
Image courtesy: http://www.outsmarthormones.com/wp-content/uploads/2011/06/Fix.jpg
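The re-sync point above can be illustrated with a worker that rebuilds its state by replaying messages from a queue after a restart. This is a simplified sketch: `DurableQueue` is an in-memory stand-in for a durable queue service, and the class names and the account-balance example are assumptions made for illustration.

```python
class DurableQueue:
    """In-memory stand-in for a durable queue whose messages outlive any worker."""

    def __init__(self):
        self._messages = []

    def send(self, message):
        self._messages.append(message)

    def read_from(self, offset):
        """Return a copy of all messages starting at the given position."""
        return self._messages[offset:]


class Worker:
    """Keeps a balance per account; the state can always be rebuilt from the queue."""

    def __init__(self, source):
        self.source = source
        self.balances = {}
        self.offset = 0  # number of messages already applied

    def catch_up(self):
        """Apply every message that has not been applied yet."""
        for account, amount in self.source.read_from(self.offset):
            self.balances[account] = self.balances.get(account, 0) + amount
            self.offset += 1


if __name__ == "__main__":
    q = DurableQueue()
    q.send(("alice", 5))
    q.send(("bob", 2))
    first = Worker(q)
    first.catch_up()
    q.send(("alice", -1))
    # Simulate a crash: a fresh worker starts with no state and replays the queue.
    replacement = Worker(q)
    replacement.catch_up()
    print(replacement.balances)
```

Because the worker holds no state that cannot be recomputed, a replacement launched from a pre-built image can take over without manual recovery.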
23. Auto Scaling
• Automatically scales Amazon EC2 capacity up or down
• Lets you terminate server instances at will
• Adds more instances in response to increasing load
• Launches a replacement instance immediately in case of a failure
• Lets the application transition seamlessly if the primary server fails
Image Courtesy: http://www.knovelblogs.com/wp-content/uploads
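The decision logic behind these bullets can be sketched as a small reconciliation step: drop unhealthy instances, then size the group to the load. This is an illustrative simulation only and not how the AWS Auto Scaling service is implemented; the function names, the 60% CPU target, and the instance ids are assumptions.

```python
import math


def desired_capacity(healthy, avg_cpu, target_cpu=60.0, minimum=2, maximum=10):
    """Return how many instances should run to bring average CPU near the target."""
    if healthy <= 0:
        return minimum
    needed = math.ceil(healthy * avg_cpu / target_cpu)
    return max(minimum, min(maximum, needed))


def reconcile(instances, avg_cpu, **limits):
    """instances maps id -> healthy flag.

    Returns (ids to terminate, capacity delta). A positive delta means
    launch that many instances; a negative delta means remove surplus ones.
    """
    unhealthy = [name for name, ok in instances.items() if not ok]
    healthy = len(instances) - len(unhealthy)
    target = desired_capacity(healthy, avg_cpu, **limits)
    return unhealthy, target - healthy


if __name__ == "__main__":
    # Hypothetical fleet: one instance has failed and the rest run hot.
    fleet = {"i-1": True, "i-2": False, "i-3": True}
    print(reconcile(fleet, avg_cpu=90.0))
```

Running the same step on a schedule covers both cases on the slide: growing with load and replacing failed instances.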
24. Elastic Load Balancing (ELB)
• Distributes incoming traffic to an application across several Amazon EC2 instances
• ELB is given a DNS host name, and requests sent to this host name are delegated to a pool of Amazon EC2 instances
• ELB detects unhealthy instances within its pool of Amazon EC2 instances and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored
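The routing behaviour described above can be sketched with a toy round-robin balancer that skips backends failing a health check. It is a simplified model for illustration, not the ELB implementation or API; the `LoadBalancer` class, its methods, and the backend names are hypothetical.

```python
from itertools import count


class LoadBalancer:
    """Round-robin router that only sends traffic to healthy backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._counter = count()

    def run_health_checks(self, probe):
        """Refresh the healthy set; probe(backend) returns True if it responds."""
        self.healthy = {b for b in self.backends if probe(b)}

    def route(self):
        """Pick the next healthy backend in round-robin order."""
        candidates = [b for b in self.backends if b in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends available")
        return candidates[next(self._counter) % len(candidates)]


if __name__ == "__main__":
    lb = LoadBalancer(["web-a", "web-b", "web-c"])
    lb.run_health_checks(lambda backend: backend != "web-b")  # web-b is down
    print([lb.route() for _ in range(4)])
    lb.run_health_checks(lambda backend: True)                # web-b restored
    print([lb.route() for _ in range(3)])
```

Callers only ever see the single entry point, so a backend can fail and return without clients changing anything.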
25. ELB & Auto Scaling
• Auto Scaling & ELB are an ideal combination
• ELB gives a single DNS name for addressing
• Auto Scaling ensures there is always the right number of healthy Amazon EC2 instances to accept requests
27. Fault Tolerance
• To build fault-tolerant applications on Amazon EC2, it’s important to follow best practices such as:
• Being able to commission replacement instances quickly
• Using Amazon EBS for persistent storage
• Using multiple Availability Zones and Elastic IP addresses
29. Multi-AZ Design Considerations
• Achieve greater fault tolerance by distributing your application geographically
• The Amazon EC2 service level agreement commitment is 99.95% availability for each Amazon EC2 Region
• Deploy an application that spans multiple Availability Zones
• Redundant instances for each tier of an application could be placed in distinct Availability Zones
• ELB can automatically balance traffic across multiple instances and multiple Availability Zones
Image Courtesy: http://chriscampcommunications.blogspot.in
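A rough way to see why spreading a tier across zones helps: if each zone fails independently, the tier is down only when every zone is down at once. The sketch below computes that; the independence assumption and the 99% per-zone figure are illustrative assumptions, not AWS numbers.

```python
def combined_availability(per_zone: float, zones: int) -> float:
    """Probability that at least one of `zones` independent zones is up."""
    if not 0.0 <= per_zone <= 1.0:
        raise ValueError("per_zone must be a probability between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The tier is unavailable only if all zones are down simultaneously.
    return 1 - (1 - per_zone) ** zones


if __name__ == "__main__":
    # Hypothetical per-zone availability of 99%.
    for zones in (1, 2, 3):
        print(zones, f"{combined_availability(0.99, zones):.6f}")
```

Real failures are not fully independent (a bad deployment can hit every zone), so treat the result as an upper bound rather than a guarantee.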
32. Loosely Coupled Systems
• Loosely coupled systems are more fault tolerant and can achieve greater scale
• Loosely coupled systems on AWS
• De-coupling systems allows for hybrid models
(in-cloud + in-physical data center)
• Balancing between clusters enables easier scaling
• Using queues (Amazon SQS) buffers against failures
• Design for a jumble of black boxes
34. Loose Coupling - Best Practices on AWS
• Use Amazon SQS to isolate components
• Use Amazon SQS as buffers between components
• Design every component so that it exposes a service interface, is responsible for its own scalability, and interacts with other components asynchronously
• Bundle the logical construct of a component into an Amazon Machine Image so that it can be deployed more often
• Make your applications as stateless as possible. Store session state outside of the component (in Amazon SimpleDB, if appropriate)
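The buffering idea can be sketched with two components that never call each other directly and only share a queue. Here Python's standard `queue.Queue` stands in for Amazon SQS purely for illustration; it is not the SQS API, and the function names and the "order" payloads are assumptions.

```python
import queue
import threading


def producer(buffer, orders):
    """Front-end component: enqueue work and return without waiting for results."""
    for order in orders:
        buffer.put(order)
    buffer.put(None)  # sentinel: no more work


def consumer(buffer, results):
    """Back-end component: process messages at its own pace."""
    while True:
        order = buffer.get()
        if order is None:
            break
        results.append(order.upper())  # placeholder for real processing


def run_pipeline(orders):
    """Wire the two components together through the queue only."""
    buffer = queue.Queue()
    results = []
    worker = threading.Thread(target=consumer, args=(buffer, results))
    worker.start()
    producer(buffer, orders)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["order-1", "order-2", "order-3"]))
```

If the consumer is slow or briefly unavailable, messages wait in the buffer instead of failing the producer, which is the isolation the bullets describe.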
39. Design for High Availability & Scale
Don’t let this happen to your Business
Our expert AWS Solution Architects can help you review your architecture.
Take advantage of our free 2-hour consultancy!
For any assistance, please contact us at info@blazeclan.com