3. 3
#
RightScale Basics
• World’s #1 cloud management system
• Managing cloud deployments for 4.5 Years – globally
• More than 45,000 users; over 3MM servers launched
• HQ’d in Santa Barbara, CA
4. 4
#
Who Uses RightScale?
Powering the largest production deployments on the cloud
9. 9
#
So what’s the problem?
• Paying attention to design
• Using cloud means designing for cloud
• Accidental tourist in cloud
• Sign up, launch servers, sit back
10. 10
#
What End Users See
• Utility Computing
What IT/Service Providers See
• Chargeback/Billing
• Runbook/Process Automation
• Application Lifecycle Management
• Workload Management
• Configuration Management
• Application Streaming
• Automation
• OS Provisioning
• Virtualization
• GRID/HPC/clusters
11. 11
#
What End Users See
• Utility Computing
What IT/Service Providers See
• Chargeback/Billing, Runbook/Process Automation, Application Lifecycle Management, Workload Management, …
• Configuration Management, Application Streaming, Automation, OS Provisioning, Virtualization, GRID/HPC/clusters
COMPLEXITY!
12. 12
#
So what’s the problem?
Top roadblocks to cloud
14. 14
#
The “Do it Yourself” Trap
• Most clouds are a set of APIs and a simple UI to launch servers – is that what you need?
• These are basic building blocks, not a management system
• How should you spend your time?
• Managing multiple users with different levels of access?
• Configuration management and app lifecycle management?
• Tracking usage and costs across applications and business units?
• All the general-purpose things you need to do, like monitoring, alarms, auto-scaling, etc.?
15. 15
#
The Reasons for Cloud Management
✓ Cloud-ready solutions
✓ Automation
✓ Governance & Control
✓ Multi-zone & Multi-region
• A good cloud management platform delivers:
1. Ability to focus on your core competency and NOT the complexity and tedious work of managing infrastructure
2. Agility to IT and business
3. ROI almost immediately
21. 21
#
ServerTemplates can be used for Single Servers or Complete Systems
22. 22
#
What Success Looks Like:
• Clone system blueprint
• Change respective input variables (e.g., application code)
• Re-launch in current or new cloud
On-demand, repeatable process, predictable systems
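The clone, change inputs, re-launch cycle above can be sketched in a few lines of Python. This is only an illustration and not the RightScale API: the `Blueprint` class, its fields, the `clone_with` helper, and the cloud and release names are all hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class Blueprint:
    """Hypothetical system blueprint: a name, a target cloud, and input variables."""
    name: str
    cloud: str
    inputs: Dict[str, str] = field(default_factory=dict)


def clone_with(source: Blueprint, name: str, cloud: str, **overrides: str) -> Blueprint:
    """Clone a blueprint, overriding only the inputs that differ."""
    merged = {**source.inputs, **overrides}
    return replace(source, name=name, cloud=cloud, inputs=merged)


production = Blueprint(
    name="web-prod",
    cloud="cloud-a",
    inputs={"app_code": "release-1.0", "db_size": "large"},
)

# Clone, change the application code input, and target a new cloud.
staging = clone_with(production, name="web-staging", cloud="cloud-b", app_code="release-1.1")
```

Because the source blueprint is left unchanged, the same clone step can be repeated for any number of environments, which is what makes the process repeatable.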
26. 26
#
Automation
Operational efficiency
Pie chart, category totals:
• Deployment Management: 31%
• Incident Management: 20%
• Overhead: 11%
• Problem Engineering: 10%
• Architectural Engineering: 8%
• Software Development: 7%
• Site Management: 7%
• Requests: 6%
Source: Deepak Patil, GFS
27. 27
#
Automation
Operational efficiency
• Server-to-admin ratio is an indicator of admin costs
• Inefficient operations as low as 20:1
• Above average ratio 150:1 (enterprises typically in the 70 to 140 range)
• Best practices over 2,000:1
• Savings on admin costs of easily 50%
Source: Deepak Patil, GFS
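To make the server-to-admin ratios concrete, here is a small arithmetic sketch. The fleet size of 6,000 servers and the 75:1 starting ratio are assumptions picked for illustration; only the 20:1, 150:1 and 2,000:1 ratios come from the slide.

```python
import math


def admins_needed(servers: int, servers_per_admin: int) -> int:
    """Number of admins required to cover a fleet at a given server-to-admin ratio."""
    return math.ceil(servers / servers_per_admin)


FLEET = 6000  # hypothetical fleet size, chosen only for illustration

for label, ratio in [("inefficient", 20), ("above average", 150), ("best practice", 2000)]:
    print(f"{label:>14} ({ratio}:1): {admins_needed(FLEET, ratio)} admins")

# Doubling the ratio halves the headcount: one way a 50% saving could arise.
before = admins_needed(FLEET, 75)   # 75:1 sits in the typical enterprise range
after = admins_needed(FLEET, 150)
saving = 1 - after / before
```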
30. 30
#
Governance & Control
• Monitoring
  • Logs + audit entries
  • Alerts & escalations
  • Last access
• User management
  • Authentication, SSO
  • Roles, permissions
  • Umbrella accounts
• Accountability & Billing
  • Single billing
  • Cost tracking & quotas
  • Real-time run rate projections
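As a rough sketch of what a run rate projection with a quota check might look like: the function names, the 730-hour month, the prices and the quota below are all assumptions made for this example, not details from the slide.

```python
HOURS_PER_MONTH = 730  # average month length in hours; an assumption of this sketch


def projected_monthly_cents(hourly_rates_cents):
    """Project a monthly bill (in cents) from the current hourly run rate."""
    return sum(hourly_rates_cents) * HOURS_PER_MONTH


def quota_status(hourly_rates_cents, monthly_quota_cents):
    """Return the projection and whether it would exceed the quota."""
    projection = projected_monthly_cents(hourly_rates_cents)
    return projection, projection > monthly_quota_cents


# Three running servers at 10, 10 and 40 cents per hour (made-up prices).
running = [10, 10, 40]
projection, exceeded = quota_status(running, monthly_quota_cents=40_000)
```

Working in integer cents avoids floating-point rounding in the totals.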
31. 31
#
Infrastructure Audits
• Review of all Security Groups or SSH Keys
• Security Group audit analyzes public ports, or all ports
• SSH Key audit analyzes running servers, or all servers
• Up to 10 audits can be stored, with one being marked as a baseline for comparison
• Audits can be downloaded as text or JSON files
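A baseline comparison of two Security Group audits could look roughly like the sketch below. The JSON layout (`security_groups`, `rules`, `port`, `cidr`) is a hypothetical schema invented for this example; the actual format of a downloaded audit file is not given here.

```python
import json

PUBLIC_CIDR = "0.0.0.0/0"


def public_ports(audit_json: str) -> set:
    """Return (security group, port) pairs that are open to the whole internet."""
    audit = json.loads(audit_json)
    return {
        (group["name"], rule["port"])
        for group in audit["security_groups"]
        for rule in group["rules"]
        if rule["cidr"] == PUBLIC_CIDR
    }


# Hypothetical baseline audit: SSH is restricted to a private range.
baseline = json.dumps({"security_groups": [
    {"name": "web", "rules": [{"port": 80, "cidr": PUBLIC_CIDR},
                              {"port": 22, "cidr": "10.0.0.0/8"}]},
]})
# Hypothetical later audit: SSH has been opened to the public.
latest = json.dumps({"security_groups": [
    {"name": "web", "rules": [{"port": 80, "cidr": PUBLIC_CIDR},
                              {"port": 22, "cidr": PUBLIC_CIDR}]},
]})

newly_public = public_ports(latest) - public_ports(baseline)
no_longer_public = public_ports(baseline) - public_ports(latest)
```

Set differences keep the comparison symmetric: the same function reports both ports opened since the baseline and ports closed since then.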
36. 36
#
Multi-Cloud Pain Points
• APIs differ
• Different sets of resources
• Different formats, encodings and versions
• Abstractions and features differ
• Network architectures differ: VLANs, security groups, NAT, IPs, ACLs, …
• Storage architectures differ: local/attachable disks, backup, snapshots, …
• Hypervisors, machine images…cost models, billing, reporting…etc.
• They are truly different beasts, with different semantics
37. 37
#
General HA Best Practices
✓ Avoid single points of failure
✓ Always place at least one of each component (load balancers, app servers, databases) in each of at least two AZs
✓ Maintain sufficient capacity to absorb AZ / cloud failures
✓ Reserved Instances – guarantee capacity is available in a separate region/cloud
✓ Replicate data across AZs and back up or replicate across clouds/regions for failover
✓ Set up monitoring, alerts and operations to identify and automate problem resolution or failover process
✓ Design stateless applications for resilience to reboot / relaunch
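The "at least two AZs" rule is easy to check mechanically. The sketch below assumes a deployment is described as a list of (component, availability zone) pairs; that representation, the function name and the zone names are hypothetical.

```python
from collections import defaultdict


def single_az_components(servers, min_zones=2):
    """Return components that run in fewer than `min_zones` availability zones."""
    zones = defaultdict(set)
    for component, az in servers:
        zones[component].add(az)
    return sorted(name for name, seen in zones.items() if len(seen) < min_zones)


# Hypothetical deployment: the database only runs in one zone.
deployment = [
    ("load_balancer", "az-1"), ("load_balancer", "az-2"),
    ("app_server", "az-1"), ("app_server", "az-2"),
    ("database", "az-1"),
]

at_risk = single_az_components(deployment)
```

Here `at_risk` lists the database tier, which would be a single point of failure if its zone went down.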
38. 38
#
Multi-Availability Zone
• Consider distributed NoSQL databases with the same distribution considerations. Spread primary and replica nodes across multiple AZs. Place as many as you need for required resiliency.
• Consider local storage for an additional slave database to remove the dependency on the attached volume. (Use LVM snapshots to create backups.)
• Snapshot the EBS volume for backups so the database can be readily recovered within the region.
• Place slave databases in one or more AZs for failover.
39. 39
#
Multi-Cloud Cold / Warm / Hot DR Options
Downtime: option (relative cost)
• No Downtime: Multi-Cloud HA (Live/Live Config), cost $$$$
• > 5 Minutes: Hot DR (Least Common), cost $$$
• > 1 Hour: Warm DR (Recommended), cost $$
• > Few Hours: Cold DR (Most Common), cost $
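One way to read the downtime and cost trade-off above is as a lookup: pick the cheapest option whose downtime you can tolerate. In this sketch, "few hours" is taken as 3 hours and the cost symbols are assigned cheapest to most expensive from Cold DR to Multi-Cloud HA; both are assumptions, as are the names used.

```python
# (option, downtime floor in minutes, relative cost), cheapest first.
DR_OPTIONS = [
    ("Cold DR", 180, "$"),   # "few hours" taken as 3 hours: an assumption
    ("Warm DR", 60, "$$"),
    ("Hot DR", 5, "$$$"),
]
FALLBACK = ("Multi-Cloud HA", 0, "$$$$")


def cheapest_option(tolerable_downtime_minutes):
    """Pick the cheapest option whose expected downtime is tolerable."""
    for option in DR_OPTIONS:
        # Each DR tier implies downtime greater than its floor.
        if tolerable_downtime_minutes > option[1]:
            return option
    return FALLBACK


choice = cheapest_option(90)  # can tolerate 90 minutes of downtime
```

With a 90-minute tolerance the sketch selects Warm DR; only a tolerance of 5 minutes or less falls through to the live/live configuration.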
40. 40
#
Multi-Cloud Cold DR
Staged Server Configuration and generally no staged data
Data Copy Mechanism
• Not recommended if rapid recovery is required
• Slow to replicate data to other cloud
• Slow to bring database to an operational state
41. 41
#
Multi-Cloud Warm DR
Staged Server Configuration, pre-staged data and running Slave Database Server
• Generally recommended DR solution
• Minimal additional cost
• Allows fairly rapid recovery
42. 42
#
Multi-Cloud Hot DR
Parallel Deployment with all servers running but all traffic going to primary
• Not recommended
• Very high additional cost
• Allows rapid recovery, but not significantly faster than “warm” configuration
43. 43
#
Multi-Cloud HA
Live/Live configuration. May use Geo-target IP services to direct traffic to regional
load balancers.
• Possible, but not recommended (more to follow…)
• Maximum additional cost
• Provides high availability, but complex to implement and manage
44. 44
#
Multi-Cloud HA
Multi-Cloud looks similar to Multi-AZ… but there are additional problems to solve as some resources
are not shared across clouds
48. 48
#
Multi-Cloud HA
Multi-Cloud looks similar to Multi-AZ… but there are additional problems to solve as some resources are not shared across clouds
• You need DNS management or a global load balancer.
• Machine Images are specific to the cloud/region.
• You need to copy or replicate data yourself as snapshots are specific to the source Region. Data migration requires manual synchronization or taking LVM snapshots and transferring the data.
• Security is an issue as security groups are Region-specific.
49. 49
#
Delivering Workload Deployment Freedom
Application Portfolio: App 1, App 2, App 3, …, App N
Requirements Filter: Performance, Cost, Security, Compliance, Reliability
Resource Pools: Public Cloud, Private Cloud 1, Private Cloud 2
It’s about using multiple resource pools, not choosing one.
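The portfolio, filter and pools flow can be sketched as a simple matching step. The 1 to 5 ratings, the per-app minimums and the `eligible_pools` helper are hypothetical values and names chosen for illustration; only the criteria and pool names come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pool:
    """A resource pool with hypothetical 1-5 ratings per requirement."""
    name: str
    ratings: Dict[str, int]


def eligible_pools(requirements: Dict[str, int], pools: List[Pool]) -> List[str]:
    """Return the pools that meet every minimum rating an app requires."""
    return [
        pool.name
        for pool in pools
        if all(pool.ratings.get(criterion, 0) >= minimum
               for criterion, minimum in requirements.items())
    ]


POOLS = [
    Pool("Public Cloud", {"performance": 4, "cost": 5, "security": 2, "compliance": 2, "reliability": 3}),
    Pool("Private Cloud 1", {"performance": 3, "cost": 2, "security": 5, "compliance": 5, "reliability": 4}),
    Pool("Private Cloud 2", {"performance": 5, "cost": 3, "security": 4, "compliance": 3, "reliability": 4}),
]

app1 = eligible_pools({"performance": 4}, POOLS)
app2 = eligible_pools({"cost": 4}, POOLS)
app3 = eligible_pools({"security": 4, "compliance": 4}, POOLS)
```

Each app ends up with its own list of suitable pools, so the portfolio as a whole spreads across several pools instead of being forced into one.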
50. 50
#
What Success Looks Like:
“This ability to invoke and coordinate both private and public clouds
is the hidden jewel of Zynga’s success”
- Lessons from Farmville: How Zynga Uses the Cloud, InformationWeek, May 13, 2011
Read Zynga’s blog at:
http://code.zynga.com/
51. Thank You!
Contact RightScale in India:
Shervin Chua – shervin@rightscale.com
Business Development & Sales, APAC