10 Things you didn't know about Cloud Platforms: AWS, GAE, Azure

10 Things You Didn’t Know About Cloud Platforms: Azure, GAE and AWS
Dr. Anna Liu, Dr. Hiroshi Wada, Kevin Lee
National ICT Australia

The 10 Things are...
1. How long does it take for data in the cloud to become consistent?
2. Limitations and quotas
3. How unpredictable/variable is the cloud?
4. Distributed transaction support in the cloud
5. Pricing variations over time and space
6. Sticky session support
7. The new matrix of roles and responsibilities for cloud providers, consumers and system integrators
8. Secure connections to the cloud
9. Time to get a new instance
10. Auto-scaling is not all magic

The Reality of Eventual Consistency in Amazon SimpleDB
• The probability of reading updated data in SimpleDB in US West
– An application reads data X ms after it has written the data
• SimpleDB has two read operations
– Eventually Consistent Read
– Consistent Read
• This pattern is consistent regardless of the time of day
[Figure: two panels, Eventually Consistent Read and Consistent Read]

Consistent vs. Eventually Consistent Read
• SimpleDB’s consistent read is guaranteed to return updated data
• What cost do you pay for consistency?
– RTT is the same as that of an eventually consistent read
– Monetary cost (usage fee) is exactly the same as that of an eventually consistent read
→ The trade-off is not clear! We suspect consistent read is less scalable and slower under datacenter failures. However, we have not observed any differences.

Other Commercial NoSQL Databases
• Google App Engine
– Offers both eventually consistent read and consistent read
– Behavior of its eventually consistent read is completely different from Amazon’s
– In GAE, both types of reads behave exactly the same unless a data center fails
• Windows Azure
– Offers no read options
– Always consistent

Limitations and Quotas
Amazon Web Services
• Limitations
– All applications must be set up manually
– Maximum 5 GB per file in S3
– Maximum 5 seconds query execution time in SimpleDB
• Quotas
– 20 On-Demand or Reserved Instances and 100 Spot Instances by default
– 1 GB free outgoing bandwidth per month in SimpleDB, S3 and EC2
Microsoft Windows Azure
• Limitations
– 2 deployments per service (production and staging)
– .NET, PHP or Java programming language
– Up to 50 GB for a SQL Azure database
• Quotas
– 20 concurrent small compute instances or equivalent per month
– 10 TB of total data transfers per month
Google App Engine
• Limitations
– Java or Python programming language
– Maximum 30 seconds for each request
– 1 MB for each Datastore entity
– Maximum 2 GB per file in Blobstore (each API call can manipulate less than 1 MB)
• Quotas
– 10 web applications per user
– 43,200,000 requests per day
– 1 GB (1,046 GB maximum if billing enabled) incoming/outgoing bandwidth per day
– 6.5 CPU-hours (1,729 CPU-hours maximum if billing enabled) per day
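
A daily quota is easier to reason about as a sustained rate. The small sketch below converts the GAE daily request quota into requests per second, assuming the quota is spread evenly over a 24-hour day (an assumption for illustration; the slides do not say how the quota is enforced). The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_rate(requests_per_day: int) -> float:
    """Average requests per second if the daily quota is used evenly."""
    return requests_per_day / SECONDS_PER_DAY


gae_daily_requests = 43_200_000  # quota figure from the list above
print(sustained_rate(gae_daily_requests))  # 500.0
```

The result matches the 500 requests per second default burst figure quoted for App Engine later in the deck.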

Performance Unpredictability in Cloud
• Performance unpredictability is one of the major obstacles
– Performance variance of a MapReduce job on a 50-node EC2 cluster and a 50-node local cluster
– Examples (time as performance metric)
• Repeatability of results for researchers
• Time-critical tasks for enterprises

Benchmark Details
• Instance Startup
– Measurement: elapsed time from the moment a request for an instance is sent to the moment the requested instance is available
• CPU
– Measurement: a single score from executing various concurrent integer and floating-point calculations
– Benchmark tool: Ubench
• Memory Speed
– Measurement: a single score from executing random memory allocations as well as memory-to-memory copying
– Benchmark tool: Ubench
• Disk I/O
– Measurement: sequential reads/writes and random reads, block I/O
– Benchmark tool: Bonnie++
• Network Bandwidth
– Measurement: bandwidth, delay jitter and datagram loss
– Benchmark tool: Iperf
• S3 Access
– Measurement: uploading a 100 MB file from one unused node of the physical cluster at Saarland University to a newly created bucket on S3

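Benchmark Results in EC2
• COV (coefficient of variation) in physical cluster: CPU 0.1%, Memory 0.3%, Sequential Read 0.6%, Random Read 1.9%, Network 0.2%
• COV in small EC2 instances: CPU 21%, Memory 8%, Sequential Read 17%, Random Read 9%
• COV in large EC2 instances: CPU 24%, Memory 10%, Sequential Read 20%, Random Read 13%
• COV in EC2, reported once for both instance sizes: Network 19%, S3 Access 54%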
The COV of the large instance is higher than that of the small instance. However, both are at least an order of magnitude less stable than a physical cluster.
The COV of S3 Access may be influenced by other traffic on the network; this experiment is shown just for completeness.
Reference: Schad, Jörg, Jens Dittrich, and Jorge-Arnulfo Quiané-Ruiz. 2010. Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance. In Proceedings of the 36th International Conference on Very Large Data Bases, Vol. 3, No. 1. Singapore: VLDB Endowment.
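
The COV used in these results is the standard deviation of repeated measurements divided by their mean. A minimal sketch of that calculation follows, assuming the sample standard deviation is used (the slides do not say which estimator the study applied). The runtimes are made up purely for illustration and are not from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Return stdev/mean as a percentage; needs at least two samples."""
    m = mean(samples)
    if m == 0:
        raise ValueError("mean is zero; COV is undefined")
    return 100.0 * stdev(samples) / m


# Hypothetical benchmark runtimes in seconds (not from the study).
stable_runs = [100.0, 100.1, 99.9, 100.0]
noisy_runs = [80.0, 120.0, 95.0, 130.0]

print(f"stable: {coefficient_of_variation(stable_runs):.2f}%")
print(f"noisy:  {coefficient_of_variation(noisy_runs):.2f}%")
```

Because COV is normalized by the mean, it allows comparing the stability of metrics with different units, such as CPU scores and network bandwidth.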

Distributed Transactions in Cloud
• There is now a range of cloud database types
• NoSQL (Azure Table, GAE Datastore, Amazon SimpleDB...)
– Much more ‘shardable’ architecture; no joins, no full ACID support
• SQL (Azure SQL, Amazon RDS, Oracle on EC2...)
– Variable distributed transaction support compared to their traditional RDBMS counterparts
• Experience with porting PetShop
• Challenges with porting the data access layer
– Some JDO interfaces are not supported by App Engine, e.g. ‘Join query’
– No distributed transaction support in Azure SQL at the moment

Pricing fluctuates over space and time
• On-demand pricing (hourly, per GB, per 1,000 requests)
• Reserved instances (1 or 3 year term + unit cost)
• Spot pricing (typically cheaper in US-East!)
• Similar pricing schemes observed for GAE and Azure
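
A reserved instance trades an upfront fee for a lower hourly rate, so it only pays off above a certain number of hours used. The sketch below works out that break-even point. All prices are hypothetical placeholders chosen for illustration, not actual AWS rates, and the function name is an assumption.

```python
def break_even_hours(upfront_fee, reserved_hourly, on_demand_hourly):
    """Hours of use at which reserved and on-demand total costs are equal."""
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        raise ValueError("reserved rate must be below the on-demand rate")
    return upfront_fee / saving_per_hour


# Hypothetical prices in USD, for illustration only.
UPFRONT = 300.0
RESERVED_RATE = 0.04
ON_DEMAND_RATE = 0.10
HOURS_PER_YEAR = 365 * 24  # 8,760

hours = break_even_hours(UPFRONT, RESERVED_RATE, ON_DEMAND_RATE)
print(f"break-even after {hours:.0f} hours "
      f"({100 * hours / HOURS_PER_YEAR:.0f}% of a 1-year term)")
```

Below the break-even utilization, on-demand is cheaper; above it, the reserved term wins.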

Sticky Session Support
• Autoscaling alone does not guarantee that clients of the same session will always contact the same instance
• Without that guarantee, clients cannot perform a series of connected operations
• Amazon ELB supports session affinity
– Session affinity allows a session-to-instance mapping to be created at the ELB
– Limitations
• Session affinity cannot handle HTTPS
• Autoscaling can scale down an instance with a live session
• MS Azure advocates stateless sessions
– If you must keep state, store session state in e.g. table storage
• Design issue: should the server remember the conversation context, or should the client remind it every time? How long should a session ‘stick’? Too long compromises the server’s ability to distribute load
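
One way to picture the stateless option: any instance can serve any request because the conversation context lives in a shared store keyed by session ID. The sketch below uses an in-memory dict as a stand-in for an external store such as table storage. `SessionStore`, `handle_request` and the cart example are hypothetical names for illustration, not an Azure API.

```python
class SessionStore:
    """Stand-in for an external shared store (e.g. table storage)."""

    def __init__(self):
        self._rows = {}

    def load(self, session_id):
        # Return a copy so callers must save explicitly, as with a remote store.
        return dict(self._rows.get(session_id, {}))

    def save(self, session_id, state):
        self._rows[session_id] = dict(state)


def handle_request(instance_name, store, session_id, item):
    """Add an item to the session's cart; keeps no state on the instance."""
    state = store.load(session_id)
    cart = state.get("cart", []) + [item]
    state["cart"] = cart
    state["last_instance"] = instance_name
    store.save(session_id, state)
    return cart


store = SessionStore()
# Two different instances serve the same session in turn.
handle_request("instance-a", store, "session-42", "book")
cart = handle_request("instance-b", store, "session-42", "pen")
print(cart)  # ['book', 'pen']
```

The cost of this design is an extra store round trip per request, which is the price paid for letting the load balancer route freely and for surviving scale-down of any single instance.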

Customers’ Responsibility in IaaS Cloud
• Customers’ responsibility
– Infrastructure Configuration (VPN, VMs, Disk, …)
– OS/Application Security (e.g., Active Directory)
– OS/Middleware Installation/Configuration
– OS Patching
– Application Installation/Configuration
– Application Patching
– Billing (Cost Center Charging)
– Antivirus
– OS Backup
– OS Monitoring
– App Data Backup
– Application Monitoring
• Amazon EC2 (IaaS providers)
– Infrastructure Monitoring (CPU, Disk, Net, …)
– Usage Report and Basic Billing
– Access Control to IaaS

Secure Connection to the Cloud

Performance Implications
• Low-security option: maximum throughput is 5.6 MB/sec
• High-security option: connection throughput is 4 MB/sec
– Performance hit due to encryption, decryption and firewall
• Other interesting observations:
– VPC is only available in US-East-1 and EU-West-1, and in a single availability zone only
– S3 does not work well with VPC yet (very slow); EBS is a workaround
– MS Azure VPN support is expected next year
– Google Secure Connector
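
To put the two throughput figures in perspective, the sketch below computes the relative drop and the resulting transfer times. The 1 GB payload is a hypothetical example size, and the function names are assumptions for illustration.

```python
LOW_SECURITY_MB_PER_SEC = 5.6   # figure from the measurements above
HIGH_SECURITY_MB_PER_SEC = 4.0  # figure from the measurements above


def transfer_seconds(size_mb, throughput_mb_per_sec):
    """Time to move size_mb at a constant throughput."""
    return size_mb / throughput_mb_per_sec


def throughput_drop_percent(baseline, reduced):
    """Relative throughput loss of `reduced` compared to `baseline`."""
    return 100.0 * (baseline - reduced) / baseline


size_mb = 1024  # hypothetical 1 GB payload
drop = throughput_drop_percent(LOW_SECURITY_MB_PER_SEC, HIGH_SECURITY_MB_PER_SEC)
print(f"throughput drop: {drop:.1f}%")
print(f"low security:  {transfer_seconds(size_mb, LOW_SECURITY_MB_PER_SEC):.0f} s")
print(f"high security: {transfer_seconds(size_mb, HIGH_SECURITY_MB_PER_SEC):.0f} s")
```

The high-security option costs roughly 29% of the throughput, which adds a bit over a minute per gigabyte at these rates.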

Time to Get a New Instance
• It typically takes minutes to create an instance from its image on EC2
• Trick to “create” instances quicker
– Create a pool of instances in advance, and stop (hibernate) them all
• Pay no instance cost, but pay the storage cost for stopped instances
– Revive stopped instances when new instances are needed
Operating System    Method                     Time
Windows             Create from image          10-15 minutes
Linux               Create from image          5-10 minutes
Windows             Revive stopped instance    30 seconds
Linux               Revive stopped instance    30 seconds
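
The pooling trick can be sketched as a small class that hands out pre-created, stopped instances first and falls back to creating from an image only when the pool is empty. This is an illustrative sketch, not a provider API: `WarmPool`, `start_stopped` and `create_from_image` are hypothetical names, and the two callables stand in for whatever cloud SDK calls you would actually use.

```python
from collections import deque


class WarmPool:
    """Keeps IDs of pre-created, stopped instances and revives them on demand.

    `start_stopped` and `create_from_image` are hypothetical callables that
    the caller supplies (for example, thin wrappers around a cloud SDK).
    """

    def __init__(self, stopped_ids, start_stopped, create_from_image):
        self._stopped = deque(stopped_ids)
        self._start_stopped = start_stopped
        self._create_from_image = create_from_image

    def acquire(self):
        """Return (instance_id, how): revive if possible, else create anew."""
        if self._stopped:
            instance_id = self._stopped.popleft()
            self._start_stopped(instance_id)  # fast path: about 30 seconds
            return instance_id, "revived"
        return self._create_from_image(), "created"  # slow path: minutes


# Fake provider functions so the sketch runs without any cloud account.
started = []
pool = WarmPool(
    stopped_ids=["i-stopped-1", "i-stopped-2"],
    start_stopped=started.append,
    create_from_image=lambda: "i-new-1",
)
results = [pool.acquire() for _ in range(3)]
print(results)
```

Sizing the pool is a cost trade-off: each stopped instance incurs storage charges, so the pool should cover the expected burst rather than the worst case.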

Autoscaling is Not All Magic
• Amazon EC2
“… your application can automatically scale itself up and down depending on its needs.”
• Windows Azure
“Optimized for scale-out applications - designed so that developers can easily build scale-out applications…”
• Google App Engine
“No matter how many users you have or how much data your application stores, App Engine can scale to meet your needs”

Autoscaling is Not All Magic (contd)
Amazon EC2
• How to scale?
– Load balancing with Elastic Load Balancer (ELB)
– Event processing with Autoscaling API
– Monitoring through CloudWatch
• Limitations
– Load balancer is the bottleneck, hence limited throughput
– Limited load balancing options (e.g., no hardware load balancer)
– Limited rule support (e.g., no conjunctions allowed in rules)
– Limited monitoring support (e.g., limited to minute granularity)
Windows Azure
• How to scale?
– Load balancing with Azure Queue Storage
– Event processing with WF rules engine
– Monitoring through Azure Diagnostics
– Create/delete instances with Management API
• Limitations
– Throughput limited by Azure Queue
– Limited monitoring support (e.g., billing information not monitored)
Google App Engine
• How to scale?
– Built in with App Engine
• Limitations
– No control over how it scales
– Number of simultaneous sessions limited by per-minute (burst) quota (500 requests per sec by default), server request time-out (30 secs), etc.
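
Where a provider's rule engine does not allow conjunctions, one possible workaround is to evaluate compound conditions in your own monitoring code and then request the resulting instance count from the scaling API. The sketch below shows only the decision step. Metric names, thresholds and the `decide` function are hypothetical, and the call to an actual scaling API is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    cpu_percent: float
    queue_length: int


def decide(metrics, current_instances, min_instances=1, max_instances=10):
    """Return the desired instance count from a compound (AND) rule."""
    # Scale out only when CPU is high AND work is piling up.
    if metrics.cpu_percent > 75 and metrics.queue_length > 100:
        return min(current_instances + 1, max_instances)
    # Scale in only when CPU is low AND the queue is empty.
    if metrics.cpu_percent < 20 and metrics.queue_length == 0:
        return max(current_instances - 1, min_instances)
    return current_instances


print(decide(Metrics(90.0, 250), current_instances=3))  # 4
print(decide(Metrics(90.0, 0), current_instances=3))    # 3
print(decide(Metrics(10.0, 0), current_instances=3))    # 2
```

The second call shows why the conjunction matters: high CPU with an empty queue does not trigger a scale-out, which a single-metric rule would.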

Getting Involved
• Linkage with National ICT Australia
• Contract Research, Expert Advisory Services, Architecture Reviews
• Public and In-house Training Courses
• Market Surveys, Case Studies
• Professional in Research Residence
Anna.Liu@nicta.com.au, @annaliu
http://blogs.unsw.edu.au/annaliu/

Virtual Machine ‘Stolen Time’
• Using traditional system resource monitoring tools in the cloud
– Measuring system performance within a virtual instance (using tools such as vmstat and top) can give misleading information
– Example: an EC2 instance (e.g. m1.small with 1 EC2 compute unit) does not go above around 40% CPU load as observed from vmstat
• A certain percentage (around 50-60%) appears in vmstat as ‘st’
“st – Time stolen from a virtual machine” (from the vmstat manpage)
• Does it mean I am not getting what I paid for? No, not really
– Amazon instances are measured in EC2 compute units
– “One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor”
• Monitoring system performance in the cloud
– Use cloud monitoring tools such as CloudWatch and RightScale
Limitation of Virtual Private Cloud (VPC)
• VPC hosts are logically detached from (but physically attached to) the Amazon network
– No direct connection to and from S3 via the Amazon local network
– Connection via the internet only
• What happens if we need to transfer data from S3 to a VPC host?
– E.g. if we ship removable media to Amazon, its contents are uploaded to S3. How do we transfer the data to a VPC host?
– Option 1: Direct transfer from S3 to the VPC host
• Traffic routes through the remote side and comes back (high latency)
– Option 2: Transfer to EBS and mount the EBS volume on the VPC host
• Traffic routes through the local network (low latency)

How Long Do You Need to Wait to Get Updated Data with Eventually Consistent Read?
• Result of the “5 minutes run” for one week
• t1: the first time updated data is read
• t2: the first time 100% of reads return updated data
• t3: the last time stale data is read
→ Mostly updated after 600 ms, but no guarantee