Cloud auto-scaling with deadline and budget constraints
1. Ming Mao, Jie Li, Marty Humphrey
eScience Group
CS Department, University of Virginia
Grid 2010 – Oct 27, 2010
2. A fast-growing computing platform
IDC: cloud spending grows 27.4% a year, reaching $56 billion (compared with 5% a year for traditional IT)
$16.5 billion (2009) -> $55.5 billion (2014)
Source: Worldwide and Regional Public IT Cloud Service 2010-2014 Forecast
Two most-quoted benefits
Scalable computing and storage
Reduced cost
Concerns
Security, availability, cost management, integration, interoperability, etc.
3. Q1. Cost: the most important factor in practice?
Rate the benefits commonly ascribed to the cloud on-demand model:
Pay only for what you use: 77.90%
Easy/fast to deploy to end users: 77.70%
Monthly payments: 75.30%
Encourages standard systems: 68.50%
Requires less in-house IT staff, costs: 67.00%
Always offers latest functionality: 64.60%
Sharing systems with partners simpler: 63.90%
Seems like the way of the future: 54.00%
How important is it that cloud service providers...
Offer competitive pricing: 91.60%
Offer Service Level Agreements: 88.60%
Option to move cloud offerings back on premise: 87.80%
Provide a complete solution: 86.00%
Understand my business and industry: 84.50%
Allow managing on-premise & cloud together: 82.10%
Support many of my IT needs: 81.00%
Offer both on-premise and public cloud services: 79.20%
Are a technology and business model innovator: 78.30%
Have local presence, can come to my offices: 72.90%
Source (both charts): IDC Enterprise Panel, 3Q09, n = 263, Sep 2009
Q2. Moving into the cloud == reduced cost?
4. Triggers based on resource-utilization information (e.g., AWS auto-scaling, RightScale, enStratus, Scalr)
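To make the contrast concrete, here is a minimal sketch of a utilization-threshold rule of the kind these tools support. The thresholds (70% and 30%), the one-instance step, and the class name `ThresholdPolicy` are hypothetical illustrations, not the configuration syntax of AWS or of any product listed above. Note that nothing in the rule refers to a deadline or a budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThresholdPolicy:
    """Hypothetical utilization-threshold trigger (illustrative values only)."""
    scale_out_above: float = 0.70  # add an instance when average CPU > 70%
    scale_in_below: float = 0.30   # remove an instance when average CPU < 30%
    min_instances: int = 1
    max_instances: int = 20

    def desired_count(self, current: int, cpu_samples: List[float]) -> int:
        """Return the new instance count given recent CPU utilization samples (0..1)."""
        if not cpu_samples:
            return current  # no data: leave the pool unchanged
        avg = sum(cpu_samples) / len(cpu_samples)
        if avg > self.scale_out_above:
            return min(current + 1, self.max_instances)
        if avg < self.scale_in_below:
            return max(current - 1, self.min_instances)
        return current


policy = ThresholdPolicy()
print(policy.desired_count(4, [0.85, 0.90, 0.78]))  # 5: average is about 0.84
print(policy.desired_count(4, [0.10, 0.20, 0.15]))  # 3: average is about 0.15
```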
5. Multiple instance types
Current billing models
Full hour billing
Non-negligible instance acquisition time
7-15 min in Windows Azure
More specific performance goals
Budget awareness (e.g. dollars/month, dollars/job)
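Full-hour billing charges an instance for every started hour, so a 61-minute lifetime costs as much as a 120-minute one. A minimal sketch, assuming a flat hourly price and a billed lifetime measured in minutes; `billed_cost` is a hypothetical helper, and the $0.12/hour rate is the Windows Azure computing-hour price quoted later in this deck.

```python
import math


def billed_cost(lifetime_minutes: float, price_per_hour: float) -> float:
    """Cost under full-hour billing: every started hour is charged in full."""
    billed_hours = math.ceil(lifetime_minutes / 60)
    return billed_hours * price_per_hour


AZURE_PRICE = 0.12  # $ per computing hour, as quoted for Windows Azure in this deck

for minutes in (60, 61, 125):
    print(minutes, "min ->", billed_cost(minutes, AZURE_PRICE), "$")
```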
6. [Figure: users submit jobs to a cloud application running on cloud servers; labels: Deadline (job finish time), Cost]
Problem statement: how can auto-scaling enable a cloud application to finish all submitted jobs before a user-specified deadline while spending as little money as possible?
7. The workload consists of independent jobs submitted to a job queue
Jobs are served in FCFS order and distributed fairly
Different classes of jobs
All jobs share the same performance goal (e.g., a 1-hour deadline)
VM instances take time to start up
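A minimal sketch of this workload model, assuming one shared FIFO queue and string labels for job classes (the labels and helper names are hypothetical). The per-class counts of queued jobs give the workload vector $(J_j, n_j)$ used in the formulas on the next slide; counting only queued, not running, jobs is an assumption.

```python
from collections import Counter, deque
from typing import Deque, Dict, Optional

# One shared FCFS queue; job classes are plain string labels (hypothetical).
job_queue: Deque[str] = deque()


def submit(job_class: str) -> None:
    """Users enqueue jobs at the tail."""
    job_queue.append(job_class)


def next_job() -> Optional[str]:
    """An idle VM instance takes the oldest job (first come, first served)."""
    return job_queue.popleft() if job_queue else None


def workload() -> Dict[str, int]:
    """Workload W as {job class J_j: queued jobs n_j} (assumption: queued jobs only)."""
    return dict(Counter(job_queue))


for c in ["compute", "io", "compute", "io", "compute"]:
    submit(c)
first = next_job()
print(first, workload())
```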
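9. Workload
$W = \sum_j (J_j, n_j)$
Computing power of instance $I_i$
Running instance: $P_i = \sum_j \left(J_j, \frac{D}{t_{j,\mathrm{type}(I_i)}} \cdot \frac{n_j}{\sum_j n_j}\right)$
Pending instance: $P_i = \sum_j \left(J_j, \frac{D - (d_{\mathrm{type}(I_i)} - s_i)}{t_{j,\mathrm{type}(I_i)}} \cdot \frac{n_j}{\sum_j n_j}\right)$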
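The capacity formulas above can be sketched as follows, reading each $P_i$ as a per-class vector with one entry per job class $J_j$. Assumptions: $D$ is the time remaining until the deadline, $t_{j,\mathrm{type}(I_i)}$ is the processing time of a class-$j$ job on the instance's type, $d_{\mathrm{type}(I_i)}$ is that type's startup delay, and $s_i$ is how long a pending instance has already been starting. The function name, the zero clamp, and the example numbers are illustrative additions.

```python
from typing import Dict, Optional


def computing_power(
    deadline: float,                  # D: seconds left until the deadline (assumed meaning)
    job_time: Dict[str, float],       # t_{j,type(I_i)}: seconds per class-j job on this type
    workload: Dict[str, int],         # W: {J_j: n_j}
    startup_delay: float = 0.0,       # d_{type(I_i)}: startup delay of this type (s)
    elapsed: Optional[float] = None,  # s_i: seconds a pending instance has waited; None = running
) -> Dict[str, float]:
    """Per-class capacity P_i of one instance (running or pending)."""
    total = sum(workload.values())
    if total == 0:
        return {j: 0.0 for j in workload}
    if elapsed is None:
        usable = deadline  # running instance: the whole window is usable
    else:
        usable = deadline - (startup_delay - elapsed)  # pending: lose remaining startup time
    usable = max(usable, 0.0)  # clamp added here; not part of the slide's formula
    return {j: usable / job_time[j] * n / total for j, n in workload.items()}


# Illustrative instance: 75 s per compute job, 300 s per IO job, 720 s startup delay.
times = {"compute": 75.0, "io": 300.0}
w = {"compute": 20, "io": 10}
running = computing_power(3600, times, w)
pending = computing_power(3600, times, w, startup_delay=720, elapsed=120)
print(running, pending)
```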
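10. Scale up
Sufficient budget: $\min \sum_i c_{\mathrm{type}(I_i')}$ subject to $\sum_i P_i' \ge W - \sum_i P_i$
Insufficient budget: $\max \sum_i P_i'$ subject to $\sum_i c_{\mathrm{type}(I_i')} \le C - \sum_i c_{\mathrm{type}(I_i)}$
Scale down
$\sum_i P_i - P_s \ge W$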
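The talk does not say how these optimization problems are solved. As a stand-in, the sketch below brute-forces small integer counts per instance type. It returns the cheapest plan whose new capacity covers the shortfall $W - \sum_i P_i$ within the remaining budget $C - \sum_i c_{\mathrm{type}(I_i)}$. If no such plan exists, it returns the affordable plan with the largest total capacity. Summing $P'$ over job classes to compare plans is a scalarization chosen here, an assumption. `scale_up`, `can_release`, `max_per_type`, and the capacity numbers are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

Vector = Dict[str, float]  # per-job-class capacity or demand


def add(a: Vector, b: Vector, k: float = 1) -> Vector:
    """Return a + k*b, treating missing classes as 0."""
    return {j: a.get(j, 0.0) + k * b.get(j, 0.0) for j in set(a) | set(b)}


def covers(cap: Vector, need: Vector) -> bool:
    """True if capacity meets demand for every job class."""
    return all(cap.get(j, 0.0) >= n for j, n in need.items())


def scale_up(shortfall: Vector, types: Dict[str, Tuple[float, Vector]],
             budget_left: float, max_per_type: int = 10) -> Dict[str, int]:
    """Choose how many new instances of each type to request.

    types maps a type name to (hourly cost c, capacity P' of one new instance).
    Plans that cover the shortfall within budget are ranked by cost; if none
    exists, affordable plans are ranked by total capacity instead.
    """
    names = list(types)
    best_plan: Dict[str, int] = {n: 0 for n in names}
    best_key = None
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        cost = sum(k * types[n][0] for n, k in zip(names, counts))
        if cost > budget_left:
            continue  # violates the budget constraint
        cap: Vector = {}
        for n, k in zip(names, counts):
            cap = add(cap, types[n][1], k)
        if covers(cap, shortfall):
            key = (0, cost)                # sufficient budget: minimize cost
        else:
            key = (1, -sum(cap.values()))  # insufficient budget: maximize capacity
        if best_key is None or key < best_key:
            best_key, best_plan = key, dict(zip(names, counts))
    return best_plan


def can_release(total_cap: Vector, inst_cap: Vector, workload: Vector) -> bool:
    """Scale-down test: sum_i P_i - P_s >= W."""
    return covers(add(total_cap, inst_cap, -1), workload)


# Hypothetical capacities of one new instance of each type.
types = {
    "general": (0.085, {"compute": 12.0, "io": 12.0}),
    "high_cpu": (0.17, {"compute": 48.0, "io": 12.0}),
}
print(scale_up({"compute": 40, "io": 10}, types, budget_left=1.0))
print(scale_up({"compute": 40, "io": 10}, types, budget_left=0.1))
print(can_release({"compute": 96, "io": 36}, {"compute": 48, "io": 12},
                  {"compute": 40, "io": 20}))
```

Exhaustive search grows as (max_per_type + 1) raised to the number of types, so it only suits a handful of types; an integer-programming solver would be the usual choice at larger scale.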
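12. Cloud Cruise Control
[Figure: Cloud Cruise Control architecture. Components: Monitor, Decider, VM Manager, Repository (config, historical data) and VM instances. Users enqueue jobs, which VM instances dequeue; the admin supplies a dynamic configuration; labeled flows include notify, workload update, update vm info, vm plan, and +/- (adding or removing VM instances). The Decider solves $\min \sum_i c_{\mathrm{type}(I_i')}$ subject to $\sum_i P_i' \ge W - \sum_i P_i$.]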
13. Workload & VM simulation parameters
Job arrivals (Mix, Computing-intensive and IO-intensive workloads): average 30 jobs/hour, STD 5 jobs/hour
Job time (average / STD):
VM type (price, delay) | Mix | Computing-intensive | IO-intensive
General ($0.085/hour, delay 600 s) | 300 s / 50 s | 300 s / 50 s | 300 s / 50 s
High-CPU ($0.17/hour, delay 720 s) | 210 s / 25 s | 75 s / 15 s | 300 s / 50 s
High-IO ($0.17/hour, delay 720 s) | 210 s / 25 s | 300 s / 50 s | 75 s / 15 s
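Pro-rating the hourly prices over the average job times above gives a rough cost per job for each VM type and job class. This is a minimal sketch that deliberately ignores full-hour billing, startup delay and the STD values, so it is only a first-order comparison. It shows that no single type is cheapest for every class: General for the Mix workload, High-CPU for computing-intensive jobs, and High-IO for IO-intensive jobs.

```python
# Hourly prices and average job times (s) from the simulation parameter table.
PRICE = {"General": 0.085, "High-CPU": 0.17, "High-IO": 0.17}
AVG_TIME = {
    "General": {"Mix": 300, "Computing": 300, "IO": 300},
    "High-CPU": {"Mix": 210, "Computing": 75, "IO": 300},
    "High-IO": {"Mix": 210, "Computing": 300, "IO": 75},
}


def cost_per_job(vm: str, job_class: str) -> float:
    """Pro-rated $ per job; ignores full-hour billing, startup delay and variance."""
    return PRICE[vm] * AVG_TIME[vm][job_class] / 3600


def cheapest_type(job_class: str) -> str:
    """VM type with the lowest pro-rated cost per job for this class."""
    return min(PRICE, key=lambda vm: cost_per_job(vm, job_class))


for c in ("Mix", "Computing", "IO"):
    best = cheapest_type(c)
    print(f"{c}: {best} at ${cost_per_job(best, c):.4f}/job")
```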
16. Choice | VM types | Total cost ($) (% more than optimal)
Choice #1 | General | $98.52 (43%)
Choice #2 | High-CPU | $128.86 (87%)
Choice #3 | High-IO | $129.71 (88%)
Choice #4 | General, High-CPU, High-IO | $78.62 (14%)
Optimal | General, High-CPU, High-IO | $68.85
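The percentages in the last column are the relative overhead, (cost minus optimal) divided by optimal. A quick check; the dictionary labels are only for display:

```python
OPTIMAL = 68.85  # $ total cost of the optimal mix

choices = {
    "Choice #1 (General)": 98.52,
    "Choice #2 (High-CPU)": 128.86,
    "Choice #3 (High-IO)": 129.71,
    "Choice #4 (General, High-CPU, High-IO)": 78.62,
}


def pct_over_optimal(cost: float, optimal: float = OPTIMAL) -> int:
    """Relative overhead (cost - optimal) / optimal, as a rounded percentage."""
    return round(100 * (cost - optimal) / optimal)


for name, cost in choices.items():
    print(f"{name}: {pct_over_optimal(cost)}% more than optimal")
```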
17. MODIS
Notation: 200X = year; (X-Y) = day X to day Y; Terra & Aqua = satellites; 15 images/day
Moderate-scale test (up to 20 instances)
Workload | 1-hour deadline | 2-hour deadline | 3-hour deadline | Theoretical cost
Terra 2004(10-12), 45 jobs | 18 min late, 9 C.H. ($1.08) | 8 min early, 6 C.H. ($0.72) | 20 min early, 5 C.H. ($0.60) | 4 C.H.* ($0.48)
Aqua 2008(30-32), 45 jobs | 15 min late, 10 C.H. ($1.20) | 20 min early, 7 C.H. ($0.84) | 29 min early, 5 C.H. ($0.60) | 4 C.H. ($0.48)
Large-scale test (up to 90 instances)
Workload | 2-hour deadline | 4-hour deadline | Theoretical cost
Terra & Aqua 2006(1-75), 1125 jobs | 20 min late, 170 C.H. ($20.40) | 6 min early, 132 C.H. ($15.84) | 93 C.H. ($11.16)
Terra & Aqua 2006(1-150), 2250 jobs | Admission denied | 22 min early, 243 C.H. ($29.16) | 185 C.H. ($22.20)
* C.H.: computing hour; 1 C.H. = $0.12 in Windows Azure
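The dollar figures follow from the footnote's rate of $0.12 per computing hour, and the gap between actual and theoretical cost can be expressed as a relative overhead. A small check for the two large-scale 4-hour-deadline runs; the helper names are hypothetical:

```python
PRICE_PER_CH = 0.12  # $ per computing hour in Windows Azure, per the footnote


def dollars(ch: int) -> float:
    """Convert computing hours to dollars, rounded to cents."""
    return round(ch * PRICE_PER_CH, 2)


# Large-scale runs with a 4-hour deadline: (actual C.H., theoretical C.H.)
runs = {"Terra & Aqua 2006(1-75)": (132, 93), "Terra & Aqua 2006(1-150)": (243, 185)}

for name, (actual, theoretical) in runs.items():
    overhead = (actual - theoretical) / theoretical
    print(f"{name}: ${dollars(actual):.2f} actual vs "
          f"${dollars(theoretical):.2f} theoretical (+{overhead:.0%})")
```

For these two runs the actual cost is roughly 42% and 31% above the theoretical cost.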
18. Test: Terra & Aqua 2006(1-75), 1125 jobs in total
Finished 6 min early
Theoretical cost: 93 C.H. ($11.16)
Actual cost: 132 C.H. ($15.84)
[Figure: Instance Acquisition and Release. Instance number (0-40) vs. time (0-5 hours); series: Released, Acquiring, Ready]
19. Conclusions
More cost-efficient than a fixed instance-size choice
VM startup delay can have a large effect in practice
Future work
More general cloud application model
Multiple job classes
Consider other instance types (e.g., spot instances and reserved instances)
Data transfer performance and storage cost