Towards a Unified View of Elasticity
Srikumar Venugopal & Team

School of Computer Science and Engineering, 
University of New South Wales, Sydney, Australia
srikumarv@cse.unsw.edu.au
Acknowledgements
•  Basem Suleiman
•  Han Li
•  Reza Nouri
•  Freddie Sunarso
•  Richard Gow
Agenda
•  Introduction to elasticity and its
challenges
•  Performance Modeling of Elasticity Rules
•  Autonomic Decentralised Elasticity
Management of Cloud Applications
•  Efficient Bootstrapping for Decentralised
Shared-nothing Key-value Stores
Simple Service Deployment on Cloud
Elasticity

The ability of a system to change its capacity
in direct response to the workload demand
Different Views of Elasticity
•  Performance View
– When to scale, and by how much?
•  Application View
– Does the architecture accommodate scaling?
– How is state managed?
•  Configuration View
– Are there changes in configuration due to scaling?
Elastic Deployment Architecture
Elasticizing Application Layer
Trigger – Controller – Action
•  Trigger: Threshold Breach
•  Controller: Intelligence/Logic
•  Action: Add or Remove Capacity
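The trigger-controller-action loop can be sketched as follows. This is a minimal illustration; the metric source, threshold values, and action names are placeholders, not any vendor's API:

```python
import random

def monitor():
    """Stand-in metric source; a real controller would poll CloudWatch etc."""
    return random.uniform(0.0, 1.0)   # CPU utilisation in [0, 1]

def controller(cpu, servers, high=0.85, low=0.30):
    """Rule-based logic: map a threshold breach (trigger) to an action."""
    if cpu >= high:
        return "scale_out"
    if cpu <= low and servers > 1:
        return "scale_in"
    return "hold"

def act(action, servers):
    """Action: add or remove capacity."""
    if action == "scale_out":
        return servers + 1
    if action == "scale_in":
        return servers - 1
    return servers

servers = 2
for _ in range(10):                   # one pass per monitoring interval
    servers = act(controller(monitor(), servers), servers)
```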
State-of-the-art in Auto-scaling
Product/Project        | Trigger                       | Controller                 | Actions
Amazon Autoscaling     | CloudWatch metrics/Threshold  | Rule-based/Schedule-based  | Add/Remove Capacity
WASABi                 | Azure Diagnostics/Threshold   | Rule-based                 | Add/Remove Capacity, Custom
RightScale/Scalr       | Load monitoring               | Rule-based/Schedule-based  | Add/Remove Capacity, Custom
Google Compute Engine  | CPU Load, etc.                | Rule-based                 | Add/Remove Capacity

Academic
CloudScale             | Demand Prediction             | Control theory             | Voltage-scaling
Cataclysm              | Threshold-based               | Queueing-model             | Admission Control
IBM Unity              | Application Utility           | Utility functions/RL       | Add/Remove Capacity
Summary
•  Currently, the most popular mechanisms
for auto-scaling are rule-based
mechanisms
•  The effectiveness of rule-based
autoscaling is determined by the trigger
conditions
•  So, how do we know how to set up the
right triggers?
Performance Modeling of Elasticity
Rules
Basem Suleiman
Elasticity (Auto-Scaling) Rules
Examples: 
•  If CPU Utilization ≥ 85% for 7 min. add 1 server (Scale Out)
•  If RespTimeSLA ≥ 95% for 10 min. remove 1 server (Scale In)
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
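Rules like the ones above fire only when the breach is sustained over a whole window (e.g. CPU ≥ 85% for 7 consecutive one-minute samples). A minimal sketch of that duration check; the class and parameter names are illustrative:

```python
from collections import deque

class SustainedRule:
    """Fires only when the predicate holds for `window` consecutive
    samples, e.g. CPU >= 85% for 7 one-minute intervals."""
    def __init__(self, predicate, window):
        self.predicate = predicate
        self.window = window
        self.samples = deque(maxlen=window)

    def update(self, value):
        self.samples.append(self.predicate(value))
        return len(self.samples) == self.window and all(self.samples)

scale_out = SustainedRule(lambda cpu: cpu >= 0.85, window=7)
# a single dip resets the 7-sample streak; only the final update fires
fired = [scale_out.update(c) for c in [0.9] * 6 + [0.5] + [0.9] * 7]
```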
Performance of Different Elasticity Rules
•  How well do elasticity rules perform in terms of SLA satisfaction,
CPU utilization, costs, and % of served requests?
Rule
 Elasticity Rules
CPU75
If CPU Util.>75% for 5 min; add 1 server
If CPU Util.≤30% for 5 min; remove 1 server
CPU80
If CPU Util.>80% for 5 min; add 1 server 
If CPU Util.≤30% for 5 min; remove 1 server
CPU85
If CPU Util.>85% for 5 min; add 1 server
If CPU Util.≤30% for 5 min; remove 1 server
SLA90
If SLA < 90% for 5 min; add 1 server
If SLA ≥ 90% for 5 min; remove 1 server
SLA95
If SLA < 95% for 5 mins; add 1 server
If SLA ≥ 95% for 5 mins; remove 1 server
B. Suleiman, S. Sakr, S. Venugopal, W. Sadiq, Trade-off Analysis of Elasticity Approaches for Cloud-Based Business Applications, Proc. WISE 2012.
Cloud Testbed for Collecting Metrics
[Figure: an Elastic Load Balancer distributes requests across a pool of EC2 instances running the TPC-W application, backed by an EC2 instance hosting the TPC-W database.]
% SLA Satisfaction, Avg. CPU Utilization
Server Costs and % served Requests 
Response Time
Performance Evaluation - Different
Elasticity Rules
[Figure: box plots (legend: min, Q1, median, Q3, max, mean) of costs ($0.00–$2.50) and CPU utilization (0%–90%) for rules CPU75, CPU80, CPU85, SLA90, and SLA95.]
The Challenges of Thresholds
You must be at least
this tall to scale up!
•  Threshold values determine performance
and cost
•  E.g., a low CPU utilization threshold =>
higher cost, better performance
•  Thresholds vary from one application to
another
•  Empirically determining thresholds is
expensive.
Can we construct a model that allows
us to establish the right thresholds?
Queue Model of a 3-tier Application
Establishing Rule Thresholds
•  Developed a model based on M/M/m
queuing model
– Simultaneous session initiations on 1 server
– Provisioning Lag Time of the provider
– Cool-down interval after elasticity action
– Algorithms to model scale-in and scale-out 
– Request Mix
•  Compared model fidelity with actual cloud
execution of TPC-W workload.
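As a sketch of the kind of computation such a model enables, the mean response time of a plain M/M/m queue (via the Erlang-C waiting probability) can be swept over m to find the smallest server count meeting an SLA. This is a textbook simplification with illustrative numbers, not the paper's full model, which also captures provisioning lag, cool-down, and request mix:

```python
import math

def mmm_response_time(lam, mu, m):
    """Mean response time E[T] of an M/M/m queue with arrival rate lam,
    per-server service rate mu, and m servers (Erlang-C formula)."""
    a = lam / mu                      # offered load in Erlangs
    rho = a / m
    if rho >= 1:
        return math.inf               # unstable: unbounded response time
    s = sum(a**k / math.factorial(k) for k in range(m))
    top = a**m / math.factorial(m)
    erlang_c = top / ((1 - rho) * s + top)      # probability of queueing
    return erlang_c / (m * mu - lam) + 1 / mu   # E[wait] + E[service]

# illustrative numbers: fewest servers meeting a response-time SLA
lam, mu, sla = 8.0, 1.0, 1.5
m = 1
while mmm_response_time(lam, mu, m) > sla:
    m += 1
```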
Experiments: Methodology
•  Run the TPC-W workload on Amazon
cloud resources using thresholds
•  Simulate the model using MATLAB with
the same thresholds
•  Compare the simulation results to the
results from the actual execution
– If both are equivalent, then we are good :)
Experiments: Testbed
[Figure: a TPC-W user-emulation client (Linux, extra-large EC2 instance) drives an Elastic Load Balancer, which distributes requests over small/medium EC2 application servers (Linux, JBoss/JSDK) backed by an extra-large EC2 database server (Linux, MySQL) running the TPC-W database.]
Experiments: Input Workload
[Figure: request arrival rate (req/min, 0–2,400) over time (0–570 minutes).]
•  Used TPC-W Browsing profile (95% read) 
•  Stress on application tier
•  Number of concurrent-users – Zipf
•  Inter-arrival times - Poisson
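A workload with these properties can be sketched as follows (illustrative parameters; Zipf ranks by inverse-CDF sampling, a Poisson arrival process via exponential inter-arrival gaps):

```python
import random

def zipf_sample(n, s=1.2, rng=random):
    """Draw a rank in 1..n from a Zipf(s) distribution by inverse CDF."""
    weights = [1.0 / k**s for k in range(1, n + 1)]
    u = rng.random() * sum(weights)
    acc = 0.0
    for rank, w in enumerate(weights, start=1):
        acc += w
        if u <= acc:
            return rank
    return n

def session_arrivals(rate_per_min, minutes, rng=random):
    """Poisson arrival process: i.i.d. exponential inter-arrival gaps."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_min)
        if t >= minutes:
            return times
        times.append(t)

random.seed(7)
arrivals = session_arrivals(rate_per_min=100, minutes=10)
users = [zipf_sample(50) for _ in arrivals]   # concurrent-user ranks
```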
Experiments: Elasticity Rules
Rule    Rule Expansion
CPU75
 If CPU Util. > 75% for 5 min, add 1 server
If CPU Util. < 30% for 5 min, remove 1 server
CPU80
 If CPU Util. > 80% for 5 min, add 1 server
If CPU Util. < 30% for 5 min, remove 1 server
Common parameters:
•  Waiting time – 10 mins., Measuring interval – 1 min.

Metrics Captured:
•  Average CPU Utilization across all the servers
•  Average Response Time in a time interval
•  Number of servers in operation at any point of time
Results
CPU Utilization
[Figure: average CPU utilization (0%–100%) for elasticity rules, model (M) vs. empirical (E): CPU75M, CPU75E, CPU80M, CPU80E.]
Average Response Time
[Figure: average response time (0.0–0.5 sec) for elasticity rules, model (M) vs. empirical (E): CPU75M, CPU75E, CPU80M, CPU80E.]
CPU Utilization over Time
[Figure: average CPU utilization (%) over time (0–560 minutes) for CPU80M and CPU80E.]
Number of Servers Initialized
[Figure: number of application-tier servers (0–6) over time (0–560 minutes) for CPU75M, CPU75E, CPU80M, CPU80E.]
Summary
•  Developed a queueing model that can be
used to reason about elasticity
•  Model captures effects of thresholds and
can be used for testing different rules
•  Evaluations show that the model
approximates real-world conditions closely
•  Future work: handling initial bursts in
workload
Autonomic Decentralised Elasticity
Management of Cloud Applications
Reza Nouri and Han Li
Cons of Rule-based Autoscaling
•  Commercial products are rule-based
– Gives “illusion of control” to users
– Leads to the problem of defining the “right”
thresholds
•  Centralised controllers
– Communication overhead increases with size
– Processing overhead also increases (Big
Data!)
•  One application/VM at a time
Challenges of large-scale elasticity
•  Large numbers of instances and apps
– Deriving solutions takes time
•  Dynamic conditions
– Apps enter critical states all the time
•  Shifting bottlenecks
– Greedy solutions may create bottlenecks in
other places
•  Network partitions, fault tolerance…
H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of
8th ICAC '11.
Initial Conditions
[Figure: Instance1 (App Server1) hosts app1 and app2; Instance2 (App Server2) hosts app3 and app4, both on an IaaS provider.]
A Critical Event
[Figure: the same deployment; an application on Instance1 breaches its threshold, raising a critical event.]
Placement 1
[Figure: app2 is moved to Instance2, which now hosts app3, app4, and app2; Instance1 hosts only app1.]
Placement 2
[Figure: a new Instance3 (App Server3) is provisioned at extra cost ($$) to host app1; Instance1 keeps app2, Instance2 keeps app3 and app4.]
Placements 3 & 4
[Figure: app1 is duplicated rather than moved: either across the existing Instance1 and Instance2, or onto a newly provisioned Instance3, while Instance1 hosts app2 and Instance2 hosts app3 and app4.]
Problems for Automatic Placement
•  Provisioning
– Smallest number of servers required to satisfy
resource requirements of all the applications

•  Dynamic Placement
– Distribute applications so as to maximise
utilisation yet meet each app’s response time
and availability requirements
Co-ordinated Control of Elasticity
•  Instances control their own utilisation
– Monitoring, management and feedback
•  Local controllers are learning agents
– Reinforcement Learning
•  Controllers learn from each other
– Share their knowledge and update their own
•  Servers are linked by a DHT
– Agility, Flexibility, Co-ordination
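A minimal sketch of a reinforcement-learning local controller of this kind, using epsilon-greedy Q-learning. The states, actions, and rewards below are illustrative, not the paper's exact formulation:

```python
import random

class QController:
    """Epsilon-greedy Q-learning agent; states are coarse load levels and
    actions are elasticity operations (all names here are illustrative)."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = {}                          # (state, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def choose(self, state, rng=random):
        if rng.random() < self.eps:          # explore occasionally
            return rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def learn(self, state, action, reward, next_state):
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)

agent = QController(actions=["scale_out", "scale_in", "hold"])
for _ in range(50):                          # toy feedback loop
    agent.learn("high", "scale_out", reward=1.0, next_state="ok")
    agent.learn("high", "hold", reward=-1.0, next_state="high")
agent.eps = 0.0                              # act greedily from now on
```

Controllers sharing knowledge could then be as simple as exchanging and merging their Q tables.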
Abstract View of the Control
Scheme
Fuzzy Thresholds
Basic Actions
Instance actions: create! (reward -3.5), terminate! (+3.5), find! (+3.5)
Application actions: move! (+0.5), duplicate! (+0.5), merge! (+0.5)
Co-ordination using find!
•  Server looks up other servers with the
least load
– DHT lookup
•  Sends a move message to the selected
server
•  Replies with accept or reject!
– accept carries a positive reward
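The find!/move! exchange can be sketched as follows. This is a toy model: the DHT view is a plain dict and the capacity check stands in for the real acceptance logic:

```python
def find_target(dht_view, self_id):
    """find!: look up the least-loaded peer in the local DHT view
    (here a plain dict of node id -> load; a real lookup uses the DHT)."""
    peers = {n: load for n, load in dht_view.items() if n != self_id}
    return min(peers, key=peers.get) if peers else None

def offer_move(target_load, app_load, capacity=1.0):
    """move!: the target accepts only if the app fits its spare capacity;
    an accept carries a positive reward for the sender."""
    return "accept" if target_load + app_load <= capacity else "reject"
```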
Shrinking
•  The controller is always reward
maximising
– Highest Reward is for merge+terminate
•  A controller initiates its own shutdown
– Low load on its applications
•  Gets exclusive lock on termination
– Only one instance can terminate at a time
•  Transfers state before shutdown
Experiments
•  Six web applications
–  Test Application: Hotel Management
–  Search → Book → Confirm
•  Five were subjected to a background load
–  Uniform Random
•  One was subjected to the test load
•  Application threshold: 200 and 500 ms
•  Metrics
–  Average Response Time, Drop Rate, Servers
Experimental Results (EC2)
Elasticising Persistence Layer
Efficient Bootstrapping for Decentralised
Shared-nothing Key-value Stores
Han Li
Key-value Stores
•  The standard component for cloud data
management
•  Increasing workload → Node bootstrapping
–  Incorporate a new, empty node as a member of the KVS

•  Decreasing workload → Node decommissioning
–  Eliminate an existing member, moving its now-redundant data off the KVS
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
Research Questions
•  As the system scales, how to efficiently
incorporate or remove data nodes?
– Load balancing, migration overheads, etc.
•  How to partition and place the data
replicas when the system is elastic?
– Data consistency, durability, availability, etc.
Elasticity in Key-Value Stores
•  Minimise the overhead of data movement
–  How to partition/store data?

•  Balance the load at node bootstrapping
–  Both data volume and workload
–  How to place/allocate data?

•  Maintain data consistency and availability
–  How to execute data movement?
[Figure: the key space divided into contiguous partitions A–I.]
Split-Move Approach
[Figure: Split-Move bootstrapping. ① Partition B is split into B1 and B2; ② B2 and its replicas are moved to the new node; ③ the original copies, marked "to be deleted", are removed. Master and slave replicas across Nodes 1–4 are updated accordingly.]
Partition at node bootstrapping
Virtual-Node Approach
[Figure: Virtual-Node bootstrapping. The key space is pre-partitioned into many virtual nodes (A–I) spread across Nodes 1–4; a new node takes over a subset of the existing virtual nodes without splitting any partition.]
Partition at system startup
Data skew: e.g., the majority of data is stored in a minority of partitions.
Moving around giant partitions is not a good idea.
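The virtual-node idea is essentially consistent hashing with many tokens per physical node, so a joining node takes small partitions from several peers instead of splitting one giant range. A minimal sketch (MD5-based ring; the node names and vnode count are illustrative):

```python
import bisect
import hashlib

class VirtualNodeRing:
    """Consistent-hash ring with virtual nodes: each physical node owns
    many small partitions, so a joining node claims partitions from
    several existing nodes rather than splitting one large range."""
    def __init__(self, vnodes=8):
        self.vnodes = vnodes
        self.ring = []            # sorted (hash, node) pairs

    def _h(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._h(f"{node}#{i}"), node))

    def owner(self, key):
        h = self._h(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = VirtualNodeRing(vnodes=8)
for n in ["node-1", "node-2", "node-3"]:
    ring.add(n)
owner_before = {f"key-{i}": ring.owner(f"key-{i}") for i in range(100)}
ring.add("node-4")                # bootstrap a new node
# only keys in the arcs claimed by node-4's virtual nodes change owner
moved = sum(ring.owner(k) != v for k, v in owner_before.items())
```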
Our Solution
•  Virtual-node based movement
–  Each partition of data is stored in separate files
–  Reduced overhead of data movement
–  Many existing nodes can participate in bootstrapping
•  Automatic sharding
–  Split and merge partitions at runtime
–  Each partition stores a bounded volume of data
–  Easy to reallocate data
–  Easy to balance the load
The timing for data partitioning
•  Shard partitions at writes (insert and delete)
–  Split Pi when Size(Pi) ≥ Θmax
–  Merge Pi and Pi+1 when Size(Pi) + Size(Pi+1) ≤ Θmin
[Figure: inserts grow partition B until it splits into B1 and B2; deletes shrink adjacent partitions until they merge into one (M). Choosing Θmax ≥ 2Θmin avoids oscillation between split and merge.]
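The split/merge triggers and the Θmax ≥ 2Θmin guard can be sketched as a toy decision function over ordered partition sizes. Freshly split halves are at least Θmax/2 ≥ Θmin, so they can never immediately qualify for a merge:

```python
def shard_actions(partitions, theta_max, theta_min):
    """Decide splits/merges over an ordered list of partition sizes.
    Split P_i when Size(P_i) >= theta_max; merge adjacent P_i, P_i+1
    when their combined size <= theta_min."""
    assert theta_max >= 2 * theta_min, "thresholds would oscillate"
    actions = []
    for i, size in enumerate(partitions):
        if size >= theta_max:
            actions.append(("split", i))
        elif i + 1 < len(partitions) and size + partitions[i + 1] <= theta_min:
            actions.append(("merge", i, i + 1))
    return actions
```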
Sharding coordination
•  Solution: Election-based coordination
[Figure: Step 1: nodes elect a coordinator from a sorted candidate list (C, E, ..., A, ..., B). Step 2: the coordinator (Node-A) enforces the split/merge. Step 3: nodes finish the split/merge and update the data/node mapping in turn. Step 4: the coordinator announces the result to all nodes.]
Node failover during sharding
[Figure: failover flowcharts for a node failing before, during, or after execution of "Shard Pi". Failed nodes are detected via gossip and removed from the candidate list; a resurrected node is appended back. If the coordinator fails before execution, a new election is held; if it fails during execution, the non-coordinators continue without it or elect a new coordinator; if the final announcement times out, Pi is invalidated on the affected node, and replicas are replaced as needed.]
Evaluation Setup
•  ElasCass: an implementation of auto-sharding,
built on Apache Cassandra (version 1.0.5),
which uses the Split-Move approach
•  Key-value stores: ElasCass vs. Cassandra
(v1.0.5)
•  Testbed: Amazon EC2, m1.large instances (2 CPU
cores, 8 GB RAM)
•  Benchmark: YCSB
Evaluation – Bootstrap Time
•  Start from 1 node, with 100GB of data,
R=2. Scale up to 10 nodes.
•  In Split-Move, data volume transferred
reduces by half from 3 nodes onwards.
•  In ElasCass, data volume transferred
remains below 10GB from 2 nodes.
•  Bootstrap time is determined by data
volume transferred. ElasCass exhibits a
consistent performance at all scales.
Conclusions
•  We have designed and implemented a
decentralised auto-sharding scheme that
– consolidates each partition replica into a single
transferable unit to provide efficient data
movement;
– automatically shards the partitions into
bounded ranges to address data skew;
– reduces the time to bootstrap nodes,
achieving better load balance and better
query-processing performance.
A Unified View of Elasticity (?)
Final Thoughts
•  Elasticising Application Logic is done
– How do we eliminate thresholds?
– Should it be more autonomic?
•  Application View of Elasticity
– Managing state is the big challenge
– Decoupling of different components (service-oriented model)
– How would you scale interconnected components?
Questions?
srikumarv@cse.unsw.edu.au
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Sam Fell - Electric Cloud - Automating Continuous Delivery with ElectricFlow
Sam Fell - Electric Cloud - Automating Continuous Delivery with ElectricFlowSam Fell - Electric Cloud - Automating Continuous Delivery with ElectricFlow
Sam Fell - Electric Cloud - Automating Continuous Delivery with ElectricFlowDevOps Enterprise Summit
 
Cmg06 utilization is useless
Cmg06 utilization is uselessCmg06 utilization is useless
Cmg06 utilization is uselessAdrian Cockcroft
 
Tanay Nagjee - Electric Cloud - Better Continuous Integration with Test Accel...
Tanay Nagjee - Electric Cloud - Better Continuous Integration with Test Accel...Tanay Nagjee - Electric Cloud - Better Continuous Integration with Test Accel...
Tanay Nagjee - Electric Cloud - Better Continuous Integration with Test Accel...DevOps Enterprise Summit
 
Rohit Jainendra - Electric Cloud - Enabling DevOps Adoption with Electric Cloud
Rohit Jainendra - Electric Cloud - Enabling DevOps Adoption with Electric CloudRohit Jainendra - Electric Cloud - Enabling DevOps Adoption with Electric Cloud
Rohit Jainendra - Electric Cloud - Enabling DevOps Adoption with Electric CloudDevOps Enterprise Summit
 
Self learning cloud controllers
Self learning cloud controllersSelf learning cloud controllers
Self learning cloud controllersPooyan Jamshidi
 
Chaos Engineering on Cloud Foundry
Chaos Engineering on Cloud FoundryChaos Engineering on Cloud Foundry
Chaos Engineering on Cloud FoundryKarun Chennuri
 
Capacity Planning for Virtualized Datacenters - Sun Network 2003
Capacity Planning for Virtualized Datacenters - Sun Network 2003Capacity Planning for Virtualized Datacenters - Sun Network 2003
Capacity Planning for Virtualized Datacenters - Sun Network 2003Adrian Cockcroft
 
System and User Aspects of Web Search Latency
System and User Aspects of Web Search LatencySystem and User Aspects of Web Search Latency
System and User Aspects of Web Search LatencyTelefonica Research
 
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s DilemmaAmazon Web Services
 
Java/Hybris performance monitoring and optimization
Java/Hybris performance monitoring and optimizationJava/Hybris performance monitoring and optimization
Java/Hybris performance monitoring and optimizationEPAM Lviv
 

Was ist angesagt? (10)

Sam Fell - Electric Cloud - Automating Continuous Delivery with ElectricFlow
Sam Fell - Electric Cloud - Automating Continuous Delivery with ElectricFlowSam Fell - Electric Cloud - Automating Continuous Delivery with ElectricFlow
Sam Fell - Electric Cloud - Automating Continuous Delivery with ElectricFlow
 
Cmg06 utilization is useless
Cmg06 utilization is uselessCmg06 utilization is useless
Cmg06 utilization is useless
 
Tanay Nagjee - Electric Cloud - Better Continuous Integration with Test Accel...
Tanay Nagjee - Electric Cloud - Better Continuous Integration with Test Accel...Tanay Nagjee - Electric Cloud - Better Continuous Integration with Test Accel...
Tanay Nagjee - Electric Cloud - Better Continuous Integration with Test Accel...
 
Rohit Jainendra - Electric Cloud - Enabling DevOps Adoption with Electric Cloud
Rohit Jainendra - Electric Cloud - Enabling DevOps Adoption with Electric CloudRohit Jainendra - Electric Cloud - Enabling DevOps Adoption with Electric Cloud
Rohit Jainendra - Electric Cloud - Enabling DevOps Adoption with Electric Cloud
 
Self learning cloud controllers
Self learning cloud controllersSelf learning cloud controllers
Self learning cloud controllers
 
Chaos Engineering on Cloud Foundry
Chaos Engineering on Cloud FoundryChaos Engineering on Cloud Foundry
Chaos Engineering on Cloud Foundry
 
Capacity Planning for Virtualized Datacenters - Sun Network 2003
Capacity Planning for Virtualized Datacenters - Sun Network 2003Capacity Planning for Virtualized Datacenters - Sun Network 2003
Capacity Planning for Virtualized Datacenters - Sun Network 2003
 
System and User Aspects of Web Search Latency
System and User Aspects of Web Search LatencySystem and User Aspects of Web Search Latency
System and User Aspects of Web Search Latency
 
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
 
Java/Hybris performance monitoring and optimization
Java/Hybris performance monitoring and optimizationJava/Hybris performance monitoring and optimization
Java/Hybris performance monitoring and optimization
 

Ähnlich wie Towards a Unified View of Cloud Elasticity

Autonomic Decentralised Elasticity Management of Cloud Applications
Autonomic Decentralised Elasticity Management of Cloud ApplicationsAutonomic Decentralised Elasticity Management of Cloud Applications
Autonomic Decentralised Elasticity Management of Cloud ApplicationsSrikumar Venugopal
 
Improving DevOps through Cloud Automation and Management - Real-World Rocket ...
Improving DevOps through Cloud Automation and Management - Real-World Rocket ...Improving DevOps through Cloud Automation and Management - Real-World Rocket ...
Improving DevOps through Cloud Automation and Management - Real-World Rocket ...Ostrato
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliencyMasashi Narumoto
 
Ncerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssmNcerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssmssmarar
 
SCQAA-SF Meeting on May 21 2014
SCQAA-SF Meeting on May 21 2014 SCQAA-SF Meeting on May 21 2014
SCQAA-SF Meeting on May 21 2014 Sujit Ghosh
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestRodolfo Kohn
 
Total cloud control with oracle enterprise manager 12c
Total cloud control with oracle enterprise manager 12cTotal cloud control with oracle enterprise manager 12c
Total cloud control with oracle enterprise manager 12csolarisyougood
 
“Spikey Workloads” Emergency Management in the Cloud
“Spikey Workloads” Emergency Management in the Cloud“Spikey Workloads” Emergency Management in the Cloud
“Spikey Workloads” Emergency Management in the CloudAmazon Web Services
 
“Spikey Workloads” Emergency Management in the Cloud
“Spikey Workloads” Emergency Management in the Cloud“Spikey Workloads” Emergency Management in the Cloud
“Spikey Workloads” Emergency Management in the CloudAmazon Web Services
 
Fuzzy Control meets Software Engineering
Fuzzy Control meets Software EngineeringFuzzy Control meets Software Engineering
Fuzzy Control meets Software EngineeringPooyan Jamshidi
 
Service Stampede: Surviving a Thousand Services
Service Stampede: Surviving a Thousand ServicesService Stampede: Surviving a Thousand Services
Service Stampede: Surviving a Thousand ServicesAnil Gursel
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsSAIL_QU
 
Planning a Successful Cloud - Design from Workload to Infrastructure
Planning a Successful Cloud - Design from Workload to InfrastructurePlanning a Successful Cloud - Design from Workload to Infrastructure
Planning a Successful Cloud - Design from Workload to Infrastructurebuildacloud
 
Adaptive Server Farms for the Data Center
Adaptive Server Farms for the Data CenterAdaptive Server Farms for the Data Center
Adaptive Server Farms for the Data Centerelliando dias
 
VCS_QAPerformanceSlides
VCS_QAPerformanceSlidesVCS_QAPerformanceSlides
VCS_QAPerformanceSlidesMichael Cowan
 
Performance tuning Grails applications
 Performance tuning Grails applications Performance tuning Grails applications
Performance tuning Grails applicationsGR8Conf
 
제3회난공불락 오픈소스 인프라세미나 - MySQL Performance
제3회난공불락 오픈소스 인프라세미나 - MySQL Performance제3회난공불락 오픈소스 인프라세미나 - MySQL Performance
제3회난공불락 오픈소스 인프라세미나 - MySQL PerformanceTommy Lee
 

Ähnlich wie Towards a Unified View of Cloud Elasticity (20)

Autonomic Decentralised Elasticity Management of Cloud Applications
Autonomic Decentralised Elasticity Management of Cloud ApplicationsAutonomic Decentralised Elasticity Management of Cloud Applications
Autonomic Decentralised Elasticity Management of Cloud Applications
 
Venugopal adec
Venugopal adecVenugopal adec
Venugopal adec
 
Improving DevOps through Cloud Automation and Management - Real-World Rocket ...
Improving DevOps through Cloud Automation and Management - Real-World Rocket ...Improving DevOps through Cloud Automation and Management - Real-World Rocket ...
Improving DevOps through Cloud Automation and Management - Real-World Rocket ...
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
 
Ncerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssmNcerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssm
 
SCQAA-SF Meeting on May 21 2014
SCQAA-SF Meeting on May 21 2014 SCQAA-SF Meeting on May 21 2014
SCQAA-SF Meeting on May 21 2014
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance Test
 
Total cloud control with oracle enterprise manager 12c
Total cloud control with oracle enterprise manager 12cTotal cloud control with oracle enterprise manager 12c
Total cloud control with oracle enterprise manager 12c
 
Presentacion 1.10
Presentacion 1.10Presentacion 1.10
Presentacion 1.10
 
“Spikey Workloads” Emergency Management in the Cloud
“Spikey Workloads” Emergency Management in the Cloud“Spikey Workloads” Emergency Management in the Cloud
“Spikey Workloads” Emergency Management in the Cloud
 
“Spikey Workloads” Emergency Management in the Cloud
“Spikey Workloads” Emergency Management in the Cloud“Spikey Workloads” Emergency Management in the Cloud
“Spikey Workloads” Emergency Management in the Cloud
 
Fuzzy Control meets Software Engineering
Fuzzy Control meets Software EngineeringFuzzy Control meets Software Engineering
Fuzzy Control meets Software Engineering
 
Service Stampede: Surviving a Thousand Services
Service Stampede: Surviving a Thousand ServicesService Stampede: Surviving a Thousand Services
Service Stampede: Surviving a Thousand Services
 
Towards a Unified View of Cloud Elasticity

  • 1. Towards a Unified View of Elasticity Srikumar Venugopal & Team School of Computer Science and Engineering, University of New South Wales, Sydney, Australia srikumarv@cse.unsw.edu.au
  • 2. Acknowledgements •  Basem Suleiman •  Han Li •  Reza Nouri •  Freddie Sunarso •  Richard Gow
  • 3. Agenda •  Introduction to elasticity and its challenges •  Performance Modeling of Elasticity Rules •  Autonomic Decentralised Elasticity Management of Cloud Applications •  Efficient Bootstrapping for Decentralised Shared-nothing Key-value Stores
  • 5. Elasticity: The ability of a system to change its capacity in direct response to the workload demand
  • 6. Different Views of Elasticity •  Performance View – When to scale, and how much? •  Application View – Does the architecture accommodate scaling? – How is state managed? •  Configuration View – Are there changes in configuration due to scaling?
  • 9. Trigger – Controller – Action •  Trigger: Threshold Breach •  Controller: Intelligence/Logic •  Action: Add or Remove Capacity
  • 10. State-of-the-art in Auto-scaling
    Product/Project | Trigger | Controller | Actions
    Amazon Autoscaling | CloudWatch metrics/Threshold | Rule-based/Schedule-based | Add/Remove Capacity
    WASABi | Azure Diagnostics/Threshold | Rule-based | Add/Remove Capacity, Custom
    RightScale/Scalr | Load monitoring | Rule-based/Schedule-based | Add/Remove Capacity, Custom
    Google Compute Engine | CPU Load, etc. | Rule-based | Add/Remove Capacity
    Academic:
    CloudScale | Demand Prediction | Control theory | Voltage-scaling
    Cataclysm | Threshold-based | Queueing-model | Admission Control
    IBM Unity | Application Utility | Utility functions/RL | Add/Remove Capacity
  • 11. Summary •  Currently, the most popular mechanisms for auto-scaling are rule-based •  The effectiveness of rule-based autoscaling is determined by the trigger conditions •  So, how do we know how to set up the right triggers?
  • 12. Performance Modeling of Elasticity Rules Basem Suleiman
  • 13. Elasticity (Auto-Scaling) Rules Examples: •  If CPU Utilization ≥ 85% for 7 min, add 1 server (Scale Out) •  If RespTimeSLA ≥ 95% for 10 min, remove 1 server (Scale In) B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
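The threshold-plus-duration form of these rules can be sketched as a small evaluator. This is an illustrative Python sketch, not any vendor's implementation; the class name and parameters are made up:

```python
from collections import deque

class ThresholdRule:
    """Fires when a metric stays past a threshold for a sustained window,
    e.g. "If CPU Utilization >= 85% for 7 min, add 1 server"."""

    def __init__(self, threshold, window_minutes, above=True):
        self.threshold = threshold
        self.window = window_minutes
        self.above = above
        self.samples = deque(maxlen=window_minutes)  # one sample per minute

    def observe(self, value):
        """Record one per-minute sample; return True if the rule fires."""
        self.samples.append(value)
        if len(self.samples) < self.window:
            return False  # not enough history yet
        if self.above:
            return all(v >= self.threshold for v in self.samples)
        return all(v <= self.threshold for v in self.samples)

# Scale-out trigger: CPU >= 85% for 7 consecutive minutes
scale_out = ThresholdRule(threshold=85, window_minutes=7, above=True)
```

A matching scale-in rule would use `above=False` with a low threshold; the controller then maps a fired rule to an add/remove-capacity action.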
  • 14. Performance of Different Elasticity Rules •  How well do elasticity rules perform in terms of SLA satisfaction, CPU utilization, costs and % served requests?
    CPU75: If CPU Util. > 75% for 5 min, add 1 server; if CPU Util. ≤ 30% for 5 min, remove 1 server
    CPU80: If CPU Util. > 80% for 5 min, add 1 server; if CPU Util. ≤ 30% for 5 min, remove 1 server
    CPU85: If CPU Util. > 85% for 5 min, add 1 server; if CPU Util. ≤ 30% for 5 min, remove 1 server
    SLA90: If SLA < 90% for 5 min, add 1 server; if SLA ≥ 90% for 5 min, remove 1 server
    SLA95: If SLA < 95% for 5 min, add 1 server; if SLA ≥ 95% for 5 min, remove 1 server
    B. Suleiman, S. Sakr, S. Venugopal, W. Sadiq, Trade-off Analysis of Elasticity Approaches for Cloud-Based Business Applications, Proc. WISE 2012
  • 15. Cloud Testbed for Collecting Metrics [Diagram: TPC-W application servers (EC2) behind an Elastic Load Balancer, backed by a TPC-W database (EC2); metrics collected: % SLA satisfaction, avg. CPU utilization, server costs, % served requests, response time] B. Suleiman, S. Sakr, S. Venugopal, W. Sadiq, Trade-off Analysis of Elasticity Approaches for Cloud-Based Business Applications, Proc. WISE 2012
  • 16. Performance Evaluation - Different Elasticity Rules [Box plots (min, Q1, median, Q3, max, mean) of costs ($0.00–$2.50) and CPU utilization (0%–90%) for CPU75, CPU80, CPU85, SLA90 and SLA95] B. Suleiman, S. Sakr, S. Venugopal, W. Sadiq, Trade-off Analysis of Elasticity Approaches for Cloud-Based Business Applications, Proc. WISE 2012
  • 17. The Challenges of Thresholds You must be at least this tall to scale up! •  Threshold values determine performance and cost •  E.g. Low CPU utilization => Higher cost, Better Performance •  Thresholds vary from one application to another •  Empirically determining thresholds is expensive. B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
  • 18. Can we construct a model that allows us to establish the right thresholds?
  • 19. Queue Model of 3-tier [Diagram of the queueing model] B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013
  • 20. Establishing Rule Thresholds •  Developed a model based on M/M/m queuing model – Simultaneous session initiations on 1 server – Provisioning Lag Time of the provider – Cool-down interval after elasticity action – Algorithms to model scale-in and scale-out – Request Mix •  Compared model fidelity with actual cloud execution of TPC-W workload. B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
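The M/M/m building block that the slide's model extends has a standard closed form via the Erlang C formula. The sketch below is that textbook result only, not the paper's full model (which adds provisioning lag, cool-down intervals and scale-in/out algorithms):

```python
import math

def mmm_response_time(lam, mu, m):
    """Mean response time of an M/M/m queue via the Erlang C formula.

    lam: arrival rate, mu: per-server service rate, m: number of servers.
    """
    a = lam / mu              # offered load in Erlangs
    rho = a / m               # per-server utilisation
    assert rho < 1, "queue is unstable (rho >= 1)"
    # Erlang C: probability that an arriving request must wait
    top = a ** m / (math.factorial(m) * (1 - rho))
    bottom = sum(a ** k / math.factorial(k) for k in range(m)) + top
    p_wait = top / bottom
    # response time = mean service time + mean queueing delay
    return 1 / mu + p_wait / (m * mu - lam)
```

With m = 1 this collapses to the familiar M/M/1 result 1/(mu - lam), which is a quick sanity check on the implementation.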
  • 21. Experiments: Methodology •  Run the TPC-W workload on Amazon cloud resources using thresholds •  Simulate the model using MATLAB with the same thresholds •  Compare the simulation results to the results from the actual execution – If both are equivalent, then we are good B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013
  • 22. Experiments: Testbed [Diagram: TPC-W user emulation (extra-large Linux instance) driving an Elastic Load Balancer; TPC-W application on small/medium Linux servers (JBoss/JSDK); TPC-W database on an extra-large Linux server (MySQL); all on EC2]
  • 23. Experiments: Input Workload [Plot: request arrival rate (req/min) over 600 minutes] •  Used TPC-W Browsing profile (95% read) •  Stress on application tier •  Number of concurrent users – Zipf •  Inter-arrival times – Poisson
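Poisson inter-arrival times, as used for the input workload, can be generated with exponential gaps between requests. An illustrative sketch, not the actual TPC-W emulator; the function name and seed are arbitrary:

```python
import random

def poisson_arrivals(rate_per_min, duration_min, seed=42):
    """Arrival timestamps (in minutes) of a Poisson process:
    exponentially distributed inter-arrival times at the given rate."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_per_min)
        if t >= duration_min:
            return arrivals
        arrivals.append(t)

# On average roughly rate * duration arrivals
arrivals = poisson_arrivals(rate_per_min=100, duration_min=10)
```

The Zipf distribution over concurrent users would be layered on top of this to decide which session each arrival belongs to.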
  • 24. Experiments: Elasticity Rules
    CPU75: If CPU Util. > 75% for 5 min, add 1 server; if CPU Util. < 30% for 5 min, remove 1 server
    CPU80: If CPU Util. > 80% for 5 min, add 1 server; if CPU Util. < 30% for 5 min, remove 1 server
    Common parameters: •  Waiting time – 10 mins., Measuring interval – 1 min. Metrics captured: •  Average CPU utilization across all the servers •  Average response time in a time interval •  Number of servers in operation at any point of time
  • 25. Results [Box plots comparing model (M) vs. empirical (E) results for CPU75 and CPU80: avg. CPU utilization (0–100%) and avg. response time (0–0.5 sec)]
  • 26. CPU Utilization over Time [Plot: avg. CPU utilization (%) over 560 minutes for CPU80, model (M) vs. empirical (E)]
  • 27. Number of Servers Initialized [Plot: number of app-tier servers (0–6) over 560 minutes for CPU75 and CPU80, model (M) vs. empirical (E)]
  • 28. Summary •  Developed a queueing model that can be used to reason about elasticity •  Model captures effects of thresholds and can be used for testing different rules •  Evaluations show that the model approximates real-world conditions closely •  Future work: handling initial bursts in workload
  • 29. Autonomic Decentralised Elasticity Management of Cloud Applications Reza Nouri and Han Li
  • 30. Cons of Rule-based Autoscaling •  Commercial products are rule-based – Gives “illusion of control” to users – Leads to the problem of defining the “right” thresholds •  Centralised controllers – Communication overhead increases with size – Processing overhead also increases (Big Data!) •  One application/VM at a time
  • 31. Challenges of large-scale elasticity •  Large numbers of instances and apps – Deriving solutions takes time •  Dynamic conditions – Apps are going into critical states all the time •  Shifting bottlenecks – Greedy solutions may create bottlenecks in other places •  Network partitions, fault tolerance… H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.
  • 32. Initial Conditions [Diagram: Instance1/App Server1 hosting app1 and app2; Instance2/App Server2 hosting app3 and app4; all on an IaaS provider]
  • 33. A Critical Event [Diagram: same deployment; one application enters a critical state]
  • 34. Placement 1 [Diagram: app2 moved from Instance1 to Instance2, leaving app1 alone on Instance1]
  • 35. Placement 2 [Diagram: app1 moved to a newly provisioned Instance3/App Server3, at extra cost ($$)]
  • 36. Placements 3 & 4 [Diagrams: app1 replicated across the existing instances, and app1 replicated onto a new Instance3]
  • 37. Problems for Automatic Placement •  Provisioning – Smallest number of servers required to satisfy resource requirements of all the applications •  Dynamic Placement – Distribute applications so as to maximise utilisation yet meet each app’s response time and availability requirements H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.
  • 38. Co-ordinated Control of Elasticity •  Instances control their own utilisation – Monitoring, management and feedback •  Local controllers are learning agents – Reinforcement Learning •  Controllers learn from each other – Share their knowledge and update their own •  Servers are linked by a DHT – Agility, Flexibility, Co-ordination H. Li, S. Venugopal, “Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform”, Proceedings of 8th ICAC '11.
  • 39. Abstract View of the Control Scheme H. Li, S. Venugopal, “Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform”, Proceedings of 8th ICAC '11.
  • 40. Fuzzy Thresholds H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.
  • 41. Basic Actions Instance-level and application-level actions with rewards: create! (-3.5), terminate! (3.5), find! (3.5), move! (0.5), duplicate! (0.5), merge! (0.5)
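A local controller that learns values for actions from rewards like these could use a tabular Q-learning update. This is a minimal sketch of the general technique; the paper's agents have their own state/action encoding and the names below are hypothetical:

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s,a) towards
    reward + gamma * max_a' Q(s',a')."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Hypothetical use by an instance controller: creating a server has a
# negative immediate reward (cost), as in the action list above.
actions = ["create", "terminate", "find", "move", "duplicate", "merge"]
q = {}
q_update(q, "overloaded", "create", reward=-3.5,
         next_state="normal", actions=actions)
```

Sharing knowledge between controllers then amounts to exchanging and blending entries of these Q-tables.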
  • 42. Co-ordination using find! •  Server looks up other servers with the least load – DHT lookup •  Sends a move message to the selected server •  Selected server replies with accept or reject! – accept has a positive reward
  • 43. Shrinking •  The controller is always reward maximising – Highest Reward is for merge+terminate •  A controller initiates its own shutdown – Low load on its applications •  Gets exclusive lock on termination – Only one instance can terminate at a time •  Transfers state before shutdown
  • 44. Experiments •  Six web applications – Test application: Hotel Management – Search → Book → Confirm •  Five were subjected to a background load – Uniform Random •  One was subjected to the test load •  Application threshold: 200 and 500 ms •  Metrics – Average Response Time, Drop Rate, Servers H. Li, S. Venugopal, "Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform", Proceedings of 8th ICAC '11.
  • 47. Efficient Bootstrapping for Decentralised Shared-nothing Key-value Stores Han Li
  • 48. Key-value Stores •  The standard component for cloud data management •  Increasing workload → Node bootstrapping – Incorporate a new, empty node as a member of the KVS •  Decreasing workload → Node decommissioning – Eliminate an existing member with redundant data off the KVS H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
  • 49. Research Questions •  As the system scales, how do we efficiently incorporate or remove data nodes? – Load balancing, migration overheads, etc. •  How do we partition and place the data replicas when the system is elastic? – Data consistency, durability, availability, etc. H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
  • 50. Elasticity in Key-Value Stores •  Minimise the overhead of data movement – How to partition/store data? •  Balance the load at node bootstrapping – Both data volume and workload – How to place/allocate data? •  Maintain data consistency and availability – How to execute data movement? H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
  • 51. Split-Move Approach [Diagram: the key space is partitioned at node bootstrapping; an existing node's range B is split into B1 and B2 and half is moved to the new node, with master and slave replicas relocated accordingly, and the old copy deleted] H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
  • 52. Virtual-Node Approach [Diagram: the key space is partitioned at system startup into many virtual nodes distributed across the physical nodes; a new node takes over whole partitions] Data skew: e.g., the majority of data is stored in a minority of partitions. Moving around giant partitions is not a good idea. H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
  • 53. Our Solution •  Virtual-node based movement – Each partition of data is stored in separate files – Reduced overhead of data movement – Many existing nodes can participate in bootstrapping •  Automatic sharding – Split and merge partitions at runtime – Each partition stores a bounded volume of data – Easy to reallocate data – Easy to balance the load H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
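Virtual-node placement is commonly implemented with a consistent-hashing ring, where each physical node owns several positions on the ring. This is a generic sketch of that technique, not ElasCass code; the class name and vnode count are assumptions:

```python
import bisect
import hashlib

class VirtualNodeRing:
    """Consistent-hashing ring with virtual nodes: each physical node
    owns several hash positions, so keys spread over all nodes."""

    def __init__(self, vnodes_per_node=8):
        self.vnodes = vnodes_per_node
        self.ring = []  # sorted list of (hash, node) pairs

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Insert one ring position per virtual node
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def lookup(self, key):
        """Return the node owning `key`: first vnode clockwise."""
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, ""))
        return self.ring[idx % len(self.ring)][1]

ring = VirtualNodeRing()
for n in ["node1", "node2", "node3"]:
    ring.add_node(n)
```

When a new node bootstraps, only the partitions whose ring positions it takes over need to move, which is what keeps the transferred data volume bounded.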
  • 54. The Timing for Data Partitioning •  Shard partitions at writes (insert and delete) – Split when Size(Pi) ≥ Θmax – Merge when Size(Pi) + Size(Pi+1) ≤ Θmin •  Choosing Θmax ≥ 2Θmin avoids split/merge oscillation [Diagrams: partitions splitting under inserts and merging under deletes] H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
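The write-time sharding decision can be sketched as follows. The inequality directions assume the usual convention (split when a partition outgrows Θmax, merge when two neighbours together shrink below Θmin), and the concrete size bounds are hypothetical:

```python
THETA_MIN = 64   # hypothetical lower bound (e.g. MB)
THETA_MAX = 256  # hypothetical upper bound; THETA_MAX >= 2 * THETA_MIN

def shard_action(sizes, i):
    """Decide whether partition i should split, merge with its
    successor, or stay as-is, given current partition sizes."""
    # The invariant Theta_max >= 2 * Theta_min ensures that the two
    # halves produced by a split cannot immediately trigger a merge.
    assert THETA_MAX >= 2 * THETA_MIN
    if sizes[i] >= THETA_MAX:
        return "split"
    if i + 1 < len(sizes) and sizes[i] + sizes[i + 1] <= THETA_MIN:
        return "merge"
    return "keep"
```

Checking at writes means partition sizes stay within bounds continuously, rather than requiring a periodic rebalancing pass.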
  • 55. Sharding Coordination •  Solution: Election-based coordination – Step 1: Elect a coordinator from a sorted candidate list – Step 2: Coordinator enforces the split/merge on the data/node mapping – Step 3: Nodes finish the split/merge – Step 4: Coordinator announces the new mapping to all nodes H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
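The election step can be as simple as picking the first entry of an agreed candidate ordering, so every node deterministically reaches the same choice. A sketch; the lexicographic ordering here is purely illustrative, as the slide's sorted list may well use a load-based key:

```python
def elect_coordinator(candidates):
    """Deterministically pick a coordinator: the first entry of the
    sorted candidate list (sort key is an assumption)."""
    ordered = sorted(candidates)
    return ordered[0] if ordered else None
```

Because every node sorts the same candidate list the same way, no message exchange is needed to agree on the winner, only to maintain the list itself.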
  • 56. Node Failover during Sharding [Flowcharts: handling failures before, during and after execution of a shard operation; failed non-coordinators are detected via gossip, removed from the candidate list and their replicas replaced (re-added if they resurrect); if the coordinator fails, the remaining nodes either continue without it or elect a new coordinator, invalidating the partition Pi on that node after a timeout] H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
  • 57. Evaluation Setup •  ElasCass: an implementation of auto-sharding, built on Apache Cassandra (version 1.0.5), which uses the Split-Move approach •  Key-value stores compared: ElasCass vs. Cassandra (v1.0.5) •  Test bed: Amazon EC2, m1.large type, 2 CPU cores, 8 GB RAM •  Benchmark: YCSB H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
  • 58. Evaluation – Bootstrap Time •  Start from 1 node with 100GB of data, R=2; scale up to 10 nodes •  In Split-Move, the data volume transferred reduces by half from 3 nodes onwards •  In ElasCass, the data volume transferred remains below 10GB from 2 nodes •  Bootstrap time is determined by the data volume transferred; ElasCass exhibits consistent performance at all scales H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
  • 59. Conclusions •  We have designed and implemented a decentralised auto-sharding scheme that – consolidates each partition replica into a single transferable unit to provide efficient data movement; – automatically shards the partitions into bounded ranges to address data skew; – reduces the time to bootstrap nodes, achieves better load balancing and better query-processing performance.
  • 60. A Unified View of Elasticity (?)
  • 61. Final Thoughts •  Elasticising Application Logic is done – How do we eliminate thresholds? – Should it be more autonomic? •  Application View of Elasticity – Managing state is the big challenge – Decoupling of different components (service-oriented model) – How would you scale interconnected components?