Autonomous control in Big
Data platforms: an
experience with Cassandra
Emiliano Casalicchio (emc@bth.se)
Joint research with:
Lars Lundberg and Sogand Shirinbab
Computer Science Dep.
Blekinge Institute of Technology
ACROSS - Rome Meeting
Research framework
• Scalable resource-efficient systems for big data analytics
• Awarded by the Knowledge Foundation, Sweden (20140032)
Agenda
• Big Data Platforms
• Main properties
• Why autonomous control is important
• Challenges
• The Cassandra case study
• Conclusions
The NIST BDRA: Big Data Framework Providers
• Big Data Applications
• BD Processing Frameworks (batch, interactive, streaming)
  • e.g. MapReduce, Flink, Mahout, Storm, pbdR, Tez, Spark, Esper, WSO2-CEP
• BD Platforms (logical data organization and distribution, access API)
  • e.g. HDFS, Cassandra, HBase, Dynamo, PNUTS, …
  • File systems
    • Google File System
    • Apache Hadoop File System (HDFS)
  • NoSQL data stores
    • HBase, BigTable
    • Cassandra
    • Dynamo, DynamoDB
    • Sherpa, PNUTS
• Infrastructures (networking, computing, storage)
Properties of NoSQL data stores
• Scalability
  • Throughput / dataset size
• Availability
  • Data replication
• Eventual consistency
  • Consistency level for R/W, to trade off availability and latency
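The availability/latency trade-off is tunable per operation: a read is guaranteed to see the latest write when the read and write consistency levels overlap on at least one replica. A minimal sketch of that standard Cassandra/Dynamo quorum rule (not code from the talk):

```python
def is_strongly_consistent(read_cl: int, write_cl: int, rf: int) -> bool:
    """True when any read quorum overlaps any write quorum: R + W > RF."""
    return read_cl + write_cl > rf

# QUORUM reads + QUORUM writes on RF=3: 2 + 2 > 3, reads see the latest write
print(is_strongly_consistent(2, 2, 3))   # True
# ONE + ONE on RF=3: lower latency and higher availability, but only eventual consistency
print(is_strongly_consistent(1, 1, 3))   # False
```

Lowering either level trades consistency for latency and availability, which is exactly the knob the slide refers to.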
Autonomic control is a must
• System complexity
  • Human-assisted management is unrealistic
• Security
  • Complete automation of procedures
  • Self-configuration, self-healing and self-protection
• Optimization
  • Self-optimization
Two approaches
• Single-layer adaptation: e.g. auto-scaling of DB nodes, self-configuration of DB parameters
• Multi-layer adaptation: e.g. orchestration of DB-node auto-scaling and VM placement on top of the physical infrastructure

The layered stack both approaches act on:
• Big Data Applications + processing framework
• Platforms (logical data organization and distribution, access API), e.g. HDFS, Cassandra, HBase, Dynamo, PNUTS, …
• Virtual infrastructure
• Physical infrastructure
Issues in single-layer adaptation
• Interference between infrastructure adaptation and platform adaptation
• Platform properties can limit infrastructure-level adaptation actions
  • E.g. the effect of auto-scaling can be limited by serialization constraints
  • Geographical distribution (and network configuration) can conflict with the latency/availability trade-off at the platform layer
• Infrastructure adaptation can hurt NoSQL data store properties
  • E.g. two or more replicas on the same PM impact node reliability and consistency-level reliability
An example
With RF=3, each node stores a replica of the data set:
• 3 DB nodes (VMs), each on a different PM: reliability = 1-(1-r)^3; if r=0.9, reliability is 0.999
• 3 DB nodes spread over 2 PMs: reliability = 0.99
• all 3 DB nodes on a single PM: reliability = 0.9
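The three placements above can be checked numerically. A small sketch, assuming independent PM failures and that replicas hosted on the same PM fail together:

```python
def vdc_reliability(placement, r=0.9):
    """Probability that at least one replica survives, given a mapping of
    the replicas to PM ids: 1 - (1 - r)^(number of distinct PMs used)."""
    return 1 - (1 - r) ** len(set(placement))

print(round(vdc_reliability([1, 2, 3]), 6))  # three PMs: 0.999
print(round(vdc_reliability([1, 1, 2]), 6))  # two PMs: 0.99
print(round(vdc_reliability([1, 1, 1]), 6))  # one PM: 0.9
```

This is why consolidating all replicas onto one PM, although energy-efficient, degrades the platform-level reliability the slide warns about.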
Multi-layer adaptation
• It means coordinating, at run time:
  • self-configuration of the BD platform
  • deployment of the platform on the virtual infrastructure
  • allocation/placement of the virtual infrastructure on the physical infrastructure
• The challenges are:
  • to formulate an optimization model that accounts for all the dependencies and constraints imposed by the system architecture
  • to formulate multi-objective functions that account for conflicting objectives at the infrastructure level and the application level
    • e.g. minimizing power consumption and maximizing platform reliability
The Cassandra case study
• E. Casalicchio, L. Lundberg, S. Shirinbab, "Energy-aware adaptation in managed Cassandra datacenters", IEEE International Conference on Cloud and Autonomic Computing (ICCAC 2016), Augsburg, Germany, September 12-16, 2016
• E. Casalicchio, L. Lundberg, S. Shirinbab, "Energy-aware Auto-scaling Algorithms for Cassandra Virtual Data Centers", Cluster Computing, Elsevier (to appear, June 2017)
Motivations
• Data storage (or data serving) systems play an important role in the cloud and big data industry
• Their management is a challenging task
  • The complexity increases when multi-tenancy is considered
  • Human-assisted control is unrealistic
  • There is a growing demand for autonomic solutions
• Our industrial partner Ericsson AB is interested in the autonomic management of Cassandra
Problem description
• We consider a provider of a managed Apache Cassandra service
• The applications, or tenants, of the service are independent and each uses its own Cassandra Virtual Data Center (VDC)
• The service provider wants to maintain SLAs, which requires:
  • properly planning the capacity and the configuration of each Cassandra VDC
  • dynamically adapting the infrastructure and VDC configuration without disrupting performance
  • minimizing power consumption
Solution proposed (1000 ft view)
• An energy-aware adaptation model specifically designed for Cassandra VDCs running on a cloud infrastructure
  • Architectural constraints imposed by Cassandra (minimum number of nodes, homogeneity of nodes, replication factor and heap size)
  • Constraints on throughput and replication factor imposed by the SLA
  • Power consumption model based on CPU utilization
• An adaptation policy for the Cassandra VDC configuration and the cloud infrastructure configuration, which orchestrates three strategies:
  • Vertical scaling
  • Horizontal scaling
  • Optimal placement
Solution proposed (deep details)
• A workload and SLA model
• A system architecture model
• A throughput model
• The utility function and problem formulation
• Drawbacks of the optimal solution, and alternatives
• Experimental results
Workload and SLA model
• Workload features
  • the type of requests, e.g. read only, write only, read & write, scan, or a combination of those
  • the rate of the operation requests
  • the size of the dataset
• Workload types
  • CPU bound, memory bound
• The data replication factor
• SLA: the tuple ⟨l_i, T_i^min, D_i, r_i⟩, where l_i is the request type (R, W, or RW with a 75/25 mix), T_i^min the minimum throughput, D_i the replication factor, and r_i the dataset size
Architecture model
• Homogeneous physical machines (H)
• VMs of different type and size (V)
• A VDC is composed of n_i homogeneous VMs
  • n_i ≥ D_i
  • At least D_i vnodes out of n_i must run on different PMs
• The datacenter configuration is described by a vector x = [x_{i,j,h}], where x_{i,j,h} is the number of vnodes serving application i with VM configuration j allocated on PM h

TABLE I. Baseline throughput t0_{l_i,j} as a function of c_j (virtual CPUs), m_j (GB), heapSize_j (GB) and l_i. The throughput is measured in operations/second (ops/sec).

  j   c_j   m_j   heapSize_j   R          W         RW
  1   8     32    8            16.6×10^3  8.3×10^3  13.3×10^3
  2   4     16    4            8.3×10^3   8.3×10^3  8.3×10^3
  3   2     16    4            3.3×10^3   3.3×10^3  3.3×10^3

TABLE II. Memory available for the dataset in a Cassandra vnode (JVM heap) as a function of the VM memory size.

  m_j (RAM size in GB)               1     2   4   8   16   32
  heapSize_j (max heap size in GB)   0.5   1   1   2   4    8

T_i^min is the minimum throughput the service provider must guarantee to process the requests from application i. The SLA parameters D_i and r_i are used to determine the number of vnodes to be instantiated, as discussed in the next section.
Architecture model (cont'd)
• To make a VDC CPU bound we need n_i ≥ D_i · r_i / heapSize_j, if r_i > heapSize_j (Eq. 1)
• y_{i,j} = 1 if application i uses VM configuration j to run Cassandra vnodes; otherwise y_{i,j} = 0
• s_{i,h} = 1 if a Cassandra vnode serving application i runs on PM h; otherwise s_{i,h} = 0
• Vertical scaling is modelled assuming it is possible to switch from configuration j1 to j2 at runtime

If r_i > heapSize_j, Eq. 1 holds; otherwise, the constraint n_i ≥ D_i holds. The number n_i of vnodes is defined as

  n_i = Σ_{j∈J, h∈H} x_{i,j,h}   ∀i ∈ I   (2)

Considering that in our industrial case r_i ≥ heapSize_j for all configurations j, the constraints introduced above are modelled by the following equations:

  Σ_{j∈J, h∈H} x_{i,j,h} ≥ D_i · r_i / heapSize_j   ∀i ∈ I   (3)
  Σ_{j∈J} y_{i,j} = 1   ∀i ∈ I   (4)
  Σ_{h∈H} s_{i,h} ≥ D_i   ∀i ∈ I   (5)
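The sizing rule behind Eqs. 1 and 3 is easy to compute. A sketch with illustrative SLA values (the VM parameters come from Tables I and II):

```python
from math import ceil

def min_vnodes(D_i, r_i, heap_size_j):
    """Minimum number of Cassandra vnodes for tenant i: the replication
    factor D_i, raised to D_i * r_i / heapSize_j when the replicated
    dataset does not fit in a single vnode's JVM heap (CPU-bound regime)."""
    if r_i > heap_size_j:
        return max(D_i, ceil(D_i * r_i / heap_size_j))
    return D_i

# Replication factor 3, 50 GB dataset, 8 GB heap per vnode (VM type 1)
print(min_vnodes(3, 50, 8))   # ceil(150 / 8) = 19 vnodes
# Small dataset: the replication-factor bound dominates
print(min_vnodes(3, 4, 8))    # 3 vnodes
```

The heap bound usually dominates in the industrial case, since r_i ≥ heapSize_j for all configurations.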
Throughput model
• The actual throughput T_i is a function of x_{i,j,h}

Fig. 2. A real example of Cassandra throughput as a function of the number of Cassandra vnodes allocated, for the different types of requests (R, W, RW). The plot shows how realistic the proposed model is: e.g., for the segment 5 < n_i ≤ 8, t(n_i) = t0_{l_i,j} · δ^k_{l_i,j} · (n_i − 4) + t(n_i = 4).

The throughput is piecewise linear in the number of vnodes: δ^k_{l_i,j} is the slope of the k-th segment, valid for a number of Cassandra vnodes n_i between n_{k−1} and n_k. Therefore, for n_{k−1} ≤ n_i ≤ n_k, we can write the following expression:

  t(n_i) = t(n_{k−1}) + t0_{l_i,j} · δ^k_{l_i,j} · (n_i − n_{k−1})   (6)

where k ≥ 1, n_0 = 1 and t_{i,j,h}(1) = t0_{l_i,j}.

Finally, for a configuration x of a VDC, and considering Equation 2, we define the overall throughput T_i as:

  T_i(x) = t(n_i)   ∀i ∈ I   (7)
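Equation 6 can be coded directly. The sketch below uses the slopes from Table III (δ¹ = 1, δ² = 0.8, δ³ = 0.66); the segment boundaries at 2 and 7 vnodes are taken from the experiment parameters, an assumption rather than a value read off the figure:

```python
# (upper bound n_k, slope delta^k) per Table III; boundaries are assumed
SEGMENTS = [(2, 1.0), (7, 0.8), (float("inf"), 0.66)]

def throughput(n, t0):
    """Eq. 6: piecewise-linear throughput t(n) with n_0 = 1 and t(1) = t0."""
    t, prev = t0, 1
    for n_k, slope in SEGMENTS:
        hi = min(n, n_k)
        if hi > prev:                  # add this segment's contribution
            t += t0 * slope * (hi - prev)
            prev = hi
        if n <= n_k:
            break
    return t

# VM type 1, read workload: t0 = 16.6e3 ops/sec (Table I)
print(throughput(1, 16600))            # one vnode is the baseline: 16600
print(round(throughput(7, 16600)))     # 2*t0 + 0.8*t0*5 = 6*t0 = 99600
```

The diminishing slopes capture that each added vnode contributes less than the previous one, which is what makes the placement problem non-trivial.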
Energy consumption model
• Based on physical node utilization
• P_h^max = 500 W and k_h = 0.7

We chose a linear model [3] where the power P_h consumed by a physical machine h is a function of the CPU utilization, and hence of the system configuration x:

  P_h(x) = k_h · P_h^max + (1 − k_h) · P_h^max · U_h(x)   (8)

where P_h^max is the maximum power consumed when PM h is fully utilised (e.g. 500 W), k_h is the fraction of power consumed by the idle PM h (e.g. 70%), and the CPU utilisation of PM h is defined by

  U_h(x) = (1 / C_h) · Σ_{i∈I, j∈J} x_{i,j,h} · c_j   (9)
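A direct transcription of Eqs. 8-9 with the slide's example values (Pmax = 500 W, k = 0.7) and a 16-core PM as in Table III:

```python
P_MAX = 500.0   # Watt when the PM is fully loaded
K_IDLE = 0.7    # fraction of P_MAX drawn by an idle PM
C_H = 16        # cores of PM h

def utilization(vnode_cores):
    """Eq. 9: U_h = (1/C_h) * sum of the vcores of the vnodes on PM h."""
    return sum(vnode_cores) / C_H

def pm_power(u):
    """Eq. 8: P_h = k*Pmax + (1-k)*Pmax*U_h."""
    return K_IDLE * P_MAX + (1 - K_IDLE) * P_MAX * u

# Two 4-vcore vnodes on one PM: U = 0.5, i.e. 350 W idle share + 75 W dynamic
print(round(pm_power(utilization([4, 4])), 6))   # 425.0
```

The large idle fraction (70%) is the reason consolidation pays off: an almost-empty PM still burns 350 W, so packing vnodes and switching PMs off dominates the savings.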
Problem formulation
• Linear model
  • Linear objective function
  • Linear constraints
• Constraints imposed by
  • SLA on throughput and replication factor
  • Replication factor (and number of distinct PMs)
  • Homogeneity of vnode configuration
  • Heap size (we want a CPU-bound configuration)

In cloud management systems the techniques typically used are scheduling, placement, migration, and reconfiguration of virtual machines; the ultimate goal is to optimise the use of resources to reduce power consumption. Optimisation depends on the context: it could mean minimising PM utilisation, or balancing the utilisation level of physical machines with the use of network devices for data transfer and storage. Independently of the configuration or adaptation policy adopted, all these techniques are based on power and/or energy consumption models, which usually define a linear relationship between the power used by a system and its CPU utilisation (e.g. [3]-[5]), processor frequency (e.g. [6]) or number of cores used (e.g. [7]).

The service provider aims to minimise the overall power consumption P(x) (ω denotes an extremely large positive constant):

  min f(x) = P(x)
  subject to:
    Σ_{J,H} t(x_{i,j,h}) ≥ T_i^min,                     ∀i ∈ I                (11)
    Σ_H x_{i,j,h} · y_{i,j} ≥ D_i · r_i / heapSize_j,   ∀i ∈ I, j ∈ J         (12)
    x_{i,j,h} ≤ ω · y_{i,j},                            ∀i ∈ I, j ∈ J, h ∈ H  (13)
    Σ_J y_{i,j} = 1,                                    ∀i ∈ I                (14)
    Σ_{I,J} x_{i,j,h} · c_j ≤ C_h,                      ∀h ∈ H                (15)
    Σ_{I,J} x_{i,j,h} · m_j ≤ M_h,                      ∀h ∈ H                (16)
    Σ_H s_{i,h} ≥ D_i,                                  ∀i ∈ I                (17)
    Σ_J x_{i,j,h} − s_{i,h} · ω ≤ 0,                    ∀h ∈ H                (18)
    −Σ_J x_{i,j,h} + s_{i,h} ≤ 0,                       ∀h ∈ H                (19)
    Σ_I s_{i,h} − r_h · ω ≤ 0,                          ∀h ∈ H                (20)
    −Σ_I s_{i,h} + r_h ≤ 0,                             ∀h ∈ H                (21)
    y_{i,j}, s_{i,h}, r_h ∈ {0, 1},                     ∀i ∈ I, j ∈ J, h ∈ H  (22)
    x_{i,j,h} ∈ ℕ,                                      ∀i ∈ I, j ∈ J, h ∈ H  (23)

Eq. 12 introduces the constraint on the number of vnodes and the portion of dataset handled by each vnode, and guarantees that the replication factor can be implemented. Eq. 14 states that for each tenant a single VM configuration must be chosen. Eqs. 15 and 16 guarantee that the CPU and memory demand of the vnodes does not exceed the capacity of the physical nodes.
Sub-optimal adaptation
• LocalOpt
• LocalOpt-H
• BestFit
• BestFit-H

Fig. 3. Number of iterations for different values of N (tenants), V (VM types) and H (PMs): Opt, BestFit and LocalOpt compared for V = 3 and V = 10, with H = 100 and H = 1000.
Scenarios
• New service subscriptions
• Dataset size increase
• Throughput increase
• Surge in the throughput
• Physical node failures
Lines 19-30 of the algorithm place the vnodes on the PMs, minimising the number of PMs used by packing as many vnodes as possible per PM while respecting the D_i constraint; this also minimises the energy consumption. The function any(c_j* ≤ C^a) compares c_j* with all the elements of C^a and returns 1 if there exists at least one element of C^a that is greater than or equal to c_j*; otherwise, if no PM satisfies the constraint, it returns 0. The same behaviour holds for any(m_j* ≤ M^a). The function sortDescendent(H^a) sorts H^a in descending order. The function popRR(H^a, D_i) extracts, in round-robin order, a PM from the first D_i in H^a. At line 28, if there is more room in the selected PMs, the set H^a is sorted again (this allows trying the allocation on the PMs that now have capacity available). At line 32, if not all the n*_{i,j*} vnodes are allocated, the empty set is returned because there is no feasible solution for the allocation. Otherwise, the suboptimal solution is returned.

VI. PERFORMANCE EVALUATION METHODOLOGY
TABLE III. MODEL PARAMETERS USED IN THE EXPERIMENTS

  Parameter    Value                  Description
  N            1-10                   Number of tenants
  V            3                      Number of VM types
  H            8                      Number of PMs
  D_i          1-4                    Replication factor for App. i
  r_i          5-50                   Dataset size for App. i
  L            {R, W, RW}             Set of request types
  T_i^min      10000-70000 ops/sec    Minimum throughput agreed in the SLA
  C_h          16                     Number of cores of PM h
  c_j          2-8                    Number of vcores used by VM type j
  M_h          128 GB                 Memory size of PM h
  m_j          16-32 GB               Total memory used by VM type j
  heapSize_j   4-8 GB                 Max heap size used by VM type j
  δ^k_{l_i}    ∀l_i: δ^1 = 1 for 1 ≤ x_{i,j,h} ≤ 2; δ^2 = 0.8 for 3 ≤ x_{i,j,h} ≤ 7; δ^3 = 0.66 for x_{i,j,h} ≥ 8   Slopes of the throughput model
  P_h^max      500 Watt               Maximum power consumed by PM h if fully loaded
  k_h          0.7                    Fraction of P_h^max consumed by PM h if idle
Performance metrics
• Total power consumption
• Scaling Index
  • Counts horizontal and vertical scaling actions
• Migration Index
  • Counts the number of migrations
• Delayed requests
  • Assuming requests do not time out
• Consistency-level reliability
  • Defined as the probability that the number of healthy replicas in the Cassandra VDC is enough to guarantee a specific level of consistency over a fixed time interval
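The last metric can be made concrete. Assuming independent replica failures with per-replica availability p over the observation interval (an illustrative assumption; the paper's exact model may differ), the consistency-level reliability is a binomial tail:

```python
from math import comb

def cl_reliability(D, needed, p):
    """P(at least `needed` of the D replicas are healthy) =
    sum over k = needed..D of C(D, k) * p^k * (1-p)^(D-k)."""
    return sum(comb(D, k) * p**k * (1 - p) ** (D - k)
               for k in range(needed, D + 1))

# RF = 3, each replica healthy with probability 0.9 over the interval
print(round(cl_reliability(3, 1, 0.9), 4))   # consistency level ONE
print(round(cl_reliability(3, 2, 0.9), 4))   # consistency level QUORUM
```

Stronger consistency levels need more healthy replicas, so placements that co-locate replicas on one PM hurt this metric twice: they lower both node reliability and the chance that a quorum survives.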
New service subscriptions
Workload
• 75% R, 15% W, 10% RW (uniform)
• Tmin = 10-18 Kops/sec (uniform)
• Di = 2, 3 (uniform)
• ri = 8 GB
Monte Carlo simulation is used for the New service subscriptions and small dataset size variation scenarios, while a different evaluation method is used for the SLA variation scenario. Experiments have been carried out using Matlab R2015b 64-bit running on a single Intel Core i5 processor. The model parameters used for the experiments are reported in Table III.

The following experiments compare the performance of the optimal policy with the LocalOpt and BestFit heuristics. We start with one tenant and increase the number of subscriptions up to 8. Because the SLA parameters are randomly drawn, the following results summarise the data collected over repeated runs for each new tenant subscription.

Fig. 3 shows how the system scales when new tenants subscribe to the service. The number of vnodes allocated grows with the number of tenants. The differences are that the optimal policy and LocalOpt tend to allocate small machines (VM types 1 and 2) and rarely select VM type 3; on the contrary, the BestFit algorithm often uses VM type 3, which explains why the minimum number of vnodes it allocates is lower than with the other policies.

Fig. 3. New service subscription: number of vnodes allocated by the three adaptation policies.
Fig. 4. New service subscription: total power consumed by the three adaptation policies (1-8 tenants).
Results
• Optimal policy and LocalOpt
  • allocate small VMs (types 1 and 2)
  • rarely select VM type 3
• BestFit algorithm
  • uses large VMs (VM type 3)
  • the minimum number of vnodes allocated is lower than with the other policies
Dataset size increase
Workload
• 3 tenants (R, W, RW)
• ri = 10-50 GB
• Tmin = [14, 10, 18] Kops/sec
• Di = [3, 2, 3]

Fig. 11. Dataset size increase: the bar plot represents the Scaling Index (per VM type) and the line the number of virtual nodes allocated during each experiment, for the optimal policy, LocalOpt and BestFit (ri = 10-50 GB).
Fig. 12. Dataset size increase: box plot of the power consumed P(x) by the optimal policy, LocalOpt and BestFit.
Throughput increase
Workload
• 3 tenants (R, W, RW)
• Tmin = 10-70 Kops/sec
• Di = 3
• ri = 8 GB

[Figure residue: Scaling Index for the three applications and the three policies, with the line representing the number of VMs; power consumed P(x) by the optimal policy, LocalOpt and BestFit for Tmin = 10-70 ×10^3 ops/sec.]
Throughput increase (cont'd)

Fig. 5. Throughput increase: box plots represent the Scaling Index for the three applications (R, W, RW) and the three policies (optimal, LocalOpt, BestFit); the line represents the number of vnodes in each experiment.

The heuristics scale to VM type 2 but return immediately to VM type 1, at the price of a higher energy consumption.

From the optimal policy behaviour we learned that it is better to always allocate smaller virtual machines. That choice usually allows satisfying both the dataset and throughput constraints while minimising the actual throughput provided, which for large datasets can be higher than T_i^min.

The power consumption is plotted in Figure 7. LocalOpt outperforms BestFit, specifically for low throughput. The penalty paid is between 15 and 25% if LocalOpt is used and between 13 and 50% for BestFit; the higher loss is observed for low values of the throughput.

Figure 8 shows the Migration Index. Each box plot is computed over the data collected for the three tenants. We can observe that each application experiences between 0 and 3 vnode migrations, depending on the infrastructure load state. The case T_i^min = 50,000 ops/sec is an exception because it corresponds to the vertical scaling action taken for App. 1: the decrease in the number of vnodes used also impacts the Migration Index.

Fig. 7. Throughput increase: the power consumed P(x) by the three adaptation policies.
Fig. 8. Throughput increase: Migration Index for the optimal policy. LocalOpt and BestFit have an MI equal to zero by definition.

C. SLA variation: small dataset size variations
The dataset size can heavily impact the number of vnodes used in a VDC (see Eq. 1). In this experiment we assess if…
Throughput increase (contā€™d)Auto-scaling Algorithms for Cassandra Virtual Data Centers 13
20 30 40 50 60 70
-10
-5
0
5
10
ScalingIndex(SI)
Opt
20 30 40 50 60 70
-10
-5
0
5
10
LocalOpt
20 30 40 50 60 70
Throughput Ti
min
(Ɨ 103
tps)
-10
-5
0
5
10
LocalOpt-H
20 30 40 50 60 70
-10
-5
0
5
10
BestFit
20 30 40 50 60 70
-10
-5
0
5
10
BestFit-H
VM type 1 VM type 2 VM type 3
ghput increase: The box represent the scaling Index actions for the RW workload and for the ļ¬ve policies
he adaptation policy switches between two
ations (vertical scaling). The negative bar
M type dismissed and the positive for the
e allocated. Observations with only pos-
respond to horizontal scaling adaptation
example, for the Optimal policy there is
m VM type 3 (yellow bar) to VM type 2
r the observation Tmin
i = 30.000 ops/sec.
between 0 and 3 vnode migrations depending on the
infrastructure load state.
Considering the low values for the migration index
for the Opt allocation and the high saving in the en-
ergy consumed compared with the other algorithms, it
makes sense to perform periodic VDC consolidation us-
ing the Opt policy, as recommended in Section 7.
is for the VM type dismissed and the positive for the
new VM type allocated. Observations with only pos-
itive bars correspond to horizontal scaling adaptation
actions. For example, for the Optimal policy there is
a change from VM type 3 (yellow bar) to VM type 2
(green bar) for the observation Tmin
i = 30.000 ops/sec.
The number of new allocated VMs is smaller because
each new VM oā†µers a higher throughput. The optimal
adaptation policy always starts allocating VMs of Type
3 (cf. Tab. 1) and, if needed progressively moves to more
powerful VM types. The Opt policy performs only one
vertical scaling and when the VM type if changed from
type 3 to type 2; after that it always does horizon-
tal scaling actions (this is a particularly lucky case).
The two heuristics LocalOpt and BestFit show a very
unstable behaviour performing both vertical and hor-
izontal scaling. Both ļ¬rst scale to VM type 1 from
VM type 3 and then they scale back to VM type 2.
When the variant of the above algorithm is used, that is
LocalOpt-H and BestFit-H respectively, the VM type
is ļ¬xed to type 1 and the only action taken is horizontal
scaling.
The power consumption is plotted in Figure 6. For throughput higher than 40Ɨ10³ ops/sec, the optimal scaling makes it possible to save about 50% of the energy consumed by the heuristic allocations. For low values of the throughput (10-20Ɨ10³ ops/sec) the BestFit and BestFit-H show a very high energy consumption compared with the other policies.
Fig. 6 Throughput increase: the power consumed P(x) by the five adaptation policies (Opt, LocalOpt, LocalOpt-H, BestFit, BestFit-H) when increasing the throughput for Application 3 (RW workload).
9.2 Throughput surge
In this set of experiments we analyse how fast the scaling is, with respect to the throughput variation rate, and what the number of delayed requests is. We assume ā€¦
Auto-scaling Algorithms for Cassandra Virtual Data Centers

Surge in the throughput (RW)
Fig. 8 Auto-scaling actions in case of a throughput surge: Case A and Case B (throughput in Ɨ10³ ops/sec vs time in minutes; curves: ThrRW_min, and the actual throughput of Opt and BestFit for Case A and Case B).
ā€¦ Then, at time t = 10, twelve vnodes are allocated. Considering the serialization of the horizontal scaling actions (cf. Section 7), the seven Cassandra vnodes are added in 14 minutes. The LocalOpt behaves as the Opt in terms of scaling decisions. The BestFit auto-scaling starts allocating 4 vnodes of Type 3, then scales up to seven vnodes (at time t = 8) and finally performs two vertical scaling actions: the first from vnodes of Type 3 to Type 2, and the second from Type 2 to Type 1.
The number of delayed requests Qi and the percentage with respect to the total number of received requests (tot.req.) are reported in Table 5. Qi and tot.req. are computed over the time interval in which the requested throughput Tmin_RW exceeds the actual throughput.
Intuitively, with Cassandra vnodes capable of handling a higher workload it should be possible to better absorb the surge in the throughput. Hence, we have analysed Case B, where we configure three new types of Cassandra vnodes capable of handling the following RW throughput: type 4, 20Ɨ10³ ops/sec; type 5, 15Ɨ10³ ops/sec; type 6, 7Ɨ10³ ops/sec. ā€¦ vnode types capable of handling from low to very high throughput allow managing throughput surges.
Table 5 The number of delayed requests Qi and the percentage with respect to the total number of received requests (tot.req.). Qi and tot.req. are computed over the time interval in which the requested throughput (Tmin_RW) exceeds the actual throughput.

Case A: Qi (Ɨ10³) / Qi/tot.req. (%)
Opt: 191.84 / 22.78
LocalOpt: 191.84 / 22.78
BestFit: 70.89 / 46.33
Case B:
Opt: 7.66 / 4
LocalOpt: 7.66 / 4
BestFit: 70.58 / 30.29
Case B
T4: 20Ɨ10³ ops/sec
T5: 15Ɨ10³ ops/sec
T6: 7Ɨ10³ ops/sec

Case A
T1: 13.3Ɨ10³ ops/sec
T2: 8.3Ɨ10³ ops/sec
T3: 3.3Ɨ10³ ops/sec
Physical node failure
Consistency level reliability R is defined as the probability that the number of healthy replicas in the Cassandra VDC is enough to guarantee a specific level of consistency over a fixed time interval.
Consistency levels ONE and QUORUM (Q = ⌊D/2⌋ + 1)
ā€¦ algorithms are applied.
In case the Cassandra VDC has a number of physical nodes H equal to the number of vnodes n, and there is a one-to-one mapping between vnodes and physical nodes, the consistency level of ONE is guaranteed if one replica is up. Hence, the consistency reliability is the probability that at least one vnode is up and a replica is on that node:

R_O = 1 āˆ’ (D/n) Ā· (1 āˆ’ ρ)^n    (24)

where ρ is the resiliency of a physical node, and D/n is the probability that a replica is on a Cassandra vnode when the data replication strategy used is the SimpleStrategy (cf. the Datastax documentation for Cassandra). In the same way, we can define the reliability of the Cassandra VDC to guarantee a consistency level of QUORUM as the probability that at least Q vnodes are up and that Q replicas are on them:

R_Q = 1 āˆ’ (D/n) Ā· (1 āˆ’ ρ)^(n āˆ’ Q + 1)    (25)
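To make equations (24) and (25) concrete, here is a small numerical check in Python (function names are ours). With D/n = 0.5 as in the slides, e.g. D = 3 replicas over n = 6 vnodes, and node resiliency ρ = 0.9, it reproduces the one-to-one values reported in Table 6.

```python
def quorum(D):
    """QUORUM size for replication factor D: Q = floor(D/2) + 1."""
    return D // 2 + 1

def r_one(D, n, rho):
    """Eq. (24): reliability of consistency level ONE, one-to-one mapping."""
    return 1 - (D / n) * (1 - rho) ** n

def r_quorum(D, n, rho):
    """Eq. (25): reliability of consistency level QUORUM, one-to-one mapping."""
    return 1 - (D / n) * (1 - rho) ** (n - quorum(D) + 1)

print(r_one(3, 6, 0.9))      # ā‰ˆ 0.9999995 (six 9s)
print(r_quorum(3, 6, 0.9))   # ā‰ˆ 0.999995 (five 9s)
print(r_one(5, 10, 0.9))     # ā‰ˆ 0.99999999995 (ten 9s)
```

The n = 10 case for D = 5 follows from keeping D/n = 0.5, as the table caption assumes.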
When the mapping of vnodes on physical nodes is unknown, we computed the values of K_O and K_Q as follows: the value for K_O is equal to the number of physical nodes used, while the values for K_Q depend on the allocation. With more than one possible allocation, we could have a range of values, where max{K¹_Q, K²_Q, ā€¦} represents the best case and min{K¹_Q, K²_Q, ā€¦} represents the worst case. For example, if 8 vnodes are distributed on 5 physical nodes in the following way {1, 1, 2, 1, 3}, then n āˆ’ Q + 1 = 7, K_O = 5 and K_Q = 4.
Table 6 reports the values obtained for D = 3 and D = 5 with ρ = 0.9, computed in the following way. We consider the allocations obtained for a randomly generated set of tenants (workload RW, 15% W and 75% R; Tmin in the interval [10,000, 18,000] ops/sec); ri is constant (8 GB). The number of vnodes for each tenant is n = 6 for the case D = 3 and n = 10 for the case D = 5. We run 10 experiments and report the best and worst case.
In the first set of experiments ā€¦
Table 6 shows the values of R_O and R_Q for D = 3 and 5 and for ρ = 0.9 and ρ = 0.8.
In a managed Cassandra data center, a Cassandra VDC is rarely allocated using a one-to-one mapping of vnodes on physical nodes. The resource management policies adopted by the provider usually end up with a many-to-one mapping, that is, h physical nodes run n Cassandra vnodes: D ≤ h < n.
The one-to-one mapping guarantees consistency levels ONE and QUORUM with a reliability of six 9s and five 9s respectively; if the replication factor increases to 5, the reliability grows to ten 9s and eight 9s for consistency ONE and QUORUM respectively. Unfortunately, when a many-to-one mapping is used, the reliability of the consistency level drops by orders of magnitude.
[Diagram: one-to-one mapping, each Cassandra DB/VM on its own physical machine, versus many-to-one mapping, three DB/VMs packed on two physical machines]
ā€¦ of the Opt and BestFit auto-scaling algorithms. LocalOpt behaves as the Opt. From the plot it is evident that with more powerful vnodes the auto-scaling algorithms are capable of satisfying the requested throughput with a delay of only 2 minutes. The Opt starts allocating 3 vnodes of type 6; at time t = 4 one more vnode of type 6 is added, and at time t = 6 the policy performs a vertical scaling allocating 4 vnodes of ā€¦
Cassandra offers three main levels of consistency (both for Read and Write): ONE, QUORUM and ALL. Consistency level ONE means that only one replica node is required to reply correctly, that is, it contains the replica of the portion of the dataset needed to answer the query. Consistency level QUORUM means that Q = ⌊D/2⌋ + 1 replica nodes are available to reply correctly
In that case we can generalise equations (24) and (25) to the following:

R_O = 1 āˆ’ (D/n) Ā· (1 āˆ’ ρ)^K_O    (26)
and

R_Q = 1 āˆ’ (D/n) Ā· (1 āˆ’ ρ)^K_Q    (27)

where K_O is the number of failed physical nodes that causes a failure of n vnodes, and K_Q is the number of failed physical nodes that causes a failure of (n āˆ’ Q + 1) vnodes.
K_O is the number of failed physical nodes that causes a failure of n vnodes;
K_Q is the number of failed physical nodes that causes a failure of (n āˆ’ Q + 1) vnodes.
Physical node failure (cont'd)

Table 6 Consistency reliability R for the consistency level of ONE and QUORUM. The probability that a data replica is on a vnode is 0.5 for both D = 3 and D = 5. We assume the reliability of a physical node is ρ = 0.9 (ρ = 0.8 in the lower rows). Columns: one-to-one, Opt, LocalOpt, LocalOpt-H, BestFit, BestFit-H.

ρ = 0.9
R_O|D=3: 0.9999995 (one-to-one), 0.9995, 0.9995
R_Q|D=3: 0.999995 (one-to-one), 0.995, ā€“, 0.9995, 0.9995
R_O|D=5: 0.99999999995 (one-to-one), 0.999995, 0.999995
R_Q|D=5: 0.999999995 (one-to-one), 0.9995, ā€“, 0.999995, 0.99995
ρ = 0.8
R_O|D=3: 0.99996 (one-to-one), 0.996, 0.996
R_Q|D=3: 0.99984 (one-to-one), 0.98, ā€“, 0.996, 0.996
R_O|D=5: 0.999999948 (one-to-one), 0.99984, 0.99984
R_Q|D=5: 0.9999987 (one-to-one), 0.996, ā€“, 0.99984, 0.9992
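Equations (26) and (27) require K_O and K_Q, which depend on how vnodes are packed onto physical nodes. A minimal sketch, assuming the {1, 1, 2, 1, 3} placement of 8 vnodes on 5 physical nodes used as the example in the text (helper names are ours):

```python
def k_o(placement):
    """K_O: all physical nodes must fail to lose all n vnodes."""
    return len(placement)

def k_q(placement, D):
    """K_Q: fewest physical-node failures that kill at least n - Q + 1 vnodes
    (worst case: the most loaded nodes fail first)."""
    n = sum(placement)
    need = n - (D // 2 + 1) + 1  # n - Q + 1, with Q = floor(D/2) + 1
    lost = 0
    for k, v in enumerate(sorted(placement, reverse=True), start=1):
        lost += v
        if lost >= need:
            return k
    return len(placement)

def reliability(D, n, rho, K):
    """Eqs. (26)/(27): R = 1 - (D/n) * (1 - rho)**K."""
    return 1 - (D / n) * (1 - rho) ** K

placement = [1, 1, 2, 1, 3]   # 8 vnodes on 5 physical nodes
print(k_o(placement))          # 5
print(k_q(placement, D=3))     # 4: nodes holding 3+2+1+1 vnodes cover n-Q+1 = 7
```

Plugging the resulting K values into reliability() reproduces the many-to-one entries in Table 6 once the actual placements produced by each auto-scaling policy are known.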
(where D is the replication factor). Consistency level ALL means that all the replicas are available.
Lesson learned
ā€¢ When you have to deal with a specific technology there are many constraints to be considered
ā€¢ Multi-layer adaptation is a must
ā€¢ No single policy fits all workloads
ā€¢ Not all policies fit all stages of the application life cycle
Table 3 Use of the auto-scaling algorithms

Use case (columns: Opt, LocalOpt, BestFit, LocalOpt-H, BestFit-H)
Capacity planning: X
Data center consolidation: X
VDC consolidation: X X
Run-time adaptation: X X X X
Questions ?
Emiliano Casalicchio
http://www.bth.se/people/emc
emc@bth.se

Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
Ā 
young call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Service
young call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Serviceyoung call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Service
young call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Service
Ā 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
Ā 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptx
Ā 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
Ā 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
Ā 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
Ā 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
Ā 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
Ā 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
Ā 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
Ā 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
Ā 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
Ā 
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
Ā 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
Ā 
Call Us ā‰½ 8377877756 ā‰¼ Call Girls In Shastri Nagar (Delhi)
Call Us ā‰½ 8377877756 ā‰¼ Call Girls In Shastri Nagar (Delhi)Call Us ā‰½ 8377877756 ā‰¼ Call Girls In Shastri Nagar (Delhi)
Call Us ā‰½ 8377877756 ā‰¼ Call Girls In Shastri Nagar (Delhi)
Ā 

Autonomous control in Big Data platforms: an experience with Cassandra

  • 1. Autonomous control in Big Data platforms: an experience with Cassandra. Emiliano Casalicchio (emc@bth.se). Joint research with Lars Lundberg and Sogand Shirinbab, Computer Science Dep., Blekinge Institute of Technology. ACROSS - Rome Meeting
  • 2. Research framework
    • Scalable resource-efficient systems for big data analytics
    • Awarded by the Knowledge Foundation, Sweden (20140032)
  • 3. Agenda
    • Big Data platforms
      • Main properties
      • Why autonomous control is important
      • Challenges
    • The Cassandra case study
    • Conclusions
  • 4. The NIST BDRA: Big Data Framework Providers
    • Big Data Applications
    • BD Processing Frameworks (batch, interactive, streaming), e.g. MapReduce, Flink, Mahout, Storm, pbdR, Tez, Spark, Esper, WSO2-CEP
    • BD Platforms (logical data organization and distribution, access API), e.g. HDFS, Cassandra, HBase, Dynamo, PNUTS, ...
      • File systems: Google File System, Apache Hadoop File System (HDFS)
      • NoSQL data stores: HBase, BigTable, Cassandra, Dynamo, DynamoDB, Sherpa, PNUTS
    • Infrastructures (networking, computing, storage)
  • 5. Properties of NoSQL data stores
    • Scalability: throughput / dataset size
    • Availability: data replication
    • Eventual consistency: consistency level for R/W, to trade off availability and latency
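The availability/latency trade-off on this slide follows the standard quorum-overlap rule used by Dynamo-style stores such as Cassandra: a read is guaranteed to observe the latest write when the read and write replica counts overlap, i.e. R + W > RF. A minimal sketch (the helper name is mine, not from the slides):

```python
def strongly_consistent(read_cl, write_cl, rf):
    """Quorum-overlap rule: reads see the latest write when the
    read and write replica sets must intersect (R + W > RF)."""
    return read_cl + write_cl > rf

# RF = 3: QUORUM reads (2) + QUORUM writes (2) -> overlap guaranteed
# RF = 3: ONE reads + ONE writes -> lower latency, weaker consistency
```

Lower consistency levels reduce latency and tolerate more node failures, which is exactly the trade-off the autonomic controller has to respect.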
  • 6. Autonomic control is a must
    • System complexity: human-assisted management is unrealistic
    • Security: complete automation of procedures; self-configuration, self-healing and self-protection
    • Optimization: self-optimization
  • 7. Two approaches
    • Single-layer adaptation: e.g. auto-scaling of DB nodes, self-configuration of DB parameters
    • Multi-layer adaptation: e.g. orchestration of DB-node auto-scaling and VM placement on top of the physical infrastructure
    • Layers involved: Big Data applications + processing framework; platforms (logical data organization and distribution, access API), e.g. HDFS, Cassandra, HBase, Dynamo, PNUTS, ...; virtual infrastructure; physical infrastructure
  • 8. Issues in single-layer adaptation
    • Interference between infrastructure adaptation and platform adaptation
    • Platform properties can limit infrastructure-level adaptation actions
      • E.g. the effect of auto-scaling can be limited by serialization constraints
      • Geographical distribution (and network configuration) can conflict with the latency/availability trade-off at the platform layer
    • Infrastructure adaptation can hurt NoSQL data store properties
      • E.g. placing 2 or more replicas on the same PM impacts node reliability and consistency-level reliability
  • 9. An example. With replication factor RF = 3, each node stores a replica of the data set.
    • 3 DB nodes on VMs spread over 3 distinct PMs: reliability = 1 - (1 - r)^3; if r = 0.9, reliability is 0.999
    • 3 DB nodes on VMs over 2 PMs: reliability = 0.99
    • 3 DB nodes on VMs on a single PM: reliability = 0.9
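The three reliability figures above can be reproduced with a small sketch. The function name is illustrative, and it assumes independent PM failures, each PM up with probability r, and a data item lost only when every PM hosting one of its replicas is down:

```python
def placement_reliability(replicas_per_pm, r):
    """Probability that at least one replica of the data set survives.

    replicas_per_pm[k] = number of replicas hosted on PM k; each PM is
    up with probability r, and all replicas on a failed PM are lost,
    so the data is unavailable only if every hosting PM is down.
    """
    p_all_hosting_pms_down = 1.0
    for count in replicas_per_pm:
        if count > 0:
            p_all_hosting_pms_down *= (1.0 - r)
    return 1.0 - p_all_hosting_pms_down

# RF = 3 spread over 3 PMs (r = 0.9): 1 - (1 - 0.9)^3 = 0.999
# RF = 3 packed on 1 PM: reliability collapses to r = 0.9
```

This is why infrastructure-level consolidation (packing replicas onto fewer PMs to save power) directly degrades the platform-level reliability the replication factor was meant to buy.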
  • 10. Multi-layer adaptation
    • It means coordinating at run time:
      • Self-configuration of the BD platform
      • Deployment of the platform on the virtual infrastructure
      • Allocation/placement of the virtual infrastructure on the physical infrastructure
    • The challenges are:
      • To formulate an optimization model that accounts for all the dependencies and constraints imposed by the system architecture
      • To formulate multi-objective functions that capture conflicting objectives at the infrastructure level and the application level, e.g. minimizing power consumption while maximizing platform reliability
  • 11. The Cassandra case study
    • E. Casalicchio, L. Lundberg, S. Shirinbab, "Energy-aware adaptation in managed Cassandra datacenters", IEEE International Conference on Cloud and Autonomic Computing (ICCAC 2016), Augsburg, Germany, September 12-16, 2016
    • E. Casalicchio, L. Lundberg, S. Shirinbab, "Energy-aware auto-scaling algorithms for Cassandra virtual data centers", Cluster Computing, Springer (to appear, June 2017)
  • 12. Motivations
    • Data storage (or serving) systems play an important role in the cloud and big data industry
    • Their management is a challenging task, and the complexity increases when multi-tenancy is considered
    • Human-assisted control is unrealistic, so there is a growing demand for autonomic solutions
    • Our industrial partner Ericsson AB is interested in autonomic management of Cassandra
  • 13. Problem description
    • We consider a provider of a managed Apache Cassandra service
    • The applications, or tenants, of the service are independent and each uses its own Cassandra Virtual Data Center (VDC)
    • The service provider wants to maintain SLAs, which requires:
      • Properly planning the capacity and configuration of each Cassandra VDC
      • Dynamically adapting the infrastructure and VDC configuration without disrupting performance
      • Minimizing power consumption
  • 14. Solution proposed (1,000 ft view)
    • An energy-aware adaptation model specifically designed for Cassandra VDCs running on a cloud infrastructure
      • Architectural constraints imposed by Cassandra (minimum number of nodes, homogeneity of nodes, replication factor and heap size)
      • Constraints on throughput and replication factor imposed by the SLA
      • A power consumption model based on CPU utilization
    • An adaptation policy for the Cassandra VDC configuration and the cloud infrastructure configuration that orchestrates three strategies:
      • Vertical scaling
      • Horizontal scaling
      • Optimal placement
  • 15. Solution proposed (deep details)
    • A workload and SLA model
    • A system architecture model
    • A throughput model
    • The utility function and problem formulation
    • Drawbacks of the optimal solution, and alternatives
    • Experimental results
  • 16. Workload and SLA model
    • Workload features:
      • The type of requests l_i, e.g. read only (R), write only (W), read & write (RW, 75/25), scan, or a combination of those
      • The rate of the operation requests
      • The size of the dataset r_i
    • Workload types: CPU bound, memory bound
    • The data replication factor D_i
    • SLA tuple: <l_i, T_i^min, D_i, r_i>, i.e. request type, minimum throughput, replication factor and dataset size
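The SLA tuple <l_i, T_i^min, D_i, r_i> can be sketched as a small record type; the class and field names are illustrative, only the four symbols come from the model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SLA:
    """SLA tuple <l_i, T_i^min, D_i, r_i> for tenant i."""
    workload_type: str       # l_i: 'R', 'W' or 'RW'
    min_throughput: float    # T_i^min, in ops/sec
    replication_factor: int  # D_i
    dataset_size_gb: float   # r_i

# Tenant with a 75/25 read-write mix, 18 Kops/sec, RF = 3, 8 GB dataset
sla = SLA('RW', 18_000.0, 3, 8.0)
```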
  • 17. Architecture model
    • Homogeneous physical machines (H)
    • VMs of different type and size (V)
    • A VDC is composed of n_i homogeneous VMs, with n_i >= D_i
    • At least D_i vnodes out of n_i must run on different PMs
    • The datacenter configuration is described by a vector x = [x_{i,j,h}], where x_{i,j,h} is the number of vnodes serving application i with VM configuration j allocated on PM h

    Table I. Baseline throughput t0_{l_i,j} as a function of c_j (virtual CPUs), m_j (GB), heapSize_j (GB) and l_i, measured in operations/second (ops/sec):

      j   c_j   m_j   heapSize_j   R           W          RW
      1   8     32    8            16.6x10^3   8.3x10^3   13.3x10^3
      2   4     16    4            8.3x10^3    8.3x10^3   8.3x10^3
      3   2     16    4            3.3x10^3    3.3x10^3   3.3x10^3

    Table II. Memory available for the dataset in a Cassandra vnode (JVM heap) as a function of the VM memory size:

      m_j (RAM size in GB)               1     2   4   8   16   32
      heapSize_j (max heap size in GB)   0.5   1   1   2   4    8

    The SLA parameters D_i and r_i determine the number of vnodes to be instantiated, and T_i^min is the minimum throughput the service provider must guarantee to process the requests from application i.
  • 18. Architecture model (cont'd)
    • To make a VDC CPU bound we need n_i >= D_i * r_i / heapSize_j when r_i > heapSize_j; otherwise the constraint n_i >= D_i holds. The number of vnodes is n_i = sum_{j,h} x_{i,j,h} (Eq. 2), and since in our industrial case r_i >= heapSize_j for all configurations j, the constraints are modelled as:
        sum_{j,h} x_{i,j,h} >= D_i * r_i / heapSize_j, for all i   (Eq. 3)
        sum_j y_{i,j} = 1, for all i                               (Eq. 4)
        sum_h s_{i,h} >= D_i, for all i                            (Eq. 5)
    • y_{i,j} = 1 if application i uses VM configuration j to run its Cassandra vnodes, otherwise y_{i,j} = 0
    • s_{i,h} = 1 if a Cassandra vnode serving application i runs on PM h, otherwise s_{i,h} = 0
    • Vertical scaling is modelled by assuming it is possible to switch from configuration j1 to j2 at runtime, i.e. by replacing a VM of type j1 with a VM of type j2
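The vnode-count constraint above (Eqs. 1 and 3) can be sketched as a helper; the function name is mine, and it simply takes the larger of the heap-driven bound and the replication factor:

```python
import math

def min_vnodes(D_i, r_i, heap_size_j):
    """Minimum number of vnodes for tenant i on VM type j:
    enough total heap to hold D_i copies of the r_i-GB dataset
    (a CPU-bound VDC), and never fewer than the replication factor."""
    if r_i > heap_size_j:
        return max(D_i, math.ceil(D_i * r_i / heap_size_j))
    return D_i

# r_i = 8 GB, heapSize_j = 4 GB (VM type 2 or 3), D_i = 3 -> 6 vnodes
# dataset fits in one heap -> the replication factor D_i dominates
```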
  • 19. Throughput model
    • The actual throughput T_i is a function of x_{i,j,h}
    • The throughput grows piecewise linearly with the number of vnodes n_i: gamma^k_{l_i,j} is the slope of the k-th segment, valid for n_{k-1} <= n_i <= n_k, so that
        t(n_i) = t(n_{k-1}) + t0_{l_i,j} * gamma^k_{l_i,j} * (n_i - n_{k-1})   (Eq. 6)
      where k >= 1, n_0 = 1 and t(1) = t0_{l_i,j}
    • For a configuration x of a VDC, the overall throughput is defined as T_i(x) = t(n_i), for all i (Eq. 7)
    • (Fig. 2: measured Cassandra throughput as a function of the number of vnodes for R, W and RW request types; the plot shows that the proposed model is realistic)
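Eq. 6 can be sketched as follows. The default segment bounds and slopes (1.0 up to 2 vnodes, 0.8 up to 7, 0.66 beyond) follow Table III of the experiments; the function name and the exact breakpoints are illustrative:

```python
def throughput(n_i, t0, segments=((2, 1.0), (7, 0.8), (float('inf'), 0.66))):
    """Piecewise-linear throughput t(n_i) of Eq. 6: starting from
    t(1) = t0, each additional vnode adds t0 * slope of the segment
    it falls in, so scaling sub-linearly beyond the first segments."""
    t, prev_n = t0, 1
    for upper, slope in segments:
        step = min(n_i, upper) - prev_n
        if step > 0:
            t += t0 * slope * step
            prev_n += step
        if prev_n >= n_i:
            break
    return t

# VM type 3, read-only workload: t0 = 3.3 Kops/sec
# 4 vnodes -> t0 * (1 + 1 + 0.8 + 0.8), i.e. less than 4 * t0
```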
  • 20. Energy consumption model
    • Based on physical node utilization: a linear model where the power P_h consumed by physical machine h is a function of the CPU utilization, and hence of the system configuration x:
        P_h(x) = k_h * P_h^max + (1 - k_h) * P_h^max * U_h(x)   (Eq. 8)
      where P_h^max is the maximum power consumed when PM h is fully utilised, k_h is the fraction of power consumed by the idle PM h, and the CPU utilisation of PM h is
        U_h(x) = (1 / C_h) * sum_{i,j} x_{i,j,h} * c_j          (Eq. 9)
    • In the experiments, P_h^max = 500 W and k_h = 0.7
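Eqs. 8-9 with the experiment values (P_h^max = 500 W, k_h = 0.7, C_h = 16 cores) can be sketched directly; the function name is illustrative:

```python
def pm_power(used_vcores, C_h=16, p_max=500.0, k_h=0.7):
    """Linear power model of Eqs. 8-9: a fixed idle share k_h * p_max
    plus a dynamic share proportional to CPU utilisation U_h."""
    u_h = used_vcores / C_h  # Eq. 9: utilisation of PM h
    return k_h * p_max + (1 - k_h) * p_max * u_h

# idle PM: 350 W; fully loaded PM: 500 W; half-loaded PM: 425 W
```

The large idle share (70%) is what makes packing vnodes on few PMs attractive for the provider, in tension with the replica-spreading constraint of slide 9.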
  • 21. Problem formulation
    • Linear model: linear objective function, linear constraints
    • Constraints imposed by:
      • The SLA on throughput and replication factor
      • The replication factor (and number of distinct PMs)
      • Homogeneity of the vnode configuration
      • Heap size (we want a CPU-bound configuration)
    • The optimisation problem (Lambda denotes a suitably large constant):
        min f(x) = P(x)
        subject to:
          sum_{j,h} t(x_{i,j,h}) >= T_i^min,                    for all i        (11)
          sum_h x_{i,j,h} * y_{i,j} >= D_i * r_i / heapSize_j,  for all i, j     (12)
          x_{i,j,h} <= Lambda * y_{i,j},                        for all i, j, h  (13)
          sum_j y_{i,j} = 1,                                    for all i        (14)
          sum_{i,j} x_{i,j,h} * c_j <= C_h,                     for all h        (15)
          sum_{i,j} x_{i,j,h} * m_j <= M_h,                     for all h        (16)
          sum_h s_{i,h} >= D_i,                                 for all i        (17)
          sum_j x_{i,j,h} - s_{i,h} * Lambda <= 0,              for all i, h     (18)
          -sum_j x_{i,j,h} + s_{i,h} <= 0,                      for all i, h     (19)
          sum_i s_{i,h} - r_h * Lambda <= 0,                    for all h        (20)
          -sum_i s_{i,h} + r_h <= 0,                            for all h        (21)
          y_{i,j}, s_{i,h}, r_h in {0, 1},                      for all i, j, h  (22)
          x_{i,j,h} in N,                                       for all i, j, h  (23)
    • Eq. 11 guarantees that the SLA throughput is satisfied for all tenants (non-linear in principle, but linearisable with standard operational-research techniques using Eq. 6); Eq. 12 sets the number of vnodes needed to store the dataset with replication factor D_i; Eqs. 15-16 bound the CPU and memory demand of the vnodes by the capacity of the physical nodes; Eq. 17 spreads the replicas over at least D_i PMs
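For intuition, the structure of the problem can be sketched as a brute-force search over a drastically simplified single-tenant instance: one VM type and vnode count are chosen under throughput, heap, replication and CPU-capacity constraints, minimising power. All names are illustrative; linear throughput scaling is assumed instead of Eq. 6, and Table I values for the R workload are used:

```python
from itertools import product

# VM types from Table I: j -> (vcores c_j, heapSize_j GB, t0 for R in ops/sec)
VM_TYPES = {1: (8, 8.0, 16600.0), 2: (4, 4.0, 8300.0), 3: (2, 4.0, 3300.0)}

def best_config(T_min, D_i, r_i, n_pms=4, cores_per_pm=16,
                p_max=500.0, k=0.7, max_n=8):
    """Exhaustive search, single tenant: pick VM type j and vnode
    count n satisfying Eqs. 3, 5, 11 and 15 at minimum power."""
    best = None
    for j, n in product(VM_TYPES, range(1, max_n + 1)):
        cores, heap, t0 = VM_TYPES[j]
        if n < D_i or n * heap < D_i * r_i or n * t0 < T_min:
            continue  # replication, heap or throughput violated
        if n * cores > n_pms * cores_per_pm:
            continue  # total CPU capacity exceeded
        pms_used = -(-n * cores // cores_per_pm)  # ceil division
        power = (pms_used * k * p_max
                 + (1 - k) * p_max * (n * cores) / cores_per_pm)
        if best is None or power < best[0]:
            best = (power, j, n)
    return best

power, j, n = best_config(T_min=10000.0, D_i=3, r_i=2.0)
```

Even this toy version shows the shape of the trade-off: small VMs (type 3) win here because they let the load fit on one PM, avoiding a second PM's 350 W idle cost.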
  • 22. Sub-optimal adaptation
    • LocalOpt
    • LocalOpt-H
    • BestFit
    • BestFit-H
    • (Fig. 3: number of iterations for different values of N, V and H, comparing Opt, BestFit and LocalOpt for H = 100 and H = 1000)
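The slides name the heuristics without detailing them, so the following is only a generic best-fit bin-packing sketch of the placement step: each vnode goes to the feasible PM with the least spare cores, packing PMs tightly to cut idle power. The replication-factor anti-affinity of the real BestFit (at least D_i distinct PMs) is omitted for brevity:

```python
def best_fit(vnode_cores, pm_capacity):
    """Generic best-fit placement: assign each vnode (given its core
    demand) to the feasible PM with the smallest remaining capacity.
    Returns the list of PM indices, or None if no placement exists."""
    free = list(pm_capacity)
    placement = []
    for c in vnode_cores:
        candidates = [h for h, f in enumerate(free) if f >= c]
        if not candidates:
            return None  # no PM can host this vnode
        h = min(candidates, key=lambda h: free[h])
        free[h] -= c
        placement.append(h)
    return placement

# four 4-core vnodes on two 16-core PMs pack onto a single PM
```

Such heuristics trade some power optimality for a number of iterations that, as Fig. 3 shows, stays tractable as N and H grow.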
  • 23. Scenarios
    • New service subscriptions
    • Dataset size increase
    • Throughput increase
    • Surge in the throughput
    • Physical node failures

    (Placement detail: the allocation procedure places the vnodes on the PMs minimising the number of PMs used, packing as many vnodes as possible per PM while respecting the D_i constraint, which also minimises energy consumption; PMs are selected in round-robin order among the first D_i feasible ones, and if not all vnodes can be allocated the empty set is returned because no feasible allocation exists.)

    Table III. Model parameters used in the experiments:

      Parameter     Value                    Description
      N             1-10                     Number of tenants
      V             3                        Number of VM types
      H             8                        Number of PMs
      D_i           1-4                      Replication factor for app. i
      r_i           5-50                     Dataset size for app. i
      L             {R, W, RW}               Set of request types
      T_i^min       10,000-70,000 ops/sec    Minimum throughput agreed in the SLA
      C_h           16                       Number of cores for PM h
      c_j           2-8                      Number of vcores used by VM type j
      M_h           128 GB                   Memory size of PM h
      m_j           16-32 GB                 Total memory used by VM type j
      heapSize_j    4-8 GB                   Max heap size used by VM type j
      gamma_{l_i}   1 (1 <= x_{i,j,h} <= 2); 0.8 (3 <= x_{i,j,h} <= 7); 0.66 (x_{i,j,h} >= 8)
      P_h^max       500 W                    Maximum power consumed by PM h if fully loaded
      k_h           0.7                      Fraction of P_h^max consumed by PM h if idle
  • 24. Performance metrics
    • Total power consumption
    • Scaling Index: counts horizontal and vertical scaling actions
    • Migration Index: counts the number of migrations
    • Delayed Requests: assumes no request timeouts
    • Consistency-level reliability: defined as the probability that the number of healthy replicas in the Cassandra VDC is enough to guarantee a specific level of consistency over a fixed time interval
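The consistency-level reliability metric can be sketched with a binomial model. The function name is mine, and independent, identically available replicas over the time interval are an assumption for illustration:

```python
from math import comb

def consistency_level_reliability(D, needed, p_up):
    """P(at least `needed` of the D replicas are healthy), assuming
    each replica is independently up with probability p_up over the
    fixed time interval of the metric's definition."""
    return sum(comb(D, k) * p_up**k * (1 - p_up)**(D - k)
               for k in range(needed, D + 1))

# QUORUM over RF = 3 needs 2 healthy replicas; with p_up = 0.9 the
# consistency level is met with probability 3*0.81*0.1 + 0.729 = 0.972
```

Placement decisions change p_up per replica (slide 9), which is how infrastructure adaptation feeds into this platform-level metric.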
  • 25. New service subscriptions
    • Workload: 75% R, 15% W, 10% RW (uniform); T^min = 10-18 Kops/sec (uniform); D_i = 2, 3 (uniform); r_i = 8 GB
    • Setup: Monte Carlo simulation (Matlab R2015b 64-bit, single Intel Core i5); starting with one tenant, the number of subscribers is increased up to 8; because the SLA parameters are randomly chosen, the results summarise the data collected over repeated runs for each new tenant subscription
    • Results:
      • The number of vnodes allocated grows with the number of tenants
      • The optimal policy and LocalOpt tend to allocate small VMs (types 1 and 2) and rarely select VM type 3
      • The BestFit algorithm often uses large VMs (VM type 3), which explains why its minimum number of vnodes allocated is lower than with the other policies
    • (Fig. 3: number of vnodes allocated by the three adaptation policies; Fig. 4: total power consumed by the three adaptation policies)
  • 26. Dataset size increase
    • Workload: 3 tenants (R, W, RW); r_i = 10-50 GB; T^min = [14, 10, 18] Kops/sec; D_i = [3, 2, 3]
    • (Fig. 11: bar plot of the Scaling Index, with a line for the number of virtual nodes allocated during each experiment; Fig. 12: box plot of the power consumed P(x) for the optimal policy, LocalOpt and BestFit)
  • 27. Throughput increase
    • Workload: 3 tenants (R, W, RW); T^min = 10-70 Kops/sec; D_i = 3; r_i = 8 GB
    • (Figures: Scaling Index for the three applications under the three policies, with the number of VMs allocated, and power consumed P(x) by the optimal policy, LocalOpt and BestFit as T_i^min grows)
  • 28. Throughput increase (cont'd)
  [Figure: box plots of the Scaling Index for the three applications (R, W, RW) and the three policies (Optimal, LocalOpt, BestFit); the line represents the number of vnodes allocated in each experiment, Tmin_i = 10-70 Ɨ 10³ ops/sec.]
  • From the behaviour of the optimal policy we learned that it is better to always allocate smaller virtual machines: this choice usually satisfies both the dataset and the throughput constraints while minimising the throughput actually provided, which for large datasets can be higher than Tmin_i.
  • Power consumption (Fig. 7): LocalOpt outperforms BestFit, specifically for low throughput. The penalty paid is between 15 and 25% if LocalOpt is used and between 13 and 50% for BestFit; the higher loss is observed for low throughput values.
  • Migration Index (Fig. 8): each box plot is computed over the data collected for the three tenants. Each application experiences between 0 and 3 vnode migrations, depending on the infrastructure load state. The case Tmin_i = 50,000 ops/sec is an exception because it corresponds to the vertical scaling action taken for App. 1. LocalOpt and BestFit have a MI equal to zero by definition.
  • 29. Throughput increase (cont'd)
  [Figure: box plots of the Scaling Index actions for the RW workload and the five policies (Opt, LocalOpt, LocalOpt-H, BestFit, BestFit-H), Tmin_i = 20-70 Ɨ 10³ tps, VM types 1-3.]
  • In the Scaling Index plots a negative bar marks the VM type dismissed and a positive bar the new VM type allocated (vertical scaling); observations with only positive bars correspond to horizontal scaling actions. The number of newly allocated VMs is smaller when each new VM offers a higher throughput.
  • The Opt policy always starts by allocating VMs of type 3 (cf. Tab. 1) and, if needed, progressively moves to more powerful VM types. It performs only one vertical scaling, from type 3 to type 2 (at Tmin_i = 30,000 ops/sec); after that it always takes horizontal scaling actions (a particularly lucky case).
  • The two heuristics LocalOpt and BestFit show a very unstable behaviour, performing both vertical and horizontal scaling: both first scale from VM type 3 to VM type 1 and then scale back to VM type 2.
  • In the variants LocalOpt-H and BestFit-H the VM type is fixed to type 1 and the only action taken is horizontal scaling.
  • Power consumption (Fig. 6): for throughput higher than 40 Ɨ 10³ ops/sec, optimal scaling saves about 50% of the energy consumed by the heuristic allocations. For low throughput values (10-20 Ɨ 10³ ops/sec) BestFit and BestFit-H show a very high energy consumption.
  • Considering the low Migration Index of the Opt allocation and its high energy savings compared with the other algorithms, it makes sense to perform periodic VDC consolidation using the Opt policy, as recommended in Section 7.

  Use of the auto-scaling algorithms:
  Use case                  | Opt | LocalOpt | BestFit | LocalOpt-H | BestFit-H
  Capacity planning         |  X  |          |         |            |
  Data center consolidation |  X  |          |         |            |
  VDC consolidation         |  X  |    X     |         |            |
  run-time adaptation       |     |    X     |    X    |     X      |     X
  • 30. Surge in the throughput (RW)
  [Fig. 8: auto-scaling actions in case of a throughput surge, Cases A and B: the requested throughput Tmin_RW and the actual throughput achieved by Opt and BestFit over a 22-minute interval.]
  • Case A vnode types: T1 = 13.3Ɨ10³ ops/sec, T2 = 8.3Ɨ10³ ops/sec, T3 = 3.3Ɨ10³ ops/sec
  • Case B vnode types: T4 = 20Ɨ10³ ops/sec, T5 = 15Ɨ10³ ops/sec, T6 = 7Ɨ10³ ops/sec
  • Case A: at time t = 10 twelve vnodes are allocated; considering the serialisation of the horizontal actions (cf. Section 7), the seven additional Cassandra vnodes are added in 14 minutes. LocalOpt behaves as Opt in terms of scaling decisions. BestFit starts by allocating 4 vnodes of type 3, then scales up to seven vnodes (at time t = 8), and finally performs two vertical scaling actions: first from type 3 to type 2, and then from type 2 to type 1.
  • Case B: intuitively, with Cassandra vnodes capable of handling a higher workload it should be possible to better absorb the surge. With the more powerful vnode types the algorithms satisfy the requested throughput with a delay of only 2 minutes; vnode types spanning from low to very high throughput make it possible to manage throughput surges.
  • Table 5: the number of delayed requests Qi and the percentage with respect to the total number of received requests (tot.req.), computed over the time interval in which the requested throughput Tmin_RW exceeds the actual throughput.

    Case A: Opt Qi = 191.84Ɨ10³ (22.78%), LocalOpt Qi = 191.84Ɨ10³ (22.78%), BestFit Qi = 70.89Ɨ10³ (46.33%)
    Case B: Opt Qi = 7.66Ɨ10³ (4%), LocalOpt Qi = 7.66Ɨ10³ (4%), BestFit Qi = 70.58Ɨ10³ (30.29%)
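Table 5's Qi can be reproduced from a throughput trace by accumulating the shortfall while the requested rate exceeds the actual one. A minimal sketch; the one-minute sampling interval and the trace values below are illustrative assumptions, not the paper's data:

```python
def delayed_requests(requested, actual, dt_sec=60):
    """Accumulate the requests that cannot be served while the requested
    throughput exceeds the actual one (Q_i in Table 5 of the slides).

    requested, actual: throughput samples in ops/sec, one per interval;
    dt_sec: length of each sampling interval in seconds (assumed here).
    """
    return sum((want - have) * dt_sec
               for want, have in zip(requested, actual)
               if want > have)

# Hypothetical 4-minute trace: the surge outruns the actual capacity
# for two minutes until the new vnodes come online.
surge = delayed_requests([10_000, 40_000, 40_000, 40_000],
                         [10_000, 20_000, 30_000, 40_000])
print(surge)  # -> 1800000
```

Dividing by the total requests received over the same interval gives the Qi/tot.req. percentage reported in the table.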
  • 31. Physical node failure
  • Consistency level reliability R: the probability that the number of healthy replicas in the Cassandra VDC is enough to guarantee a specific level of consistency over a fixed time interval
  • Cassandra offers three main levels of consistency, both for read and write: ONE, QUORUM and ALL. Consistency level ONE means that only one replica node is required to reply correctly, i.e. it contains the replica of the portion of the dataset needed to answer the query. Consistency level QUORUM means that Q = ⌊D/2⌋ + 1 replica nodes are available to reply correctly, where D is the replication factor. Consistency level ALL means that all the replicas are available.
  • One-to-one mapping: the Cassandra VDC has a number of physical nodes H equal to the number of vnodes n, with one vnode per physical node. Consistency level ONE is guaranteed if at least one vnode is up and a replica is on that node, so the consistency reliability is

    R_O = 1 - (D/n) Ɨ (1 - ρ)^n    (24)

  where ρ is the resiliency of a physical node and D/n is the probability that a replica is on a given Cassandra vnode when the SimpleStrategy data replication strategy is used (cf. the DataStax documentation for Cassandra). In the same way, the reliability with which the Cassandra VDC guarantees a consistency level of QUORUM is the probability that at least Q vnodes are up and Q replicas are on them:

    R_Q = 1 - (D/n) Ɨ (1 - ρ)^(n - Q + 1)    (25)

  • Many-to-one mapping: in a managed Cassandra data center a VDC is rarely allocated with a one-to-one mapping of vnodes on physical nodes; the provider's resource management policies usually end up running n Cassandra vnodes on h physical nodes, with D ≤ h < n. In that case Eqs. 24 and 25 generalise to

    R_O = 1 - (D/n) Ɨ (1 - ρ)^K_O    (26)
    R_Q = 1 - (D/n) Ɨ (1 - ρ)^K_Q    (27)

  where K_O is the number of failed physical nodes that causes a failure of n vnodes, and K_Q is the number of failed physical nodes that causes a failure of (n āˆ’ Q + 1) vnodes. K_O equals the number of physical nodes used, while K_Q ranges between a best and a worst case depending on how the vnodes are distributed over the physical nodes: for example, if 8 vnodes are distributed over 5 physical nodes as {1, 1, 2, 1, 3} and n āˆ’ Q + 1 = 7, then K_O = 5.
  • Experimental setting: Table 6 reports values for D = 3 and D = 5 with ρ = 0.9, computed as follows. We consider 5 tenants with randomly generated SLAs (10% RW, 15% W and 75% R workloads; Tmin_i in the interval [10,000, 18,000] ops/sec; r_i constant at 8 GB). The number of vnodes allocated for each tenant is n = 6 for the case D = 3. We run 10 experiments and report the best and worst case.
  • Results: with a one-to-one mapping, consistency levels ONE and QUORUM are guaranteed with reliabilities of 0.9999995 and 0.999995 (about six 9s and five 9s) for D = 3; if the replication factor increases to 5, about ten 9s and eight 9s. Unfortunately, with a many-to-one mapping the reliability of the consistency level drops by orders of magnitude.
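Equations 24-27 are straightforward to evaluate numerically. A small sketch for the one-to-one mapping (the function name and the string-valued `level` parameter are mine); plugging in D = 3, n = 6, ρ = 0.9 reproduces the one-to-one column of Table 6:

```python
def consistency_reliability(D, n, rho, level="ONE"):
    """Consistency-level reliability for a one-to-one mapping (Eqs. 24-25):
    R = 1 - (D/n) * (1 - rho)^k, where k = n for consistency level ONE and
    k = n - Q + 1 for QUORUM, with quorum size Q = floor(D/2) + 1.
    rho is the resiliency of a physical node."""
    Q = D // 2 + 1
    k = n if level == "ONE" else n - Q + 1
    return 1 - (D / n) * (1 - rho) ** k

# D = 3, n = 6, rho = 0.9: six 9s for ONE, five 9s for QUORUM.
print(round(consistency_reliability(3, 6, 0.9, "ONE"), 10))     # -> 0.9999995
print(round(consistency_reliability(3, 6, 0.9, "QUORUM"), 10))  # -> 0.999995
```

For the many-to-one mapping (Eqs. 26-27), replace the exponent k with K_O or K_Q, which depend on how the vnodes are packed onto the physical nodes.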
  • 32. Physical node failure (cont'd)
  Table 6: consistency reliability R for the consistency levels ONE and QUORUM. The probability that a data replica is on a vnode is 0.5 for both D = 3 and D = 5. Columns: one-to-one mapping, then the Opt, LocalOpt, LocalOpt-H, BestFit and BestFit-H policies (some cells could not be recovered and are marked "–").

  ρ = 0.9
  RO|D=3: 0.9999995 | 0.9995 | 0.9995
  RQ|D=3: 0.999995 | 0.995 | – | 0.9995 | 0.9995
  RO|D=5: 0.99999999995 | 0.999995 | 0.999995
  RQ|D=5: 0.999999995 | 0.9995 | – | 0.999995 | 0.99995

  ρ = 0.8
  RO|D=3: 0.99996 | 0.996 | 0.996
  RQ|D=3: 0.99984 | 0.98 | – | 0.996 | 0.996
  RO|D=5: 0.999999948 | 0.99984 | 0.99984
  RQ|D=5: 0.9999987 | 0.996 | – | 0.99984 | 0.9992
  • 33. Lesson learned
  • When dealing with a specific technology there are many constraints to be considered
  • Multi-layer adaptation is a must
  • No single policy fits all workloads
  • Not all policies fit every stage of the application life cycle

  Use of the auto-scaling algorithms:
  Use case                  | Opt | LocalOpt | BestFit | LocalOpt-H | BestFit-H
  Capacity planning         |  X  |          |         |            |
  Data center consolidation |  X  |          |         |            |
  VDC consolidation         |  X  |    X     |         |            |
  run-time adaptation       |     |    X     |    X    |     X      |     X