1. Adaptive Provisioning of Stream Processing Systems in the Cloud

Javier Cerviño (#1), Eva Kalyvianaki (*2), Joaquín Salvachúa (#3), Peter Pietzuch (*4)
# Universidad Politécnica de Madrid, * Imperial College London
1 jcervino@dit.upm.es, 2 ekalyv@doc.ic.ac.uk, 3 jsalvachua@dit.upm.es, 4 prp@doc.ic.ac.uk

SMDB 2012
2. Data Stream Processing Systems (DSPS)

• Real-time processing of continuous data
  • Financial trading, sensor networks, etc.
• Data from sources arrive as streams
  – Time-ordered sequence of tuples
• Characteristics
  – Tuple arrival rates are not uniform
• Performance requirements
  – Low latency
  – Guaranteed throughput
• Adaptive provisioning
  – Use resources on demand
3. Cloud Computing

The cloud offers elastic computing by providing resources on demand.
– Characteristics
  • Scalability
  • Geographical distribution
  • Virtualization
  • Application Programming Interface (API)
– Amazon EC2
  • Public cloud provider
  • Infrastructure as a Service
  • Images and virtual machines
4. Related Work

• Cloud stream processing [Kleiminger et al., SMDB'11]
• Cloud network performance
  – Do cloud and Internet paths support streaming data into cloud DCs? [Barker et al., MMSys'07], [Wang et al., INFOCOM'10], [Jackson et al., CLOUDCOM'10]
• Cloud computation performance
  – Do best-effort VMs support low-latency, low-jitter and high-throughput stream processing? [Barker et al., MMSys'07]
  – Is the computational power of Amazon EC2 VMs sufficient for standard stream processing tasks? [Dittrich et al., VLDB'10]
5. Contributions

• Explore the suitability of cloud infrastructures for stream processing (case study on Amazon EC2)
  – Measure network and processing latencies, jitter and throughput
• An adaptive algorithm to allocate cloud resources on demand
  – Resizes the number of VMs in a DSPS deployment
• Algorithm evaluation
  – Deploying the algorithm as part of a DSPS on Amazon EC2
6. Outline

1. Cloud Performance
   1. Network Measurements
   2. Processing Measurements
   3. Discussion
2. Adaptive Cloud Stream Processing
   1. Architecture
   2. Algorithm
3. Experimental Evaluation
   1. Description
   2. Results
4. Future Work and Conclusions
7. Outline (next: 1. Cloud Performance)
8. Cloud Performance: Network Measurements

• Goal: explore network parameters that affect stream processing conditions:
  – Jitter, latency and bandwidth
• Experimental set-up
  – Stream engines
    • Mock engines without processing
    • 9 Amazon EC2 instances: 3 in the US, 3 in the EU and 3 in Asia
    • Large Amazon EC2 instances: 7.5 GB RAM and 4 ECU
  – Stream sources
    • 9 distributed PlanetLab nodes: 3 in the US, 3 in the EU and 3 in Asia
  – Dataset
    • Random data at three different data rates: 10 kbps, 100 kbps and 1 Mbps

[Figure: PlanetLab source nodes in Europe, the USA and Asia streaming to cloud processing-engine instances]
9. Cloud Performance: Network Measurements

[Figure: jitter (ms) per PlanetLab node (1-9) at high, medium and low rates]

• Average jitter is less than 2.5 μs
• Some outliers reach almost 4 seconds
• Low jitter, with less than 3% of high outliers
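Jitter here is the variation in tuple inter-arrival times at the engine. As a minimal sketch (a hypothetical helper, not the measurement code behind these plots), per-gap jitter can be derived from receive timestamps:

```python
def jitter_ms(recv_times):
    """Per-gap jitter: deviation of each inter-arrival gap from the
    mean gap, in milliseconds. recv_times are receive timestamps in
    seconds; at least two timestamps are required."""
    gaps = [b - a for a, b in zip(recv_times, recv_times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return [abs(g - mean_gap) * 1000.0 for g in gaps]
```

Averaging the returned values per node gives a figure comparable to the averages above; the rare multi-second gaps show up as large outliers.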
10. Cloud Performance: Network Measurements

[Figure: network-level round-trip time (ms) vs. application-level round-trip time (ms) for America, Asia and Europe, against the ideal line]

• Application-level delay includes processing time: t_received − t_sent
• Network-level delay between the source and the engine: RTT
• The cloud DC does not increase application-level delay
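One way to obtain the application-level figure (a sketch assuming synchronized clocks; `make_tuple` and `app_level_delay_ms` are hypothetical names, not the deck's code) is to embed the send timestamp in every tuple:

```python
import time

def make_tuple(payload):
    # Sender stamps each tuple with its wall-clock send time.
    return {"sent_at": time.time(), "payload": payload}

def app_level_delay_ms(tup):
    # Engine computes application-level delay on arrival as
    # t_received - t_sent (meaningful only with synchronized clocks).
    return (time.time() - tup["sent_at"]) * 1000.0
```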
11. Cloud Performance: Processing Measurements

• Goals
  – Explore performance variation with time of day (processing and latency)
  – Check whether cloud VMs can scale efficiently with a varying input rate
• Experimental set-up
  – Dataset
    • Esper benchmark tool
    • Stream of shares and stock values for a given symbol at a fixed rate (30,000 tuples/sec)
  – Submitter
    • 10 extra-large Amazon EC2 VMs: 15 GB, 8 ECU
  – Nodes
    • 10 small Amazon EC2 VMs: 1.7 GB, 1 ECU
12. Cloud Performance: Processing Measurements

[Figure: latency (ms) and throughput (tuples/s) over the time of day (24-hour format) on two days]

• Throughput remains relatively stable over the measurement period
• Latency suffers more from unpredictable outliers
• No obvious pattern correlates performance with time of day
13. Cloud Performance: Processing Measurements

[Figure: throughput (tuples/s) vs. input data rate (x10,000 tuples/s) for small and large VM instances]

• Cloud VMs can be used to scale efficiently with an increasing input rate
• The number of VMs required depends on their type, as expected
14. Outline (next: 2. Adaptive Cloud Stream Processing)
15. Adaptive Cloud Stream Processing

• Elastic stream processing system that scales the number of VMs to input stream rates
• Goals
  – Low latency with a given throughput
  – Keep VMs operating at their maximum processing capacity
• Workload is partitioned and balanced across multiple VMs
• Many VMs are available to scale up and down with workload demands
• A collector gathers results from the engines and processes additional queries

[Figure: stream sources 1 and 2 feed sub-queries 1 and 2 to engine VMs; a collector VM merges their results]
16. Adaptive Cloud Stream Processing: Algorithm I

[Figure: a tuple submitter feeds N Esper VMs at a given input rate; the per-VM processing rates are summed to derive the extra rate and, divided by N, the average rate]

• Gathering and calculation
  – Gathers processing rates from VMs
  – Obtains
    • Total extra processing rate (Extra rate)
    • Average processing rate per VM (Average rate)
17. Adaptive Cloud Stream Processing: Algorithm II

[Figure: decision flow — if the extra rate is greater than 0, scale up using Extra rate / Average rate and store the average rate; otherwise scale down using Input rate / maximum average rate; return the new VM count N']

• Decision stage
  – Calculates the new number of machines (N')
  – Scale up
    • Stores the average rate as the maximum average rate
  – Scale down
    • Uses the last maximum average rate
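Both stages can be sketched in Python, following Algorithm 1 on the appendix slide (the per-VM rate readings are passed in as a list; this is an illustrative sketch of the listing, not the deployed code):

```python
import math

def provision(vm_rates, total_in_rate, max_rate_per_vm):
    """One control step: return (N', updated maximum average rate).

    vm_rates        -- observed processing rate (tuples/s) of each VM
    total_in_rate   -- current total input rate (tuples/s)
    max_rate_per_vm -- last stored maximum average rate per VM (state)
    """
    n = len(vm_rates)
    exp_rate_per_vm = total_in_rate // n          # expected share per VM
    total_extra = sum(exp_rate_per_vm - r for r in vm_rates)
    avg_rate_per_vm = sum(vm_rates) // n          # measured average per VM

    if total_extra > 0:    # VMs fall short of the input rate: scale up
        n_new = n + math.ceil(total_extra / avg_rate_per_vm)
        max_rate_per_vm = avg_rate_per_vm         # store as maximum
    elif total_extra < 0:  # spare capacity: scale down
        n_new = math.ceil(total_in_rate / max_rate_per_vm)
    else:
        n_new = n
    return n_new, max_rate_per_vm
```

For example, three VMs each processing 8,000 tuples/s against a 30,000 tuples/s input leave 6,000 tuples/s unhandled, so one extra VM is requested.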
18. Outline (next: 3. Experimental Evaluation)
19. Experimental Evaluation: Description

• Goals
  – Adaptability of the algorithm to varying input rates
  – Implications of adaptation for stream processing performance
• Experimental set-up
  – Integrated with the Esper processing engine
  – Framework to control VMs and collect performance metrics
    • Throughput, processing latency and network latency
    • Collection of shell scripts
  – Deployed on Amazon EC2

[Figure: tuple submitters send random values of different stock symbols into Esper engine VMs on Amazon EC2, each running the same query (maximum value of each stock symbol per second); a controller VM collects and merges all results]
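The continuous query each engine runs (the maximum value of each stock symbol per second) can be sketched in plain Python; this is an illustrative stand-in for the actual Esper EPL statement, which the deck does not show:

```python
from collections import defaultdict

def max_per_symbol_per_second(tuples):
    """tuples: iterable of (timestamp_sec, symbol, value).
    Returns {second: {symbol: max value seen in that second}},
    mimicking a one-second batch window grouped by symbol."""
    windows = defaultdict(dict)
    for ts, symbol, value in tuples:
        sec = int(ts)            # bucket the tuple into its second
        prev = windows[sec].get(symbol)
        if prev is None or value > prev:
            windows[sec][symbol] = value
    return {sec: dict(best) for sec, best in windows.items()}
```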
20. Experimental Evaluation: Results

[Figure: input rate (tuples/sec), tuples dropped and number of VMs over time (sec), for small and large instances]

• Processing latency remains low: 7-28 μs
• The system scales the number of VMs up and down as required by the input rate
• There is a significant reaction delay before VMs are scaled up and down
• VMs are pre-allocated
21. Outline (next: 4. Future Work and Conclusions)
22. Future Work

• Investigate ways to reduce the reaction delay to performance violations
• Predict the future behaviour of input data rates
• Investigate cost models for the allocation of small and large VM instances
• Evaluate our system in other cloud environments
• Extensive evaluation over longer periods of time and different VM types
23. Conclusions

• An adaptive approach to provisioning stream processing systems in the cloud
• Public clouds are suitable for stream processing
• Network latency is the dominating factor in public clouds
• Our approach can adaptively scale the number of VMs to input rates
• Processing latency and data loss remain low

Javier Cerviño
email: jcervino@dit.upm.es

Thank you! Questions?
24. Adaptive Cloud Stream Processing Algorithm

Algorithm 1: Adaptive provisioning of a cloud-based DSPS

Require: totalInRate, N, maxRatePerVM
Ensure: N' s.t. projRatePerVM · N' = totalInRate
 1: expRatePerVM = ⌊totalInRate / N⌋
 2: totalExtraRateForVMs = 0; totalProcRate = 0
 3: for all deployed VMs do
 4:   totalExtraRateForVMs += expRatePerVM − getRate(VM)
 5:   totalProcRate += getRate(VM)
 6: end for
 7: avgRatePerVM = ⌊totalProcRate / N⌋
 8: if totalExtraRateForVMs > 0 then
 9:   N' = N + ⌈totalExtraRateForVMs / avgRatePerVM⌉
10:   maxRatePerVM = avgRatePerVM
11: else if totalExtraRateForVMs < 0 then
12:   N' = ⌈totalInRate / maxRatePerVM⌉
13: end if
14: projRatePerVM = totalInRate / N'
15: return N'
25. Adaptive Cloud Stream Processing Algorithm

getExpectedVMs(totalInRate, currentVMs) {
  // Input rate calculations
  expRatePerVM = totalInRate / currentVMs
  for each deployed VM {
    vmRate = getRate(VM)
    totalExtraRate += (expRatePerVM - vmRate)
    totalProcRate += vmRate
  }
  avgRatePerVM = totalProcRate / currentVMs

  // Increasing input rate: scale up
  if (totalExtraRate > 0) {
    expectedVMs = currentVMs + totalExtraRate / avgRatePerVM
    maxRatePerVM = avgRatePerVM
  }
  // Decreasing input rate: scale down
  else if (totalExtraRate < 0) {
    expectedVMs = totalInRate / maxRatePerVM
  }
  return expectedVMs
}