Adaptive Provisioning of Stream Processing Systems in the Cloud

Javier Cerviño#1, Eva Kalyvianaki*2, Joaquín Salvachúa#3, Peter Pietzuch*4

# Universidad Politécnica de Madrid, * Imperial College London

1 jcervino@dit.upm.es, 2 ekalyv@doc.ic.ac.uk
3 jsalvachua@dit.upm.es, 4 prp@doc.ic.ac.uk

SMDB 2012

	
Javier Cerviño, Eva Kalyvianaki, Joaquín Salvachúa, Peter Pietzuch · Adaptive Provisioning of Stream Processing Systems in the Cloud
Data Stream Processing Systems (DSPS)

• Real-time processing of continuous data
• Financial trading, sensor networks, etc.
• Data from sources arrive as streams
  – Time-ordered sequence of tuples
• Characteristics
  – Tuple arrival rates are not uniform
• Performance requirements
  – Low latency
  – Guaranteed throughput
• Adaptive provisioning
  – Use resources on demand




	
Cloud Computing

Cloud offers elastic computing by providing resources on demand
  – Characteristics
    • Scalability
    • Geographical distribution
    • Virtualization
    • Application Programming Interface (API)
  – Amazon EC2
    • Public cloud provider
    • Infrastructure as a Service
    • Images and Virtual Machines




	
Related work

• Cloud stream processing
    [Kleiminger et al., SMDB’11]

• Cloud network performance
  – Do cloud and Internet paths support streaming data into cloud DCs?
    [Barker et al., MMSys’07], [Wang et al., INFOCOM’10], [Jackson et al., CLOUDCOM’10]

• Cloud computation performance
  – Do best-effort VMs support low-latency, low-jitter and high-throughput stream processing?
    [Barker et al., MMSys’07]
  – Computational power of Amazon EC2 VMs for standard stream processing tasks?
    [Dittrich et al., VLDB’10]




	
Contributions

• Explore the suitability of cloud infrastructures for stream processing (case study on Amazon EC2)
  – Measure network and processing latencies, jitter and throughput

• An adaptive algorithm to allocate cloud resources on demand
  – Resizes the number of VMs in a DSPS deployment

• Algorithm evaluation
  – Deploying the algorithm as part of a DSPS on Amazon EC2




	
Outline

1. Cloud Performance
   1. Network Measurements
   2. Processing Measurements
   3. Discussion
2. Adaptive Cloud Stream Processing
   1. Architecture
   2. Algorithm
3. Experimental Evaluation
   1. Description
   2. Results
4. Future Work and Conclusions




	
Cloud Performance
Network Measurements

• Goal: Explore network parameters that affect stream processing conditions:
  – Jitter, latency and bandwidth

• Experimental set-up
  – Stream engines
    • Mock engines without processing
    • 9 Amazon EC2 instances: 3 in US, 3 in EU and 3 in Asia
    • Large Amazon EC2 instances: 7.5 GB and 4 ECU
  – Stream sources
    • 9 distributed PlanetLab nodes: 3 in US, 3 in EU and 3 in Asia
  – Dataset
    • Random data at three different data rates: 10 kbps, 100 kbps and 1 Mbps

[Figure: deployment across USA, Europe and Asia; legend: PlanetLab node (source), cloud instance (processing engine)]




	
Cloud Performance
Network Measurements

[Figure: jitter (ms, axis from 0 to 4000) for PlanetLab nodes 1 to 9 at high, medium and low data rates]

• Average jitter is less than 2.5 μs
• Some outliers have a value of almost 4 seconds
• Low jitter with less than 3% of high outliers
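
Statistics of this kind can be derived from packet arrival timestamps, as in the minimal sketch below. The slides do not define jitter, so the sketch assumes it is the absolute deviation of each inter-arrival gap from a fixed sending interval. The function names, the sample timestamps and the 1000 ms outlier threshold are illustrative assumptions, not values from the study.

```python
from statistics import mean

def jitter_ms(arrivals_ms, interval_ms):
    """Absolute deviation of each inter-arrival gap from the nominal interval.

    Assumption: jitter is measured against a fixed sending interval; the
    slides do not state the exact definition used in the measurements.
    """
    gaps = [b - a for a, b in zip(arrivals_ms, arrivals_ms[1:])]
    return [abs(gap - interval_ms) for gap in gaps]

def summarize(jitters_ms, outlier_threshold_ms=1000.0):
    """Return (average jitter, fraction of samples above the threshold)."""
    if not jitters_ms:
        return 0.0, 0.0
    outliers = sum(1 for j in jitters_ms if j > outlier_threshold_ms)
    return mean(jitters_ms), outliers / len(jitters_ms)

# Hypothetical arrival timestamps (ms) for packets sent every 100 ms;
# the fourth packet arrives after a long stall.
arrivals = [0.0, 100.0, 200.0, 2300.0, 2400.0]
avg, outlier_fraction = summarize(jitter_ms(arrivals, interval_ms=100.0))
print(avg, outlier_fraction)  # 500.0 0.25
```

A single stalled packet dominates the average here, which is why the average and the share of high outliers are worth reporting separately.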
  


	
Cloud Performance
Network Measurements

[Figure: network-level round-trip time (ms) against application-level round-trip time (ms) for America, Asia and Europe, with the ideal line]

• Application-level delay involves processing time: t_sent - t_received
• Network-level delay between the source and the engine: RTT
• Cloud DC does not increase application-level delay



	
Cloud Performance
Processing Measurements

• Goal
  – Explore performance variation with time-of-day (processing and latency)
  – Check if cloud VMs can scale efficiently with varying input rate

• Experimental set-up
  – Dataset
    • Esper benchmark tool
    • Stream of shares and stock values for a given symbol at a fixed rate (30000 tuples/sec)
  – Submitter
    • 10 Extra Large Amazon EC2 VMs: 15 GB, 8 ECU
  – Nodes
    • 10 Small Amazon EC2 VMs: 1.7 GB, 1 ECU




	
Cloud Performance
Processing Measurements

[Figure: latency (ms) and throughput (×10^4 tuples/s) against time of day (24-hour format, 7 to 19) for Day 1 and Day 2]

• Throughput remains relatively stable over the measurement period
• Latency suffers more from unpredictable outliers
• No obvious pattern to correlate performance with time-of-day



	
Cloud Performance
Processing Measurements

[Figure: throughput (×10^5 tuples/s, 0 to 2) against input data rate (×10000 tuples/s, 1 to 17) for small and large VM instances]

• Cloud VMs can be used to scale efficiently with an increasing input rate
• The number of VMs depends on their type, as expected




	
Adaptive Cloud Stream Processing

• Elastic stream processing system to scale the number of VMs to input stream rates

• Goals
  – Low latency with a given throughput
  – Keep VMs operating at their maximum processing capacity

• Workload is partitioned and balanced across multiple VMs
• Many VMs available to scale up and down to workload demands
• Collector gathers results from engines and processes additional queries

[Figure: stream sources (source 1, source 2) feed engines running on VMs (sub-query 1); a collector gathers the engine results (sub-query 2)]
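
The partition-and-collect structure can be pictured with the minimal sketch below. It assumes tuples are (symbol, value) pairs, that the workload is balanced round-robin, and that the engines and the collector both run a per-symbol maximum query. The slides do not specify the partitioning scheme, and all function names and sample data here are hypothetical.

```python
def partition(tuples, num_engines):
    """Spread tuples round-robin so each engine receives a similar load."""
    shards = [[] for _ in range(num_engines)]
    for index, item in enumerate(tuples):
        shards[index % num_engines].append(item)
    return shards

def engine_max(shard):
    """Sub-query run on each engine: maximum value per symbol in its shard."""
    result = {}
    for symbol, value in shard:
        result[symbol] = max(value, result.get(symbol, value))
    return result

def collect(partial_results):
    """Collector: merge per-engine results by applying the same max query."""
    merged = {}
    for partial in partial_results:
        for symbol, value in partial.items():
            merged[symbol] = max(value, merged.get(symbol, value))
    return merged

# Hypothetical stream of (symbol, value) tuples.
stream = [("IBM", 10.0), ("MSFT", 7.5), ("IBM", 12.0), ("AAPL", 3.0), ("MSFT", 9.0)]
shards = partition(stream, num_engines=3)
print(collect(engine_max(shard) for shard in shards))
```

Because round-robin can place the same symbol on several engines, the collector has to re-apply the aggregation. That works here because a maximum of partial maxima equals the overall maximum.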
  


	
Adaptive Cloud Stream Processing
Algorithm I

[Figure: the tuple submitter feeds the input rate to N virtual machines running Esper; their processing rates are summed (Σ) into Proc Rate, which is subtracted from Input Rate to give Extra Rate and divided by N to give Average Rate]

• Gathering and calculation
  – Gathers processing rates from VMs
  – Obtains
    • Total extra processing rate (Extra rate)
    • Average processing rate per VM (Average rate)
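
The gathering and calculation stage can be sketched as follows. This is a minimal sketch that assumes the extra rate is the input rate minus the summed processing rates and that the average rate is that sum divided by the number of VMs. The function name and the sample rates are hypothetical.

```python
def gather_and_calculate(input_rate, processing_rates):
    """Return (extra_rate, average_rate) from per-VM processing rates.

    input_rate: tuples/s offered by the tuple submitter.
    processing_rates: tuples/s currently processed by each VM.
    """
    if not processing_rates:
        raise ValueError("at least one VM is required")
    total_processing_rate = sum(processing_rates)
    # Extra rate: load that the current VMs are not absorbing.
    extra_rate = input_rate - total_processing_rate
    # Average rate: what one VM processes on average.
    average_rate = total_processing_rate / len(processing_rates)
    return extra_rate, average_rate

# Hypothetical measurement: three VMs facing 50,000 tuples/s of input.
extra, average = gather_and_calculate(50_000, [15_000, 14_000, 16_000])
print(extra, average)  # 5000 15000.0
```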
  




	
Adaptive Cloud Stream Processing
Algorithm II

[Figure: decision flow. If Extra Rate > 0, scale up: Extra Rate / Average Rate is added (Σ) to N to give N’, and Average Rate is stored. Otherwise, scale down: N’ is Input Rate divided by the stored rate. N’ is returned.]

• Decision stage
  – Calculates new number of machines (N’)
  – Scale up
    • Stores the average rate as maximum average rate
  – Scale down
    • Uses last maximum average rate
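
The decision stage can be sketched as follows, under stated assumptions: results are rounded up to whole VMs, at least one VM is always kept, and scale-down is skipped until a maximum average rate has been stored. These details, the function name and the sample numbers are illustrative and are not taken from the slides.

```python
import math

def decide(n, input_rate, extra_rate, average_rate, max_average_rate=None):
    """Return (new_n, max_average_rate) for one decision round."""
    if extra_rate > 0:
        if average_rate <= 0:
            # Assumption: with no measurable throughput, add a single VM.
            return n + 1, max_average_rate
        # Scale up: add enough VMs to absorb the extra rate and remember the
        # current average as the maximum rate a VM has sustained.
        new_n = n + math.ceil(extra_rate / average_rate)
        return new_n, average_rate
    if max_average_rate is None or max_average_rate <= 0:
        # Assumption: nothing stored yet, so keep the current size.
        return n, max_average_rate
    # Scale down: size the deployment from the last stored maximum.
    new_n = max(1, math.ceil(input_rate / max_average_rate))
    return new_n, max_average_rate

# Overloaded deployment: 3 VMs averaging 15,000 tuples/s, 5,000 tuples/s extra.
n, stored = decide(3, 50_000, 5_000, 15_000.0)
print(n, stored)  # 4 15000.0

# Input later falls to 20,000 tuples/s with no extra rate.
n, stored = decide(n, 20_000, 0, 5_000.0, stored)
print(n, stored)  # 2 15000.0
```

Sizing the scale-down from the stored maximum rather than the current average avoids keeping idle VMs: lightly loaded VMs report a low average rate that understates what each one can sustain.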
  



	
Experimental Evaluation
Description

• Goals
  – Adaptability of the algorithm against varying input rates
  – Implications of adaptation for stream processing performance
• Experimental set-up
  – Integrated with the Esper processing engine
  – Framework to control VMs and to collect performance metrics
    • Throughput, processing latency and network latency
    • Collection of shell scripts
  – Deployed on Amazon EC2

[Figure: a controller and Esper tuple submitters (stream source) send tuples to Esper engines on Amazon EC2 VMs (sub-query 1), whose results go to a further Esper engine (sub-query 2)]
    Stream source: random values of different stock symbols
    Sub-query 1: maximum value of each stock symbol per second
    Sub-query 2: collection and merge of all results (same query)
  
	
Experimental Evaluation
    Results

[Figure: two panels, Small Instances and Large Instances. Each plots Input Rate and Tuples dropped (left axis, Tuples/sec, x 10^5) and Number of nodes (right axis, Number of VMs) against Time (sec).]

    •  Processing latency remains low: 7–28 μs
    •  Scales up and down the number of VMs as required by the input rate
    •  There is a significant reaction delay before VMs are scaled up and down
    •  VMs are pre-allocated


	
Outline

    1. Cloud Performance
        1. Network Measurements
        2. Processing Measurements
        3. Discussion
    2. Adaptive Cloud Stream Processing
        1. Architecture
        2. Algorithm
    3. Experimental Evaluation
        1. Description
        2. Results
    4. Future Work and Conclusions




	
Future Work

    •  Investigate ways to reduce the reaction delay to performance violations
    •  Predict the future behaviour of input data rates
    •  Investigate cost models for allocation of small and large VM instances
    •  Evaluate our system in other cloud environments
    •  Extensive evaluation over longer periods of time and different VM types
  




	
Conclusions

    •  An adaptive approach to provision stream processing systems in the cloud
    •  Public clouds are suitable for stream processing
    •  Network latency is the dominating factor in public clouds
    •  Our approach can adaptively scale the number of VMs to input rates
    •  Processing latency and data loss remain low

    Javier Cerviño
    email: jcervino@dit.upm.es

    Thank you!
    Questions?
	
Adaptive Cloud Stream Processing
    Algorithm

Algorithm 1 Adaptive provisioning of a cloud-based DSPS
Require: totalInRate, N, maxRatePerVM
Ensure: N′ s.t. projRatePerVM * N′ = totalInRate
    expRatePerVM = ⌊totalInRate / N⌋
    totalExtraRateForVMs = 0; totalProcRate = 0
    for all deployed VMs do
        totalExtraRateForVMs += expRatePerVM − getRate(VM)
        totalProcRate += getRate(VM)
    end for
    avgRatePerVM = ⌊totalProcRate / N⌋
    if totalExtraRateForVMs > 0 then
        N′ = N + ⌈totalExtraRateForVMs / avgRatePerVM⌉
        maxRatePerVM = avgRatePerVM
    else if totalExtraRateForVMs < 0 then
        N′ = ⌈totalInRate / maxRatePerVM⌉
    end if
    projRatePerVM = totalInRate / N′
    return N′
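Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the `vm_rates` list (standing in for calling `getRate(VM)` on every deployed VM) and the returned pair are hypothetical, and the case `totalExtraRateForVMs == 0`, which the pseudocode leaves open, is assumed here to keep the current number of VMs.

```python
import math


def get_expected_vms(total_in_rate, vm_rates, max_rate_per_vm):
    """Sketch of Algorithm 1; returns (new_vm_count, updated_max_rate_per_vm).

    vm_rates holds the measured processing rate of each deployed VM in
    tuples/s. It must be non-empty, and the average rate must be non-zero
    when the scale-up branch is taken (preconditions assumed here).
    """
    if not vm_rates:
        raise ValueError("at least one deployed VM is required")
    n = len(vm_rates)
    exp_rate_per_vm = total_in_rate // n            # floor(totalInRate / N)
    total_extra_rate = sum(exp_rate_per_vm - r for r in vm_rates)
    total_proc_rate = sum(vm_rates)
    avg_rate_per_vm = total_proc_rate // n          # floor(totalProcRate / N)

    if total_extra_rate > 0:
        # VMs process less than their expected share: add VMs for the shortfall
        # and remember the observed average as the per-VM capacity.
        new_n = n + math.ceil(total_extra_rate / avg_rate_per_vm)
        max_rate_per_vm = avg_rate_per_vm
    elif total_extra_rate < 0:
        # VMs keep up with the input: size the deployment by known capacity.
        new_n = math.ceil(total_in_rate / max_rate_per_vm)
    else:
        new_n = n  # assumption: the pseudocode does not define this case
    return new_n, max_rate_per_vm


# Two saturated VMs at 40,000 tuples/s each while 120,000 tuples/s arrive.
scale_up = get_expected_vms(120000, [40000, 40000], 50000)
# Three VMs comfortably handling 50,000 tuples/s in total.
scale_down = get_expected_vms(50000, [16667, 16667, 16666], 40000)
```

In the first call the shortfall of 40,000 tuples/s divided by the observed average of 40,000 tuples/s per VM adds one VM; in the second, 50,000 tuples/s divided by a capacity of 40,000 tuples/s per VM rounds up to two VMs.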


 	
Adaptive Cloud Stream Processing
    Algorithm

    getExpectedVMs(totalInRate, currentVMs) {

        // Input rate calculations
        expRatePerVM = totalInRate / currentVMs
        for each deployed VM {
            vmRate = getRate(VM)
            totalExtraRate += (expRatePerVM - vmRate)
            totalProcRate += vmRate
        }
        avgRatePerVM = totalProcRate / currentVMs

        // Increasing input rate
        if (totalExtraRate > 0) {
            expectedVMs = currentVMs + totalExtraRate / avgRatePerVM
            maxRatePerVM = avgRatePerVM
        }
        // Decreasing input rate
        else if (totalExtraRate < 0) {
            expectedVMs = totalInRate / maxRatePerVM
        }
        return expectedVMs
    }




	

Adaptive Provisioning of Stream Processing Systems in the Cloud

  • 1. Adaptive Provisioning of Stream Processing Systems in the Cloud
       Javier Cerviño#1, Eva Kalyvianaki*2, Joaquín Salvachúa#3, Peter Pietzuch*4
       # Universidad Politécnica de Madrid, * Imperial College London
       1jcervino@dit.upm.es, 2ekalyv@doc.ic.ac.uk, 3jsalvachua@dit.upm.es, 4prp@doc.ic.ac.uk
       SMDB 2012
  • 2. Data Stream Processing Systems (DSPS)
       •  Real-time processing of continuous data
       •  Financial trading, sensor networks, etc.
       •  Data from sources arrive as streams
            –  Time-ordered sequence of tuples
       •  Characteristics
            –  Tuple arrival rates are not uniform
       •  Performance requirements
            –  Low latency
            –  Guaranteed throughput
       •  Adaptive provisioning
            –  Use resources on demand
  • 3. Cloud Computing
       Cloud offers elastic computing by providing resources on demand
            –  Characteristics
                 •  Scalability
                 •  Geographical Distribution
                 •  Virtualization
                 •  Application Programming Interface (API)
            –  Amazon EC2
                 •  Public cloud provider
                 •  Infrastructure as a Service
                 •  Images and Virtual Machines
  • 4. Related work
       •  Cloud Stream Processing [Kleiminger et al, SMDB'11]
       •  Cloud network performance
            –  Cloud and Internet paths support streaming data into cloud DCs?
               [Barker et al, MMSys'07], [Wang et al, INFOCOM'10], [Jackson et al, CLOUDCOM'10]
       •  Cloud computation performance
            –  Best effort VMs support low-latency, low-jitter and high-throughput stream processing?
               [Barker et al, MMSys'07]
            –  Computational power of Amazon EC2 VMs for standard stream processing tasks?
               [Dittrich et al, VLDB'10]
  • 5. Contributions
       •  Explore the suitability of cloud infrastructures for stream processing (case study on Amazon EC2)
            –  Measure network and processing latencies, jitter and throughput
       •  An adaptive algorithm to allocate cloud resources on demand
            –  Resizes the number of VMs in a DSPS deployment
       •  Algorithm evaluation
            –  Deploying the algorithm as part of a DSPS on Amazon EC2
  • 6. Outline
       1. Cloud Performance
            1. Network Measurements
            2. Processing Measurements
            3. Discussion
       2. Adaptive Cloud Stream Processing
            1. Architecture
            2. Algorithm
       3. Experimental Evaluation
            1. Description
            2. Results
       4. Future Work and Conclusions
  • 8. Cloud Performance: Network Measurements
      • Goal: explore network parameters that affect stream processing conditions
          – Jitter, latency and bandwidth
      • Experimental set-up
          – Stream engines
              • Mock engines without processing
              • 9 Amazon EC2 instances: 3 in the US, 3 in the EU and 3 in Asia
              • Large Amazon EC2 instances: 7.5 GB and 4 ECU
          – Stream sources
              • 9 distributed PlanetLab nodes: 3 in the US, 3 in the EU and 3 in Asia
          – Dataset
              • Random data at three different data rates: 10 kbps, 100 kbps and 1 Mbps
      [Figure: PlanetLab nodes (sources) in Europe, USA and Asia stream to cloud instances (processing engines)]
  • 9. Cloud Performance: Network Measurements
      [Figure: jitter (ms) per PlanetLab node (1 to 9) at high, medium and low data rates]
      • Average jitter is less than 2.5 μs
      • Some outliers have a value of almost 4 seconds
      • Low jitter, with less than 3% of high outliers
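The jitter statistics above (an average plus a share of high outliers) could be derived from packet arrival times roughly as follows. This is a minimal sketch, not the authors' measurement code: it assumes jitter means the absolute deviation of each inter-arrival gap from the expected gap, and the function names, the expected gap and the outlier threshold are all hypothetical.

```python
from statistics import mean


def interarrival_jitter_ms(arrival_times_ms, expected_gap_ms):
    """Return per-gap jitter: |observed inter-arrival gap - expected gap|.

    Assumption: this simple definition stands in for whatever jitter
    metric was actually used in the measurements.
    """
    gaps = [later - earlier
            for earlier, later in zip(arrival_times_ms, arrival_times_ms[1:])]
    return [abs(gap - expected_gap_ms) for gap in gaps]


def outlier_fraction(jitter_ms, threshold_ms):
    """Return the fraction of jitter samples above a (hypothetical) threshold."""
    if not jitter_ms:
        return 0.0
    return sum(1 for j in jitter_ms if j > threshold_ms) / len(jitter_ms)


# Hypothetical trace: one tuple every 10 ms, with a single 4-second stall.
arrivals = [0, 10, 20, 30, 4030, 4040]
jitter = interarrival_jitter_ms(arrivals, expected_gap_ms=10)
average_jitter = mean(jitter)
high_outliers = outlier_fraction(jitter, threshold_ms=1000)
```

A single long stall dominates the average in such a short trace, which is why the outlier share is reported separately from the average.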
  • 10. Cloud Performance: Network Measurements
      [Figure: network-level round-trip time (ms) against application-level round-trip time (ms) for America, Asia and Europe, with the ideal line]
      • Application-level delay involves processing time: t_sent - t_received
      • Network-level delay between the source and the engine: RTT
      • The cloud DC does not increase application-level delay
  • 11. Cloud Performance: Processing Measurements
      • Goal
          – Explore performance variation with time of day (processing and latency)
          – Check whether cloud VMs can scale efficiently with a varying input rate
      • Experimental set-up
          – Dataset
              • Esper benchmark tool
              • Stream of shares and stock values for a given symbol at a fixed rate (30000 tuples/sec)
          – Submitter
              • 10 Extra Large Amazon EC2 VMs: 15 GB, 8 ECU
          – Nodes
              • 10 Small Amazon EC2 VMs: 1.7 GB, 1 ECU
  • 12. Cloud Performance: Processing Measurements
      [Figure: latency (ms) and throughput (x10^4 tuples/s) against time of day (7 to 19, 24-hour format) for Day 1 and Day 2]
      • Throughput remains relatively stable over the measurement period
      • Latency suffers more from unpredictable outliers
      • No obvious pattern correlates performance with time of day
  • 13. Cloud Performance: Processing Measurements
      [Figure: throughput (x10^5 tuples/s) against input data rate (x10000 tuples/s) for small and large VM instances]
      • Cloud VMs can be used to scale efficiently with an increasing input rate
      • The number of VMs depends on their type, as expected
  • 15. Adaptive Cloud Stream Processing
      • Elastic stream processing system that scales the number of VMs to the input stream rates
      • Goals
          – Low latency at a given throughput
          – Keep VMs operating at their maximum processing capacity
      • Workload is partitioned and balanced across multiple VMs
      • Many VMs are available to scale up and down with workload demands
      • Collector gathers results from the engines and processes additional queries
      [Figure: stream sources 1 and 2 feed engines running on VMs (sub-query 1), whose results flow to a collector VM (sub-query 2)]
  • 16. Adaptive Cloud Stream Processing: Algorithm I
      [Figure: N virtual machines running Esper, fed by the tuple submitter, report their processing rates; the sum (Σ) is subtracted from the input rate to obtain the extra rate and divided by N to obtain the average rate]
      • Gathering and calculation
          – Gathers processing rates from the VMs
          – Obtains
              • Total extra processing rate (Extra rate)
              • Average processing rate per VM (Average rate)
  • 17. Adaptive Cloud Stream Processing: Algorithm II
      [Figure: decision flow. If Extra rate > 0, scale up: N' = N + Extra rate / Average rate, and store the Average rate. Otherwise scale down: N' = Input rate / stored Average rate. Return N']
      • Decision stage
          – Calculates the new number of machines (N')
          – Scale up
              • Stores the average rate as the maximum average rate
          – Scale down
              • Uses the last maximum average rate
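A worked example may make the decision stage easier to follow. All numbers below are hypothetical and chosen only for illustration; rounding up to a whole number of VMs is assumed, and the extra rate is taken as the input rate minus the summed processing rates.

```latex
% Hypothetical numbers, chosen only to illustrate the decision stage.
\begin{align*}
\text{Scale up:}\quad
  & \text{input rate} = 90\,000,\; N = 3,\; \text{measured rate per VM} = 20\,000 \\
  & \text{extra rate} = 90\,000 - 3 \cdot 20\,000 = 30\,000 > 0 \\
  & \text{average rate} = \tfrac{3 \cdot 20\,000}{3} = 20\,000 \\
  & N' = N + \left\lceil \tfrac{30\,000}{20\,000} \right\rceil = 3 + 2 = 5,
    \quad \text{stored maximum average rate} = 20\,000 \\[4pt]
\text{Scale down:}\quad
  & \text{input rate} = 40\,000,\; \text{extra rate} < 0 \\
  & N' = \left\lceil \tfrac{40\,000}{20\,000} \right\rceil = 2
\end{align*}
```

In the scale-down case the stored maximum average rate from the last scale-up (20 000 tuples/s here) acts as the estimate of what one VM can sustain.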
  • 19. Experimental Evaluation: Description
      • Goals
          – Adaptability of the algorithm to varying input rates
          – Implications of adaptation for stream processing performance
      • Experimental set-up
          – Integrated with the Esper processing engine
          – Framework to control VMs and to collect performance metrics
              • Throughput, processing latency and network latency
              • Collection of shell scripts
          – Deployed on Amazon EC2
      [Figure: on Amazon EC2, a controller manages VMs running Esper; Esper tuple submitters feed the Esper VMs, which feed a final Esper engine VM]
          – Stream source: random values of different stock symbols
          – Sub-query 1: maximum value of each stock symbol per second
          – Sub-query 2: collection and merge of all results (same query)
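The two-stage query can be sketched in plain Python to show why the same maximum query works both on each engine and at the merge step. This is an illustration only, not Esper EPL and not the authors' code; the tuple layout `(timestamp_s, symbol, price)`, the function names and the sample symbols are assumptions.

```python
def max_per_symbol_per_second(tuples):
    """Sub-query 1 (sketch): maximum price per stock symbol per one-second window.

    `tuples` is an iterable of (timestamp_s, symbol, price); the window is
    identified by the integer part of the timestamp.
    """
    maxima = {}
    for timestamp_s, symbol, price in tuples:
        key = (int(timestamp_s), symbol)
        if key not in maxima or price > maxima[key]:
            maxima[key] = price
    return maxima


def merge_partial_maxima(partials):
    """Sub-query 2 (sketch): merge per-engine results by taking the maximum again."""
    merged = {}
    for partial in partials:
        for key, price in partial.items():
            if key not in merged or price > merged[key]:
                merged[key] = price
    return merged


# Two hypothetical engines, each seeing a partition of the stream.
engine_a = max_per_symbol_per_second(
    [(0.1, "ACME", 10.0), (0.7, "ACME", 12.5), (1.2, "ACME", 9.0)])
engine_b = max_per_symbol_per_second(
    [(0.4, "ACME", 11.0), (0.9, "XYZ", 3.0), (1.8, "ACME", 9.5)])
merged = merge_partial_maxima([engine_a, engine_b])
```

Because the maximum of partial maxima equals the global maximum, the stream can be partitioned across any number of engines without changing the final result.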
  • 20. Experimental Evaluation: Results
      [Figure: number of VMs, input rate (tuples/sec) and tuples dropped over time (0 to 700 sec), one panel for small instances and one for large instances]
      • Processing latency remains low: 7 – 28 μs
      • Scales the number of VMs up and down as required by the input rate
      • There is a significant reaction delay before VMs are scaled up and down
      • VMs are pre-allocated
  • 22. Future Work
      • Investigate ways to reduce the reaction delay to performance violations
      • Predict the future behaviour of input data rates
      • Investigate cost models for allocation of small and large VM instances
      • Evaluate our system in other cloud environments
      • Extensive evaluation over longer periods of time and different VM types
  • 23. Conclusions
      • An adaptive approach to provision stream processing systems in the cloud
      • Public clouds are suitable for stream processing
      • Network latency is the dominating factor in public clouds
      • Our approach can adaptively scale the number of VMs to input rates
      • Processing latency and data loss remain low
      Javier Cerviño, email: jcervino@dit.upm.es
      Thank you! Questions?
  • 24. Adaptive Cloud Stream Processing: Algorithm
      Algorithm 1 Adaptive provisioning of a cloud-based DSPS
      Require: totalInRate, N, maxRatePerVM
      Ensure: N' s.t. projRatePerVM * N' = totalInRate
          expRatePerVM = ⌊totalInRate / N⌋
          totalExtraRateForVMs = 0; totalProcRate = 0
          for all deployed VMs do
              totalExtraRateForVMs += expRatePerVM - getRate(VM)
              totalProcRate += getRate(VM)
          end for
          avgRatePerVM = ⌊totalProcRate / N⌋
          if totalExtraRateForVMs > 0 then
              N' = N + ⌈totalExtraRateForVMs / avgRatePerVM⌉
              maxRatePerVM = avgRatePerVM
          else if totalExtraRateForVMs < 0 then
              N' = ⌈totalInRate / maxRatePerVM⌉
          end if
          projRatePerVM = totalInRate / N'
          return N'
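Algorithm 1 can be sketched as a small Python function. This is one possible reading, not the authors' implementation: the names `adaptive_provisioning` and `ProvisioningResult` are hypothetical, measured rates are passed in as a list instead of being fetched with `getRate(VM)`, and three behaviours the listing leaves open are assumptions (keeping N unchanged when the extra rate is exactly zero, never returning fewer than one VM, and rejecting non-positive divisors).

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ProvisioningResult:
    new_vm_count: int             # N' in Algorithm 1
    max_rate_per_vm: float        # updated maxRatePerVM, to pass to the next call
    projected_rate_per_vm: float  # projRatePerVM


def adaptive_provisioning(total_in_rate: int,
                          vm_rates: Sequence[int],
                          max_rate_per_vm: float) -> ProvisioningResult:
    """Return the new VM count for the given input rate and measured VM rates."""
    n = len(vm_rates)
    if n == 0:
        raise ValueError("at least one deployed VM is required")

    exp_rate_per_vm = total_in_rate // n           # floor(totalInRate / N)
    total_extra_rate = sum(exp_rate_per_vm - rate for rate in vm_rates)
    total_proc_rate = sum(vm_rates)
    avg_rate_per_vm = total_proc_rate // n         # floor(totalProcRate / N)

    new_n = n  # assumption: keep N when the extra rate is exactly zero
    if total_extra_rate > 0:
        # Scale up: VMs process less than their expected share.
        if avg_rate_per_vm <= 0:
            raise ValueError("average rate must be positive to scale up")
        new_n = n + math.ceil(total_extra_rate / avg_rate_per_vm)
        max_rate_per_vm = avg_rate_per_vm
    elif total_extra_rate < 0:
        # Scale down: size the deployment from the last stored maximum rate.
        if max_rate_per_vm <= 0:
            raise ValueError("no stored maximum rate to scale down with")
        new_n = math.ceil(total_in_rate / max_rate_per_vm)

    new_n = max(1, new_n)  # assumption: always keep at least one VM
    return ProvisioningResult(new_n, max_rate_per_vm, total_in_rate / new_n)
```

A caller would invoke this periodically, feeding the returned `max_rate_per_vm` into the next call so that a later scale-down can reuse the rate observed at the last scale-up.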
  • 25. Adaptive Cloud Stream Processing: Algorithm
      getExpectedVMs(totalInRate, currentVMs) {
          // Input rate calculations
          expRatePerVM = totalInRate / currentVMs
          totalExtraRate = 0; totalProcRate = 0
          for each deployed VM {
              vmRate = getRate(VM)
              totalExtraRate += (expRatePerVM - vmRate)
              totalProcRate += vmRate
          }
          avgRatePerVM = totalProcRate / currentVMs
          expectedVMs = currentVMs    // assumption: unchanged when the extra rate is zero
          if (totalExtraRate > 0) {
              // Increasing input rate
              expectedVMs = currentVMs + ceil(totalExtraRate / avgRatePerVM)
              maxRatePerVM = avgRatePerVM
          } else if (totalExtraRate < 0) {
              // Decreasing input rate
              expectedVMs = ceil(totalInRate / maxRatePerVM)
          }
          return expectedVMs
      }