Network-aware Data Management for Large-scale Distributed Applications
Sept 28, 2015
Mehmet Balman
http://balman.info

Senior Performance Engineer at VMware Inc.
Guest/Affiliate at Berkeley Lab
1
About me:
•  2013: Performance, OCTO, VMware, Palo Alto, CA
•  2009: Computational Research Division (CRD) at Lawrence Berkeley National Laboratory (LBNL)
•  2005: Center for Computation & Technology (CCT), Baton Rouge, LA

•  Computer Science, Louisiana State University (2010, 2008)
•  Bogazici University, Istanbul, Turkey (2006, 2000)

Data Transfer Scheduling with Advance Reservation and Provisioning, Ph.D.
Failure-Awareness and Dynamic Adaptation in Data Scheduling, M.S.
Parallel Tetrahedral Mesh Refinement, M.S.
2
Why Network-aware?
Networking is one of the major components in many of the solutions today
•  Distributed data and compute resources
•  Collaboration: data to be shared between remote sites
•  Data centers are complex network infrastructures

•  What further steps are necessary to take full advantage of future networking infrastructure?
•  How are we going to deal with performance problems?
•  How can we enhance data management services and make them network-aware?

New collaborations between data management and networking communities.
3
Two major players:
•  Abstraction and Programmability
   •  Rapid development, intelligent services
   •  Orchestrating compute, storage, and network resources together
   •  Integration and deployment of complex systems
•  Performance Gap:
   •  Limitation in current system software vs. foreseen speed:
      hardware is fast, software is slow
   •  Latency vs. throughput mismatch will lead to new innovations
4
Outline
•  VSAN + VVOL: Storage Performance in Virtualized Environments

•  Data Streaming in High-bandwidth Networks
   •  Climate100: Advanced Networking Initiative and 100Gbps Demo
   •  MemzNet: Memory-Mapped Network Zero-copy Channels
   •  Core Affinity and End System Tuning in High-Throughput Flows

•  Network Reservation and Online Scheduling (QoS)
   •  FlexRes: A Flexible Network Reservation Algorithm
   •  SchedSim: Online Scheduling with Advance Provisioning

5
VSAN: Virtual SAN
6
VSAN image: blog.vmware.com
Distributed Object Storage
Hybrid (SSD+HDD)
VSAN performance work in a nutshell
7
Observer image: blog.vmware.com
•  Every write operation needs to go over the network (and the network is not free)
•  Each layer (cache, disk, object management, etc.) needs resources (CPU, memory)
•  Resource limitations vs. latency effect
•  Needs to support thousands of VMs

Placement of objects:
•  Which host?
•  Which disk/SSD in the host?
What if there are failures or migrations, and if we need to rebalance?
8
VVOL: Virtual Volumes
VVOL image: blog.vmware.com
Offloading control operations to the storage array:
•  powerOn
•  powerOff
•  delete
•  clone

VVOL performance work
•  Effect of latency in the control path
•  linked clone vs. VVOL clones
9
[Diagram: vSphere host with VASA VP and storage; data path vs. control path]
•  Optimize service latencies
•  Batching (disklib)
•  Use concurrent operations
Internet Modeling
•  My first real paper was on Internet topology
•  Collecting data from traceroute gateways
•  Analyzing:
   •  Outdegree
   •  Indegree
   •  Diameter
   •  Reachable set
10
Outline
•  VSAN + VVOL: Storage Performance in Virtualized Environments
•  Data Streaming in High-bandwidth Networks
   •  Climate100: Advanced Networking Initiative and 100Gbps Demo
   •  MemzNet: Memory-Mapped Network Zero-copy Channels
   •  Core Affinity and End System Tuning in High-Throughput Flows
•  Network Reservation and Online Scheduling (QoS)
   •  FlexRes: A Flexible Network Reservation Algorithm
   •  SchedSim: Online Scheduling with Advance Provisioning
11
100Gbps networking has finally arrived!
Applications' Perspective
Increasing the bandwidth is not sufficient by itself; we need careful evaluation of high-bandwidth networks from the applications' perspective.

1Gbps to 10Gbps transition (10 years ago):
applications did not run 10 times faster just because there was more bandwidth available.
12
ANI 100Gbps Demo
•  100Gbps demo by ESnet and Internet2
•  Application design issues and host tuning strategies to scale to 100Gbps rates
•  Visualization of remotely located data (cosmology)
•  Data movement of large datasets with many files (climate analysis)
13
Earth System Grid Federation (ESGF)
14
•  Over 2,700 sites
•  25,000 users

•  IPCC Fifth Assessment Report (AR5): 2PB
•  IPCC Fourth Assessment Report (AR4): 35TB
•  Remote data analysis
•  Bulk data movement
Application's Perspective: Climate Data Analysis
15

lots-of-small-files problem!
file-centric tools?
[Diagram: file-centric FTP/RPC exchange - request a file / send file, request data / send data]
•  Keep the network pipe full
•  We want out-of-order and asynchronous send/receive
16
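A minimal, hypothetical sketch (not from the talk) of why keeping the pipe full matters: a file-centric request/response loop pays one round trip per block, while a pipelined approach keeps many blocks in flight and lets responses complete out of order. `fetch_block`, the block count, and the simulated RTT are illustrative assumptions.

```python
import asyncio
import time

async def fetch_block(i: int, rtt: float = 0.05) -> bytes:
    # Simulated request/response for one block: one round trip of latency,
    # then a placeholder payload arrives.
    await asyncio.sleep(rtt)
    return b"x" * 4096

async def file_centric(n: int) -> float:
    # One outstanding request at a time: the pipe drains every RTT.
    start = time.perf_counter()
    for i in range(n):
        await fetch_block(i)
    return time.perf_counter() - start

async def pipelined(n: int) -> float:
    # Many outstanding requests: the link stays busy while earlier
    # blocks are still in flight, and completions may arrive out of order.
    start = time.perf_counter()
    await asyncio.gather(*(fetch_block(i) for i in range(n)))
    return time.perf_counter() - start

if __name__ == "__main__":
    print("sequential:", asyncio.run(file_centric(20)))
    print("pipelined: ", asyncio.run(pipelined(20)))
```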
  
Many Concurrent Streams
(a) Total throughput vs. the number of concurrent memory-to-memory transfers; (b) interface traffic, packets per second (blue) and bytes per second, over a single NIC with different numbers of concurrent transfers. Three hosts, each with 4 available NICs, and a total of 10 10Gbps NIC pairs, were used to saturate the 100Gbps pipe in the ANI Testbed. 10 data movement jobs, each corresponding to a NIC pair, were started simultaneously at source and destination. Each peak represents a different test; 1, 2, 4, 8, 16, 32, and 64 concurrent streams per job were initiated for 5-minute intervals (e.g., when the concurrency level is 4, there are 40 streams in total).
17
Effects of many concurrent streams
ANI Testbed, 100Gbps (10x10G NICs, three hosts): interrupts/CPU vs. the number of concurrent transfers [1, 2, 4, 8, 16, 32, 64 concurrent jobs, 5-minute intervals]; TCP buffer size is 50MB.
18
Analysis of Core Affinities (NUMA Effect)
19
Nathan Hanford et al., NDM'13
Sandy Bridge architecture
Receive process
20
Analysis of Core Affinities (NUMA Effect)
Nathan Hanford et al., NDM'14
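As a hedged illustration of the core-affinity experiments referenced above (not the authors' harness): on Linux, the receive process can be pinned to a chosen core so that its placement relative to the NIC's NUMA node is controlled; `os.sched_setaffinity` is the standard call, and the core id here is only an example.

```python
import os

def pin_to_core(core_id: int) -> None:
    # Restrict the calling process to a single core (Linux only).
    # Running the receiver on a core local to the NIC's NUMA node vs. a
    # remote node is what exposes the affinity effect measured in the talk.
    os.sched_setaffinity(0, {core_id})

if __name__ == "__main__":
    pin_to_core(0)                       # illustrative core id
    print("running on cores:", os.sched_getaffinity(0))
    # ... the receive loop (e.g., sock.recv_into(buffer)) would run here ...
```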
  
100Gbps demo environment
RTT:  Seattle – NERSC   16ms
      NERSC – ANL       50ms
      NERSC – ORNL      64ms
21
Framework for the Memory-mapped Network Channel
+ Synchronization mechanism for RoCE
– Keep the pipe full for remote analysis
22
Moving climate files efficiently
23
Advantages
•  Decoupling I/O and network operations
   •  front-end (I/O processing)
   •  back-end (networking layer)
•  Not limited by the characteristics of the file sizes
   •  On-the-fly tar approach: bundling and sending many files together
•  Dynamic data channel management
   Can increase/decrease the parallelism level both in the network communication and in the I/O read/write operations, without closing and reopening the data channel connection (as is done in regular FTP variants).

MemzNet is not file-centric. Bookkeeping information is embedded inside each block.
24
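A minimal sketch of the decoupling idea above, assuming a shared in-memory block queue: front-end threads read files and tag each block with its own bookkeeping header, and back-end threads drain the queue onto the network. This illustrates the architecture only; it is not MemzNet's code, and all names, sizes, and the header layout are assumptions.

```python
import queue
import struct
import threading

BLOCK_SIZE = 4 * 1024 * 1024          # 4MB payload per block
HEADER = struct.Struct("!QQI")        # (file id, offset, length) bookkeeping

blocks: "queue.Queue[bytes]" = queue.Queue(maxsize=256)   # shared block cache

def front_end(file_id: int, path: str) -> None:
    # I/O side: read a file in blocks and embed bookkeeping in each block,
    # so the stream itself is not file-centric.
    with open(path, "rb") as f:
        offset = 0
        while True:
            data = f.read(BLOCK_SIZE)
            if not data:
                break
            blocks.put(HEADER.pack(file_id, offset, len(data)) + data)
            offset += len(data)

def back_end(send) -> None:
    # Network side: drain blocks onto the channel; blocks from different
    # files can be interleaved and sent out of order.
    while True:
        block = blocks.get()
        if block is None:
            break
        send(block)

if __name__ == "__main__":
    sent = []
    t = threading.Thread(target=back_end, args=(sent.append,))
    t.start()
    front_end(1, __file__)            # use this script as sample input
    blocks.put(None)                  # sentinel to stop the back-end
    t.join()
    print("blocks sent:", len(sent))
```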
  
MemzNet's Architecture for data streaming
25
100Gbps Demo
•  CMIP3 data (35TB) from the GPFS filesystem at NERSC
•  Block size 4MB
•  Each block's data section was aligned according to the system page size.
•  1GB cache both at the client and the server
•  At NERSC, 8 front-end threads on each host for reading data files in parallel.
•  At ANL/ORNL, 4 front-end threads for processing received data blocks.
•  4 parallel TCP streams (four back-end threads) were used for each host-to-host connection.
26
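The page alignment noted above can be illustrated with a small, hypothetical helper: the data section of each block starts on a page boundary so it can be mapped or handed to the NIC without extra copies. `mmap.PAGESIZE` is the standard way to query the system page size; the helper name is an assumption.

```python
import mmap

PAGE = mmap.PAGESIZE                      # system page size (typically 4096)
BLOCK_SIZE = 4 * 1024 * 1024              # 4MB blocks, as in the demo

def aligned_offset(header_len: int) -> int:
    # Round the start of the data section up to the next page boundary
    # so the payload of every block is page-aligned inside the channel.
    return (header_len + PAGE - 1) // PAGE * PAGE

if __name__ == "__main__":
    for hdr in (24, 100, PAGE):
        off = aligned_offset(hdr)
        print(f"header {hdr:5d} bytes -> data section starts at offset {off}")
        assert off % PAGE == 0 and off >= hdr
```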
  
MemzNet's Performance
TCP buffer size is set to 50MB
[Chart: MemzNet vs. GridFTP throughput, 100Gbps demo and ANI Testbed]
27
  
Challenge?
•  High bandwidth brings new challenges!
   •  We need a substantial amount of processing power and the involvement of multiple cores to fill a 40Gbps or 100Gbps network
   •  Fine-tuning, both in the network and application layers, to take advantage of the higher network capacity
•  Incremental improvement in current tools?
   •  We cannot expect every application to tune and improve every time we change the link technology or speed.
28
MemzNet
•  MemzNet: Memory-mapped Network Channel
•  High-performance data movement

MemzNet is an initial effort to put a new layer between the application and the transport layer.
•  The main goal is to define a network channel so applications can directly use it without the burden of managing/tuning the network communication.
29
Tech report: LBNL-6177E
MemzNet = New Execution Model
•  Luigi Rizzo's netmap
   •  proposes a new API to send/receive data over the network
•  RDMA programming model
   •  MemzNet as a memory-management component
•  IX: Data Plane OS (Adam Belay et al. @ Stanford; similar to MemzNet's model)
   •  mTCP (event-based / replaces send/receive at user level)
•  Tanenbaum et al.: minimizing context switches; proposing to use MONITOR/MWAIT for synchronization
30
Outline
•  VSAN + VVOL: Storage Performance in Virtualized Environments
•  Data Streaming in High-bandwidth Networks
   •  Climate100: Advanced Networking Initiative and 100Gbps Demo
   •  MemzNet: Memory-Mapped Network Zero-copy Channels
   •  Core Affinity and End System Tuning in High-Throughput Flows
•  Network Reservation and Online Scheduling (QoS)
   •  FlexRes: A Flexible Network Reservation Algorithm
   •  SchedSim: Online Scheduling with Advance Provisioning
31
Problem Domain: ESnet's OSCARS
32
[Map: the ESnet backbone connecting DOE sites (LBNL, ANL, FNAL, BNL, ORNL, PNNL, SLAC, JLAB, PPPL, AMES) through hubs (Seattle, Sunnyvale, Sacramento, Boise, Denver, Albuquerque, El Paso, Houston, Kansas City, Chicago, Nashville, Atlanta, Washington DC, New York, Boston), peering with US R&E networks (Internet2, NLR, DREN, NISN, NASA), international R&E networks (GÉANT, NORDUnet, CANARIE, CLARA/CUDI, GLORIAD, SINET, AARnet, and other Asia-Pacific networks), CERN/USLHCNet, and LHCONE]
•  Connecting experimental facilities and supercomputing centers
•  On-Demand Secure Circuits and Advance Reservation System
•  Guaranteed bandwidth between collaborating institutions by delivering network-as-a-service
•  Co-allocation of storage and network resources (SRM: Storage Resource Manager)

OSCARS provides yes/no answers to a reservation request for (bandwidth, start_time, end_time)
End-to-end reservation: Storage + Network
Reservation Request
•  Between edge routers

Need to ensure availability of the requested bandwidth from source to destination for the requested time interval

•  R = {n_source, n_destination, M_bandwidth, t_start, t_end}
   •  source/destination end-points
   •  requested bandwidth
   •  start/end times

Committed reservations between t_start and t_end are examined.

The shortest path from source to destination is calculated based on the engineering metric on each link, and a bandwidth-guaranteed path is set up to commit and eventually complete the reservation request for the given time period.
33
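A small sketch of the request structure and the availability check described above, assuming committed reservations on a link are kept as (bandwidth, t_start, t_end) records; the field and function names are illustrative and are not OSCARS code.

```python
from dataclasses import dataclass

@dataclass
class Reservation:
    source: str
    destination: str
    bandwidth: float        # Mbps
    t_start: float
    t_end: float

def available_on_link(capacity: float, committed: list,
                      t_start: float, t_end: float) -> float:
    # Worst-case bandwidth left on one link during [t_start, t_end):
    # subtract every committed reservation that overlaps the interval.
    used = sum(r.bandwidth for r in committed
               if r.t_start < t_end and t_start < r.t_end)
    return capacity - used

if __name__ == "__main__":
    committed = [Reservation("A", "D", 900, 1, 3)]   # reservation 1 on an A-B link
    # 900Mbps capacity, fully reserved during (t1, t2): nothing left on this link.
    print(available_on_link(900, committed, 1, 2))   # -> 0.0
```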
  
Reservation
34
•  Components (graph):
   •  node (router), port, link (connecting two ports)
   •  engineering metric (~latency)
   •  maximum bandwidth (capacity)
•  Reservation:
   •  source, destination, path, time
   •  (time t1, t3) A -> B -> D (900Mbps)
   •  (time t2, t3) A -> C -> D (400Mbps)
   •  (time t4, t5) A -> B -> D (800Mbps)
[Topology diagram: nodes A, B, C, D; link capacities 800Mbps, 900Mbps, 500Mbps, 1000Mbps, 300Mbps; timeline of Reservations 1-3 over t1..t5]
Example
(time t1, t2):
  A to D (600Mbps)  NO
  A to D (500Mbps)  YES

[Topology: available / reserved (capacity) per link:
  0 Mbps / 900 Mbps (900 Mbps)
  100 Mbps / 900 Mbps (1000 Mbps)
  800 Mbps / 0 Mbps (800 Mbps)
  500 Mbps / 0 Mbps (500 Mbps)
  300 Mbps / 0 Mbps (300 Mbps)]

Active reservations:
  reservation 1: (time t1, t3)  A -> B -> D  (900Mbps)
  reservation 2: (time t1, t3)  A -> C -> D  (400Mbps)
  reservation 3: (time t4, t5)  A -> B -> D  (800Mbps)
35
  
Example
(time t1, t3):
  A to D (500Mbps)  NO
  A to C (500Mbps)  NO  (not max-flow!)

[Topology: available / reserved (capacity) per link:
  0 Mbps / 900 Mbps (900 Mbps)
  100 Mbps / 900 Mbps (1000 Mbps)
  400 Mbps / 400 Mbps (800 Mbps)
  100 Mbps / 400 Mbps (500 Mbps)
  300 Mbps / 0 Mbps (300 Mbps)]

Active reservations:
  reservation 1: (time t1, t3)  A -> B -> D  (900Mbps)
  reservation 2: (time t1, t3)  A -> C -> D  (400Mbps)
  reservation 3: (time t4, t5)  A -> B -> D  (800Mbps)
36
  
Alternative Approach: Flexible Reservations
•  If the requested bandwidth cannot be guaranteed:
   •  Trial-and-error until an available reservation is found
   •  The client is not given other possible options
•  How can we enhance the OSCARS reservation system?
•  Be flexible:
   •  Submit constraints, and the system suggests possible reservation options satisfying the given requirements
37
	
  Rs
'={	
  nsource	
  ,	
  ndesBnaBon,	
  MMAXbandwidth,	
  DdataSize,	
  tEarliestStart,	
  tLatestEnd}	
  
	
  
ReservaNon	
  engine	
  finds	
  out	
  the	
  reservaNon	
  	
  
	
   	
   	
   	
  R={	
  nsource,	
  ndesBnaBon,	
  Mbandwidth,	
  tstart,	
  tend}	
  	
  
for	
  the	
  earliest	
  compleNon	
  or	
  for	
  the	
  shortest	
  duraNon	
  	
  
where	
  Mbandwidth≤	
  MMAXbandwidth	
  and	
  tEarliestStart	
  ≤	
  tstart	
  <	
  tend≤	
  tLatestEnd	
  .	
  
Bandwidth	
  Allocation	
  (time-­‐dependent)	
  
	
  	
  	
  	
  
	
  	
  
Modified	
  Dijstra's	
  
algorithms	
  (max	
  available	
  
bandwidth):	
  
	
  
•  BoUleneck	
  constraint	
  	
  
(not	
  addiNve)	
  
•  QoS	
  constraint	
  is	
  addiNve	
  
in	
  shortest	
  path,	
  etc)	
  
38	
  The	
  maximum	
  bandwidth	
  available	
  for	
  allocaNon	
  from	
  a	
  source	
  node	
  to	
  a	
  desNnaNon	
  
node	
  
t1	
   t2	
   t3	
   t4	
   t5	
   t6	
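A minimal widest-path sketch of the modified Dijkstra's algorithm above: instead of summing an additive metric, the "distance" of a path is its bottleneck (the minimum available bandwidth along it), and the search maximizes that bottleneck. The graph representation and names are assumptions for illustration, not the FlexRes implementation.

```python
import heapq

def max_bandwidth_path(graph, src, dst):
    """graph: {node: {neighbor: available_bandwidth}}.
    Returns (bottleneck_bandwidth, path) maximizing the minimum link
    bandwidth from src to dst -- the non-additive bottleneck constraint."""
    best = {src: float("inf")}
    prev = {}
    heap = [(-float("inf"), src)]          # max-heap via negated values
    while heap:
        bw, u = heapq.heappop(heap)
        bw = -bw
        if u == dst:
            break
        if bw < best.get(u, 0):
            continue                        # stale heap entry
        for v, cap in graph[u].items():
            cand = min(bw, cap)             # bottleneck of the extended path
            if cand > best.get(v, 0):
                best[v] = cand
                prev[v] = u
                heapq.heappush(heap, (-cand, v))
    if dst not in best:
        return 0, []
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return best[dst], path[::-1]

if __name__ == "__main__":
    # Available bandwidth while reservations 1 and 2 are active (cf. the example graphs).
    g = {"A": {"B": 0, "C": 100}, "B": {"A": 0, "D": 100},
         "C": {"A": 100, "D": 400}, "D": {"B": 100, "C": 400}}
    print(max_bandwidth_path(g, "A", "D"))   # -> (100, ['A', 'C', 'D'])
```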
  
Analogous Example
•  A vehicle is travelling from city A to city B
•  There are multiple cities between A and B connected with separate highways.
•  Each highway has a specific speed limit (maximum bandwidth)
•  But we need to reduce our speed if there is a high traffic load on the road
•  We know the load on each highway for every time period (active reservations)
•  The first question is which path the vehicle should follow in order to reach city B from city A as early as possible (earliest completion)
•  Or, we can delay our journey and start later if the total travel time would be reduced. The second question is to find the route, along with the starting time, for the shortest travel duration (shortest duration)
39
Advance bandwidth reservation: we have to set the speed limit before starting and cannot change it during the journey
	
  
Time steps
•  Time steps between t1 and t13
[Timeline: Reservations 1-3 over t1..t13; the boundaries t1, t4, t6, t7, t9, t12, t13 define the time steps ts1..ts4, labelled by the reservations active in each step (Res 1; Res 1,2; Res 2; Res 3)]
Max (2r+1) time steps, where r is the number of reservations
40
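A small sketch of the time-step construction above: the distinct start/end times of the committed reservations inside the search interval split it into at most 2r + 1 steps. Names and the tuple representation are illustrative assumptions.

```python
def time_steps(reservations, t_earliest, t_latest):
    """reservations: iterable of (t_start, t_end) tuples.
    Returns consecutive (begin, end) time steps covering the search
    interval, split at every reservation boundary -- at most 2r + 1 steps."""
    points = {t_earliest, t_latest}
    for t_start, t_end in reservations:
        if t_earliest < t_start < t_latest:
            points.add(t_start)
        if t_earliest < t_end < t_latest:
            points.add(t_end)
    ordered = sorted(points)
    return list(zip(ordered, ordered[1:]))

if __name__ == "__main__":
    # Reservation 1: t1-t6, Reservation 2: t4-t7, Reservation 3: t9-t12
    # (integers stand in for t1..t13).
    res = [(1, 6), (4, 7), (9, 12)]
    print(time_steps(res, 1, 13))
    # -> [(1, 4), (4, 6), (6, 7), (7, 9), (9, 12), (12, 13)]
```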
  
Static Graphs
[One static snapshot graph per time step over nodes A, B, C, D, showing the bandwidth still available on each link:
  G(ts1) (t1-t4, Res 1 active):    0 Mbps, 100 Mbps, 800 Mbps, 500 Mbps, 300 Mbps
  G(ts2) (t4-t6, Res 1,2 active):  0 Mbps, 100 Mbps, 400 Mbps, 100 Mbps, 300 Mbps
  G(ts3) (t6-t7, Res 2 active):    900 Mbps, 1000 Mbps, 400 Mbps, 100 Mbps, 300 Mbps
  G(ts4) (t7-t9, no reservations): 900 Mbps, 1000 Mbps, 800 Mbps, 500 Mbps, 300 Mbps]
41
  
Time Windows
[Consecutive time steps are combined into time windows; each link of the window graph carries the bottleneck (minimum) of the corresponding static graphs:
  G(tw) = G(ts1) x G(ts2) for tw = ts1+ts2 (t1-t6, Res 1,2):  0 Mbps, 100 Mbps, 400 Mbps, 100 Mbps, 300 Mbps
  G(tw) = G(ts3) x G(ts4) for tw = ts3+ts4 (t6-t9, Res 2):    900 Mbps, 1000 Mbps, 400 Mbps, 100 Mbps, 300 Mbps]
Max (s x (s + 1))/2 time windows, where s is the number of time steps
Bottleneck constraint
42
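A sketch of the window combination above: a time window's graph takes, per link, the minimum available bandwidth over the time steps it spans (the bottleneck constraint, not an additive one). The per-link dictionary representation and link labels are assumptions.

```python
def combine_steps(step_graphs):
    """step_graphs: list of {link: available_bandwidth} dicts, one per time step.
    The resulting time-window graph carries, for every link, the minimum
    availability across the combined steps (bottleneck, not additive)."""
    window = dict(step_graphs[0])
    for g in step_graphs[1:]:
        for link, bw in g.items():
            window[link] = min(window[link], bw)
    return window

if __name__ == "__main__":
    # G(ts1) and G(ts2) from the static-graph example (link labels are illustrative).
    g_ts1 = {"A-B": 0, "B-D": 100, "C-D": 800, "A-C": 500, "B-C": 300}
    g_ts2 = {"A-B": 0, "B-D": 100, "C-D": 400, "A-C": 100, "B-C": 300}
    print(combine_steps([g_ts1, g_ts2]))
    # -> {'A-B': 0, 'B-D': 100, 'C-D': 400, 'A-C': 100, 'B-C': 300}
```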
  
Time Window List (special data structures)

Time windows list:
  [now .. infinite]

New reservation: reservation 1, start t1, end t10:
  [now .. t1] [t1 .. t10: Res 1] [t10 .. infinite]

New reservation: reservation 2, start t12, end t20:
  [now .. t1] [t1 .. t10: Res 1] [t10 .. t12] [t12 .. t20: Res 2] [t20 .. infinite]
43
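A compact sketch of the interval-splitting behaviour shown above: the list starts as a single [now, infinite) window, and each committed reservation splits the windows it overlaps and tags the overlapping parts. The list/tuple representation is an assumption, not the actual data structure used in the implementation.

```python
INF = float("inf")

def add_reservation(windows, res_id, t_start, t_end):
    """windows: ordered list of (begin, end, set_of_reservation_ids).
    Splits any window that overlaps [t_start, t_end) and tags the
    overlapping part with the new reservation id."""
    out = []
    for begin, end, active in windows:
        if end <= t_start or t_end <= begin:       # no overlap, keep as-is
            out.append((begin, end, active))
            continue
        if begin < t_start:                        # part before the reservation
            out.append((begin, t_start, active))
        out.append((max(begin, t_start), min(end, t_end), active | {res_id}))
        if t_end < end:                            # part after the reservation
            out.append((t_end, end, active))
    return out

if __name__ == "__main__":
    windows = [(0, INF, set())]                    # "now" is 0 here
    windows = add_reservation(windows, "Res 1", 1, 10)
    windows = add_reservation(windows, "Res 2", 12, 20)
    for w in windows:
        print(w)
    # (0, 1, set()) (1, 10, {'Res 1'}) (10, 12, set()) (12, 20, {'Res 2'}) (20, inf, set())
```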
  
Careful software design makes the implementation fast and efficient

Performance
•  max-bandwidth path ~ O(n^2), where n is the number of nodes in the topology graph
•  In the worst case, we may need to search all time windows, (s x (s + 1))/2, where s is the number of time steps.
•  If there are r committed reservations in the search period, there can be a maximum of 2r + 1 different time steps in the worst case.
•  Overall, the worst-case complexity is bounded by O(r^2 n^2)
Note: r is relatively very small compared to the number of nodes n
44
  
Example
Reservation 1: (time t1, t6)   A -> B -> D  (900Mbps)
Reservation 2: (time t4, t7)   A -> C -> D  (400Mbps)
Reservation 3: (time t9, t12)  A -> B -> D  (700Mbps)
[Topology: A, B, C, D; link capacities 800Mbps, 900Mbps, 500Mbps, 1000Mbps, 300Mbps; timeline t1..t13 with Reservations 1-3]

Request: from A to D (earliest completion)
  max bandwidth = 200Mbps, volume = 200Mbps x 4 time slots
  earliest start = t1, latest finish = t13
45
  
Search Order - Time Windows
[Timeline: time steps bounded by t1, t4, t6, t7, t9, t12, t13, labelled by the active reservations (Res 1; Res 1,2; Res 2; Res 3); candidate time windows searched: t1-t4, t4-t6, t1-t6, t6-t7, t4-t7, t1-t7, t7-t9, t6-t9, t4-t9, t1-t9]

Max bandwidth from A to D:
  1.  900Mbps  (3)
  2.  100Mbps  (2)
  3.  100Mbps  (5)
  4.  900Mbps  (1)
  5.  100Mbps  (3)
  6.  100Mbps  (6)
  7.  900Mbps  (2)
  8.  900Mbps  (3)
  9.  100Mbps  (5)
  10. 100Mbps  (8)

Reservation: (A to D) (100Mbps) start=t1 end=t9
46
  
Search Order - Time Windows
Shortest duration?
[Timeline as before; windows searched while Reservation 3 is active: t9-t12, t12-t13, t9-t13]

Max bandwidth from A to D:
  1.  200Mbps  (3)
  2.  900Mbps  (1)
  3.  200Mbps  (4)

Reservation: (A to D) (200Mbps) start=t9 end=t13

•  from A to D, max bandwidth = 200Mbps
   volume = 175Mbps x 4 time slots
   earliest start = t1, latest finish = t13

   earliest completion: (A to D) (100Mbps) start=t1 end=t8
   shortest duration:   (A to D) (200Mbps) start=t9 end=t12.5
47
  
Source > Network > Destination
[Topology: A, B, C, D with link capacities 800Mbps, 900Mbps, 500Mbps, 1000Mbps, 300Mbps; end hosts n1, n2]
Now we have multiple requests
48
  
With start/end times
•  Each transfer request has start and end times
•  n transfer requests are given (each request has a specific amount of profit)
•  The objective is to maximize the profit
•  If the profit is the same for each request, then the objective is to maximize the number of jobs in a given time period

•  Unsplittable Flow Problem:
   •  An undirected graph
   •  Route demand from source(s) to destination(s) and maximize/minimize the total profit/cost
49
  
	
  The	
  online	
  scheduling	
  method	
  here	
  is	
  inspired	
  from	
  Gale-­‐Shapley	
  algorithm	
  (also	
  
known	
  as	
  stable	
  marriage	
  problem)	
  
Methodology	
  
•  Displace	
  other	
  jobs	
  to	
  open	
  space	
  for	
  the	
  new	
  request	
  
•  	
  we	
  can	
  shia	
  max	
  n	
  jobs?	
  
•  Never	
  accept	
  a	
  job	
  if	
  it	
  causes	
  other	
  commi3ed	
  jobs	
  to	
  break	
  their	
  
criteria	
  
•  Planning	
  ahead	
  (gives	
  opportunity	
  for	
  co-­‐allocaNon)	
  
•  Gives	
  a	
  polynomial	
  approximaNon	
  algorithm	
  
•  The	
  preference	
  converts	
  the	
  UFP	
  problem	
  into	
  Dijkstra	
  path	
  
search	
  
•  UNlizes	
  Nme	
  windows/Nme	
  steps	
  for	
  ranking	
  (be3er	
  than	
  earliest	
  
deadline	
  first)	
  
•  Earliest	
  compleNon	
  +	
  shortest	
  duraNon	
  
•  Minimize	
  concurrency	
  	
  
•  Even	
  random	
  ranking	
  would	
  work	
  (relaxaNon	
  in	
  an	
  NP-­‐hard	
  problem	
  
50	
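A toy sketch of the admission idea above, not the actual SchedSim algorithm: each new request is placed at its earliest feasible start; if none fits, an already-committed job may be shifted within its own window (never breaking its criteria) to open space, and the request is rejected only if no such displacement helps. Everything here is a simplified assumption, with a single bottleneck link and discrete time slots.

```python
CAPACITY = 1000          # single bottleneck link, Mbps (illustrative)
HORIZON = 16             # discrete time slots in the search period

def usage(placements):
    # placements: list of (job, start); job = (bw, duration, earliest, latest).
    slots = [0] * HORIZON
    for (bw, dur, _, _), start in placements:
        for t in range(start, start + dur):
            slots[t] += bw
    return slots

def fits(placements, job, start):
    bw, dur, earliest, latest = job
    if start < earliest or start + dur > latest:
        return False
    slots = usage(placements)
    return all(slots[t] + bw <= CAPACITY for t in range(start, start + dur))

def admit(placements, job):
    """Prefer the earliest feasible start (earliest completion).  If none exists,
    try shifting one committed job within its own window -- never breaking its
    criteria -- to open space for the new request; otherwise reject."""
    bw, dur, earliest, latest = job
    for start in range(earliest, latest - dur + 1):
        if fits(placements, job, start):
            placements.append((job, start))
            return True
    for i, (other, old_start) in enumerate(placements):
        rest = placements[:i] + placements[i + 1:]
        for new_start in range(other[2], other[3] - other[1] + 1):
            if new_start == old_start or not fits(rest, other, new_start):
                continue
            trial = rest + [(other, new_start)]
            for start in range(earliest, latest - dur + 1):
                if fits(trial, job, start):
                    placements[:] = trial + [(job, start)]
                    return True
    return False

if __name__ == "__main__":
    placements = []
    requests = [(800, 4, 0, 8), (800, 4, 4, 12), (400, 4, 0, 8)]  # (Mbps, slots, earliest, latest)
    for req in requests:
        print(req, "accepted" if admit(placements, req) else "rejected")
    print(placements)   # the second job is shifted to make room for the third
```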
  
 	
  	
  	
  
51	
  
Recall Time Windows
[Timeline: time steps bounded by t1, t4, t6, t7, t9, t12, t13, labelled by the active reservations (Res 1; Res 1,2; Res 2; Res 3); candidate time windows: t1-t4, t4-t6, t1-t6, t6-t7, t4-t7, t1-t7, t7-t9, t6-t9, t4-t9, t1-t9]

Max bandwidth from A to D:
  1.  900Mbps  (3)
  2.  100Mbps  (2)
  3.  100Mbps  (5)
  4.  900Mbps  (1)
  5.  100Mbps  (3)
  6.  100Mbps  (6)
  7.  900Mbps  (2)
  8.  900Mbps  (3)
  9.  100Mbps  (5)
  10. 100Mbps  (8)

Reservation: (A to D) (100Mbps) start=t1 end=t9
52
  
Test
53
In real life, the number of nodes and the number of reservations in a given search interval are limited
See the AINA'13 paper for results + comparison with different preference metrics
Autonomic Provisioning System
•  Generate constraints automatically (without user input)
   •  Volume (elephant flow?)
   •  True deadline, if applicable
   •  End-host resource availability
   •  Burst rate (fixed bandwidth, variable bandwidth)
•  Update constraints according to feedback and monitoring
•  Minimize operational cost
•  Alternative to manual traffic engineering

What is the incentive to make correct reservations?
54
[Diagram: Experimental facility A, Data Center 1, Data Center 2, and Data node B (web access), connected over a wide-area SDN]

* (1) Experimental facility A generates 30TB of data every day, and it needs to be stored in data center 2 before the next run, since local disk space is limited
* (2) There is a reservation made between data centers 1 and 2. It is used to replicate data files, 1PB total size, when new data is available in data center 2
* (3) New results are published at data node B; we expect high traffic to download new simulation files for the next couple of months
55
  
Example
•  The experimental facility periodically transfers data (i.e., every night)
•  Data replication happens occasionally, and it will take a week to move 1PB of data. It could get delayed a couple of hours with no harm
•  Wide-area download traffic will increase gradually; most of the traffic will be during the day.
•  We can dynamically increase the preference for download traffic in the mornings, give high priority to transferring data from the facility at night, and use the rest of the bandwidth for data replication (and allocate some bandwidth to confirm that it would finish within a week, as usual)
56
  
Virtual Circuit Reservation Engine
Autonomic provisioning system + monitoring

Reservation Engine:
–  Select optimal path/time/bandwidth
–  Maximize the number of admitted requests
–  Increase overall system utilization and network efficiency
–  Dynamically update the selected routing path for network efficiency
–  Modify existing reservations dynamically to open space/time for new requests
57
  
THANK YOU

Any questions/comments?

Mehmet Balman    mehmet@balman.info
http://balman.info
58
PetaShare + Stork Data Scheduler
59
Aggregation in the Data Path:
Advance Buffer Cache in the Petafs and Petashell clients, aggregating I/O requests to minimize the number of network messages
Adaptive Tuning + Advanced Buffer
60
Adaptive Tuning for Bulk Transfer
Buffer Cache for Remote I/O
  

Bio Medical Waste Management Guideliness 2023 ppt.pptxBio Medical Waste Management Guideliness 2023 ppt.pptx
Bio Medical Waste Management Guideliness 2023 ppt.pptx
 
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDSTYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
 

Network-aware Data Management for High Throughput Flows (Akamai, Cambridge, MA, 2015)

• 7. VSAN performance work in a nutshell. Observer image: blog.vmware.com. Every write operation needs to go over the network (and the network is not free). Each layer (cache, disk, object management, etc.) needs resources (CPU, memory). Resource limitations vs. latency effect. Needs to support thousands of VMs. Placement of objects: which host? Which disk/SSD in the host? What if there are failures or migrations, and if we need to rebalance?
• 8. VVOL: virtual volumes. VVOL image: blog.vmware.com. Offloading control operations to the storage array: powerOn, powerOff, delete, clone.
• 9. VVOL performance work. Effect of the latency in the control path: linked clone vs. VVOL clones. (Diagram: vSphere host, storage, VASA VP; data path and control path.) Optimize service latencies: batching (disklib), use concurrent operations.
• 10. Internet Modeling. My first real paper was on Internet topology: collecting data from traceroute gateways and analyzing outdegree, indegree, diameter, and reachable set.
• 11. Outline. VSAN + VVOL Storage Performance in Virtualized Environments. Data Streaming in High-bandwidth Networks: Climate100 (Advanced Networking Initiative and 100Gbps demo), MemzNet (memory-mapped network zero-copy channels), core affinity and end-system tuning in high-throughput flows. Network Reservation and Online Scheduling (QoS): FlexRes (a flexible network reservation algorithm), SchedSim (online scheduling with advance provisioning).
• 12. 100Gbps networking has finally arrived! Applications' perspective: increasing the bandwidth is not sufficient by itself; we need careful evaluation of high-bandwidth networks from the applications' perspective. In the 1Gbps-to-10Gbps transition (10 years ago), applications did not run 10 times faster just because more bandwidth was available.
• 13. ANI 100Gbps Demo. 100Gbps demo by ESnet and Internet2. Application design issues and host tuning strategies to scale to 100Gbps rates. Visualization of remotely located data (cosmology). Data movement of large datasets with many files (climate analysis).
• 14. Earth System Grid Federation (ESGF): over 2,700 sites, 25,000 users. IPCC Fifth Assessment Report (AR5): 2PB. IPCC Fourth Assessment Report (AR4): 35TB. Remote data analysis; bulk data movement.
• 15. Application's Perspective: Climate Data Analysis.
• 16. The lots-of-small-files problem! File-centric tools? (Diagram: FTP and RPC exchanges: request a file / send file, request data / send data.) Keep the network pipe full; we want out-of-order and asynchronous send/receive.
• 17. Many Concurrent Streams. (a) Total throughput vs. the number of concurrent memory-to-memory transfers; (b) interface traffic, packets per second (blue) and bytes per second, over a single NIC with different numbers of concurrent transfers. Three hosts, each with 4 available NICs, and a total of 10 10Gbps NIC pairs were used to saturate the 100Gbps pipe in the ANI Testbed. 10 data movement jobs, each corresponding to a NIC pair, were started simultaneously at source and destination. Each peak represents a different test; 1, 2, 4, 8, 16, 32, and 64 concurrent streams per job were initiated for 5-minute intervals (e.g., when the concurrency level is 4, there are 40 streams in total).
• 18. Effects of many concurrent streams. ANI testbed at 100Gbps (10x10 NICs, three hosts): interrupts per CPU vs. the number of concurrent transfers [1, 2, 4, 8, 16, 32, 64 concurrent jobs, 5-minute intervals]; TCP buffer size is 50MB.
• 19. Analysis of Core Affinities (NUMA effect). Nathan Hanford et al., NDM'13. Sandy Bridge architecture; receive process.
• 20. Analysis of Core Affinities (NUMA effect), continued. Nathan Hanford et al., NDM'14.
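The NDM'13/'14 studies map flows to specific cores; as a rough illustration of the mechanism involved (not the authors' actual test harness), the sketch below pins a receive loop to one core with Linux's scheduler-affinity call. The port and core id are arbitrary placeholders.

```python
# Minimal sketch, Linux-only: pin the receiving process to a single core before
# doing socket reads, so the flow-to-core (and NUMA-node) mapping stays fixed.
import os
import socket

def pin_to_core(core_id):
    # Restrict this process to one core; on a NUMA host, picking a core on the
    # socket closest to the NIC is usually what matters for throughput.
    os.sched_setaffinity(0, {core_id})

def receive_loop(port=5001, core_id=2):
    pin_to_core(core_id)
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", port))
    srv.listen(1)
    conn, _ = srv.accept()
    total = 0
    while True:
        chunk = conn.recv(1 << 20)   # 1MB reads
        if not chunk:
            break
        total += len(chunk)
    print("received", total, "bytes on core", core_id)
```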
• 21. 100Gbps demo environment. RTT: Seattle – NERSC 16ms; NERSC – ANL 50ms; NERSC – ORNL 64ms.
• 22. Framework for the Memory-mapped Network Channel. Synchronization mechanism for RoCE; keep the pipe full for remote analysis.
• 23. Moving climate files efficiently.
• 24. Advantages: decoupling I/O and network operations into a front-end (I/O processing) and a back-end (networking layer); not limited by the characteristics of the file sizes; an on-the-fly tar approach, bundling and sending many files together; dynamic data channel management. Can increase/decrease the parallelism level both in the network communication and in the I/O read/write operations, without closing and reopening the data channel connection (as is done in regular FTP variants). MemzNet is not file-centric; bookkeeping information is embedded inside each block.
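As a sketch of the decoupling idea only (this is not MemzNet's code, and the header layout is invented for illustration), front-end threads below read files into fixed-size blocks that carry their own bookkeeping, while back-end threads drain a shared bounded queue and send blocks in whatever order they become available:

```python
import queue
import struct
import threading

BLOCK_SIZE = 4 * 1024 * 1024            # 4MB payload per block
HEADER = struct.Struct("!IQI")          # file_id, file offset, payload length

def frontend(file_id, path, q):
    # Read one file and enqueue self-describing blocks (bookkeeping in the header).
    with open(path, "rb") as f:
        offset = 0
        while True:
            payload = f.read(BLOCK_SIZE)
            if not payload:
                break
            q.put(HEADER.pack(file_id, offset, len(payload)) + payload)
            offset += len(payload)

def backend(q, send):
    # Drain blocks and push them to the network; order does not matter.
    while True:
        block = q.get()
        if block is None:               # sentinel: no more blocks
            break
        send(block)                     # e.g. a connected socket's sendall

def stream(paths, send, senders=4):
    q = queue.Queue(maxsize=256)        # bounded queue caps memory use
    readers = [threading.Thread(target=frontend, args=(i, p, q))
               for i, p in enumerate(paths)]
    writers = [threading.Thread(target=backend, args=(q, send))
               for _ in range(senders)]
    for t in readers + writers:
        t.start()
    for t in readers:
        t.join()
    for _ in writers:
        q.put(None)
    for t in writers:
        t.join()
```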
• 25. MemzNet's architecture for data streaming.
• 26. 100Gbps Demo. CMIP3 data (35TB) from the GPFS filesystem at NERSC. Block size 4MB; each block's data section was aligned according to the system page size. 1GB cache at both the client and the server. At NERSC, 8 front-end threads on each host for reading data files in parallel. At ANL/ORNL, 4 front-end threads for processing received data blocks. 4 parallel TCP streams (four back-end threads) were used for each host-to-host connection.
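The alignment detail can be pictured with a small sketch (the sizes mirror the configuration above; this is not the demo code): an anonymous mmap region is page-aligned by construction, and a 4MB block rounded up to the page size keeps each block's data section on page boundaries.

```python
import mmap
import os

PAGE = os.sysconf("SC_PAGE_SIZE")                        # typically 4096 bytes
BLOCK = ((4 * 1024 * 1024 + PAGE - 1) // PAGE) * PAGE    # 4MB rounded to a page multiple
CACHE = 1024 * 1024 * 1024                               # 1GB cache region

buf = mmap.mmap(-1, CACHE)                 # anonymous mapping, page-aligned
blocks = CACHE // BLOCK                    # how many blocks fit in the cache
first_block = memoryview(buf)[:BLOCK]      # zero-copy view of one block
```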
• 27. MemzNet's performance. TCP buffer size is set to 50MB. (Charts: MemzNet vs. GridFTP in the 100Gbps demo and on the ANI Testbed.)
• 28. Challenge? High bandwidth brings new challenges! We need a substantial amount of processing power and the involvement of multiple cores to fill a 40Gbps or 100Gbps network. Fine-tuning, both in the network and application layers, is needed to take advantage of the higher network capacity. Incremental improvement in current tools? We cannot expect every application to tune and improve every time we change the link technology or speed.
• 29. MemzNet: Memory-mapped Network Channel; high-performance data movement. MemzNet is an initial effort to put a new layer between the application and the transport layer. The main goal is to define a network channel so applications can directly use it without the burden of managing/tuning the network communication. Tech report: LBNL-6177E.
• 30. MemzNet = new execution model. Luigi Rizzo's netmap proposes a new API to send/receive data over the network. RDMA programming model: MemzNet as a memory-management component. IX: Data Plane OS (Adam Belay et al. at Stanford; similar to MemzNet's model). mTCP (event-based; replaces send/receive at user level). Tanenbaum et al. on minimizing context switches: proposing to use MONITOR/MWAIT for synchronization.
• 31. Outline. VSAN + VVOL Storage Performance in Virtualized Environments. Data Streaming in High-bandwidth Networks: Climate100 (Advanced Networking Initiative and 100Gbps demo), MemzNet (memory-mapped network zero-copy channels), core affinity and end-system tuning in high-throughput flows. Network Reservation and Online Scheduling (QoS): FlexRes (a flexible network reservation algorithm), SchedSim (online scheduling with advance provisioning).
• 32. Problem Domain: ESnet's OSCARS. (ESnet topology map: international R&E peerings and DOE lab sites such as LBNL, ANL, ORNL, FNAL, and BNL connected through hubs like Sunnyvale, Chicago, and New York.) Connecting experimental facilities and supercomputing centers. On-Demand Secure Circuits and Advance Reservation System. Guaranteed bandwidth between collaborating institutions by delivering network-as-a-service. Co-allocation of storage and network resources (SRM: Storage Resource Manager). OSCARS provides yes/no answers to a reservation request for (bandwidth, start_time, end_time). End-to-end reservation: storage + network.
• 33. Reservation Request (between edge routers). We need to ensure availability of the requested bandwidth from source to destination for the requested time interval: R = {n_source, n_destination, M_bandwidth, t_start, t_end}, i.e., source/destination end-points, requested bandwidth, and start/end times. Committed reservations between t_start and t_end are examined. The shortest path from source to destination is calculated based on the engineering metric on each link, and a bandwidth-guaranteed path is set up to commit and eventually complete the reservation request for the given time period.
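A plain-data rendering of the request tuple makes the notation concrete (field names follow the slide; this is only a convenient container, not OSCARS' actual API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReservationRequest:
    source: str          # n_source: edge router at the source site
    destination: str     # n_destination: edge router at the destination site
    bandwidth_mbps: int  # M_bandwidth: requested bandwidth
    t_start: float       # requested start time
    t_end: float         # requested end time

# e.g. 500Mbps from A to D between t1 and t3
req = ReservationRequest("A", "D", 500, 1.0, 3.0)
```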
• 34. Reservation. Components (graph): node (router), port, link (connecting two ports), engineering metric (~latency), maximum bandwidth (capacity). Reservation: source, destination, path, time. Example: (time t1, t3) A -> B -> D (900Mbps); (time t2, t3) A -> C -> D (400Mbps); (time t4, t5) A -> B -> D (800Mbps). (Figure: topology A, B, C, D with link capacities 800Mbps, 900Mbps, 500Mbps, 1000Mbps, 300Mbps, and reservations 1-3 on a timeline t1-t5.)
• 35. Example. Active reservations: reservation 1 (time t1, t3) A -> B -> D (900Mbps); reservation 2 (time t1, t3) A -> C -> D (400Mbps); reservation 3 (time t4, t5) A -> B -> D (800Mbps). (Figure: per-link available/reserved bandwidth and capacity.) For (time t1, t2): A to D (600Mbps) NO; A to D (500Mbps) YES.
• 36. Example. Same active reservations. (Figure: per-link available/reserved bandwidth and capacity.) For (time t1, t3): A to D (500Mbps) NO; A to C (500Mbps) NO (not max-flow!).
• 37. Alternative Approach: Flexible Reservations. If the requested bandwidth cannot be guaranteed, the client ends up in trial-and-error until it gets an available reservation, and the client is not given other possible options. How can we enhance the OSCARS reservation system? Be flexible: submit constraints and let the system suggest possible reservation options satisfying the given requirements. Rs' = {n_source, n_destination, M_MAXbandwidth, D_dataSize, t_earliestStart, t_latestEnd}. The reservation engine finds the reservation R = {n_source, n_destination, M_bandwidth, t_start, t_end} for the earliest completion or for the shortest duration, where M_bandwidth ≤ M_MAXbandwidth and t_earliestStart ≤ t_start < t_end ≤ t_latestEnd.
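Once a usable bandwidth inside an allowed window is known, turning the flexible request Rs' into a concrete R is simple arithmetic: the duration follows from the data size divided by the bandwidth. The sketch below only illustrates that step (it is not the FlexRes engine, and the unit conventions are assumptions):

```python
def earliest_completion(data_size_gb, max_bandwidth_mbps, usable_bandwidth_mbps,
                        t_earliest_start, t_latest_end):
    # Use the largest bandwidth we are allowed to take on the chosen path.
    bw = min(max_bandwidth_mbps, usable_bandwidth_mbps)
    duration_s = data_size_gb * 8 * 1000 / bw      # GB -> megabits -> seconds at bw Mbps
    t_end = t_earliest_start + duration_s
    if t_end > t_latest_end:
        return None                                # does not fit in the allowed window
    return (t_earliest_start, t_end, bw)           # a concrete (t_start, t_end, M_bandwidth)
```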
• 38. Bandwidth Allocation (time-dependent). Modified Dijkstra's algorithm (max available bandwidth): the bottleneck constraint is not additive (whereas the QoS constraint in shortest-path routing is additive). (Figure: the maximum bandwidth available for allocation from a source node to a destination node across time steps t1-t6.)
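The max-available-bandwidth search is the classic widest-path variant of Dijkstra: the value propagated to a node is the bottleneck (minimum) capacity along the path, and it is maximized instead of an additive cost being minimized. A minimal sketch over one static graph (the dict layout and the example capacities are illustrative):

```python
import heapq

def max_bandwidth_path(graph, src, dst):
    # graph[u][v] = available bandwidth (Mbps) on the link u -> v
    best = {src: float("inf")}
    heap = [(-best[src], src)]                  # max-heap by negating widths
    while heap:
        width, u = heapq.heappop(heap)
        width = -width
        if u == dst:
            return width
        if width < best.get(u, 0):
            continue                            # stale entry
        for v, cap in graph.get(u, {}).items():
            w = min(width, cap)                 # bottleneck, not additive
            if w > best.get(v, 0):
                best[v] = w
                heapq.heappush(heap, (-w, v))
    return 0

g = {"A": {"B": 900, "C": 500}, "B": {"D": 1000}, "C": {"D": 300}}
print(max_bandwidth_path(g, "A", "D"))          # 900 via A -> B -> D
```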
• 39. Analogous example: a vehicle travelling from city A to city B. There are multiple cities between A and B connected with separate highways. Each highway has a specific speed limit (maximum bandwidth), but we need to reduce our speed if there is a high traffic load on the road, and we know the load on each highway for every time period (active reservations). The first question is which path the vehicle should follow in order to reach city B from city A as early as possible (earliest completion). Or, we can delay our journey and start later if the total travel time would be reduced; the second question is to find the route, along with the starting time, for the shortest travel duration (shortest duration). Advance bandwidth reservation: we have to set the speed limit before starting and cannot change it during the journey.
• 40. Time steps. Time steps between t1 and t13. (Figure: reservations 1-3 on the timeline induce time steps ts1-ts4 over the boundaries t1, t4, t6, t7, t9, t12, t13.) There are at most (2r + 1) time steps, where r is the number of reservations.
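Computing the time steps amounts to collecting the reservation boundaries inside the search interval: with r committed reservations there are at most 2r boundary points, hence at most 2r + 1 steps. A small sketch, using the reservation times from the example slides that follow:

```python
def time_steps(search_start, search_end, reservations):
    # reservations: iterable of (t_start, t_end) pairs overlapping the interval
    points = {search_start, search_end}
    for t_start, t_end in reservations:
        points.update((t_start, t_end))
    cuts = sorted(p for p in points if search_start <= p <= search_end)
    return list(zip(cuts, cuts[1:]))            # consecutive [t_i, t_{i+1}) steps

# Reservations (t1,t6), (t4,t7), (t9,t12) inside [t1, t13] give the steps
# t1-t4, t4-t6, t6-t7, t7-t9, t9-t12, t12-t13.
print(time_steps(1, 13, [(1, 6), (4, 7), (9, 12)]))
```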
• 41. Static graphs. (Figure: one static graph per time step, G(ts1) through G(ts4), each showing the bandwidth still available on the links of the A, B, C, D topology during that step, given the reservations active in it.)
• 42. Time windows. (Figure: time windows built from consecutive time steps.) The graph of a time window is the product of the static graphs of its time steps under the bottleneck constraint, e.g., G(tw) = G(ts1) x G(ts2) for tw = ts1 + ts2, and G(tw) = G(ts3) x G(ts4) for tw = ts3 + ts4. There are at most (s x (s + 1))/2 time windows, where s is the number of time steps.
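Combining the static graphs of consecutive time steps into a time-window graph is an element-wise bottleneck: for every link, keep the minimum bandwidth available across the steps. A minimal sketch over the same dict-of-dict graph layout as before:

```python
def combine(g1, g2):
    # Keep only links present in both graphs, with the smaller available bandwidth.
    out = {}
    for u in g1:
        for v, cap in g1[u].items():
            if v in g2.get(u, {}):
                out.setdefault(u, {})[v] = min(cap, g2[u][v])
    return out

def window_graph(step_graphs):
    # G(tw) for a window spanning several consecutive time steps.
    g = step_graphs[0]
    for nxt in step_graphs[1:]:
        g = combine(g, nxt)
    return g
```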
• 43. Time Window List (special data structures). The list starts as a single segment from now to infinite. A new reservation 1 (start t1, end t10) splits it into now-t1, t1-t10 (Res 1), t10-infinite. A new reservation 2 (start t12, end t20) further splits it into now-t1, t1-t10 (Res 1), t10-t12, t12-t20 (Res 2), t20-infinite. Careful software design makes the implementation fast and efficient.
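One way to picture the data structure is a sorted list of boundary points from now to infinite that gets split whenever a new reservation's start or end falls inside an existing segment; the sketch below captures only that splitting behaviour, not the production implementation:

```python
import bisect

class TimeWindowList:
    def __init__(self, now=0.0):
        self.points = [now, float("inf")]       # segment boundaries, kept sorted

    def add_reservation(self, start, end):
        for t in (start, end):
            i = bisect.bisect_left(self.points, t)
            if i == len(self.points) or self.points[i] != t:
                self.points.insert(i, t)        # split the segment containing t

    def segments(self):
        return list(zip(self.points, self.points[1:]))

twl = TimeWindowList()
twl.add_reservation(1, 10)     # now-t1, t1-t10, t10-infinite
twl.add_reservation(12, 20)    # ..., t10-t12, t12-t20, t20-infinite
```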
• 44. Performance. A max-bandwidth path search is ~O(n^2), where n is the number of nodes in the topology graph. In the worst case, we may need to search all time windows, (s x (s + 1))/2, where s is the number of time steps. If there are r committed reservations in the search period, there can be at most 2r + 1 different time steps in the worst case. Overall, the worst-case complexity is bounded by O(r^2 n^2). Note: r is relatively very small compared to the number of nodes n.
• 45. Example. Reservation 1: (time t1, t6) A -> B -> D (900Mbps). Reservation 2: (time t4, t7) A -> C -> D (400Mbps). Reservation 3: (time t9, t12) A -> B -> D (700Mbps). (Figure: topology A, B, C, D with link capacities 800Mbps, 900Mbps, 500Mbps, 1000Mbps, 300Mbps, and the three reservations on the timeline t1-t13.) Request from A to D (earliest completion): max bandwidth = 200Mbps, volume = 200Mbps x 4 time slots, earliest start = t1, latest finish = t13.
• 46. Search Order - Time Windows. (Figure: candidate time windows t1-t6, t4-t6, t1-t4, t6-t7, t4-t7, t1-t7, t7-t9, t6-t9, t4-t9, t1-t9 and the reservations active in each.) Max bandwidth from A to D per window: 1. 900Mbps (3), 2. 100Mbps (2), 3. 100Mbps (5), 4. 900Mbps (1), 5. 100Mbps (3), 6. 100Mbps (6), 7. 900Mbps (2), 8. 900Mbps (3), 9. 100Mbps (5), 10. 100Mbps (8). Resulting reservation: (A to D) (100Mbps), start = t1, end = t9.
• 47. Search Order - Time Windows: shortest duration? (Figure: windows t9-t13, t12-t13, t9-t12 with reservation 3 active.) Max bandwidth from A to D: 1. 200Mbps (3), 2. 900Mbps (1), 3. 200Mbps (4). Reservation: (A to D) (200Mbps), start = t9, end = t13. From A to D with max bandwidth = 200Mbps and volume = 175Mbps x 4 time slots, earliest start = t1, latest finish = t13: earliest completion is (A to D) (100Mbps) start = t1, end = t8; shortest duration is (A to D) (200Mbps) start = t9, end = t12.5.
• 48. Source > Network > Destination. (Figure: the same topology with end hosts n1 and n2 attached.) Now we have multiple requests.
• 49. With start/end times. Each transfer request has start and end times; n transfer requests are given, each with a specific amount of profit. The objective is to maximize the profit; if the profit is the same for each request, the objective is to maximize the number of jobs in a given time period. Unsplittable Flow Problem: on an undirected graph, route demand from source(s) to destination(s) and maximize/minimize the total profit/cost. The online scheduling method here is inspired by the Gale-Shapley algorithm (also known as the stable marriage problem).
• 50. Methodology. Displace other jobs to open space for the new request (we can shift at most n jobs). Never accept a job if it causes other committed jobs to break their criteria. Planning ahead (gives an opportunity for co-allocation). Gives a polynomial approximation algorithm. The preference converts the UFP problem into a Dijkstra path search. Utilizes time windows/time steps for ranking (better than earliest deadline first): earliest completion + shortest duration; minimize concurrency. Even random ranking would work (relaxation in an NP-hard problem).
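One reading of the ranking step (earliest completion first, then shortest duration) is a simple sort key over candidate windows that already satisfy the request; this is only a sketch of the preference, not the scheduler itself:

```python
def rank_candidates(candidates):
    # candidates: list of (t_start, t_end, bandwidth_mbps) options for one request
    return sorted(candidates, key=lambda c: (c[1], c[1] - c[0]))

options = [(1, 9, 100), (9, 12.5, 200)]
print(rank_candidates(options)[0])   # (1, 9, 100): earliest completion wins
```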
• 51.
• 52. Recall Time Windows. (Figure: the same candidate time windows t1-t6, t4-t6, t1-t4, t6-t7, t4-t7, t1-t7, t7-t9, t6-t9, t4-t9, t1-t9 and the reservations active in each.) Max bandwidth from A to D per window: 1. 900Mbps (3), 2. 100Mbps (2), 3. 100Mbps (5), 4. 900Mbps (1), 5. 100Mbps (3), 6. 100Mbps (6), 7. 900Mbps (2), 8. 900Mbps (3), 9. 100Mbps (5), 10. 100Mbps (8). Reservation: (A to D) (100Mbps), start = t1, end = t9.
• 53. Test. In real life, the number of nodes and the number of reservations in a given search interval are limited. See the AINA'13 paper for results and a comparison with different preference metrics.
• 54. Autonomic Provisioning System. Generate constraints automatically (without user input): volume (elephant flow?), true deadline if applicable, end-host resource availability, burst rate (fixed bandwidth, variable bandwidth). Update constraints according to feedback and monitoring. Minimize operational cost. An alternative to manual traffic engineering. What is the incentive to make correct reservations?
• 55. (Figure: experimental facility A, data center 1, data center 2, and data node B (web access) connected over a wide-area SDN.) (1) Experimental facility A generates 30TB of data every day, and it needs to be stored in data center 2 before the next run, since local disk space is limited. (2) There is a reservation made between data centers 1 and 2; it is used to replicate data files, 1PB total size, when new data is available in data center 2. (3) New results are published at data node B; we expect high traffic to download new simulation files for the next couple of months.
• 56. Example. The experimental facility periodically transfers data (i.e., every night). Data replication happens occasionally, and it will take a week to move 1PB of data; it could get delayed a couple of hours with no harm. Wide-area download traffic will increase gradually, and most of the traffic will be during the day. We can dynamically increase the preference for download traffic in the mornings, give high priority to transferring data from the facility at night, and use the rest of the bandwidth for data replication (allocating some bandwidth to confirm that it would finish within a week as usual).
• 57. (Diagram: virtual circuit, reservation engine, autonomic provisioning system, monitoring.) Reservation Engine: select the optimal path/time/bandwidth; maximize the number of admitted requests; increase overall system utilization and network efficiency; dynamically update the selected routing path for network efficiency; modify existing reservations dynamically to open space/time for new requests.
• 58. THANK YOU. Any questions/comments? Mehmet Balman, mehmet@balman.info, http://balman.info
• 59. PetaShare + Stork Data Scheduler. Aggregation in the data path: an advance buffer cache in the Petafs and Petashell clients, aggregating I/O requests to minimize the number of network messages.
• 60. Adaptive Tuning + Advanced Buffer. Adaptive tuning for bulk transfer; buffer cache for remote I/O.