SpeQuloS: A QoS Service for BoT Applications Using
Best Effort Distributed Computing Infrastructures
Simon Delamare (1), Gilles Fedak (2), Derrick Kondo (3), Oleg Lodygensky (4)
(1) LIP/CNRS, Univ. Lyon, France
(2) LIP/INRIA, Univ. Lyon, France
(3) LIG/INRIA, Univ. Grenoble, France
(4) LAL/CNRS, Univ. Paris XI, France
High-Performance Parallel and Distributed Computing, 2012
S. Delamare, G. Fedak, D. Kondo and O. Lodygensky (LIP/CNRS, Univ. Lyon, France, LIP/INRIA, Univ. Lyon, France, LIG/INRIA, Univ. Grenoble, France, LAL/SpeQuloS HPDC’12 1 / 18
Introduction
BE-DCI = “Best-Effort” Distributed Computing Infrastructure
→ Large computing power at low cost, Avoid wasting resources
→ No availability guarantee
Desktop Grids
→ BOINC projects: PetaFLOPS for free
Grids used in Best-Effort mode
→ ≈ 40% of utilization in Grid5000@Lyon
Cloud “Spot” Instances
→ c1.large instance price: 0.12$/h (spot) vs. 0.32$/h (regular)
Relevant for BoT execution ...
Bag of Tasks: Set of independent tasks to compute
→ but Low QoS level
Especially compared to regular infrastructures
Performance Problem Addressed
BoT completion rate drops at the end of execution
→ Tail Effect
Figure: BoT completion ratio over time. The ideal completion time is obtained by continuing the completion rate observed at 90% of completion; the tail duration is the gap between the ideal time and the actual completion time, and the tasks finished during that gap form the tail part of the BoT.
Slowdown = (Tail Duration + Ideal Time) / Ideal Time
Measured by Slowdown:
$S = \frac{\text{RealCompletionTime}}{\text{IdealCompletionTime}}$
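The slowdown definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the ideal time is extrapolated by assuming the completion rate was constant from submission up to the 90% mark.

```python
def ideal_completion_time(time_at_ratio: float, ratio: float = 0.9) -> float:
    """Extrapolate the completion time, assuming the average completion
    rate observed up to `ratio` (here 90%) had stayed constant."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return time_at_ratio / ratio


def tail_duration(real_time: float, ideal_time: float) -> float:
    """Time spent beyond the ideal completion time."""
    return real_time - ideal_time


def tail_slowdown(real_time: float, ideal_time: float) -> float:
    """S = RealCompletionTime / IdealCompletionTime."""
    return real_time / ideal_time


# Hypothetical execution: 90% of the BoT done at t = 45, all of it at t = 100.
ideal = ideal_completion_time(45.0)
slowdown = tail_slowdown(100.0, ideal)
```

With these made-up numbers the ideal time is about 50, so the tail roughly doubles the completion time.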
Slowdown by Tail Effect
Slowdown reported on BoT execution
Figure: Fraction of executions where tail slowdown < S, against Tail Slowdown S (completion time observed divided by ideal completion time, S from 0.1 to 100), for BOINC and XWHEP.
Best 50% ⇒ S < 1.3
25% to 33% ⇒ S > 2
Worst 5% ⇒ S > 4 to 10
                    Avg. % of BoT in tail   Avg. % of time in tail
BE-DCI Trace        BOINC    XWHEP          BOINC    XWHEP
Desktop Grids       4.65     5.11           51.8     45.2
Best Effort Grids   3.74     6.40           27.4     16.5
Spot Instances      2.94     5.19           22.7     21.6
→ Caused by no more than the last 7% of the BoT
How to improve the situation ?
Better scheduling
QoS in Grid scheduling ([12], [20], [38])
→ Requires heavy modification of middleware
→ No satisfactory solution for unreliable infrastructures ([7])
Addressing the tail effect
→ e.g. in MapReduce ([3], [39]), but this requires precise information from compute nodes, which is hard to obtain in large DCIs.
Building Hybrid DCIs
Grid & Desktop Grid ([35],[36])
→ Mostly to offload Grid usage
Using Cloud computing ([10],[28],[37])
→ To address peak demands
SpeQuloS Service
→ Improving the QoS perceived by BE-DCI users
Speeding up BoT execution
Providing information on expected BoT execution time
By dynamically provisioning Cloud resources
→ Monitoring BoT execution
→ Executing the tail on the Cloud
Features:
1 Our context: existing BE-DCIs and Clouds that we do not administer, treated as black boxes
2 Interface with users: QoS requests, state of completion, prediction of remaining time
3 Careful utilization of Cloud resources, with billing & accounting of usage
Framework
SpeQuloS modules:
Information: collects QoS-related information from DGs
Oracle: strategies to use Cloud resources appropriately; QoS prediction for users
Scheduler: starts/stops Cloud resources, usage accounting
Credit System: bills Cloud usage to the user, who spends “credits” to buy Cloud resource cpu.h
Implementation
Independent modules using Python & MySQL
Supported Clouds: EC2, OpenNebula, etc.
Supported DG middleware: BOINC & XtremWeb-HEP
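To make the Credit System's role concrete, here is a minimal Python sketch of billing Cloud usage against a user's credits. It is an illustration under stated assumptions, not the SpeQuloS implementation: the class name, its methods and the exchange rate of one credit per cpu.hour are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CreditAccount:
    """Hypothetical per-user account holding credits for Cloud usage."""
    balance: float
    credits_per_cpu_hour: float = 1.0  # assumed exchange rate

    def affordable_cpu_hours(self) -> float:
        """Cloud cpu.hours the remaining credits can still buy."""
        return self.balance / self.credits_per_cpu_hour

    def bill(self, cpu_hours: float) -> float:
        """Charge Cloud usage, capped at the remaining balance.
        Returns the number of credits actually charged."""
        cost = min(cpu_hours * self.credits_per_cpu_hour, self.balance)
        self.balance -= cost
        return cost


account = CreditAccount(balance=10.0)
charged = account.bill(4.0)  # 4 cpu.hours of Cloud usage
```

Capping the charge at the balance is one possible policy; a real deployment would more likely stop Cloud workers before the credits run out.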
Cloud Provisioning Strategies
When to start Cloud resources?
At 90% of BoT completion (9C)
At 90% of BoT assignment (9A)
When the tail appears, detected by monitoring execution time variance (V)
How many Cloud resources to start (for a given amount of credits)?
Greedy: as many as possible, for 1 hour of Cloud usage (G)
Conservative: few enough that the credits last until the estimated completion time (C)
How to use Cloud resources?
Flat: Cloud workers are not differentiated from BE-DCI workers (F)
Reschedule: the scheduler reschedules tasks running on the BE-DCI to the Cloud (R)
Cloud Duplication: uncompleted tasks are duplicated to a dedicated Cloud infrastructure (D)
Experimentation Setup (1)
Simulations using availability traces from real BE-DCIs, various BoT workloads, and the BOINC and XWHEP middleware
BE-DCI availability traces:
Desktop Grids: seti, nd (SETI@Home & NotreDame traces from FTA)
Best Effort Grids: g5klyo, g5kgre (available resources in the Grid5000 Lyon & Grenoble clusters in December 2010)
Cloud Spot instances: spot10, spot100 (maximum number of instances for a renting cost of $10 or $100 per hour; fluctuates according to market price)
trace     length   mean     deviation  min     max     av. quartiles (s)   unav. quartiles (s)  avg. power  power
          (days)                                                                                (nops/s)    std. dev.
seti      120      24391    6793       15868   31092   61, 531, 5407       174, 501, 3078       1000        250
nd        413.87   180      4.129      77      501     952, 3840, 26562    640, 960, 1920       1000        250
g5klyo    31       90.573   105.4      6       226     21, 51, 63          191, 236, 480        3000        0
g5kgre    31       474.69   178.7      184     591     5, 182, 11268       23, 547, 6891        3000        0
spot10    90       82.186   3.814      29      87      4415, 5432, 17109   4162, 5034, 9976     3000        300
spot100   90       823.95   4.945      196     877     1063, 5566, 22490   383, 1906, 10274     3000        300
Experimentation Setup (2)
BoT workloads:
Category  Size                       nops / task                    Arrival time
SMALL     1000                       3600000                        0
BIG       10000                      60000                          0
RANDOM    norm(µ = 1000, σ² = 200)   norm(µ = 60000, σ² = 10000)    weib(λ = 91.98, k = 0.57)
Simulation methodology:
Reproducible executions without and with SpeQuloS
SpeQuloS credits provisioned with 10% of the BoT workload (in Cloud resource cpu.hour equivalent)
→ 25000 BoT execution traces
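As a worked example of the 10% provisioning rule, the Python sketch below converts a BoT workload into Cloud cpu.hours and takes a tenth of it. The Cloud worker power of 3000 nops/s and the rate of one credit per cpu.hour are assumptions made for illustration; the slides do not state either value.

```python
CLOUD_POWER_NOPS_PER_S = 3000.0  # assumed power of one Cloud worker


def bot_workload_cpu_hours(n_tasks: int, nops_per_task: float,
                           power: float = CLOUD_POWER_NOPS_PER_S) -> float:
    """Total BoT workload expressed in Cloud cpu.hours."""
    seconds = n_tasks * nops_per_task / power
    return seconds / 3600.0


def provisioned_credits(n_tasks: int, nops_per_task: float,
                        fraction: float = 0.10) -> float:
    """Credits provisioned as a fraction of the workload (1 credit = 1 cpu.hour assumed)."""
    return fraction * bot_workload_cpu_hours(n_tasks, nops_per_task)


small = provisioned_credits(1000, 3_600_000)  # SMALL BoT
big = provisioned_credits(10_000, 60_000)     # BIG BoT
```

Under these assumptions the SMALL BoT amounts to about 333 cpu.hours and receives about 33 credits, while the BIG BoT amounts to about 56 cpu.hours and receives about 5.6 credits.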
Strategies Comparison
Tail Removal Efficiency
→ Tail duration with SpeQuloS vs. tail duration without SpeQuloS
Figure: Fraction of BoT where tail removal efficiency > P, against Tail Removal Efficiency (percentage P), in three panels:
Flat deployment strategy: 9C-G-F, 9A-G-F, V-G-F, 9C-C-F, 9A-C-F, V-C-F
Reschedule deployment strategy: 9C-G-R, 9A-G-R, V-G-R, 9C-C-R, 9A-C-R, V-C-R
Cloud duplication deployment strategy: 9C-G-D, 9A-G-D, V-G-D, 9C-C-D, 9A-C-D, V-C-D
The best strategies are able to
Suppress the tail for 50% of executions
Halve the tail for 80% of executions
Flat (F) < Reschedule (R) & Cloud Duplication (D)
Tail Detection (V) triggers the Cloud too late
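The slides compare tail durations with and without SpeQuloS but do not spell out the formula. A plausible reading, sketched below in Python, takes the efficiency as the percentage of the original tail duration that was removed. Both the formula and the function names are assumptions.

```python
def tail_removal_efficiency(tail_without: float, tail_with: float) -> float:
    """Percentage of the tail removed: 100 means the tail is suppressed,
    50 means it is halved, 0 means no improvement (assumed definition)."""
    if tail_without <= 0:
        raise ValueError("the reference execution must have a tail")
    return 100.0 * (1.0 - tail_with / tail_without)


def fraction_above(efficiencies: list[float], p: float) -> float:
    """Fraction of executions whose efficiency exceeds P (one point of the plotted curve)."""
    return sum(e > p for e in efficiencies) / len(efficiencies)


# Hypothetical tail durations (seconds) for four paired executions.
pairs = [(100.0, 0.0), (100.0, 50.0), (200.0, 200.0), (50.0, 10.0)]
effs = [tail_removal_efficiency(without, with_) for without, with_ in pairs]
```

Evaluating `fraction_above(effs, p)` for p from 0 to 100 gives one curve of the kind plotted for each strategy combination.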
Cloud Resources Consumption
Percentage of credits spent vs. credits provisioned (= 10% of BoT workload).
10% to 25% of the provisioned credits are actually used by Cloud resources
Figure: Percentage of credits used (0 to 50) for each combination of SpeQuloS strategies: 9C-G-F, 9C-G-R, 9C-G-D, 9C-C-F, 9C-C-R, 9C-C-D, 9A-G-F, 9A-G-R, 9A-G-D, 9A-C-F, 9A-C-R, 9A-C-D, V-G-F, V-G-R, V-G-D, V-C-F, V-C-R, V-C-D.
→ ≈2.5% of BoT workload is executed on Cloud
Completion Time
Combination of strategies used: 9C-C-R
Figure: Completion time (s) on each BE-DCI (SETI, ND, G5KLYO, G5KGRE, SPOT10, SPOT100), comparing No SpeQuloS and SpeQuloS, in six panels: BOINC & SMALL BoT, BOINC & BIG BoT, BOINC & RANDOM BoT, XWHEP & SMALL BoT, XWHEP & BIG BoT, XWHEP & RANDOM BoT.
→ Up to 9x speedup
→ Depends on the middleware used and on BE-DCI volatility
Completion Time Prediction
→ Users can request a prediction at any point during BoT execution
Predicted completion time:
$t_p = \alpha \times \frac{t(r)}{r}$
Current completion ratio: r
Time elapsed since submission: t(r)
α: adjustment factor, depends on the execution environment:
DG server & middleware
Application & BoT size
→ Adjusted after each BoT execution to minimize the difference with the observed completion time
Statistical uncertainty (±x%): success rate of predictions against previous executions
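The prediction formula and the adjustment of α can be sketched in Python as follows. The slides only say that α is adjusted to minimize the difference with observed completion times. The least-squares fit used here is one possible way to do that, not necessarily the authors' method, and all function names are hypothetical. A prediction is counted as successful when the observed time falls within the stated uncertainty around it, which is also an assumption.

```python
def predict_completion_time(alpha: float, elapsed: float, ratio: float) -> float:
    """t_p = alpha * t(r) / r, for a completion ratio r in (0, 1]."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("completion ratio must be in (0, 1]")
    return alpha * elapsed / ratio


def fit_alpha(history: list[tuple[float, float, float]]) -> float:
    """Least-squares alpha from past executions (assumed fitting method).
    Each entry is (elapsed time at prediction, completion ratio, observed completion time)."""
    xs = [elapsed / ratio for elapsed, ratio, _ in history]
    ys = [actual for _, _, actual in history]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def is_successful(predicted: float, actual: float, uncertainty: float = 0.20) -> bool:
    """True if the observed time lies within predicted ± uncertainty."""
    return abs(actual - predicted) <= uncertainty * predicted


# Hypothetical learning phase: two past executions, predictions made at 50%.
alpha = fit_alpha([(50.0, 0.5, 120.0), (40.0, 0.5, 96.0)])
t_p = predict_completion_time(alpha, elapsed=50.0, ratio=0.5)
```

In this made-up history both executions finished 20% later than the naive extrapolation t(r)/r, so the fitted α comes out close to 1.2.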
Prediction Results
Completion Time Prediction:
Made at 50% of BoT execution
Uncertainty: ± 20%
α adjusted after 30 executions with the same BE-DCI, middleware and BoT workload
Success rate of predictions (%), by BoT category & middleware
          SMALL            BIG              RANDOM
BE-DCI    BOINC   XWHEP    BOINC   XWHEP    BOINC   XWHEP    Mixed
seti      100     100      100     82.8     100     87.0     94.1
nd        100     100      100     100      100     96.0     99.4
g5klyo    88.0    89.3     96.0    87.5     75      75       85.6
g5kgre    96.3    88.5     100     92.9     83.3    34.8     83.3
spot10    100     100      100     100      100     100      100
spot100   100     100      100     100      76      3.6      78.3
Mixed     97.6    96.1     99.2    93.5     89.6    65.3     90.2
→ Successful prediction in 9 cases out of 10
→ Lower results with heterogeneous BoT
→ Needs a learning phase with the same BoT (or at least the same application), executed on the same BE-DCI.
SpeQuloS Deployment in European Desktop Grid Initiative
EDGI project: bringing European Desktop Grid computing resources to scientific communities.
Conclusion
BE-DCIs: “low-cost” solution but poor QoS (tail effect)
SpeQuloS: uses Cloud resources to improve the QoS delivered to BE-DCI users
Efficiently removes the tail problem
→ Speeds up BoT execution
→ Only requires a few % of the workload to be executed on the Cloud
Enables completion time prediction for users
→ A step towards BE-DCI usability in the computing landscape?
Future work:
Better strategies to anticipate problems (tail effect)
Analysis of user feedback from SpeQuloS deployments
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Kürzlich hochgeladen (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

SpeQuloS: A QoS Service for BoT Applications Using Best Effort Distributed Computing Infrastructures

  • 1. SpeQuloS: A QoS Service for BoT Applications Using Best Effort Distributed Computing Infrastructures
       Simon Delamare (1), Gilles Fedak (2), Derrick Kondo (3), Oleg Lodygensky (4)
       (1) LIP/CNRS, Univ. Lyon, France
       (2) LIP/INRIA, Univ. Lyon, France
       (3) LIG/INRIA, Univ. Grenoble, France
       (4) LAL/CNRS, Univ. Paris XI, France
       High-Performance Parallel and Distributed Computing (HPDC’12), 2012
  • 2. Introduction
       BE-DCI = “Best-Effort” Distributed Computing Infrastructure
         → Large computing power at low cost; avoids wasting resources
         → No availability guarantee
       Desktop Grids
         → BOINC projects: PetaFLOPS for free
       Grids used in Best-Effort mode
         → ≈ 40% utilization in Grid5000@Lyon
       Cloud “Spot” instances
         → c1.large instance price: $0.12/h (spot) vs. $0.32/h (regular)
       Relevant for BoT execution (Bag of Tasks: a set of independent tasks to compute) ...
         → ... but low QoS level, especially compared to regular infrastructures
  • 3. Performance Problem Addressed
       BoT completion rate drops at the end of execution → Tail Effect
       [Figure: BoT completion ratio vs. time. The ideal time is obtained by continuing the completion rate observed at 90% of completion; the tail duration is the gap between the ideal time and the actual completion time.]
       Measured by Slowdown:
       $S = \frac{\text{RealCompletionTime}}{\text{IdealCompletionTime}} = \frac{\text{TailDuration} + \text{IdealTime}}{\text{IdealTime}}$
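The slowdown metric of slide 3 can be sketched in a few lines of Python. This is only an illustration: it assumes the ideal time is a linear extrapolation of the completion rate observed at 90% of completion (time to reach 90% divided by 0.9), and all function and parameter names are hypothetical, not taken from SpeQuloS.

```python
def ideal_completion_time(time_at_90_percent: float) -> float:
    """Extrapolate the completion rate seen at 90% completion to 100%.

    Assumption: the completion ratio grows linearly up to 90%.
    """
    if time_at_90_percent <= 0:
        raise ValueError("time_at_90_percent must be positive")
    return time_at_90_percent / 0.9


def tail_duration(ideal_time: float, actual_time: float) -> float:
    """Extra time spent beyond the ideal completion time (never negative)."""
    return max(0.0, actual_time - ideal_time)


def tail_slowdown(ideal_time: float, actual_time: float) -> float:
    """Slowdown S = actual completion time / ideal completion time."""
    if ideal_time <= 0:
        raise ValueError("ideal_time must be positive")
    return actual_time / ideal_time
```

With an ideal time of 100 and an actual completion time of 250, the tail lasts 150 time units and the slowdown is 2.5, which would place the execution among those with S > 2.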
  • 4. Slowdown by Tail Effect
       Slowdown reported on BoT execution
       [Figure: fraction of executions where tail slowdown < S, as a function of tail slowdown S (completion time observed divided by ideal completion time), for BOINC and XWHEP.]
       Best 50% ⇒ S < 1.3
       25% to 33% ⇒ S > 2
       Worst 5% ⇒ S > 4 to 10

                            Avg. % of BoT in tail    Avg. % of time in tail
       BE-DCI trace         BOINC     XWHEP          BOINC     XWHEP
       Desktop Grids        4.65      5.11           51.8      45.2
       Best Effort Grids    3.74      6.40           27.4      16.5
       Spot Instances       2.94      5.19           22.7      21.6

       → Caused by no more than the last 7% of the BoT
  • 5. How to improve the situation?
       Better scheduling
         QoS in Grid scheduling ([12], [20], [38])
           → Requires heavy modification of middleware
           → No satisfactory solution for unreliable infrastructures ([7])
         Addressing the tail effect
           → e.g. in MapReduce ([3], [39]), but this requires precise information from compute nodes, which is hard to obtain in large DCIs
       Building Hybrid DCIs
         Grid & Desktop Grid ([35], [36])
           → Mostly to offload Grid usage
         Using Cloud computing ([10], [28], [37])
           → To address peak demands
  • 6. SpeQuloS
       Service → Improving the QoS perceived by BE-DCI users
         Speeding up BoT execution
         Providing information on the expected BoT execution time
       By dynamic provisioning of Cloud resources
         → Monitoring BoT execution
         → Executing the tail on the Cloud
       Features:
         1. Our context: existing BE-DCIs and Clouds that we do not administer, treated as black boxes
         2. Interface with users: QoS requests, state of completion, prediction of remaining time
         3. Careful utilization of Cloud resources, with billing & accounting of usage
  • 7. Framework
       SpeQuloS modules:
         Information: collects QoS-related information from DGs
         Oracle: strategies to use Cloud resources appropriately / QoS prediction for users
         Scheduler: starts/stops Cloud resources, usage accounting
         Credit System: bills Cloud usage to users, who spend “credits” to buy Cloud resource cpu.h
       Implementation
         Independent modules using Python & MySQL
         Supported Clouds: EC2, OpenNebula, etc.
         Supported DG middleware: BOINC & XtremWeb-HEP
  • 10. Cloud Provisioning Strategies
       When to start Cloud resources?
         At 90% of BoT completion (9C)
         At 90% of BoT assignment (9A)
         When the tail appears, detected by monitoring execution time variance (V)
       How many Cloud resources to start (for a given amount of credits)?
         Greedy: as many as possible, for 1 hour of Cloud usage (G)
         Conservative: ensure there will be enough credits to run Cloud resources up to an estimated completion time (C)
       How to use Cloud resources?
         Flat: Cloud workers are not differentiated from BE-DCI workers (F)
         Reschedule: the scheduler reschedules tasks executed on the BE-DCI to the Cloud (R)
         Cloud Duplication: uncompleted tasks are duplicated to a dedicated Cloud infrastructure (D)
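The first two questions of the provisioning slide (when to start, and how many workers to start) can be illustrated with a short Python sketch. It is not the SpeQuloS code: the function names, the `credits_per_worker_hour` exchange rate, and the rounding of the estimated remaining time up to whole hours are assumptions made for the example, and the variance-based trigger (V) is left out.

```python
import math


def should_start_cloud(trigger: str, completed: int, assigned: int,
                       total: int, threshold: float = 0.9) -> bool:
    """Return True when the chosen trigger says Cloud workers should start."""
    if total <= 0:
        raise ValueError("total must be positive")
    if trigger == "9C":  # 90% of the tasks are completed
        return completed / total >= threshold
    if trigger == "9A":  # 90% of the tasks are assigned to workers
        return assigned / total >= threshold
    raise ValueError(f"unsupported trigger: {trigger}")


def workers_to_start(policy: str, credits: float, est_remaining_hours: float,
                     credits_per_worker_hour: float = 1.0) -> int:
    """Number of Cloud workers affordable under the Greedy or Conservative policy."""
    if policy == "G":
        # Greedy: spend all credits on a single hour of Cloud usage.
        return int(credits // credits_per_worker_hour)
    if policy == "C":
        # Conservative: keep enough credits to run every worker until the
        # estimated completion time (assumed billed in whole hours).
        hours = max(1, math.ceil(est_remaining_hours))
        return int(credits // (credits_per_worker_hour * hours))
    raise ValueError(f"unsupported policy: {policy}")
```

With 10 credits and 3 hours estimated to completion, the greedy policy would start 10 workers for one hour, while the conservative policy would start 3 workers that can run for the whole estimated duration.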
  • 11. Experimentation Setup (1)
       Simulations using availability traces of real BE-DCI infrastructures, various BoT workloads, and the BOINC and XWHEP middleware
       BE-DCI availability traces:
         Desktop Grids: seti, nd (SETI@Home & NotreDame traces from FTA)
         Best Effort Grids: g5klyo, g5kgre (available resources in the Grid5000 Lyon & Grenoble clusters in December 2010)
         Cloud Spot instances: spot10, spot100 (maximum number of instances for a renting cost of $10 or $100 per hour; fluctuates according to market price)

       trace     length (days)  mean     deviation  min    max    av. quartiles (s)  unav. quartiles (s)  avg. power (nops/s)  power std. dev.
       seti      120            24391    6793       15868  31092  61,531,5407        174,501,3078         1000                 250
       nd        413.87         180      4.129      77     501    952,3840,26562     640,960,1920         1000                 250
       g5klyo    31             90.573   105.4      6      226    21,51,63           191,236,480          3000                 0
       g5kgre    31             474.69   178.7      184    591    5,182,11268        23,547,6891          3000                 0
       spot10    90             82.186   3.814      29     87     4415,5432,17109    4162,5034,9976       3000                 300
       spot100   90             823.95   4.945      196    877    1063,5566,22490    383,1906,10274       3000                 300
  • 12. Experimentation Setup (2)
       BoT workloads:

                  Size                        nops / task                    Arrival time
       SMALL      1000                        3600000                        0
       BIG        10000                       60000                          0
       RANDOM     norm(µ = 1000, σ² = 200)    norm(µ = 60000, σ² = 10000)    weib(λ = 91.98, k = 0.57)

       Simulation methodology:
         Reproducible executions without and with SpeQuloS
         SpeQuloS credits provisioned with 10% of the BoT workload (in Cloud resource cpu.hour equivalent)
         → 25000 BoT execution traces
  • 14. Strategies Comparison
       Tail Removal Efficiency → tail duration with SpeQuloS vs. tail duration without SpeQuloS
       [Figures: fraction of BoTs where tail removal efficiency > P, as a function of P (percentage), in three panels:
          Flat deployment strategy: 9C-G-F, 9A-G-F, V-G-F, 9C-C-F, 9A-C-F, V-C-F
          Reschedule deployment strategy: 9C-G-R, 9A-G-R, V-G-R, 9C-C-R, 9A-C-R, V-C-R
          Cloud duplication deployment strategy: 9C-G-D, 9A-G-D, V-G-D, 9C-C-D, 9A-C-D, V-C-D]
       Best strategies are able to:
         Suppress the tail for 50% of executions
         Halve the tail for 80% of executions
       Flat (F) < Reschedule (R) & Cloud Duplication (D)
       Tail detection (V) triggers the Cloud too late
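One plausible reading of the tail removal efficiency compared on this slide is the percentage by which the tail duration shrinks when SpeQuloS is used. The Python sketch below follows that reading; the exact formula is an assumption (the slide only names the two tail durations being compared), and the function names are hypothetical. The second function computes the quantity plotted in the figures: the fraction of BoTs whose efficiency exceeds a percentage P.

```python
from typing import Sequence


def tail_removal_efficiency(tail_without: float, tail_with: float) -> float:
    """Percentage of the tail removed (assumed definition, see lead-in).

    100 means the tail is fully suppressed, 50 means it is halved.
    """
    if tail_without <= 0:
        raise ValueError("tail_without must be positive")
    return 100.0 * (1.0 - tail_with / tail_without)


def fraction_above(efficiencies: Sequence[float], p: float) -> float:
    """Fraction of BoT executions whose efficiency is strictly above p."""
    if not efficiencies:
        raise ValueError("efficiencies must not be empty")
    return sum(1 for e in efficiencies if e > p) / len(efficiencies)
```

Under this reading, "halve the tail for 80% of executions" corresponds to `fraction_above(efficiencies, 50)` being about 0.8 for the best strategy combinations.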
  • 16. Cloud Resources Consumption
       Percentage of credits spent vs. credits provisioned (= 10% of the BoT workload).
       10% to 25% of the provisioned credits are actually used by Cloud resources
       [Figure: percentage of credits used (0 to 50) for each combination of SpeQuloS strategies, from 9C-G-F to V-C-D.]
       → ≈ 2.5% of the BoT workload is executed on the Cloud
  • 17. Completion Time
       Combination of strategies used: 9C-C-R
       [Figures: completion time (s) per BE-DCI (SETI, ND, G5KLYO, G5KGRE, SPOT10, SPOT100), without and with SpeQuloS, in six panels: BOINC & SMALL BoT, BOINC & BIG BoT, BOINC & RANDOM BoT, XWHEP & SMALL BoT, XWHEP & BIG BoT, XWHEP & RANDOM BoT.]
       → Up to 9x speedup
       → Depends on the middleware used and on BE-DCI volatility
  • 18. Completion Time Prediction
       → Users can ask for a prediction at any moment of BoT execution
       Predicted completion time: $t_p = \alpha \times \frac{t(r)}{r}$
         Current completion ratio: $r$
         Time elapsed since submission: $t(r)$
         $\alpha$: adjustment factor, depends on the execution environment:
           DG server & middleware
           Application & BoT size
           → Adjusted after BoT execution to minimize the difference with the observed completion time
       Statistical uncertainty (±x%): success rate of prediction vs. previous executions
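The prediction rule of slide 18 is simple enough to sketch in Python. The first function applies the formula as stated on the slide. The other two are assumptions added for illustration: the slide does not say how α is fitted, so a least-squares fit over past executions is used here, and the uncertainty band is taken relative to the predicted value. All names are hypothetical.

```python
from typing import Sequence, Tuple


def predict_completion(alpha: float, elapsed: float, ratio: float) -> float:
    """Predicted completion time t_p = alpha * t(r) / r."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    return alpha * elapsed / ratio


def fit_alpha(history: Sequence[Tuple[float, float, float]]) -> float:
    """Least-squares alpha from (elapsed, ratio, observed_completion) records.

    Assumption: minimizes the squared difference between alpha * t(r) / r
    and the completion time observed in previous executions.
    """
    num = sum((e / r) * obs for e, r, obs in history)
    den = sum((e / r) ** 2 for e, r, _ in history)
    if den == 0:
        raise ValueError("history must contain non-zero elapsed times")
    return num / den


def is_successful(predicted: float, observed: float,
                  uncertainty: float = 0.20) -> bool:
    """True when the observed time falls within predicted +/- uncertainty."""
    return abs(observed - predicted) <= uncertainty * predicted
```

For instance, a BoT that is 50% complete after 500 s gives a raw extrapolation of 1000 s; if earlier runs in the same environment finished in 1200 s under the same conditions, the fitted α of 1.2 would correct the next prediction upward.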
  • 19. Prediction Results
       Completion time prediction:
         Made at 50% of BoT execution
         Uncertainty: ±20%
         α adjusted after 30 executions with the same BE-DCI, middleware, and BoT workload

       BoT category & middleware:
                 SMALL            BIG              RANDOM
       BE-DCI    BOINC   XWHEP    BOINC   XWHEP    BOINC   XWHEP    Mixed
       seti      100     100      100     82.8     100     87.0     94.1
       nd        100     100      100     100      100     96.0     99.4
       g5klyo    88.0    89.3     96.0    87.5     75      75       85.6
       g5kgre    96.3    88.5     100     92.9     83.3    34.8     83.3
       spot10    100     100      100     100      100     100      100
       spot100   100     100      100     100      76      3.6      78.3
       Mixed     97.6    96.1     99.2    93.5     89.6    65.3     90.2

       → Successful prediction in 9 cases out of 10
       → Lower results with heterogeneous BoTs
       → Needs a learning phase, with the same BoT (at least the same application), executed on the same BE-DCI.
  • 20. SpeQuloS Deployment in European Desktop Grid Initiative
       EDGI project: bringing European Desktop Grid computing resources to scientific communities.
  • 21. Conclusion
       BE-DCIs: “low-cost” solution but poor QoS (tail effect)
       SpeQuloS: uses Cloud resources to improve the QoS delivered to BE-DCI users
         Efficiently removes the tail problem
           → Speeds up BoT execution
           → Only requires a few % of the workload to be executed on the Cloud
         Enables completion time prediction for users
       → A step towards BE-DCI usability in the computing landscape?
       Future work:
         Better strategies to anticipate problems (tail effect)
         Analysis of user feedback from SpeQuloS deployments