2. Co-located Events
• ACM Symposium on SDN Research 2016 (SOSR), March 13-17
• 2016 Open Networking Summit (ONS), March 14-17
• The 12th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '16), March 17-19
• The 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI '16)
• The USENIX Workshop on Cool Topics in Sustainable Data Centers (CoolDC '16), March 19
3. Session: Resource Sharing
• “Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics,” Shivaram Venkataraman, Zongheng Yang, Michael Franklin, Benjamin Recht, and Ion Stoica, University of California, Berkeley
• “Cliffhanger: Scaling Performance Cliffs in Web Memory Caches,” Asaf Cidon and Assaf Eisenman, Stanford University; Mohammad Alizadeh, MIT CSAIL; Sachin Katti, Stanford University
• “FairRide: Near-Optimal, Fair Cache Sharing,” Qifan Pu and Haoyuan Li, University of California, Berkeley; Matei Zaharia, Massachusetts Institute of Technology; Ali Ghodsi and Ion Stoica, University of California, Berkeley
• “HUG: Multi-Resource Fairness for Correlated and Elastic Demands,” Mosharaf Chowdhury, University of Michigan; Zhenhua Liu, Stony Brook University; Ali Ghodsi and Ion Stoica, University of California, Berkeley, and Databricks Inc.
4. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
• Who?: A graduate student in UC Berkeley's AMPLab, the group known for Spark and Mesos. His work covers systems and algorithms for large-scale data analytics, with publications at SoCC12, EuroSys13, OSDI14, and SIGMOD16.
• What?: A framework that efficiently predicts the performance of data-analytics workloads, such as machine learning and genome analysis, in cloud environments.
Do choices matter?
[Figure: running time (s) of Matrix Multiply (400K by 1K) and QR Factorization (1M by 1K) on 1 r3.8xlarge, 2 r3.4xlarge, 4 r3.2xlarge, 8 r3.xlarge, and 16 r3.large; one workload is network bound, the other memory-bandwidth bound.]
Do choices matter? Matrix Multiply
[Figure: running time (s) for a 400K by 1K matrix multiply on 1 r3.8xlarge, 2 r3.4xlarge, 4 r3.2xlarge, and 8 r3.xlarge; every configuration totals 16 cores, 244 GB of memory, and $2.66/hr.]
KeystoneML TIMIT pipeline: Raw Data → Cosine Transform → Normalization → Linear Solver, ~100 iterations.
Properties: iterative (each iteration runs many jobs), long running → expensive, numerically intensive.
Do choices matter?
[Figure: actual vs. ideal running time (s) as the number of cores grows, for QR Factorization (1M by 1K) on r3.4xlarge instances.]
Computation + communication → non-linear scaling.
5. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
• How?: Predict performance from the results of small-scale training jobs, and use optimal experiment design to reduce the number of training jobs needed.
Optimal design of experiments
[Figure: the space of candidate training configurations, spanning input fractions of 1%, 2%, 4%, and 8% and 1, 2, 4, and 8 machines.]
Use an off-the-shelf solver (CVX).
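As a rough illustration of this step, the sketch below picks training configurations with a convex solver. It is only a sketch: it uses cvxpy rather than CVX, a D-optimal (log-det) objective for simplicity rather than the paper's exact criterion, and a made-up configuration grid; the Ernest-style feature vector matches the basic model shown later.

```python
import numpy as np
import cvxpy as cp

# Candidate training configurations: (fraction of input, number of machines).
fracs = [0.01, 0.02, 0.04, 0.08]
machines = [1, 2, 4, 8]
configs = [(f, m) for f in fracs for m in machines]

def features(frac, m):
    # Ernest-style features: serial term, input/machines, log(machines), machines.
    return np.array([1.0, frac / m, np.log(m), float(m)])

X = np.array([features(f, m) for f, m in configs])

# lam[i] is the (relaxed) weight given to configuration i.
lam = cp.Variable(len(configs), nonneg=True)
info = sum(lam[i] * np.outer(X[i], X[i]) for i in range(len(configs)))

# D-optimal design: maximize the information gained from the chosen runs.
problem = cp.Problem(cp.Maximize(cp.log_det(info)), [cp.sum(lam) == 1])
problem.solve()

# Run the few configurations with the largest weights as training jobs.
chosen = sorted(range(len(configs)), key=lambda i: -lam.value[i])[:7]
print([configs[i] for i in chosen])
```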
Using Ernest
[Diagram: experiment design selects the training jobs to run for a given job binary (using only a few iterations for training); their measurements are fit to a linear model, which predicts running time for a target number of machines and input size.]
Ernest basic model:
time = x1 + x2 ∗ (input / machines) + x3 ∗ log(machines) + x4 ∗ machines
The terms capture serial execution (x1), computation that scales linearly with input per machine (x2), tree-shaped aggregation DAGs (x3), and all-to-one communication DAGs (x4). Collect training data, then fit the coefficients with linear regression.
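As a concrete sketch of the "collect training data, fit a linear regression" step, here is a minimal Python example. The measurements are invented for illustration, and a plain least-squares fit is used; Ernest's actual fitting procedure may differ (e.g., by constraining the coefficients).

```python
import numpy as np

# Training measurements: (input fraction, machines, measured time in seconds).
# These numbers are illustrative, not from the paper.
runs = [
    (0.01, 1, 12.0), (0.02, 1, 21.5), (0.04, 2, 24.0), (0.08, 2, 44.5),
    (0.04, 4, 14.5), (0.08, 4, 26.0), (0.08, 8, 16.5),
]

def features(frac, m):
    # [serial, input/machines, log(machines), machines]
    return [1.0, frac / m, np.log(m), float(m)]

A = np.array([features(f, m) for f, m, _ in runs])
y = np.array([t for _, _, t in runs])

# Fit x1..x4 by least squares.
x, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the running time of the full input (fraction = 1.0) on 64 machines.
pred = np.dot(features(1.0, 64), x)
print(x, pred)
```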
6. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
• Results:
Training time (KeystoneML): TIMIT pipeline on r3.xlarge instances, 100 iterations. Experiment design selects 7 data points, using up to 16 machines and up to 10% of the data.
[Figure: training time vs. running time (s) at 42 machines.]
Is experiment design useful?
[Figure: prediction error (%) for Regression, Classification, KMeans, PCA, and TIMIT, comparing experiment design against a cost-based baseline.]
7. Cliffhanger: Scaling Performance Cliffs in Web Memory Caches
• Who?: From Stanford CS; currently CEO and co-founder of the cloud-security company Sookasa. He works on cloud storage, with publications at SIGCOMM12 and USENIX ATC13 and 15.
• What?: Improving Memcached's dynamic cache allocation mechanism (the slab allocator) to cope with performance cliffs.
[Figure: hit-rate curve (hit rate vs. number of items in the LRU queue, Application 19, Slab 0) and its concave hull; the gap between them is a performance cliff (Talus [HPCA15]).]
+1% cache hit rate → +35% speedup. The cache hit rate of Facebook's Memcached pool is 98.2% [SIGMETRICS12].
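Since the cliff is defined by the gap between the hit-rate curve and its concave hull, here is a minimal sketch of computing such a hull for a measured curve. The sample data and function names are illustrative and not taken from Cliffhanger or Talus.

```python
import numpy as np

def concave_hull(sizes, hit_rates):
    """Upper convex envelope of a hit-rate curve (its 'concave hull').

    Points where the curve lies below the hull mark a performance cliff:
    interpolating between the hull's endpoints would give a better hit
    rate for the same cache size.
    """
    pts = sorted(zip(sizes, hit_rates))
    hull = []
    for p in pts:
        # Pop points that would make the hull bend upward (non-concave).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# Illustrative hit-rate curve with a cliff between 6000 and 12000 items.
sizes = np.array([0, 2000, 4000, 6000, 8000, 10000, 12000, 14000])
hits = np.array([0.0, 0.30, 0.48, 0.55, 0.57, 0.60, 0.85, 0.88])

hull = concave_hull(sizes, hits)
hull_x, hull_y = zip(*hull)
gap = np.interp(sizes, hull_x, hull_y) - hits   # cliff size at each point
print(hull, gap.round(2))
```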
11. FairRide: Near-Optimal, Fair Cache Sharing
• How?
– Adds probabilistic blocking to the max-min policy, giving users a disincentive to cheat.
– Implemented on top of Alluxio (Tachyon) [SoCC14].
[Figure 3 from the paper: example with 2 users, 3 files, and a total cache size of 2; numbers are access frequencies. Panels: (a) max-min fairness, (b) the second user cheats, (c) blocking the free-riding access. Legend: true access, free ride, cheat, blocked.]
The accompanying text notes that a user maximizing its total hit rate should cache the files with the lowest cost per hit/sec (file size split among the users sharing it), not necessarily its most frequently accessed files.
Probabilistic blocking
• FairRide blocks a user with probability p(nj) = 1/(nj + 1)
– nj is the number of other users caching file j
– e.g., p(1) = 50%, p(4) = 20%
• This is the best you can do in the general case
– Less blocking does not prevent cheating
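A minimal sketch of the blocking rule above, assuming a simple in-memory map of which users cache which files; the data structures and names are illustrative, not FairRide's actual implementation.

```python
import random

def serve_from_cache(user, file_id, cached_by):
    """Return True if this access is served from cache, False otherwise."""
    holders = cached_by.get(file_id, set())
    if user in holders:
        return True                       # the user pays for this cached copy itself
    if not holders:
        return False                      # plain cache miss
    # Free ride on other users' copy: block with probability p(nj) = 1/(nj + 1),
    # where nj is the number of other users caching the file.
    nj = len(holders)
    return random.random() >= 1.0 / (nj + 1)

# Example: file "f" is cached by one other user, so p(1) = 50% of the
# free-riding accesses are blocked (treated as misses).
cached_by = {"f": {"alice"}}
hits = sum(serve_from_cache("bob", "f", cached_by) for _ in range(10000))
print(hits / 10000)   # roughly 0.5
```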
12. FairRide: Near-Optimal, Fair Cache Sharing
Cheating under FairRide
[Figure: miss ratio (%) over time (s) for user 1 and user 2, as user 2 and then user 1 start cheating.]
FairRide disincentivizes users from cheating.
Facebook experiments
[Figures: average response time (ms), and reduction in median job time (%) by bin of job size (1-10, 11-50, 51-100, 101-500, 501- tasks), for max-min vs. FairRide.]
FairRide outperforms max-min fairness by 29%.
13. HUG: Multi-Resource Fairness for Correlated and Elastic Demands
• Who?: An assistant professor at the University of Michigan, from UC Berkeley's AMPLab. His focus is networking (coflow-based networking, multi-resource allocation in datacenters, compute and storage for big data, network virtualization), with SIGCOMM papers almost every year. This work builds on DRF [NSDI11] and FairCloud [SIGCOMM12].
• What?: Optimizing how network bandwidth is allocated among tenants.
[Figure: machines M1 … MN connected by links L1 … L2N through a congestion-less core, hosting Tenant-A's and Tenant-B's VMs.]
How to share the links between multiple tenants to (1) provide optimal performance guarantees and (2) maximize utilization?
14. HUG: Multi-Resource Fairness for Correlated and Elastic Demands
• Highest Utilization with the Optimal Isolation Guarantee
[Figure: utilization (from low to work-conserving) vs. isolation guarantee (from low to optimal) for Per-Flow Fairness, PS-P, DRF, and HUG.]
HUG in a cooperative setting: (1) optimal isolation guarantee, (2) work conservation.
HUG in a non-cooperative setting: (1) optimal isolation guarantee, (2) highest utilization, (3) strategyproof.
Intuitively, we want to maximize the minimum progress over all tenants, i.e., maximize min_k M_k, where min_k M_k corresponds to the isolation guarantee of an allocation algorithm. We make three observations. First, when there is a single link in the system, this model trivially reduces to max-min fairness. Second, getting more aggregate bandwidth is not always better. For tenant-A in the example, ⟨50Mbps, 100Mbps⟩ is better than ⟨90Mbps, 90Mbps⟩ or ⟨25Mbps, 200Mbps⟩, even though the latter ones have more bandwidth in total. Third, simply applying max-min fairness to individual links is not enough. In our example, max-min fairness allocates equal resources to both tenants on both links, resulting in allocations ⟨1/2, 1/2⟩ on both links (Figure 1b). The corresponding progress (MA = MB = 1/2) results in a suboptimal isolation guarantee (min{MA, MB} = 1/2). Dominant Resource Fairness (DRF) [33] extends max-min fairness to multiple resources and prevents such suboptimal allocations.
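To make the second observation concrete, here is a small worked example of progress, assuming (consistent with the quoted example) that tenant-A's correlated demand across the two links is in a 1:2 ratio and defining progress as the minimum demand-normalized allocation; the numbers are only illustrative.

```python
# Progress of a tenant with correlated demands: the fraction of its demand
# vector that the allocation lets it complete, i.e. the minimum over links
# of allocation / demand. The isolation guarantee is the minimum over tenants.

def progress(allocation, demand):
    return min(a / d for a, d in zip(allocation, demand))

# Tenant-A needs bandwidth on the two links in a 1:2 ratio (e.g. 50 and 100 Mbps).
demand_a = (50, 100)

for alloc in [(50, 100), (90, 90), (25, 200)]:
    print(alloc, progress(alloc, demand_a))
# (50, 100) -> 1.0 beats (90, 90) -> 0.9 and (25, 200) -> 0.5,
# even though the latter two have more total bandwidth.
```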
Cloud network sharing
• Dynamic sharing
– Flow-level (Per-Flow Fairness): no isolation guarantee
– VM-level (Seawall, GateKeeper): no isolation guarantee
– Tenant-/network-level
  – Non-cooperative environments (require strategy-proofness): Highest Utilization for Optimal Isolation Guarantee (HUG)
  – Cooperative environments (do not require strategy-proofness):
    – Low utilization (DRF): optimal isolation guarantee
    – Suboptimal isolation guarantee (PS-P, EyeQ, NetShare): work-conserving
    – Work-conserving with optimal isolation guarantee (HUG)
• Reservation (SecondNet, Oktopus, Pulsar, Silo): uses admission control
15. HUG: Multi-Resource Fairness for Correlated and Elastic Demands
• Experiments on 100 EC2 instances.
• Three tenants:
– Tenants A and C: pairwise one-to-one communication
– Tenant B: all-to-all communication
[Figure 10: (EC2) Total bandwidth allocation (Gbps) over time (seconds) for three tenants arriving over time in a 100-machine EC2 cluster, under (a) per-flow fairness (TCP) and (b) HUG. Each tenant has 100 VMs, but each uses a different communication pattern (§5.1.1). With TCP, tenant-B dominates the network by creating more flows; HUG isolates tenants A and C from tenant B.]