LCA14: LCA14-109: Path to Energy Efficient Scheduler
1. 1
Path to Energy Efficient
Scheduler
Linaro Connect Asia 2014, Macau
Morten Rasmussen
2. 2
Motivation
Energy cost driven task placement (load-balancing)
Focus on the actual goal of the energy-aware scheduling activities:
Saving energy while achieving (near) optimum performance.
The energy benefit of each scheduling decision is clear when it is made,
assuming the energy cost estimates are fairly accurate.
Introduce a simple energy model to estimate costs and guide
scheduling decisions.
Requested by maintainers at the KS workshop.
Gives the right amount of packing and spreading.
May simplify balancing decision logic.
Strong focus on saving energy in load balancing algorithms.
big.LITTLE support comes naturally and almost for free.
This is just one part of the energy efficiency work.
Several related sessions this week.
3. 3
Energy Load Balancing
The idea (a bit simplified):
Let the resulting energy consumption guide all balancing decisions:
    if (energy_diff(task, src_cpu, dst_cpu) > 0) {
        move_task(task, src_cpu, dst_cpu);
    } else {
        /* Try some other task */
    }
Ideally, we should get the optimum balance if we try all combinations
of tasks and cpus.
In reality it is not that simple. We can't try all combinations, but we
can get fairly close for most scenarios.
If the energy model is accurate enough we get packing and spreading
implicitly, and only when it saves energy.
Should work for any system: SMP and big.LITTLE (with a few
extensions).
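The idea above can be sketched as a greedy loop. This is a minimal illustration, not the scheduler's actual balancing code: the function name ea_balance, the run-queue layout (a dict mapping cpu to a list of task loads), the max_passes bound, and the toy energy_diff used in the usage lines are all assumptions made for the example. Any callable with the slide's energy_diff(task, src_cpu, dst_cpu) signature can be plugged in.

```python
def ea_balance(rq, energy_diff, max_passes=10):
    """Greedily move tasks between cpus while energy_diff() reports a saving.

    rq: dict mapping cpu id -> list of task loads (hypothetical layout).
    energy_diff: callable(task, src_cpu, dst_cpu) -> estimated energy saved.
    Returns the list of (task, src_cpu, dst_cpu) moves that were made.
    """
    moves = []
    for _ in range(max_passes):          # bound the work; we cannot try everything
        progress = False
        for src in rq:
            for dst in rq:
                if src == dst:
                    continue
                for task in list(rq[src]):   # copy: rq[src] changes as we move
                    if energy_diff(task, src, dst) > 0:
                        rq[src].remove(task)
                        rq[dst].append(task)
                        moves.append((task, src, dst))
                        progress = True
                    # else: try some other task
        if not progress:
            break
    return moves


# Toy cost function (assumption): moving to a lower-numbered cpu saves energy.
def toy_diff(task, src, dst):
    return src - dst


demo_rq = {0: [0.2], 1: [0.1], 2: []}
demo_moves = ea_balance(demo_rq, toy_diff)
```

With the toy cost function the single task on cpu 1 is packed onto cpu 0 and the loop stops on the next pass, since no further move reports a saving.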
4. 4
Power and Energy
Goal: Save energy, not power.
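[Figure: power plotted against time, with energy marked.]
e_{cpu} = P \cdot t, \quad t = \frac{inst}{cc}
e_{cpu} = P(cc) \cdot \frac{inst}{cc}
e_{cpu} = P(cc) \left( \frac{inst_{task}}{cc} + \frac{inst_{idle}}{cc} \right)
e_{cpu} = e_{task} + e_{idle}
cc: Compute capacity (~ freq * uarch)
\frac{P(cc)}{cc} = Energy/inst: This is what we try to minimize.
If we have cpuidle support we get:
e_{cpu} = P_{busy}(cc) \cdot \frac{inst_{task}}{cc} + P_{idle} \cdot \frac{inst_{idle}}{cc}
We have to add an additional leakage energy term to reflect that it is better not to wake cpus
unnecessarily.
[Figure: tracked load vs. time. Labels: ~ utilization, time in runnable state, ~ utilization*, work.]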
5. 5
Simple Energy Model
cpu_energy = power(cc) * util/cc
+ idle_power * (1-(util/cc))
+ leakage_energy
cluster_energy =
c_active_power * c_util
+ c_idle_power * (1-c_util)
util = Scale invariant cpu utilization (Tracked load).
cc = Current compute capacity (depends on freq and uarch).
power(cc) = Busy power (fully loaded) at current capacity from table.
idle_power = Idle power consumption (~WFI).
leakage_energy = Constant representing the cost of waking the cpu.
c_util = Cluster utilization. Depends on max(util/cc) ratio of its cpus.
c_active_power = Cluster active power.
c_idle_power = Cluster idle power.
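As a rough sketch, the two formulas can be written as plain Python functions. The parameter names follow the definitions above. One assumption is made explicit here: leakage_energy is charged only when the cpu has something to run (util > 0), which is how the worked example later in the deck appears to apply it (an empty cpu costs only its idle power).

```python
def cpu_energy(util, cc, busy_power, idle_power, leakage_energy):
    """Estimated cpu energy for one balance interval.

    util: scale-invariant utilization (tracked load).
    cc: current compute capacity.
    busy_power: power(cc), busy power at the current capacity.
    Assumption: leakage (wake-up cost) applies only if the cpu has work.
    """
    busy_fraction = util / cc
    energy = busy_power * busy_fraction + idle_power * (1 - busy_fraction)
    if util > 0:
        energy += leakage_energy
    return energy


def cluster_energy(c_util, c_active_power, c_idle_power):
    """Estimated cluster energy; c_util is max(util/cc) over the cluster's cpus."""
    return c_active_power * c_util + c_idle_power * (1 - c_util)


# Little cpu at capacity 0.2 (busy power 0.4, idle 0.1, leakage 0.1), fully busy.
example_cpu = cpu_energy(0.2, 0.2, 0.4, 0.1, 0.1)
# Little cluster, fully utilized (active 2.4, idle 0.0).
example_cluster = cluster_energy(1.0, 2.4, 0.0)
```

With the made-up Little numbers used elsewhere in the deck, these evaluate to roughly 0.5 for the cpu and 2.4 for the cluster.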
6. 6
Compute Capacity and Power
Processor specific table expressing power and compute
capacity at each P-state.
The sched domain hierarchy is in a good position to hold this type of
information.
Example (entirely made up):
Little cpu:
    Capacity  Power
    0.2       0.4
    0.4       0.9
    0.6       1.5
    0.8       2.2
    1.0       3.2
    idle      0.1
    leakage   0.1
Big cpu:
    Capacity  Power
    0.4       1.6
    0.8       4.4
    1.2       9.0
    1.6       15.0
    2.0       23.0
    idle      0.3
    leakage   0.5
Equal compute capacity: 0.4 and 0.8 appear in both tables.
Cluster:
              Little  Big
    active    2.4     6.0
    idle      0.0     0.0
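To make the tables concrete, the following sketch computes the energy-per-instruction ratio power/capacity for each P-state and compares the two cpu types at the capacities they share. The numbers are the made-up example values from this slide; the names LITTLE, BIG and energy_per_inst are illustrative only.

```python
# (capacity, busy power) per P-state, copied from the made-up example tables.
LITTLE = [(0.2, 0.4), (0.4, 0.9), (0.6, 1.5), (0.8, 2.2), (1.0, 3.2)]
BIG = [(0.4, 1.6), (0.8, 4.4), (1.2, 9.0), (1.6, 15.0), (2.0, 23.0)]


def energy_per_inst(table):
    """Return {capacity: power / capacity} for each P-state in the table."""
    return {cap: power / cap for cap, power in table}


little_cost = energy_per_inst(LITTLE)
big_cost = energy_per_inst(BIG)

# Capacities offered by both cpu types (equal compute capacity).
shared = sorted(set(little_cost) & set(big_cost))
for cap in shared:
    print(f"cap {cap}: little {little_cost[cap]:.3f}, big {big_cost[cap]:.3f}")
```

In these example tables the cost per unit of work rises with capacity on both cpu types, and at equal capacity the little cpu is cheaper (about 2.25 versus 4.0 at capacity 0.4). That is the kind of information the energy model needs in order to decide between packing, spreading, and migrating between cpu types.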
7. 7
energy_diff()
Balancing two cpus:
    def energy_diff(tload, scpu, dcpu):
        # Estimate the source cpu compute capacity (P-state)
        s_new_cc = find_cpu_cc(scpu, cpu_util(scpu))
        # Energy model cost for the task on the source cpu
        s_task_energy = tload/s_new_cc * cpu_cc_power(scpu, s_new_cc)
        # The task is alone on the source cpu: moving it lets the cpu stay idle
        if nr_running(scpu) == 1:
            s_task_energy += cpu_leakage_energy[cpu_type[scpu]]
        # Estimate the destination cpu cc after adding the task
        d_new_cc = find_cpu_cc(dcpu, cpu_util(dcpu)+tload)
        # Energy model cost for the task on the destination cpu
        d_task_energy = tload/d_new_cc * cpu_cc_power(dcpu, d_new_cc)
        # The destination cpu is empty: moving the task there wakes it up
        if nr_running(dcpu) == 0:
            d_task_energy += cpu_leakage_energy[cpu_type[dcpu]]
        # Positive result: moving the task is estimated to save energy
        return s_task_energy - d_task_energy
Balancing sched domains is slightly more complicated as it
involves cluster power as well.
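The helpers used by energy_diff() are not shown on the slide. The sketch below fills them in with hypothetical implementations so the function can be run against the made-up Little table: find_cpu_cc is assumed to pick the lowest capacity that fits the utilization, and the run queues are a plain dict of task loads. Neither detail is confirmed by the slides.

```python
# Made-up Little cpu table from the deck: capacity -> busy power.
CC_POWER = {"little": {0.2: 0.4, 0.4: 0.9, 0.6: 1.5, 0.8: 2.2, 1.0: 3.2}}
cpu_leakage_energy = {"little": 0.1}
cpu_type = {0: "little", 1: "little", 2: "little"}
rq = {0: [0.2], 1: [0.1], 2: []}     # hypothetical run queues: task loads per cpu


def cpu_util(cpu):
    return sum(rq[cpu])


def nr_running(cpu):
    return len(rq[cpu])


def find_cpu_cc(cpu, util):
    """Assumption: choose the lowest capacity that can serve the utilization."""
    caps = sorted(CC_POWER[cpu_type[cpu]])
    for cap in caps:
        if cap + 1e-9 >= util:       # small tolerance for float rounding
            return cap
    return caps[-1]


def cpu_cc_power(cpu, cc):
    return CC_POWER[cpu_type[cpu]][cc]


def energy_diff(tload, scpu, dcpu):
    s_new_cc = find_cpu_cc(scpu, cpu_util(scpu))
    s_task_energy = tload / s_new_cc * cpu_cc_power(scpu, s_new_cc)
    if nr_running(scpu) == 1:
        s_task_energy += cpu_leakage_energy[cpu_type[scpu]]
    d_new_cc = find_cpu_cc(dcpu, cpu_util(dcpu) + tload)
    d_task_energy = tload / d_new_cc * cpu_cc_power(dcpu, d_new_cc)
    if nr_running(dcpu) == 0:
        d_task_energy += cpu_leakage_energy[cpu_type[dcpu]]
    return s_task_energy - d_task_energy


# Move the 0.1 task from cpu 1 to cpu 0.
saving = energy_diff(0.1, 1, 0)
```

Here the source side costs 0.1/0.2 * 0.4 + 0.1 = 0.3 and the destination side costs 0.1/0.4 * 0.9 = 0.225, so the estimated saving is about 0.075 and the move would be taken.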
8. 8
Example
Before EA load balance:
    cpu      rq          util  cap  cc_power  leak  power
    0        {0.2}       0.2   0.2  0.4       0.1   0.5
    1        {0.1}       0.1   0.2  0.4       0.1   0.35
    2        {}          0.0   0.2  0.4       0.1   0.1
    cluster  -           1.0   -    2.4       -     2.4
    Total                                           3.35
energy_diff() = 0.075*
After EA load balance:
    cpu      rq          util  cap  cc_power  leak  power
    0        {0.2, 0.1}  0.3   0.4  0.9       0.1   0.8
    1        {}          0.0   0.4  0.9       0.1   0.1
    2        {}          0.0   0.4  0.9       0.1   0.1
    cluster  -           0.75  -    2.4       -     1.8
    Total                                           2.8
0.55 saved
* energy_diff() ignores cluster power and other tasks to keep computations cheap and simple.
Better accuracy can be added if necessary.
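The totals in this example (3.35 before, 2.8 after) can be reproduced with a few lines of Python. The sketch assumes, based on the cap column, that all cpus in the cluster share one P-state chosen as the lowest capacity that fits the busiest cpu, and that leakage is charged only to cpus with a non-empty run queue. Function and constant names are illustrative.

```python
# Made-up Little cluster values from the deck.
CC_POWER = {0.2: 0.4, 0.4: 0.9, 0.6: 1.5, 0.8: 2.2, 1.0: 3.2}
IDLE_POWER = 0.1
LEAKAGE = 0.1
C_ACTIVE_POWER = 2.4
C_IDLE_POWER = 0.0


def cluster_cc(rq):
    """Assumption: one shared P-state, lowest capacity fitting the busiest cpu."""
    need = max(sum(tasks) for tasks in rq.values())
    caps = sorted(CC_POWER)
    for cap in caps:
        if cap + 1e-9 >= need:       # small tolerance for float rounding
            return cap
    return caps[-1]


def system_energy(rq):
    """Sum the per-cpu and cluster terms of the simple energy model."""
    cc = cluster_cc(rq)
    total = 0.0
    ratios = []
    for tasks in rq.values():
        ratio = sum(tasks) / cc                      # util / cc
        ratios.append(ratio)
        total += CC_POWER[cc] * ratio + IDLE_POWER * (1 - ratio)
        if tasks:                                    # cpu has to be awake
            total += LEAKAGE
    c_util = max(ratios)                             # cluster utilization
    total += C_ACTIVE_POWER * c_util + C_IDLE_POWER * (1 - c_util)
    return total


before = system_energy({0: [0.2], 1: [0.1], 2: []})
after = system_energy({0: [0.2, 0.1], 1: [], 2: []})
saved = before - after
```

The full-model saving (about 0.55) is much larger than the 0.075 reported by energy_diff(), because energy_diff() leaves out the cluster term and the effect on other tasks; both agree on the direction of the decision.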
9. 9
Is the energy model too simple?
It is essential that the energy model is fast and easy to use for
load-balancing.
The scheduler is a critical path and already complex enough.
Python model tests
Disclaimer: These numbers have not been validated in any way.
Test configuration: 3+3 big.LITTLE, 1000 random balance scenarios.
Rand/Opt: How much more energy the random balance (starting point) uses than the best
possible balance (found by brute force).
EA/Opt: How much more energy the energy model based balance uses than the best possible balance.
EA == Opt: Share of scenarios where EA found the best possible balance.
Tasks Rand/Opt EA/Opt EA == Opt
2 7.86% 0.09% 72.60%
3 7.79% 0.15% 64.80%
4 9.39% 0.45% 62.00%
5 10.02% 1.15% 51.10%
6 11.44% 2.23% 38.30%
10. 10
What is next?
Early prototype to validate the idea. Initial focus is on getting
energy_diff() working on a simple SMP system.
Post on LKML very soon.
Open Issues
Exposing power/capacity tables to kernel. Essential to make the right
decisions.
Plumbing: Where do the tables come from? DT?
Next steps:
Scale invariance: Requirement for the energy model to work.
Fix cpu_power/compute capacity use in scheduler.
Tooling and benchmarks (covered in another session)
Idle integration (covered in another session)