Graph Partitioning with Natural Cuts
1. PUNCH
Partitioning Using Natural-Cut Heuristics
Daniel Delling (Microsoft Research)
Andrew V. Goldberg (Microsoft Research)
Ilya Razenshteyn (Moscow University)
Renato F. Werneck (Microsoft Research)
May 19, 2010
2. Motivation
Goal: process a continental-sized road network in parallel
(Europe: 18M nodes and 43M arcs).
The first natural step: divide it into “small” parts with few
arcs between them.
Graph partitioning problems are NP-hard, but they are routinely
solved with heuristics.
4. Applications: routing on road networks
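Idea: precompute distances between the boundary nodes of each cell.
Overlay graph:
nodes: the boundary nodes,
edges: between boundary nodes (the cut edges, plus in-cell shortcuts weighted by the precomputed distances).
Search graph for a query from s to t:
the source and target cells,
the overlay graph.
Run bidirectional Dijkstra on the search graph.
The number of cut edges strongly affects performance.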
More applications: arc-flags and reach.
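The overlay idea can be sketched as below. This is a minimal sketch under our own assumptions: the graph is a dict `adj` mapping each node to a list of `(neighbor, weight)` pairs, `cell` maps each node to its cell, and the helper names `dijkstra`, `build_overlay`, and `query` are hypothetical. For brevity the query runs plain (unidirectional) Dijkstra instead of the bidirectional search on the slide.

```python
import heapq
from collections import defaultdict


def dijkstra(adj, source, allowed=None):
    """Plain Dijkstra on adj[u] = [(v, w), ...], optionally restricted to the node set `allowed`."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if allowed is not None and v not in allowed:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def build_overlay(adj, cell):
    """Overlay graph on boundary nodes: original cut edges plus, inside each cell,
    shortcuts weighted by precomputed in-cell distances between boundary nodes."""
    members = defaultdict(set)
    for u in adj:
        members[cell[u]].add(u)
    boundary = {u for u in adj for v, _ in adj[u] if cell[v] != cell[u]}
    overlay = defaultdict(list)
    for u in boundary:
        for v, w in adj[u]:
            if cell[v] != cell[u]:
                overlay[u].append((v, w))  # cut edge between two cells
        dist = dijkstra(adj, u, allowed=members[cell[u]])
        for b in boundary:
            if b != u and cell[b] == cell[u] and b in dist:
                overlay[u].append((b, dist[b]))  # precomputed in-cell shortcut
    return overlay, members


def query(adj, cell, overlay, members, s, t):
    """Distance s -> t on the search graph: the cells of s and t plus the overlay graph."""
    local = members[cell[s]] | members[cell[t]]
    search = defaultdict(list)
    for u in local:
        search[u].extend((v, w) for v, w in adj[u] if cell[v] == cell[u])
    for u, edges in overlay.items():
        search[u].extend(edges)
    return dijkstra(search, s).get(t, float("inf"))
```

On a unit-weight path of nine nodes split into three cells of three consecutive nodes, `query` from node 0 to node 8 returns 8 while never touching the edges inside the middle cell: that cell is crossed through a single precomputed shortcut. Fewer cut edges mean fewer boundary nodes and therefore a smaller overlay graph.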
5. Existing solvers
METIS [KK’99], SCOTCH [PR’96], KAPPA [HSS’10], KASPAR [OS’10], KAFFPA [SS’10]: general purpose; some are fast, some produce very good solutions.
There are many more.
Our goal: a partitioner tailored to road networks that emphasizes quality while remaining fast enough in practice.
6. Formal definition
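Input: undirected graph $G = (V, E)$ with $n = |V|$.
Result: partition $V = V_1 \cup V_2 \cup \dots \cup V_k$, with $V_i \cap V_j = \emptyset$ for $i \neq j$.
Goal: minimize the number of edges between different cells $V_i$ (the cut size).
Two common variants:
given $U$, require $|V_i| \le U$ for every $i$;
given $k$ and $\varepsilon$, require $|V_i| \le (1 + \varepsilon)\, n/k$.
We focus on the first one.
A solution to the first variant can be rebalanced to satisfy the second.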
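The objective and both size constraints translate directly into code. A minimal sketch, assuming a partition is stored as a dict `part` from node to cell label and edges as a list of `(u, v)` pairs; the function names are our own.

```python
def cut_size(edges, part):
    """Number of edges (u, v) whose endpoints lie in different cells."""
    return sum(1 for u, v in edges if part[u] != part[v])


def cell_sizes(part):
    """Map each cell label to the number of nodes it contains."""
    sizes = {}
    for label in part.values():
        sizes[label] = sizes.get(label, 0) + 1
    return sizes


def respects_max_size(part, U):
    """Variant 1: every cell V_i has |V_i| <= U."""
    return all(size <= U for size in cell_sizes(part).values())


def is_eps_balanced(part, k, eps):
    """Variant 2: at most k cells, each with |V_i| <= (1 + eps) * n / k."""
    n = len(part)
    sizes = cell_sizes(part)
    return len(sizes) <= k and all(size <= (1 + eps) * n / k for size in sizes.values())
```

For a six-node path split into two halves of three nodes, `cut_size` is 1, the partition respects $U = 3$ but not $U = 2$, and it is $0$-balanced for $k = 2$.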
7. Intuition
Road networks: dense regions (grids, cities) interleaved with
natural cuts (mountains, parks, rivers, deserts, sparse areas,
freeways).
8. Summary of the algorithm
Filtering:
Contracts dense regions,
Reduces graph size,
Preserves the structure of natural cuts.
Assembly phase:
Works on a much smaller graph,
Finds actual partition.
9. Outline of the talk
Introduction
Natural cuts
Assembly phase
Experiments
Conclusion
11. Natural cuts
Sparse sets of edges that separate dense areas.
Minimum s–t cuts are trivial: the average degree is below 3, so they usually just isolate s or t.
Sparsest cuts would be OK, but they are intractable.
Our notion of natural cut is both tractable and useful.
15. Natural cuts
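Pick centers v in a randomized manner.
Around each center v, grow a BFS tree of U nodes; its first U/10 nodes form the core, and the neighbors of the tree form the ring.
Compute a minimum cut between the core and the ring.
Repeat until every node is inside at least two cores.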
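One natural-cut computation around a single center might look like the sketch below. It assumes an unweighted graph stored as a dict of neighbor lists, a BFS tree of at most U nodes, and a core of U // f nodes with f = 10 (matching the U/10 on the slide). The minimum cut is found with Edmonds-Karp max flow after attaching the core to a super-source and the ring to a super-sink. The outer loop that picks random centers until every node lies in two cores is omitted, and all names (`natural_cut`, `min_cut_edges`) are hypothetical.

```python
from collections import defaultdict, deque


def min_cut_edges(adj, region, core, ring):
    """Unit-capacity minimum cut separating `core` from `ring` inside `region`,
    via Edmonds-Karp with a super-source (core) and super-sink (ring)."""
    SRC, SNK = ("source",), ("sink",)  # hypothetical labels, distinct from graph nodes
    cap = defaultdict(lambda: defaultdict(int))
    for u in region:
        for v in adj[u]:
            if v in region and not (u in ring and v in ring):
                cap[u][v] = 1  # undirected edge: capacity 1 in each direction
    for c in core:
        cap[SRC][c] = float("inf")
    for r in ring:
        cap[r][SNK] = float("inf")
    while True:
        parent = {SRC: None}  # BFS for a shortest augmenting path
        queue = deque([SRC])
        while queue and SNK not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if SNK not in parent:
            break  # no augmenting path: `parent` holds the source side of a min cut
        path, v = [], SNK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
    side = set(parent)
    return {frozenset((u, v)) for u in side if u in region
            for v in adj[u] if v in region and v not in side}


def natural_cut(adj, center, U, f=10):
    """Grow a BFS tree of up to U nodes from `center`; the first U // f nodes form
    the core, the tree's outside neighbors form the ring; return the min cut between them."""
    order, seen, queue = [center], {center}, deque([center])
    while queue and len(order) < U:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen and len(order) < U:
                seen.add(v)
                order.append(v)
                queue.append(v)
    tree = set(order)
    core = set(order[:max(1, U // f)])
    ring = {v for u in tree for v in adj[u] if v not in tree}
    if not ring:
        return set()  # the whole component fits in the tree: nothing to cut
    return min_cut_edges(adj, tree | ring, core, ring)
```

On a barbell graph (two 5-cliques on nodes 0..4 and 5..9 joined by the edge {4, 5}), `natural_cut(adj, 0, 6)` returns `{frozenset({4, 5})}`: the single bridge between the dense areas, which is exactly the kind of cut the slides call natural.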
16. Natural cuts
Take the union of all natural cuts found and contract each connected component that remains between them into a single node (a fragment).
The resulting graph is much smaller than the original one:
$U = 10^6$: 18M nodes reduced to 10K nodes,
$U = 10^3$: 18M nodes reduced to 1.3M nodes.
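Contracting everything between the cuts amounts to computing connected components after deleting the cut edges. A minimal sketch using union-find, assuming nodes are given as a list, edges as `(u, v)` pairs, and cuts as a set of `frozenset` edges; the output format (fragment sizes plus edge counts between fragments) is our own choice.

```python
def contract(nodes, edges, cut_edges):
    """Contract each connected component of (V, E minus cut_edges) into one fragment.
    Returns (fragment id per node, fragment sizes, {(a, b): number of edges between a and b})."""
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if frozenset((u, v)) not in cut_edges:
            parent[find(u)] = find(v)  # union the two endpoints
    frag, ids = {}, {}
    for v in nodes:
        root = find(v)
        ids.setdefault(root, len(ids))  # number fragments in order of first appearance
        frag[v] = ids[root]
    sizes = [0] * len(ids)
    for v in nodes:
        sizes[frag[v]] += 1
    weights = {}
    for u, v in edges:
        a, b = sorted((frag[u], frag[v]))
        if a != b:
            weights[(a, b)] = weights.get((a, b), 0) + 1
    return frag, sizes, weights
```

Cutting the edge {2, 3} of a six-node path yields two fragments of three nodes each, joined by one edge of weight 1. Edge weights record how many original edges run between two fragments, so the cut size of any later partition of the fragments can still be measured exactly.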
19. Tiny cuts
The most obvious natural cuts are 1-cuts and 2-cuts (cuts of one or two edges).
We handle them explicitly before processing natural cuts.
This roughly halves the graph size and greatly decreases the overall running time.
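One standard way to find all 1-cuts (bridges) is Tarjan's low-link depth-first search, sketched below for a graph stored as a dict of neighbor lists. This is our illustration and not necessarily how PUNCH detects them; 2-cuts need a separate pass that is not shown, and the recursive DFS would have to be made iterative for a graph with millions of nodes.

```python
def bridges(adj):
    """Return every 1-cut (bridge) of an undirected graph as a set of frozenset edges."""
    index, low, result = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        skipped_parent = False
        for v in adj[u]:
            if v == parent and not skipped_parent:
                skipped_parent = True  # ignore the tree edge back to the parent once
                continue
            if v in index:
                low[u] = min(low[u], index[v])  # back edge
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > index[u]:
                    result.add(frozenset((u, v)))  # no back edge bypasses (u, v)

    for s in adj:
        if s not in index:
            dfs(s, None)
    return result
```

For two triangles joined by a single edge, only that connecting edge is reported. Everything on one side of a 1-cut can be contracted away without affecting later cut decisions inside the 2-edge-connected remainder.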
20. Outline of the talk
Introduction
Natural cuts
Assembly phase
Experiments
Conclusion
21. Assembly phase
Three ingredients:
Greedy algorithm,
Local search,
Multistart and combination heuristics (optional).
22. Greedy algorithm
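We merge well-connected, adjacent small fragments in a randomized
fashion, as long as the merged cell has at most U nodes.
Repeat until the partition is maximal (no two adjacent cells can be merged).
This finds the initial partition.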
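A simple version of the greedy merge, under stated assumptions: fragments are given by their sizes and by a dict of inter-fragment edge counts, and the score `rng.random() * w / sqrt(size_a * size_b)` is our own choice for favoring heavily connected small cells; the exact score used by PUNCH is not given on this slide. Each step rescans all adjacent pairs, which is quadratic but keeps the sketch short.

```python
import math
import random


def greedy(sizes, weights, U, rng):
    """Merge adjacent cells until no two adjacent cells fit together within U nodes.
    sizes[i]: nodes in fragment i; weights[(a, b)]: edges between fragments a < b.
    Returns, for each fragment, the label of its final cell."""
    size = dict(enumerate(sizes))            # cell label -> number of nodes
    adj = {c: {} for c in size}              # cell label -> {neighbor cell: edge count}
    for (a, b), w in weights.items():
        adj[a][b] = adj[a].get(b, 0) + w
        adj[b][a] = adj[b].get(a, 0) + w
    label = list(range(len(sizes)))          # fragment -> current cell label
    while True:
        best, pair = -1.0, None
        for a in adj:
            for b, w in adj[a].items():
                if a < b and size[a] + size[b] <= U:
                    # Assumed score: heavy connections between small cells, randomized.
                    score = rng.random() * w / math.sqrt(size[a] * size[b])
                    if score > best:
                        best, pair = score, (a, b)
        if pair is None:
            return label                     # maximal: no feasible merge left
        a, b = pair                          # merge cell b into cell a
        size[a] += size.pop(b)
        for c, w in adj.pop(b).items():
            del adj[c][b]
            if c != a:
                adj[a][c] = adj[a].get(c, 0) + w
                adj[c][a] = adj[c].get(a, 0) + w
        label = [a if x == b else x for x in label]


cells = greedy([1, 1, 1, 1], {(0, 1): 1, (1, 2): 1, (2, 3): 1}, 2, random.Random(1))
```

Because the score is redrawn on every step, repeated runs with different seeds give different maximal partitions, which is what the local search and multistart build on.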
28. The local search
Pick two neighboring cells, break them back into their fragments, and apply the greedy
algorithm to this subproblem; keep the new solution if it has fewer cut edges.
Repeat several times for every pair of neighboring cells.
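The local search can be sketched as follows, assuming the same fragment representation as a greedy step (fragment sizes plus a dict of edge counts between fragments) and a list `label` giving each fragment's cell. The helper `regreedy` is a hypothetical compact greedy restricted to the dissolved fragments, with the same assumed randomized score; `phi` plays the role of the repetition count.

```python
import math
import random


def cut_weight(label, weights):
    """Total weight of fragment edges whose endpoints lie in different cells."""
    return sum(w for (a, b), w in weights.items() if label[a] != label[b])


def regreedy(frags, sizes, weights, U, rng):
    """Hypothetical helper: greedy merging restricted to the fragments in `frags`."""
    label = {f: f for f in frags}
    size = {f: sizes[f] for f in frags}
    while True:
        between = {}                           # (cell, cell) -> edge weight inside the subproblem
        for (a, b), w in weights.items():
            if a in label and b in label and label[a] != label[b]:
                key = tuple(sorted((label[a], label[b])))
                between[key] = between.get(key, 0) + w
        options = [(rng.random() * w / math.sqrt(size[x] * size[y]), x, y)
                   for (x, y), w in between.items() if size[x] + size[y] <= U]
        if not options:
            return label
        _, x, y = max(options)                 # merge cell y into cell x
        size[x] += size.pop(y)
        for f in label:
            if label[f] == y:
                label[f] = x


def local_search(label, sizes, weights, U, phi, rng):
    """phi rounds; in each, every pair of neighboring cells is dissolved and re-assembled,
    keeping the new assignment only if it lowers the cut weight."""
    label = list(label)
    best = cut_weight(label, weights)
    for _ in range(phi):
        pairs = sorted({tuple(sorted((label[a], label[b])))
                        for a, b in weights if label[a] != label[b]})
        for ca, cb in pairs:
            frags = [f for f, c in enumerate(label) if c in (ca, cb)]
            if len({label[f] for f in frags}) < 2:
                continue                       # one of the cells changed earlier this round
            sub = regreedy(frags, sizes, weights, U, rng)
            fresh = max(label) + 1             # new labels never collide with existing cells
            trial = list(label)
            for f in frags:
                trial[f] = fresh + sub[f]
            cost = cut_weight(trial, weights)
            if cost < best:
                label, best = trial, cost
    return label, best


label, cut = local_search([0, 1, 1], [1, 1, 1], {(0, 1): 5, (1, 2): 1}, 2, 3, random.Random(0))
```

Only strict improvements are accepted here, so the cut never grows; since each move touches just two cells, a move is cheap compared with natural-cut detection.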
31. Multistart and combination heuristics
Since the local search is typically much faster than natural cut
detection, we can afford two extra heuristics:
Multistart: since the local search is randomized, repeat it several times and keep the best solution.
Combination: keep a pool of several solutions and combine them from time to time.
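Multistart alone is easy to express. A minimal sketch, assuming a hypothetical callable `solve(rng)` that runs one randomized assembly and returns `(solution, cost)`; the combination heuristic is not shown.

```python
import random


def multistart(solve, runs, seed=0):
    """Run a randomized solver `runs` times, each with its own seeded RNG,
    and keep the (solution, cost) pair with the lowest cost."""
    best = None
    for i in range(runs):
        solution, cost = solve(random.Random(seed + i))
        if best is None or cost < best[1]:
            best = (solution, cost)
    return best
```

Seeding each run separately keeps results reproducible while still giving independent randomized runs; the runs could also be spread across cores, as they share no state.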
32. Outline of the talk
Introduction
Natural cuts
Assembly phase
Experiments
Conclusion
33. Experimental evaluation
Implementation: C++ with OpenMP.
Input: Western Europe road network (18M nodes, 43M arcs).
Machine: Intel Xeon X5680 (two six-core 3.33 GHz CPUs)
with DDR3-1333 MHz RAM.
34. A typical use-case
Europe, U = 64K.
Tiny cuts contraction: 25 seconds (18M nodes to 9M nodes).
Natural cuts identification: 50 seconds (12 cores, 9M nodes to
100K nodes).
Greedy + local search: only 5 seconds (12 cores).
35. Running times on Europe
[Figure: running time (s, 0 to 200) of each phase (tiny cuts, natural cuts, greedy + local search) as a function of the maximum cell size U, from $2^{10}$ to $2^{22}$.]
36. Influence of ϕ
The local search tries every edge ϕ times.
[Figure: "Dependence on phi": cut size (10,000 to 15,000) versus running time (s, log scale from 0.1 to 10,000) for different values of ϕ.]
38. Balanced partitions
Recall the two variants of the requirement on $|V_i|$:
given $U$, require $|V_i| \le U$ for every $i$;
given $k$ and $\varepsilon$, require $|V_i| \le (1 + \varepsilon)\, n/k$.
PUNCH solves the first, but most existing solvers find
$\varepsilon$-balanced partitions.
Rebalancing:
run PUNCH with $U = (1 + \varepsilon)\, n/k$;
if this yields more than $k$ cells, redistribute them.