This document summarizes Jason Riedy's Ph.D. dissertation on making static pivoting scalable and dependable. The dissertation extends iterative refinement to provide small forward errors dependably, even for difficult systems, improves static pivoting heuristics, and develops a distributed memory algorithm for static pivoting. The work defines what it means for a solver to be dependable and introduces error measures and a difficulty metric. It presents results showing the method provides dependable errors for a higher percentage of test systems than a previous method.
Making Static Pivoting Scalable and Dependable
1. Making Static Pivoting Scalable and Dependable
Ph.D. Dissertation Talk
E. Jason Riedy
jason@acm.org
EECS Department
University of California, Berkeley
Committee: Dr. James Demmel (chair), Dr. Katherine Yelick, Dr. Sanjay Govindjee
17 December, 2010
2. Outline
1 Introduction
2 Solving Ax = b dependably
3 Extending dependability to static pivoting
4 Distributed matching for static pivoting
5 Summary
Jason Riedy (UCB) Static Pivoting 17 Dec, 2010 2 / 59
3. Motivation: Ever Larger Ax = b
Systems Ax = b are growing larger, more difficult
Omega3P: n = 7.5 million with τ = 300 million entries
Quantum Mechanics: precondition with blocks of dimension
200-350 thousand
Large barrier-based optimization problems: Many solves, similar
structure, increasing condition number
Huge systems are generated, solved, and analyzed automatically.
Large, highly unsymmetric systems need scalable parallel solvers.
Low-level routines: No expert in the loop!
4. Motivation: Solving Ax = b better
Many people work to solve Ax = b faster.
Today we start with how to solve it better.
Better enables faster.
Use extra floating-point precision within iterative refinement to
obtain a dependable solution, adding O(n²) work after an O(n³)
factorization.
Accelerate sparse factorization through static pivoting,
decoupling symbolic, numeric phases.
Refine the perturbed solution without needing extra triangular
solves for condition estimation.
5. Contributions
Iterative refinement
Extend iterative refinement to provide small forward errors
dependably (to be defined)
Set and use a methodology to demonstrate dependability
Show that condition estimation (expensive for sparse systems) is
not necessary for obtaining a dependable solution
Static pivoting
Improve static pivoting heuristics
Demonstrate that an approximate maximum weight bipartite
matching is faster and just as accurate
Develop a memory-scalable distributed memory auction
algorithm for static pivoting
6. Defining “dependable”
A dependable solver for Ax = b returns a result x with small error
often enough that you expect success with a small error, and clearly
signals results that likely contain large errors.
True error          Difficulty  Alg. reports  Likeliness
O(mach. precision)  not bad     success       Very likely
                                failure       Somewhat rare
larger              not bad     success       (not yet seen)
                                failure       Practically certain
O(mach. precision)  difficult   success       Whenever feasible
                                failure       Practically certain
larger              difficult   success       (not yet seen)
                                failure       Very likely
7. Introducing the errors and targets
[Figure: the computed y1 and the true x = A⁻¹b inside the perturbed neighborhood of (A, b).]
[Plots: error vs. difficulty, shaded by percent of systems. Left, "LU: Small backward error": backward error stays roughly between 2⁻²⁵ and 2⁻⁴⁵. Right, "LU: Error in y ∝ difficulty": forward error grows in proportion to difficulty.]
8. Introducing the errors and targets
[Figure: refinement iterates y1, …, yk approaching x = A⁻¹b within the perturbed neighborhood of (A, b).]
Refined: Accepted with small errors in y, or flagged with unknown error.
[Plots: error vs. difficulty for "Successful" and "Flagged" systems, shaded by percent of systems.]
9. Iterative refinement
Newton’s method applied to Ax = b.
Repeat until done:
1 Compute the residual ri = b − Ayi using extra precision εr.
2 Solve A dyi = ri for the correction using working precision εw.
3 Increment yi+1 = yi + dyi, maintaining y to extra precision εx.
Precisions:
Working precision εw: the precision used for storing (and factoring)
A: IEEE 754 single (εw = 2⁻²⁴), double (εw = 2⁻⁵³), etc.
Residual precision εr: at least double working precision, εr ≤ εw²
Solution precision εx: at least double working precision, εx ≤ εw²
Latter two may be implemented in software.
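The loop above can be sketched in a few lines; a minimal numpy illustration, assuming a small dense system, with float32 as the working precision and float64 for the residual and solution (a real solver would factor A once in working precision and reuse the factors for each correction, whereas np.linalg.solve here re-factors for brevity):

```python
import numpy as np

def refine(A, b, steps=5):
    """Sketch of iterative refinement with extra-precision residual/solution.

    Working precision eps_w: float32 (the precision A is "factored" in).
    Residual precision eps_r and solution precision eps_x: float64.
    """
    A32 = A.astype(np.float32)
    b64 = b.astype(np.float64)
    # Initial solve entirely in working precision.
    y = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(steps):
        r = b64 - A.astype(np.float64) @ y                # residual in eps_r
        dy = np.linalg.solve(A32, r.astype(np.float32))   # correction in eps_w
        y = y + dy.astype(np.float64)                     # increment kept in eps_x
    return y
```

For a well-conditioned system this converges to nearly full float64 accuracy even though every correction is computed with float32 factors.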
10. Definitions
Errors:
Backward (relative) error
Forward (relative) error
Difficulty:
Condition numbers: sensitivity to perturbations
Element growth: error from factorization
11. Error measures: Backward error
How close is the nearest system satisfying Ay1 = b?
[Figure: perturbation diagram around (A, b) and x = A⁻¹b.]
Three ways, given r1 = b − Ay1:
Normwise: ‖r1‖∞ / (‖A‖∞ ‖y1‖∞ + ‖b‖∞)
Componentwise: ‖ |r1| / (|A| |y1| + |b|) ‖∞
Columnwise: ‖r1‖∞ / ((max |A|) |y1| + ‖b‖∞)
Note: elementwise division, 0/0 = 0, and max produces a row vector (column maxima)
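These three measures can be computed directly; a sketch assuming dense numpy arrays, using the 0/0 = 0 convention for the elementwise division (not the dissertation's implementation):

```python
import numpy as np

def backward_errors(A, y, b):
    """The three backward-error measures for an approximate solution y."""
    r = b - A @ y
    absA, absy, absb = np.abs(A), np.abs(y), np.abs(b)

    # Normwise: ||r||_inf / (||A||_inf ||y||_inf + ||b||_inf).
    norm = np.max(np.abs(r)) / (np.max(absA.sum(axis=1)) * absy.max() + absb.max())

    # Componentwise: || |r| / (|A||y| + |b|) ||_inf, with 0/0 = 0.
    denom = absA @ absy + absb
    comp = np.max(np.divide(np.abs(r), denom,
                            out=np.zeros_like(r), where=denom != 0))

    # Columnwise: max over rows of |A| gives a row vector of column maxima.
    colw = np.max(np.abs(r)) / (absA.max(axis=0) @ absy + absb.max())
    return norm, comp, colw
```

An exact solution drives all three measures to zero; any perturbation makes them positive.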
12. Error measures: Forward error
How close is y1 to x?
[Figure: perturbation diagram around (A, b) and x = A⁻¹b.]
Two ways and two measuring sticks:
Normwise: ‖y1 − x‖∞ / ‖x‖∞ or ‖y1 − x‖∞ / ‖y1‖∞
Componentwise: ‖(y1 − x)/x‖∞ or ‖(y1 − x)/y1‖∞
13. Error sensitivity: Conditioning
How sensitive is y1 to perturbations in A and b?
[Figure: perturbation diagram around (A, b) and x = A⁻¹b.]
forward error ≤ condition number × backward error
Each combination has a condition number. We choose two for use in
our difficulty measure.
14. Difficulty: condition number × element growth
Condition number:
Backward error: κ(A) = ‖A⁻¹‖∞ ‖A‖∞
Normwise forw. err.: κ(A, x, b) = ‖A⁻¹‖∞ (‖A‖∞ ‖x‖∞ + ‖b‖∞)
Componentwise forw. err.: ccond(A, x, b) = ‖ |A⁻¹| (|A| |x| + |b|) ‖∞
Element growth, est. δAi in (A + δAi)y = b:
|δAi| ≤ 3 nd |L| |U| ≤ p(nd) · g · max |A|
We use a col.-scaling-indep. expression allowing |L| > 1:
gc = max_j [ (max_{1≤k≤j} max_i |L(i,k)|) · (max_i |U(i,j)|) ] / max_i |A(i,j)|
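The gc expression can be evaluated directly from the factors; a sketch assuming dense numpy arrays for L, U, and A (a sparse code would instead track column maxima during factorization):

```python
import numpy as np

def column_growth(A, L, U):
    """Column-scaling-independent element growth g_c:

        g_c = max_j [ (max_{k<=j} max_i |L(i,k)|) * (max_i |U(i,j)|) ]
                    / max_i |A(i,j)|
    """
    # Running maximum over columns k <= j of the column maxima of |L|.
    Lmax = np.maximum.accumulate(np.abs(L).max(axis=0))
    Umax = np.abs(U).max(axis=0)   # column maxima of |U|
    Amax = np.abs(A).max(axis=0)   # column maxima of |A|
    return float(np.max(Lmax * Umax / Amax))
```

For an exact, well-behaved factorization such as the identity, gc is 1; large gc signals element growth that inflates the factorization's backward error.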
15. Dense test systems
30 × 30 single, double, complex, and double complex:
250k, 4 right-hand sides, 1M test systems
Size chosen to sample ill-conditioned region well
Generated as in Demmel, et al., plus b → x
Difficulty measure: κ∞(A) = ‖A⁻¹‖∞ ‖A‖∞
[Histograms: percent of population vs. difficulty, 2⁰ to 2⁷⁰, for Single, Double, Complex, and Double Complex.]
16. Dense test systems
Same test population as the previous slide.
Difficulty measure: κ(A, x, b) = ‖A⁻¹‖∞ (‖A‖∞ ‖x‖∞ + ‖b‖∞)
[Histograms: percent of population vs. difficulty, 2⁰ to 2⁷⁰, for Single, Double, Complex, and Double Complex.]
17. Dense test systems
Same test population as the previous slide.
Difficulty measure: ccond(A, x, b) = ‖ |A⁻¹| (|A| |x| + |b|) ‖∞
[Histograms: percent of population vs. difficulty, 2⁰ to 2⁸⁰, for Single, Double, Complex, and Double Complex.]
19. How?
[Plots: componentwise backward error (cberr) and componentwise forward error (cferr) vs. difficulty, shaded by percent of systems.]
Carry the intermediate soln. yi to twice the working precision.
Refine the backward error down to nearly εw².
By “forward error ≤ conditioning × backward error”, the
forward error for well-enough conditioned problems is nearly εw.
21. Results: Comparison with xGESVXX
Precision        Accepted (well / ill)   Rejected (well / ill)
Single           79% / 15%               1% / 5%
Single complex   76% / 19%               1% / 4%
Double           87% / 9%                1% / 5%
Double complex   85% / 11%               1% / 3%
Accepted, ill-conditioned systems are those gained by our routine
that xGESVXX rejects.
Rejected, well-conditioned systems are those lost by our routine
but accepted by xGESVXX.
26. Static pivoting
If a pivot |A(j, j)| < T , perturb up to T by adding
sign(A(j, j)) · (T − |A(j, j)|).
Forcibly increases backward error, decreases element growth
In sparse systems, few updates should occur to an entry.
Large diagonal entries should remain large...
Thresholding heuristics
SuperLU: γ · ‖A‖₁
column-relative: γ · max |A(:, j)|
diagonal-relative: γ · |A(j, j)|
γ = 2⁻²⁶ ≈ √εw, 2⁻³⁸, or 2⁻⁴³ = 2¹⁰ εw
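A sketch of the perturbation rule above; the threshold T comes from one of the heuristics listed, and taking sign(0) as +1 is an assumption of this sketch:

```python
def perturb_pivot(a_jj, threshold):
    """Static-pivoting perturbation: if |A(j,j)| < T, bump the pivot
    up to magnitude T by adding sign(A(j,j)) * (T - |A(j,j)|).

    Returns the (possibly perturbed) pivot value.
    """
    if abs(a_jj) < threshold:
        sign = 1.0 if a_jj >= 0 else -1.0   # sign(0) taken as +1
        return a_jj + sign * (threshold - abs(a_jj))
    return a_jj
```

With, say, the column-relative heuristic, the call would look like perturb_pivot(A[j][j], gamma * max(abs(A[i][j]) for i in rows)); the perturbation forcibly increases backward error while bounding element growth.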
27. Sparse test systems
Matrices are from the UF Collection, chosen from existing
comparisons between SuperLU, MUMPS, and UMFPACK.
Wide range of conditioning and numerical scaling
Compute “True” solutions using a doubled-double-extended
factorization and quad-double-extended refinement with a
modified TAUCS.
Refinement uses LAPACK-style numerical scaling throughout,
but the test systems are generated in the matrix’s given scaling.
Also tested on singular systems; no solutions accepted.
At some point, plan on feeding the “true” solutions into the UF
Collection...
28. Sparse normwise conditioning
[Histogram: percent of population vs. normwise difficulty, roughly 2¹⁰ to 2⁵⁰.]
29. Sparse componentwise conditioning
[Histogram: percent of population vs. componentwise difficulty, roughly 2²⁰ to 2⁶⁰.]
37. Sparse Matrix to Bipartite Graph to Pivots
[Figure: a 4 × 4 sparse matrix, its bipartite row/column graph, and the matched pivots: Col 1→Row 2, Col 2→Row 3, Col 3→Row 1, Col 4→Row 4.]
Bipartite model
Each row and column is a vertex.
Each explicit entry is an edge.
Want to choose “largest” entries for pivots.
Maximum weight complete bipartite matching:
linear assignment problem
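As a sketch of the model's edge weights, assuming the benefit form B(i, j) = c + log₂|A(i, j)| used later in the talk, with (row, col, value) triples for the explicit entries:

```python
import math

def benefits(entries, c=0.0):
    """Build bipartite edge weights B(i, j) = c + log2|A(i, j)| from a
    sparse matrix given as (row, col, value) triples.

    Explicit zeros get weight -inf: they cannot be chosen as pivots.
    """
    B = {}
    for i, j, v in entries:
        B[(i, j)] = c + math.log2(abs(v)) if v != 0 else -math.inf
    return B
```

Maximizing the sum of these logarithms over a complete matching maximizes the product of pivot magnitudes.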
38. Mathematical Form
“Just” a linear optimization problem:
B: n × n matrix of benefits in ℝ ∪ {−∞}, often c + log₂ |A|
X: n × n permutation matrix: the matching
pr, πc: dual variables, will be price and profit
1r, 1c: unit entry vectors corresponding to rows, cols
Lin. assignment prob.:
  maximize Tr BᵀX over X ∈ ℝ^{n×n}
  subject to X 1c = 1r, Xᵀ 1r = 1c, and X ≥ 0.
Dual problem:
  minimize 1rᵀ pr + 1cᵀ πc over pr, πc
  subject to pr 1cᵀ + 1r πcᵀ ≥ B.
39. Mathematical Form
“Just” a linear optimization problem:
B: n × n matrix of benefits in ℝ ∪ {−∞}, often c + log₂ |A|
X: n × n permutation matrix: the matching
pr, πc: dual variables, will be price and profit
1r, 1c: unit entry vectors corresponding to rows, cols
Lin. assignment prob.:
  maximize Tr BᵀX over X ∈ ℝ^{n×n}
  subject to X 1c = 1r, Xᵀ 1r = 1c, and X ≥ 0.
Dual problem, implicit form:
  minimize over pr: 1rᵀ pr + Σ_{j∈C} max_{i∈R} (B(i, j) − pr(i)).
40. Do We Need a Special Method?
The LAP:
  maximize Tr BᵀX over X ∈ ℝ^{n×n}
  subject to X 1c = 1r, Xᵀ 1r = 1c, and X ≥ 0.
Standard form:
  minimize cᵀx over x
  subject to A x = 1_{r+c}, and x ≥ 0.
A: 2n × τ vertex-edge matrix
Network optimization kills simplex methods.
(“Smoothed analysis” does not apply.)
Interior point algs need to round the solution.
(And need to solve Ax = b for a much larger A, although
theoretically great in NC.)
Combinatorial methods should be faster.
(But unpredictable!)
41. Properties from Optimization
Complementary slackness
X ∘ (pr 1cᵀ + 1r πcᵀ − B) = 0 (elementwise product).
If (i, j) is in the matching (X(i, j) = 1), then
pr(i) + πc(j) = B(i, j).
Used to choose matching edges and modify dual variables in
combinatorial algorithms.
42. Properties from Optimization
Relaxed problem
Introduce a parameter µ, two interpretations:
from a barrier function related to X ≥ 0, or
from the auction algorithm (later).
Then
Tr BᵀX∗ ≤ 1rᵀ pr + 1cᵀ πc ≤ Tr BᵀX∗ + (n − 1)µ,
so the computed dual value (and hence the computed primal matching) is
within (n − 1)µ of the optimal primal.
Very useful for finding approximately optimal matchings.
Feasibility bound
Starting from zero prices:
pr(i) ≤ (n − 1)(µ + finite range of B)
43. Algorithms for Solving the LAP
Goal: A parallel algorithm that justifies buying big machines.
Acceptable: A distributed algorithm; matrix is on many nodes.
Choices:
Simplex or continuous / interior-point
Plain simplex blows up, network simplex difficult to parallelize.
Rounding for interior point often falls back on matching.
(Optimal IP algorithm: Goldberg, Plotkin, Shmoys, Tardos.
Needs factorization.)
Augmenting-path based (Mc64: Duff and Koster)
Based on depth- or breadth-first search.
Both are P-complete, inherently sequential (Greenlaw, Reif).
Auctions (Bertsekas, et al.)
Only length-1 or -2 alternating paths; global sync for duals.
44. Auction Algorithms
Discussion will be column-major.
General structure:
1 Each unmatched column finds the “best” row, places a bid.
The dual variable pr holds the prices.
The profit πc is implicit. (No significant FP errors!)
Each entry’s value: benefit B(i, j)− price p(i).
A bid maximally increases the price of the most valuable row.
2 Bids are reconciled.
Highest proposed price wins, forms a match.
Loser needs to re-bid.
Some versions need tie-breaking; here least column.
3 Repeat.
Eventually everyone will be matched, or
some price will be too high.
Seq. implementation in ∼40–50 lines, can compete with Mc64
Some corner cases to handle. . .
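A minimal sequential auction in the spirit of the structure above, assuming a dense benefit matrix (lists of lists, −inf for entries that cannot match); each bid raises the winning row's price by the value gap plus µ, and the displaced column re-bids:

```python
import math

def auction(B, mu=1e-6):
    """Forward auction for the dense linear assignment problem.

    B[i][j] is the benefit of matching row i with column j.
    Returns row_of: for each column j, the matched row index.
    """
    n = len(B)
    price = [0.0] * n           # dual variable p_r: one price per row
    col_of = [None] * n         # current owner (column) of each row
    row_of = [None] * n         # current match (row) of each column
    unmatched = list(range(n))  # columns that still need to bid

    while unmatched:
        j = unmatched.pop()
        # Find the best and second-best rows by value = benefit - price.
        best_i, best_v, second_v = None, -math.inf, -math.inf
        for i in range(n):
            v = B[i][j] - price[i]
            if v > best_v:
                best_i, second_v, best_v = i, best_v, v
            elif v > second_v:
                second_v = v
        # Bid: raise the winner's price by the value gap plus mu.
        # (With a single adjacent row, second_v = -inf and the price
        # becomes infinite: the corner case noted on a later slide.)
        price[best_i] += (best_v - second_v) + mu
        # Reconcile: whoever owned that row loses it and must re-bid.
        if col_of[best_i] is not None:
            row_of[col_of[best_i]] = None
            unmatched.append(col_of[best_i])
        col_of[best_i] = j
        row_of[j] = best_i
    return row_of
```

The resulting matching is within (n − 1)µ of optimal; shrinking µ between passes (the µ-scaling a few slides ahead) trades runtime for accuracy.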
45. The Bid-Finding Loop
For each unmatched column:
For each adjacent row: value = entry − price; save the largest and second-largest values; the bid price increment is the difference in values.
Differences from sparse matrix-vector products
Not all columns, rows used every iteration. (sparse matrix,
sparse vector)
Hence output price updates are scattered.
More local work per entry
46. The Bid-Finding Loop
Little points
Increase bid price by µ to avoid loops
Needs care in floating-point for small µ.
Single adjacent row → ∞ price
Affects feasibility test, computing dual
47. Termination
Once a row is matched, it stays matched.
A new bid may swap it to another column.
The matching (primal) increases monotonically.
Prices only increase.
The dual does not change when a row is newly matched.
But the dual may decrease when a row is taken.
The dual decreases monotonically.
Subtle part: If the dual doesn’t decrease. . .
It’s ok. Can show the new edge begins an augmenting path that
increases the matching or an alternating path that decreases the
dual.
48. Successive Approximation (µ-scaling)
Simple auctions aren’t really competitive with Mc64.
Start with a rough approximation (large µ) and refine.
Called ε-scaling in the literature, but µ-scaling is better.
Preserve the prices pr at each step, but clear the matching.
Note: Do not clear matches associated with ∞ prices!
Equivalent to finding diagonal scaling Dr ADc and matching
again on the new B.
Problem: Performance strongly depends on initial scaling.
Also depends strongly on hidden parameters.
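The schedule itself is simple; a sketch of one possible geometric µ schedule (the factor and endpoints are illustrative assumptions, not the dissertation's hidden parameters):

```python
def mu_schedule(mu0, mu_final, factor=8.0):
    """Successive-approximation (mu-scaling) schedule: start with a
    rough, large mu and shrink it geometrically toward the target.

    Each auction pass keeps the prices p_r but clears the matching;
    the matching computed at a given mu is within (n - 1) * mu of
    the optimal primal value.
    """
    mus = []
    mu = mu0
    while mu > mu_final:
        mus.append(mu)
        mu /= factor
    mus.append(mu_final)
    return mus
```

Early, coarse passes position the prices cheaply; the final, small-µ pass then needs far fewer bids than a cold start.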
50. Sequential performance: Highly variable
Group Name By col (s) By row (s) Row/Col
Bai af23560 0.025 0.028 1.13
FEMLAB poisson3Db 0.014 0.016 1.11
FIDAP ex11 0.060 0.060 1.00
GHS indef cont-300 0.007 0.006 0.84
GHS indef ncvxqp5 0.338 0.318 0.94
Hamm scircuit 0.048 0.047 0.99
Hollinger g7jac200 0.355 0.339 0.95
Mallya lhr14 0.044 0.065 1.47
Schenk IBMSDS 3D 51448 3D 0.031 0.282 9.22
Schenk IBMSDS matrix 9 0.074 0.613 8.29
Schenk ISEI barrier2-4 0.291 0.193 0.66
Vavasis av41092 5.462 4.083 0.75
Zhao Zhao2 1.041 0.609 0.58
51. Sequential performance: Highly variable
Group Name Float (s) Int (s) Int/Float
Bai af23560 0.025 0.040 1.61
FEMLAB poisson3Db 0.015 0.016 1.08
FIDAP ex11 0.060 0.029 0.49
GHS indef cont-300 0.007 0.006 0.91
GHS indef ncvxqp5 0.338 0.425 1.26
Hamm scircuit 0.048 0.016 0.34
Hollinger g7jac200 0.355 1.004 2.83
Mallya lhr14 0.044 0.050 1.12
Schenk IBMSDS 3D 51448 3D 0.031 0.020 0.66
Schenk IBMSDS matrix 9 0.074 0.066 0.89
Schenk ISEI barrier2-4 0.291 0.261 0.91
Vavasis av41092 5.462 5.401 0.99
Zhao Zhao2 1.041 2.269 2.18
52. Approximately maximum matchings
Terminal µ value
Name 0 5.96e-08 2.44e-04 5.00e-01
af23560 Primal 1342850 1342850 1342850 1342670
Time(s) 0.14 0.05 0.03 0
ratio 0.37 0.21 0.02
poisson3Db Primal 2483070 2483070 2483070 2483070
Time(s) 0.02 0.02 0.02 0.02
ratio 1.01 1.04 1.07
g7jac200 Primal 3533980 3533980 3533980 3533340
Time(s) 2.98 1.07 0.28 0.18
ratio 0.36 0.09 0.06
av41092 Primal 3156210 3156210 3156210 3155920
Time(s) 24.51 8.09 2.48 0.11
ratio 0.33 0.10 0.00
Zhao2 Primal 333891 333891 333891 333487
Time(s) 7.69 2.37 3.65 0.02
ratio 0.31 0.47 0.00
53. Setting / Lowering Parallel Expectations
Performance scalability?
Originally proposed (early 1990s) when
cpu speed ≈ memory speed ≈ network speed ≈ slow.
Now:
cpu speed ≫ memory latency > network latency.
The number of communication phases dominates matching
algorithms (auction and others).
Communication patterns are very irregular.
Latency and software overhead is not improving. . .
Scaled back goal
It suffices to not slow down much on distributed data.
59. Iteration order still matters
[Figure: time (s) vs. number of processors (5-20) for av41092 and shyy161, comparing row-major and col-major iteration order.]
60. Many different speed-up profiles
[Figure: time (s) vs. number of processors (5-20) for af23560, bmwcra_1, garon2, and stomach, showing very different speed-up profiles.]
61. So what happens in some cases?
Matrix av41092 has one large strongly connected component.
(The square blocks in a Dulmage-Mendelsohn decomposition.)
The SCC spans all the processors.
Every edge in an SCC is a part of some complete matching.
Horrible performance from:
starting along a non-max-weight matching,
making it almost complete,
then an edge-by-edge search for nearby matchings,
requiring a communication phase almost per edge.
Conjecture: This type of performance land-mine will affect any
0-1 combinatorial algorithm.
62. Improvements?
Approximate matchings: Speeds up the sequential case,
eliminating any “speed-up.”
Rearranging deck chairs: few-to-few communication
Build a directory of which nodes share rows: collapsed BBᵀ.
Send only to/from those neighbors.
Minor improvement over MPI Allgatherv for a huge effort.
Latency not a major factor...
Improving communication may not be worth it. . .
The real problem is the number of comm. phases.
If diagonal is the matching, everything is overhead.
Or if there’s a large SCC. . .
Another alternative: Multiple algorithms at once.
Run Bora Uçar's alg. on one set of nodes, auction on another,
transposed auction on another, . . .
Requires some painful software engineering.
63. Latency not a dominating factor
[Figure: speed-up relative to reducing to the root node, for node × procs-per-node configurations 1x3, 3x1, 1x8, and 2x4.]
64. So, Could This Ever Be Parallel?
For a given matrix-processor layout, constructing a matrix
requiring O(n) communication is pretty easy for combinatorial
algorithms.
Force almost every local action to be undone at every step.
Non-fractional combinatorial algorithms are too restricted.
Using less-restricted optimization methods is promising, but far
slower sequentially.
Existing algs (Goldberg, et al.) are PRAM with n³ processors.
General purpose methods: Cutting planes, successive SDPs
Someone clever might find a parallel rounding algorithm.
Solving the fractional LAP quickly would become a matter of
finding a magic preconditioner. . .
Maybe not a good thing for a direct method?
65. Review of contributions
Iterative refinement
Successfully deliver dependable solutions with a little extra
precision.
Removed need for condition estimation.
Built methodology for evaluating Ax = b solution methods’
accuracy and dependability.
Static pivoting
Tuned static pivoting heuristics to provide dependability.
Demonstrated that an approximate maximum weight bipartite
matching is faster and just as dependable.
Developed a memory-scalable (although not
performance-scalable) distributed memory auction algorithm for
static pivoting.
66. Future directions
Iterative refinement
Least-squares refinement demonstrated (Demmel, Hida, Li, &
Riedy), but needs... refinement.
Perhaps refinement could render an iterative method
dependable. Could improve accuracy of A dyi = ri with extra
iterations as i increases.
Could help build trust in new methods (e.g. CALU).
Distributed matching
Interesting software problem: Run multiple algorithms on
portions of a parallel allotment. How do you signal the others to
terminate?
Interesting algorithm problem: Is there an efficient rounding
method for fractional / interior point algorithms?
68. Bounds
Backward error
‖Di⁻¹ ri‖∞ ≤ (c̄ − ρ)⁻¹ (3(nd + 1) εr + εx)
Here nd is an expression of size, c̄ is the upper bound on per-iteration
decrease, and ρ is a safety factor for the region around 1/εw.
Forward error
‖Di⁻¹ ei‖∞ ≲ 2(4 + ρ̄(nd + 1)) εw · (c̄ − ρ̄)⁻¹
assuming εr ≤ εw², εx ≤ εw². Using only one precision, εr = εx = εw:
(c̄ − ρ̄) ‖Di⁻¹ ei‖∞ ≲ 2(5 + 2(nd + 1) ccond(A, yi)) εd.