Dependable direct solutions for linear systems
       using a little extra precision

                            E. Jason Riedy
                       jason.riedy@cc.gatech.edu

                       CSE Division, College of Computing
                         Georgia Institute of Technology


                             21 August, 2009




Jason Riedy (GATech)             Dependable solver          21 Aug, 2009   1 / 28
Ax = b
Workhorse in scientific computing
    Two primary linear algebra problems: Ax = b, Av = λv
    Many applications reduce problems into those, often Ax = b.
            PDEs: Discretize to one of the above.
            Optimization: Solve one at each step.
            ...
    Commercial supercomputers are built for Ax = b: Linpack

      Many people work to solve Ax = b faster.
         Today’s focus is solving it better.

 (I’m oversimplifying in many ways. And better can lead to faster.)

   Jason Riedy (GATech)          Dependable solver      21 Aug, 2009   2 / 28
Outline


1   What do I mean by “better”?

2   Refining to more accurate solutions with extra precision

3   Other applications of better: faster, more scalable


Most of this work was done at/with UC Berkeley in conjunction with
Yozo Hida, Dr. James Demmel, Dr. Xiaoye Li (LBL), and a long
sequence of then-undergrads (M. Vishvanath, D. Lu, D. Halligan, ...).


    Jason Riedy (GATech)       Dependable solver          21 Aug, 2009   3 / 28
Errors in Ax = b
    [Figure: schematic contrasting the perturbed problem (Ã, b̃) and its computed
    solution x^(0) = Ã⁻¹b̃ with the true problem (A, b) and its solution x = A⁻¹b,
    above two scatter plots of error against difficulty. Left: backward error
    (berr), 2^−50 to 2^−10 over difficulty 2^10 to 2^30. Right: forward error
    (ferr), 2^−25 to 2^−5 over difficulty 2^5 to 2^30.]
  Jason Riedy (GATech)                                        Dependable solver                                                        21 Aug, 2009   4 / 28
Goals for errors in Ax = b
    [Figure: the same backward error (berr) and forward error (ferr) plots, now
    annotated with refined iterates x^(k) approaching the true solution x; the
    goal is error near the precision's limit across the difficulty range.]
   Jason Riedy (GATech)                                        Dependable solver                                                        21 Aug, 2009   5 / 28
Possible methods
    Interval arithmetic
           Tells you when there’s a problem, not how to solve it.
           Finding the optimal enclosure is NP-hard!
    Exact / rational arithmetic
           Storage (& computation) grows exponentially with dimension.
    Telescoping precisions (increase precision throughout)
           Increases the cost of the O(n^3) portion.
    Iterative refinement
           O(n^2) extra work after the O(n^3) factorization.
           Only a little extra precision necessary!
           Downside: Dependable, but not validated.

Dependable solver
 Reduce the error to the precision’s limit as often as reasonable, or
            clearly indicate when the result is unsure.
  Jason Riedy (GATech)           Dependable solver           21 Aug, 2009   6 / 28
What I’m not going to explain deeply
   Precise definition of difficulty:
           A condition number relevant to the error in consideration, or,
           roughly, the error measure’s sensitivity to perturbation near the
           solution.
   Numerical scaling / equilibration:
           Assume all numbers in the input are roughly in the same scale.
           Rarely true for computer-produced problems.
           Common cases easy to handle; obscures the important points.
           Note: Poor scaling produces simple ill-conditioning.
   Details of when each error measure is appropriate.
           Backward: normwise, columnwise, componentwise, ...
           Forward: normwise, componentwise, ...

           All three are inter-linked and address norms.
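
For concreteness, a small sketch (not from the slides) of the two most common
backward-error measures for a computed x: the normwise (Rigal-Gaches) and
componentwise (Oettli-Prager) forms, written with NumPy purely for illustration.

    import numpy as np

    def backward_errors(A, x, b):
        """Normwise (Rigal-Gaches) and componentwise (Oettli-Prager) backward
        errors of a computed solution x for Ax = b."""
        r = b - A @ x
        normwise = np.linalg.norm(r, np.inf) / (
            np.linalg.norm(A, np.inf) * np.linalg.norm(x, np.inf)
            + np.linalg.norm(b, np.inf))
        denom = np.abs(A) @ np.abs(x) + np.abs(b)
        componentwise = np.max(np.abs(r) / np.where(denom == 0.0, np.inf, denom))
        return normwise, componentwise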
  Jason Riedy (GATech)           Dependable solver             21 Aug, 2009   7 / 28
Outline



1   What do I mean by “better”?


2   Refining to more accurate solutions with extra precision


3   Other applications of better: faster, more scalable




    Jason Riedy (GATech)       Dependable solver          21 Aug, 2009   8 / 28
Iterative refinement

Newton’s method for Ax = b
Assume A is n × n, non-singular, factored PA = LU, etc.
  1 Solve Ax (0) = b
  2 Repeat for i = 0, ...:
        1   Compute residual r (i) = b − Ax (i) .
        2   (Check backward error criteria)
        3   Solve Adx (i) = r (i) .
        4   (Check forward error criteria)
        5   Update x (i+1) = x (i) + dx (i) .

    Overall algorithm is well-known (Forsythe & Moler, 1967...).
    In exact arithmetic, would converge in one step.

   Jason Riedy (GATech)            Dependable solver      21 Aug, 2009   9 / 28
Iterative refinement

Newton’s method for Ax = b
Assume A is n × n, non-singular, factored PA = LU, etc.
  1 Solve Ax (0) = b
  2 Repeat for i = 0, ...:
        1   Compute residual r (i) = b − Ax (i) . (Using double precision.)
        2   (Check backward error criteria)
        3   Solve Adx (i) = r (i) . (Using working/single precision.)
        4   (Check forward error criteria)
        5   Update x (i+1) = x (i) + dx (i) . (New: x with double precision.)

    No extra precision: Reduce backward error in one step [Skeel].
    A bit of double precision: Reduce errors much, much further.
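
A minimal NumPy/SciPy sketch of this loop (an illustration, not the LAPACK
xGESVXX code): the factorization and correction solves stay in the working
(single) precision, while the residual and the solution update are carried in
double. The function name and the stopping test are this sketch's choices.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def refine(A, b, max_steps=30):
        """Mixed-precision iterative refinement sketch for square Ax = b."""
        A = np.asarray(A, dtype=np.float64)
        b = np.asarray(b, dtype=np.float64)
        lu, piv = lu_factor(A.astype(np.float32))    # O(n^3) factorization, single precision
        x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
        for _ in range(max_steps):
            r = b - A @ x                            # residual in double precision
            dx = lu_solve((lu, piv), r.astype(np.float32))   # correction solve in single
            x = x + dx.astype(np.float64)            # update of x carried in double
            if np.linalg.norm(dx, np.inf) <= np.finfo(np.float32).eps * np.linalg.norm(x, np.inf):
                break                                # dx tiny relative to x: stop
        return x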

   Jason Riedy (GATech)          Dependable solver             21 Aug, 2009   9 / 28
Why should this work?
A brief, informal excursion into the analysis...


                             r^(i) = b − A x^(i) + δr^(i)
                             (A + δA^(i)) dx^(i) = r^(i)
                             x^(i+1) = x^(i) + dx^(i) + δx^(i+1)

Very roughly (not correct, approximating behavior, see Lawn165):

Backward Error (Residual)
                      ‖r^(i+1)‖ ≈ εw ‖A‖ ‖A⁻¹‖ ‖r^(i)‖ + ‖A‖ ‖δx^(i)‖ + ‖δr^(i)‖

Forward Error
                    ‖e^(i+1)‖ ≈ εw ‖A⁻¹‖ ‖A‖ ‖e^(i)‖ + ‖δx^(i)‖ + ‖A⁻¹‖ ‖δr^(i)‖
   Jason Riedy (GATech)               Dependable solver              21 Aug, 2009   10 / 28
Test cases

    One million random, single-precision, 30 × 30 systems Ax = b
            250k: A generated to cover factorization difficulty
            Four (x, b), two with random x and two with random b
            Solve for true x using greater than quad precision.
            Working precision: εw = 2^−24 ≈ 6 × 10^−8
            Extra / double precision: εx = 2^−53 ≈ 10^−16
    Using single precision and small systems because
            generating and running one million tests takes time, and also
            it’s easier to hit difficult cases!
    Results apply to double, complex & double complex (with a 2√2
    factor). Also on (fewer) tests running beyond 1k × 1k.
    Note: Same plots apply to sparse matrices in various collections,
    but far fewer than 1M test cases.
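
A rough sketch of generating and checking one such system; mpmath stands in for
the greater-than-quad reference solve, and the plain random matrix here is not
the difficulty-controlled generator used for the slides.

    import numpy as np
    from mpmath import mp, matrix, lu_solve

    mp.prec = 240                                    # well beyond quad precision
    rng = np.random.default_rng(0)
    n = 30
    A = rng.standard_normal((n, n)).astype(np.float32)
    x_rand = rng.standard_normal(n).astype(np.float32)
    b = (A.astype(np.float64) @ x_rand).astype(np.float32)   # one (x, b) pair

    # "True" x: high-precision solve of the stored single-precision data.
    x_ref = lu_solve(matrix(A.astype(np.float64).tolist()),
                     matrix(b.astype(np.float64).tolist()))
    x_ref = np.array([float(x_ref[i]) for i in range(n)])

    x_single = np.linalg.solve(A, b)                 # plain single-precision solve
    ferr = np.linalg.norm(x_single - x_ref, np.inf) / np.linalg.norm(x_ref, np.inf)
    print(f"forward error of the unrefined solve: {ferr:.2e}")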

   Jason Riedy (GATech)          Dependable solver            21 Aug, 2009   11 / 28
Backward error results (before)
    [Figure: backward error against difficulty (2^10 to 2^30) before refinement,
    two panels: "All working" and "All double"; backward error runs 2^−50 to 2^−10.]
           Omitting double-prec residuals; same limiting error as all working.
           All-double backward error is for the double-prec x.
                  Jason Riedy (GATech)                  Dependable solver                        21 Aug, 2009   12 / 28
Backward error results (after)
    [Figure: backward error against difficulty (2^10 to 2^30) after refinement,
    same panels ("All working", "All double") and axes as the previous slide.]
           Omitting double-prec residuals; same limiting error as all working.
           All-double backward error is for the double-prec x.
                  Jason Riedy (GATech)                  Dependable solver                        21 Aug, 2009   13 / 28
Forward error results (before)
    [Figure: forward error against difficulty (2^5 to 2^30) before refinement,
    two panels: "All working" and "All double"; forward error runs 2^−25 to 2^−5.]
          Omitting double-prec residuals; same limiting error as all working.
          All-double forward error is for the single-prec x.
                 Jason Riedy (GATech)                   Dependable solver                           21 Aug, 2009          14 / 28
Forward error results (after)
    [Figure: forward error against difficulty (2^5 to 2^30) after refinement,
    same panels ("All working", "All double") and axes as the previous slide.]
          Omitting double-prec residuals; same limiting error as all working.
          All-double forward error is for the single-prec x.
                 Jason Riedy (GATech)                   Dependable solver                           21 Aug, 2009          15 / 28
Iteration costs: backward error
    [Figure: empirical CDFs of the refinement step at which the backward error
    converges, for the "All working", "Residual double", and "All double" variants.
    Left panel: convergence to εw (steps 0 to 30). Right panel: convergence to
    10εw (steps 1 to 3).]
Practical: Stop when backward error is tiny or makes little progress.

                 Jason Riedy (GATech)                       Dependable solver                                        21 Aug, 2009         16 / 28
Iteration costs: backward error
    [Figure: empirical CDFs of the convergence step for the same three variants,
    with thresholds εw² (left) and 10εw² (right); steps run to roughly 30.]
Practical: Stop when backward error is tiny or makes little progress.

                 Jason Riedy (GATech)                        Dependable solver                                      21 Aug, 2009        17 / 28
Iteration costs: forward error
    [Figure: empirical CDFs of the step at which the forward error converges, for
    the same three variants, with thresholds εw (left) and √N·εw (right); steps
    run to roughly 30.]
Practical: Stop when dx is tiny or makes little progress.

                 Jason Riedy (GATech)                          Dependable solver                                    21 Aug, 2009        18 / 28
Performance costs
           Overhead each phase by precision and type

    [Figure: overhead of each phase (factorization; + refinement loop;
    + condition numbers; + refinement and condition numbers) against dimension
    (10^1.0 to 10^3.5), split into real/complex and single/double panels, on an
    Itanium 2.]
           Relatively balanced cpu / mem arch.
           Double faster than single.
Overhead is time for phase (incl. fact.) / time for factorization



               Jason Riedy (GATech)                                         Dependable solver                     21 Aug, 2009   19 / 28
Performance costs
           Overhead each phase by precision and type

    [Figure: the same overhead-by-phase plot against dimension (10^1.0 to 10^3.5)
    on a 3 GHz Xeon, split into real/complex and single/double panels.]
           Horribly unbalanced cpu / mem arch.
           (Not parallel)
           Vector instructions, but no vectorization in extra precision ops.

Overhead is time for phase (incl. fact.) / time for factorization



               Jason Riedy (GATech)                                         Dependable solver                     21 Aug, 2009   20 / 28
Outline



1   What do I mean by “better”?


2   Refining to more accurate solutions with extra precision


3   Other applications of better: faster, more scalable




    Jason Riedy (GATech)       Dependable solver          21 Aug, 2009   21 / 28
Obvious applications of better
Available in Lapack
    Routines SGESVXX, DGESVXX, CGESVXX, ZGESVXX
    Experimental interface, subject to changes

High-level environments
    Do you want to think about all error conditions all the time?
    Should be in Octave & Matlab™:
                             x = A\b;
    The same technique applies to overdetermined least-squares
    [Lawn188; Demmel, Hida, Li, Riedy]. R or S+ (statistics):
                      model <- lm(response~var)
            Refine the augmented system [Björck]:
                      [ αI   A ] [ r/α ]   [ b ]
                      [ A^T  0 ] [  x  ] = [ 0 ]
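
A rough sketch of refining through that augmented system with a dense LU, as an
illustration only (not the Lawn188 algorithm); the helper name and the fixed
step count are this sketch's choices.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def ls_refine(A, b, alpha=1.0, steps=5):
        """Refine min ||b - A x||_2 via the augmented system
        [[alpha*I, A], [A^T, 0]] [r/alpha; x] = [b; 0]."""
        A = np.asarray(A, dtype=np.float64)
        b = np.asarray(b, dtype=np.float64)
        m, n = A.shape
        K = np.zeros((m + n, m + n))
        K[:m, :m] = alpha * np.eye(m)
        K[:m, m:] = A
        K[m:, :m] = A.T
        rhs = np.concatenate([b, np.zeros(n)])
        lu, piv = lu_factor(K.astype(np.float32))           # working-precision factorization
        y = lu_solve((lu, piv), rhs.astype(np.float32)).astype(np.float64)
        for _ in range(steps):
            res = rhs - K @ y                               # augmented residual in double
            dy = lu_solve((lu, piv), res.astype(np.float32))
            y = y + dy.astype(np.float64)
        s, x = y[:m], y[m:]
        return x, alpha * s                                 # x and the residual r = b - A x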
   Jason Riedy (GATech)        Dependable solver            21 Aug, 2009   22 / 28
Not so obvious application: Speed!

When single precision is much faster than double...
    Assume: Targeting backward error, often well-conditioned
    Factor A in single precision, use for Adxi = r .
    Refine to dp backward error, or fall back to using dp overall.
    Earlier Cell (extra slow double): 12 Gflop/s ⇒ 150 Gflop/s!
    [Lawn175; Langou2 , Luszczek, Kurzak, Buttari, Dongarra]
    (Independent path to the same destination.)

When single precision fits more into memory...
    Sparse, sparse out-of-core
            Generally limited by indexing performance [Hogg & Scott]
            Could use packed data structures from Cell [Williams, et al.]
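
As a back-of-the-envelope check of the single-precision-speed point, assuming
the refine() sketch from the refinement slide and a BLAS where single precision
actually runs faster than double (speedups vary widely by machine):

    import time
    import numpy as np
    from scipy.linalg import solve

    n = 4000
    rng = np.random.default_rng(1)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    t0 = time.perf_counter()
    x_mixed = refine(A, b)        # single-precision factor + refinement (sketch above)
    t1 = time.perf_counter()
    x_full = solve(A, b)          # all-double direct solve for comparison
    t2 = time.perf_counter()

    berr = np.linalg.norm(b - A @ x_mixed, np.inf) / (
        np.linalg.norm(A, np.inf) * np.linalg.norm(x_mixed, np.inf)
        + np.linalg.norm(b, np.inf))
    print(f"mixed: {t1 - t0:.2f}s  double: {t2 - t1:.2f}s  backward error: {berr:.2e}")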

   Jason Riedy (GATech)          Dependable solver            21 Aug, 2009   23 / 28
Not so obvious application: Scalability!
When pivoting is a major bottleneck...
    Sparse, unsymmetric LU factorization:
            Completely separate structural analysis from numerical work.
            Introduce backward errors to avoid entry growth.
            Fix with refinement.
            (SuperLU [Demmel, Li, (+ me)], earlier sym.indef. work)

When pivoting blocks practical theory...
    Communication-optimal algorithms for O(n^3) linear algebra
            Trade some computation for optimal memory transfers / comm.
            [Lawn218; Ballard, Demmel, Holtz, Schwartz]
            Codes exist, are fast, etc.
    But LU cannot use partial pivoting!
            Use a new strategy [Demmel, Grigori, Xiang], refine...
   Jason Riedy (GATech)          Dependable solver           21 Aug, 2009   24 / 28
Summary


   We can construct an inexpensive, dependable solver for Ax = b.
           Compute an accurate answer whenever feasible.
           Reliably detect failures / unsure, even for the forward error.
   We can compute better results for Ax = b.
           Trade some computation, a little bandwidth for accuracy.
           Important bit is keeping all the limiting terms (residual,
           solution) in extra precision.
   Better results can help solve Ax = b more quickly.
           Start with a sloppy solver and fix it.




  Jason Riedy (GATech)           Dependable solver             21 Aug, 2009   25 / 28
Questions / Backup




  Jason Riedy (GATech)   Dependable solver   21 Aug, 2009   26 / 28
Doubled-precision
    Represent a ◦ b exactly as a pair (h, t).
    Old algorithms [Knuth, Dekker, Linnainmaa, Kahan; 60s & 70s]
    Work on any faithful arithmetic [Priest]

 Addition
       h = a + b
       z = h − a
       t = (a − (h − z)) + (b − z)

 Multiplication
       h = a · b
       (ah, at) = split(a)
       (bh, bt) = split(b)
       t = ah · bh − h
       t = ((t + ah · bt) + at · bh) + at · bt

See qd package from [Bailey, Hida, Li]; recent pubs from [Rump,
Ogita, Oishi].
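
A Python rendering of those two kernels (a sketch; the variable names and the
Dekker splitter constant are this rendering's choices). With round-to-nearest
IEEE arithmetic, h + t equals the exact sum or product, which is what the
extra-precise residual and update accumulations rely on.

    def two_sum(a, b):
        """Knuth's branch-free TwoSum: h + t == a + b exactly."""
        h = a + b
        z = h - a
        t = (a - (h - z)) + (b - z)
        return h, t

    _SPLIT = 2.0**27 + 1.0        # Dekker/Veltkamp splitter for 53-bit doubles

    def _split(a):
        c = _SPLIT * a
        hi = c - (c - a)
        return hi, a - hi

    def two_prod(a, b):
        """Dekker's TwoProduct: h + t == a * b exactly (barring over/underflow)."""
        h = a * b
        ah, at = _split(a)
        bh, bt = _split(b)
        t = ((ah * bh - h) + ah * bt + at * bh) + at * bt
        return h, t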
   Jason Riedy (GATech)       Dependable solver                 21 Aug, 2009   27 / 28
Iteration costs: backward error to double
                      Convergence to εx                                                        Convergence to 10εx
                             All working                                                              All working
                             Residual double                                                          Residual double
                             All double                                                               All double


                1.0                                                                      1.0

                0.8                                                                      0.8
Empirical CDF




                                                                         Empirical CDF
                0.6                                                                      0.6

                0.4                                                                      0.4

                0.2                                                                      0.2

                0.0                                                                      0.0

                         5      10      15   20    25   30                                        5      10   15    20      25     30

                                Convergence step                                                         Convergence step



Practical: Stop when backward error is tiny or makes little progress.

                 Jason Riedy (GATech)                        Dependable solver                                      21 Aug, 2009        28 / 28
