
## Factorizations for Solving Linear Equations

The well-known LU and Cholesky factorizations are the simplest block algorithms to derive: they require neither extra floating-point operations nor extra working storage.
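As a concrete sketch, a right-looking blocked Cholesky factorization can be written as follows in Python. The helper names (`matmul`, `chol_unblocked`, `chol_blocked`) are ours for illustration only; the trailing-submatrix update in step 3 is the work that a real implementation hands to the Level 3 BLAS.

```python
def matmul(A, B):
    """Dense matrix product (the role played by the Level 3 BLAS GEMM)."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def chol_unblocked(A):
    """Unblocked lower-triangular Cholesky factor of a small SPD matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = s ** 0.5
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def chol_blocked(A, nb=2):
    """Right-looking blocked Cholesky with block size nb: factor the panel,
    then update the trailing submatrix by a matrix-matrix product."""
    n = len(A)
    A = [row[:] for row in A]            # work on a copy
    L = [[0.0] * n for _ in range(n)]
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        # 1. factor the diagonal block with the unblocked algorithm
        L11 = chol_unblocked([[A[i][j] for j in range(j0, j1)]
                              for i in range(j0, j1)])
        for i in range(j0, j1):
            for j in range(j0, i + 1):
                L[i][j] = L11[i - j0][j - j0]
        # 2. triangular solve for the panel below the diagonal block (TRSM)
        for i in range(j1, n):
            for j in range(j0, j1):
                s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j0, j))
                L[i][j] = s / L[j][j]
        # 3. rank-nb update of the trailing submatrix (SYRK/GEMM):
        #    this is where the Level 3 BLAS speed comes from
        for i in range(j1, n):
            for j in range(j1, n):
                A[i][j] -= sum(L[i][k] * L[j][k] for k in range(j0, j1))
    return L

A = [[4.0, 2.0, 0.0], [2.0, 5.0, 2.0], [0.0, 2.0, 5.0]]
L = chol_blocked(A, nb=2)
Lt = [[L[j][i] for j in range(3)] for i in range(3)]
print(matmul(L, Lt))   # reproduces A up to rounding
```

Note that no extra arithmetic is performed: the blocked algorithm does exactly the same operations as the unblocked one, only grouped so that most of them fall into the matrix-matrix update.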

Table 3.7 illustrates the speed of the LAPACK routine for LU factorization of a real matrix, DGETRF in double precision; this corresponds to 64-bit floating-point arithmetic. A block size of 1 means that the unblocked algorithm is used, since it is faster than, or at least as fast as, a blocked algorithm. These numbers may be compared with those for DGEMM in Table 3.12, which should be upper bounds.

**Table 3.7:** Speed in megaflops of DGETRF for square matrices of order n

| Machine | No. of processors | Block size | n = 100 | n = 1000 |
| --- | --- | --- | --- | --- |
| Dec Alpha Miata | 1 | 28 | 172 | 370 |
| Compaq AlphaServer DS-20 | 1 | 28 | 353 | 440 |
| IBM Power 3 | 1 | 32 | 278 | 551 |
| IBM PowerPC | 1 | 52 | 77 | 148 |
| Intel Pentium II | 1 | 40 | 132 | 250 |
| Intel Pentium III | 1 | 40 | 143 | 297 |
| SGI Origin 2000 | 1 | 64 | 228 | 452 |
| SGI Origin 2000 | 4 | 64 | 190 | 699 |
| Sun Ultra 2 | 1 | 64 | 121 | 240 |
| Sun Enterprise 450 | 1 | 64 | 163 | 334 |
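The megaflop figures can be turned into rough run times, since LU factorization of an n-by-n matrix costs about 2n³/3 floating-point operations (the standard leading term). A small sketch, with function names of our own choosing, using the 551 Mflop/s entry for the IBM Power 3 at n = 1000:

```python
def lu_flops(n):
    """Approximate flop count of LU factorization (leading term 2n^3/3)."""
    return 2 * n**3 / 3

def seconds_at(n, mflops):
    """Run time implied by a measured rate in megaflops per second."""
    return lu_flops(n) / (mflops * 1e6)

print(lu_flops(1000))          # about 6.7e8 flops
print(seconds_at(1000, 551))   # about 1.21 s at the IBM Power 3 rate
```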

Table 3.8 gives similar results for Cholesky factorization.

**Table 3.8:** Speed in megaflops of Cholesky factorization (DPOTRF) for matrices of order n

| Machine | No. of processors | Block size | n = 100 | n = 1000 |
| --- | --- | --- | --- | --- |
| Dec Alpha Miata | 1 | 28 | 197 | 399 |
| Compaq AlphaServer DS-20 | 1 | 28 | 306 | 464 |
| IBM Power 3 | 1 | 32 | 299 | 586 |
| IBM PowerPC | 1 | 52 | 79 | 125 |
| Intel Pentium II | 1 | 40 | 118 | 253 |
| Intel Pentium III | 1 | 40 | 142 | 306 |
| SGI Origin 2000 | 1 | 64 | 222 | 520 |
| SGI Origin 2000 | 4 | 64 | 137 | 1056 |
| Sun Ultra 2 | 1 | 64 | 131 | 276 |
| Sun Enterprise 450 | 1 | 64 | 178 | 391 |

LAPACK, like LINPACK, provides a factorization for symmetric indefinite matrices, in which A is factorized as A = P U D U^T P^T, where P is a permutation matrix and D is block diagonal with blocks of order 1 or 2. A block form of this algorithm has been derived and is implemented in the LAPACK routine SSYTRF/DSYTRF. It has to duplicate a little of the computation in order to "look ahead" to determine the necessary row and column interchanges, but the extra work can be more than compensated for by the greater speed of updating the matrix by blocks, as illustrated in Table 3.9, provided that n is large enough.

**Table 3.9:** Speed in megaflops of DSYTRF for matrices of order n

| Block size | n = 100 | n = 1000 |
| --- | --- | --- |
| 1 | 186 | 215 |
| 32 | 130 | 412 |
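To see why D must be allowed 2-by-2 diagonal blocks, consider the indefinite matrix A = [[0, 1], [1, 0]]: a factorization with only 1-by-1 pivots would divide by the zero diagonal entry, while taking A itself as a single 2-by-2 block of D succeeds trivially (with P = U = I). A minimal sketch, with helper names of our own choosing; this is plain unpivoted LDL^T, not LAPACK's actual pivoting strategy:

```python
def ldlt_1x1_pivots(A):
    """LDL^T with 1-by-1 pivots only and no interchanges; fails on a
    zero pivot, which is exactly what 2-by-2 blocks exist to avoid."""
    n = len(A)
    A = [row[:] for row in A]
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    d = []
    for k in range(n):
        if A[k][k] == 0.0:
            raise ZeroDivisionError("zero 1-by-1 pivot at step %d" % k)
        d.append(A[k][k])
        for i in range(k + 1, n):
            L[i][k] = A[i][k] / A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= L[i][k] * A[k][j]
    return L, d

A = [[0.0, 1.0], [1.0, 0.0]]
try:
    ldlt_1x1_pivots(A)
except ZeroDivisionError as e:
    print("1-by-1 pivots fail:", e)

# One 2-by-2 block handles it: with P = U = I, the factorization
# P U D U^T P^T reduces to D = A itself.
D = [row[:] for row in A]
print(D == A)   # True
```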

LAPACK, like LINPACK, provides LU and Cholesky factorizations of band matrices. The LINPACK algorithms can easily be restructured to use Level 2 BLAS, though that has little effect on performance for matrices of very narrow bandwidth. It is also possible to use Level 3 BLAS, at the price of doing some extra work with zero elements outside the band [48]. This becomes worthwhile for matrices of large order and semi-bandwidth greater than 100 or so.
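As a sketch of the band idea (using dense storage for readability; a real band routine such as LAPACK's DPBTRF uses packed band storage), Cholesky factorization of a symmetric positive definite matrix with semi-bandwidth kd only ever touches entries within the band, so the cost drops from order n³ to roughly n·kd² operations:

```python
def band_cholesky(A, kd):
    """Lower-triangular Cholesky factor of an SPD matrix whose entries
    vanish for |i - j| > kd; only band entries are computed or read."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(max(0, j - kd), j))
        L[j][j] = s ** 0.5
        for i in range(j + 1, min(n, j + kd + 1)):   # rows inside the band
            s = A[i][j] - sum(L[i][k] * L[j][k]
                              for k in range(max(0, i - kd), j))
            L[i][j] = s / L[j][j]
    return L
```

The factor L inherits the semi-bandwidth of A, which is why the band structure can be exploited at all.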

Susan Blackford
1999-10-01