Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
A survey of indexing techniques for sparse matrices
1. A Survey of Indexing Techniques for Sparse Matrices
UDO W. POOCH, AND AL NIEDER
Texas A & M Umversily,* College Statwn, Texas
A sparse matrix is defined to be a matrix containing a high proportion of elements that
are zeros. Sparse matrices of large order are of great interest and application in science
and industry; for example, electrical networks, structural engineering, power
distribution, reactor diffusion, and solutions to differential equations
While conclusions within this paper are primarily drawn considering orders of
greater than 1000, much ~s applicable to sparse matrices of smaller orders in the
hundreds.
Because of increasing use of large order sparse matrices and the tendency to
attempt to solve larger order problems, great attention must be focused on core
storage and execution time Every effort should be made to optimize both computer
memory allocation and executmn times, as these are the limiting factors that most
often dictate the practicahty of solving a given problem
Indexing algorithms are the subject of this paper, as they are generMly recognized
as the most ~mportant factor in fast and efficient processing of large order sparse
matrices.
Indexing schemes of main interest are the bit map, address map, row-column, and
the threaded list Major variations of the indexing techniques above mentioned are
noted, as well as the particular indexing scheme inherent in diagonal or band matrices.
The concluding section of the paper compares the types of methods, discusses their
suitabihty for different types of processing, and makes suggestions eoneernlng the
adaptability and flexibility of the maj or exmting methods of indexing algorithms for
application to user problems
Key Words and Phrases: Matrix, sparse matrix, matrix manipulation, indexing.
CR Categomes: 5 14, 5 19
I. INTRODUCTION
Computations involving sparse matrices
have been of widespread use since the 1950s,
becoming increasingly popular with the advent of faster cycle times and larger computer memories. One cycle time is the time
required for the central processing unit to
send and to receive a data signal from main
memory. Systems applications for sparse
matrices include electrical networks and
power distribution, structural engineering,
reactor diffusion, and solutions to differentim equations.
A sparse matrix is a matrix having few
nonzero elements. Matrix density is defined
as the number of nonzero elements of the
* D e p a r t m e n t of Industrial Engineering.
matrix divided by the total number of elements in the full matrix. Most available references utilizing sparse matrices for calculations [1-8] consider matrices of order 50, or
more [9, 10], with densities ranging from 15 %
to 25 % and decreasing steadily as the order
increases. This paper will accept these
boundary conditions as a strict definition of
a sparse matrix. Brayton, Gustavson, and
Willoughby [8] say that a typical large (implied to be in the hundreds) order sparse
matrix has 2 to 10 nonzero entries per row.
Hays [5] says that an average of 20 nonzero
elements per row is not an unreasonably
small number in quite large (implied to be
around 100 and greater) order. Livesley [1]
indicates that an average of 3 or 4 elements
Computing Surveys, Vol. 5, No. 2, June 1973
2. 110
•
U. W. Pooch and A. Nieder
CONTENTS
I Introduction
II Bit Map Scheme
III Address Map Scheme
IV Row-Column Scheme
V Threaded List Scheme
¥I Diagonal or Band Indexing Scheme
VII Conclusion
Appendix A
Algorithm 1 Bit Map Scheme
Algorithm 2 Address Map Scheme
Algorithm 3. Address Map Scheme
Bibliography
109-112
112-114
114-116
116-119
119-122
122-123
123-127
127-132
132-133
C o p y r i g h t (~ 1973, A s s o c ~ a t m n for C o m p u t i n g
M a c h i n e r y , Inc. G e n e r a l p e r m i s s i o n to r e p u b h s h ,
b u t n o t for profit, all or p a r t of t h i s m a t e r i a l is
g r a n t e d , p r o v i d e d t h a t A C M ' s c o p y r i g h t n o t i c e is
g i v e n a n d t h a t r e f e r e n c e is m a d e to thJs p u b l i c a tion, to i t s d a t e of ~.ssue, a n d to t h e f a c t t h a t rep r m L i n g p r i v i l e g e s were g r a n t e d b y p e r m i s s i o n of
t h e A s s o c m t m n for C o m p u t i n g M a c h i n e r y .
Computing Sulvevs, Vol 5, No 2, June 1973
per row in a large (implied to be around
1000) order structural problem is a good
estimate.
If the order I of the matrix is reasonably
small, i.e., about order 50 or less, it would
make little difference if the full matrix were
kept in core. However, if the sparse matrix
is of larger order than about 50, it becomes
efficient in terms of execution time and core
allocation to store only the nonzero entries
of the matrix.
The efficiency of retaining only the nonzero elements becomes obvious in the exampie of a 500 X 500 matrix with 10 % density.
With one word of storage allocated for each
element, the matrix requires 250,000 words,
which is very often more than is physically
available. Storing only the nonzero elements
requires 25,000 words. If the full matrix were
multiplied by a similar full matrix a minimum of 500 X 500 X 500 = 125 X 106
arithmetic operations are required, compared
to a minimum of (500 X 10 %)3 = 125 X 103
arithmetic operations when only the nonzero
elements are retained. If both 500 X 500
matrices were to be retained in core as full
matrices, core allocation and execution time
would be prohibitive on many computers,
and the problem would be abandoned as infeasible for computer solution.
By storing the nonzero elements in some
reasonable manner, and using logical operations to decide when arithmetic operations
are necessary, Brayton, et al. [8] relate that
both the storage requirements and the required amount of arithmetic can often, in
practice, be decreased by a factor of I over
the full matrix.
Sparse matrices are classified generally by
the arrangement of the nonzero elements.
When the matrix is in random form, nonzero
elements appear in no specific pattern. A
matrix is said to be a band matrix, or in band
form, if its elements a~.~ = 0 for [ i - j I > m
(where m is a small integer, and usually
m ~ I) and where the nonzero elements form
a band along the mam diagonal. The band
width is the number of nonzero elements
that appear in one row of a band matrix
(i.e., 2m ~- 1). A block-diagonal form occurs
when submatrices of nonzero elements appear along the matrix diagonal. In block
3. Indexing Techniques for Sparse Matrices
form, the matrix has submatrices of nonzero
elements that occur in no specific pattern
throughout the full matrix. The block dimension is the order of a submatrix in a block or
block-diagonal matrix.
In electrical network and power distribution problems, the matrix is generally in
random, band, or block-diagonal form, with
the elements representing circuit voltages,
currents, impedances, power sources, or users
[9-10]; in structural engineering applications,
the sparse matrix is generally of band or
block form, with the band width or block
dimension representing the number of joints
per floor [3, 11]; in reactor diffusion problems
and differential equations, the band form of
matrix is most common, with the band width
being the number of points used in a pointdifference formula [12-14].
This paper, while not concerned with the
actual mathematical manipulations of sparse
matrices, is primarily concerned with the indexing algorithms employed in such calculations. If the sparse matrix is stored in a haphazard manner, elements can only be retrieved by a search of all the data, which
takes much time. If the sparse matrix is
stored in some very convenient form, execution time will be much less. Conservation of
execution time is of major importance in
selecting an indexing algorithm.
Another major consideration in selecting
a particular indexing method is the amount
of fast core the method requires in addition
to that used for the storage of the nonzero
data elements. For most applications, a small
difference in core allocation between two
methods is not a critical factor. In this case,
the critical consideration is the execution
time difference between the two methods.
Since execution times vary greatly with the
methods of indexing, an exact comparison of
execution times must reflect the type of
mathematical manipulation that is to be
performed on the sparse matrix.
One last major aspect of indexing algorithm selection concerns the adaptability
and flexibility of programming the selected
scheme. This depends in great part on the
type of machine, business or scientific; machine configuration; operating system capabilities; number of bits per word; access
•
111
times for peripheral devices; average instruction times; availability of the required instructions; the maximum row or column size
to be used; the expected matrix density; and
the availability and size of buffers.
As with most applications, the use of a
high-level programming language may provide relative ease of implementation for a
selected indexing scheme, but such use is frequently accompanied by penalties in execution time and storage requirements. However, on the positive side, use of high-level
languages may well result in a minimum of
elapsed time for problem solution with a
given programming staff, as well as overall
minimum cost, considering both personnel
and computer usage. Problems involving
large order sparse matrices focus their attention on core storage utilization and execution
time minimization, and therefore all but
eliminate the employment of high-level languages for indexing schemes.
In subsequent sections of this paper, current indexing schemes will be examined in an
attempt to isolate a "fast" indexing algorithm, with "fast" being defined as producing an optimization of execution time and
core storage for sparse matrices of large order. Particular advantages and disadvantages of each major type of indexing discussed will be brought to the attention of
the reader. Parts II through VI discuss aspects of particular indexing schemes, while
Part VII compares the requirements and advantages of the various schemes. Part VII, in
conclusion, also makes recommendations
concerning the adaptability and flexibility of
the major existing indexing algorithms for
application to user problems.
The authors have attempted, as much as
possible, to make their discussions machine
independent. However, the authors made use
of an IBM System 360/65 Model I in their
research and certain basic aspects of this machine, such as the 32-bit word, are alluded to
in the succeeding pages. The interested
reader should have httle difficulty in adapting the concepts presented to machines of
differing architecture.
Computing Surveys, Vol 5, N o
2, June 1973
4. 112
U. W. Pooch and A. Nieder
•
II. BIT MAP SCHEME
I 0100 1010 I 1001 0101 I . . . . . . .
I n a bit m a p scheme, a Boolean form of the
matrix M is the basic indexing reference.
Whenever a nonzero entry occurs in the
sparse matrix, a 1 bit is placed in the bit map,
with null entries remaining as zeros in the
bit map. The position of each successive nonzero entry is found by counting over to the
next 1 bit in the map.
More rapid access to any element of a row
is achieved b y providing an additional row
index vector, where each element of t h a t
vector is the address of the first nonzero elem e n t of each row [16]. An additional column
index vector m a y also be applied for a more
rapid column access, but this will also necessitate storing each nonzero entry twice. I t
should be noted, however, t h a t any machine
based on word, rather t h a n bit, addressing
techniques will give much slower access in
one dimension of the matrix t h a n in the
other.
As an example, the following matrix M,
and its associated bit m a p and reduced Zvector is given.
M=
BM=
05
00
10
[3,2,5,4,7,1,8]
Z--
01
00
10
Figure 1 demonstrates a sample bit m a p supplemented with the row index vector V; the
Z elements are the nonzero elements of the
matrix.
The bit m a p in Figure 1 is a matrix conception of the bit map. To conserve core, instead of using one word for each row of the
bit map, all four rows (16 bits) are cornv
v(2)
•
2
• z(2)
V(3)
•
4
, Z(4)
z(5)
V(4)
)
z(3)
) Z(6)
z(7)
R WIndex ValueIndicates
O
f l r s t nonzero
element for row
FI~
1.
,
Z-vector
value
Sample bit m a p .
Computing Surveys, Vol 5, No 2, June 1973
Bit Map
byte 1
byte 2
]
byte 3
FIG. 2. Bit map of Figure 1 in core.
pacted into one word as shown in Figure 2
with byte (8 bits) boundaries marked.
F r o m Figure 2, it is simple to see t h a t the
bit map, being the Boolean form of the matrix, will, in fast core, require at least W =
I . J / B words, where I and J are the dimensions of the matrix and B is the n u m b e r of
bits per word; W is rounded up to the nearest
integer. The bit m a p uses at m i n i m u m
Emt Map ---- (100/B) %
of the storage requirements of the full matrix
for indexing. The additional row index vector
adds W= I . A / B more words, where A is
the n u m b e r of bits required for an address.
Supplemented with the row index vector,
E•lt Map -~- R O W I n d e x
= IO0/B (1 ~- ( A / J ) ) %
of the full matrix is required for the indexing.
Now, if the sparse matrix has less t h a n
65,536 nonzero elements, then A can be 16
bits in excess 32,768 notation. I n a 32-bitword machine for example, 16 bits m a y be
conveniently accessed if the instruction set
has a complement of half-word instructions.
Attention should be given to the number of
bits required for an address to range through
the m a x i m u m core size. If this number of
bits is not conveniently manipulated, it will
be necessary to use more than the m i n i m u m
a m o u n t of core to gain an execution advantage. Execution times for full word instructions are often less t h a n execution times for
half-word instructions. Therefore, when
choosing a convenient number of bits for A,
the n u m b e r of bits used for an address, it is
i m p o r t a n t to realize the tradeoff between
core conservation and access time.
Using B = 32 bits (word length), and
A = 16 bits (half-word length), for a 500 ×
500 matrix the bit m a p and row index vector
require 8313 words, or 3.325 % of the 250,000
words for the full matrix; if the matIix is
only 5 % dense, another 12,500 words are
required for the nonzero elements; the total
is 20,813 words, or 8.325 % of the full matrix.
5. Indexing Techniques for Sparse Matrices
In order to reference the M,~ element, it is
necessary to physically count across to the
j t h element in the zth "row" of the bit map.
The correct bit will lie in the S~ = ((i - 1) *
J + j + (B - 1))/B word of the bit map.
To isolate the required bit, it will be necessary to either shift the word the necessary
number of bits or mask all the other bits by
a logical operation. If a shift is used, then
repeated shifts perform a row operation when
the bit map is stored by rows. Algorithm 1
(see Appendix) isolates the correct beginning
word of a row in the bit map; a segment of
the code shifts through one entire row, in
preparation for a mathematical manipulation of the row.
Algorithm 1 with slight alteration will accommodate matrices up to order 100,000.
The restriction occurs in statement 06, where
the multiplication must not result in loss of
significant bits due to exceeding word size.
In practice, the algorithm is limited either by
the index vector being half-words, as indexing is provided for only 65,536 nonzero elements; or by 4095 rows or columns, the
maximum number used in the indexing in
statement 02.
When the bit map is stored by rows, as in
the algorithm above, then to perform a column operation it is necessary to count to the
correct j bit for all I rows. This means executing virtually the entire algorithm I times.
If more than a few column operations are to
be performed, then execution time will become an important factor. The execution
time is dependent on the density of the
sparse matrix, the order of the sparse matrix,
and the number of column operations to be
performed. The time factor is exemplified by
the following:
EXAMPLE 1: A 500 X 500 m a t r i x
exists, and it is necessary to perform 10
column operations when the matrix is
5 % dense. The average column
execution time will be that of the 250th
column. Assuming the entire algorithm
is executed for each row, the execution
time will be approximately:
500 rows X 10 column operations
[(time to locate beginning of each
row)
•
113
+ .05 density X 500/2 columns X
(to process 1 bits)
+ (1 - .05 density) X 500/2 columns
X (time to process 0 bits)
+ 500/2 columns X (time to locate
bit in bit map)
+ 500/2 words X (time to locate
word in bit map)]
which is about 10 seconds on the I B M
360/65, with additional microseconds incorporated for the mathematical operation
not listed in the coding. Had the same procedure been carried out on the transpose of
the bit map, that is, the bit map is now
column-oriented instead of row-oriented,
then the execution time would have been cut
by a factor of about 500, a considerable time
savings. Not taken into consideration is any
further computer processing, such as updating an index register after each 4095
characters or bytes, if necessary.
If the bit map of the sparse matrix can be
transposed and the data rearranged in less
time than the difference between the column
and row execution times, then the transpose
operation will conserve execution time. In
the above example, the difference between
column and row execution times is about 9.7
seconds.
For certain types of operations the bit map
is ideal. Being in Boolean form, which means
elements are either 1 or 0, true or false, or
plus or minus, the bit map is the most compact form for logical operations, such as
AND, OR, or E X C L U S I V E OR. Thus, if
matrices MA and M B exist, and it is necessary to determine which elements are nonzero in both matrices, it is necessary only to
A N D each word of bit map MA with the corresponding word of bit map MB. If the result
is zero, both are not present; if the result is
nonzero, the indicated elements appear in
both matrices. An E X C L U S I V E OR determines which elements are present in either,
but not both, of the matrices; an OR determines which elements appear in either or
both of the matrices. Logical operations performed on the bit map require about 1/~2 of
the execution time for the same logical operation on the full scale matrix, because the
bit map on a 32 bit-word machine condenses
32 pieces of data into 1 word. Additionally,
Computing Surveys, Vol. 5, No 2, June 1973
6. 114
•
U. W. Pooch and A. N~eder
and often most importantly, the bit map
conserves core storage.
To determine how many elements will be
present in the sum of two rows, and their
order, an OR is performed on the two rows
of the bit map. Using similar techniques, the
feasibility of rearranging the matrix in a
form more convenient for the user, such as
diagonal form, where nonzero elements appear all along the diagonal, is determined.
Kettler and Well [15] discuss some of the
aspects of such a rearrangement algorithm.
M a n y references are found to endorse or
suggest the use of a bit map scheme for
sparse matrices [7, 15-20], but it is particularly difficult to ascertain the exact algorithms utilized, as most authors do not
include these in their papers.
While a bit map scheme appears convenient and fast, it is restricted by the amount
of fast core available for the bit map. In the
case where the sparse matrix is less dense
than the percentage of the full matrix that
the bit map scheme occupies, core storage
will be conserved by switching to an alternative method of indexing.
Givens [21] has suggested that the bit map
scheme would be more attractive to users if
some special instructions were designed and
implemented, to further decrease execution
times. One such instruction Givens references
is C L E A R TO ZERO, which would clear a
large block of core, e.g., the bit map, from a
first to a last address. Another instruction
would be LOAD N E X T NONZERO, which
would fetch the address of the next nonzero
entry of the bit map, given the previous nonzero element, thereby eliminating the necessity of counting through all the zero bits.
These special instructions would be implemented as microprogrammed subroutines
[21]. To define a microprogram, it is necessary to understand that the execution of each
assembly language instruction involves a
specific sequence of transfers of information
from one register in the processor to another;
some of the transfers take place directly, and
some through an adder or other logical circuit. Each of these steps defines a microinstruction and the complete set of steps necessary to execute the assembly language instruction constitutes a microprogram [22].
Computing Surveys, Vol 5, No 2, June 1973
IlL ADDRESS MAP SCHEME
The address map is similar in form to the
bit map, the main difference being that the
address map stores an address or address displacement for each matrix element. If the
matrix element is zero then a zero address is
stored. The bit map requires only one bit for
each matrix element.
Since an address or address displacement
requires more than one bit for each matrix
element, the address map scheme will require
N times more core storage than the bit map
scheme, where N is the number of bits used
for an address or address displacement. If
address displacements instead of full-length
addresses are used, then the address map
must be augmented by a row index vector,
as with the bit map.
Assuming there are less than 256 nonzero
entries per row, for example, an address displacement would require only 8 bits (a
common character size). If a particular computer allows character operations that are
faster than the access time to an individual
bit map entry, the improved column access
time of the address map can warrant the
increased core expenditure. On a system with
6 bit characters, up to 64 nonzero row entries
can be accommodated.
The overall percentage storage requirement of the full matrix required for the address map with the row index vector will be
EAdd.... Map = 100/B (C + A / J ) %
where B is the number of bits per word; C is
the number of bits used for an address displacement; A is the number of bits used for
an element of the row index vector; and J is
the number of columns of the matrix. Using
C = 8 bits; A = 32 bits; B = 32 bits; and
J = 1000 columns, the address map and row
index vector require 25.1% of the full matrix, that is 251,000 words compared to 1
million for the full matrix. In addition, if the
matrix is 5% dense, an additional 50,000
words are required for the storage of the
nonzero elements.
In order to isolate the M,~ element, it is
necessary to access the S, = C / B (i -- 1).
J -t- j character (or byte). In terms of words,
S, = {C[(i -- 1). J + (j - 1)] + B } / B . )
7. Indexing Techniques for Sparse Matrices
where i and 3 are respectively the row and
column of interest. If the S~ character (byte)
is zero, it is a null entry; otherwise, the content of the S~ character (byte) is added to the
row index element to give the address of the
nonzero element.
The address map scheme is subject to
many of the same limitations of the bit map
scheme, and requires a larger amount of core
storage for indexing. A sample coding, Algorithm 2, which has the same characteristics
as the example used in the bit map method
(Algorithm 1) illustrates that fewer arithmetic operations than the bit map method
are required when the computer is equipped
with character addressing capabilities. If the
computer used does not allow convenient
arithmetic manipulation of individual characters, then the coding enclosed in brackets
in Algorithm 2 must be added to overcome
this difficulty. The bracketed coding requires
much of the algorithm time, so if a computer
has built-in arithmetic character manipulation, then the algorithm becomes increasingly faster.
With an example similar to Example 1, we
find that the execution time, with the bracketed coding included, is drastically different
from the bit map time. This is primarily because of the easy access to any character. To
access by column instead of by row, only the
first row location of the correct column need
be found. To find the correct location of the
character in row 2, it is sufficient to add just
the column dimension. This process is continued until the end of the matrix is encountered.
For a column manipulation, then, we easily obtain Algorithm 3, similar to Algorithm 2.
EXAMPLE 2. As in Example 1, a 500 X
500 matrix exists with 5 % density, and
it is necessary to perform 10 column
operations. It is therefore necessary to
execute Algorithm 3, 10 times, so the
execution time will be approximately
10 column operations X
[(initialization time to lobate
beginning of each row)
500 rows X (time to locate bit in
bit map)
•
115
+ (1 -- .05 density) X (time to
process 0 bits)
+ .05 density X (time to process 1
bits)]
which is about 30 msec on the I B M 360/65,
and has incorporated 2 additional ~sec that
were included for the mathematical operation
not listed in the coding. As with Algorithm
1, the limitations are due to the use of halfwords for the index vector, and to the use of
an index register. Note that there is a considerable time savings, but at the expense of
computer memory. Again, not taken into
consideration is any further computer processing, other than the above coding, such as
updating index registers, which may be necessary and require more time.
Unhke the bit map scheme, where the entire row of the bit map up to the desired element must be scanned for nonzero entries
before data manipulation can occur, the address map method requires only a reference
to the desired element. Because the storage
location of a data element is found independently of all except the desired address
displacement, the address map method
blends well with the concept of parallel
processing. Parallel processing involves the
s~multaneous execution of a sequence of operations by dependent central processing
units. Thus, using the address map method,
4 separate central processing units could simultaneously execute the required arithmetic on 4 different elements of the matrix;
at best, using the bit map method, different
steps in the execution of 1 matrix element
would be shared by the 4 central processing
units. Employing the address map method,
the processing units could work independently, except for the final results; while the
bit map method would require transfers of
information from one processing unit to the
other processing units to execute the shared
steps, which introduces an additional time
lag.
While no references have been found to
explicitly endorse or suggest this method,
and comparatively large core requirements
exist, the address map scheme m a y prove
useful with some future computer t h a t features both very fast core of a few million
characters and a multitude of parallel proc-
Computing Surveys, Vol. 5, No. 2, June 1973
8. 116
•
U. W . Pooch and A . Nieder
0
2
0
0
the row designation and another specified
number of bits for the column designation
(Figure 4).
If computations are to be performed in a
row manner, it is highly practical and efficient to order the nonzero entries first by
rows and then by columns. Ordering the entries by rows makes it unnecessary to maintain the row index for every nonzero element;
only the row need be identified for the first
nonzero element of each row, as it is known
t h a t all the following entries up to the next
row indicator belong to the same row. In
order to create the row marker, a check bit,
such as a minus sign bit, can be set in the
first column index word of each row (Figure
5), or as is usually done, an additional and
separate row index vector can be created
(Figure 6). The row index element generally
contains the address or index number of the
first column index for the row. The same syst e m m a y be applied to ordering the entries
I: O 4 °
oo 1
o
7
9
FIG. 3
v(]>
v(2)
v(3)
v(4)
v(5)
v(6)
v(7)
5
l
2
÷
2
Z(1)
2
l
÷
6
Z(2)
2
3
+
4
Z(3)
3
l
÷
3
Z(4)
4
l
÷
4
2
÷
7
9
z(s)
z(6)
4
4
÷
5
Z(7)
Row
FIG. 4
nators
0
Sample matrix.
Col umn
Indexing with row and column deslg-
V(2)
essing units. Hoffman and McCormick [22]
state t h a t at present the value of parallel
processing on a large scale is debatable as
far as manipulating sparse matrices, as there
are virtually no available computers with
more t h a n just a few parallel central processing units, and the field is quite unexplored.
IV. R O W - C O L U M N
2
V(1)
SCHEME
Row-column indexing schemes refer to methods relying on paired vectors of some type;
generally one vector contains the nonzero
elements, which are most often ordered by
rows or columns, and the other vector maintains the indexing information. Row-column
indexing schemes are sometimes referred to
as block index, row, or column packing
schemes, depending on the author's description of how the indexing algorithm works
[7, 15, 17, 20, 23-24].
I n the simplest, but not the most core- and
time-efficient form, each nonzero element of
the matrix has a corresponding index word
t h a t contains a specified number of bits for
Computing Surveys, Vol 5, No 2, June 1973
V(3)
V(4)
V(5)
V(6)
V(7)
÷ ~
z(l)
Z(2)
-
1
÷
+
3
+
-
1
Z(4)
-
1
z(5)
~(6)
:
÷
+
2
+
4
Row Column
indicator
(Sign b i t )
z(3)
Z(7)
FIG. 5. Indexing with row m d m a t o r and column designation
VR(1) ~
!
VR(2)
VR(3)
VR(4)
First
column
index
for
each
row
(halfword)
V(1)
V(2)
1 1
V(3)
3 1 ÷
V(4)
1 : ÷
V(5)
" '
V(6) 2 ~
÷
V(7)
4 i ÷
Column
(halfword)
2
6
4
3
7
9
5
z(1)
z(2)
z(3)
z(4)
z(5)
z(6)
z(7)
FIG 6 Indexing with row vector and column
index vector.
9. Indexing Techniques for Sparse Matrices
by columns if column operations are to be
performed.
Figures 3 through 6 depict sample vectors
for the row-column schemes described above.
The index vectors are V and VR; the nonzero entries are contained in vector Z. The
data matrix used in Figures 4 through 6 is
displayed in Figure 3. The nonzero entries
of the data matrix are stored by rows, in
order of increasing column number. All index
vectors are full words unless otherwise noted.
From the above figures it is evident, there
exists a wide possibility of variation in the
row-column scheme of indexing. Further
variations and adaptations can occur as a
result of optimizing peculiar computer characteristics, or as a result of making calculations on special forms of sparse matrices,
such as block matrices.
However, caution is advised, for such
optimizations may result in a useless program whenever system changes occur, and
should therefore only be used when they are
critical economies of the calculations.
In the instance of computer peculiarities,
Smith [17] states that a particular type of
second generation IBM computer did not
utilize the bits of the second word in extended-precision floating-point calculations
that were normally used as the exponent
bits in single precision floating-point calculations. A sparse matrix row-column indexing algorithm was developed that employed these otherwise wasted 8 to 9 bits as
the row or column indices, and could accommodate matrices up to order 255 and 511
respectively.
For the case of a special sparse matrix, the
row-column indexing scheme for a blockdiagonal matrix could become a blocked
indexing scheme. The blocked indexing
scheme would be identical to the row-column
method, except that the large sparse matrix
is partitmned into several smaller submatrices (blocks). Then each submatrix is
identified with a separate row-column
scheme of some sort.
A blocked indexing scheme may also be
used to refer to combining several column
indices into one block (word). For example,
one 64-bit word would contain 4 column
indices, each index of 16 bits. When a row
•
117
operation is performed, then, 4 nonzero
elements can be readied for processing at
the expense of a loading time for only one
block [17].
I t should be noted t h a t for many computers and algorithms more time is required
to load a referenced word for arithmetic
processing than is required to perform the
necessary arithmetic to isolate the required
bits of the referenced word. Likewise, more
time is required to load extended-precision
words than ordinary ,words. Also, since
most computers are geared to utilize arithmetic data primarily by words, more time
is required to load a half-word for arithmetic processing than is required to load a
full word.
Another major variation, known as delta
or displacement indexing, is also popular,
and is somewhat similar to the address map
form of indexing. For one particular example
of a delta indexing scheme, one 64-bit extended-precision word contains one 16-bit
index and six 8-bit displacements to the index. Therefore, the column indices of 7
elements can be referred to by loading and
processing one extended-precision word,
which can result in both a considerable time
and core savings. For a delta of 8 bits, it is
possible for 2 nonzero entries of the same
row to be a maximum of 255 columns apart.
If elements can appear farther apart than
255 columns, then a greater number of bits
must be allocated for each delta or the
method must be abandoned. To determine
the column number of the first element
paired with the 64-bit index word, the first
16 bits of the index word are used. In order
to determine subsequent column numbers
for any other element paired with the 64-bit
index word, the appropriate delta is added
to the first 16 bits and the sum of deltas in
between.
Smith [17] also states that delta indexing
is more efficient for large order (implied
order about 250) sparse matrices than a
blocked index form. Figures 7 and 8 depict
the blocked and delta indexed word mentioned above, and are equivalent.
EXAMPLE 3. From Figure 7, column
index 3 = 1078. From Figure 8, column
index 3 = 1027 + 20 -t- 31 = 1078.
Computing Surveys, Vol, 5, No 2, June 1973
10. 118
*
U. W. Pooch and A. Nieder
1027
Column
index 1
1047
1078
1095
Column
Column
Column
index 2
index 3
index 4
(16 bits each index)
FIG. 7 Blocked index word.
For the row-column indexing method,
using a column index for each nonzero
entry and a row index vector, there is a
required minimum for indexing W =
I / B ( J . T . D + V) words; where I is the
number of rows; J is the number of
columns; T is the number of bits used
for a column index element; D is the
density of the matrix; V is the number
of bits used for a row index element; and
B is the number of bits per word. In
reality, however, for matrices up to
order 65,535 (in excess 32,768 notation),
half-words may be most conveniently
and efficiently used for all the row and
column indices. Half-word indices are
used to increase core savings at a
generally tolerable increase in execution
time; few it any matrices of order 30,000
or greater have been of notable use.
Using half-word indices, then, the abovementioned indexing scheme requires a
minimum core storage of
ERow-co~umn = ( 1 / 2 J + D ) %
of the full matrix for indexing.
To access an M , element, it is necessary
to refer to the ith row index, which points
to the first nonzero element of the ith row.
The column indices between the ~th and
i + 1st row indices are searched for j. If the
column indices searched do not contain j,
the M , element is zero; otherwise the data
element paired with the j column index is
fetched and processed.
For row operations, as long as the matrix
remains ordered, execution time is very fast.
For more than a few column operations,
however, on a matrix of order greater than
about 200, it is almost always more convenient and efficient to transpose the entire
matrix and reorder all the data elements
before performing the desired arithmetic.
Again, the same situation exists as with the
bit map; if the data and indexing scheme
can be transposed in less time than the difference between the column and row execution times, then the transpose operation will
conserve execution time.
Unlike the bit map and address map
schemes, which have constant core requirements for indexing, the row-column method
has a core requirement for indexing directly
proportional to the matrix density. Since
each nonzero element has a paired column
index, only the number of elements in the
row index vector is constant. For example,
adding two 50 X 50 sparse matrices, M A and
MB, does not in general produce the result
that the total number of resulting nonzero
elements is the sum of the nonzero elements
for each matrix before the matrix addition:
if M A has 250 data elements and M B has
450, the sum of matrices MA and M B will
not, in most cases, have 700 elements, i n
the sum of matrices M A and MB, the only
surety is t h a t there will still be 50 row index
elements. A variable amount of core for indexing creates core allocation difficulties
t h a t m a y not be readily acceptable to the
user.
In comparison to the bit map method, the
row-column indexing method is noted for its
fast execution time, when data elements are
properly ordered, and its ease of programming, even for matrices of very large
order (in the thousands). A wide variety of
references endorse (or imply an endorsement
of) a row-column techmque for indexing [15,
17, 25-30], or a block-diagonal method [3134], especially for particular applications, as
noted in the Introduction, or for special
matrices, such as symmetric matrices. I t
should be noted that a symmetric matrix
1027
20
31
17
Column
delta
delta
delta
m
delta
index 1
(16 bits)
(8 bits each index)
FIG 8. Delta index word.
Computing
S u r v e y s , Vol
5, N o
2, J u n e 1973
__
__
delta
delta
11. Indexing Techniques for Sparse Matrices
decreases by almost 50 % the core requirements in the row-column technique, both for
the data elements and for the indexing
elements.
Two of the more general sets of algorithms
encountered for processing random, and
some special, sparse matrices and employing
the row-column indexing technique are
MATLAN [29], an I B M product, and Algorithm 408 [30], a more recent private effort.
As these algorithms are readily available
and are of general interest, a particular
coding example is not given for the rowcolumn indexing technique. Both these
algorithms were intended for use on sparse
matrices of order less than about 32,700, and
are more efficient for orders less than (about)
1,000.
MATLAN is a programming system, operating under the control of Operating
System/360, and has a very wide applicability. MATLAN includes many supplementary features, such as different versions
for an all-core problem and for a segmented
problem, three overlay structures for core
storage, and options on precision. A segmented problem exists when portions of the
problem under consideration are stored in
core and on tapes or disks, an all-core problem exists when the storage requirement is
such that the entire problem is stored in fast
memory. Because of the variable precision
option and the all-core or segmented feature,
it is difficult to assess execution times. Array
dimensions are limited to 32,756, which indicates half-words are used for indexing
purposes.
Algorithm 408 uses a variation of the indexing algorithm depicted in Figure 6.
Instead of having the row index vector contain the address or index number of the first
column index for the row, the row index
vector contains the number of stored elements in the row. In addition, the row index
vector is appended to the column index
vector by using the same array name, M.
While the scope of Algorithm 408 is not as
broad as ~¢IATLAN, Algorithm 408 has the
distinct advantage of being readily alterable: a section of the reference is devoted to
possible alterations, such as combining three
or more indices to a word of the M array.
•
119
Because of the great variation in coding,
at present it is not considered economically
worthwhile to compare actual core storage
and execution times to determine which of
the many different existing algorithms employing the row-column method is the most
efficient or optimal.
A good basis for examing some of the rowcolumn indexing scheme characteristics rests
on using half-word indices, with a row index
vector, for calculations. At worst, the
method (as typified by Algorithm 408) will
utilize less core than the full matrix up to a
density of slightly over 66%. Conservation
of core allocation and execution time increases as the density decreases.
It has been noted that the bit map method
employs approximately 4 % of the full matrix
for indexing. Therefore, it can easily be seen
that when the matrix density falls below
about 4%, the row-column method will
conserve more core than the bit map scheme.
In addition, the advantage of the faster indexing into the data by the row-column
method in this case almost excludes the use
of the bit map, except for special cases, such
as a Boolean problem.
V. THREADED LIST SCHEME
A threaded, or linked list, scheme contains
one element of an array in core for each nonzero element of the sparse matrix. Each
array element in a linked list method has at
least three components: one component
contains the row and column indices; another contains the matrix element (data);
and the third contains the address of, or a
pointer to, the next array element.
If the third component of an array element
were not present, the linked list scheme
would have, at an absolute minimum, the
same core requirement for indexing as the
row-column method. The third component
adds W = A*D/B more words for indexing
which gives a minimum total of W -- I / B
((J.T A- A)D A- V) words for indexing
a threaded list scheme: where I is the number
of rows; J is the number of columns; D is
the density of the matrix; T is the number of
bits used for a column index; V is the number
Computing Surveys, Vol. 5, No. 2, June 1973
12. 120
•
U. W. Pooch and A. Nieder
of bits used for a row index; A is the number
of bits required for an address to range
through the entire amount of core used to
contain the complete threaded list; and B is
the number of bits per word. For any practical application, however, both the row and
column indices must be retained, which
gives an overall minimum core allocation
for indexing of W = I . J . D ( T + V + A ) / B
words.
As in the previously discussed methods of
indexing, half-words (16 bits) are used in
practice for both the row and column indices,
which give capabilities of a matrix of order
65,535 (in excess 32,768 notation). In addition, because of the great difficulty and
great time involved in manipulating addresses of less than full word size (refer to
Bit Map Scheme), full words (32 bits) are
conveniently used for addresses. These considerations now require for the overall minimum core storage for indexing, W =
2 . I . J . D words. As a percentage (E) of the
full matrix, this is
E L m k e d LI~t =
2*D %
necessary for indexing.
In order to reference an M , element, the
entire threaded list must be searched if the
nonzero elements are stored in a random
manner. Elements can be stored, except for
updates, and accessed more efficiently by
rows and colums, which can reduce access
time to particular elements or rows of elements. Elements need not be stored contiguously for reasonably efficient processing.
In one particular application of a threaded
list scheme, data elements were initially
stored by rows and columns, and a table of
pointers was kept. Each pointer addressed
the beginning element of a group of 8 elements. Any particular item, or row of items,
could be found by a binary search on the
list of pointers. Example 4 typifies the search
for a particular matrix element in this application of linked list indexing.
EXAMPLE 4. Matrix elements are
stored by rows and columns. The
element to be found is in the middle row
of the matrix, so the pointer in the
middle of the pointer list is selected.
The contents of the pointer word
Computing Surveys, Vol 5, No 2, June 1973
addresses an element of the linked list.
The element is then examined, to
compare the row and column components with the required row and
column numbers. Three separate cases
can now occur:
(1) If the row and column numbers
match, the correct element has been
found.
(2) The rest of the elements in the
group of 8 are searched, and if the
row matches, but not the column, it
is known that the correct group can
probably be found by a search on
the next few pointers about the
pointer last used. if the pointer
indexed an element whose column
number was greater than required,
then the next lower pointer is used.
(3) The rest of the elements in the
group of 8 are searched, and if the
row doesn't match, then a binary
search on the pointers is continued.
In a binary search, if the pointer
indexed an element whose row
number was greater than required,
the next pointer to be selected is
the one halfway between the last
pointer (upper bound pointer in
this case) and the lower bound
pointer (the first pointer in this
case).
When the procedure is iterated, (2) above,
and the appropriate groups are searched, but
the correct row and column cannot be found,
then it is known that the required matrix
element is the null element.
It should be noted that unless the data
elements are in reasonable order, the binary
search on the pointers is almost useless. The
particular value of a linked list is that there
is no longer the requirement that data elements be stored contiguously: updates, insertions, and deletions of matrix elements
are performed by altering the address component of the appropriate hnked elements.
However, a linked list expansion or contraction results in some pointer groups
having a greater number of link elements,
and some other pointer groups having fewer
link elements. The alterable number of link
13. Indexing Techniques for Sparse Matrices
elements in each pointer group necessitates
a periodic updating of the pointer table. A
pointer table update is vital to the efficiency of the binary search, and may require
a great amount of execution time. The
amount of execution time required for a
pointer table update depends directly on
the number of link elements to be grouped,
as each link element must be inspected m
order to find each successive link element.
For peak efficiency of the binary search,
every group should have the same number
of linked list elements.
Using the additional pointer table to
combat the otherwise slow execution time of
the linked list scheme, one pointer exists
for each 8 nonzero matrix elements. Employing a full word for each pointer, which
is an address, we now have a minimum indexing core requirement of W = 21/~*I*J*D
words, for
ELmked
List
--~
2 . 1 2 5 , D %
of the full matrix. This is a much greater
core requirement than the row-column
methods of the previous section require for
any matrix of order greater than three.
Figure 9 depicts a few elements of a linked
list, and the correlation between elements.
A pointer table is not included.
Not previously mentioned is the practical
necessity of maintaining a table of available
addresses, so that core allocation remain
conservative during the insertion and deleAddress
Address
I051
next
RW
O
Column
element
. . . . . . . . . . . . . . . . . . .
Data
element
*
Address
1162
.
F'2
I 3 1 9841
J
i
i . . . . . . . . . . .
Address
.
.
.
.
.
.
.
.
.
.
i.
.
"1
. . . . .
I
I
1273
.
I
i
i
I
H 41 1,4
FIG 9
f .6'2 J
Linkedhst elements.
f
•
121
tion of matrix elements. When matrix
elements are deleted, the address of the
deleted link element must be appended to
the table of available addresses. Not only
must the table be maintained in fast core
but the threaded list scheme additionally
requires a buffer area to be used for the inserted and/or deleted link elements. If such
a buffer area is not used or kept, then core
will not be conserved and the prime ~dvantage of the threaded list will have been
discarded.
Few references endorse, or suggest endorsement of, the linked list scheme as a
practical method for indexing sparse matrices [15, 34-37]. Only a few sources [15,
38-40] found in the literature survey actually utilized the threaded list scheme;
while the actual algorithms were seldom
described in great detail, the scheme basically followed the designs of Example 4.
Overall, the threaded list technique of indexing into sparse matrices requires a significant amount of execution time for processing
indices, in addition to the core requirements
of a buffer and two separate tables. Inherent
in the method, then, are considerable execution times for processing and considerable
core expenditure, in comparison with the bit
map and row-column schemes for identical
matrices. Offsetting these disadvantages,
however, the linked list scheme has the
distinct advantage of not requiring a significant amount of execution time to update the
linked list by insertion or deletion of single
matrix elements or series of matrix elements.
All other previously discussed indexing
techniques require a shifting of data when
an update is performed, which will take a
great amount of execution time when
numerous matrix elements have to be shifted
to make the appropriate word available for
the update. The linked list scheme is slow for
random processing of matrix elements; however, in many applications items are accessed sequentially by row or column. In
these applications, proper chains of pointers
speed up processing greatly. As with previous methods, a definite symmetry of the
sparse matrix reduces proportionately the
core requirements for indexing.
Computing Surveys, Vol. 5, No. 2, June 1973
14. 122
•
U. W . P o o c h a n d A . N~eder
Vl. DIAGONAL OR BAND INDEXING SCHEME
/
-199
Band and diagonal matrices are special
types of matrices t h a t occur frequently in
electrical engineering,
structural
engineering, nuclear engineering and physics,
solutions to differential equations, and a
host of other fields, as mentioned in the
I n t r o d u c t i o n . Band and diagonal matrices,
while of frequent occurrence, should not be
mistaken as a general case of sparse m a t rices.
When band or diagonal matrices occur, a
special effort on the part of the user should
be made to a d a p t his processing a n d / o r indexing algorithms to the case at hand. This
adaptation should be made because of the
inherent simplicity of processing, manipulating, and solving band matrices, and also
because of the opportunity to minimize core
allocation and execution time.
In most cases, band or diagonal matrices
are processed either wholly by rows or columns, and httle or no processing of single
elements occurs. For a band matrix, a comm o n manipulation involves decreasing the
band width. I n such a manipulation, it is
normal procedure for one entire row (column) to operate on the row (column) immediately above or below it (or to either
side). With such a simple processing sequence, it is evident t h a t only a few rows
(columns) need be maintained in fast core
for immediate use.
If d a t a transmission rates are comparable
to the rate with which rows (columns) are
manipulated, then rows (columns) not in
immediate use can be stored on slower access
devices, such as tapes or disks. Storing data
on tapes or disks frees the more expensive
fast core. I n most machine configurations
there is a much larger amount of m e m o r y
available in the slower devices. When slow
devices can be used efficiently for processing
band matrices, the capability of manipulating large order sparse matrices is limited
by the m a x i m u m allowable execution time
and the desired accuracy limits of the results,
and not by the order of the matrix involved.
To further conserve execution time, but at
the expense of fast memory, the entire band
matrix can be stored in fast core. Preserving
Computing Surveys, Vol 5, No 2, June 1973
lO0 5
99
-199.
lOl
98 5
-199
98
lOl 5
0
-199, I02
97 5 -199
97
102 5
-199
I03
96,5 -199
0
96
103 5
-199
104
95 5 -199
/
FIG 10. Band matrix.
the entire matrix in fast core eliminates the
transmission times between fast core and
auxiliary devices, as well as the time required to restore elements in fast core,
which is done prior to data manipulation
and processing. Another prime a d v a n t a g e
directly involved with data transmission is
the use of overlapping channels in burst or
select mode. However, when the matrix is
fully maintained in fast core, channels will
then be available to other users on multi-user
computers.
If the band matrix has full bands, t h a t is,
no row has any zero elements within the
band, then the total number of elements to
be stored is the band width multiplied by
the number of rows in the matrix. Figure 10
depicts a band matrix with full bands (a
band width of 3 here):
EXAMPLE 5. Figure 10 is the resulting
9 X 9 matrix obtained by using a central
difference approximation (3 points) to
solve the boundary-value differential
equation 2 + 3t 2 = y + y' + y" using
10 intervals between the points y ( t =
0) = 0. a n d y ( t = 1) = 1.
A 5-point interpolation would yield a
band width of 5; 50 intervals would
result in a 49 X 49 matrix. N o t e that
the augment column, a constant
associated with each row of the matrix,
is not considered here as an integral part
of the sparse matrix. Accuracy of results
depends on the number of intervals,
n u m b e r of points in the interpolation
formula, and computer round off.
I n one particular application of processing
a band matrix by rows (columns), it is convenient and efficient to store elements in full
vectors, one vector for each super- or sub-
15. Indexing Techniques for Sparse Matrices
diagonal of the band matrix. Since the
diagonal has the greatest number of elements,
the vector for the diagonal will be the largest
vector. To avoid double indexing, which
takes greater execution time, an additional
table of addresses is created. Each element
of the address table contains the address of
the first element of the respective vector.
The indexing scheme in the algorithm used
to arithmetically manipulate the band
matrix is then altered to suit the storage
scheme.
If, for some reason, it is more convenient
to store elements in a row or column form,
e.g., because of a very difficult or time-consuming arithmetic manipulation, most of
the advantage of employing a band scheme
is lost, and other methods of indexing should
be considered.
Band matrices, as noted above, are unusual from an indexing standpoint because
of the very slight core requirements for
indexing. For the application described
above, only W = I , V / B words are required
for indexing; where I is the number of rows;
V is the number of bits used for a row index
element; and B is the number of bits per
word. As a percentage (E) of the full matrix,
this indexing requirement is
Ezand = 1 0 0 / J %
where J is the number of columns in the
matrix when full words are used for the
table of addresses. If hMf-words are adequate,
it decreases this requirement further by onehalf.
It should be brought to the attention of
the user that in the instance where bands do
contain zero elements, a decision should be
made whether to employ a band scheme,
which may not be very efficient in use of core
if a large number of null entries exists, or
some other particular scheme, such as a
block-diagonM scheme, which may not conserve execution time.
Many papers [4, 10, 34, 40-43] are concerned with band matrices, primarily, as
said, because of the prevalence of band
matrices in many specific fields of interest.
Also, many algorithms are readily available
for processing band matrices; FOaTRAN M
•
123
[44] being one of the more recent programming packages.
VII. CONCLUSION
In the previous sections four major types of
indexing methods were discussed, three of
which are in general use: the bit map scheme,
the row-column scheme, and the threaded
list scheme. Each major type, of course, has
many variations (the address map method is
not in general use at present, so no variations
occur). The important special case of the
band matrix is discussed as a separate entity,
because it is not a general case of a sparse
matrix, even though it has wide application.
As stated in the Introduction, one of the
major considerations in selecting a particular
indexing method is the amount of fast core
the method requires, in addition to the data
elements. The indexing in the bit map
method requires a fast core allocation of
approximately 4 % of the full matrix; in the
address map method indexing requires about
25 % of the full matrix. The row-column and
threaded list schemes have no definite core
requirements for indexing, and fast memory
for indexing is directly proportional to the
sparse matrix density. The percentage of the
full matrix required for indexing a rowcolumn scheme is about one times the matrix
density, and about twice the density is required for a threaded list scheme.
Previous discussion indicated that an
exact comparison of execution times must
reflect the type of mathematical manipulation being performed on the sparse matrix.
For example, the bit map method is of particular use when the matrix is used to produce an "optimal" ordering, so the matrix
inverse will not have a greatly increased
density. In contrast, the row-column method
is faster than other methods when manipulations involve one row (column) acting on
other rows (columns).
The second important aspect of indexing
scheme selection is the conservation of
execution time. If arithmetic operations are
to be performed on the data, primary consideration should first be given to a rowcolumn method; if Boolean arithmetic or
Computmg Surveys, Vol. 5, No 2, June 1973
16. 124
•
U. W. Pooch and A. Nieder
reordering algorithms are to be performed,
the bit map scheme should be considered
first; and if a great number of data elements
are to be reordered, created, or annihilated,
a threaded list scheme deserves first consideration.
The bit map scheme has a definite core
allocation for indexing, offers a reasonable
row access time, is quite fast in execution
time when row operations are performed, is
core efficient when the matrix density is
greater than 4 %, and allows very fast manipulation of logical (Boolean) operations.
Logical operations can be conveniently used
to determine when arithmetic operations are
to be executed.
As to its disadvantages: the bit map
scheme has extremely poor column access
time when elements are ordered by rows,
which in most cases requires transposing
the bit map and reordering the data elements: it makes poor use of parallel processing, requires considerable time to reorder
data elements, and is not core efficient when
matrix density fails below 4 %.
The address map proves advantageous
when character addressing is available,
makes very efficient use of parallel processors, provides ready access to any element,
does not require an extensive amount of execution time (in comparison to the bit map
scheme) to reorder data elements, and exhibits a reasonable row and column execution time.
The primary disadvantages of the address
map method are: a large fast core requirement for indexing; and the relatively large
execution time, in comparison with the
threaded list scheme, to reorder matrix
elements.
Both bit and address maps require significant execution times to transpose the mat r i x - t h e map must be transposed, and all
the data elements must be reordered. Execution time to transpose the matrix is
directly proportional to the order of the
matrix and the matrix density.
Primary advantages of the row-column
schemes are: a very fast row access time in
comparison with the bit and address maps;
a relatively fast column access time in comparison to all other methods; conservation of
Computing Surveys, Vol 5, N o
2, June 1973
core with matrices of less than 4% density
when compared to the bit map method; an
increase in efficiency as the order of the
matrix increases, as more complex variations
become more efficient; and faster reordering
than the bit map or address map methods.
The main disadvantages of the row-column scheme are that column access time
and the time required to reorder elements
greatly increase as the matrix order a n d / o r
matrix density increases.
The threaded list technique is the sole
technique that allows a simple and fast executing method of reordering, adding, or
annihilating data elements.
The threaded list scheme exhibits a
variety of disadvantages, the primary ones
being a large core requirement for indexing
in comparison with the row-column method,
a slow access time for rows when elements
are stored by rows, and an even slower access
time for columns compared with the rowcolumn method. The inclusion of orthogonal
links, as discussed by K n u t h [35], removes
some of the column access difficulties, but
only at the price of additional storage.
For the special case of band matrices, a
scheme similar to the one described in Part
VI should be used unless either half or more
of the elements within the b, nd width are
null, or the nature of the mathematical
operations to be performed dictates otherwise (as described in Part VI). If the band
matrix scheme cannot he utilized, the user
must decide which characteristics of the
other types of indexing are considered vital
to the solution, and select a method on this
basis.
A final major aspect of indexing the user
must consider concerns the adaptability and
flexibility of programming the selected
scheme, which depends upon the factors
enumerated in the Introduction. The following suggestions and comments concerning
programming flexibility and adaptability
are offered.
None of the major types of indexing
schemes requires double indexing. Double
indexing involves using one register (adder)
to index across the row, and another register
to index down the column. Double indices
have at least three drawbacks: they require
17. Indexing Techniquesfor Sparse Matrices
more time than single indices; the computer
may have a built-in limit on the number of
characters or words that can be indexed by
one or both of the registers before a new
index (base) register must be designated;and
registers are at a premium, because of the
extremely fast register to register operation
time, and should be used for more vital arithmetic. In the last analysis, the increased
time involved in double indexing is the
critical factor.
In general, the larger the order of the
matrix, the lower the matrix density. Because of this the row-column method is
preferred for matrices with orders of 1000 or
more, especially when arithmetic manipulations or operations are to be performed.
As the order of the matrix increases, it
becomes more efficient to employ more complex variations of the major types. For instance, the delta indexing scheme (as described in Part VI) conserves a considerable
amount of fast core compared with the
simpler row-column schemes, without a
great increase in execution time, when the
order approaches 1000.
If the matrix requires more fast core than
is available, the user must decide either to
segment the matrix between fast and slow
core, or to reduce the complexity of the
problem. If the problem can be simplified,
or the matrix condensed or partitioned
(blocked), then it is not necessary to segment the matrix between fast and slow core.
Simplifying the matrix involves the real
consideration of whether or not it is economically feasible to reorder rows and/or
columns to produce a new matrix that can
be more efficiently processed. Many schemes
have been developed [7, 16, 18, 27] to attempt such an optimal ordering of matrix
elements. Condensing the matrix involves
the elimination of data elements that produce insignificant or negligible change in the
results. Such condensing can often be done
with reasonable competence by somebody
skilled in the nature of the problem to be
solved. If the matrix is of block-diagonal
form, each block can be processed as a
separate entity to produce a composite result.
The availability of a virtual memory
•
125
processor might lead the user to the erroneous conclusion that the benefits of a
proper indexing algorithm are negated. This
is not so; at some time during the processing
of a sparse matrix the matrix must reside in
physical memory. It then follows that the
fewer the number of pages occupied by the
sparse matrix, the fewer the page faults
generated, and therefore the less time involved in moving the matrix to and from
peripheral paging devices. In other words,
the same benefits accruing from indexing in
an ordinary processor apply in a virtual
memory processor.When such updating of data files is anticipated, the user should designate buffer
storage. When new matrix elements are
introduced, they should be stored in the
buffer area. When a considerable humber of
corrections to the data elements exist (about
5%), then the matrix is reordered. The
threaded list scheme requires no separate
buffer area, as a buffer is inherent in the indexing scheme.
The segments of coding that contain the
actual indexing algorithm should be programmed in a low-level language, such as
assembly language, to conserve execution
time. High-level languages, such as FOgWRhN
utilize a compiler, which may not produce
the most efficient coding. For instance, if a
division by 32,768 is necessary, the high-level
language may simply create a division by
32,768 in assembly language. If the highlevel compiler, however, recognized that a
division by 32,768 is identical to shifting an
accumulator right 16 bits, the assembly
language version would be a shift right
logical or shift right double logical. The first
version would require significantly more
execution time than the more efficient assembly language program version. A considerable savings is realized when the computation is performed perhaps as many as
several million times in a program.
The user should avoid making the indexing algorithm in a subroutine form,
especially in a high-level language, because
of the added linkage time during program
execution.
While a "fast" algorithm for indexing into
arbitrarily sparse matrices would allow very
Computing Surveys, Vol. 5, No. 2, June 1973
18. 126
•
U. W . Pooch and A . N~eder
efficient core storage allocation and execution
times for matrix manipulations, it is also
evident that no such single algorithm exists,
at least at present. The advent of array
processors and pipeline computers may
eliminate the desire to handle sparse matrices in any special manner whatsoever.
However, it also appears that no matter how
large, or how fast and sophisticated, computing machines become, users will continue to strive for core storage conservation
and faster execution times. It remains to be
seen if sufficiently sophisticated indexing
algorithms will be developed to accomplish
those goals in array or pipeline machines;
or whether such machines will come into
Computing Surveys, Vol. 5, No 2, June 1973
general use and provide an environment
conducive to developing sparse matrix
indexing schemes.
For the present, the choice of an indexing
algorithm depends upon many considerations, with each major type of indexing
discussed here having particular advantages
and disadvantages. Careful selection of an
algorithm can satisfactorily achieve the
goals of conservation of core memory and
execution time. In addition, whenever there
exists some pattern to the nonzero entries,
the possibility of reorganizing the calculations as a means to handle some sparse
matrices should be carefully considered.
19. Indexing Techniques for Sparse Matrices
•
127
APPENDIX
ALGORITHM
1 BIT MAP SCHEME
Statement
Meaning
is t h e row n u m b e r t h a t will b e m a m p u l a t e d
v is t h e row i n d e x v e c t o r
b = n u m b e r of b i t s / w o r d
(* -- 1)
J is t h e n u m b e r of c o l u m n s in t h e m a t r i x
(z -- 1) * J
Save (z- l)*J
(((z1 ) * J ) 4- b - D
S, = (((~ - l) * J ) 4- b -- 1 ) / b w o r d c o n t a i n s t h e
first b i t of r e q u i r e d row
E n d of row c o u n t e r ( J )
S t a r t i n g w o r d of t h e r o w
D e t e r m i n e c o r r e c t n u m b e r of d i s p l a c e m e n t b i t s ;
M A S K = m a s k for m a x i m u m d i s p l a c e m e n t
bits
S h i f t to e l i m i n a t e i n c o r r e c t b i t s ( f r o m p r e v i o u s
01
02
03
04
05
06
07
08
09
R O W ~R I N D E X ~- v(~)
BITS ~ b
R O W ¢-- R O W - 1
C O L S ~- J
ROW e- ROW * COLS
S A V E (--- R O W
R O W ~- R O W 4- B I T S R O W ~- R O W / B I T S
10
ll
12
R O W E N D (-- C O L S
S T A R T ~- R O W
R O W E N D ~- R O W E N D
MASK
13
S T A R T *- S T A R T * 2 * * S A V E
14
15
16 C O U N T
R O W E N D *- R O W E N D
GO TO ROWSCAN
R O W E N D *-- B I T S
17
18
19 R O W S C A N
R O W ~- R O W 4- 1
W O R D ~-- b i t w o r d f r o m m a p
W O R D B 1 T *-- b i t f r o m b i t - w o r d
20
C O L N U M (-- C O L N U M 4- 1
Increment column number
21
WORDB1T
22
IF YES, GO TO MATH
Following statements are branch controls
Is t h e b i t n o n - z e r o ?
Yes, an element exists.
23 E N D R O W
24
COLNUM = COLS
~
IF YES, GO TO END1
Is t h e c o l u m n c o u n t e r e q u a l to t h e r o w c o u n t e r ?
Y e s , e n d of row
25
COLNUM
26
27
28 M A T H
I F Y E S , GO T O C O U N T
GO TO ROWSCAN
R I N D E X e - - R I N D E X 4- 1
H a v e we s h i f t e d c o m p l e t e l y t h r o u g h b i t m a p
word?
Yes, fetch another word.
N o , s c a n n e x t b i t in w o r d
R I N D E X = a d d r e s s of n o n z e r o e l e m e n t
1
AND
rOW)
-
SAVE
C o r r e c t for e h m i n a t e d b i t s
B r a n c h to code to s c a n row in b i t m a p for 1 b i t s
F o l l o w m g code s c a n s o n e e n t i r e r o w of a b i t m a p .
A f t e r first w o r d of row is s c a n n e d , t h e b i t
counter (ROWEND) = b
Increment bit map word address by one
W o r d of b i t m a p
P i c k u p h{gh o r d e r b i t f r o m b i t w o r d
(WORD)
= 1
= ROWEND
COLNUM
element
=
column
number
of
non-zero
P e r f o r m r e q u i r e d o p e r a t i o n on e l e m e n t
29
30 E N D 1
GO TO ENDROW
STOP
Computing Surveys, ¥oi 5, No 2, June 1973
Return
E n d of o p e r a t i o n o n t h e row.
20. •
128
U. W . Pooch and A . Nieder
ALGORITHM
Statement
01
02
03
04
05
06
07 S T A R T
08
09
10
ll
I
12
13
14
15
16 M A T H
2: A D D R E S S
MAP SCHEME
R O W *-- i
R I N D E X ~-- v(~)
R O W *-" R O W -- 1
C O L S ¢-- 3
R O W (-- R O W * C O L S
R O W (-- R O W - 1
R O W ~-" ROW + 1
C O L N U M (--- C O L N U M + 1
COLNUM > COLS
IF YES, GO TO ENDROW
B Y T E ~- b y t e f r o m
address map
BYTE ~ 0
IF YES, GO TO START
C H E C K ~-- 0
CHECK *- BYTE
CHECK ~ CHECK + RINDEX
Meaning
i = row
v = row index vector
(i - 1)
3 = $ columns
j*(~ -- 1)
(3*(2 -- 1)) -- 1
Increment across row
Increment column $
E n d of row
Yes, done
Pick up partial word
I s b y t e zero?
Reenter scan process
Zero w o r k a r e a
Byte to work area
Points to non-zero element
Required operations
performed here
17
18 E N D R O W
19
GO TO START
STOP
END
ALGORITHM
Statement
01
O2
03
O4
O5
O6
07 S T A R T
O8
O9
10
11
12
13
14
15
16
17 M A T H
Reenter scan process
Finish
3: A D D R E S S
MAP SCHEME
B E G I N *-- A d d r e s s of a d d r e s s m a p
B E G I N ~-- B E G I N + J
B E G I N *-- B E G I N -- 1
C O L S (-- 3
ROWS ~
B E G I N ~-- B E G I N - C O L S
B E G I N ~-- B E G I N + C O L S
R I N D E X *- v(I)
R O W C T R ~-- R O W C T R + 1
ROWCTR > COLS
IF YES, GO TO ENDROW
BYTE *- byte from address map
BYTE = 0
~
IF YES, GO TO START
C H E C K ~- 0
C H E C K ~- B Y T E
C H E C K ~-- C H E C K + R I N D E X
Meaning
Pointer
J = column g
3 = g columns
i = g rows
Increment address
Row index vector
I n c r e m e n t row c o u n t e r
P a s s e d e n d of m a t r i x ?
Yes, passed end
Pick up partml word
Is b y t e zero?
Reenter scan process
Zero w o r k a r e a
Byte to work area
P o i n t s to n o n - z e r o e l e m e n t
Required operations
performed here
18
19 E N D R O W
20
GO TO START
STOP
END
Reenter scan process
Finish
Computing Surveys, Vol 5, No 2, June 1973
21. Indexing Techniques for Sparse Matrices
I
•
129
ROW + i I
,1,,
)--
IR,,DEX + v(i)
_
i_ __~ .......
~, ~IT÷bit frombitmap
I
COLS÷ J
(
D
[Row ~- RO.*CO'S i
$
,, .
I"R W
O÷
¢
NO
¢
(ROW + BITS - I ) / B I T S l !
[i.o.~,o: ~oc,~
.~
[START ÷ R W
OI
• ~
,
I MASK& SHIFT R W N I
O ED
I
....
RowEND~
NO
oc.o. ; O"E"9
@.o
@
R W N- SAVEI
O ED
FIG A1. Flowchart--algorithm 1 bit map scheme.
Computing Surveys, VoL 5, No. 2, JuBe 1973
22. 130
•
U. W. Pooch and A. Nieder
C E K÷ 0
HC
~ NDEX÷ v(i)
ICHECK÷BYTE
F~o~~o~-~I
~-EX~
ICHECK÷ C E K+ RIND - HC
CL÷j
OS
@
O _,]
~,ROW÷R W+ l
YS
E
( IS COLNUM,COLS~
~
~NO
~_~TE ÷ bYte from address map ]
~IS BYTE= 0?~
@"°
FIG A2
YES~
F l o w c h a r t - - a l g o r i t h m 2: a d d r e s s m a p s c h e m e
Computing Surveys, Vol 5, No 2, June 1973
23. Indexing Techniques for Sparse Matrices
BEGIN
I,
÷ address map address I
•
131
CHECK 0
÷
T
FCHECK÷ BYTE
BEGIN ÷ BEGIN + J - l
C E K÷ C E K+
HC
HC
~ BEG,N ÷
RINDEX
$
BEGIN + CO'S l
[ RINDEX÷ v(I> I
( ~ RO.C~> c o ~
~
'~._j /
NO
I-BYTE+ byte from address map~
~ ~,~ o~; Y~ < ~
+
Fza A3
F l o w c h a r t - - a l g o r i t h m 3 address m a p scheme.
Computing Surveys, Vol, 5, No. 2, June 1973
24. 132
•
U. W. Pooch and A. Nieder
BIBLIOGRAPHY
1. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY~ R. "Some results on sparse matrices." RC2332, IBM Watson Research Center,
(February 1969), 37-46.
2. LARSEN, L. "A modified inversion procedure
for product form of the inverse-linear programruing codes " Comm. ACM 5, 7 (July 1962) 382383
3. LIVESLEY,R. "An analysis of large structural
system." Comp. J. 3, (1960)34-39.
4. McCoRMICK,C.W. "Application of partially
handed matrix methods to structural analysis." Sparse Matrix Proceedings, R. Willoughby (Ed.) IBM Watson Research Center,
RAl1707 (March 1969) 155-158
5 ORCHARD-HAYs, W. Advanced L~near Programming Techniques McGraw-Hill, New
York, 1968, 73-82.
6. TEWARSON, R. "On the product form of inverse of sparse matrices." S I A M Rewew 8,
(1966) 336-342.
7 TEWARSON,R. "Row column permutation of
sparse matrices." Comp. J 10, (1967/68)
300-305
8. BRAYTON, R., GUSTAVSON, F , AND WILLOUGHBY, R. "Some results on sparse matrices." (Introduction), RC2332, IBM Watson
Research Center, (February 1969) 1-3.
9. BASHKOW, T "Network analysis." Mathematical Methods for Digztal Computers A.
Ralston and A. S. Wilf, Eds., Vol. I, John
Wiley and Sons, New York, 1967280-290
1O. TINNEY,W F. "Comments on using sparsltv
techniques for power system problems."
Sparse Matrix Proceedings R Willoughby, Ed.,
IBM Watson Research Center, RAl1707
(March, 1969) 25-34.
11. PALACOL,E . L . "The finite element method
of structural analysis " Sparse Matmx Proceedzngs R. Willoughby Ed., IBM Watson Research Center, RAl1707 (March, 1969) 101-5.
12. RALSTON, A. "Numerical integration methods for the solution of ordinary differential
equations." Mathematzcal Metaods for Dzgztal
Computers A. Ralston and A. S. Wilf Eds, Vol.
I, John Wiley and Sons, New York, 1967, 95109.
13. ROMANELLI, M "Runge-Kutta methods for
the solution of ordinary differentml equations " Mathematzcal Methods for Dzgztal Computers A. Ralston and A S Wilf, Eds , Vol. I,
John Wiley and Sons, New York 1967, 110-20.
14. WAC~SPRESS,E "The numerical solution of
boundary value problems " Mathematzcal
Methods for Dzgztal Computers A Ralston and
A. S. Wflf, E d s , Vol. I, John Wiley and Sons,
New York, 1967, 121-7.
15. WEIL, R,, JR, AND KETTLER, P. " A n algorithm to provide structure for decomposition."
Sparse Matrzx Procee&ngs R. Willoughby, Ed.,
IBM Wa~sca Research Center, RAl1707
(March, 1969) 11-24
16. GUSTAVSON,F., LINIGEB,W., WILLOUGHBY,R.
"Symbohc generation of an optimal crout algorithm for sparse systems of linear equa-
Computing Surveys, Vol. 5, No 2, June 1973
17.
18.
19
20.
21.
22
23.
24.
25.
26.
27
28.
29.
30.
31.
32.
33.
34.
tlons." Sparse Matrix Proceedings R. Willoughby, Ed., IBM Watson Research Center,
RAl1707 (March, 1969) 1-10.
SMI~I, D . M . "Data logistics for matrix inversion." Sparse Matrix Proceedings R. Willoughby, Ed., IBM Watson Research Center,
RAl1707 (March, 1969) 127-32.
SPILLERS,W. R., AND t~ICKERSON,N. "Optimal elimination for sparse symmetric systems
as a graph problem." Quar Appl. Math. 26
(1968) 425-32
STEWARD,D. V. "On an
to technique for the analysis of thepproacha of large
structure
systems of equations." S I A M Rev 4 (1962)
321-42.
TEWARSON,R . P . "The Gausslan elimination
and sparse systems," Sparse Matrzx Proceed~ngs R. Willoughby, Ed., IBM Watson Research Center, RAl1707 (March, 1969) 35-42.
GIVENS, W., McCoRMICK, HOFFMAN, et al.
"Panel discussion on new and needed word
and open questions." (Chairman P. Wolfe),
Sparse Matmx Proceedings R. Willoughby, Ed.,
IBM Watson Research Center, RAl1707
(March, 1969) 159-80.
WILKES, M. V. "The growth of interest in
microprogramming: a literature survey,"
Com p. Surveys, 1,3 (September, 1969) 139-45.
ORC~ARD-HAYs,W. " M P s y s t e m s technology
for large sparse matrices." Sparse Matrix Proceedzngs R. Willoughby, Ed , IBM Watson Research Center, RAl1707 (March, 1969) 59-64.
CHANG, A. "Apphcatlon of sparse matrix
methods in electric power system analysis."
Sparse Matrix Proceedings R. Willoughby, Ed.,
IBM Watson Research Center, BAll707
(March, 1969) 113-122.
BRAYTON, n . , GUSTAVSON, F., WILLOUGHBY,
R "Some results on sparse matrices." IBM
Watson Research Center, RC2332 (February
1969) 21-22.
CHhRTRES, B A., ANn GLUDEN, J C. " C o m putable error bounds for direct solution of
hnear equations." J ACM 14, 1 (Jan 1967)
63-71
FORSY~HE, G. E. "Crout with pivoting."
Comm. ACM 3 (1960) 507-8.
JENNINGS,A. "A compact storage scheme for
the solution of symmetric linear simultaneous
equations." Comput. J. 9 (1966/67) 281-5
System 360 Matrix Language (MATLAN) Application Description, IBM H20-0479 Program
Description Manual, IBM H20-0564
McNAMEE, J M. "Algorithm 408, a sparse
matrix package." (Part I), Comm ACM 4, 4
(April 1971) 265-273.
DULMAGE, A L., AND MENDELSOHN, N. S.
"On the inversion of sparse matrices." Math.
Comp. 16 (1962) 494-496.
MAYOH,B.H. "A graph technique for inverting certain matrices." Math. Comp. 19 (1965)
644-646.
RoT~, J. P. "An application of algebraic
topology: Kron's method of tearing " Quar.
Appl. Math. 17 (1959) 1-24
SWIFT, G "A comment on matrix inversaon
by partition." S I A M Rev. 2 (1960) 132-33.
25. Indexing Techniques for Sparse Matrices
35. KNUTH, D. ]~. The Art of Computer Programm~ng, Vol. I, Addison--Wesley, Reading,
Mass. 1968 299-304, 554-556.
36. BERZTISS, A . T . Data Structures: Theory and
Practice. Academic Press, New York, 1971,
276-279.
37. LARCOMBE, M. "A hst processing approach
to the solution of large sparse sets of matrix
equations and the factorization of the overall
matrix." in Large Sparse Sets of L~near Equatwns, Reid, J. K., Ed., Academm Press,
London, 1971.
38. WEIL, R. L., ANDKI~TTLER,P . C . "Rearranging matmces to block-angular form for decompotation (and other) algorithms." Management Science 18, 1 (Sept. 1971) 98-108.
39. GUSTAVSON, F. G. "Some basic techniques
for solving sparse systems of linear equations "
in Sparse Matmces and Their Applications,
Rose, D J , and Willoughby, R. A., Eds.,
Plenum Press, New York, 1972 41-52.
40. FIKE, C . T . PL/I for Scientific Programmers,
41.
42.
43.
44.
45.
46.
•
133
Prentice-Hall, Englewood Cliffs, N. J., 1970
108, 180.
WILLOUGHBY, R. A. "A survey of sparse
matrix technology." IBM Watson Research
Center, RC3872 May 1972.
CuTmt.t., E. "Several strategies for reducing
the band-width of matrices." in Sparse Matraces and their Applications, Rose, D . J., and
Willoughby, R. A., Eds., Plenum Press, New
York, 1972, 34-38.
TEWARSON,R . P . "Computations withsparse
matrices." SIAM Rev., 12, 4 (Oct. 1970) 527543.
PETTY, J. S. "FORTRAN M: programming
package for band matrices and vectors." Aerospace Research Labs., Wright-Patterson AFB,
Ohio, ARL-69-0064 (April, 1969).
SHLL~RS, W . R . "On Diakoptics: Tearing an
arbitrary system." Quar. Appl. Math. 23
(1965) 188-90.
IBM System/360 Model 65 Functional Characteristics, IBM A22-6884-3, File No. $360-01.
Computing Surveys, VoI. 5, No. 2, June 1973