This is a sample of a manual I developed while at Wideband for a software math and science digital signal processing library for the Analog Devices ADSP-21K. It contains the detailed descriptions of the routines and shows the programmers had a complete and useful solution.
1. ADSP-21K Optimized DSP Library User’s Manual
CHAPTER 5 Function Descriptions For
The ADSP-21K Optimized
DSP Library
Each function described in the following pages includes the following topics in order to
better understand its use:
• Name
• Description of the function's operation
• The algorithm as applicable
• Synopsis of function prototype
• Domain valid for arguments
• Accuracy of the returned value(s)
• Execution time in machine cycles
• Notes applicable to this function
Wideband Computers, Inc. 5-55
2. ADSP-21K Optimized DSP Library User’s Manual
acort ( a, c, m, n )
NAME Auto-correlation (Time Domain)
DESCRIPTION Computes the time domain auto-correlation of the real elements stored in input vector
a[ ]. Values m and n define the number of auto-correlation values to compute. The
resulting auto-correlation values are stored in output vector c[ ].
n–i–1
ALGORITHM Ci= ∑ Ai + j • Aj i = { 0, 1, 2, …m – 1 }
j= 0
SYNOPSIS void acort ( a, c, m, n )
float *a ; /* Pointer to input vector a[ ] */
float *c ; /* Pointer to output vector c[ ] */
int m ; /* Lag count m */
int n ; /* Number of elements in vector a[ ] */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME 31 + 9*M + (M+1) (2*N-M)
NOTES The file tacort.c included in the distribution tape provides an example of this func-
tion’s use.
Note that the lag count m must be less than or equal to the number of floating-point
elements (i.e. m ≤ n ).
5-56 Wideband Computers, Inc.
3. ADSP-21K Optimized DSP Library User’s Manual
acos_wci ( x )
NAME Arc Cosine
DESCRIPTION This function computes the arc cosine of a floating-point number, x. The computed
value returned from this function is in the range [0 to π ] radians. A domain error is
returned if x is not in the range [-1 to +1].
ALGORITHM return = cos –1( x )
SYNOPSIS float acos_wci ( float x )
DOMAIN -1.0 < x < +1.0
ACCURACY 7.75 decimal digits
EXECUTION TIME If A <= 0.5 then 55 cycles, Else if A >0.5 then 75 cycles
NOTES The file tacos.c included in the distribution tape provides an example of this function's
use.
acosh_wci ( x )
NAME Inverse Hyperbolic Cosine
DESCRIPTION This function computes the inverse hyperbolic cosine of a floating-point number, x.
ALGORITHM return = cosh – 1( x )
SYNOPSIS float acosh_wci ( float x )
DOMAIN 1.0 to 3.4E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME 72 cycles
NOTES The file tacosh.c included in the distribution tape provides an example of this func-
tion's use.
Wideband Computers, Inc. 5-57
4. ADSP-21K Optimized DSP Library User’s Manual
alawc ( a, i, c, k, n )
NAME a-Law Compression
DESCRIPTION This routine performs an a-law compression on the elements in input vector a and out-
puts the compressed results to output vector c.
C mk = alaw compression of A mi
ALGORITHM
m = { 0, 1, 2, …n – 1 }
SYNOPSIS void alawc ( a, i, c, k, n )
int *a ; /* Pointer to input vector a */
int i ; /* Element stride for vector i */
int *c ; /* Pointer to output vector c */
int k ; /* Element stride for vector c */
int n ; /* Number of floating-point elements */
DOMAIN 0 to 255
ACCURACY 7.75 decimal digits
EXECUTION TIME 49 + 12 * ( N-1 )
NOTES The file talawc.c included in the distribution tape provides an example of this func-
tion’s use.
The alawc() routine takes a linear 13-bit signed speech sample and compresses it
according to CCITT (now ITU) recommendation G.711. The 8-bit compressed sample
is output to vector c.
This function is found on the serial port hardware for the ADSP-2106x DSP proces-
sors.
5-58 Wideband Computers, Inc.
5. ADSP-21K Optimized DSP Library User’s Manual
alawe ( a, i, c, k, n )
NAME a-Law Expansion
DESCRIPTION This routine performs an a-law expansion on the elements in input vector a and out-
puts the expanded results to output vector c.
C mk = alaw expansion of A mi
ALGORITHM
m = { 0, 1, 2, …n – 1 }
SYNOPSIS void alawe ( a, i, c, k, n )
int *a ; /* Pointer to input vector a */
int i ; /* Element stride for vector i */
int *c ; /* Pointer to output vector c */
int k ; /* Element stride for vector c */
int n ; /* Number of floating-point elements */
DOMAIN 0 to 255
ACCURACY 7.75 decimal digits
EXECUTION TIME 46 + 17 * ( N-1 )
NOTES The file talawe.c included in the distribution tape provides an example of this func-
tion’s use.
The alawe() routine takes an 8-bit compressed speech sample and expands it accord-
ing to CCITT (now ITU) recommendation G.711. The 13-bit signed sample is output
to vector c.
This function is found on the serial port hardware for the ADSP-2106x DSP proces-
sors.
Wideband Computers, Inc. 5-59
6. ADSP-21K Optimized DSP Library User’s Manual
alpha ( df, a, &al, &n )
NAME Kaiser-Bessel Window Shape Parameter
DESCRIPTION Computes a Kaiser-Bessel window shape parameter for later use by the kaiser( ) win-
dow mutiply library function. The computation is based on the input attenutation
specified in input scalar a and the transition width specified in real input scalar df.
From this, a count of floating-point elements (output scalar n) and an output window
shape parameter (output scalar al) is computed.
If A ≤ 21 then al = 0
ALGORITHM
Else If
0.4
A < 50 then al = 0.5842 • ( A – 21 ) + 0.07886 • ( A – 21 )
Else If
al = 0.1102 • ( A – 8.7 )
Number of Elements n is computed as follows:
( A – 7.95 )
If A > 21 then d = ------------------------ else d = 0.922
-
14.36
n = 1 + ceiling ( d ⁄ df )
n = n + 1 – remainder ( n ⁄ 2 )
SYNOPSIS void alpha ( df, a, &al, &n )
float dm *df ; /* Input transition width in fs units */
float dm a ; /* Input ripple attenutation in dB */
float dm &al ; /* Output alpha window shape parameter */
int &n ; /* Output floating-point element count */
-3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
5-60 Wideband Computers, Inc.
7. ADSP-21K Optimized DSP Library User’s Manual
alpha ( df, a, &al, &n )
EXECUTION TIME If a >= 50 then 143 Cycles
If 21 < a < 50 then 221 Cycles
If A <= 21 then 124 Cycles
NOTES The file talpha.c included in the distribution tape provides an example of this func-
tion’s use.
– A ⁄ 20
df = ∆f ⁄ f s, A = ripple attentuation in dB, δ = 10
asin_wci ( x )
NAME Arc Sine
DESCRIPTION This function computes the arc sine of a floating-point number, x. The computed
value returned from this function is in the range [-π/2 to π/2] radians. A domain error
is returned if x is not in the range [-1 to +1].
ALGORITHM return = sin – 1( x )
Wideband Computers, Inc. 5-61
8. ADSP-21K Optimized DSP Library User’s Manual
asin_wci ( x )
SYNOPSIS float asin_wci ( float x )
DOMAIN - 1.0 < x < +1.0
ACCURACY 7.75 decimal digits
EXECUTION TIME If A <= 0.5 then 55 cycles, Else if A >0.5 then 73 cycles
NOTES The file tasin.c included in the distribution tape provides an example of this function's
use.
asinh_wci ( x )
NAME Inverse Hyperbolic Sine
DESCRIPTION This function computes the inverse hyperbolic sine of a floating-point number, x.
ALGORITHM return = sinh –1( x )
SYNOPSIS float asinh_wci ( float x )
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME 57 cycles
NOTES The file tasinh.c included in the distribution tape provides an example of this func-
tion's use.
5-62 Wideband Computers, Inc.
9. ADSP-21K Optimized DSP Library User’s Manual
aspec ( a, c, n )
NAME Accumulating Auto-spectrum
DESCRIPTION Computes the auto-spectrum of complex input vector a by multiplying vector a by its
complex conjugate and adding the resulting real number to the current value of vector
c. Vector c must be initialized prior to invoking a series of accumulating auto-spec-
trum calls.
2 2
ALGORITHM C m ⇐ C m + Re Am + Im Am
m = { 0, 1, 2, …n – 1 }
SYNOPSIS void aspec ( a, c, n )
complex *a ; /* Pointer to input vector a */
float *c ; /* Pointer to output vector c */
int n ; /* Element count for vector c */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME 28 + 6*N cycles
NOTES The file taspec.c included in the distribution tape provides an example of this func-
tion’s use.
The stride of vectors a and c must always be 1.
If you wish to clear the auto-spectrum results before they are added to output vector c
use the vclr( ) function. If the results are not cleared using vclr( ), autospectrum results
are added to output vector c, thus computing an accumulating autospectrum.
Note that input vector a is of type complex, and data arguments supplied to this routine
will be treated as interleaved real and imaginary data.
Wideband Computers, Inc. 5-63
10. ADSP-21K Optimized DSP Library User’s Manual
atan_wci ( x )
NAME Arc Tangent
DESCRIPTION This function computes the arc tangent of a floating-point number x. The computed
value returned from this function is in the range [-π/2 to +π/2] radians.
ALGORITHM return = tan –1( x )
SYNOPSIS float atan_wci ( float x )
DOMAIN - 4.2E+37 < x < +4.2E+37
ACCURACY 7.75 decimal digits
EXECUTION TIME 59 cycles
NOTES The file tatan.c included in the distribution tape provides an example of this function's
use.
5-64 Wideband Computers, Inc.
11. ADSP-21K Optimized DSP Library User’s Manual
atan2_wci ( y, x )
NAME Arc Tangent 2 Arguments
DESCRIPTION This function computes the arc tangent of a floating-point number x. The computed
value returned from this function is in the range [-π to +π] radians.
–1 y
return = tan --
ALGORITHM
-
x
SYNOPSIS float atan2_wci ( y, x )
float dm y ; /* Input value y */
float dm x ; /* Input value x */
DOMAIN - 4.2E+37 < y/x < +4.2E+37, except x = 0.0
ACCURACY 7.75 decimal digits
EXECUTION TIME 76 cycles
NOTES The file tatan2.c included in the distribution tape provides an example of this func-
tion's use.
Wideband Computers, Inc. 5-65
12. ADSP-21K Optimized DSP Library User’s Manual
atanh_wci ( x )
NAME Inverse Hyperbolic Tangent
DESCRIPTION This function computes the inverse hyperbolic tangent of a floating-point number, x.
ALGORITHM return = tanh – 1( x )
SYNOPSIS float atanh_wci ( float x )
DOMAIN -1.0 to +1.0
ACCURACY 7.75 decimal digits
EXECUTION TIME 59 cycles
NOTES The file tatanh.c included in the distribution tape provides an example of this func-
tion's use.
5-66 Wideband Computers, Inc.
13. ADSP-21K Optimized DSP Library User’s Manual
bartlett ( a, i, c, k, n )
NAME Bartlett Window
DESCRIPTION This function generates a Bartlett window multiply on the elements of input vector a
and places the results in output vector c.
ALGORITHM
1
m – -- n
2
-
C mk = Ami • 1 – ----------------
-- n
1
-
2
m = { 0, 1, 2, …n – 1 }
SYNOPSIS void bartlett ( a, i, c, k, n )
float *a ; /* Pointer to input vector a */
int i ; /* Address stride in words for input vector a */
float *c ; /* Pointer to output vector c */
int k ; /* Address stride in words for output vector c */
int n ; /* Element count */
DOMAIN -3.4 x 1038 to +3.4 x 1038
ACCURACY 7.75 decimal digits
EXECUTION TIME 44 + 17 * ( N-1 ) cycles
NOTES The file tbartlett.c included in the distribution diskette provides an example of this
function’s use.
The Bartlett window is also known as a triangular window.
Wideband Computers, Inc. 5-67
14. ADSP-21K Optimized DSP Library User’s Manual
biquad ( x, d, c, y, n )
NAME Bi-Quad IIR Filter
DESCRIPTION Using a bi-quad implementation, this function computes an IIR ( Infinite Impulse
Response ) filter using coefficients stored in input vector c, delay node points stored in
input buffer d, and applied to the elements of input vector x. The results are stored in
output vector y.
–1 –2
B0 + B1 z + B2 z
ALGORITHM H ( z ) = -----------------------------------------------
-
–1 –2
1 – A1 z – A2 z
where
Dm = A2 • Dm – 2 + A1 • Dm – 1 + xm
Y m = B2 • Dm – 2 + B1 • Dm – 1 + Dm
m = { 0, 1 , 2 , …, n – 1 }
SYNOPSIS void biquad ( x, d, c, y, n )
float *x ; /* Pointer to input buffer vector x of length n */
float *d ; /* Pointer to input delay node buff vector d of length 2 */
float *c ; /* Pointer to input coeff buffer vector c of length 5 */
float *y ; /* Pointer to output buffer vector y of length n */
int n ; /* Number of input/output samples to compute */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
5-68 Wideband Computers, Inc.
15. ADSP-21K Optimized DSP Library User’s Manual
biquad ( x, d, c, y, n )
EXECUTION TIME 65 + 13*N
NOTES This is a single bi-quad form of an infinite impulse response filter (IIR), defined by the
first equation shown above. It is implemented using a delay node buffer d shown in the
second and third equation shown above. The coefficients a[ ] and b[ ] are passed in a
single array c[ ] given by the following:
c [ 0 ] = A2 c [ 1 ] = B2 c [ 2 ] = A1 c [ 3 ] = B1 c [ 4 ] = B0
Prior to executing the filter loop, the two “oldest” delay node values are loaded from
buffer d[ ]. When the filter loop has completed (n samples have been processed) the
two “newest” delay node values are written to d[ ]. In this way the filter delay node
states are retained between calls, allowing filtering on blocks of contiguous samples.
The user is responsible for allocating the delay node array and for initializing its ele-
ments to zero prior to the first call to biquad( ).
Defining
d0 = D m d1 = D m – 1 d2 = D m – 2
Then
d0 = c0 • d2 + c2 • d1 + xm
ym = c1 • d2 + c 3 • d1 + c 4 • d0
d2 = d1
d1 = d0
m = { 0 , 1 , 2, … , n – 1 }
The coefficient buffer length is defined symbolically in the file dsppac.h as
DSP_BIQUAD_NCOEFF. The delay node buffer length is defined symbolically in
the file dsppac.h as DSP_BIQUAD_NDELAY.
The number of input samples n must be greater than or equal to 5.
The file tbiquad.c included in the distribution tape provides an example of this func-
tion’s use.
Wideband Computers, Inc. 5-69
16. ADSP-21K Optimized DSP Library User’s Manual
blkman ( a, i, c, k, w, h, n )
NAME Blackman Window Multiply
DESCRIPTION Multiplies the input vector a[ ] by a Blackman window and stores the result to vector
c[ ].
ALGORITHM 2πmi 4πmi
C mk = A mi • 0.42 – 0.50 • cos ------------ + 0.08 • cos ------------
- -
N N
m = { 0, 1, 2, …, n – 1 }
SYNOPSIS void blkman ( a, i, c, k, w, h, n )
float dm *a ; /* Pointer to input vector a */
int i ; /* Element stride for vector a */
float dm *c ; /* Pointer to output vector c */
int k ; /* Element stride for vector c */
float pm *w ; /* Pointer to cosine weights array */
int h ; /* Element stride for weights array */
int n ; /* Element count for vector c */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
5-70 Wideband Computers, Inc.
17. ADSP-21K Optimized DSP Library User’s Manual
blkman ( a, i, c, k, w, h, n )
EXECUTION TIME 41 + 4*(N-1) cycles
NOTES The file tblkman.c included in the distribution tape provides an example of this func-
tion’s use.
For real-time applications, the Blackman window can be computed once, and a simple
multiply used to window data as shown in the variable W ml . The Blackman Win-
dow is computed using the winwts( ) function found in the DSP Pac library. The win-
wts( ) function computes the weights array using the sin and cosine functions. This
array is pointed to by variable w listed in the synopsis section above.
The blkman( ) function is a vector function. You may therefore use the stride argu-
ments i, k and h to decimate both the input and output for data congruence. For exam-
ple, suppose you use winwts( ) to compute the FFT weights for a 16K FFT. This would
result in an fftwts array whose length would be 16,384 points. If you were to later
decide to compute an FFT of length 1,024 and run a Blackman Window on the results,
you would not need to rerun the winwts( ) function to generate new weights. Simply
use the old weights and stride by 16 (16,384/1024 = 16) on stride element h to obtain
the correct Blackman window FFT weights . In this manner you need only compute
winwts( ) once and later us them for varying length FFTs and windowing functions.
The cosine arguments are held in input vector w[ ] and can be computed from the win-
wts( ) function. Note that larger vector sizes of w[ ] can be used by changing the stride
for w[ ]. For example, if w[ ] were computed for a window of size 2,048, but a Black-
man Window of 1,024 was needed, use a stride of 2,048/1,024 = 2.
Note that the Blackman window has a passband ripple of 0.0017 dB, a maximum stop-
band attenuation of 74 dB, and a 57 dB main lobe relative to side lobe.
Wideband Computers, Inc. 5-71
18. ADSP-21K Optimized DSP Library User’s Manual
blkmanh ( a, i, c, k, w, h, n )
NAME Blackman-Harris Window Multiply
DESCRIPTION Multiplies the input vector a[ ] by a Blackman-Harris window and stores the result to
output vector c[ ].
2πmi 4πmi 6πmi
ALGORITHM C mk = A mi • 0.35875 – 0.48829 • cos ------------ + 0.14128 • cos ------------ – 0.01168 • cos ------------
- - -
N N N
m = { 0, 1, 2, …, n – 1 }
SYNOPSIS void blkmanh ( a, i, c, k, w, h, n )
float dm *a ; /* Pointer to input vector a */
int i ; /* Element stride for vector a */
float dm *c ; /* Pointer to output vector c */
int k ; /* Element stride for vector c */
float pm *w ; /* Pointer to cosine weights array */
int h ; /* Element stride for weights array */
int n ; /* Element count for vector c */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
5-72 Wideband Computers, Inc.
19. ADSP-21K Optimized DSP Library User’s Manual
blkmanh ( a, i, c, k, w, h, n )
EXECUTION TIME 54 + 6*(N-1) cycles
NOTES The file tblkmanh.c included in the distribution tape provides an example of this
function’s use.
For real time applications, the Blackman-Harris window can be computed once, and a
simple multiply used to window data, as shown in the variable W ml . The Blackman-
Harris Window is computed using the winwts( ) function found in the DSP Pac library.
The winwts( ) function computes the weights array using the sin and cosine functions.
This array is pointed to by variable w listed in the synopisis section above.
The blkmanh function is a vector function. You may therefore use the stride argu-
ments i, k and h to decimate both the input and output for data congruence. For exam-
ple, suppose you use winwts( ) to compute the FFT weights for a 16K point FFT. This
would result in an fftwts array whose length would be 16,384 points. If you were to
later decide to compute an FFT of length 1,024 and run a Blackman-Harris Window on
the results, you would not need to rerun the winwts( ) function to generate new
weights. Simply use the old weights and stride by 16 (16,384/1024 = 16) on stride ele-
ment h to obtain the correct window FFT weights . In this manner you need only com-
pute winwts( ) once and later us them for varying length FFTs and windowing
functions.
The cosine arguments are held in input vector w[ ] and can be computed from the win-
wts( ) function. Note that larger vector sizes of w[ ] can be used by changing the stride
for w[ ]. For example, if w[ ] were computed for a window of size 2,048, but a Black-
man Window of 1,024 was needed, use a stride of 2,048/1,024 = 2.
Note that the Blackman-Harris window has a passband ripple of 0.0017 dB, a maxi-
mum stopband attenuation of 74 dB, and a 57 dB main lobe relative to side lobe.
Wideband Computers, Inc. 5-73
20. ADSP-21K Optimized DSP Library User’s Manual
cacort ( a, c, m, n )
NAME Complex Auto-Correlation (Time Domain)
DESCRIPTION Computes the time domain auto-correlation of the complex elements stored in input
vector a[ ]. Values m and n define the number of auto-correlation values to compute.
The resulting auto-correlation values are stored in output complex vector c[ ].
n–i–1
ALGORITHM Ci= ∑ Ai + j • Aj i = { 0, 1, 2, …m – 1 }
j=0
SYNOPSIS void cacort ( a, c, m, n )
complex dm *a ; /* Pointer to input vector a[ ] */
complex dm *c ; /* Pointer to output vector c[ ] */
int m ; /* Lag count m */
int n ; /* Number of elements in vector a[ ] */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME 39 + ( 9 + 5 * n ) * n
NOTES The file tacort.c included in the distribution tape provides an example of this func-
tion’s use.
Note that the lag count m must be less than or equal to the number of floating-point
elements (i.e. m ≤ n ).
The strides of vectors a[ ] and c [ ] must be 1.
5-74 Wideband Computers, Inc.
21. ADSP-21K Optimized DSP Library User’s Manual
ccdotpr ( a, i, b, j, c, k, n )
NAME Complex Dot Product Multiply by Conjugate
DESCRIPTION This function computes the complex dot product of complex input vector a by the
complex conjugate of input vector b and stores the results in complex output vector c.
This can be alternatively expressed as C=AB*.
ALGORITHM n–1
Re { C } = ∑ Re{ Ami } • Re { Bmj } + Im { Ami } • Im{ Bmj }
m= 0
n–1
Im { C } = ∑ –Re{ Ami } • Im { Bmj } + Im { Ami } • Re { Bmj }
m= 0
m = { 0, 1, 2…n – 1 }
SYNOPSIS void ccdotpr ( a, i, b, j, c, k, n )
complex *a ; /* Pointer to complex input vector a */
int i ; /* Address stride in words for input vector a */
complex *b ; /* Pointer to complex input vector b */
int j ; /* Address stride in words for input vector b */
complex *c ; /* Pointer to complex output vector c */
int k ; /* Address stride in words for output vector c */
int n ; /* Element count */
DOMAIN -3.4 x 1038 to +3.4 x 1038
ACCURACY 7.75 decimal digits
EXECUTION TIME 64 + 4*(N-1) cycles
NOTES The file tccdotpr.c included in the distribution diskette provides an example of this
function’s use.
Wideband Computers, Inc. 5-75
22. ADSP-21K Optimized DSP Library User’s Manual
ccmmul ( a, b, x, y, b, z, c )
NAME Complex Matrix Multiply By Congugate of Complex Matrix
DESCRIPTION This function computes the multiplication of the conjugate of complex input matrix
a [ ] [ ] times the elements of complex input matrix b[ ] [ ]. The dimensions of com-
plex input matrix a[ ] [ ] are x and y, while the dimensions of complex input matrix
b[ ] [ ] are defined by input scalars y and z. The results are stored in complex output
matrix c[ ] [ ], which is of dimensions x and z.
ALGORITHM y
Re ( C ij ) = ∑ [ ( Re )Aik • ( Re )Bkj + ( Im )Aik • ( Im )Bkj ]
k=1
y
Im(C
ij ) = ∑ [ ( Re )C ik • ( Im )B kj – ( Re )B kj • ( Im )A ik ]
k=1
for i = { 0, 1, …x }
for j = { 0, 1, …z }
SYNOPSIS void ccmmul( a, x, y, b, z, c )
complex dm *a ; /* Pointer to complex input matrix a[ ][ ] */
int x ; /* Number of rows in complex matrix a[ ][ ] */
int y ; /* Number of columns in matrix a[ ][ ] And */
/* Number of rows in complex matrix b[ ][ ] */
complex dm *b ; /* Pointer to complex input matrix b[ ][ ] */
int z ; /* Number of columns in matrix b[ ][ ] */
complex dm *c ; /* Pointer to complex output matrix c[ ][ ] */
DOMAIN -3.4 x 1038 to +3.4 x 1038
ACCURACY 7.75 decimal digits
5-76 Wideband Computers, Inc.
23. ADSP-21K Optimized DSP Library User’s Manual
ccmmul ( a, b, x, y, b, z, c )
EXECUTION TIME 62 + ( 6 + ( 12 + 7 * Y ) * Z ) * X cycles
NOTES The file tccmmul.c included in the distribution diskette provides an example of this
function’s use.
a[x][y] = 1, 1 2, 2 3, 3 4, 4
5, 5 6, 6 7, 7 8, 8
9, 9 10, 10 11, 11 12, 12
1, 2 3, 4 5, 6
b[y][z] = 7, 8 9, 10 11, 12
13, 14 15, 16 17, 18
19, 20 21, 22 23, 24
x = 3, y = 4, z = 3 ;
ccmmul ( a, x, y, b, z, c ) ;
The resulting values in output matrix c [ ] [ ] would be as follows:
c[x][y] = 270, 10 310, 10 350, 10
606, 26 610, 26 814, 26
942, 42 1110, 42 1278, 42
The storage methodology for matrices is by rows. Matrices can be thought of as one
long array (vector) where the beginning of each row is offset by the number of col-
umns.
Wideband Computers, Inc. 5-77
24. ADSP-21K Optimized DSP Library User’s Manual
ccmsmul ( a, x, y, b, c )
NAME Complex Scalar-Complex Congugate Matrix Multiplication
DESCRIPTION This function computes the multiplication of the conjugate of the complex input
matrix a[ ] [ ] times complex input scalar b. The dimensions of complex input matrix
a[ ] [ ] are x and y. The results are stored in complex output matrix c[ ] [ ], which is of
dimensions x and y.
ALGORITHM
Cxy = B • Axy
SYNOPSIS void ccmsmul( a, x, y, b, c )
complex dm *a ; /* Pointer to complex input matrix a[ ][ ] */
int x ; /* Number of rows in complex matrix a[ ][ ] */
int y ; /* Number of columns in matrix a[ ][ ] */
complex dm *b ; /* Pointer to complex input scalar b */
complex dm *c ; /* Pointer to complex output matrix c[ ][ ] */
DOMAIN -3.4 x 1038 to +3.4 x 1038
ACCURACY 7.75 decimal digits
5-78 Wideband Computers, Inc.
25. ADSP-21K Optimized DSP Library User’s Manual
ccmsmul ( a, x, y, b, c )
EXECUTION TIME 46 + 2 * X * Y cycles
NOTES The file tccmsmul.c included in the distribution diskette provides an example of this
function’s use.
a[x][y] = 1, 2 3, 4 5, 6
7, 8 9, 10 11, 12
13, 14 15, 16 17, 18
19, 20 21, 22 23, 24
25, 26 27, 28 29, 30
b = {8,2}
x = 8, y = 7 ;
ccmsmul ( a, x, y, b, c ) ;
The resulting values in output matrix c [ ] [ ] would be as follows:
c[x][y] = 12, – 14 32, – 26 52, – 38
72, – 50 92, – 62 112, – 74
132, – 86 152, – 98 172, – 110
192, – 122 212, – 134 232, – 146
252, – 158 272, – 170 292, – 182
The storage methodology for matrices is by rows. Matrices can be thought of as one
long array (vector) where the beginning of each row is offset by the number of col-
umns.
Wideband Computers, Inc. 5-79
26. ADSP-21K Optimized DSP Library User’s Manual
cccort ( a, b, c, m, n )
NAME Complex Cross-Correlation (Time Domain)
DESCRIPTION Computes the time domain (real) cross-correlation of the time domain (real) elements
stored in complex input vectors a[ ] and b[ ]. The result is stored in complex output
vector c [ ]. Values m and n define the number of cross-correlation values to compute.
The implementation uses a time domain technique.
n–i–1
ALGORITHM
Ci = ∑ Ai + j • Bj i = { 0, 1, 2, …, m – 1 }
j=0
SYNOPSIS void cccort ( a, b, c, m, n )
complex dm *a ; /* Pointer to input vector a[ ] */
complex dm *b ; /* Pointer to input vector b[ ] */
complex dm *c ; /* Pointer to output vector c[ ] */
int m ; /* Lag count m */
int n ; /* Number of elements in vector c[ ] */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME 41 + ( 9 + 5 * n ) * m
NOTES The file tcccort.c included in the distribution tape provides an example of this func-
tion’s use.
Note that the lag count must be less than or equal to the number of floating-point ele-
ments (i.e. m ≤ n ).
The strides of vectors a[ ], b[ ], and c[ ] must always be 1.
5-80 Wideband Computers, Inc.
27. ADSP-21K Optimized DSP Library User’s Manual
ccort ( a, b, c, m, n )
NAME Cross-Correlation (Time Domain)
DESCRIPTION Computes the time domain (real) cross-correlation of the time domain (real) elements
stored in input vectors a[ ] and b[ ]. The result is stored in output real vector c [ ]. Val-
ues m and n define the number of cross-correlation values to compute. The implemen-
tation uses a time domain technique.
n–i–1
ALGORITHM Cm = ∑ Ai + j • Bj i = { 0, 1, 2, …, m – 1 }
j=0
SYNOPSIS void ccort ( a, b, c, m, n )
float *a ; /* Pointer to input vector a[ ] */
float *b ; /* Pointer to input vector b[ ] */
float *c ; /* Pointer to output vector c[ ] */
int m ; /* Lag count m */
int n ; /* Number of elements in vector c[ ] */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME 32 + 9 * M + (M+1)(2*N-M)
NOTES The file tccort.c included in the distribution tape provides an example of this func-
tion’s use.
Note that the lag count must be less than or equal to the number of floating-point ele-
ments (i.e. m ≤ n ).
The strides of vectors a, b, and c must always be 1.
Wideband Computers, Inc. 5-81
28. ADSP-21K Optimized DSP Library User’s Manual
cdesamp ( data, coeff, output, d, n, p )
NAME Complex Decimating Finite Impulse Response (FIR) Filter
DESCRIPTION The function computes the convolution of complex vectors data [ ] and coeff [ ] plac-
ing the results in complex vector output [ ]. The number of output samples n and the
number of coefficients p may be dissimilar. n elements will be written to output [ ].
Complex vector data [ ] represents the real and imaginary (I and Q) components of the
input data respectively. Likewise, complex vector coeff [ ] represents the real and
imaginary ( I and Q) components of the coefficient data. A complex multiply and add
is performed to compute the convolutional output. The decimation factor d is used to
stride the next starting point in data [ ].
p–1
ALGORITHM
Output [ i ] = ∑ data [ i • d + j ] • coeff [ p – j – 1 ]
j=0
i = { 0, 1, 2…n – 1 }
SYNOPSIS void cdesamp ( data, coeff, output, d, n, p )
complex dm *data ; /* Complex input data ( len n+p-1 ) */
complex pm *coeff ; /* Complex coefficients ( len p ) */
complex dm *output ; /* Complex output data ( len n ) */
int d ; /* Decimation factor */
int n ; /* Number of output samples */
int p ; /* Number of coefficients */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
5-82 Wideband Computers, Inc.
29. ADSP-21K Optimized DSP Library User’s Manual
cdesamp ( data, coeff, output, d, n, p )
EXECUTION TIME 36 + ( 7 + 5 * p ) * n cycles
NOTES The file tcdesamp.c included in the distribution tape provides an example of this func-
tion’s use.
The number of filter output samples to generate can be obtained as follows:
n = ( ndata – p ) ⁄ d + 1
where ndata is the number of elements in data[ ].
A complex correlation can be performed by reversing the order of the coefficients vec-
tor.
Wideband Computers, Inc. 5-83
30. ADSP-21K Optimized DSP Library User’s Manual
cdotpr ( a, i, b, j, c, k, n )
NAME Complex Dot Product
DESCRIPTION This function computes the complex dot product of complex input vector a and com-
plex input vector b and stores the results in complex output vector c. This can altena-
tively thought of as C = A • B .
ALGORITHM n–1
Re { C } = ∑ Re { Ami } • Re { Bmj } – Im{ Ami } • Im{ Bmj }
m= 0
n–1
Im { C } = ∑ Re{ Ami } • Im { Bmj } + Im { Ami } • Re { Bmj }
m= 0
m = { 0, 1, 2…n – 1 }
SYNOPSIS void cdotpr ( a, i, b, j, c, k, n )
complex *a ; /* Pointer to complex input vector a */
int i ; /* Address stride in words for input vector a */
complex *b ; /* Pointer to complex input vector b */
int j ; /* Address stride in words for input vector b */
complex *c ; /* Pointer to complex output vector c */
int k ; /* Address stride in words for output vector c */
int n ; /* Element count */
DOMAIN -3.4 x 1038 to +3.4 x 1038
ACCURACY 7.75 decimal digits
EXECUTION TIME 64 + 4*(N-1) cycles
NOTES The file tcdotpr.c included in the distribution diskette provides an example of this
function’s use.
5-84 Wideband Computers, Inc.
31. ADSP-21K Optimized DSP Library User’s Manual
ceil_wci ( x )
NAME Round Up to Nearest Integer
DESCRIPTION This function computes the smallest integral value greater than or equal to the float-
ing-point number x. A floating-point representation of this integer value is returned.
ALGORITHM return = smallest int ≥ x
SYNOPSIS float ceil_wci ( float x )
DOMAIN -3.4E+38 to 3.40E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME 20 cycles
NOTES The file tceil.c included in the distribution tape provides an example of this function's
use.
Wideband Computers, Inc. 5-85
32. ADSP-21K Optimized DSP Library User’s Manual
cfft ( xr, xi, wr, wi, wstr, yr, yi, n )
NAME Fast Fourier Transform Of Complex Input Data
DESCRIPTION Computes the Fast Fourier Transform of the complex input elements stored in com-
plex input vector a. The results are stored in complex output vector c.
n–1
– i2πmk ⁄ n
ALGORITHM Cm = ∑ Ake m = { 0, 1, 2, …, n – 1 }
k=0
SYNOPSIS void cfft ( xr, xi, wr, wi, wstr, yr, yi, n )
float dm *xr ; /* Pointer to real input data */
float dm *xi ; /* Pointer to imaginary input data */
float pm *wr ; /* Pointer to cosine table */
float dm *wi ; /* Pointer to sine table */
int wstr ; /* Cosine/sine table stride */
float dm *yr ; /* Pointer to real output data */
float pm *yi ; /* Pointer to imaginary output data */
int n ; /* FFT Size (In Complex Elements) */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME See Attached Table Below
5-86 Wideband Computers, Inc.
33. ADSP-21K Optimized DSP Library User’s Manual
cfft ( xr, xi, wr, wi, wstr, yr, yi, n )
NOTES This is a radix-2 Fast Fourier Transform using parallel data memory/program memory
data accesses to maximize the throughput on the 21020/60/62 processor. The complex
input data is separated into real and imaginary parts, xr and xi. These vectors must be
aligned on an address which is an integer multiple of the FFT size, as required for
21K bit-reverse addressing. The input vectors are both in data memory; the imagi-
nary data is bit-reversed into program memory at the beginning of the routine. The
number of elements n supplied to the algorithm must be an integral power of two and
a minimum of 32.
The complex output is separated into real and imaginary parts, yr and yi. These vec-
tors may have arbitrary address alignment; however yr is in the data memory and yi is
in the program memory. Vectors xr and xi must be in data memory and each must be
aligned to an integral multiple of n.
Vectors wr and wi are in program memory and data memory respectively and are
given the values:
wr [ k ] = cos [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 )program memory
wi [ k ] = sin [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 )data memory
The weight stride wst allows cfft() to be called with varying sizes n from a single set
of weights.These weights are generated using the fftwts() function.
This precomputed FFT weight approach was implemented in order to ensure accurate
results and boost the available cfft() dynamic range to approximately 130 dB for
longer length (>16K) FFTs. This is accomplished by using an implementation that
does not rely on a recursive call to a sin/cosine approximation routine, as found in
other implementations. Rather, the FFT weights are precomputed accurately using the
fftwts() function. This is sufficient for A/D converters with bit lengths up to 22 bits.
The number of elements n must be an integral power of two and a minimum of 32.
Vector yr is in data memory and has a minimum size of n. Vector yi is in program
memory and has a minimum size of n.
The file tcfft.c included in the distribution tape provides an example of this function’s
use.
Wideband Computers, Inc. 5-87
34. ADSP-21K Optimized DSP Library User’s Manual
cfft ( xr, xi, wr, wi, wstr, yr, yi, n )
SPECIAL NOTES Previous users have sometimes reported problems associated with implementing inter-
rupt service routines (ISRs), when used in conjunction with the FFT routines ( cfft( ),
cffti( ), rfft( ), rffti( ) ). Observations related to the Wideband technical staff typically
include a description of the Wideband routine executing perfectly, but unable to return
to an exact state after being interrupted by the ISR ( what is described as a “tumble
into the weeds.” )
The Wideband Fast Fourier transforms, both complex and real, forward and inverse,
use the built-in bit reversing and circular addressing capabilites of the SHARC archi-
tecture. Also, other routines such as some of the FIR filters use the SHARC’s internal
circular addressing capabilities.
End users are usually cognizant that their ISR calling routine is responsible for saving
and restoring the registers of the Wideband routines. However, end users sometimes
forget to save and restore ( push and pop ) the mode 1 regiser, which is associated with
bir reversing and the B ( base ) and L ( length ) registers associated with circular
addressing. In such circumstances where they are not saved and restored by the ISR
they are unable to return the proper length parameter ( L Register ) used for circular
addressing or the proper mode ( Mode 1 Register ) used in Bit Reversing. This results
in the strange manefestations users sometimes report.
To properly save and restore the above mentioned registers in an ISR, refer to page 4-
21, section 4.3 of the Analog Devices ADSP-21000 Family C Tools Manual (#31-
000005-08, dated August 95) which references examples of in line assembly code
within C code to save and restore registers.
For a detailed review of the relationships between the various FFT functions and how
to use them with one another, see the final section of Chapter 4.
5-88 Wideband Computers, Inc.
35. ADSP-21K Optimized DSP Library User’s Manual
Performance Issues
The inital timing shown below for the 32 point to 4,096 point FFTs were timed using the
Analog Devices simulator.
Performance Timings For Complex FFTs
Number of
Points Processor Cycles
8 See cfft8( ) function
16 See cfft16( ) function
32 771 Cycles
64 1,274 Cycles
128 2,368 Cycles
256 4,724 Cycles
512 10,060 Cycles
1,024 21,618 Cycles
2,048 46,744 Cycles
4,096 101,054 Cycles
8,192 217,828 Cycles
16,384 467,722 Cycles
32,768 1,000,240 Cycles
65,536 2,130,774 Cycles
Wideband Computers, Inc. 5-89
36. ADSP-21K Optimized DSP Library User’s Manual
cfft2d ( xr, xi, wr, wi, wstr, tmpdm, tmppm, n )
NAME Complex 2-Dimensional Fast Fourier Transform
DESCRIPTION Computes a 2-Dimensional Fast Fourier Transform of the complex input elements
stored in vector a[ ]. The results are stored in complex output vector c[ ].
n–1n–1
– 2 πj ( ( r ⋅ R + c ⋅ C ) ⁄ n )
ALGORITHM
C r, c = ∑ ∑ Ake
r = 0c = 0
R = { 0, 1, …n – 1 }
C = { 0, 1, …n – 1 }
SYNOPSIS void cfft2d ( xr, xi, wr, wi, wstr, tmpdm, tmppm, n )
float dm *xr ; /* Pointer to real input/output data */
float dm *xi ; /* Pointer to imaginary input/output data */
float pm *wr ; /* Pointer to cosine table */
float dm *wi ; /* Pointer to sine table */
int wstr ; /* Consine/sine Table table */
float dm *tmpdm ; /* Pointer to real output data */
float pm *tmppm ; /* Pointer to imag output data */
int n ; /* CFFT2D Size (Complex Elements n x n) */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
5-90 Wideband Computers, Inc.
37. ADSP-21K Optimized DSP Library User’s Manual
cfft2d ( xr, xi, wr, wi, wstr, tmpdm, tmppm, n )
EXECUTION TIME 32 x 32 Pts. 44,532 cycles
64 x 64 Pts. 165,364 cycles
128 x 128 Pts. 659,572 cycles
Wideband Computers, Inc. 5-91
38. ADSP-21K Optimized DSP Library User’s Manual
cfft2d ( xr, xi, wr, wi, wstr, tmpdm, tmppm, n )
NOTES The input data is an nxn complex matric x separated into real and imaginary parts xr
and xi stored as follows:
Re ( x r, c ) = xr [ r • n + c ]
r = { 0, 1, …, n – 1 } c = { 0, 1, …, n – 1 }
Im ( x r, c ) = xi [ r • n + c ]
r = { 0, 1, …, n – 1 } c = { 0, 1, …, n – 1 }
Variables r and c are the row and column numbers.
The DFT output replaces the input, and is stored as follows:
Re ( F R, C ) = xr [ R • n + C ]
R = { 0, 1, …, n – 1 } C = { 0, 1, …, n – 1 }
Im ( F R, C ) = xi [ R • n + C ]
R = { 0, 1, …, n – 1 } C = { 0, 1, …, n – 1 }
A radix-2 Fast Fourier Transform (FFT) algorithm is used to compute the individual
row and column DFTs.
The number of elements n must be an integral power of two and a minimum of 32.
Vectors xr and xi must be in data memory and are adress-aligned to an integral multi-
ple of n.
Vectors wr and wi must be in program memory and data memory respectively and are
pre-computed to be:
wr [ k ] = cos [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 )
wi [ k ] = sin [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 )
Vector tmpdm must be in data memory, having a minimum size of n, and be address-
aligned to an integral multiple of n.
Vector tmppm must be in program memory and have a minimum size of n,and be
address-aligned to an integral multiple of n.
The file tcfft2d.c included in the distribution tape provides an example of this func-
tion’s use.
5-92 Wideband Computers, Inc.
39. ADSP-21K Optimized DSP Library User’s Manual
cfft8 ( xr, xi, yr, yi )
NAME 8-Point Complex Fast Fourier Transform (Inline)
DESCRIPTION Computes the Fast Fourier Transform of the complex input elements stored in input
vector xr and xi. The results are stored in output vector yr and yi.
ALGORITHM 7
– 2πj ( m • k ⁄ 8 )
Ym = ∑ Xke m = { 0, 1, 2, …, 7 }
k=0
SYNOPSIS void cfft8 ( xr, xi, yr, yi )
float dm *xr ; /* Pointer to real input data */
float dm *xi ; /* Pointer to imaginary input data */
float dm *yr ; /* Pointer to real output data */
float pm *yi ; /* Pointer to imaginary output data */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
Wideband Computers, Inc. 5-93
40. ADSP-21K Optimized DSP Library User’s Manual
cfft8 ( xr, xi, yr, yi )
EXECUTION TIME 184 Cycles
NOTES This is an 8-point radix-2 Fast Fourier Transform using parallel data memory/program
memory data accesses to maximize the throughput on the 21020/60/62 processor.
The complex input data is separated into real and imaginary parts, xr and xi. These
vectors must be aligned on an address which is an integer multiple of the FFT
size, as required for 21K bit-reverse addressing. The input vectors are both in data
memory; the imaginary data is bit-reversed into program memory at the beginning of
the routine.
This algorithm utilizies a decimation in time approach. As the cffti( ) function requires
a minimum of 32-points as input, there is no corresponding inverse algorithm for this
routine. The complex output is separated into real and imaginary parts, yr and yi.
These vectors may have arbitrary address alignment; however yr is in the data mem-
ory and yi is in the program memory.
•Vectors xr and xi are defined in cfft8dta.asm using the dm_align segment to
ensure address alignment.
For a detailed review of the relationships between the various FFT functions and how
to use them with one another, see the final section of Chapter 4.
The file tcfft8.c included in the distribution tape provides an example of this func-
tion’s use.
5-94 Wideband Computers, Inc.
41. ADSP-21K Optimized DSP Library User’s Manual
cfft16 ( xr, xi, yr, yi )
NAME 16-Point Complex Fast Fourier Transform (Inline)
DESCRIPTION Computes the Fast Fourier Transform of the complex input elements stored in input
vector xr and xi. The results are stored in output vector yr and yi.
15
∑ Xke
ALGORITHM – 2πj16
Ym = m = { 0, 1, 2, …, 15 }
k=0
SYNOPSIS void cfft16 ( xr, xi, yr, yi )
float dm *xr ; /* Pointer to real input data */
float dm *xi ; /* Pointer to imaginary input data */
float dm *yr ; /* Pointer to real output data */
float pm *yi ; /* Pointer to imaginary output data */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
Wideband Computers, Inc. 5-95
42. ADSP-21K Optimized DSP Library User’s Manual
cfft16 ( xr, xi, yr, yi )
EXECUTION TIME 388 Cycles
NOTES This is an 16-point radix-2 Fast Fourier Transform using parallel data memory/pro-
gram memory data accesses to maximize the throughput on the 21020/60/62 proces-
sor.
The complex input data is separated into real and imaginary parts, xr and xi. These
vectors must be aligned on an address which is an integer multiple of the FFT
size, as required for 21K bit-reverse addressing. The input vectors are both in data
memory; the imaginary data is bit-reversed into program memory at the beginning of
the routine.
This algorithm utilizies a decimation in time approach. As the cffti( ) function requires
a minimum of 32-points as input, there is no corresponding inverse algorithm for this
routine. The complex output is separated into real and imaginary parts, yr and yi.
These vectors may have arbitrary address alignment; however yr is in the data mem-
ory and yi is in the program memory.
•Vectors xr and xi are defined in cfft16dt.asm using the dm_align segment to
ensure address alignment.
For a detailed review of the relationships between the various FFT functions and how
to use them with one another, see the final section of Chapter 4.
The file tcfft16.c included in the distribution tape provides an example of this func-
tion’s use.
5-96 Wideband Computers, Inc.
43. ADSP-21K Optimized DSP Library User’s Manual
cffti ( xr, xi, wr, wi, wstr, yr, yi, n )
NAME Inverse Complex FFT
DESCRIPTION Computes the Inverse Fast Fourier Transform of the input elements stored in vectors
xr and xi. The results are stored in complex output vector c. Note the Inverse FFT is
the same as the Forward FFT except that the sign of the imaginary components of the
twiddle factors is negated. The Inverse FFT swaps the real and imaginary input data,
perform the Forward FFT with the same weights table, and swaps the real and imagi-
nary ouptut data. Scaling by 1/N is then performed.
n–1
i2πmk ⁄ n
∑ Ak e
ALGORITHM 1
C m = --
- m = { 0, 1, 2, …, n – 1 }
n
k=0
SYNOPSIS void cffti ( xr, xi, wr, wi, wstr, yr, yi, n )
float dm *xr ; /* Pointer to real input data */
float dm *xi ; /* Pointer to imaginary input data */
float pm *wr ; /* Pointer to cosine table */
float dm *wi ; /* Pointer to sine table */
int wstr ; /* Cosine/sine table stride */
float dm *yr ; /* Pointer to real output data */
float pm *yi ; /* Pointer to imaginary output data */
int n ; /* FFT Size (In Complex Elements) */
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME 22,650 Cycles @ 1,024 Points - Data and Program In On-Board Cache
Wideband Computers, Inc. 5-97
44. ADSP-21K Optimized DSP Library User’s Manual
cffti ( xr, xi, wr, wi, wstr, yr, yi, n )
NOTES This is a radix-2 inverse Fast Fourier Transform using parallel DM/PM data accesses
to maximize the throughput on the 21020 processor.The complex input data is sepa-
rated into real and imaginary parts, xr and xi. These vectors must be aligned on an
address which is an integer multiple of the FFT size, as required for 21K bit-
reverse addressing. The input vectors are both in DM; the imaginary data is bit-
reversed into PM at the beginning of the routine.The number of elements n must be an
integral power of two and a minimum of 32.
The complex output is separated into real and imaginary parts, yr and yi. These vec-
tors may have arbitrary address alignment; however yr is in the DM and yi is in the
PM. Vectors xr and xi mus be in data memory and each must be aligned to an integral
multiple of n.
Vectors wr and wi are in program memory and data memory respectively and are
given the values:
wr [ k ] = cos [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 )
wi [ k ] = sin [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 )
The weight stride, wst, allows for calling cfft() with varying sizes n from a single set
of weights. These weights are generated using the fftwts( ) function.
Vector yr is in data memory and has a minimum size of n.Vector yi is in program
memory and has a minimum size of n.
The file tcfft.c included in the distribution tape provides an example of this function’s
use.
5-98 Wideband Computers, Inc.
45. ADSP-21K Optimized DSP Library User’s Manual
cffti ( xr, xi, wr, wi, wstr, yr, yi, n )
SPECIAL NOTES Previous users have sometimes reported problems associated with implementing inter-
rupt service routines (ISRs), when used in conjunction with the FFT routines (cfft ( ),
cffti ( ), rfft ( ), rffti ( ) ). Observations related to the Wideband technical staff typi-
cally include a description of the Wideband routine executing perfectly, but unable to
return to an exact state after being interrupted by the ISR ( a “tumble into the weeds.” )
The Wideband Fast Fourier transforms, both complex and real, forward and inverse,
use the built-in bit reversing and circular addressing capabilites of the SHARC archi-
tecture. Also, other routines such as some of the FIR filters use the SHARC’s internal
circular addressing capabilities.
End users are usually cognizant that their ISR calling routine is responsible for saving
and restoring the registers of the Wideband routines. However, end users sometimes
forget to save and restore ( push and pop ) the mode 1 regiser, which is associated with
bir reversing and the B ( base ) and L ( length ) registers associated with circular
addressing. In such circumstances where they are not saved and restored by the ISR
they are unable to return the proper length parameter ( L Register ) used for circular
addressing or the proper mode ( Mode 1 Register ) used in Bit Reversing. This results
in the strange manefestations users sometimes report.
To properly save and restore the above mentioned registers in an ISR, refer to page 4-
21, section 4.3 of the Analog Devices ADSP-21000 Family C Tools Manual (#31-
000005-08, dated August 95) which references examples of in line assembly code
within C code to save and restore registers.
For a detailed review of the relationships between the various FFT functions and how
to use them with one another, see the final section of Chapter 4.
Wideband Computers, Inc. 5-99
47. ADSP-21K Optimized DSP Library User’s Manual
cfir ( ii, qq, ci, cq, oi, oq, d, n, p )
NAME Complex Finite Impulse Response Filter
DESCRIPTION The function cfir( ) computes the convolution of vectors ii[ ], iq[ ], ci[ ], and cq[ ]
placing the results in oi[ ] and oq[ ] respectively. The number of output samples n and
the number of coefficients p may be dissimilar. n elements will be writtento oi[ ] and
oq[].
The vectors ii[ ] and iq[ ] represent the real and imaginary (I and Q) components of the
input data respectively. Likewise,the vectors ci[ ] and cq[ ] represent the real and
imaginary (I and Q) components of the coefficient data. A complex multiply and add
is performed to compute the convolutional output. The decimation factor d is used to
stride the next starting ii[ ] and iq[ ] data.
p= 1
ALGORITHM
C[ i ]= ∑ a[a • d + j] • b[p – j – 1]
j=0
m = { 0, 1, 2, …, n – 1 }
where
a [ ] compromises complex components ii [ ] and iq [ ]
b [ ] compromises complex components ci [ ] and cq [ ]
c [ ] compromises complex components oi [ ] and oq [ ]
SYNOPSIS void cfir ( ii, qq, ci, cq, oi, oq, d, n, p )
*/ float dm *ii ; Input samples for I data ( len n+p-1 ) */
*/ float dm *iq ; Input samples for Q data ( len n+p-1 ) */
*/ float pm *ci ; Coefficients for I data ( len p ) */
*/ float pm *cq ; Coefficients for Q data ( len p ) */
*/ float dm *oi ; Output samples for I data ( len n ) */
*/ float dm *oq ; Output samples for Q data ( len n ) */
*/ int d ; Decimation factor */
*/ int n ; Number of output samples */
*/ int p ; Number of coefficients */
Wideband Computers, Inc. 5-101
48. ADSP-21K Optimized DSP Library User’s Manual
cfir ( ii, qq, ci, cq, oi, oq, d, n, p )
DOMAIN -3.4E+38 to 3.4E+38
ACCURACY 7.75 decimal digits
EXECUTION TIME 59 + ( 9 + 5 * p ) * n cycles
NOTES The file tfir.c included in the distribution tape provides an example of this function’s
use.
The number of filter output samples to generate can be obtainted as follows:
( ndata – p )
n = ----------------------------
d+1
where ndata is the number of elements in ii[ ] and iq[ ].
A correlation can be performed by reversing the order of the coefficients vector.
5-102 Wideband Computers, Inc.
49. ADSP-21K Optimized DSP Library User’s Manual
chksum ( a, i, type, n )
NAME Perform Checksum
DESCRIPTION This function performs a checksum on a memory block. The memory block is defined
by the start address a offset by n. The type flag determines whether dm or pm memory
is tested ( 1 = dm, 0 = pm).
ALGORITHM Return ⇐ Checksum
SYNOPSIS void chksum ( a, i, type, n )
int a ; /* Start address of memory */
int i ; /* Memory Stride */
int type ; /* Type of memory to test ( dm or pm ) */
int n ; /* Length of block to be checked */
DOMAIN -3.4 x 1038 to +3.4 x 1038
ACCURACY 7.75 decimal digits
EXECUTION TIME 17 + 2 * N cycles
NOTES The file tchksum.c included in the distribution diskette provides an example of this
function’s use.
chksum( ) performs a two’s complement on the sum of the elements within the mem-
ory block. The check sum value is returned.
Wideband Computers, Inc. 5-103
50. ADSP-21K Optimized DSP Library User’s Manual
cmadd ( a, b, x, y, c )
NAME Complex Matrix Addition
DESCRIPTION This function computes the addition of complex input matrix a with complex input
matrix b and stores the results to complex output matrix c.
ALGORITHM
C ri11 C ri12 C ri13 A ri11 A ri12 A ri13 B ri11 B ri12 B ri13
= +
C ri21 C ri22 C ri23 A ri21 A ri22 A ri23 B ri21 B ri22 B ri23
where ri indicates a real and imaginary component
SYNOPSIS void cmadd ( a, b, x, y, c )
complex dm *a ; /* Pointer to input matrix a [ ][ ] */
complex dm *b ; /* Pointer to input matrix b [ ][ ] */
int x ; /* Number of rows in matrix a[ ][ ] & b[ ][ ] */
int y ; /* Number of columns in matrix a[ ][ ]& b[ ][ ]*/
complex dm *c ; /* Pointer to output matrix c [ ][ ] */
DOMAIN -3.4 x 1038 to +3.4 x 1038
ACCURACY 7.75 decimal digits
5-104 Wideband Computers, Inc.
51. ADSP-21K Optimized DSP Library User’s Manual
cmadd ( a, b, x, y, c )
EXECUTION TIME 32 + 3*X*Y cycles
NOTES The file tcmadd.c included in the distribution diskette provides an example of this
function’s use.
The addition of a complex matrix is mathematically expressed as follows:
Real C [ x ] [ y ] = A [ x ] [ y ] Real + B [ x ] [ y ] Real
Imaginary C [ x ] [ y ] = A [ x ] [ y ] Imaginary + B [ x ] [ y ] Imaginary
An example of the additon of one complex matrix to another is as follows:
1, 2 3.8, 1.7 8.8, 5.5 9.9, 14
7.1, 5 9.3, 1.6 0.4, 1 51, 3.3
0.9, 1 8, 5 2.1, 6 – 3.1, – 1
A[ x][ y]=
9.3, 1 2.5, 1.5 6.9, 9 10, 22.1
1.3, 1.4 0.2, 4.5 0.9, 51.4 1.5, 4.4
9.2, 4 7.8, 1.7 61, 3.4 14.3, 1.4
3.2, 1 8.8, 2 9.9, 3 44.3, 13.3
8.1, 4 6.5, 5 3.2, 6 – 2.3, – 9.9
8.9, 7 2.8, 8 1.7, 9 – 8.1, – 2.2
B [x] [y] =
6.4, 10 11, 1.3 12, 4.5 22.9, – 5.4
6.5, 7 2.1, 8 2.2, 9 32, 9.8
1.1, 4 7.7, 5 4.4, 6 – 2.1, – 0.3
x=6, y=4
4.2, 3 12.6, 3.7 18.7, 8.5 54.2, 27.3
15.2, 9 15.8, 6.6 3.6, 7 48.7, – 6.6
9.8, 8 10.8, 13 3.8, 15 – 11.2, – 3.2
C [x] [y] =
15.7, 11 13.5, 2.8 18.9, 13.5 32.9, 16.7
7.8, 8.4 2.3, 12.5 3.1, 60.4 33.5, 14.2
10.3, 8 15.5, 6.7 65.4, 9.4 12.2, 1.1
Wideband Computers, Inc. 5-105
52. ADSP-21K Optimized DSP Library User’s Manual
cmmov ( a, x, y, b )
NAME Complex Matrix Move
DESCRIPTION This function moves a source complex input matrix a to a destination complex output
matrix b.
ALGORITHM
C ri11 C ri12 C ri13 A ri11 A ri12 A ri13
⇐
C ri21 C ri22 C ri23 A ri21 A ri22 A ri23
where ri indicates a real and imaginary component
SYNOPSIS void cmmov ( a, x, y, b )
complex dm *a ; /* Pointer to input matrix a [ ][ ] */
int x ; /* Number of rows in matrix a[ ][ ] */
int y ; /* Number of columns in matrix a[ ][ ] */
complex dm *b ; /* Pointer to output matrix b [ ][ ] */
DOMAIN -3.4 x 1038 to +3.4 x 1038
ACCURACY 7.75 decimal digits
EXECUTION TIME 13 + ( 2 * X * Y ) cycles
NOTES The file tcmmov.c included in the distribution diskette provides an example of this
function’s use.
The storage methodology for matrices is by rows. Matricies can be thought of as one
long array (vector) where the beginning of each row is offset by the length of the col-
umn.
5-106 Wideband Computers, Inc.
53. ADSP-21K Optimized DSP Library User’s Manual
cmmul ( a, x, y, b, z, c )
NAME Complex Matrix Multiplication
DESCRIPTION This function computes the multiplication of complex input matrix a times complex
input matrix b and stores the results to complex output matrix c. The dimension of
complex matrix a [ ] [ ] is x and y and the dimension of complex input matrix b [ ] [ ]
is y and z. The resulting complex output matrix c [ ] [ ] is of dimension x and z.
ALGORITHM
B ri11 B ri12
C ri11 C ri12 A ri11 A ri12 A ri13
= • B ri21 B ri22
C ri21 C ri22 A ri21 A ri22 A ri23
B ri31 B ri32
where ri indicates a real and imaginary component
SYNOPSIS void cmmul ( a, x, y, b, z, c )
complex dm *a ; /* Pointer to input matrix a [ ][ ] */
int x ; /* Number of rows in matrix a[ ][ ] */
int y ; /* Number of columns in matrix a[ ][ ] */
/* Number of rows in matrix b[ ][ ] */
complex dm *b ; /* Pointer to input matrix b [ ][ ] */
int z ; /* Number of columns in matrix b[ ][ ] */
complex dm *c ; /* Pointer to output matrix c [ ][ ] */
DOMAIN -3.4 x 1038 to +3.4 x 1038
ACCURACY 7.75 decimal digits
Wideband Computers, Inc. 5-107
54. ADSP-21K Optimized DSP Library User’s Manual
cmmul ( a, x, y, b, z, c )
EXECUTION TIME 45 + (4 + ( 10 + 5 * Y) * Z) * X cycles
NOTES The file tcmmul.c included in the distribution diskette provides an example of this
function’s use.
The multiplication of a complex matrix is as follows:
y
C[x ][y ]= ∑ ( Real Sum + Imaginary Sum ) where
k=1
Real Sum = A Real • BReal – AImaginary • BImagainary
Imaginary Sum = A Real • BImaginary + BReal • A Imaginary
The storage methodology for matrices is by rows. Matrices can be thought of as one
long array (vector) where the beginning of each row is offset by the length of the col-
umn.
The first row of a [ ] [ ] times the first column of b [ ] [ ] is the first element of c [ ] [ ]
(row 1, column 1). The first row of a [ ] [ ] times the second row of b [ ] [ ] is the sec-
ond element of c [ ] [ ] (row 1, column 2 ) ... etc.
This algorithm follows the general law of matrix multiplication whereby the number
of columns of input matrix a must equal the number of rows of input matrix b.
ri indicates that each component of the matrix is composed of a complex number
which has both a real and imaginary component.
An example of the multipication of one complex matrices by another is as follows:
1, 2 3, 4 5, 6
1, 1 2, 2 3, 3 4, 4
7, 8 9, 10 11, 12
A [ x ] [ y ] = 5, 5 6, 6 7, 7 8, 8 B [y] [z] =
13, 14 15, 16 17, 18
9, 9 10, 10 11, 11 12, 12
19, 20 21, 22 23, 24
x=3, y=4, z=3
– 10, 270 – 10, 310 – 10, 350
C [ ] = – 26, 606 – 26, 710 – 26, 814
– 42, 942 – 42, 1110 – 42, 1278
5-108 Wideband Computers, Inc.
55. ADSP-21K Optimized DSP Library User’s Manual
cmmul_dpd ( a, x, y, b, z, c )
NAME Complex Matrix Multiplication (Data Memory x Program Memory to Data Memory)
DESCRIPTION This function computes the multiplication of complex input matrix a[ ] (in data mem-
ory) times complex input matrix b[ ] (in program memory) and stores the results to
complex output matrix c[ ] (in data memory). The dimension of complex matrix a [ ] [
] is x and y and the dimension of complex input matrix b [ ] [ ] is y and z. The resulting
complex output matrix c [ ] [ ] is of dimension x and z.
ALGORITHM
B ri11 B ri12
C ri11 C ri12 A ri11 A ri12 A ri13
= • B ri21 B ri22
C ri21 C ri22 A ri21 A ri22 A ri23
B ri31 B ri32
where ri indicates a real and imaginary component
SYNOPSIS void cmmul_dpd ( a, x, y, b, z, c )
complex dm *a ; /* Pointer to input matrix a [ ][ ] */
int x ; /* Number of rows in matrix a[ ][ ] */
int y ; /* Number of columns in matrix a[ ][ ] */
/* Number of rows in matrix b[ ][ ] */
complex pm *b ; /* Pointer to input matrix b [ ][ ] */
int z ; /* Number of columns in matrix b[ ][ ] */
complex dm *c ; /* Pointer to output matrix c [ ][ ] */
DOMAIN -3.4 x 1038 to +3.4 x 1038
ACCURACY 7.75 decimal digits
Wideband Computers, Inc. 5-109