Chap5 - ADSP 21K Manual

ADSP-21K Optimized DSP Library User’s Manual

CHAPTER 5 Function Descriptions For The ADSP-21K Optimized DSP Library

Each function description in the following pages covers these topics, in order, to
help you understand the function's use:
• Name
• Description of the function's operation
• The algorithm as applicable
• Synopsis of function prototype
• Domain valid for arguments
• Accuracy of the returned value(s)
• Execution time in machine cycles
• Notes applicable to this function

Wideband Computers, Inc. 5-55


acort ( a, c, m, n )
NAME Auto-correlation (Time Domain)

DESCRIPTION Computes the time domain auto-correlation of the real elements stored in input vector
a[ ]. Values m and n define the number of auto-correlation values to compute. The
resulting auto-correlation values are stored in output vector c[ ].
ALGORITHM $C_i = \sum_{j=0}^{n-i-1} A_{i+j} \cdot A_j, \quad i = \{0, 1, 2, \ldots, m-1\}$
SYNOPSIS void acort ( a, c, m, n )
float *a ; /* Pointer to input vector a[ ] */
float *c ; /* Pointer to output vector c[ ] */
int m ; /* Lag count m */
int n ; /* Number of elements in vector a[ ] */

DOMAIN -3.4E+38 to 3.4E+38

ACCURACY 7.75 decimal digits

EXECUTION TIME 31 + 9*M + (M+1) (2*N-M)

NOTES The file tacort.c included in the distribution tape provides an example of this
function’s use.

Note that the lag count m must be less than or equal to the number of floating-point
elements (i.e. m ≤ n ).
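
The summation in the ALGORITHM entry can be sketched in portable C as shown below. This is a minimal reference sketch for checking results on a host machine, not the optimized library routine; the name acort_ref is hypothetical.

```c
/* Hypothetical reference routine (not part of the library):
   c[i] = sum over j = 0 .. n-i-1 of a[i+j] * a[j], for i = 0 .. m-1.
   Assumes m <= n, as required in the notes above. */
void acort_ref(const float *a, float *c, int m, int n)
{
    for (int i = 0; i < m; i++) {
        float sum = 0.0f;
        for (int j = 0; j < n - i; j++) {
            sum += a[i + j] * a[j];
        }
        c[i] = sum;
    }
}
```

For the input {1, 2, 3} with m = 2, lag 0 is the sum of squares and lag 1 is the sum of products of adjacent elements.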



acos_wci ( x )
NAME Arc Cosine

DESCRIPTION This function computes the arc cosine of a floating-point number, x. The computed
value returned from this function is in the range [0 to π ] radians. A domain error is
returned if x is not in the range [-1 to +1].

ALGORITHM return = $\cos^{-1}(x)$
SYNOPSIS float acos_wci ( float x )

DOMAIN -1.0 < x < +1.0


EXECUTION TIME If A <= 0.5 then 55 cycles, Else if A >0.5 then 75 cycles

NOTES The file tacos.c included in the distribution tape provides an example of this function's
use.

acosh_wci ( x )
NAME Inverse Hyperbolic Cosine

DESCRIPTION This function computes the inverse hyperbolic cosine of a floating-point number, x.

ALGORITHM return = $\cosh^{-1}(x)$
SYNOPSIS float acosh_wci ( float x )

DOMAIN 1.0 to 3.4E+38


EXECUTION TIME 72 cycles

NOTES The file tacosh.c included in the distribution tape provides an example of this
function's use.



alawc ( a, i, c, k, n )
NAME a-Law Compression

DESCRIPTION This routine performs an a-law compression on the elements in input vector a and
outputs the compressed results to output vector c.

ALGORITHM $C_{mk} = \text{a-law compression of } A_{mi}, \quad m = \{0, 1, 2, \ldots, n-1\}$
SYNOPSIS void alawc ( a, i, c, k, n )
int *a ; /* Pointer to input vector a */
int i ; /* Element stride for vector a */
int *c ; /* Pointer to output vector c */
int k ; /* Element stride for vector c */
int n ; /* Number of floating-point elements */

DOMAIN 0 to 255


EXECUTION TIME 49 + 12 * ( N-1 )

NOTES The file talawc.c included in the distribution tape provides an example of this
function’s use.

The alawc() routine takes a linear 13-bit signed speech sample and compresses it
according to CCITT (now ITU) recommendation G.711. The 8-bit compressed sample
is output to vector c.

This function is also found in the serial port hardware of the ADSP-2106x DSP
processors.



alawe ( a, i, c, k, n )
NAME a-Law Expansion

DESCRIPTION This routine performs an a-law expansion on the elements in input vector a and
outputs the expanded results to output vector c.

ALGORITHM $C_{mk} = \text{a-law expansion of } A_{mi}, \quad m = \{0, 1, 2, \ldots, n-1\}$
SYNOPSIS void alawe ( a, i, c, k, n )
int *a ; /* Pointer to input vector a */
int i ; /* Element stride for vector a */
int *c ; /* Pointer to output vector c */
int k ; /* Element stride for vector c */
int n ; /* Number of floating-point elements */

DOMAIN 0 to 255


EXECUTION TIME 46 + 17 * ( N-1 )

NOTES The file talawe.c included in the distribution tape provides an example of this
function’s use.

The alawe() routine takes an 8-bit compressed speech sample and expands it
according to CCITT (now ITU) recommendation G.711. The 13-bit signed sample is output
to vector c.

This function is also found in the serial port hardware of the ADSP-2106x DSP
processors.



alpha ( df, a, &al, &n )
NAME Kaiser-Bessel Window Shape Parameter

DESCRIPTION Computes a Kaiser-Bessel window shape parameter for later use by the kaiser( )
window multiply library function. The computation is based on the attenuation
specified in input scalar a and the transition width specified in real input scalar df.
From these, a count of floating-point elements (output scalar n) and a window
shape parameter (output scalar al) are computed.

ALGORITHM The window shape parameter al is computed as follows:

$al = 0$ if $A \le 21$
$al = 0.5842 \cdot (A - 21)^{0.4} + 0.07886 \cdot (A - 21)$ if $21 < A < 50$
$al = 0.1102 \cdot (A - 8.7)$ otherwise

Number of Elements n is computed as follows:

$d = \dfrac{A - 7.95}{14.36}$ if $A > 21$, else $d = 0.922$
$n = 1 + \mathrm{ceiling}(d / df)$
$n = n + 1 - \mathrm{remainder}(n / 2)$

SYNOPSIS void alpha ( df, a, &al, &n )
float dm *df ; /* Input transition width in fs units */
float dm a ; /* Input ripple attenuation in dB */
float dm &al ; /* Output alpha window shape parameter */
int &n ; /* Output floating-point element count */

DOMAIN -3.4E+38 to 3.4E+38




EXECUTION TIME If a >= 50 then 143 Cycles

If 21 < a < 50 then 221 Cycles

If A <= 21 then 124 Cycles

NOTES The file talpha.c included in the distribution tape provides an example of this
function’s use.

$df = \Delta f / f_s$, $A$ = ripple attenuation in dB, $\delta = 10^{-A/20}$

asin_wci ( x )
NAME Arc Sine

DESCRIPTION This function computes the arc sine of a floating-point number, x. The computed
value returned from this function is in the range [-π/2 to π/2] radians. A domain error
is returned if x is not in the range [-1 to +1].

ALGORITHM return = $\sin^{-1}(x)$



SYNOPSIS float asin_wci ( float x )

DOMAIN - 1.0 < x < +1.0


EXECUTION TIME If A <= 0.5 then 55 cycles, Else if A >0.5 then 73 cycles

NOTES The file tasin.c included in the distribution tape provides an example of this function's
use.

asinh_wci ( x )
NAME Inverse Hyperbolic Sine

DESCRIPTION This function computes the inverse hyperbolic sine of a floating-point number, x.

ALGORITHM return = $\sinh^{-1}(x)$
SYNOPSIS float asinh_wci ( float x )

DOMAIN -3.4E+38 to 3.4E+38



NOTES The file tasinh.c included in the distribution tape provides an example of this
function's use.



aspec ( a, c, n )
NAME Accumulating Auto-spectrum

DESCRIPTION Computes the auto-spectrum of complex input vector a by multiplying vector a by its
complex conjugate and adding the resulting real number to the current value of vector
c. Vector c must be initialized prior to invoking a series of accumulating
auto-spectrum calls.

ALGORITHM $C_m \Leftarrow C_m + \mathrm{Re}(A_m)^2 + \mathrm{Im}(A_m)^2, \quad m = \{0, 1, 2, \ldots, n-1\}$

SYNOPSIS void aspec ( a, c, n )
complex *a ; /* Pointer to input vector a */
float *c ; /* Pointer to output vector c */
int n ; /* Element count for vector c */

DOMAIN -3.4E+38 to 3.4E+38


EXECUTION TIME 28 + 6*N cycles

NOTES The file taspec.c included in the distribution tape provides an example of this
function’s use.

The stride of vectors a and c must always be 1.

To start a new accumulation, clear output vector c with the vclr( ) function before
the first call to aspec( ). If c is not cleared using vclr( ), the auto-spectrum results
are added to the existing contents of output vector c, thus computing an accumulating
auto-spectrum.

Note that input vector a is of type complex, and data arguments supplied to this routine
will be treated as interleaved real and imaginary data.
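
The accumulation can be sketched in portable C as shown below. This is a host-side illustration only: the type complex_f and the name aspec_ref are hypothetical, and the struct simply mirrors the interleaved real/imaginary layout described in the note above.

```c
/* Hypothetical interleaved complex element: real part, then imaginary part. */
typedef struct { float re; float im; } complex_f;

/* Hypothetical reference routine (not part of the library):
   adds |a[m]|^2 to c[m] for m = 0 .. n-1, with unit stride. */
void aspec_ref(const complex_f *a, float *c, int n)
{
    for (int m = 0; m < n; m++) {
        c[m] += a[m].re * a[m].re + a[m].im * a[m].im;
    }
}
```

Calling the routine twice on the same input without clearing c doubles the stored values, which is the accumulating behavior described above.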



atan_wci ( x )
NAME Arc Tangent

DESCRIPTION This function computes the arc tangent of a floating-point number x. The computed
value returned from this function is in the range [-π/2 to +π/2] radians.

ALGORITHM return = $\tan^{-1}(x)$
SYNOPSIS float atan_wci ( float x )

DOMAIN - 4.2E+37 < x < +4.2E+37



NOTES The file tatan.c included in the distribution tape provides an example of this function's
use.



atan2_wci ( y, x )
NAME Arc Tangent 2 Arguments

DESCRIPTION This function computes the arc tangent of the quotient y/x of two floating-point
numbers. The computed value returned from this function is in the range [-π to +π] radians.

ALGORITHM return = $\tan^{-1}\left(\dfrac{y}{x}\right)$

SYNOPSIS float atan2_wci ( y, x )
float dm y ; /* Input value y */
float dm x ; /* Input value x */

DOMAIN - 4.2E+37 < y/x < +4.2E+37, except x = 0.0



NOTES The file tatan2.c included in the distribution tape provides an example of this
function's use.



atanh_wci ( x )
NAME Inverse Hyperbolic Tangent

DESCRIPTION This function computes the inverse hyperbolic tangent of a floating-point number, x.

ALGORITHM return = $\tanh^{-1}(x)$
SYNOPSIS float atanh_wci ( float x )

DOMAIN -1.0 to +1.0



NOTES The file tatanh.c included in the distribution tape provides an example of this
function's use.



bartlett ( a, i, c, k, n )
NAME Bartlett Window

DESCRIPTION This function performs a Bartlett window multiply on the elements of input vector a
and places the results in output vector c.

ALGORITHM $C_{mk} = A_{mi} \cdot \left( 1 - \dfrac{\left| m - \frac{1}{2} n \right|}{\frac{1}{2} n} \right), \quad m = \{0, 1, 2, \ldots, n-1\}$
SYNOPSIS void bartlett ( a, i, c, k, n )
float *a ; /* Pointer to input vector a */
int i ; /* Address stride in words for input vector a */
float *c ; /* Pointer to output vector c */
int k ; /* Address stride in words for output vector c */
int n ; /* Element count */

DOMAIN $-3.4 \times 10^{38}$ to $+3.4 \times 10^{38}$


EXECUTION TIME 44 + 17 * ( N-1 ) cycles

NOTES The file tbartlett.c included in the distribution diskette provides an example of this
function’s use.

The Bartlett window is also known as a triangular window.
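
The triangular weighting can be sketched in portable C as shown below. This is a host-side sketch, not the library routine: the name bartlett_ref is hypothetical, and the weight is assumed to be 1 - |m - n/2| / (n/2), i.e. the distance from the window center is taken as an absolute value so that the weights form a triangle.

```c
#include <math.h>

/* Hypothetical reference routine (not part of the library).
   i and k are the strides, in float elements, of a and c. */
void bartlett_ref(const float *a, int i, float *c, int k, int n)
{
    float half = 0.5f * (float)n;

    for (int m = 0; m < n; m++) {
        /* Assumed triangular weight: 0 at m = 0, 1 at m = n/2. */
        float w = 1.0f - fabsf((float)m - half) / half;
        c[m * k] = a[m * i] * w;
    }
}
```

For n = 4 and an all-ones input, the weights under this assumption are 0, 0.5, 1, 0.5.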



biquad ( x, d, c, y, n )
NAME Bi-Quad IIR Filter

DESCRIPTION Using a bi-quad implementation, this function computes an IIR ( Infinite Impulse
Response ) filter using coefficients stored in input vector c, delay node points stored in
input buffer d, and applied to the elements of input vector x. The results are stored in
output vector y.

ALGORITHM $H(z) = \dfrac{B_0 + B_1 z^{-1} + B_2 z^{-2}}{1 - A_1 z^{-1} - A_2 z^{-2}}$

where

$D_m = A_2 \cdot D_{m-2} + A_1 \cdot D_{m-1} + x_m$
$Y_m = B_2 \cdot D_{m-2} + B_1 \cdot D_{m-1} + B_0 \cdot D_m$

$m = \{0, 1, 2, \ldots, n-1\}$
SYNOPSIS void biquad ( x, d, c, y, n )
float *x ; /* Pointer to input buffer vector x of length n */
float *d ; /* Pointer to input delay node buff vector d of length 2 */
float *c ; /* Pointer to input coeff buffer vector c of length 5 */
float *y ; /* Pointer to output buffer vector y of length n */
int n ; /* Number of input/output samples to compute */

DOMAIN -3.4E+38 to 3.4E+38




EXECUTION TIME 65 + 13*N

NOTES This is a single bi-quad form of an infinite impulse response (IIR) filter, defined by the
first equation shown above. It is implemented using a delay node buffer d, as shown in the
second and third equations above. The coefficients a[ ] and b[ ] are passed in a
single array c[ ] as follows:

c[0] = A2    c[1] = B2    c[2] = A1    c[3] = B1    c[4] = B0

Prior to executing the filter loop, the two “oldest” delay node values are loaded from
buffer d[ ]. When the filter loop has completed (n samples have been processed) the
two “newest” delay node values are written to d[ ]. In this way the filter delay node
states are retained between calls, allowing filtering on blocks of contiguous samples.
The user is responsible for allocating the delay node array and for initializing its
elements to zero prior to the first call to biquad( ).

Defining

$d_0 = D_m, \quad d_1 = D_{m-1}, \quad d_2 = D_{m-2}$

then

$d_0 = c_0 \cdot d_2 + c_2 \cdot d_1 + x_m$
$y_m = c_1 \cdot d_2 + c_3 \cdot d_1 + c_4 \cdot d_0$

$d_2 = d_1$
$d_1 = d_0$

$m = \{0, 1, 2, \ldots, n-1\}$
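
The recurrence above can be sketched in portable C as shown below. This is a host-side sketch, not the optimized routine: the name biquad_ref is hypothetical, and the order in which the two delay node values are kept in d[ ] (newest first) is an assumption, since the manual does not state it.

```c
/* Hypothetical reference routine (not part of the library).
   c[] = { A2, B2, A1, B1, B0 }, as listed in the notes above.
   d[] holds two delay nodes; assumed layout: d[0] = newest, d[1] = oldest. */
void biquad_ref(const float *x, float *d, const float *c, float *y, int n)
{
    float d1 = d[0];   /* D[m-1] */
    float d2 = d[1];   /* D[m-2] */

    for (int m = 0; m < n; m++) {
        float d0 = c[0] * d2 + c[2] * d1 + x[m];
        y[m] = c[1] * d2 + c[3] * d1 + c[4] * d0;
        d2 = d1;
        d1 = d0;
    }

    /* Retain filter state so the next block continues seamlessly. */
    d[0] = d1;
    d[1] = d2;
}
```

With A1 = 0.5, B0 = 1 and all other coefficients zero, a unit impulse produces the decaying sequence 1, 0.5, 0.25, and so on, and the final delay node value is written back to d[ ] for the next call.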

The coefficient buffer length is defined symbolically in the file dsppac.h as
DSP_BIQUAD_NCOEFF. The delay node buffer length is defined symbolically in
the file dsppac.h as DSP_BIQUAD_NDELAY.

The number of input samples n must be greater than or equal to 5.

The file tbiquad.c included in the distribution tape provides an example of this
function’s use.



blkman ( a, i, c, k, w, h, n )
NAME Blackman Window Multiply

DESCRIPTION Multiplies the input vector a[ ] by a Blackman window and stores the result to vector
c[ ].

ALGORITHM $C_{mk} = A_{mi} \cdot \left( 0.42 - 0.50 \cdot \cos\dfrac{2\pi m i}{N} + 0.08 \cdot \cos\dfrac{4\pi m i}{N} \right), \quad m = \{0, 1, 2, \ldots, n-1\}$

SYNOPSIS void blkman ( a, i, c, k, w, h, n )
float dm *a ; /* Pointer to input vector a */
int i ; /* Element stride for vector a */
float dm *c ; /* Pointer to output vector c */
int k ; /* Element stride for vector c */
float pm *w ; /* Pointer to cosine weights array */
int h ; /* Element stride for weights array */
int n ; /* Element count */

DOMAIN -3.4E+38 to 3.4E+38




EXECUTION TIME 41 + 4*(N-1) cycles

NOTES The file tblkman.c included in the distribution tape provides an example of this
function’s use.

For real-time applications, the Blackman window can be computed once, and a simple
multiply used to window data, as shown in the variable $W_{ml}$. The Blackman window
is computed using the winwts( ) function found in the DSP Pac library. The
winwts( ) function computes the weights array using the sine and cosine functions. This
array is pointed to by variable w listed in the synopsis section above.

The blkman( ) function is a vector function. You may therefore use the stride
arguments i, k and h to decimate both the input and output for data congruence. For
example, suppose you use winwts( ) to compute the FFT weights for a 16K FFT. This would
result in an fftwts array whose length would be 16,384 points. If you were to later
decide to compute an FFT of length 1,024 and run a Blackman window on the results,
you would not need to rerun the winwts( ) function to generate new weights. Simply
use the old weights and stride by 16 (16,384/1,024 = 16) on stride element h to obtain
the correct Blackman window FFT weights. In this manner you need only compute
winwts( ) once and later use the weights for varying length FFTs and windowing functions.

The cosine arguments are held in input vector w[ ] and can be computed from the
winwts( ) function. Note that larger vector sizes of w[ ] can be used by changing the stride
for w[ ]. For example, if w[ ] were computed for a window of size 2,048, but a
Blackman window of 1,024 was needed, use a stride of 2,048/1,024 = 2.

Note that the Blackman window has a passband ripple of 0.0017 dB, a maximum stop-
band attenuation of 74 dB, and a 57 dB main lobe relative to side lobe.



blkmanh ( a, i, c, k, w, h, n )
NAME Blackman-Harris Window Multiply

DESCRIPTION Multiplies the input vector a[ ] by a Blackman-Harris window and stores the result to
output vector c[ ].
ALGORITHM $C_{mk} = A_{mi} \cdot \left( 0.35875 - 0.48829 \cdot \cos\dfrac{2\pi m i}{N} + 0.14128 \cdot \cos\dfrac{4\pi m i}{N} - 0.01168 \cdot \cos\dfrac{6\pi m i}{N} \right), \quad m = \{0, 1, 2, \ldots, n-1\}$

SYNOPSIS void blkmanh ( a, i, c, k, w, h, n )
float dm *a ; /* Pointer to input vector a */
int i ; /* Element stride for vector a */
float dm *c ; /* Pointer to output vector c */
int k ; /* Element stride for vector c */
float pm *w ; /* Pointer to cosine weights array */
int h ; /* Element stride for weights array */
int n ; /* Element count */

DOMAIN -3.4E+38 to 3.4E+38





NOTES The file tblkmanh.c included in the distribution tape provides an example of this
function’s use.

For real-time applications, the Blackman-Harris window can be computed once, and a
simple multiply used to window data, as shown in the variable $W_{ml}$. The
Blackman-Harris window is computed using the winwts( ) function found in the DSP Pac library.
The winwts( ) function computes the weights array using the sine and cosine functions.
This array is pointed to by variable w listed in the synopsis section above.

The blkmanh( ) function is a vector function. You may therefore use the stride
arguments i, k and h to decimate both the input and output for data congruence. For
example, suppose you use winwts( ) to compute the FFT weights for a 16K point FFT. This
would result in an fftwts array whose length would be 16,384 points. If you were to
later decide to compute an FFT of length 1,024 and run a Blackman-Harris window on
the results, you would not need to rerun the winwts( ) function to generate new
weights. Simply use the old weights and stride by 16 (16,384/1,024 = 16) on stride
element h to obtain the correct window FFT weights. In this manner you need only
compute winwts( ) once and later use the weights for varying length FFTs and windowing
functions.

The cosine arguments are held in input vector w[ ] and can be computed from the
winwts( ) function. Note that larger vector sizes of w[ ] can be used by changing the stride
for w[ ]. For example, if w[ ] were computed for a window of size 2,048, but a
Blackman-Harris window of 1,024 was needed, use a stride of 2,048/1,024 = 2.

Note that the Blackman-Harris window has a passband ripple of 0.0017 dB, a maxi-
mum stopband attenuation of 74 dB, and a 57 dB main lobe relative to side lobe.



cacort ( a, c, m, n )
NAME Complex Auto-Correlation (Time Domain)

DESCRIPTION Computes the time domain auto-correlation of the complex elements stored in input
vector a[ ]. Values m and n define the number of auto-correlation values to compute.
The resulting auto-correlation values are stored in output complex vector c[ ].
ALGORITHM $C_i = \sum_{j=0}^{n-i-1} A_{i+j} \cdot A_j, \quad i = \{0, 1, 2, \ldots, m-1\}$
SYNOPSIS void cacort ( a, c, m, n )
complex dm *a ; /* Pointer to input vector a[ ] */
complex dm *c ; /* Pointer to output vector c[ ] */
int m ; /* Lag count m */
int n ; /* Number of elements in vector a[ ] */

DOMAIN -3.4E+38 to 3.4E+38


EXECUTION TIME 39 + ( 9 + 5 * n ) * n

NOTES The file tacort.c included in the distribution tape provides an example of this
function’s use.

Note that the lag count m must be less than or equal to the number of floating-point
elements (i.e. m ≤ n ).

The strides of vectors a[ ] and c [ ] must be 1.



ccdotpr ( a, i, b, j, c, k, n )
NAME Complex Dot Product Multiply by Conjugate

DESCRIPTION This function computes the complex dot product of complex input vector a by the
complex conjugate of input vector b and stores the results in complex output vector c.
This can be alternatively expressed as C=AB*.

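ALGORITHM $\mathrm{Re}\{C\} = \sum_{m=0}^{n-1} \mathrm{Re}\{A_{mi}\} \cdot \mathrm{Re}\{B_{mj}\} + \mathrm{Im}\{A_{mi}\} \cdot \mathrm{Im}\{B_{mj}\}$

$\mathrm{Im}\{C\} = \sum_{m=0}^{n-1} -\mathrm{Re}\{A_{mi}\} \cdot \mathrm{Im}\{B_{mj}\} + \mathrm{Im}\{A_{mi}\} \cdot \mathrm{Re}\{B_{mj}\}$

$m = \{0, 1, 2, \ldots, n-1\}$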
SYNOPSIS void ccdotpr ( a, i, b, j, c, k, n )
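complex *a ; /* Pointer to complex input vector a */
int i ; /* Address stride in words for input vector a */
complex *b ; /* Pointer to complex input vector b */
int j ; /* Address stride in words for input vector b */
complex *c ; /* Pointer to complex output vector c */
int k ; /* Address stride in words for output vector c */
int n ; /* Element count */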

DOMAIN $-3.4 \times 10^{38}$ to $+3.4 \times 10^{38}$



NOTES The file tccdotpr.c included in the distribution diskette provides an example of this
function’s use.



ccmmul ( a, x, y, b, z, c )
NAME Complex Matrix Multiply By Conjugate of Complex Matrix

DESCRIPTION This function computes the multiplication of the conjugate of complex input matrix
a[ ][ ] times the elements of complex input matrix b[ ][ ]. The dimensions of
complex input matrix a[ ][ ] are x and y, while the dimensions of complex input matrix
b[ ][ ] are defined by input scalars y and z. The results are stored in complex output
matrix c[ ][ ], which is of dimensions x and z.

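ALGORITHM $\mathrm{Re}(C_{ij}) = \sum_{k=1}^{y} \left[ \mathrm{Re}(A_{ik}) \cdot \mathrm{Re}(B_{kj}) + \mathrm{Im}(A_{ik}) \cdot \mathrm{Im}(B_{kj}) \right]$

$\mathrm{Im}(C_{ij}) = \sum_{k=1}^{y} \left[ \mathrm{Re}(A_{ik}) \cdot \mathrm{Im}(B_{kj}) - \mathrm{Re}(B_{kj}) \cdot \mathrm{Im}(A_{ik}) \right]$

for $i = \{0, 1, \ldots, x\}$
for $j = \{0, 1, \ldots, z\}$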

SYNOPSIS void ccmmul( a, x, y, b, z, c )
complex dm *a ; /* Pointer to complex input matrix a[ ][ ] */
int x ; /* Number of rows in complex matrix a[ ][ ] */
int y ; /* Number of columns in matrix a[ ][ ] And */
/* Number of rows in complex matrix b[ ][ ] */
complex dm *b ; /* Pointer to complex input matrix b[ ][ ] */
int z ; /* Number of columns in matrix b[ ][ ] */
complex dm *c ; /* Pointer to complex output matrix c[ ][ ] */

DOMAIN $-3.4 \times 10^{38}$ to $+3.4 \times 10^{38}$




EXECUTION TIME 62 + ( 6 + ( 12 + 7 * Y ) * Z ) * X cycles

NOTES The file tccmmul.c included in the distribution diskette provides an example of this
function’s use.

a[x][y] = 1, 1      2, 2      3, 3      4, 4
          5, 5      6, 6      7, 7      8, 8
          9, 9      10, 10    11, 11    12, 12

b[y][z] = 1, 2      3, 4      5, 6
          7, 8      9, 10     11, 12
          13, 14    15, 16    17, 18
          19, 20    21, 22    23, 24
x = 3, y = 4, z = 3 ;

ccmmul ( a, x, y, b, z, c ) ;

The resulting values in output matrix c [ ] [ ] would be as follows:

c[x][z] = 270, 10     310, 10     350, 10
          606, 26     710, 26     814, 26
          942, 42     1110, 42    1278, 42

The storage methodology for matrices is by rows. Matrices can be thought of as one
long array (vector) where the beginning of each row is offset by the number of columns.
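
The example can be checked on a host machine with the portable C sketch below. It is a reference sketch only, not the optimized routine: the type complex_f and the name ccmmul_ref are hypothetical, and the matrices are stored by rows in one long array as described above.

```c
/* Hypothetical interleaved complex element: real part, then imaginary part. */
typedef struct { float re; float im; } complex_f;

/* Hypothetical reference routine (not part of the library):
   c (x by z) = conj(a) (x by y) times b (y by z), all stored by rows. */
void ccmmul_ref(const complex_f *a, int x, int y,
                const complex_f *b, int z, complex_f *c)
{
    for (int i = 0; i < x; i++) {
        for (int j = 0; j < z; j++) {
            float re = 0.0f;
            float im = 0.0f;
            for (int k = 0; k < y; k++) {
                complex_f p = a[i * y + k];   /* row i, column k of a */
                complex_f q = b[k * z + j];   /* row k, column j of b */
                re += p.re * q.re + p.im * q.im;
                im += p.re * q.im - p.im * q.re;
            }
            c[i * z + j].re = re;
            c[i * z + j].im = im;
        }
    }
}
```

Element [i][j] of a matrix with z columns is found at offset i * z + j, which is the row offset rule stated in the note above.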



ccmsmul ( a, x, y, b, c )
NAME Complex Scalar-Complex Conjugate Matrix Multiplication

DESCRIPTION This function computes the multiplication of the conjugate of the complex input
matrix a[ ] [ ] times complex input scalar b. The dimensions of complex input matrix
a[ ] [ ] are x and y. The results are stored in complex output matrix c[ ] [ ], which is of
dimensions x and y.

ALGORITHM $C_{xy} = B \cdot \overline{A_{xy}}$, where $\overline{A_{xy}}$ is the complex conjugate of $A_{xy}$

SYNOPSIS void ccmsmul( a, x, y, b, c )
complex dm *a ; /* Pointer to complex input matrix a[ ][ ] */
int x ; /* Number of rows in complex matrix a[ ][ ] */
int y ; /* Number of columns in matrix a[ ][ ] */
complex dm *b ; /* Pointer to complex input scalar b */
complex dm *c ; /* Pointer to complex output matrix c[ ][ ] */

DOMAIN $-3.4 \times 10^{38}$ to $+3.4 \times 10^{38}$




EXECUTION TIME 46 + 2 * X * Y cycles

NOTES The file tccmsmul.c included in the distribution diskette provides an example of this
function’s use.

a[x][y] = 1, 2 3, 4 5, 6
7, 8 9, 10 11, 12
13, 14 15, 16 17, 18
19, 20 21, 22 23, 24
25, 26 27, 28 29, 30

b = {8,2}

x = 5, y = 3 ;

ccmsmul ( a, x, y, b, c ) ;

The resulting values in output matrix c [ ] [ ] would be as follows:

c[x][y] = 12, – 14 32, – 26 52, – 38
72, – 50 92, – 62 112, – 74
132, – 86 152, – 98 172, – 110
192, – 122 212, – 134 232, – 146
252, – 158 272, – 170 292, – 182

The storage methodology for matrices is by rows. Matrices can be thought of as one
long array (vector) where the beginning of each row is offset by the number of columns.
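
The example values can be reproduced with the portable C sketch below. It is a host-side sketch, not the library routine: the type complex_f and the name ccmsmul_ref are hypothetical. Because the scaling is element by element, the 5-row by 3-column example matrix can be walked as one long array of x * y elements.

```c
/* Hypothetical interleaved complex element: real part, then imaginary part. */
typedef struct { float re; float im; } complex_f;

/* Hypothetical reference routine (not part of the library):
   c[t] = b * conj(a[t]) for every element of the x by y matrix. */
void ccmsmul_ref(const complex_f *a, int x, int y,
                 const complex_f *b, complex_f *c)
{
    for (int t = 0; t < x * y; t++) {
        float ar = a[t].re;
        float ai = -a[t].im;               /* conjugate of a[t] */
        c[t].re = b->re * ar - b->im * ai;
        c[t].im = b->re * ai + b->im * ar;
    }
}
```

For a[0][0] = (1, 2) and b = (8, 2), the conjugate is (1, -2) and the product is (12, -14), matching the first entry of the result matrix.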



cccort ( a, b, c, m, n )
NAME Complex Cross-Correlation (Time Domain)

DESCRIPTION Computes the time domain cross-correlation of the complex elements
stored in complex input vectors a[ ] and b[ ]. The result is stored in complex output
vector c[ ]. Values m and n define the number of cross-correlation values to compute.
The implementation uses a time domain technique.
ALGORITHM $C_i = \sum_{j=0}^{n-i-1} A_{i+j} \cdot B_j, \quad i = \{0, 1, 2, \ldots, m-1\}$

SYNOPSIS void cccort ( a, b, c, m, n )
complex dm *a ; /* Pointer to input vector a[ ] */
complex dm *b ; /* Pointer to input vector b[ ] */
complex dm *c ; /* Pointer to output vector c[ ] */
int m ; /* Lag count m */
int n ; /* Number of elements in vector c[ ] */

DOMAIN -3.4E+38 to 3.4E+38


EXECUTION TIME 41 + ( 9 + 5 * n ) * m

NOTES The file tcccort.c included in the distribution tape provides an example of this
function’s use.

Note that the lag count must be less than or equal to the number of floating-point
elements (i.e. m ≤ n ).

The strides of vectors a[ ], b[ ], and c[ ] must always be 1.



ccort ( a, b, c, m, n )
NAME Cross-Correlation (Time Domain)

DESCRIPTION Computes the time domain (real) cross-correlation of the time domain (real) elements
stored in input vectors a[ ] and b[ ]. The result is stored in output real vector c[ ].
Values m and n define the number of cross-correlation values to compute. The
implementation uses a time domain technique.

ALGORITHM $C_i = \sum_{j=0}^{n-i-1} A_{i+j} \cdot B_j, \quad i = \{0, 1, 2, \ldots, m-1\}$

SYNOPSIS void ccort ( a, b, c, m, n )
float *a ; /* Pointer to input vector a[ ] */
float *b ; /* Pointer to input vector b[ ] */
float *c ; /* Pointer to output vector c[ ] */
int m ; /* Lag count m */
int n ; /* Number of elements in vector c[ ] */

DOMAIN -3.4E+38 to 3.4E+38


EXECUTION TIME 32 + 9 * M + (M+1)(2*N-M)

NOTES The file tccort.c included in the distribution tape provides an example of this
function’s use.

Note that the lag count must be less than or equal to the number of floating-point
elements (i.e. m ≤ n ).

The strides of vectors a, b, and c must always be 1.



cdesamp ( data, coeff, output, d, n, p )
NAME Complex Decimating Finite Impulse Response (FIR) Filter

DESCRIPTION The function computes the convolution of complex vectors data [ ] and coeff [ ],
placing the results in complex vector output [ ]. The number of output samples n and the
number of coefficients p may differ. n elements will be written to output [ ].

Complex vector data [ ] represents the real and imaginary (I and Q) components of the
input data respectively. Likewise, complex vector coeff [ ] represents the real and
imaginary ( I and Q) components of the coefficient data. A complex multiply and add
is performed to compute the convolutional output. The decimation factor d is used to
stride the next starting point in data [ ].

ALGORITHM $\mathrm{output}[i] = \sum_{j=0}^{p-1} \mathrm{data}[i \cdot d + j] \cdot \mathrm{coeff}[p - j - 1], \quad i = \{0, 1, 2, \ldots, n-1\}$
SYNOPSIS void cdesamp ( data, coeff, output, d, n, p )
complex dm *data ; /* Complex input data ( len n+p-1 ) */
complex pm *coeff ; /* Complex coefficients ( len p ) */
complex dm *output ; /* Complex output data ( len n ) */
int d ; /* Decimation factor */
int n ; /* Number of output samples */
int p ; /* Number of coefficients */

DOMAIN -3.4E+38 to 3.4E+38




EXECUTION TIME 36 + ( 7 + 5 * p ) * n cycles

NOTES The file tcdesamp.c included in the distribution tape provides an example of this
function’s use.

The number of filter output samples to generate can be obtained as follows:

n = ( ndata – p ) ⁄ d + 1
where ndata is the number of elements in data[ ].

A complex correlation can be performed by reversing the order of the coefficients
vector.
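
The decimating convolution and the output count formula can be sketched in portable C as shown below. This is a host-side sketch, not the optimized routine: the type complex_f and the names cdesamp_ref and cdesamp_nout are hypothetical, and the dm/pm memory qualifiers are omitted.

```c
/* Hypothetical interleaved complex element: real part, then imaginary part. */
typedef struct { float re; float im; } complex_f;

/* Number of output samples: n = (ndata - p) / d + 1 (integer division). */
int cdesamp_nout(int ndata, int p, int d)
{
    return (ndata - p) / d + 1;
}

/* Hypothetical reference routine (not part of the library):
   output[i] = sum over j of data[i*d + j] * coeff[p - j - 1]. */
void cdesamp_ref(const complex_f *data, const complex_f *coeff,
                 complex_f *output, int d, int n, int p)
{
    for (int i = 0; i < n; i++) {
        float re = 0.0f;
        float im = 0.0f;
        for (int j = 0; j < p; j++) {
            complex_f s = data[i * d + j];
            complex_f h = coeff[p - j - 1];
            re += s.re * h.re - s.im * h.im;   /* complex multiply and add */
            im += s.re * h.im + s.im * h.re;
        }
        output[i].re = re;
        output[i].im = im;
    }
}
```

With 5 input samples, 2 coefficients and a decimation factor of 2, the formula gives n = 2 output samples, and each output starts 2 samples further into data[ ].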



cdotpr ( a, i, b, j, c, k, n )
NAME Complex Dot Product

DESCRIPTION This function computes the complex dot product of complex input vector a and
complex input vector b and stores the results in complex output vector c. This can
alternatively be thought of as C = A • B.

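ALGORITHM $\mathrm{Re}\{C\} = \sum_{m=0}^{n-1} \mathrm{Re}\{A_{mi}\} \cdot \mathrm{Re}\{B_{mj}\} - \mathrm{Im}\{A_{mi}\} \cdot \mathrm{Im}\{B_{mj}\}$

$\mathrm{Im}\{C\} = \sum_{m=0}^{n-1} \mathrm{Re}\{A_{mi}\} \cdot \mathrm{Im}\{B_{mj}\} + \mathrm{Im}\{A_{mi}\} \cdot \mathrm{Re}\{B_{mj}\}$

$m = \{0, 1, 2, \ldots, n-1\}$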
SYNOPSIS void cdotpr ( a, i, b, j, c, k, n )
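complex *a ; /* Pointer to complex input vector a */
int i ; /* Address stride in words for input vector a */
complex *b ; /* Pointer to complex input vector b */
int j ; /* Address stride in words for input vector b */
complex *c ; /* Pointer to complex output vector c */
int k ; /* Address stride in words for output vector c */
int n ; /* Element count */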

DOMAIN $-3.4 \times 10^{38}$ to $+3.4 \times 10^{38}$



NOTES The file tcdotpr.c included in the distribution diskette provides an example of this
function’s use.



ceil_wci ( x )
NAME Round Up to Nearest Integer

DESCRIPTION This function computes the smallest integral value greater than or equal to the
floating-point number x. A floating-point representation of this integer value is returned.

ALGORITHM return = smallest integer $\ge x$
SYNOPSIS float ceil_wci ( float x )

DOMAIN -3.4E+38 to 3.40E+38



NOTES The file tceil.c included in the distribution tape provides an example of this function's
use.



cfft ( xr, xi, wr, wi, wstr, yr, yi, n )
NAME Fast Fourier Transform Of Complex Input Data

DESCRIPTION Computes the Fast Fourier Transform of the complex input elements stored in
complex input vector a. The results are stored in complex output vector c.

ALGORITHM $C_m = \sum_{k=0}^{n-1} A_k e^{-i 2\pi m k / n}, \quad m = \{0, 1, 2, \ldots, n-1\}$

SYNOPSIS void cfft ( xr, xi, wr, wi, wstr, yr, yi, n )
float dm *xr ; /* Pointer to real input data */
float dm *xi ; /* Pointer to imaginary input data */
float pm *wr ; /* Pointer to cosine table */
float dm *wi ; /* Pointer to sine table */
int wstr ; /* Cosine/sine table stride */
float dm *yr ; /* Pointer to real output data */
float pm *yi ; /* Pointer to imaginary output data */
int n ; /* FFT Size (In Complex Elements) */

DOMAIN -3.4E+38 to 3.4E+38


EXECUTION TIME See Attached Table Below



NOTES This is a radix-2 Fast Fourier Transform using parallel data memory/program memory
data accesses to maximize the throughput on the 21020/60/62 processor. The complex
input data is separated into real and imaginary parts, xr and xi. These vectors must be
aligned on an address which is an integer multiple of the FFT size, as required for
21K bit-reverse addressing. The input vectors are both in data memory; the
imaginary data is bit-reversed into program memory at the beginning of the routine. The
number of elements n supplied to the algorithm must be an integral power of two and
a minimum of 32.

The complex output is separated into real and imaginary parts, yr and yi. These
vectors may have arbitrary address alignment; however, yr is in the data memory and yi is
in the program memory. Vectors xr and xi must be in data memory and each must be
aligned to an integral multiple of n.

Vectors wr and wi are in program memory and data memory respectively and are
given the values:

$wr[k] = \cos\left(\dfrac{2\pi k}{wstr \cdot n}\right), \quad k = \{0, 1, \ldots, wstr \cdot n / 2 - 1\}$ (program memory)

$wi[k] = \sin\left(\dfrac{2\pi k}{wstr \cdot n}\right), \quad k = \{0, 1, \ldots, wstr \cdot n / 2 - 1\}$ (data memory)

The weight stride wstr allows cfft() to be called with varying sizes n from a single set
of weights. These weights are generated using the fftwts() function.

This precomputed FFT weight approach was implemented in order to ensure accurate
results and boost the available cfft() dynamic range to approximately 130 dB for
longer length (>16K) FFTs. This is accomplished by using an implementation that
does not rely on a recursive call to a sin/cosine approximation routine, as found in
other implementations. Rather, the FFT weights are precomputed accurately using the
fftwts() function. This is sufficient for A/D converters with bit lengths up to 22 bits.

The number of elements n must be an integral power of two and a minimum of 32.
Vector yr is in data memory and has a minimum size of n. Vector yi is in program
memory and has a minimum size of n.

The file tcfft.c included in the distribution tape provides an example of this function’s
use.



SPECIAL NOTES Users have sometimes reported problems when interrupt service routines (ISRs)
are used in conjunction with the FFT routines ( cfft( ), cffti( ), rfft( ), rffti( ) ).
Reports made to the Wideband technical staff typically describe the Wideband routine
executing perfectly but being unable to return to its exact state after being
interrupted by the ISR ( what is described as a “tumble into the weeds” ).

The Wideband Fast Fourier Transforms, both complex and real, forward and inverse,
use the built-in bit-reversing and circular addressing capabilities of the SHARC
architecture. Other routines, such as some of the FIR filters, also use the SHARC’s internal
circular addressing capabilities.

End users are usually aware that their ISR is responsible for saving and restoring
the registers used by the Wideband routines. However, end users sometimes
forget to save and restore ( push and pop ) the Mode 1 register, which is associated with
bit reversing, and the B ( base ) and L ( length ) registers, which are associated with
circular addressing. When these registers are not saved and restored by the ISR, the
interrupted routine resumes without the proper length parameter ( L register ) used for
circular addressing or the proper mode ( Mode 1 register ) used for bit reversing. This
results in the strange manifestations users sometimes report.

To properly save and restore the above-mentioned registers in an ISR, refer to page 4-21,
section 4.3 of the Analog Devices ADSP-21000 Family C Tools Manual
(#31-000005-08, dated August 95), which references examples of inline assembly code
within C code to save and restore registers.

For a detailed review of the relationships between the various FFT functions and how
to use them with one another, see the final section of Chapter 4.



Performance Issues

The initial timings shown below for the 32-point to 4,096-point FFTs were measured using the
Analog Devices simulator.

Performance Timings For Complex FFTs

Number of Points    Processor Cycles
8 See cfft8( ) function
16 See cfft16( ) function
32 771 Cycles
64 1,274 Cycles
128 2,368 Cycles
256 4,724 Cycles
512 10,060 Cycles
1,024 21,618 Cycles
2,048 46,744 Cycles
4,096 101,054 Cycles
8,192 217,828 Cycles
16,384 467,722 Cycles
32,768 1,000,240 Cycles
65,536 2,130,774 Cycles



cfft2d ( xr, xi, wr, wi, wstr, tmpdm, tmppm, n )
NAME Complex 2-Dimensional Fast Fourier Transform

DESCRIPTION Computes a 2-Dimensional Fast Fourier Transform of the complex input elements
stored in vectors xr[ ] and xi[ ]. The results replace the input in xr[ ] and xi[ ].

ALGORITHM F_{R,C} = \sum_{r=0}^{n-1} \sum_{c=0}^{n-1} x_{r,c} \, e^{-2\pi j ( ( r \cdot R + c \cdot C ) / n )}
R = \{ 0, 1, \ldots, n-1 \}
C = \{ 0, 1, \ldots, n-1 \}
SYNOPSIS void cfft2d ( xr, xi, wr, wi, wstr, tmpdm, tmppm, n )
float dm *xr ; /* Pointer to real input/output data */
float dm *xi ; /* Pointer to imaginary input/output data */
int wstr ; /* Cosine/sine table stride */
float dm *tmpdm ; /* Pointer to real output data */
float pm *tmppm ; /* Pointer to imag output data */
int n ; /* CFFT2D Size (Complex Elements n x n) */

DOMAIN -3.4E+38 to 3.4E+38




EXECUTION TIME 32 x 32 Pts. 44,532 cycles

64 x 64 Pts. 165,364 cycles

128 x 128 Pts. 659,572 cycles



NOTES The input data is an n x n complex matrix x separated into real and imaginary parts xr
and xi stored as follows:

Re ( x r, c ) = xr [ r • n + c ]
r = { 0, 1, …, n – 1 } c = { 0, 1, …, n – 1 }

Im ( x r, c ) = xi [ r • n + c ]
r = { 0, 1, …, n – 1 } c = { 0, 1, …, n – 1 }

Variables r and c are the row and column numbers.

The DFT output replaces the input, and is stored as follows:

Re ( F R, C ) = xr [ R • n + C ]
R = { 0, 1, …, n – 1 } C = { 0, 1, …, n – 1 }

Im ( F R, C ) = xi [ R • n + C ]
R = { 0, 1, …, n – 1 } C = { 0, 1, …, n – 1 }

A radix-2 Fast Fourier Transform (FFT) algorithm is used to compute the individual
row and column DFTs.

The number of elements n must be an integral power of two and a minimum of 32.

Vectors xr and xi must be in data memory and be address-aligned to an integral multiple
of n.

Vectors wr and wi must be in program memory and data memory respectively and are
pre-computed to be:

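wr[k] = \cos\left( \frac{2\pi k}{wst \cdot n} \right), \quad k = \{ 0, 1, \ldots, \frac{wst \cdot n}{2} - 1 \}

wi[k] = \sin\left( \frac{2\pi k}{wst \cdot n} \right), \quad k = \{ 0, 1, \ldots, \frac{wst \cdot n}{2} - 1 \}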

Vector tmpdm must be in data memory, having a minimum size of n, and be address-
aligned to an integral multiple of n.

Vector tmppm must be in program memory, have a minimum size of n, and be
address-aligned to an integral multiple of n.

The file tcfft2d.c included in the distribution tape provides an example of this func-
tion’s use.



cfft8 ( xr, xi, yr, yi )
NAME 8-Point Complex Fast Fourier Transform (Inline)

DESCRIPTION Computes the Fast Fourier Transform of the complex input elements stored in input
vector xr and xi. The results are stored in output vector yr and yi.

ALGORITHM Y_m = \sum_{k=0}^{7} X_k \, e^{-2\pi j ( m \cdot k / 8 )}, \quad m = \{ 0, 1, 2, \ldots, 7 \}

SYNOPSIS void cfft8 ( xr, xi, yr, yi )

DOMAIN -3.4E+38 to 3.4E+38




EXECUTION TIME 184 Cycles

NOTES This is an 8-point radix-2 Fast Fourier Transform using parallel data memory/program
memory data accesses to maximize the throughput on the 21020/60/62 processor.

The complex input data is separated into real and imaginary parts, xr and xi. These
vectors must be aligned on an address which is an integer multiple of the FFT
size, as required for 21K bit-reverse addressing. The input vectors are both in data
memory; the imaginary data is bit-reversed into program memory at the beginning of
the routine.
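
The bit-reversed ordering mentioned above can be illustrated in software. This is a sketch of the index permutation only, not of the hardware address-generation mechanism, and the function names bit_reverse and bit_reverse_copy are hypothetical.

```c
/* Reverse the low `bits` bits of index i
   (bits = 3 for an 8-point transform). */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* shift result left, append lowest bit of i */
        i >>= 1;
    }
    return r;
}

/* Copy src[] into dst[] in bit-reversed order; n must equal 1 << bits. */
void bit_reverse_copy(const float *src, float *dst, unsigned n, unsigned bits)
{
    for (unsigned i = 0; i < n; i++)
        dst[bit_reverse(i, bits)] = src[i];
}
```

For an 8-point transform, index 1 (binary 001) maps to 4 (binary 100) and index 3 (binary 011) maps to 6 (binary 110).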

This algorithm utilizes a decimation-in-time approach. As the cffti( ) function requires
a minimum of 32 points as input, there is no corresponding inverse algorithm for this
routine. The complex output is separated into real and imaginary parts, yr and yi.
These vectors may have arbitrary address alignment; however, yr is in data memory
and yi is in program memory.
• Vectors xr and xi are defined in cfft8dta.asm using the dm_align segment to
ensure address alignment.


The file tcfft8.c included in the distribution tape provides an example of this func-
tion’s use.



cfft16 ( xr, xi, yr, yi )
NAME 16-Point Complex Fast Fourier Transform (Inline)

DESCRIPTION Computes the Fast Fourier Transform of the complex input elements stored in input
vector xr and xi. The results are stored in output vector yr and yi.

ALGORITHM Y_m = \sum_{k=0}^{15} X_k \, e^{-2\pi j ( m \cdot k / 16 )}, \quad m = \{ 0, 1, 2, \ldots, 15 \}
SYNOPSIS void cfft16 ( xr, xi, yr, yi )

DOMAIN -3.4E+38 to 3.4E+38




EXECUTION TIME 388 Cycles

NOTES This is a 16-point radix-2 Fast Fourier Transform using parallel data memory/program
memory data accesses to maximize the throughput on the 21020/60/62 processor.

The complex input data is separated into real and imaginary parts, xr and xi. These
vectors must be aligned on an address which is an integer multiple of the FFT
size, as required for 21K bit-reverse addressing. The input vectors are both in data
memory; the imaginary data is bit-reversed into program memory at the beginning of
the routine.

This algorithm utilizes a decimation-in-time approach. As the cffti( ) function requires
a minimum of 32 points as input, there is no corresponding inverse algorithm for this
routine. The complex output is separated into real and imaginary parts, yr and yi.
These vectors may have arbitrary address alignment; however, yr is in data memory
and yi is in program memory.
• Vectors xr and xi are defined in cfft16dt.asm using the dm_align segment to
ensure address alignment.


The file tcfft16.c included in the distribution tape provides an example of this func-
tion’s use.



cffti ( xr, xi, wr, wi, wstr, yr, yi, n )
NAME Inverse Complex FFT

DESCRIPTION Computes the Inverse Fast Fourier Transform of the input elements stored in vectors
xr and xi. The results are stored in output vectors yr and yi. Note that the Inverse FFT is
the same as the Forward FFT except that the sign of the imaginary components of the
twiddle factors is negated. This routine swaps the real and imaginary input data,
performs the Forward FFT with the same weights table, and swaps the real and
imaginary output data. Scaling by 1/N is then performed.

ALGORITHM C_m = \frac{1}{n} \sum_{k=0}^{n-1} A_k \, e^{i 2\pi m k / n}, \quad m = \{ 0, 1, 2, \ldots, n-1 \}
SYNOPSIS void cffti ( xr, xi, wr, wi, wstr, yr, yi, n )
int wstr ; /* Cosine/sine table stride */
int n ; /* FFT Size (In Complex Elements) */

DOMAIN -3.4E+38 to 3.4E+38


EXECUTION TIME 22,650 Cycles @ 1,024 Points - Data and Program In On-Board Cache



NOTES This is a radix-2 inverse Fast Fourier Transform using parallel DM/PM data accesses
to maximize the throughput on the 21020 processor. The complex input data is
separated into real and imaginary parts, xr and xi. These vectors must be aligned on an
address which is an integer multiple of the FFT size, as required for 21K
bit-reverse addressing. The input vectors are both in DM; the imaginary data is
bit-reversed into PM at the beginning of the routine. The number of elements n must be an
integral power of two and a minimum of 32.

The complex output is separated into real and imaginary parts, yr and yi. These
vectors may have arbitrary address alignment; however, yr is in DM and yi is in
PM. Vectors xr and xi must be in data memory and each must be aligned to an integral
multiple of n.

Vectors wr and wi are in program memory and data memory respectively and are
given the values:

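wr[k] = \cos\left( \frac{2\pi k}{wst \cdot n} \right), \quad k = \{ 0, 1, \ldots, \frac{wst \cdot n}{2} - 1 \}

wi[k] = \sin\left( \frac{2\pi k}{wst \cdot n} \right), \quad k = \{ 0, 1, \ldots, \frac{wst \cdot n}{2} - 1 \}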

The weight stride, wst, allows for calling cfft() with varying sizes n from a single set
of weights. These weights are generated using the fftwts( ) function.

Vector yr is in data memory and has a minimum size of n. Vector yi is in program
memory and has a minimum size of n.

The file tcfft.c included in the distribution tape provides an example of this function’s
use.



SPECIAL NOTES Users have sometimes reported problems when interrupt service routines (ISRs) are
used in conjunction with the FFT routines ( cfft( ), cffti( ), rfft( ), rffti( ) ). Reports to
the Wideband technical staff typically describe the Wideband routine executing perfectly, but
being unable to return to its exact state after being interrupted by the ISR ( a “tumble
into the weeds” ).

The Wideband Fast Fourier transforms, both complex and real, forward and inverse,
use the built-in bit reversing and circular addressing capabilities of the SHARC
architecture. Other routines, such as some of the FIR filters, also use the SHARC’s internal
circular addressing capabilities.

End users are usually aware that their ISR is responsible for saving and restoring the
registers used by the Wideband routines. However, end users sometimes forget to save and
restore ( push and pop ) the Mode 1 register, which is associated with bit reversing, and
the B ( base ) and L ( length ) registers, which are associated with circular addressing.
When these registers are not saved and restored by the ISR, the interrupted routine resumes
without the proper length parameter ( L register ) used for circular addressing or the
proper mode ( Mode 1 register ) used in bit reversing. This results in the strange
manifestations users sometimes report.

To properly save and restore the above mentioned registers in an ISR, refer to page 4-21,
section 4.3 of the Analog Devices ADSP-21000 Family C Tools Manual (#31-000005-08, dated
August 95), which references examples of inline assembly code within C code to save and
restore registers.




TABLE 8 Table of Inverse Complex FFT Timing

Number of Points    Processor Cycles
32 868 Cycles
64 1,435 Cycles
128 2,657 Cycles
256 5,319 Cycles
512 11,117 Cycles
1,024 23,699 Cycles
2,048 50,873 Cycles
4,096 109,281 Cycles
8,192 234,244 Cycles
16,384 500,525 Cycles
32,768 1,072,560 Cycles
65,536 2,288,128 Cycles



cfir ( ii, iq, ci, cq, oi, oq, d, n, p )
NAME Complex Finite Impulse Response Filter

DESCRIPTION The function cfir( ) computes the convolution of the complex input data ii[ ], iq[ ]
with the complex coefficients ci[ ], cq[ ], placing the results in oi[ ] and oq[ ]
respectively. The number of output samples n and the number of coefficients p may
differ. n elements will be written to oi[ ] and oq[ ].

The vectors ii[ ] and iq[ ] represent the real and imaginary (I and Q) components of the
input data respectively. Likewise, the vectors ci[ ] and cq[ ] represent the real and
imaginary (I and Q) components of the coefficient data. A complex multiply and add
is performed to compute the convolutional output. The decimation factor d is the stride
between the starting ii[ ] and iq[ ] samples of successive outputs.

ALGORITHM c[i] = \sum_{j=0}^{p-1} a[ i \cdot d + j ] \cdot b[ p - j - 1 ], \quad i = \{ 0, 1, 2, \ldots, n-1 \}
where
a [ ] comprises complex components ii [ ] and iq [ ]
b [ ] comprises complex components ci [ ] and cq [ ]
c [ ] comprises complex components oi [ ] and oq [ ]
SYNOPSIS void cfir ( ii, iq, ci, cq, oi, oq, d, n, p )
float dm *ii ; /* Input samples for I data ( len n+p-1 ) */
float dm *iq ; /* Input samples for Q data ( len n+p-1 ) */
float pm *ci ; /* Coefficients for I data ( len p ) */
float pm *cq ; /* Coefficients for Q data ( len p ) */
float dm *oi ; /* Output samples for I data ( len n ) */
float dm *oq ; /* Output samples for Q data ( len n ) */
int d ; /* Decimation factor */
int n ; /* Number of output samples */
int p ; /* Number of coefficients */



DOMAIN -3.4E+38 to 3.4E+38


EXECUTION TIME 59 + ( 9 + 5 * p ) * n cycles
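
As a quick worked instance of the execution time formula, the estimate can be wrapped in a small helper. The name cfir_cycles is hypothetical. For n = 64 output samples and p = 16 coefficients the formula gives 59 + ( 9 + 5 * 16 ) * 64 = 5,755 cycles.

```c
/* Cycle estimate for cfir( ) from the formula 59 + ( 9 + 5 * p ) * n.
   n : number of output samples
   p : number of coefficients */
long cfir_cycles(long n, long p)
{
    return 59L + (9L + 5L * p) * n;
}
```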

NOTES The file tfir.c included in the distribution tape provides an example of this function’s
use.

The number of filter output samples to generate can be obtained as follows:

n = \frac{ ndata - p }{ d + 1 }

where ndata is the number of elements in ii[ ] and iq[ ].

A correlation can be performed by reversing the order of the coefficients vector.
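
For readers who want a plain C reference to compare against, the complex multiply-and-add described for this function might look like the sketch below. The name cfir_ref is hypothetical, the dm/pm qualifiers are dropped so it builds on a host, and it assumes the input index for output i and tap j is i*d + j with the coefficients applied in reverse order. Under that assumption the last input element read is at index ( n – 1 ) * d + p – 1.

```c
/* Reference complex decimating FIR (host-side sketch).
   ii, iq : input I and Q samples
   ci, cq : coefficient I and Q parts, length p
   oi, oq : output I and Q samples, length n
   d      : decimation stride between successive outputs */
void cfir_ref(const float *ii, const float *iq,
              const float *ci, const float *cq,
              float *oi, float *oq, int d, int n, int p)
{
    for (int i = 0; i < n; i++) {
        float acc_i = 0.0f, acc_q = 0.0f;
        for (int j = 0; j < p; j++) {
            float ar = ii[i * d + j], ai = iq[i * d + j];
            float br = ci[p - j - 1], bi = cq[p - j - 1];
            acc_i += ar * br - ai * bi;   /* real part of a * b */
            acc_q += ar * bi + ai * br;   /* imaginary part of a * b */
        }
        oi[i] = acc_i;
        oq[i] = acc_q;
    }
}
```

Passing the coefficient arrays in reversed order to the same routine turns the convolution into a correlation, as noted above.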



chksum ( a, i, type, n )
NAME Perform Checksum

DESCRIPTION This function performs a checksum on a memory block. The memory block is defined
by the start address a offset by n. The type flag determines whether dm or pm memory
is tested ( 1 = dm, 0 = pm).

ALGORITHM Return ⇐ Checksum

SYNOPSIS void chksum ( a, i, type, n )
int a ; /* Start address of memory */
int i ; /* Memory Stride */
int type ; /* Type of memory to test ( dm or pm ) */
int n ; /* Length of block to be checked */

DOMAIN -3.4 x 10^38 to +3.4 x 10^38


EXECUTION TIME 17 + 2 * N cycles

NOTES The file tchksum.c included in the distribution diskette provides an example of this
function’s use.

chksum( ) computes the sum of the elements within the memory block and returns the
two’s complement of that sum as the checksum value.
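
A two's complement checksum of this kind can be sketched as follows. The name checksum_ref is hypothetical, the sketch assumes 32-bit words and that n counts the elements visited at the given stride, and it has no dm/pm type flag because a host machine has a single address space.

```c
#include <stdint.h>
#include <stddef.h>

/* Sum n words taken every `stride` elements from a[], then return the
   two's complement of the sum. Adding the result to the original sum
   gives zero modulo 2^32. */
uint32_t checksum_ref(const uint32_t *a, size_t stride, size_t n)
{
    uint32_t sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += a[k * stride];        /* unsigned arithmetic wraps modulo 2^32 */
    return (uint32_t)(0u - sum);     /* two's complement of the sum */
}
```

A block holding 1, 2 and 3 sums to 6, so the sketch returns 0xFFFFFFFA.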



cmadd ( a, b, x, y, c )
NAME Complex Matrix Addition

DESCRIPTION This function computes the addition of complex input matrix a with complex input
matrix b and stores the results to complex output matrix c.

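ALGORITHM
\begin{bmatrix} C_{ri11} & C_{ri12} & C_{ri13} \\ C_{ri21} & C_{ri22} & C_{ri23} \end{bmatrix}
=
\begin{bmatrix} A_{ri11} & A_{ri12} & A_{ri13} \\ A_{ri21} & A_{ri22} & A_{ri23} \end{bmatrix}
+
\begin{bmatrix} B_{ri11} & B_{ri12} & B_{ri13} \\ B_{ri21} & B_{ri22} & B_{ri23} \end{bmatrix}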

where ri indicates a real and imaginary component

SYNOPSIS void cmadd ( a, b, x, y, c )
complex dm *a ; /* Pointer to input matrix a [ ][ ] */
complex dm *b ; /* Pointer to input matrix b [ ][ ] */
int x ; /* Number of rows in matrix a[ ][ ] & b[ ][ ] */
int y ; /* Number of columns in matrix a[ ][ ] & b[ ][ ] */
complex dm *c ; /* Pointer to output matrix c [ ][ ] */

DOMAIN -3.4 x 10^38 to +3.4 x 10^38




EXECUTION TIME 32 + 3*X*Y cycles

NOTES The file tcmadd.c included in the distribution diskette provides an example of this
function’s use.

The addition of a complex matrix is mathematically expressed as follows:

Real C [ x ] [ y ] = A [ x ] [ y ] Real + B [ x ] [ y ] Real

Imaginary C [ x ] [ y ] = A [ x ] [ y ] Imaginary + B [ x ] [ y ] Imaginary

An example of the addition of one complex matrix to another is as follows
(each entry is a real, imaginary pair):

A [x] [y] =
( 1, 2 )      ( 3.8, 1.7 )  ( 8.8, 5.5 )   ( 9.9, 14 )
( 7.1, 5 )    ( 9.3, 1.6 )  ( 0.4, 1 )     ( 51, 3.3 )
( 0.9, 1 )    ( 8, 5 )      ( 2.1, 6 )     ( -3.1, -1 )
( 9.3, 1 )    ( 2.5, 1.5 )  ( 6.9, 9 )     ( 10, 22.1 )
( 1.3, 1.4 )  ( 0.2, 4.5 )  ( 0.9, 51.4 )  ( 1.5, 4.4 )
( 9.2, 4 )    ( 7.8, 1.7 )  ( 61, 3.4 )    ( 14.3, 1.4 )

B [x] [y] =
( 3.2, 1 )    ( 8.8, 2 )    ( 9.9, 3 )     ( 44.3, 13.3 )
( 8.1, 4 )    ( 6.5, 5 )    ( 3.2, 6 )     ( -2.3, -9.9 )
( 8.9, 7 )    ( 2.8, 8 )    ( 1.7, 9 )     ( -8.1, -2.2 )
( 6.4, 10 )   ( 11, 1.3 )   ( 12, 4.5 )    ( 22.9, -5.4 )
( 6.5, 7 )    ( 2.1, 8 )    ( 2.2, 9 )     ( 32, 9.8 )
( 1.1, 4 )    ( 7.7, 5 )    ( 4.4, 6 )     ( -2.1, -0.3 )

x=6, y=4

C [x] [y] =
( 4.2, 3 )    ( 12.6, 3.7 )  ( 18.7, 8.5 )   ( 54.2, 27.3 )
( 15.2, 9 )   ( 15.8, 6.6 )  ( 3.6, 7 )      ( 48.7, -6.6 )
( 9.8, 8 )    ( 10.8, 13 )   ( 3.8, 15 )     ( -11.2, -3.2 )
( 15.7, 11 )  ( 13.5, 2.8 )  ( 18.9, 13.5 )  ( 32.9, 16.7 )
( 7.8, 8.4 )  ( 2.3, 12.5 )  ( 3.1, 60.4 )   ( 33.5, 14.2 )
( 10.3, 8 )   ( 15.5, 6.7 )  ( 65.4, 9.4 )   ( 12.2, 1.1 )
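
The element-wise addition can be reproduced on a host machine with a short reference routine. The struct complex_t and the name cmadd_ref are hypothetical stand-ins; the layout of the library's own complex type is assumed here to be a real part followed by an imaginary part, and the dm qualifier is omitted.

```c
/* Assumed layout of a complex element: real part, then imaginary part. */
typedef struct { float re; float im; } complex_t;

/* c = a + b for x-by-y complex matrices stored by rows. */
void cmadd_ref(const complex_t *a, const complex_t *b,
               int x, int y, complex_t *c)
{
    for (int k = 0; k < x * y; k++) {
        c[k].re = a[k].re + b[k].re;
        c[k].im = a[k].im + b[k].im;
    }
}
```

Applied to the first rows of the example matrices, the routine adds ( 1, 2 ) and ( 3.2, 1 ) to give ( 4.2, 3 ), up to single-precision rounding.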



cmmov ( a, x, y, b )
NAME Complex Matrix Move

DESCRIPTION This function moves a source complex input matrix a to a destination complex output
matrix b.

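ALGORITHM
\begin{bmatrix} B_{ri11} & B_{ri12} & B_{ri13} \\ B_{ri21} & B_{ri22} & B_{ri23} \end{bmatrix}
\Leftarrow
\begin{bmatrix} A_{ri11} & A_{ri12} & A_{ri13} \\ A_{ri21} & A_{ri22} & A_{ri23} \end{bmatrix}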


SYNOPSIS void cmmov ( a, x, y, b )
int x ; /* Number of rows in matrix a[ ][ ] */
complex dm *b ; /* Pointer to output matrix b [ ][ ] */

DOMAIN -3.4 x 10^38 to +3.4 x 10^38


EXECUTION TIME 13 + ( 2 * X * Y ) cycles

NOTES The file tcmmov.c included in the distribution diskette provides an example of this
function’s use.

The storage methodology for matrices is by rows. Matrices can be thought of as one
long array (vector) where the beginning of each row is offset from the beginning of the
previous row by the number of columns.
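
The row-by-row storage rule amounts to a simple index calculation, sketched here with a hypothetical helper named mat_index.

```c
#include <stddef.h>

/* Offset of element (r, c) in a matrix with `cols` columns stored by rows. */
size_t mat_index(size_t r, size_t c, size_t cols)
{
    return r * cols + c;
}
```

In a matrix with 4 columns, row 1 therefore begins at offset 4 and element ( 2, 3 ) sits at offset 11.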



cmmul ( a, x, y, b, z, c )
NAME Complex Matrix Multiplication

DESCRIPTION This function computes the multiplication of complex input matrix a times complex
input matrix b and stores the results to complex output matrix c. The dimension of
complex matrix a [ ] [ ] is x and y and the dimension of complex input matrix b [ ] [ ]
is y and z. The resulting complex output matrix c [ ] [ ] is of dimension x and z.

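ALGORITHM
\begin{bmatrix} C_{ri11} & C_{ri12} \end{bmatrix}
=
\begin{bmatrix} A_{ri11} & A_{ri12} & A_{ri13} \end{bmatrix}
\cdot
\begin{bmatrix} B_{ri11} & B_{ri12} \\ B_{ri21} & B_{ri22} \\ B_{ri31} & B_{ri32} \end{bmatrix}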

SYNOPSIS void cmmul ( a, x, y, b, z, c )
/* Number of rows in matrix b[ ][ ] */
complex dm *b ; /* Pointer to input matrix b [ ][ ] */

DOMAIN -3.4 x 10^38 to +3.4 x 10^38




EXECUTION TIME 45 + (4 + ( 10 + 5 * Y) * Z) * X cycles

NOTES The file tcmmul.c included in the distribution diskette provides an example of this
function’s use.

The multiplication of a complex matrix is as follows:

C[x][z] = \sum_{k=1}^{y} ( \text{Real Sum} + \text{Imaginary Sum} ) where

\text{Real Sum} = A_{Real} \cdot B_{Real} - A_{Imaginary} \cdot B_{Imaginary}

\text{Imaginary Sum} = A_{Real} \cdot B_{Imaginary} + B_{Real} \cdot A_{Imaginary}

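The storage methodology for matrices is by rows. Matrices can be thought of as one
long array (vector) where the beginning of each row is offset from the beginning of the
previous row by the number of columns.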

The first row of a [ ] [ ] times the first column of b [ ] [ ] is the first element of c [ ] [ ]
(row 1, column 1). The first row of a [ ] [ ] times the second column of b [ ] [ ] is the
second element of c [ ] [ ] (row 1, column 2), and so on.

This algorithm follows the general law of matrix multiplication whereby the number
of columns of input matrix a must equal the number of rows of input matrix b.

ri indicates that each component of the matrix is composed of a complex number
which has both a real and imaginary component.

An example of the multiplication of one complex matrix by another is as follows
(each entry is a real, imaginary pair):

A [x] [y] =
( 1, 1 )   ( 2, 2 )    ( 3, 3 )    ( 4, 4 )
( 5, 5 )   ( 6, 6 )    ( 7, 7 )    ( 8, 8 )
( 9, 9 )   ( 10, 10 )  ( 11, 11 )  ( 12, 12 )

B [y] [z] =
( 1, 2 )    ( 3, 4 )    ( 5, 6 )
( 7, 8 )    ( 9, 10 )   ( 11, 12 )
( 13, 14 )  ( 15, 16 )  ( 17, 18 )
( 19, 20 )  ( 21, 22 )  ( 23, 24 )

x=3, y=4, z=3

C [x] [z] =
( -10, 270 )  ( -10, 310 )   ( -10, 350 )
( -26, 606 )  ( -26, 710 )   ( -26, 814 )
( -42, 942 )  ( -42, 1110 )  ( -42, 1278 )
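
The worked example can be reproduced with a straightforward reference multiply. The struct complex_t and the name cmmul_ref are hypothetical; the layout of the library's complex type is assumed to be a real part followed by an imaginary part, and memory qualifiers are omitted so the sketch builds on a host machine.

```c
/* Assumed layout of a complex element: real part, then imaginary part. */
typedef struct { float re; float im; } complex_t;

/* c = a * b, where a is x-by-y, b is y-by-z and c is x-by-z,
   all stored by rows. */
void cmmul_ref(const complex_t *a, int x, int y,
               const complex_t *b, int z, complex_t *c)
{
    for (int r = 0; r < x; r++) {
        for (int col = 0; col < z; col++) {
            float sr = 0.0f, si = 0.0f;
            for (int k = 0; k < y; k++) {
                complex_t p = a[r * y + k];
                complex_t q = b[k * z + col];
                sr += p.re * q.re - p.im * q.im;   /* real sum */
                si += p.re * q.im + p.im * q.re;   /* imaginary sum */
            }
            c[r * z + col].re = sr;
            c[r * z + col].im = si;
        }
    }
}
```

With the 3 x 4 and 4 x 3 inputs of the example, the first output element is ( -10, 270 ) and the last is ( -42, 1278 ).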



cmmul_dpd ( a, x, y, b, z, c )
NAME Complex Matrix Multiplication (Data Memory x Program Memory to Data Memory)

DESCRIPTION This function computes the multiplication of complex input matrix a[ ] (in data
memory) times complex input matrix b[ ] (in program memory) and stores the results to
complex output matrix c[ ] (in data memory). The dimension of complex matrix a [ ] [ ]
is x and y and the dimension of complex input matrix b [ ] [ ] is y and z. The resulting
complex output matrix c [ ] [ ] is of dimension x and z.

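ALGORITHM
\begin{bmatrix} C_{ri11} & C_{ri12} \end{bmatrix}
=
\begin{bmatrix} A_{ri11} & A_{ri12} & A_{ri13} \end{bmatrix}
\cdot
\begin{bmatrix} B_{ri11} & B_{ri12} \\ B_{ri21} & B_{ri22} \\ B_{ri31} & B_{ri32} \end{bmatrix}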

SYNOPSIS void cmmul_dpd ( a, x, y, b, z, c )
/* Number of rows in matrix b[ ][ ] */
complex pm *b ; /* Pointer to input matrix b [ ][ ] */

DOMAIN -3.4 x 10^38 to +3.4 x 10^38


