SlideShare a Scribd company logo
1 of 10
Download to read offline
The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013
DOI : 10.5121/ijma.2013.5407 85
Implementation Of Grigoryan FFT For Its
Performance Case Study Over Cooley-Tukey FFT
Using Xilinx Virtex-II Pro, Virtex-5 And Virtex-4
FPGAs
Narayanam Ranganadh1
, Muni Guravaiah P2
, Bindu Tushara D3
1
Department of Electronics and Communications Engineering, Bharat Institute of
Engineering and Technology, Hyderabad, A.P., India.
rnara100@uottawa.ca,rnara100@biet.ac.in,rnara100@gmail.com
2
Department of Electronics and Communications Engineering, Bharat Institute of
Engineering and Technology, Hyderabad, A.P., India.
3
Department of Electronics and Communications Engineering, Vignan Institute of
Technology and Science, Hyderabad, A.P., India.
ABSTRACT
A large family of signal processing techniques consist of Fourier-transforming a signal, manipulating the
Fourier-transformed data in a simple way, and reversing the transformation. We widely use Fourier
frequency analysis in equalization of audio recordings, X-ray crystallography, artefact removal in
Neurological signal and image processing, Voice Activity Detection in Brain stem speech evoked
potentials, speech processing spectrograms are used to identify phonetic sounds and so on. Discrete
Fourier Transform (DFT) is a principal mathematical method for the frequency analysis. The way of
splitting the DFT gives out various fast algorithms. In this paper, we present the implementation of two fast
algorithms for the DFT for evaluating their performance. One of them is the popular radix-2 Cooley-Tukey
fast Fourier transform algorithm (FFT) [1] and the other one is the Grigoryan FFT based on the splitting
by the paired transform [2]. We evaluate the performance of these algorithms by implementing them on the
Xilinx Virtex-II pro [3], Virtex-5 [4] and Virtex-4 [6] FPGAs, by developing our own FFT processor
architectures. Finally we show that the Grigoryan FFT is working faster than Cooley-Tukey FFT,
consequently it is useful for higher sampling rates. At the same time we also confirm that Virtex-5 is better
platform, for the same architectures, among all these for implementing Grigoryan FFT (FFT algorithm
under evaluation), as Virtex-5 FPGAs give highest speed of operation for higher sampling rates of FFT.
Operating at higher sampling rates is a challenge in DSP applications.
Keywords
Frequency analysis, fast algorithms, DFT, FFT, paired transforms.
1. INTRODUCTION
In the recent decades, fast orthogonal transforms have been widely used in areas of data
compression, pattern recognition and image reconstruction, interpolation, linear filtering, and
spectral analysis. The suitability of unitary transforms in each of the above applications depends
on the properties of their basis functions as well as on the existence of fast algorithms, including
parallel ones. Since the introduction of the Fast Fourier Transform (FFT), Fourier analysis has
become one of the most frequently used tool in signal/image processing and communication
systems; The main problem when calculating the transform relates to construction of the
The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013
86
decomposition, namely, the transition to the short DFT’s with minimal computational complexity.
The computation of unitary transforms is complicated and time consuming process. Since the
decomposition of the DFT is not unique, it is natural to ask how to manage splitting and how to
obtain the fastest algorithm of the DFT. The difference between the lower bound of arithmetical
operations and the complexity of fast transform algorithms shows that it is possible to obtain FFT
algorithms of various speed [2]. One approach is to design efficient manageable split algorithms.
Indeed, many algorithms make different assumptions about the transform length. The
signal/image processing related to engineering research becomes increasingly dependent on the
development and implementation of the algorithms of orthogonal or non-orthogonal transforms
and convolution operations in modern computer systems. The increasing importance of
processing large vectors and parallel computing in many scientific and engineering applications
require new ideas for designing super-efficient algorithms of the transforms and their
implementations [2].
In this paper we present the implementation techniques and their results for two different fast
DFT algorithms. The difference between the algorithm development lies in the way the two
algorithms use the splitting of the DFT. The two fast algorithms considered are radix-2 Cooley-
Tukey FFT and paired transform [2] based Grigoryan FFT algorithms. The implementation of the
algorithms is done on the Xilinx Viretx-II Pro [3], Virtex-5 [4] and Virtex-4 [6] FPGAs. The
performance of the two algorithms is compared in terms of their sampling rates and also in terms
of their hardware resource utilization and also in terms of their percentage speed improvements. It
is also discussed regarding their speed improvements for Virtex-5 and Virtex-4 over Virtex-II pro.
Section 2 presents the paired transform decomposition used in the development of Grigoryan
FFT. Section 3 presents the implementation techniques for the radix-2 (Cooley-Tukey FFT) and
paired transform (Grigoryan FFT) algorithms on FPGAs. Section 4 presents the results. Finally
with the Section 5 we conclude the work and put forward some suggestions for further sampling
rate improvements.
2. DECOMPOSITION ALGORITHM OF THE FAST DFT USING PAIRED
TRANSFORM
In this algorithm the decomposition of the DFT is done by using the paired transform [2]. Let
{ )(nx }, n = 0:(N-1) be an input signal, N>1. Then the DFT of the input sequence )(nx is
X(k) = ∑
−
=
1
0
)(
N
n
nk
NWnx , k =0:(N-1) (1)
which is in matrix form
X = xFN ][ (2)
where X(k) and )(nx are column vectors, the matrix NF =
)1:0(, −= Nkn
nk
NW , is a permutation of
X.
]][]}[[],.....,[],{[ '
21 NNkNNN WFFFdiagF χ
−−−−
= (3)
which shows the applying transform is decomposed into short transforms NiF , i = 1: k. Let FS be
the domain of the transform F the set of sequences f over which F is defined. Let (D;σ ) be a
class of unitary transforms revealed by a partition σ . For any transform ∈F (D;σ ), the
The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013
87
computation is performed by using paired transform in this particular algorithm. To denote this
type of transform, we introduce “paired functions [2].”
Let ∈tp, period N, and let


 =
=
.;0
);(mod;1
)(,
otherwise
Ntnp
ntpχ n = 0: (N-1) (4)
Let L be a non trivial factor of the number N, and LW = L
e /2Π
, then the complex function
lkNtp
L
k
k
LLtptp W /,
1
0
'
;,
,
, +
−
=
∑== χχχ (5)
t = 0:( )1/ −LN , p€ to the period 0: N-1
is called L-paired function [2]. Basing on this paired functions the complete system of paired
functions can be constructed. The totality of the paired functions in the case of N=2r
is
{{ }1)},1(:0),12(:0; 1
2,2
−=−= −−
rnt nr
tnnχ (6)
Now considering the case of N = 2r
N1, where N1 is odd, r ≥ 1 for the application of the paired
transform.
a) The totality of the partitions is
),.....,,,( '
2;2
'
2;4
'
2;2
'
2;1
'
rTTTT=σ
b) The splitting of NF by the partition '
σ is
{ },,.....,, 12/4/2/ NNNN FFFF r
c) The matrix of the transform can be represented as
[ NF ] =
1
[ ]
r
n
LnF
=
 
  
 
⊕ [W ][ '
nχ ]
.,2/ 1
1
NLNL r
n
n == +
Where the [W ] is diagonal matrix of the twiddle factors. The steps involved in finding the DFT
using the paired transform are given below:
1) Perform the L-paired transform g = )('
xNχ over the input x
2) Compose r vectors by dividing the set of outputs so that the first 1−r
L elements
g1,g2,…..,gL
r-1
compose the first vector X1, the next Lr-2
elements gLr-1+1,…..,gLr-1+Lr-2
compose the second vector X2, etc.
3) Calculate the new vectors Yk, k=1:(r-1) by multiplying element-wise the vectors Xk by
the corresponding turned factors 1, )(,,...,, 1/2 krLt
ttt LtWWW −−
= ,(t=Lr-k). Take Yr=Xr
4) Perform the Lr-k-point DFT’s over Yk, k=1: r
5) Make the permutation of outputs, if needed.
The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013
88
3. IMPLEMENTATION TECHNIQUES
We have implemented various architectures for radix-2 and paired transform processors on
Virtex-II Pro,Virtex-5 and Virtex-4 FPGAs. As there are embedded multipliers [3] and embedded
block RAMs [3] available, we can use them without using distributed logic, which economize
some of the CLBs [3]. Virtex-5 [4] is having DSP48E slices As we are having DSP48E slices on
Virtex-5 FPGAs, to utilize them and improve speed performance of these 2 FFTs and to compare
their speed performances on Virtex-5 FPGAs and Virtex-II pro FPGAs. We did the same for
Virtex-4 FPGAs to observe the performance of both algorithms on Virtex-II pro and Virtex-4 by
utilizing XtremeDSP slices. As most of the transforms are applied on complex data, the arithmetic
unit always needs two data points at a time for each operand (real part and complex part),
dualport RAMs are very useful in all these implementation techniques.
In the Fast Fourier Transform process the butterfly operation is the main unit on which the speed
of the whole process of the FFT depends. So the faster the butterfly operation, the faster the FFT
process. The adders and subtractors are implemented using the LUTs (distributed arithmetic). The
inputs and outputs of all the arithmetic units can be registered or non-registered.
Various possible implementations of multipliers we considered are:
Embedded multiplier:
a) With non-registered inputs and outputs
b) With registered inputs or outputs, and
c) With registered inputs and outputs.
Distributed multiplier: Distributed multipliers are implemented using the LUTs in the CLBs.
These can also be implemented with the above three possible ways. Various considerations made
to implement butterfly operation for its speed improvement and resource requirements. Basing on
the availability of number of Embedded multipliers and design feasibility we have implemented
both multiplication processes.
The various architectures proposed for implementing radix-2 and paired transform processors are
single memory (pair) architecture, dual memory (pair) architecture and multiple memory (pair)
architectures. We applied the following two best butterfly techniques for the implementation of
the processors on the FPGAs [3].
1. One with Distributed multipliers, with fully pipelined stages. (Best in case of
performance)
2. One with embedded multipliers and one level pipelining. (Best in case of resource
utilization)
Single memory (pair) architecture (shown in Figure 1) is suitable for single snapshot applications,
where samples are acquired and processed thereafter. The processing time is typically greater than
the acquisition time. The main disadvantage in this architecture is while doing the transform
process we cannot load the next coming data. We have to wait until the current data is processed.
So we proposed dual memory (pair) architecture for faster sampling rate applications (shown in
Figure 2). In this architecture there are three main processes for the transformation of the sampled
data. Loading the sampled data into the memories, processing the loaded data, Reading out the
processed data. As there are two pairs of dual port memories available, one pair can be used for
loading the incoming sampled data, while at the same time the other pair can be used for
processing the previously loaded sampled data. For further sampling rate improvements we
proposed multiple memory (pair) architecture (shown in Figure 3). This is the best of all
The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013
89
architectures in case of very high sampling rate applications, but in case of hardware utilization it
uses lot more resources than any other architecture. In this model there is a memory set, one
arithmetic unit for each iteration. The advantage of this model over the previous models is that we
do not need to wait until the end of all iterations (i.e. whole FFT process), to take the next set of
samples to get the FFT process to be started again. We just need to wait until the end of the first
iteration and then load the memory with the next set of samples and start the process again. After
the first iteration the processed data is transferred to the next set of RAMs, so the previous set of
RAMs can be loaded with the next coming new data samples. This leads to the increased
sampling rate.
Coming to the implementation of the paired transform based DFT algorithm, there is no complete
butterfly operation, as that in case of radix-2 algorithm. According to the mathematical
description given in the Section 2, the arithmetic unit is divided into two parts, addition part and
multiplication part. This makes the main difference between the two algorithms, which causes the
process of the DFT completes earlier than the radix-2 algorithm. The addition part of the
algorithm for 8-point transform is shown in Figure 4. The architectures are implemented for the 8-
point 64-point,128-point,256-point transforms for Viretx-II Pro and Virex-5 and Virtex-4
FPGAs. The radix-2 FFT algorithm is efficient in case of resource utilization and the paired
transform algorithm is very efficient in case of higher sampling rate applications.
Figure 1. Single memory (pair) architecture
The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013
90
Figure 2. Dual memory (pair) architecture
Figure 3. Multiple memory (pair) architecture
(Transform length = N = 2n
)
(1,2);(3,4);(5,6) ---- (-,-) memory pairs for each iteration.
----- Butterfly unit for each iteration.
x1,0
x1,1
x 1,2
x 1,3
x 2,0
x2,2
x4,0
x0,0
x(0)
x(1)
x(2)
x(3)
x(4)
x(5)
x(6)
x(7)
add operation
subtract operation
Figure 4. Figure showing the addition part of the 8-point paired transform based DFT
The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013
91
3. THE IMPLEMENTATION RESULTS
Results obtained on Virtex-II Pro and Virtex-5 and Virtex-4 FPGAs: The hardware modeling of
the algorithms is done by using Xilinx’s system generator plug-in software tool running under
SIMULINK environment provided under the Mathworks’s MATLAB software. The functionality
of the model is verified using the SIMULINK Simulator and the MODELSIM software as well.
The implementation is done using the Xilinx project navigator backend software tools.
Table 1 shows the implementation results of the two algorithms on the Virtex-II Pro FPGAs.
Table 2 shows the implementation results of the two algorithms on Virtex-5 FPGAs using
DSP48E slices. Table 1 and 2 also show the percentage improvement in speed of Grigoryan FFT
over Cooley-Tukey FFT. Table 3 shows the implementation results and percentage speed
improvement in speed of Grigorayn FFT on Virtex-4 FPGAs using ExtremeDSP Slices. Table 4
shows the percentage improvement in speed of operation over Virtex-II Pro FPGAs, of Virtex-5
and Virtex-4; of both Cooley-Tukey and Grigoryan FFT algorithms.
From Tables 1, 2 and 3 we can see that Grigoryan FFT is always faster than the Cooley-Tukey
FFT algorithm. Thus paired-transform based algorithm can be used for higher sampling rate
applications. In military applications, while doing the process, only some of the DFT coefficients
are needed at a time. For this type of applications paired transform can be used as it generates
some of the coefficients earlier, and also it is very fast. But in terms of resource utilization
Grigoryan FFT is utilizing more resources than Cooley-Tukey FFT, which makes Grigoryan FFT
to be more expensive, but with higher speeds of operation. From Table 4 results we can easily
verify that Grigoryan FFT can be utilized at higher sampling rates than Cooley-Tukey FFT. It is
also clear that the high speed DSP slices are making the FFTs much faster than Virtex-II Pro
FPGAs, both in case of Virtex-4 and Virtex-5 FPGAs. From the Table 4 we can also observe that
for the same architectures the Virtex-5 platform is giving better speed of operation than Virtex-II
pro and also on Virtex-4 FPGAs. So in the case of Grigoryan FFT, which is under performance
evaluations in this research, it is preferable to choose Virtex-5 FPGA platforms.
Table 1. Efficient performance of Grigoryan FFT over Cooley-Tukey FFT, on Virtex-II Pro FPGAs.
Table showing the sampling rates and the resource utilization summaries for both the algorithms,
implemented on the Virtex-II Pro FPGAs.
No.
of
points
Cooley-Tukey radix-
2 FFT
Grigoryan Radix-2 FFT
% improvement
In speed from
Cooley-Tukey to
Grigoryan FFT
Max.
freq.
MHz
No.
of
mult.
No.
of
Slices
Max.Freq.
MHz
No.
of
mult
No.
of
slices
8 35 4 264 35 8 475 0
64 42.45 4 480 52.5 8 855 23.67
128 50.25 12 560 60 16 1248 19.40
256 55.40 24 648 68.75 48 1985 24.09
The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013
92
Table 2. Efficient performance of Grigoryan FFT over Cooley-Tukey FFT, on Virtex-5 FPGAs. Table
showing the sampling rates and the resource utilization summaries for both the algorithms, implemented
on the Virtex-5 FPGAs. We have utilized DSP48E slices in this, which is making us much faster than
Virtex-II Pro FPGAs.
No. of
points
Cooley-Tukey radix-2
FFT
Grigoryan Radix-2
FFT
% improvement
In speed from Cooley-
Tukey to Grigoryan
FFT
Max.
freq.
MHz
No.
of
Slice
s
No. of
DSP48
E
slices
Ma
x.
fre
q
M
Hz
No.
of
Slice
s
No.o
f
DSP
48E
slices
8 48 200 4 52 350 4 8.33
64 55.25 375 6 60 700 8 8.59
128 60.50 450 11 80 1000 26 32.23
256 75 560 14 85 1645 82 13.33
Table 3. Efficient performance of Grigoryan FFT over Cooley-Tukey FFT, on Virtex-4 FPGAs. Table
showing the sampling rates and the resource utilization summaries for both the algorithms, implemented on
the Virtex-4 FPGAs. We have utilized Extreme DSP slices in this, which is making us much faster than
Virtex-II Pro FPGAs.
No. of points Cooley-Tukey radix-2
FFT
Grigoryan Radix-2 FFT %
improvemen
t
In speed
from
Cooley-
Tukey to
Grigoryan
FFT
Max.
freq.
MHz
No.
of
Slic
-es
No. of
Extre
me
DSP
slices
Max.
freq
MHz
No.
of
Slice
s
No.of
DSP48
E
slices
8 41.38 258 4 50 450 6 20.83
64 48 400 7 56.09 800 8 16.85
128 53 496 12 72 1120 28 35.85
256 67 598 20 79.95 1735 86 19.33
The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013
93
Table 4. The percentage improvement in speed of operation over Virtex-II Pro FPGAs, of Virtex-5 and
Virtex-4; of both Cooley-Tukey and Grigoryan FFT algorithms. It shows clearly that Virtex-5 plat form is
better than Virtex-4 and also Virtex-II pro, in terms of speed of operation.
Number
Of
Sample
points
of FFT
% Improvement in speed over Xilinx Virtex-II Pro FPGAs
Cooley-Tukey FFT Grigoryan FFT
Virtex - 5 Virtex - 4 Virtex - 5 Virtex - 4
8 37.14 18.23 48.57 42.86
64 30.15 13.07 14.29 6.84
128 20.40 5.47 33.33 20
256 35.38 20.94 23.64 16.29
4. CONCLUSIONS AND FURTHER RESEARCH
In this paper we have shown that on Virtex-II Pro, Virtex-5 and Virtex-4 FPGAs the paired
transform based Grigoryan FFT algorithm is faster and can be used at higher sampling rates than
the Cooley-Tukey FFT at an expense of high resource utilization. Grigoryan FFT and also
Cooley-Tukey FFT both are running at higher speeds on Virtex-5 platform. So we ultimately
recommend these implementations to be done onto Virtex-5 FPGAs than onto any other FPGAs.
We are planning to implement these algorithms onto Latest TMS DSP processors, and also by
utilizing MAC engines on them; for extensive verification of these two FFT algorithms. Then we
would like to standardize Grigoryan FFT and apply for real time applications.
ACKNOWLEDGEMENTS
This is to thank Dr. Parimal A. Patel, Dr. Artyom M. Grigoryan of The University of Texas at San
Antonio; regarding their valuable guidance to do the first part of this research series as my (Mr.
Narayanam) master’s thesis. Their support will always be with me to make Grigoryan FFT as a
standarad FFT as our patent in the industrial market. I would like to thank also University of
Ottawa research facility providers to provide me to do some critical part of this extended research.
REFERENCES
[1] James W. Cooley and John W. Tukey, “An algorithm for the machine calculation of complex
Fourier Series”, Math. comput. 19, 297-301 (1965).
[2] Artyom M. Grigoryan and Sos S. Agaian, “Split Manageable Efficient Algorithm for Fourier and
Hadamard transforms”, IEEE Transactions on Signal Processing, Volume: 48, Issue: 1, Pages: 172 –
183, Jan.2000.
[3] Virtex-II Pro platform FPGAs: detailed description
http://www.xilinx.com/support/documentation/data_sheets/ds083.pdf
[4] Virtex-5 platform FPGAs: detailed description
http://www.xilinx.com/support/documentation/virtex-5_user_guides.htm
[5] N. Ranganadh, P. Patel, and A.M. Grigoryan, “Case study of Grigoryan FFT onto FPGAs and DSPs” ,
IEEE proceedings, IEEE ICECT 2012, 2012 4th International Conference on Electronics Computer
Technology (ICECT 2012), kanya kumari – 2012.
[6] Virtex-4 platform FPGAs: detailed description
http://www.xilinx.com/support/documentation/virtex-4_user_guides.htm
[7] The scientist and engineer’s guide to Digital Signal Processing, Dr. Steven W. Smith.
The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013
94
[8] N. Ranganadh, P. Patel, and A.M. Grigoryan, "Implementation of the DFT using radix-2 and paired
transform algorithms," 17th International Conference on Computer Applications in Industry and
Engineering, CAINE-2004, Orlando, Florida USA, Nov. 17-19, 2004.
[9] N. Ranganadh, P. Patel, and A.M. Grigoryan, “performances of Texas instruments DSP and Xilinx
FPGAs for Cooley-Tukey and Grigoryan FFT algorithms”, Wolters Kluwer and Medknow
International Journal of Engineering and Technology, Jul-Dec 2011,Vol 1,Issue 2.
Authors
Mr. Ranganadh Narayanam is an Assistant professor in the department of Electronics &
Communications Engineering in Bharat Institute of Engineering & Technology (BIET). This research is
continuation of the research done in the Univeristy of Texas at San Antonio under the guidance of Dr.
Parimal A. Patel, Dr. Artyom M. Grigoryan, as my master’s thesis. This current research was partly funded
by BIET. Mr.Narayanam, a research student in the area of “Brain Stem Speech Evoked Potentials” under
the guidance of Dr. Hilmi Dajani of University of Ottawa,Canada. He was also a research student in The
University of Texas at San Antonio under Dr. Parimal A Patel,Dr. Artyom M. Grigoryan, Dr Sos Again,
Dr. CJ Qian,. in the areas of signal processing and digital systems, control systems. He worked in the area
of Brian Imaging in University of California Berkeley. Mr. Narayanam is having around 5+2 years of full
time teaching & research experiences respectively; and more than 5 years of entry level research experience
and more than 15 publications. Mr. Narayanam’s research interests include neurological Signal & Image
processing, DSP software & Hardware design and implementations, neurotechnologies. Mr. Narayanam
was a TPGFS grant holder Through The University of Texas at San Antonio for the year 2003 to 2004. Mr.
Narayanam can also be contacted at rnara100@uottawa.ca, rnara100@gmail.com, rnara100@biet.ac.in,
ranganadh.narayanam@gmail.com
Mr. P. Muni Guravaiah is an Associate Professor & Admin. Incharge of Department of Electronics &
Communications Engineering in BIET. He is currently pursuing Ph.D. in Jawaharlal Nehru Technological
University Hyderabad, in the area of VLSI Design. His research interests include DSP; VLSI Design. He is
having around 12-15 years of accumulated Teaching and Research experiences.
Ms. Bindu Tushara D. is currently an Assistant Professor in Vignan Institute of Technology & Science
(VITS). She is a double gold medalist of Jawaharlal Nehru Technological University as the best out going
student of the year 2009. She has received medals from Governer of Andhara Pradesh. Her research
interests are DSP and Wireless Communications.

More Related Content

What's hot

Iaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd Iaetsd
 
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...IRJET Journal
 
64 point fft chip
64 point fft chip64 point fft chip
64 point fft chipShalyJ
 
Design of Processing Element (PE3) for Implementing Pipeline FFT Processor
Design of Processing Element (PE3) for Implementing Pipeline FFT Processor Design of Processing Element (PE3) for Implementing Pipeline FFT Processor
Design of Processing Element (PE3) for Implementing Pipeline FFT Processor ijcisjournal
 
A comprehensive study on Applications of Vedic Multipliers in signal processing
A comprehensive study on Applications of Vedic Multipliers in signal processingA comprehensive study on Applications of Vedic Multipliers in signal processing
A comprehensive study on Applications of Vedic Multipliers in signal processingIRJET Journal
 
論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...
論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...
論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...Kusano Hitoshi
 
IRJET- A Study on Algorithms for FFT Computations
IRJET- A Study on Algorithms for FFT ComputationsIRJET- A Study on Algorithms for FFT Computations
IRJET- A Study on Algorithms for FFT ComputationsIRJET Journal
 
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...IRJET Journal
 
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMDESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMsipij
 
Iaetsd finger print recognition by cordic algorithm and pipelined fft
Iaetsd finger print recognition by cordic algorithm and pipelined fftIaetsd finger print recognition by cordic algorithm and pipelined fft
Iaetsd finger print recognition by cordic algorithm and pipelined fftIaetsd Iaetsd
 
Iaetsd a novel vlsi dht algorithm for a highly modular and parallel
Iaetsd a novel vlsi dht algorithm for a highly modular and parallelIaetsd a novel vlsi dht algorithm for a highly modular and parallel
Iaetsd a novel vlsi dht algorithm for a highly modular and parallelIaetsd Iaetsd
 
HIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFTHIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFTAM Publications
 
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...vtunotesbysree
 
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDADynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDAIJERA Editor
 

What's hot (20)

Iaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization and
 
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
 
427 432
427 432427 432
427 432
 
Gn3311521155
Gn3311521155Gn3311521155
Gn3311521155
 
64 point fft chip
64 point fft chip64 point fft chip
64 point fft chip
 
Design of Processing Element (PE3) for Implementing Pipeline FFT Processor
Design of Processing Element (PE3) for Implementing Pipeline FFT Processor Design of Processing Element (PE3) for Implementing Pipeline FFT Processor
Design of Processing Element (PE3) for Implementing Pipeline FFT Processor
 
Aw4102359364
Aw4102359364Aw4102359364
Aw4102359364
 
A comprehensive study on Applications of Vedic Multipliers in signal processing
A comprehensive study on Applications of Vedic Multipliers in signal processingA comprehensive study on Applications of Vedic Multipliers in signal processing
A comprehensive study on Applications of Vedic Multipliers in signal processing
 
論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...
論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...
論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...
 
IRJET- A Study on Algorithms for FFT Computations
IRJET- A Study on Algorithms for FFT ComputationsIRJET- A Study on Algorithms for FFT Computations
IRJET- A Study on Algorithms for FFT Computations
 
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
 
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMDESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
 
Iaetsd finger print recognition by cordic algorithm and pipelined fft
Iaetsd finger print recognition by cordic algorithm and pipelined fftIaetsd finger print recognition by cordic algorithm and pipelined fft
Iaetsd finger print recognition by cordic algorithm and pipelined fft
 
Dg34662666
Dg34662666Dg34662666
Dg34662666
 
Design Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless Application
Design Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless ApplicationDesign Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless Application
Design Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless Application
 
Iaetsd a novel vlsi dht algorithm for a highly modular and parallel
Iaetsd a novel vlsi dht algorithm for a highly modular and parallelIaetsd a novel vlsi dht algorithm for a highly modular and parallel
Iaetsd a novel vlsi dht algorithm for a highly modular and parallel
 
HIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFTHIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFT
 
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...
 
F011123134
F011123134F011123134
F011123134
 
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDADynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
 

Viewers also liked

An optimized framework for detection and tracking of video objects in challen...
An optimized framework for detection and tracking of video objects in challen...An optimized framework for detection and tracking of video objects in challen...
An optimized framework for detection and tracking of video objects in challen...ijma
 
The kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key findingThe kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key findingijma
 
Adaptive trilateral filter for hevc standard
Adaptive trilateral filter for hevc standardAdaptive trilateral filter for hevc standard
Adaptive trilateral filter for hevc standardijma
 
Column store decision tree classification of unseen attribute set
Column store decision tree classification of unseen attribute setColumn store decision tree classification of unseen attribute set
Column store decision tree classification of unseen attribute setijma
 
Mining in Ontology with Multi Agent System in Semantic Web : A Novel Approach
Mining in Ontology with Multi Agent System in Semantic Web : A Novel ApproachMining in Ontology with Multi Agent System in Semantic Web : A Novel Approach
Mining in Ontology with Multi Agent System in Semantic Web : A Novel Approachijma
 
System analysis and design for multimedia retrieval systems
System analysis and design for multimedia retrieval systemsSystem analysis and design for multimedia retrieval systems
System analysis and design for multimedia retrieval systemsijma
 
On deferred constraints in distributed database systems
On deferred constraints in distributed database systemsOn deferred constraints in distributed database systems
On deferred constraints in distributed database systemsijma
 
Integrated system for monitoring and recognizing students during class session
Integrated system for monitoring and recognizing students during class sessionIntegrated system for monitoring and recognizing students during class session
Integrated system for monitoring and recognizing students during class sessionijma
 
The application of computer aided learning to learn basic concepts of branchi...
The application of computer aided learning to learn basic concepts of branchi...The application of computer aided learning to learn basic concepts of branchi...
The application of computer aided learning to learn basic concepts of branchi...ijma
 
A new keyphrases extraction method based on suffix tree data structure for ar...
A new keyphrases extraction method based on suffix tree data structure for ar...A new keyphrases extraction method based on suffix tree data structure for ar...
A new keyphrases extraction method based on suffix tree data structure for ar...ijma
 
A survey on components of virtual 'umrah application
A survey on components of virtual 'umrah applicationA survey on components of virtual 'umrah application
A survey on components of virtual 'umrah applicationijma
 
Real time realistic illumination and rendering of cumulus clouds
Real time realistic illumination and rendering of cumulus cloudsReal time realistic illumination and rendering of cumulus clouds
Real time realistic illumination and rendering of cumulus cloudsijma
 
A robust audio watermarking in cepstrum domain composed of sample's relation ...
A robust audio watermarking in cepstrum domain composed of sample's relation ...A robust audio watermarking in cepstrum domain composed of sample's relation ...
A robust audio watermarking in cepstrum domain composed of sample's relation ...ijma
 
Performance analysis of image
Performance analysis of imagePerformance analysis of image
Performance analysis of imageijma
 
Indoor 3 d video monitoring using multiple kinect depth cameras
Indoor 3 d video monitoring using multiple kinect depth camerasIndoor 3 d video monitoring using multiple kinect depth cameras
Indoor 3 d video monitoring using multiple kinect depth camerasijma
 
Satellite image compression algorithm based on the fft
Satellite image compression algorithm based on the fftSatellite image compression algorithm based on the fft
Satellite image compression algorithm based on the fftijma
 
Consideration of reputation prediction of ladygaga using the mathematical mod...
Consideration of reputation prediction of ladygaga using the mathematical mod...Consideration of reputation prediction of ladygaga using the mathematical mod...
Consideration of reputation prediction of ladygaga using the mathematical mod...ijma
 
ALGORITHM FOR IMAGE MIXING AND ENCRYPTION
ALGORITHM FOR IMAGE MIXING AND ENCRYPTIONALGORITHM FOR IMAGE MIXING AND ENCRYPTION
ALGORITHM FOR IMAGE MIXING AND ENCRYPTIONijma
 
O desenvolvimento geográfico desigual da suzano papel e celulose no maranhão
O desenvolvimento geográfico desigual da suzano papel e celulose no maranhãoO desenvolvimento geográfico desigual da suzano papel e celulose no maranhão
O desenvolvimento geográfico desigual da suzano papel e celulose no maranhãoajr_tyler
 
Cópia (2) de esc construtivismo
Cópia (2) de esc construtivismoCópia (2) de esc construtivismo
Cópia (2) de esc construtivismofernandodiasnet
 

Viewers also liked (20)

An optimized framework for detection and tracking of video objects in challen...
An optimized framework for detection and tracking of video objects in challen...An optimized framework for detection and tracking of video objects in challen...
An optimized framework for detection and tracking of video objects in challen...
 
The kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key findingThe kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key finding
 
Adaptive trilateral filter for hevc standard
Adaptive trilateral filter for hevc standardAdaptive trilateral filter for hevc standard
Adaptive trilateral filter for hevc standard
 
Column store decision tree classification of unseen attribute set
Column store decision tree classification of unseen attribute setColumn store decision tree classification of unseen attribute set
Column store decision tree classification of unseen attribute set
 
Mining in Ontology with Multi Agent System in Semantic Web : A Novel Approach
Mining in Ontology with Multi Agent System in Semantic Web : A Novel ApproachMining in Ontology with Multi Agent System in Semantic Web : A Novel Approach
Mining in Ontology with Multi Agent System in Semantic Web : A Novel Approach
 
System analysis and design for multimedia retrieval systems
System analysis and design for multimedia retrieval systemsSystem analysis and design for multimedia retrieval systems
System analysis and design for multimedia retrieval systems
 
On deferred constraints in distributed database systems
On deferred constraints in distributed database systemsOn deferred constraints in distributed database systems
On deferred constraints in distributed database systems
 
Integrated system for monitoring and recognizing students during class session
Integrated system for monitoring and recognizing students during class sessionIntegrated system for monitoring and recognizing students during class session
Integrated system for monitoring and recognizing students during class session
 
The application of computer aided learning to learn basic concepts of branchi...
The application of computer aided learning to learn basic concepts of branchi...The application of computer aided learning to learn basic concepts of branchi...
The application of computer aided learning to learn basic concepts of branchi...
 
A new keyphrases extraction method based on suffix tree data structure for ar...
A new keyphrases extraction method based on suffix tree data structure for ar...A new keyphrases extraction method based on suffix tree data structure for ar...
A new keyphrases extraction method based on suffix tree data structure for ar...
 
A survey on components of virtual 'umrah application
A survey on components of virtual 'umrah applicationA survey on components of virtual 'umrah application
A survey on components of virtual 'umrah application
 
Real time realistic illumination and rendering of cumulus clouds
Real time realistic illumination and rendering of cumulus cloudsReal time realistic illumination and rendering of cumulus clouds
Real time realistic illumination and rendering of cumulus clouds
 
A robust audio watermarking in cepstrum domain composed of sample's relation ...
A robust audio watermarking in cepstrum domain composed of sample's relation ...A robust audio watermarking in cepstrum domain composed of sample's relation ...
A robust audio watermarking in cepstrum domain composed of sample's relation ...
 
Performance analysis of image
Performance analysis of imagePerformance analysis of image
Performance analysis of image
 
Indoor 3 d video monitoring using multiple kinect depth cameras
Indoor 3 d video monitoring using multiple kinect depth camerasIndoor 3 d video monitoring using multiple kinect depth cameras
Indoor 3 d video monitoring using multiple kinect depth cameras
 
Satellite image compression algorithm based on the fft
Satellite image compression algorithm based on the fftSatellite image compression algorithm based on the fft
Satellite image compression algorithm based on the fft
 
Consideration of reputation prediction of ladygaga using the mathematical mod...
Consideration of reputation prediction of ladygaga using the mathematical mod...Consideration of reputation prediction of ladygaga using the mathematical mod...
Consideration of reputation prediction of ladygaga using the mathematical mod...
 
ALGORITHM FOR IMAGE MIXING AND ENCRYPTION
ALGORITHM FOR IMAGE MIXING AND ENCRYPTIONALGORITHM FOR IMAGE MIXING AND ENCRYPTION
ALGORITHM FOR IMAGE MIXING AND ENCRYPTION
 
O desenvolvimento geográfico desigual da suzano papel e celulose no maranhão
O desenvolvimento geográfico desigual da suzano papel e celulose no maranhãoO desenvolvimento geográfico desigual da suzano papel e celulose no maranhão
O desenvolvimento geográfico desigual da suzano papel e celulose no maranhão
 
Cópia (2) de esc construtivismo
Cópia (2) de esc construtivismoCópia (2) de esc construtivismo
Cópia (2) de esc construtivismo
 

Similar to Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tukey FFT Using Xilinx Virtex-II Pro, Virtex-5 And Virtex-4 FPGAs

Design of Scalable FFT architecture for Advanced Wireless Communication Stand...
Design of Scalable FFT architecture for Advanced Wireless Communication Stand...Design of Scalable FFT architecture for Advanced Wireless Communication Stand...
Design of Scalable FFT architecture for Advanced Wireless Communication Stand...IOSRJECE
 
Iaetsd computational performances of ofdm using
Iaetsd computational performances of ofdm usingIaetsd computational performances of ofdm using
Iaetsd computational performances of ofdm usingIaetsd Iaetsd
 
Resourceful fast dht algorithm for vlsi implementation by split radix algorithm
Resourceful fast dht algorithm for vlsi implementation by split radix algorithmResourceful fast dht algorithm for vlsi implementation by split radix algorithm
Resourceful fast dht algorithm for vlsi implementation by split radix algorithmeSAT Publishing House
 
Review on low power high speed 32 point cyclotomic parallel FFT Processor
Review on low power high speed 32 point cyclotomic parallel FFT ProcessorReview on low power high speed 32 point cyclotomic parallel FFT Processor
Review on low power high speed 32 point cyclotomic parallel FFT ProcessorIRJET Journal
 
IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT
IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT
IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT VLSICS Design
 
High Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic MultiplierHigh Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic MultiplierIJERA Editor
 
Design of FFT Processor
Design of FFT ProcessorDesign of FFT Processor
Design of FFT ProcessorRohit Singh
 
Survey of Optimization of FFT processor for OFDM Receivers
Survey of Optimization of FFT processor for OFDM ReceiversSurvey of Optimization of FFT processor for OFDM Receivers
Survey of Optimization of FFT processor for OFDM Receiversijsrd.com
 
Iaetsd pipelined parallel fft architecture through folding transformation
Iaetsd pipelined parallel fft architecture through folding transformationIaetsd pipelined parallel fft architecture through folding transformation
Iaetsd pipelined parallel fft architecture through folding transformationIaetsd Iaetsd
 
Dft and its applications
Dft and its applicationsDft and its applications
Dft and its applicationsAgam Goel
 
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible Gate
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible GateIRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible Gate
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible GateIRJET Journal
 
FPGA based Efficient Interpolator design using DALUT Algorithm
FPGA based Efficient Interpolator design using DALUT AlgorithmFPGA based Efficient Interpolator design using DALUT Algorithm
FPGA based Efficient Interpolator design using DALUT Algorithmcscpconf
 

Similar to Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tukey FFT Using Xilinx Virtex-II Pro, Virtex-5 And Virtex-4 FPGAs (20)

Design of Scalable FFT architecture for Advanced Wireless Communication Stand...
Design of Scalable FFT architecture for Advanced Wireless Communication Stand...Design of Scalable FFT architecture for Advanced Wireless Communication Stand...
Design of Scalable FFT architecture for Advanced Wireless Communication Stand...
 
Iaetsd computational performances of ofdm using
Iaetsd computational performances of ofdm usingIaetsd computational performances of ofdm using
Iaetsd computational performances of ofdm using
 
G010233540
G010233540G010233540
G010233540
 
Fft
FftFft
Fft
 
Resourceful fast dht algorithm for vlsi implementation by split radix algorithm
Resourceful fast dht algorithm for vlsi implementation by split radix algorithmResourceful fast dht algorithm for vlsi implementation by split radix algorithm
Resourceful fast dht algorithm for vlsi implementation by split radix algorithm
 
Review on low power high speed 32 point cyclotomic parallel FFT Processor
Review on low power high speed 32 point cyclotomic parallel FFT ProcessorReview on low power high speed 32 point cyclotomic parallel FFT Processor
Review on low power high speed 32 point cyclotomic parallel FFT Processor
 
IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT
IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT
IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT
 
High Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic MultiplierHigh Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic Multiplier
 
Design of FFT Processor
Design of FFT ProcessorDesign of FFT Processor
Design of FFT Processor
 
Survey of Optimization of FFT processor for OFDM Receivers
Survey of Optimization of FFT processor for OFDM ReceiversSurvey of Optimization of FFT processor for OFDM Receivers
Survey of Optimization of FFT processor for OFDM Receivers
 
Iaetsd pipelined parallel fft architecture through folding transformation
Iaetsd pipelined parallel fft architecture through folding transformationIaetsd pipelined parallel fft architecture through folding transformation
Iaetsd pipelined parallel fft architecture through folding transformation
 
Dft and its applications
Dft and its applicationsDft and its applications
Dft and its applications
 
20120140506008 2
20120140506008 220120140506008 2
20120140506008 2
 
A new radix 4 fft algorithm
A new radix 4 fft algorithmA new radix 4 fft algorithm
A new radix 4 fft algorithm
 
Ppt on fft
Ppt on fftPpt on fft
Ppt on fft
 
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible Gate
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible GateIRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible Gate
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible Gate
 
B1030610
B1030610B1030610
B1030610
 
D0542130
D0542130D0542130
D0542130
 
My paper
My paperMy paper
My paper
 
FPGA based Efficient Interpolator design using DALUT Algorithm
FPGA based Efficient Interpolator design using DALUT AlgorithmFPGA based Efficient Interpolator design using DALUT Algorithm
FPGA based Efficient Interpolator design using DALUT Algorithm
 

Recently uploaded

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Recently uploaded (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tukey FFT Using Xilinx Virtex-II Pro, Virtex-5 And Virtex-4 FPGAs

  • 1. The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013 DOI : 10.5121/ijma.2013.5407 85 Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tukey FFT Using Xilinx Virtex-II Pro, Virtex-5 And Virtex-4 FPGAs Narayanam Ranganadh1 , Muni Guravaiah P2 , Bindu Tushara D3 1 Department of Electronics and Communications Engineering, Bharat Institute of Engineering and Technology, Hyderabad, A.P., India. rnara100@uottawa.ca,rnara100@biet.ac.in,rnara100@gmail.com 2 Department of Electronics and Communications Engineering, Bharat Institute of Engineering and Technology, Hyderabad, A.P., India. 3 Department of Electronics and Communications Engineering, Vignan Institute of Technology and Science, Hyderabad, A.P., India. ABSTRACT A large family of signal processing techniques consist of Fourier-transforming a signal, manipulating the Fourier-transformed data in a simple way, and reversing the transformation. We widely use Fourier frequency analysis in equalization of audio recordings, X-ray crystallography, artefact removal in Neurological signal and image processing, Voice Activity Detection in Brain stem speech evoked potentials, speech processing spectrograms are used to identify phonetic sounds and so on. Discrete Fourier Transform (DFT) is a principal mathematical method for the frequency analysis. The way of splitting the DFT gives out various fast algorithms. In this paper, we present the implementation of two fast algorithms for the DFT for evaluating their performance. One of them is the popular radix-2 Cooley-Tukey fast Fourier transform algorithm (FFT) [1] and the other one is the Grigoryan FFT based on the splitting by the paired transform [2]. We evaluate the performance of these algorithms by implementing them on the Xilinx Virtex-II pro [3], Virtex-5 [4] and Virtex-4 [6] FPGAs, by developing our own FFT processor architectures. Finally we show that the Grigoryan FFT is working faster than Cooley-Tukey FFT, consequently it is useful for higher sampling rates. At the same time we also confirm that Virtex-5 is better platform, for the same architectures, among all these for implementing Grigoryan FFT (FFT algorithm under evaluation), as Virtex-5 FPGAs give highest speed of operation for higher sampling rates of FFT. Operating at higher sampling rates is a challenge in DSP applications. Keywords Frequency analysis, fast algorithms, DFT, FFT, paired transforms. 1. INTRODUCTION In the recent decades, fast orthogonal transforms have been widely used in areas of data compression, pattern recognition and image reconstruction, interpolation, linear filtering, and spectral analysis. The suitability of unitary transforms in each of the above applications depends on the properties of their basis functions as well as on the existence of fast algorithms, including parallel ones. Since the introduction of the Fast Fourier Transform (FFT), Fourier analysis has become one of the most frequently used tool in signal/image processing and communication systems; The main problem when calculating the transform relates to construction of the
  • 2. The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013 86 decomposition, namely, the transition to the short DFT’s with minimal computational complexity. The computation of unitary transforms is complicated and time consuming process. Since the decomposition of the DFT is not unique, it is natural to ask how to manage splitting and how to obtain the fastest algorithm of the DFT. The difference between the lower bound of arithmetical operations and the complexity of fast transform algorithms shows that it is possible to obtain FFT algorithms of various speed [2]. One approach is to design efficient manageable split algorithms. Indeed, many algorithms make different assumptions about the transform length. The signal/image processing related to engineering research becomes increasingly dependent on the development and implementation of the algorithms of orthogonal or non-orthogonal transforms and convolution operations in modern computer systems. The increasing importance of processing large vectors and parallel computing in many scientific and engineering applications require new ideas for designing super-efficient algorithms of the transforms and their implementations [2]. In this paper we present the implementation techniques and their results for two different fast DFT algorithms. The difference between the algorithm development lies in the way the two algorithms use the splitting of the DFT. The two fast algorithms considered are radix-2 Cooley- Tukey FFT and paired transform [2] based Grigoryan FFT algorithms. The implementation of the algorithms is done on the Xilinx Viretx-II Pro [3], Virtex-5 [4] and Virtex-4 [6] FPGAs. The performance of the two algorithms is compared in terms of their sampling rates and also in terms of their hardware resource utilization and also in terms of their percentage speed improvements. It is also discussed regarding their speed improvements for Virtex-5 and Virtex-4 over Virtex-II pro. Section 2 presents the paired transform decomposition used in the development of Grigoryan FFT. Section 3 presents the implementation techniques for the radix-2 (Cooley-Tukey FFT) and paired transform (Grigoryan FFT) algorithms on FPGAs. Section 4 presents the results. Finally with the Section 5 we conclude the work and put forward some suggestions for further sampling rate improvements. 2. DECOMPOSITION ALGORITHM OF THE FAST DFT USING PAIRED TRANSFORM In this algorithm the decomposition of the DFT is done by using the paired transform [2]. Let { )(nx }, n = 0:(N-1) be an input signal, N>1. Then the DFT of the input sequence )(nx is X(k) = ∑ − = 1 0 )( N n nk NWnx , k =0:(N-1) (1) which is in matrix form X = xFN ][ (2) where X(k) and )(nx are column vectors, the matrix NF = )1:0(, −= Nkn nk NW , is a permutation of X. ]][]}[[],.....,[],{[ ' 21 NNkNNN WFFFdiagF χ −−−− = (3) which shows the applying transform is decomposed into short transforms NiF , i = 1: k. Let FS be the domain of the transform F the set of sequences f over which F is defined. Let (D;σ ) be a class of unitary transforms revealed by a partition σ . For any transform ∈F (D;σ ), the
  • 3. The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013 87 computation is performed by using paired transform in this particular algorithm. To denote this type of transform, we introduce “paired functions [2].” Let ∈tp, period N, and let    = = .;0 );(mod;1 )(, otherwise Ntnp ntpχ n = 0: (N-1) (4) Let L be a non trivial factor of the number N, and LW = L e /2Π , then the complex function lkNtp L k k LLtptp W /, 1 0 ' ;, , , + − = ∑== χχχ (5) t = 0:( )1/ −LN , p€ to the period 0: N-1 is called L-paired function [2]. Basing on this paired functions the complete system of paired functions can be constructed. The totality of the paired functions in the case of N=2r is {{ }1)},1(:0),12(:0; 1 2,2 −=−= −− rnt nr tnnχ (6) Now considering the case of N = 2r N1, where N1 is odd, r ≥ 1 for the application of the paired transform. a) The totality of the partitions is ),.....,,,( ' 2;2 ' 2;4 ' 2;2 ' 2;1 ' rTTTT=σ b) The splitting of NF by the partition ' σ is { },,.....,, 12/4/2/ NNNN FFFF r c) The matrix of the transform can be represented as [ NF ] = 1 [ ] r n LnF =        ⊕ [W ][ ' nχ ] .,2/ 1 1 NLNL r n n == + Where the [W ] is diagonal matrix of the twiddle factors. The steps involved in finding the DFT using the paired transform are given below: 1) Perform the L-paired transform g = )(' xNχ over the input x 2) Compose r vectors by dividing the set of outputs so that the first 1−r L elements g1,g2,…..,gL r-1 compose the first vector X1, the next Lr-2 elements gLr-1+1,…..,gLr-1+Lr-2 compose the second vector X2, etc. 3) Calculate the new vectors Yk, k=1:(r-1) by multiplying element-wise the vectors Xk by the corresponding turned factors 1, )(,,...,, 1/2 krLt ttt LtWWW −− = ,(t=Lr-k). Take Yr=Xr 4) Perform the Lr-k-point DFT’s over Yk, k=1: r 5) Make the permutation of outputs, if needed.
  • 4. The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013 88 3. IMPLEMENTATION TECHNIQUES We have implemented various architectures for radix-2 and paired transform processors on Virtex-II Pro,Virtex-5 and Virtex-4 FPGAs. As there are embedded multipliers [3] and embedded block RAMs [3] available, we can use them without using distributed logic, which economize some of the CLBs [3]. Virtex-5 [4] is having DSP48E slices As we are having DSP48E slices on Virtex-5 FPGAs, to utilize them and improve speed performance of these 2 FFTs and to compare their speed performances on Virtex-5 FPGAs and Virtex-II pro FPGAs. We did the same for Virtex-4 FPGAs to observe the performance of both algorithms on Virtex-II pro and Virtex-4 by utilizing XtremeDSP slices. As most of the transforms are applied on complex data, the arithmetic unit always needs two data points at a time for each operand (real part and complex part), dualport RAMs are very useful in all these implementation techniques. In the Fast Fourier Transform process the butterfly operation is the main unit on which the speed of the whole process of the FFT depends. So the faster the butterfly operation, the faster the FFT process. The adders and subtractors are implemented using the LUTs (distributed arithmetic). The inputs and outputs of all the arithmetic units can be registered or non-registered. Various possible implementations of multipliers we considered are: Embedded multiplier: a) With non-registered inputs and outputs b) With registered inputs or outputs, and c) With registered inputs and outputs. Distributed multiplier: Distributed multipliers are implemented using the LUTs in the CLBs. These can also be implemented with the above three possible ways. Various considerations made to implement butterfly operation for its speed improvement and resource requirements. Basing on the availability of number of Embedded multipliers and design feasibility we have implemented both multiplication processes. The various architectures proposed for implementing radix-2 and paired transform processors are single memory (pair) architecture, dual memory (pair) architecture and multiple memory (pair) architectures. We applied the following two best butterfly techniques for the implementation of the processors on the FPGAs [3]. 1. One with Distributed multipliers, with fully pipelined stages. (Best in case of performance) 2. One with embedded multipliers and one level pipelining. (Best in case of resource utilization) Single memory (pair) architecture (shown in Figure 1) is suitable for single snapshot applications, where samples are acquired and processed thereafter. The processing time is typically greater than the acquisition time. The main disadvantage in this architecture is while doing the transform process we cannot load the next coming data. We have to wait until the current data is processed. So we proposed dual memory (pair) architecture for faster sampling rate applications (shown in Figure 2). In this architecture there are three main processes for the transformation of the sampled data. Loading the sampled data into the memories, processing the loaded data, Reading out the processed data. As there are two pairs of dual port memories available, one pair can be used for loading the incoming sampled data, while at the same time the other pair can be used for processing the previously loaded sampled data. For further sampling rate improvements we proposed multiple memory (pair) architecture (shown in Figure 3). This is the best of all
  • 5. The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013 89 architectures in case of very high sampling rate applications, but in case of hardware utilization it uses lot more resources than any other architecture. In this model there is a memory set, one arithmetic unit for each iteration. The advantage of this model over the previous models is that we do not need to wait until the end of all iterations (i.e. whole FFT process), to take the next set of samples to get the FFT process to be started again. We just need to wait until the end of the first iteration and then load the memory with the next set of samples and start the process again. After the first iteration the processed data is transferred to the next set of RAMs, so the previous set of RAMs can be loaded with the next coming new data samples. This leads to the increased sampling rate. Coming to the implementation of the paired transform based DFT algorithm, there is no complete butterfly operation, as that in case of radix-2 algorithm. According to the mathematical description given in the Section 2, the arithmetic unit is divided into two parts, addition part and multiplication part. This makes the main difference between the two algorithms, which causes the process of the DFT completes earlier than the radix-2 algorithm. The addition part of the algorithm for 8-point transform is shown in Figure 4. The architectures are implemented for the 8- point 64-point,128-point,256-point transforms for Viretx-II Pro and Virex-5 and Virtex-4 FPGAs. The radix-2 FFT algorithm is efficient in case of resource utilization and the paired transform algorithm is very efficient in case of higher sampling rate applications. Figure 1. Single memory (pair) architecture
  • 6. The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013 90 Figure 2. Dual memory (pair) architecture Figure 3. Multiple memory (pair) architecture (Transform length = N = 2n ) (1,2);(3,4);(5,6) ---- (-,-) memory pairs for each iteration. ----- Butterfly unit for each iteration. x1,0 x1,1 x 1,2 x 1,3 x 2,0 x2,2 x4,0 x0,0 x(0) x(1) x(2) x(3) x(4) x(5) x(6) x(7) add operation subtract operation Figure 4. Figure showing the addition part of the 8-point paired transform based DFT
  • 7. The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013 91 3. THE IMPLEMENTATION RESULTS Results obtained on Virtex-II Pro and Virtex-5 and Virtex-4 FPGAs: The hardware modeling of the algorithms is done by using Xilinx’s system generator plug-in software tool running under SIMULINK environment provided under the Mathworks’s MATLAB software. The functionality of the model is verified using the SIMULINK Simulator and the MODELSIM software as well. The implementation is done using the Xilinx project navigator backend software tools. Table 1 shows the implementation results of the two algorithms on the Virtex-II Pro FPGAs. Table 2 shows the implementation results of the two algorithms on Virtex-5 FPGAs using DSP48E slices. Table 1 and 2 also show the percentage improvement in speed of Grigoryan FFT over Cooley-Tukey FFT. Table 3 shows the implementation results and percentage speed improvement in speed of Grigorayn FFT on Virtex-4 FPGAs using ExtremeDSP Slices. Table 4 shows the percentage improvement in speed of operation over Virtex-II Pro FPGAs, of Virtex-5 and Virtex-4; of both Cooley-Tukey and Grigoryan FFT algorithms. From Tables 1, 2 and 3 we can see that Grigoryan FFT is always faster than the Cooley-Tukey FFT algorithm. Thus paired-transform based algorithm can be used for higher sampling rate applications. In military applications, while doing the process, only some of the DFT coefficients are needed at a time. For this type of applications paired transform can be used as it generates some of the coefficients earlier, and also it is very fast. But in terms of resource utilization Grigoryan FFT is utilizing more resources than Cooley-Tukey FFT, which makes Grigoryan FFT to be more expensive, but with higher speeds of operation. From Table 4 results we can easily verify that Grigoryan FFT can be utilized at higher sampling rates than Cooley-Tukey FFT. It is also clear that the high speed DSP slices are making the FFTs much faster than Virtex-II Pro FPGAs, both in case of Virtex-4 and Virtex-5 FPGAs. From the Table 4 we can also observe that for the same architectures the Virtex-5 platform is giving better speed of operation than Virtex-II pro and also on Virtex-4 FPGAs. So in the case of Grigoryan FFT, which is under performance evaluations in this research, it is preferable to choose Virtex-5 FPGA platforms. Table 1. Efficient performance of Grigoryan FFT over Cooley-Tukey FFT, on Virtex-II Pro FPGAs. Table showing the sampling rates and the resource utilization summaries for both the algorithms, implemented on the Virtex-II Pro FPGAs. No. of points Cooley-Tukey radix- 2 FFT Grigoryan Radix-2 FFT % improvement In speed from Cooley-Tukey to Grigoryan FFT Max. freq. MHz No. of mult. No. of Slices Max.Freq. MHz No. of mult No. of slices 8 35 4 264 35 8 475 0 64 42.45 4 480 52.5 8 855 23.67 128 50.25 12 560 60 16 1248 19.40 256 55.40 24 648 68.75 48 1985 24.09
  • 8. The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013 92 Table 2. Efficient performance of Grigoryan FFT over Cooley-Tukey FFT, on Virtex-5 FPGAs. Table showing the sampling rates and the resource utilization summaries for both the algorithms, implemented on the Virtex-5 FPGAs. We have utilized DSP48E slices in this, which is making us much faster than Virtex-II Pro FPGAs. No. of points Cooley-Tukey radix-2 FFT Grigoryan Radix-2 FFT % improvement In speed from Cooley- Tukey to Grigoryan FFT Max. freq. MHz No. of Slice s No. of DSP48 E slices Ma x. fre q M Hz No. of Slice s No.o f DSP 48E slices 8 48 200 4 52 350 4 8.33 64 55.25 375 6 60 700 8 8.59 128 60.50 450 11 80 1000 26 32.23 256 75 560 14 85 1645 82 13.33 Table 3. Efficient performance of Grigoryan FFT over Cooley-Tukey FFT, on Virtex-4 FPGAs. Table showing the sampling rates and the resource utilization summaries for both the algorithms, implemented on the Virtex-4 FPGAs. We have utilized Extreme DSP slices in this, which is making us much faster than Virtex-II Pro FPGAs. No. of points Cooley-Tukey radix-2 FFT Grigoryan Radix-2 FFT % improvemen t In speed from Cooley- Tukey to Grigoryan FFT Max. freq. MHz No. of Slic -es No. of Extre me DSP slices Max. freq MHz No. of Slice s No.of DSP48 E slices 8 41.38 258 4 50 450 6 20.83 64 48 400 7 56.09 800 8 16.85 128 53 496 12 72 1120 28 35.85 256 67 598 20 79.95 1735 86 19.33
  • 9. The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013 93 Table 4. The percentage improvement in speed of operation over Virtex-II Pro FPGAs, of Virtex-5 and Virtex-4; of both Cooley-Tukey and Grigoryan FFT algorithms. It shows clearly that Virtex-5 plat form is better than Virtex-4 and also Virtex-II pro, in terms of speed of operation. Number Of Sample points of FFT % Improvement in speed over Xilinx Virtex-II Pro FPGAs Cooley-Tukey FFT Grigoryan FFT Virtex - 5 Virtex - 4 Virtex - 5 Virtex - 4 8 37.14 18.23 48.57 42.86 64 30.15 13.07 14.29 6.84 128 20.40 5.47 33.33 20 256 35.38 20.94 23.64 16.29 4. CONCLUSIONS AND FURTHER RESEARCH In this paper we have shown that on Virtex-II Pro, Virtex-5 and Virtex-4 FPGAs the paired transform based Grigoryan FFT algorithm is faster and can be used at higher sampling rates than the Cooley-Tukey FFT at an expense of high resource utilization. Grigoryan FFT and also Cooley-Tukey FFT both are running at higher speeds on Virtex-5 platform. So we ultimately recommend these implementations to be done onto Virtex-5 FPGAs than onto any other FPGAs. We are planning to implement these algorithms onto Latest TMS DSP processors, and also by utilizing MAC engines on them; for extensive verification of these two FFT algorithms. Then we would like to standardize Grigoryan FFT and apply for real time applications. ACKNOWLEDGEMENTS This is to thank Dr. Parimal A. Patel, Dr. Artyom M. Grigoryan of The University of Texas at San Antonio; regarding their valuable guidance to do the first part of this research series as my (Mr. Narayanam) master’s thesis. Their support will always be with me to make Grigoryan FFT as a standarad FFT as our patent in the industrial market. I would like to thank also University of Ottawa research facility providers to provide me to do some critical part of this extended research. REFERENCES [1] James W. Cooley and John W. Tukey, “An algorithm for the machine calculation of complex Fourier Series”, Math. comput. 19, 297-301 (1965). [2] Artyom M. Grigoryan and Sos S. Agaian, “Split Manageable Efficient Algorithm for Fourier and Hadamard transforms”, IEEE Transactions on Signal Processing, Volume: 48, Issue: 1, Pages: 172 – 183, Jan.2000. [3] Virtex-II Pro platform FPGAs: detailed description http://www.xilinx.com/support/documentation/data_sheets/ds083.pdf [4] Virtex-5 platform FPGAs: detailed description http://www.xilinx.com/support/documentation/virtex-5_user_guides.htm [5] N. Ranganadh, P. Patel, and A.M. Grigoryan, “Case study of Grigoryan FFT onto FPGAs and DSPs” , IEEE proceedings, IEEE ICECT 2012, 2012 4th International Conference on Electronics Computer Technology (ICECT 2012), kanya kumari – 2012. [6] Virtex-4 platform FPGAs: detailed description http://www.xilinx.com/support/documentation/virtex-4_user_guides.htm [7] The scientist and engineer’s guide to Digital Signal Processing, Dr. Steven W. Smith.
  • 10. The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.4, August 2013 94 [8] N. Ranganadh, P. Patel, and A.M. Grigoryan, "Implementation of the DFT using radix-2 and paired transform algorithms," 17th International Conference on Computer Applications in Industry and Engineering, CAINE-2004, Orlando, Florida USA, Nov. 17-19, 2004. [9] N. Ranganadh, P. Patel, and A.M. Grigoryan, “performances of Texas instruments DSP and Xilinx FPGAs for Cooley-Tukey and Grigoryan FFT algorithms”, Wolters Kluwer and Medknow International Journal of Engineering and Technology, Jul-Dec 2011,Vol 1,Issue 2. Authors Mr. Ranganadh Narayanam is an Assistant professor in the department of Electronics & Communications Engineering in Bharat Institute of Engineering & Technology (BIET). This research is continuation of the research done in the Univeristy of Texas at San Antonio under the guidance of Dr. Parimal A. Patel, Dr. Artyom M. Grigoryan, as my master’s thesis. This current research was partly funded by BIET. Mr.Narayanam, a research student in the area of “Brain Stem Speech Evoked Potentials” under the guidance of Dr. Hilmi Dajani of University of Ottawa,Canada. He was also a research student in The University of Texas at San Antonio under Dr. Parimal A Patel,Dr. Artyom M. Grigoryan, Dr Sos Again, Dr. CJ Qian,. in the areas of signal processing and digital systems, control systems. He worked in the area of Brian Imaging in University of California Berkeley. Mr. Narayanam is having around 5+2 years of full time teaching & research experiences respectively; and more than 5 years of entry level research experience and more than 15 publications. Mr. Narayanam’s research interests include neurological Signal & Image processing, DSP software & Hardware design and implementations, neurotechnologies. Mr. Narayanam was a TPGFS grant holder Through The University of Texas at San Antonio for the year 2003 to 2004. Mr. Narayanam can also be contacted at rnara100@uottawa.ca, rnara100@gmail.com, rnara100@biet.ac.in, ranganadh.narayanam@gmail.com Mr. P. Muni Guravaiah is an Associate Professor & Admin. Incharge of Department of Electronics & Communications Engineering in BIET. He is currently pursuing Ph.D. in Jawaharlal Nehru Technological University Hyderabad, in the area of VLSI Design. His research interests include DSP; VLSI Design. He is having around 12-15 years of accumulated Teaching and Research experiences. Ms. Bindu Tushara D. is currently an Assistant Professor in Vignan Institute of Technology & Science (VITS). She is a double gold medalist of Jawaharlal Nehru Technological University as the best out going student of the year 2009. She has received medals from Governer of Andhara Pradesh. Her research interests are DSP and Wireless Communications.