IMPLEMENTATION OF CORDIC
ALGORITHM ON FPGA
HDL MINI PROJECT
Submitted By: PANDU RANGA M (M150213EC)
VIVEK KUMAR SHUKLA (M150149EC)
NIT CALICUT
IMPLEMENTATIONOFCORDICALGORITHMONFPGA|16-Nov-15
ABSTRACT
The aim of this project is to implement the CORDIC algorithm on an FPGA to calculate the sine
function. CORDIC (Coordinate Rotation Digital Computer) is a method for computing
trigonometric, exponential and other elementary mathematical functions. The algorithm is
implemented on the FPGA of the Spartan-3E Starter Kit. The input angle is taken from the
keyboard and the result is shown on the LCD module of the kit.
THEORY
FPGA
A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a
customer or a designer after manufacturing – hence "field-programmable".
FPGAs are programmable digital logic chips, which means they can be programmed to perform
almost any digital function. The designer describes a "logic function" on a computer, either by
drawing a schematic or by writing a text file that describes the function.
The "logic function" is then compiled on the computer using software provided by the FPGA
vendor. This creates a binary file that can be downloaded into the FPGA.
For the Spartan-3E, a cable is connected from the computer to the board and the binary file is
downloaded to the FPGA. The FPGA can be reprogrammed as many times as we want, with no
limit, and with a different functionality each time.
FPGAs lose their functionality when the power goes away (like RAM in a computer that loses its
content). The binary file has to be downloaded again when power returns to restore the functionality.
The Spartan-3E Starter Kit board highlights the unique features of the Spartan-3E FPGA family
and provides a convenient development board for embedded processing applications.
Spartan-3E FPGA specific features-
Parallel NOR Flash configuration
MultiBoot FPGA configuration from Parallel NOR Flash PROM
SPI serial Flash configuration
Embedded development
MicroBlaze™ 32-bit embedded RISC processor
PicoBlaze™ 8-bit embedded controller
DDR memory interfaces
2-line, 16-character LCD screen
PS/2 mouse or keyboard port
50 MHz clock oscillator
Eight discrete LEDs
Four slide switches
Four push-button switches
ABOUT CORDIC ALGORITHM
CORDIC (Coordinate Rotation Digital Computer) is a method for computing elementary
functions using minimal hardware: shifts, additions/subtractions and comparisons.
CORDIC works by rotating the coordinate system through constant angles until the remaining
angle is reduced to zero. The angle offsets are selected such that the operations on X and Y are
only shifts and adds.
This section describes the mathematics behind the CORDIC algorithm.
The CORDIC algorithm performs a planar rotation. Graphically, planar rotation means
transforming a vector (Xi, Yi) into a new vector (Xj, Yj).
[Figure: vector (Xi, Yi) rotated to (Xj, Yj) in the X-Y plane]
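Using a matrix form, a planar rotation for a vector of (Xi, Yi) is defined as
\[
\begin{bmatrix} X_j \\ Y_j \end{bmatrix}
=
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} X_i \\ Y_i \end{bmatrix}
\tag{1}
\]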
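The rotation through the angle θ can be executed in several steps, using an iterative process.
Each step completes a small part of the rotation. Many steps will compose one planar rotation.
A single step is defined by the following equation:
\[
\begin{bmatrix} X_{n+1} \\ Y_{n+1} \end{bmatrix}
=
\begin{bmatrix} \cos\theta_n & -\sin\theta_n \\ \sin\theta_n & \cos\theta_n \end{bmatrix}
\begin{bmatrix} X_n \\ Y_n \end{bmatrix}
\tag{2}
\]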
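Equation 2 can be modified by factoring out the cos θn term.
\[
\begin{bmatrix} X_{n+1} \\ Y_{n+1} \end{bmatrix}
=
\cos\theta_n
\begin{bmatrix} 1 & -\tan\theta_n \\ \tan\theta_n & 1 \end{bmatrix}
\begin{bmatrix} X_n \\ Y_n \end{bmatrix}
\tag{3}
\]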
Equation 3 requires three multiplies, compared to the four needed in equation 2.
Additional multipliers can be eliminated by selecting the angle steps such that the tangent of a
step is a power of 2. Multiplying or dividing by a power of 2 can be implemented using a simple
shift operation.
The angle for each step is given by
\[
\theta_n = \arctan\left(\frac{1}{2^n}\right)
\tag{4}
\]
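To make equation 4 concrete, the first few step angles can be evaluated by hand. The values below are rounded to three decimals, and the degree notation is used here only for readability; the summed limit of roughly 99.88° is the well-known range of angles this sequence of steps can reach.

```latex
\begin{align*}
\theta_0 &= \arctan(1)    = 45^\circ \\
\theta_1 &= \arctan(1/2)  \approx 26.565^\circ \\
\theta_2 &= \arctan(1/4)  \approx 14.036^\circ \\
\theta_3 &= \arctan(1/8)  \approx 7.125^\circ \\
\theta_4 &= \arctan(1/16) \approx 3.576^\circ \\
\sum_{n=0}^{\infty} \theta_n &\approx 99.88^\circ
\end{align*}
```

Each angle is a little more than half of the previous one, which is why any target angle inside the reachable range can be approached by adding or subtracting the steps in turn.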
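All iteration angles summed must equal the rotation angle θ.
\[
\sum_{n=0}^{\infty} S_n \theta_n = \theta
\tag{5}
\]
where
\[
S_n \in \{-1, +1\}
\tag{6}
\]
This results in the following equation for tan θn, with the direction of each step carried by the
sign Sn:
\[
\tan\theta_n = S_n 2^{-n}
\tag{7}
\]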
Combining equations 3 and 7 results in
\[
\begin{bmatrix} X_{n+1} \\ Y_{n+1} \end{bmatrix}
=
\cos\theta_n
\begin{bmatrix} 1 & -S_n 2^{-n} \\ S_n 2^{-n} & 1 \end{bmatrix}
\begin{bmatrix} X_n \\ Y_n \end{bmatrix}
\tag{8}
\]
Apart from the cos θn coefficient, the algorithm has been reduced to a few simple shifts and
additions. The coefficient can be eliminated by pre-computing the final result. The first step is to
rewrite the coefficient.
\[
\cos\theta_n = \cos\left(\arctan\left(\frac{1}{2^n}\right)\right)
\tag{9}
\]
The second step is to compute equation 9 for all values of 'n' and multiply the results, which
we will refer to as K.
\[
K = \frac{1}{P} = \prod_{n=0}^{\infty} \cos\left(\arctan\left(\frac{1}{2^n}\right)\right) \approx 0.607253
\tag{10}
\]
K is constant for all initial vectors and for all values of the rotation angle; it is normally referred
to as the aggregate constant. Its reciprocal P = 1/K (approx. 1.64676) is defined here because it is
also commonly used.
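As a quick numerical check of the constant K, the product can be built up term by term, using the identity cos(arctan x) = 1/sqrt(1 + x^2). The partial products K_0 to K_3 below are names introduced here for illustration only; the values are rounded to six decimals.

```latex
\begin{align*}
\cos\left(\arctan 2^{-n}\right) &= \frac{1}{\sqrt{1 + 2^{-2n}}} \\
K_0 &= \tfrac{1}{\sqrt{2}} \approx 0.707107 \\
K_1 &= K_0 \cdot \tfrac{1}{\sqrt{1.25}} \approx 0.707107 \cdot 0.894427 \approx 0.632456 \\
K_2 &= K_1 \cdot \tfrac{1}{\sqrt{1.0625}} \approx 0.632456 \cdot 0.970143 \approx 0.613572 \\
K_3 &= K_2 \cdot \tfrac{1}{\sqrt{1.015625}} \approx 0.613572 \cdot 0.992278 \approx 0.608834 \\
K &= \lim_{N \to \infty} K_N \approx 0.607253
\end{align*}
```

The later factors are very close to 1, so the product settles quickly and a fixed constant can be used regardless of the input angle.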
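We can now formulate the exact calculation the CORDIC performs.
\[
\begin{aligned}
X_j &= K\left(X_i\cos\theta - Y_i\sin\theta\right) \\
Y_j &= K\left(Y_i\cos\theta + X_i\sin\theta\right)
\end{aligned}
\tag{11}
\]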
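Because the coefficient K is pre-computed and taken into account at a later stage, equation 8 may
be written as
\[
\begin{bmatrix} X_{n+1} \\ Y_{n+1} \end{bmatrix}
=
\begin{bmatrix} 1 & -S_n 2^{-n} \\ S_n 2^{-n} & 1 \end{bmatrix}
\begin{bmatrix} X_n \\ Y_n \end{bmatrix}
\tag{12}
\]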
or as
\[
\begin{aligned}
X_{n+1} &= X_n - S_n 2^{-n}\, Y_n \\
Y_{n+1} &= Y_n + S_n 2^{-n}\, X_n
\end{aligned}
\tag{13}
\]
At this point a new variable called 'Z' is introduced. Z represents the part of the angle θ which
has not been rotated yet.
\[
Z_{n+1} = \theta - \sum_{i=0}^{n} S_i \theta_i
\tag{14}
\]
For every step of the rotation, Sn is computed as the sign of Zn.
\[
S_n =
\begin{cases}
-1 & \text{if } Z_n < 0 \\
+1 & \text{if } Z_n \ge 0
\end{cases}
\tag{15}
\]
Combining equations 5 and 15 results in a system which reduces the part of the angle θ that has
not yet been rotated to zero.
Or in a program-like style:
For n=0 to [inf]
  If (Z(n) >= 0) then
    Z(n + 1) := Z(n) - atan(1/2^n);
  Else
    Z(n + 1) := Z(n) + atan(1/2^n);
  End if;
End for;
The values atan(1/2^n) are pre-calculated and stored in a table. [inf] is replaced with the required
number of iterations, which is about 1 iteration per bit (16 iterations yield a 16-bit result).
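One way such a table could be produced is to let the tools compute it at elaboration time instead of typing the values by hand. The sketch below is only an illustration: the package name `cordic_atan_pkg`, the function `make_atan_table` and the choice of Q13 scaling (taken from the angle format named in this report's CORDIC entity) are assumptions, and it relies on the synthesis tool accepting `ieee.math_real` for constant calculation.

```vhdl
library ieee;
use ieee.math_real.all;

-- Hypothetical package: arctangent lookup table for the CORDIC steps.
package cordic_atan_pkg is
  constant ITERATIONS : positive := 16;
  type atan_table_t is array (0 to ITERATIONS - 1) of integer;
  -- Deferred constant; its value is computed in the package body.
  constant ATAN_Q13 : atan_table_t;
end package cordic_atan_pkg;

package body cordic_atan_pkg is

  -- Builds the table entry by entry: atan(2^-n) in radians, scaled by
  -- 2^13 and rounded to the nearest integer (Q13 is an assumption here).
  function make_atan_table return atan_table_t is
    variable t : atan_table_t;
  begin
    for n in t'range loop
      t(n) := integer(round(arctan(2.0 ** (-n)) * 8192.0));
    end loop;
    return t;
  end function make_atan_table;

  -- The first entries evaluate to 6434, 3798, 2007, 1019.
  constant ATAN_Q13 : atan_table_t := make_atan_table;

end package body cordic_atan_pkg;
```

Computing the table from a formula keeps the scaling in one place, so changing the word length only means changing the scale factor rather than re-deriving every entry.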
If we add the computation for X and Y we get the program-like style for the CORDIC core.
For n=0 to [inf]
  If (Z(n) >= 0) then
    X(n + 1) := X(n) - (Y(n)/2^n);
    Y(n + 1) := Y(n) + (X(n)/2^n);
    Z(n + 1) := Z(n) - atan(1/2^n);
  Else
    X(n + 1) := X(n) + (Y(n)/2^n);
    Y(n + 1) := Y(n) - (X(n)/2^n);
    Z(n + 1) := Z(n) + atan(1/2^n);
  End if;
End for;
This algorithm is commonly referred to as driving Z to zero. The CORDIC core computes:
\[
\left[X_j,\; Y_j,\; Z_j\right]
=
\left[P\left(X_i\cos Z_i - Y_i\sin Z_i\right),\; P\left(Y_i\cos Z_i + X_i\sin Z_i\right),\; 0\right]
\]
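The program-like loop can be written almost line for line as a behavioural VHDL function. The sketch below is an illustration under stated assumptions, not the design used in this report: the names `cordic_sketch_pkg` and `cordic_sine` are hypothetical, the angle is assumed to be in Q13 radians within about ±π/2, the result is in Q14, 14 iterations are used, and X starts at K (about 0.607253, i.e. 9949 in Q14) so that the gain P is cancelled without a final multiplication.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package cordic_sketch_pkg is
  -- Returns sin(angle) in Q14 for an angle given in Q13 radians.
  function cordic_sine(angle_q13 : signed(15 downto 0)) return signed;
end package cordic_sketch_pkg;

package body cordic_sketch_pkg is

  type atan_table_t is array (0 to 13) of integer;
  -- round(atan(2^-n) * 2^13) for n = 0 .. 13
  constant ATAN_Q13 : atan_table_t :=
    (6434, 3798, 2007, 1019, 511, 256, 128, 64, 32, 16, 8, 4, 2, 1);
  -- round(0.607253 * 2^14); pre-scaling X by K is an assumption of this sketch
  constant K_Q14 : integer := 9949;

  function cordic_sine(angle_q13 : signed(15 downto 0)) return signed is
    variable x      : signed(15 downto 0) := to_signed(K_Q14, 16);
    variable y      : signed(15 downto 0) := (others => '0');
    variable z      : signed(15 downto 0) := angle_q13;
    variable x_next : signed(15 downto 0);
  begin
    for n in 0 to 13 loop
      if z >= 0 then
        -- rotate counter-clockwise by atan(2^-n)
        x_next := x - shift_right(y, n);
        y      := y + shift_right(x, n);
        z      := z - ATAN_Q13(n);
      else
        -- rotate clockwise by atan(2^-n)
        x_next := x + shift_right(y, n);
        y      := y - shift_right(x, n);
        z      := z + ATAN_Q13(n);
      end if;
      x := x_next;  -- update X only after Y has used the old value
    end loop;
    return y;       -- Y converges towards sin(angle); X towards cos(angle)
  end function cordic_sine;

end package body cordic_sketch_pkg;
```

The temporary `x_next` matters: both updates in equation 13 must use the values from step n, so X cannot be overwritten before Y has been computed.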
LCD Display
An LCD (Liquid Crystal Display) screen is an electronic display module with a wide range of
applications. A 16x2 LCD is a very basic module that is commonly used in various
devices and circuits. These modules are preferred over seven-segment and other multi-segment
LEDs because LCDs are economical, are easily programmable, and have no limitation on
displaying special and even custom characters (unlike seven-segment displays), animations and so on.
A 16x2 LCD can display 16 characters per line on 2 lines. This LCD has
two registers, namely Command and Data.
The command register stores the command instructions given to the LCD. A command is an
instruction given to the LCD to do a predefined task such as initializing it, clearing its screen,
setting the cursor position or controlling the display. The data register stores the data to be
displayed on the LCD. The data is the ASCII value of the character to be displayed on the LCD.
The following diagram shows the ASCII value of each character that can be displayed on the LCD.
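Because the ASCII codes of the digits '0' to '9' are 0x30 to 0x39, a BCD digit can be turned into its character code by placing the nibble 0x3 in front of it, which is the same idea as the `x"3" & display_byte(...)` expressions in the display code of this report. A minimal sketch, with the hypothetical names `lcd_ascii_pkg` and `bcd_to_ascii`:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package, not part of the original design.
package lcd_ascii_pkg is
  -- Converts one BCD digit (0 to 9) to its 8-bit ASCII code.
  function bcd_to_ascii(bcd : std_logic_vector(3 downto 0))
    return std_logic_vector;
end package lcd_ascii_pkg;

package body lcd_ascii_pkg is

  function bcd_to_ascii(bcd : std_logic_vector(3 downto 0))
    return std_logic_vector is
  begin
    -- Upper nibble 0x3 selects the digit row of the ASCII table;
    -- the lower nibble is the digit itself (valid for 0 to 9 only).
    return "0011" & bcd;
  end function bcd_to_ascii;

end package body lcd_ascii_pkg;
```

For example, the BCD digit "0101" (5) maps to "00110101", which is 0x35, the ASCII code of the character '5'.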
Initialization of Display
The initialization sequence is simple and ideally suited to the highly efficient 8-bit PicoBlaze
embedded controller. After initialization, the PicoBlaze controller is available for more complex
control or computation beyond simply driving the display. After power-on, the display must be
initialized to establish the required communication protocol.
Power-On Initialization-
The initialization sequence first establishes that the FPGA application wishes to use the four-bit
data interface to the LCD as follows:
1. Wait 15 ms or longer, although the display is generally ready when the FPGA finishes
configuration. The 15 ms interval is 750,000 clock cycles at 50 MHz.
2. Write SF_D<11:8> = 0x3, pulse LCD_E High for 12 clock cycles.
3. Wait 4.1 ms or longer, which is 205,000 clock cycles at 50 MHz.
4. Write SF_D<11:8> = 0x3, pulse LCD_E High for 12 clock cycles.
5. Wait 100 μs or longer, which is 5,000 clock cycles at 50 MHz.
6. Write SF_D<11:8> = 0x3, pulse LCD_E High for 12 clock cycles.
7. Wait 40 μs or longer, which is 2,000 clock cycles at 50 MHz.
8. Write SF_D<11:8> = 0x2, pulse LCD_E High for 12 clock cycles.
9. Wait 40 μs or longer, which is 2,000 clock cycles at 50 MHz.
Display Configuration-
After the power-on initialization is completed, the four-bit interface is now established. The
next part of the sequence configures the display:
1. Issue a function set command, 0x28, to configure the display for operation on the
Spartan-3E Starter Kit board.
2. Issue an Entry Mode Set command, 0x06, to set the display to automatically
increment the address pointer.
3. Issue a Display On/Off command, 0x0C, to turn the display on and disable the
cursor and blinking.
4. Finally, issue a Clear Display command. Allow at least 1.64 ms (82,000 clock cycles)
after issuing this command.
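The cycle counts quoted in the two sequences above all follow from the 50 MHz clock (50 cycles per microsecond). One way to keep them readable in the HDL is to derive them from the clock frequency in a small package. The package and constant names below are hypothetical and are not the ones used in the LCD code of this report.

```vhdl
-- Hypothetical package: LCD delay constants derived from the clock frequency.
package lcd_timing_pkg is
  constant CLK_HZ        : natural := 50_000_000;           -- 50 MHz oscillator
  constant CYCLES_PER_US : natural := CLK_HZ / 1_000_000;   -- 50

  -- Power-on initialization delays
  constant T_POWER_ON    : natural := 15_000 * CYCLES_PER_US;  -- 15 ms  = 750,000
  constant T_WAIT_4_1MS  : natural := 4_100 * CYCLES_PER_US;   -- 4.1 ms = 205,000
  constant T_WAIT_100US  : natural := 100 * CYCLES_PER_US;     -- 100 us = 5,000
  constant T_WAIT_40US   : natural := 40 * CYCLES_PER_US;      -- 40 us  = 2,000

  -- Delay after the Clear Display command
  constant T_CLEAR       : natural := 1_640 * CYCLES_PER_US;   -- 1.64 ms = 82,000

  -- LCD_E pulse width used during initialization, in clock cycles
  constant T_ENABLE      : natural := 12;                      -- 240 ns at 20 ns/cycle
end package lcd_timing_pkg;
```

With constants like these, a counter comparison such as `i = T_POWER_ON` documents which datasheet delay is being waited for, and a different clock frequency only requires changing `CLK_HZ`.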
Writing Data to the Display
The board uses a 4-bit data interface to the character LCD. The following figure illustrates a
write operation to the LCD, showing the minimum times allowed for setup, hold, and enable
pulse length relative to the 50 MHz clock (20 ns period) provided on the board.
Figure 1: Timing diagram to show write operation
DESIGN AND DESCRIPTION
Parallel/Cascaded CORDIC Architecture
Combinational circuit
• Longer combinational delay, but processing time is reduced compared to the iterative circuit.
• Shifters have fixed shift amounts, so they can be implemented in the wiring.
• Constants can be hardwired instead of requiring storage space.
Figure 2: Parallel CORDIC
Parallel Pipelined CORDIC Architecture
The parallel CORDIC can be pipelined by inserting registers between the adder stages.
In most FPGA architectures registers are already present in each logic cell, so the pipeline
registers have little or no additional hardware cost.
The number of stages between pipeline registers can be chosen according to the clock
frequency of the system.
When operating with a longer clock period, power consumption in the later stages reduces due to
less switching activity in each clock period.
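A single pipelined stage of the kind described above could look like the following sketch. It is an illustration only: the entity name `cordic_pipe_stage`, its generics and the 16-bit signed ports are assumptions, and the arctangent constant is expected in the same Q13 scaling as the angle. The shift amount is a generic, so each instance has a fixed shift that costs only wiring.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: one registered CORDIC stage for iteration N.
entity cordic_pipe_stage is
  generic (
    N        : natural := 0;      -- iteration index, i.e. the fixed shift amount
    ATAN_Q13 : integer := 6434    -- atan(2^-N) in Q13 radians (6434 for N = 0)
  );
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    x_in  : in  signed(15 downto 0);
    y_in  : in  signed(15 downto 0);
    z_in  : in  signed(15 downto 0);
    x_out : out signed(15 downto 0);
    y_out : out signed(15 downto 0);
    z_out : out signed(15 downto 0)
  );
end entity cordic_pipe_stage;

architecture rtl of cordic_pipe_stage is
begin
  process (clk, reset)
  begin
    if reset = '1' then
      x_out <= (others => '0');
      y_out <= (others => '0');
      z_out <= (others => '0');
    elsif rising_edge(clk) then
      -- The adders feed the pipeline registers directly.
      if z_in >= 0 then
        x_out <= x_in - shift_right(y_in, N);
        y_out <= y_in + shift_right(x_in, N);
        z_out <= z_in - ATAN_Q13;
      else
        x_out <= x_in + shift_right(y_in, N);
        y_out <= y_in - shift_right(x_in, N);
        z_out <= z_in + ATAN_Q13;
      end if;
    end if;
  end process;
end architecture rtl;
```

Chaining instances with N = 0, 1, 2, ... and the matching arctangent constants gives one result per clock after an initial latency of one cycle per stage. Registering only every second or third stage is the trade-off mentioned above between clock frequency and latency.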
SCHEMATIC DIAGRAM
VHDL CODE
For CORDIC algorithm (sine)
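The port comments in the listing that follows give the fixed-point formats: the angle in radians in Q13 and the sine value in Q14. As a rough check of what those formats mean, the arithmetic for a 60° input (the value x"3C" that the LCD module in this report feeds to the converter) can be worked by hand. Rounding to the nearest integer is an assumption made here for illustration.

```latex
\begin{align*}
\theta &= 60^\circ \cdot \frac{\pi}{180^\circ} \approx 1.0472\ \text{rad} \\
\theta_{Q13} &= \operatorname{round}\left(1.0472 \cdot 2^{13}\right) \approx \operatorname{round}(8578.6) = 8579 \\
\sin\theta &\approx 0.86603 \\
(\sin\theta)_{Q14} &= \operatorname{round}\left(0.86603 \cdot 2^{14}\right) \approx \operatorname{round}(14189.0) = 14189
\end{align*}
```

Q13 leaves two integer bits plus sign in a 16-bit word, enough for angles up to π/2, while Q14 gives the sine one integer bit plus sign so that ±1.0 is still representable.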
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;
use work.packint.all;
entity cordic_algorithm is
port(clk,reset:in std_logic;
angle :in std_logic_vector(15 downto 0);--angle in radians qformat Q13
sine:out std_logic_vector(15 downto 0) --value in Q14
);
end cordic_algorithm;
architecture Behavioral of cordic_algorithm is
signal x,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14 :
std_logic_vector(15 downto 0):=(others=>'0');
signal y,y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14 :
std_logic_vector(15 downto 0):=(others=>'0');
signal temp1,temp2,temp3,temp4,temp5,temp6,temp7,temp8,temp9,temp10,temp11 :
std_logic_vector(15 downto 0):=(others=>'0');
signal temp12,temp13,temp14,temp15 :
std_logic_vector(15 downto 0):=(others=>'0');
signal acc1,acc2,acc3,acc4,acc5,acc6,acc7,acc8,acc9,acc10,acc11,acc12 :
std_logic_vector(15 downto 0):=(others=>'0');
signal acc13,acc14,acc15 : std_logic_vector(15 downto 0):=(others=>'0');
signal sine_result : std_logic_vector(31 downto 0);
begin
process(reset,clk)
begin
if(reset = '1') then
acc2<=x"0000";
acc3<=x"0000";
acc4<=x"0000";
acc5<=x"0000";
acc6<=x"0000";
acc7<=x"0000";
acc8<=x"0000";
acc9<=x"0000";
SIMULATION WAVEFORM
SYNTHESIS REPORT
Final Report
=====================================================================
Final Results
RTL Top Level Output File Name : cordic_algorithm.ngr
Top Level Output File Name : cordic_algorithm
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : No
Design Statistics
# IOs : 34
Cell Usage :
# BELS : 2398
# GND : 1
# INV : 34
# LUT1 : 19
# LUT2 : 436
# LUT3 : 368
# LUT4 : 59
# MUXCY : 841
# VCC : 1
# XORCY : 639
# FlipFlops/Latches : 642
# FDC : 212
# FDE : 430
# Clock Buffers : 1
# BUFGP : 1
# IO Buffers : 33
# IBUF : 17
# OBUF : 16
# MULTs : 1
# MULT18X18SIO : 1
=====================================================================
Device utilization summary:
Selected Device : 3s500efg320-4
Number of Slices: 461 out of 4656 9%
Number of Slice Flip Flops: 642 out of 9312 6%
Number of 4 input LUTs: 916 out of 9312 9%
Number of IOs: 34
Number of bonded IOBs: 34 out of 232 14%
Number of MULT18X18SIOs: 1 out of 20 5%
Number of GCLKs: 1 out of 24 4%
VHDL CODE TO INTERFACE LCD
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use work.packint.all;
entity lcd is
port(clk, reset : in std_logic;
SF_D : out std_logic_vector(3 downto 0);
LCD_E, LCD_RS, LCD_RW, SF_CE0 : out std_logic);
end lcd;
architecture behavior of lcd is
signal display_byte:std_logic_vector(19 downto 0):=(others=>'0');
signal m:std_logic_vector(7 downto 0);
signal radian: std_logic_vector(15 downto 0):=x"0000";
signal valueout: std_logic_vector(15 downto 0):=x"0000";
type tx_sequence is (high_setup, high_hold, oneus, low_setup, low_hold,
fortyus, done);
signal tx_state : tx_sequence := done;
signal tx_byte : std_logic_vector(7 downto 0);
signal tx_init : std_logic := '0';
type init_sequence is (idle, fifteenms, one, two, three, four, five, six,
seven, eight, done);
signal init_state : init_sequence := idle;
signal init_init, init_done : std_logic := '0';
signal i : integer range 0 to 750000 := 0;
signal i2 : integer range 0 to 2000 := 0;
signal i3 : integer range 0 to 82000 := 0;
signal SF_D0, SF_D1 : std_logic_vector(3 downto 0);
signal LCD_E0, LCD_E1 : std_logic;
signal mux : std_logic;
type display_state is (init, function_set,entry_set, set_display,
clr_display, pause, set_addr, char_sign, char_1, char_dot,
char_2,char_3,char_4,char_5, done);
signal cur_state : display_state := init;
signal binary_int:integer;
signal inter :std_logic_vector(31 downto 0);
signal inter2:std_logic_vector(31 downto 0);
begin
m<=x"3C";--input angle for calculation of value
degreetoradian_unit:entity work.degreetoradian port map(m,radian);
cordic_unit:entity work.cordic_algorithm port map(clk,reset,radian,valueout);
binary_int<=binary_to_int(valueout);
inter<=convert_bin(binary_int);
inter2<=divide(inter,x"000003E8");
bcdunit:entity work.binary_bcd port map(clk,reset,inter2,display_byte);
SF_CE0 <= '1'; --disable intel strataflash
LCD_RW <= '0'; --write only
--The following "with" statements simplify the process of adding and removing states.
--when to transmit a command/data and when not to
with cur_state select
tx_init <= '0' when init | pause | done,
'1' when others;
--control the bus
with cur_state select
mux <= '1' when init,
'0' when others;
--control the initialization sequence
with cur_state select
init_init <= '1' when init,
'0' when others;
--register select
with cur_state select
LCD_RS <= '0' when function_set|set_display|clr_display|set_addr|entry_set,
'1' when others;
display: process(clk, reset)
begin
if(reset='1') then
cur_state <= function_set;
elsif(clk='1' and clk'event) then
case cur_state is
when init => if(init_done = '1') then
cur_state <= function_set;
else
cur_state <= init;
end if;
when function_set => tx_byte <= "00101000";
if(i2 = 2000) then
cur_state <= entry_set;
else
cur_state <= function_set;
end if;
when entry_set => tx_byte <= "00000110";
if(i2 = 2000) then
cur_state <= set_display;
else
cur_state <= entry_set;
end if;
when set_display => tx_byte <= "00001100";
if(i2 = 2000) then
cur_state <= clr_display;
else
cur_state <= set_display;
end if;
when clr_display => tx_byte <= "00000001";
i3 <= 0;
if(i2 = 2000) then
cur_state <= pause;
else
cur_state <= clr_display;
end if;
when pause =>
--wait 1.64 ms (82,000 cycles) after the Clear Display command
if(i3 = 82000) then
cur_state <= set_addr;
i3 <= 0;
else
cur_state <= pause;
i3 <= i3 + 1;
end if;
when set_addr => tx_byte <= "10000000";
if(i2 = 2000) then
cur_state <= char_sign;
else
cur_state <= set_addr;
end if;
when char_sign =>
if(valueout(15)='1') then
tx_byte<=X"2D"; --'-' for a negative result
else
tx_byte<=X"2B"; --'+' for a non-negative result
end if;
if(i2 = 2000) then
cur_state <= char_1;
else
cur_state <= char_sign;
end if;
when char_1=>
if(valueout(14)='1') then
tx_byte<=X"31"; --integer digit '1'
else
tx_byte<=X"30"; --integer digit '0'
end if;
if(i2 = 2000) then
cur_state <= char_dot;
else
cur_state <= char_1;
end if;
when char_dot => tx_byte<=X"2E";
if(i2 = 2000) then
cur_state <= char_2;
else
cur_state <= char_dot;
end if;
when char_2 =>tx_byte<=x"3"&display_byte(15 downto 12);
if(i2 = 2000) then
cur_state <= char_3;
else
cur_state <= char_2;
end if;
when char_3 => tx_byte<=x"3"&display_byte(11 downto 8);
if(i2 = 2000) then
cur_state <= char_4;
else
cur_state <= char_3;
end if;
when char_4 => tx_byte<=x"3"&display_byte(7 downto 4);
if(i2 = 2000) then
cur_state <= char_5;
else
cur_state <= char_4;
end if;
when char_5 => tx_byte<=x"3"&display_byte(3 downto 0);
if(i2 = 2000) then
cur_state <= done;
else
cur_state <= char_5;
end if;
when done => cur_state <= done;
end case;
end if;
end process display;
with mux select
SF_D <= SF_D0 when '0', --transmit
SF_D1 when others; --initialize
with mux select
LCD_E <= LCD_E0 when '0', --transmit
LCD_E1 when others; --initialize
--specified by datasheet
transmit : process(clk, reset, tx_init)
begin
if(reset='1') then
tx_state <= done;
elsif(clk='1' and clk'event) then
case tx_state is
when high_setup => --40ns
LCD_E0 <= '0';
SF_D0 <= tx_byte(7 downto 4);
if(i2 = 2) then
tx_state <= high_hold;
i2 <= 0;
else
tx_state <= high_setup;
i2 <= i2 + 1;
end if;
when high_hold => --230ns
LCD_E0 <= '1';
SF_D0 <= tx_byte(7 downto 4);
if(i2 = 12) then
tx_state <= oneus;
i2 <= 0;
else
tx_state <= high_hold;
i2 <= i2 + 1;
end if;
when oneus =>
LCD_E0 <= '0';
if(i2 = 50) then
tx_state <= low_setup;
i2 <= 0;
else
end if;
when done =>
LCD_E0 <= '0';
if(tx_init = '1') then
tx_state <= high_setup;
i2 <= 0;
else
tx_state <= done;
i2 <= 0;
end if;
end case;
end if;
end process transmit;
--specified by datasheet
power_on_initialize: process(clk, reset, init_init) --power on initialization sequence
begin
if(reset='1') then
init_state <= idle;
init_done <= '0';
elsif(clk='1' and clk'event) then
case init_state is
when idle => init_done <= '0';
if(init_init = '1') then
init_state <= fifteenms;
i <= 0;
else
init_state <= idle;
i <= i + 1;
end if;
when fifteenms => init_done <= '0';
if(i = 750000) then
init_state <= one;
i <= 0;
else
init_state <= fifteenms;
i <= i + 1;
end if;
when one => SF_D1 <= "0011";
LCD_E1 <= '1';
init_done <= '0';
if(i = 11) then
init_state<=two;
i <= 0;
else
init_state<=one;
i <= i + 1;
end if;
when two =>
LCD_E1 <= '0';
init_done <= '0';
if(i = 205000) then
init_state<=three;
i <= 0;
else
init_state<=two;
i <= i + 1;
end if;
when three =>
SF_D1 <= "0011";
LCD_E1 <= '1';
init_done <= '0';
if(i = 11) then
init_state<=four;
i <= 0;
else
init_state<=three;
i <= i + 1;
end if;
when four =>
LCD_E1 <= '0';
init_done <= '0';
if(i = 5000) then
init_state<=five;
i <= 0;
else
init_state<=four;
i <= i + 1;
end if;
when five =>
SF_D1 <= "0011";
LCD_E1 <= '1';
init_done <= '0';
if(i = 11) then
init_state<=six;
i <= 0;
else
init_state<=five;
i <= i + 1;
end if;
when six =>
LCD_E1 <= '0';
init_done <= '0';
if(i = 2000) then
init_state<=seven;
i <= 0;
else
init_state<=six;
i <= i + 1;
end if;
when seven =>
SF_D1 <= "0010";
LCD_E1 <= '1';
init_done <= '0';
if(i = 11) then
init_state<=eight;
i <= 0;
else
init_state<=seven;
i <= i + 1;
end if;
when eight =>
LCD_E1 <= '0';
init_done <= '0';
if(i = 2000) then
init_state<=done;
i <= 0;
else
init_state<=eight;
i <= i + 1;
end if;
when done =>