VLSI DESIGN PROJECT
           IN VHDL




PIPELINE STALLING WITH CLOCK
                      KOSURU SAI MALLESWAR
CONTENTS
 1. Objectives
 2. Pipelining – definition
 3. Modules of the project
 4. Coding technique
 5. VHDL files
  i. Register.vhd
  ii. Multiplexer.vhd
  iii. Pipelinestalling.vhd
  iv. Alu.vhd
  v. Pipelinedmultiplier.vhd
 6. User constraints file for FPGA
 7. Simulation waveforms – Performance Analysis
 8. Applications
 9. Conclusions
1. Objectives

    1. To design the “pipeline stalling system” used in computers and other digital
       electronic devices to increase their instruction throughput, i.e., to program a
       series of registers that move data from one stage to the next on a common
       clock.
    2. To program an ALU that can fetch the opcode and operands in a pipelined sequence
       and execute the operations.
    3. To program a pipelined multiplier for the ALU, which can multiply 32 bit
       numbers using the “partial multiply, shift and add” algorithm.



2. Pipelining - definition
    Pipelining is an implementation technique in which the execution of multiple
    instructions is overlapped. The pipeline is divided into stages, and each stage
    completes one part of an instruction in parallel with the other stages. The stages are
    connected one to the next to form a pipe: instructions enter at one end, progress
    through the stages, and exit at the other end. This allows the computer's control
    circuitry to issue instructions at the processing rate of the slowest stage, which is
    much faster than the time needed to perform all the steps of an instruction in one go.
    The term pipeline refers to the fact that every stage is carrying data at the same time
    and each stage is connected to the next.

    The transfer of data from one stage to the next is scheduled with the help of a
    “clock”. Most modern CPUs are driven by a clock. Internally the CPU consists of
    logic and registers (flip-flops). When the clock signal arrives, the flip-flops take
    their new values and the logic then requires a period of time to decode the new
    values. Then the next clock pulse arrives and the flip-flops again take their new
    values, and so on.

    Pipelining does not decrease the execution time of an individual instruction. Instead,
    it increases instruction throughput. The throughput of the instruction pipeline is
    determined by how often an instruction exits the pipeline. Because the pipe stages are
    hooked together, all the stages must be ready to proceed at the same time. The
    performance of a pipelined processor may vary widely between different programs.
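
    The throughput claim can be made concrete with a short worked formula. This is a
    sketch for an ideal pipeline only: the symbols k (number of stages), n (number of
    instructions) and T (clock period) are introduced here for illustration, and
    stalls are ignored.

```latex
\begin{align*}
T_{\text{unpipelined}} &= n \, k \, T \\
T_{\text{pipelined}}   &= (k + n - 1)\, T \\
\text{speedup}         &= \frac{n\,k}{k + n - 1} \;\longrightarrow\; k
                          \quad \text{as } n \to \infty
\end{align*}
```

    For a three-stage pipeline (k = 3) and n = 100 instructions this gives 300 T without
    pipelining against 102 T with it, while each individual instruction still takes 3 T
    to pass through all the stages.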
3. Modules of the project
1. Stalling pipeline architecture with registers accepting data on rising edges

   [Logic diagram: stalling pipeline architecture with registers accepting data on rising edges]

2. ALU that can fetch the opcode and operands in a pipelined sequence
3. Pipelined multiplier that can multiply two 32 bit numbers using the partial
   multiply, shift and add algorithm
4. Coding technique


When storage elements accept data on a rising clock edge, initialize the clock to 0 so that
a transition does not occur at time zero. The three registers R1, R2 and R3 belong to the
three stages of the processor, namely the fetch unit, the decode unit and the execute unit.
The registers point to the locations from which program code is being read. The stall
clock is the “OR” of the clock and the stall signal.

While stall is low, the stall clock follows the clock. On each rising edge R1 is updated
and points to the next location, the data in R1 is sent to R2, and the data in R2 is sent
to R3.

When stall becomes high, the stall clock is held high, so R1 and R2 see no rising edge and
keep their data. R3 is clocked by the clock itself and does not receive an instruction
from R2; it receives zeros from the multiplexer. This is useful for the execution of
instructions involving a forward jump.
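
In Pipelinestalling.vhd (section 5) the stall is produced by OR-ing the clock with the
stall signal: while stall is high the stall clock shows no rising edge, so R1 and R2 hold,
and R3, which runs on the ungated clock, loads zeros from the multiplexer. The sketch below
expresses the same hold-and-bubble behaviour with a clock enable instead of a gated clock.
It is an alternative formulation and not the project's method; the entity name
stall_pipe_sketch and its ports are hypothetical and do not appear in the project files.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity (not part of the project files): three pipeline
-- registers that hold on stall and pass a bubble of zeros to the last stage.
entity stall_pipe_sketch is
  port (clk        : in  std_logic;
        clear      : in  std_logic;
        stall      : in  std_logic;
        din        : in  std_logic_vector(31 downto 0);
        r1, r2, r3 : out std_logic_vector(31 downto 0));
end entity stall_pipe_sketch;

architecture rtl of stall_pipe_sketch is
  signal s1, s2, s3 : std_logic_vector(31 downto 0) := (others => '0');
begin
  process (clk, clear)
  begin
    if clear = '1' then
      s1 <= (others => '0');
      s2 <= (others => '0');
      s3 <= (others => '0');
    elsif rising_edge(clk) then
      if stall = '1' then
        -- s1 and s2 are not assigned, so they keep their values;
        -- the last stage receives zeros (a bubble)
        s3 <= (others => '0');
      else
        s1 <= din;
        s2 <= s1;
        s3 <= s2;
      end if;
    end if;
  end process;

  r1 <= s1;
  r2 <= s2;
  r3 <= s3;
end architecture rtl;
```

Every register in this variant runs on the same clock, so no logic sits in the clock path.
That is the usual reason for preferring an enable over a gated clock on an FPGA.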

The ALU is programmed by making use of the pipelined increment of the pointed memory
locations. The code is stored in memory such that the contents of the first location
specify the operation to be performed, and the following locations contain the operands.
The ALU fetches the contents of 3 memory locations at a time. The arithmetic or logical
operation is selected by the 16 most significant bits of the instruction, which is
presented by the ir register.

The operands are held temporarily in the registers ar and br of the ALU while calculations
are performed. The output of the ALU is given by alu_out.

In the pipelined multiplier, the inputs a and b are two unsigned 32 bit numbers. On each
rising edge a and b are multiplied and the output y is updated. Starting from the right
end, a is multiplied by the 8 least significant bits of b; b is then shifted right by 8
bits and a is again multiplied by the 8 least significant bits, and so on until the
multiplication is complete. The 4 partial products are shifted to their bit positions and
added to produce the output.
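
The partial multiply, shift and add scheme can be written out as a short identity. The
byte symbols b_0 to b_3 are notation introduced here; they correspond to the slices
b(7 downto 0) to b(31 downto 24) used in Pipelinedmultiplier.vhd.

```latex
\begin{align*}
b &= \sum_{i=0}^{3} b_i \, 2^{8i}, \qquad 0 \le b_i < 2^{8} \\
a \cdot b &= \sum_{i=0}^{3} \left( a \cdot b_i \right) 2^{8i},
            \qquad a \cdot b_i < 2^{32} \cdot 2^{8} = 2^{40}
\end{align*}
```

Each partial product therefore fits in 40 bits, and shifting it left by 8i bits places it
at its position in the 64 bit result. As a small check, take a = 3 and b = 258, so that
b_0 = 2, b_1 = 1 and b_2 = b_3 = 0: the sum is 3·2 + (3·1)·256 = 6 + 768 = 774 = 3·258.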
5. VHDL files
   1. Register.vhd:

  library IEEE;
  use IEEE.std_logic_1164.all;

  -- 32-bit register with asynchronous clear
  entity regist is
    port(clk   : in  std_logic;
         clear : in  std_logic;
         ip    : in  std_logic_vector (31 downto 0);
         op    : out std_logic_vector (31 downto 0));
  end entity regist;

  architecture beh of regist is
    signal temp : std_logic_vector(31 downto 0);
  begin
    reg: process(clk, clear)
    begin
      if clear='1' then
        temp <= (others=>'0');
      elsif rising_edge(clk) then   -- accept data on the rising edge only
        temp <= ip;
      end if;
    end process reg;
    op <= temp;
  end architecture beh;

 2. Multiplexer.vhd:

    library IEEE;
    use IEEE.std_logic_1164.all;

    -- 2-to-1 multiplexer for 32-bit words: in1 when ctl = '1', otherwise in0
    entity mux is
      port(in0    : in  std_logic_vector (31 downto 0);
           in1    : in  std_logic_vector (31 downto 0);
           ctl    : in  std_logic;
           result : out std_logic_vector (31 downto 0));
    end entity mux;

    architecture beh of mux is
    begin
      result <= in1 when ctl='1'
           else in0 after 1 ns;
    end architecture beh;

3. Pipelinestalling.vhd:

   library IEEE;
   use IEEE.std_logic_1164.all;
   use IEEE.std_logic_arith.all;

   entity pipe is
     port(
       reg1, reg2, reg3 : inout std_logic_vector(31 downto 0));
   end entity pipe;

   architecture beh of pipe is
     signal clk   : std_logic := '0';   -- master clock
     signal stall : std_logic := '0';   -- stall signal
     signal sclk  : std_logic := '0';   -- stall clock
     signal clear : std_logic := '1';   -- one shot clear

     subtype word is std_logic_vector(31 downto 0);
     signal zeros  : word := (others=>'0');
     signal R1     : word;
     signal R1_a   : word;
     signal R2     : word;
     signal R2_mux : word;
     signal R3     : word;
     signal cnt    : word := (others=>'0');

   begin
     -- free-running master clock (10 ns period); clear is released after 500 ps
     clock: process(clk)
     begin
       if clear='1' then
         clear <= '0' after 500 ps;
       end if;
       clk <= not clk after 5 ns;
     end process clock;

     -- location counter, incremented on each rising edge of the stall clock
     cnt <= unsigned(cnt) + 1 after 1 ns
            when sclk'event and sclk='1';

     -- stall while R2 holds 2 and R3 holds 1
     stall <= '1' after 1 ns when R2=x"00000002" and R3=x"00000001"
         else '0' after 1 ns;

     -- stall clock: stays high, with no rising edge, while stall is high
     sclk <= clk or stall after 1 ns;

     -- pipeline stages
     R1_reg: entity work.regist port map(sclk, clear, cnt, R1);

     R1_a <= R1 or zeros after 1 ns;   -- logic

     R2_reg: entity work.regist port map(sclk, clear, R1_a, R2);

     -- zeros are passed to R3 while stall is high
     A2_mux: entity work.mux port map(R2, zeros, stall, R2_mux);

     -- R3 runs on the master clock, so it keeps loading during a stall
     R3_reg: entity work.regist port map(clk, clear, R2_mux, R3);

     reg1 <= R1;
     reg2 <= R2;
     reg3 <= R3;

   end beh;

    4. ALU.vhd:

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
use IEEE.std_logic_arith.all;

entity alu is
  port (clk        : in    std_logic;
        ir, ar, br : inout std_logic_vector(31 downto 0);
        alu_sig    : out   std_logic;
        alu_out    : out   std_logic_vector(63 downto 0));
end alu;

architecture beh of alu is
  signal alu_st     : std_logic;
  signal alu_output : std_logic_vector(63 downto 0);

  type mem is array (0 to 31) of std_logic_vector(31 downto 0);

  constant content : mem :=
    (0 => "00000000000000000000000000000001",
     1 => "00000000000000000000000000000100",
     2 => "00000000000000000000000000000110",
     others => "11111111111111111111111111111111");

  -- Assumption: the memory is preloaded from 'content'; the listing declares
  -- both objects but does not show how temp_mem is filled.
  signal temp_mem : mem := content;

  constant zero32 : std_logic_vector(31 downto 0) := (others => '0');

  component pipe is
    port(
      reg1, reg2, reg3 : inout std_logic_vector(31 downto 0));
  end component;

begin
  -- the pipeline registers supply br, ar and ir
  reg: pipe port map(reg1=>br, reg2=>ar, reg3=>ir);

  clocked_alu: process(clk)
    variable opa, opb : std_logic_vector(31 downto 0);
  begin
    if (rising_edge(clk)) then
      alu_output <= (others =>'0');
      alu_st <= '1';

      -- the low 5 bits of ar and br select one of the 32 memory words
      opa := temp_mem(conv_integer(ar(4 downto 0)));
      opb := temp_mem(conv_integer(br(4 downto 0)));

      -- the opcode is the upper 16 bits of ir; 32-bit results are
      -- zero-extended to the 64-bit output
      case ir(31 downto 16) is
        when "0000000000000000" =>                       -- add
          alu_output <= (zero32 & opa) + (zero32 & opb);
        when "0000000000000001" =>                       -- multiply
          alu_output <= opa * opb;
        when "0000000000000010" =>                       -- subtract
          alu_output <= zero32 & (opa - opb);
        when "0000000000000011" =>                       -- reverse subtract
          alu_output <= zero32 & (opb - opa);
        when "0000000000000100" =>
          alu_output <= zero32 & (opa and opb);
        when "0000000000000101" =>
          alu_output <= zero32 & (opa or opb);
        when "0000000000000110" =>
          alu_output <= zero32 & (opa xor opb);
        when "0000000000000111" =>
          alu_output <= zero32 & (opa nand opb);
        when "0000000000001000" =>
          alu_output <= zero32 & (opa nor opb);
        when "0000000000001001" =>
          alu_output <= zero32 & (not opa);
        when others => null;
      end case;
    end if;
  end process clocked_alu;

  -- outputs follow the registered results
  alu_sig <= alu_st;
  alu_out <= alu_output;
end beh;
5. Pipelinedmultiplier.vhd:

      library ieee;
      use ieee.std_logic_1164.all;
      use ieee.std_logic_arith.all;

      entity pipemult is
        port (
          clk1 : in  std_logic;
          a, b : in  unsigned(31 downto 0);
          y    : out unsigned(63 downto 0)
        );
      end pipemult;

      architecture rtl3 of pipemult is
        -- 32 x 8 bit partial products, 40 bits each
        signal y1, y2, y3, y4 : unsigned (39 downto 0);
        constant z : unsigned (63 downto 0) := (others => '0');
      begin
        process(clk1)
        begin
          if (rising_edge(clk1)) then
            -- stage 1: multiply a by each byte of b
            y1 <= a * b( 7 downto 0);
            y2 <= a * b(15 downto 8);
            y3 <= a * b(23 downto 16);
            y4 <= a * b(31 downto 24);
            -- stage 2: align the partial products registered on the
            -- previous edge (shift by 0, 8, 16 and 24 bits) and add them
            y <= (z(63 downto 40) & y1) +
                 (z(63 downto 48) & y2 & z( 7 downto 0)) +
                 (z(63 downto 56) & y3 & z(15 downto 0)) +
                 (y4 & z(23 downto 0));
          end if;
        end process;
      end rtl3;
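
A minimal self-checking testbench for the multiplier could look like the sketch below. The
entity name pipemult_tb, the 10 ns clock and the two operand values are assumptions chosen
for illustration. The check waits for two rising edges because the listing above registers
the partial products on one edge and their sum on the next.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;

-- Hypothetical testbench (not part of the project files)
entity pipemult_tb is
end entity pipemult_tb;

architecture sim of pipemult_tb is
  signal clk1 : std_logic := '0';
  signal a, b : unsigned(31 downto 0) := (others => '0');
  signal y    : unsigned(63 downto 0);
begin
  clk1 <= not clk1 after 5 ns;   -- assumed 10 ns clock period

  dut : entity work.pipemult
    port map (clk1 => clk1, a => a, b => b, y => y);

  stimulus : process
  begin
    -- operands chosen so that the product needs more than 32 bits
    a <= conv_unsigned(100000, 32);
    b <= conv_unsigned(70000, 32);

    wait until rising_edge(clk1);   -- partial products are registered
    wait until rising_edge(clk1);   -- their sum is registered into y
    wait for 1 ns;

    assert y = a * b
      report "pipemult: y does not match a * b" severity error;
    wait;                           -- stop the stimulus process
  end process stimulus;
end architecture sim;
```

Because y1 to y4 are still uninitialized at the first rising edge, a simulator may print
arithmetic warnings for that edge. The assertion is evaluated only after the second edge.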
6. User constraints file for FPGA

  NET "clk" LOC = "AJ15";
  NET "clear" LOC = "AC11"; #SW0

  NET "R3(31)" LOC = "T7";
  NET "R3(30)" LOC = "T8";
  NET "R3(29)" LOC = "U4";
  NET "R3(28)" LOC = "U5";
  NET "R3(27)" LOC = "V2";
  NET "R3(26)" LOC = "W2";
  NET "R3(25)" LOC = "T9";
  NET "R3(24)" LOC = "U9";
  NET "R3(23)" LOC = "V3";
  NET "R3(22)" LOC = "V4";
  NET "R3(21)" LOC = "W1";
  NET "R3(20)" LOC = "Y1";
  NET "R3(19)" LOC = "U7";
  NET "R3(18)" LOC = "U8";
  NET "R3(17)" LOC = "V5";
  NET "R3(16)" LOC = "V6";
  NET "R3(15)" LOC = "W3";
  NET "R3(14)" LOC = "W4";
  NET "R3(13)" LOC = "AA1";
  NET "R3(12)" LOC = "AB1";
  NET "R3(11)" LOC = "W5";
  NET "R3(10)" LOC = "W6";
  NET "R3(9)" LOC = "Y4";
  NET "R3(8)" LOC = "Y5";
  NET "R3(7)" LOC = "AA3";
  NET "R3(6)" LOC = "AA4";
  NET "R3(5)" LOC = "W7";
  NET "R3(4)" LOC = "W8";
  NET "R3(3)" LOC = "AB3";
  NET "R3(2)" LOC = "AB4";
  NET "R3(1)" LOC = "AB2";
  NET "R3(0)" LOC = "AC2";
7. Simulation waveforms – Performance Analysis
Output waveforms of the simulations in ModelSim:

   [Waveform: pipeline simulation]

   [Waveform: pipelined multiplier simulation]
8. Applications
Pipelining for multicore computers:

Using a pipeline architecture is a common and effective method of increasing
throughput and reducing loop execution times on multicore computers. Pipelining
can be used when data must go through multiple processes that can be broken into
stages. Pipelining is a type of task parallelism that can be implemented for a series
of serial tasks that have data dependencies.

Operating systems design:

In Unix-like computer operating systems (and, to some extent, Windows), a
pipeline is the original software pipeline: a set of processes chained by their
standard streams, so that the output of each process (stdout) feeds directly as input
(stdin) to the next one. Each connection is implemented by an anonymous pipe.
Filter programs are often used in this configuration.

Superscalar pipelining:

Superscalar pipelining involves multiple pipelines in parallel. Internal components
of the processor are replicated so that it can launch multiple instructions in some or
all of its pipeline stages. The RISC System/6000 has a forked pipeline with different
paths for floating-point and integer instructions. If a program contains a mixture of
both types, the processor can keep both forks running simultaneously. Both
types of instructions share two initial stages (Instruction Fetch and Instruction
Dispatch) before they fork. Often, however, superscalar pipelining refers to
multiple copies of all pipeline stages (in terms of laundry, this would mean four
washers, four dryers, and four people who fold clothes). Many of today's machines
attempt to find two to six instructions that they can execute in every pipeline stage.
If some of the instructions are dependent, however, only the first instruction or
instructions are issued.

Pipelining to firmware:

Pipelining at the firmware level of machine organization can provide significant
execution time benefits for certain types of instructions. The essential concept
involved with this approach is the pipelining of operations within the hardware
under direct control of the firmware, rather than the pipelining of
microinstructions.

Dynamic pipelining:

Dynamic pipelines have the capability to schedule around stalls. A dynamic
pipeline is divided into three units: the instruction fetch and decode unit, five to ten
execute or functional units, and a commit unit. Each execute unit has reservation
stations, which act as buffers and hold the operands and operations.
9. Conclusions
To summarize, pipelining is a technique that programmers can use to gain a
performance increase in inherently serial applications (on multicore machines).
The CPU industry trend of increasing cores per chip means that strategies such as
pipelining will become essential to application development in the near future. To
gain the most performance from pipelining, the individual stages must be carefully
balanced so that no single stage takes much longer to complete than the other
stages.

The project covers a pipelined execution unit, a pipelined multiplier and a
pipelined ALU. Handling of structural, data and control hazards can also be
programmed to improve design efficiency, which is very important for a physical
implementation of the design. Cache miss handling and exception handling are
also required to improve the performance of pipelining in RISC-like systems.
Pipeline stalling in vhdl

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Microcontrollers
Introduction to MicrocontrollersIntroduction to Microcontrollers
Introduction to Microcontrollersmike parks
 
05 global mobile satellite
05 global mobile satellite05 global mobile satellite
05 global mobile satelliteHardik Kakadiya
 
Multi-Touch Interaction Overview
Multi-Touch Interaction OverviewMulti-Touch Interaction Overview
Multi-Touch Interaction OverviewTNO
 
Digital noticeboard using vb
Digital noticeboard using vbDigital noticeboard using vb
Digital noticeboard using vbsayalipatil528
 
Touchless Touchscreen Technology
Touchless Touchscreen TechnologyTouchless Touchscreen Technology
Touchless Touchscreen TechnologySaurabh Tripathi
 
Digital Notice Board
Digital Notice BoardDigital Notice Board
Digital Notice BoardRaaki Gadde
 
ATmega32-AVR microcontrollers-Part I
ATmega32-AVR microcontrollers-Part IATmega32-AVR microcontrollers-Part I
ATmega32-AVR microcontrollers-Part IVineethMP2
 
Introduction to arm architecture
Introduction to arm architectureIntroduction to arm architecture
Introduction to arm architectureZakaria Gomaa
 
5 pen pc technology
5 pen pc technology5 pen pc technology
5 pen pc technologyabshidore
 
Introduction to microbit-2
Introduction to microbit-2Introduction to microbit-2
Introduction to microbit-2jonathan Dietz
 
Iot Bootcamp - abridged - part 1
Iot Bootcamp - abridged - part 1Iot Bootcamp - abridged - part 1
Iot Bootcamp - abridged - part 1Marcus Tarquinio
 
OSCILATORS introduction & ring oscillator
OSCILATORS introduction & ring oscillator OSCILATORS introduction & ring oscillator
OSCILATORS introduction & ring oscillator NandanavanamRajesh1
 
Touchless touchscreen technology
Touchless touchscreen technologyTouchless touchscreen technology
Touchless touchscreen technologyMATHEW JOSEPH
 

Was ist angesagt? (20)

Introduction to Microcontrollers
Introduction to MicrocontrollersIntroduction to Microcontrollers
Introduction to Microcontrollers
 
Introduction to BBC Micro:Bit
Introduction to BBC Micro:BitIntroduction to BBC Micro:Bit
Introduction to BBC Micro:Bit
 
05 global mobile satellite
05 global mobile satellite05 global mobile satellite
05 global mobile satellite
 
Touchless touch screen
Touchless touch screenTouchless touch screen
Touchless touch screen
 
Multi-Touch Interaction Overview
Multi-Touch Interaction OverviewMulti-Touch Interaction Overview
Multi-Touch Interaction Overview
 
9 d55201 testing & testability
9 d55201 testing & testability9 d55201 testing & testability
9 d55201 testing & testability
 
Digital noticeboard using vb
Digital noticeboard using vbDigital noticeboard using vb
Digital noticeboard using vb
 
Touchless Touchscreen Technology
Touchless Touchscreen TechnologyTouchless Touchscreen Technology
Touchless Touchscreen Technology
 
4 bit uni shift reg
4 bit uni shift reg4 bit uni shift reg
4 bit uni shift reg
 
Digital Notice Board
Digital Notice BoardDigital Notice Board
Digital Notice Board
 
ATmega32-AVR microcontrollers-Part I
ATmega32-AVR microcontrollers-Part IATmega32-AVR microcontrollers-Part I
ATmega32-AVR microcontrollers-Part I
 
Introduction to arm architecture
Introduction to arm architectureIntroduction to arm architecture
Introduction to arm architecture
 
5 pen pc technology
5 pen pc technology5 pen pc technology
5 pen pc technology
 
Introduction to microbit-2
Introduction to microbit-2Introduction to microbit-2
Introduction to microbit-2
 
Virtual keyboard
Virtual keyboard Virtual keyboard
Virtual keyboard
 
Iot Bootcamp - abridged - part 1
Iot Bootcamp - abridged - part 1Iot Bootcamp - abridged - part 1
Iot Bootcamp - abridged - part 1
 
OSCILATORS introduction & ring oscillator
OSCILATORS introduction & ring oscillator OSCILATORS introduction & ring oscillator
OSCILATORS introduction & ring oscillator
 
Touchless Touchscreen
Touchless TouchscreenTouchless Touchscreen
Touchless Touchscreen
 
Touchless touchscreen technology
Touchless touchscreen technologyTouchless touchscreen technology
Touchless touchscreen technology
 
Esd module2
Esd module2Esd module2
Esd module2
 

Ähnlich wie Pipeline stalling in vhdl

Practical file
Practical filePractical file
Practical filerajeevkr35
 
Digital System Design Lab Report - VHDL ECE
Digital System Design Lab Report - VHDL ECEDigital System Design Lab Report - VHDL ECE
Digital System Design Lab Report - VHDL ECERamesh Naik Bhukya
 
Digital system design practical file
Digital system design practical fileDigital system design practical file
Digital system design practical fileArchita Misra
 
Write complete VHDL codes for the following schematic. Solution.pdf
Write complete VHDL codes for the following schematic.  Solution.pdfWrite complete VHDL codes for the following schematic.  Solution.pdf
Write complete VHDL codes for the following schematic. Solution.pdfarjuncollection
 
Laboratory exercise 5
Laboratory exercise 5Laboratory exercise 5
Laboratory exercise 5swapnilswap11
 
Arithmatic logic unit using VHDL (gates)
Arithmatic logic unit using VHDL (gates)Arithmatic logic unit using VHDL (gates)
Arithmatic logic unit using VHDL (gates)TakashiSuoh
 
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...Kevin Mathew
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Digital system design lab manual
Digital system design lab manualDigital system design lab manual
Digital system design lab manualSanthosh Poralu
 
Sequential and combinational alu
Sequential and combinational alu Sequential and combinational alu
Sequential and combinational alu Piyush Rochwani
 
VLSI Design Final Project - 32 bit ALU
VLSI Design Final Project - 32 bit ALUVLSI Design Final Project - 32 bit ALU
VLSI Design Final Project - 32 bit ALUSachin Kumar Asokan
 
ParallelLogicToEventDrivenFirmware_Doin
ParallelLogicToEventDrivenFirmware_DoinParallelLogicToEventDrivenFirmware_Doin
ParallelLogicToEventDrivenFirmware_DoinJonny Doin
 

Ähnlich wie Pipeline stalling in vhdl (20)

Practical file
Practical filePractical file
Practical file
 
Digital System Design Lab Report - VHDL ECE
Digital System Design Lab Report - VHDL ECEDigital System Design Lab Report - VHDL ECE
Digital System Design Lab Report - VHDL ECE
 
Fpga creating counter with internal clock
Fpga   creating counter with internal clockFpga   creating counter with internal clock
Fpga creating counter with internal clock
 
Digital system design practical file
Digital system design practical fileDigital system design practical file
Digital system design practical file
 
Write complete VHDL codes for the following schematic. Solution.pdf
Write complete VHDL codes for the following schematic.  Solution.pdfWrite complete VHDL codes for the following schematic.  Solution.pdf
Write complete VHDL codes for the following schematic. Solution.pdf
 
Session1
Session1Session1
Session1
 
VHDL Programs
VHDL ProgramsVHDL Programs
VHDL Programs
 
Reporte vhdl9
Reporte vhdl9Reporte vhdl9
Reporte vhdl9
 
Laboratory exercise 5
Laboratory exercise 5Laboratory exercise 5
Laboratory exercise 5
 
Uart
UartUart
Uart
 
Arithmatic logic unit using VHDL (gates)
Arithmatic logic unit using VHDL (gates)Arithmatic logic unit using VHDL (gates)
Arithmatic logic unit using VHDL (gates)
 
Assembler4
Assembler4Assembler4
Assembler4
 
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Digital system design lab manual
Digital system design lab manualDigital system design lab manual
Digital system design lab manual
 
W10: Interrupts
W10: InterruptsW10: Interrupts
W10: Interrupts
 
Sequential and combinational alu
Sequential and combinational alu Sequential and combinational alu
Sequential and combinational alu
 
VLSI Design Final Project - 32 bit ALU
VLSI Design Final Project - 32 bit ALUVLSI Design Final Project - 32 bit ALU
VLSI Design Final Project - 32 bit ALU
 
Mpi lab manual eee
Mpi lab manual eeeMpi lab manual eee
Mpi lab manual eee
 
ParallelLogicToEventDrivenFirmware_Doin
ParallelLogicToEventDrivenFirmware_DoinParallelLogicToEventDrivenFirmware_Doin
ParallelLogicToEventDrivenFirmware_Doin
 

Mehr von Sai Malleswar

PANORAMA OF GLOBAL SPACE ECONOMY
PANORAMA OF GLOBAL SPACE ECONOMYPANORAMA OF GLOBAL SPACE ECONOMY
PANORAMA OF GLOBAL SPACE ECONOMYSai Malleswar
 
Digital Speedo Meter Powered by Dynamo
Digital Speedo Meter Powered by DynamoDigital Speedo Meter Powered by Dynamo
Digital Speedo Meter Powered by DynamoSai Malleswar
 
SWOT analysis of TATA motors
SWOT analysis of TATA motorsSWOT analysis of TATA motors
SWOT analysis of TATA motorsSai Malleswar
 
Review of "The anatomy of a large scale hyper textual web search engine"
Review of  "The anatomy of a large scale hyper textual web search engine" Review of  "The anatomy of a large scale hyper textual web search engine"
Review of "The anatomy of a large scale hyper textual web search engine" Sai Malleswar
 
Temp based fan speed control
Temp based fan speed controlTemp based fan speed control
Temp based fan speed controlSai Malleswar
 
Impact of IT on environment
Impact of IT on environmentImpact of IT on environment
Impact of IT on environmentSai Malleswar
 
Adaptive delta modulation of Speech signal
Adaptive delta modulation of Speech signalAdaptive delta modulation of Speech signal
Adaptive delta modulation of Speech signalSai Malleswar
 
Bidirectional data flow
Bidirectional data flowBidirectional data flow
Bidirectional data flowSai Malleswar
 
Mobile cell phone charger
Mobile cell phone charger Mobile cell phone charger
Mobile cell phone charger Sai Malleswar
 
Manufacturing of liquid propellant tank
Manufacturing of liquid propellant tankManufacturing of liquid propellant tank
Manufacturing of liquid propellant tankSai Malleswar
 
Magneto rhelogical fluids
Magneto rhelogical fluidsMagneto rhelogical fluids
Magneto rhelogical fluidsSai Malleswar
 
LIQUID PENETRANT AND MAGNETIC PARTICLE INSPECTION
LIQUID PENETRANT AND MAGNETIC PARTICLE INSPECTIONLIQUID PENETRANT AND MAGNETIC PARTICLE INSPECTION
LIQUID PENETRANT AND MAGNETIC PARTICLE INSPECTIONSai Malleswar
 

Mehr von Sai Malleswar (18)

Digital Anemometer
Digital AnemometerDigital Anemometer
Digital Anemometer
 
PANORAMA OF GLOBAL SPACE ECONOMY
PANORAMA OF GLOBAL SPACE ECONOMYPANORAMA OF GLOBAL SPACE ECONOMY
PANORAMA OF GLOBAL SPACE ECONOMY
 
Digital Speedo Meter Powered by Dynamo
Digital Speedo Meter Powered by DynamoDigital Speedo Meter Powered by Dynamo
Digital Speedo Meter Powered by Dynamo
 
Vx works RTOS
Vx works RTOSVx works RTOS
Vx works RTOS
 
SWOT analysis of TATA motors
SWOT analysis of TATA motorsSWOT analysis of TATA motors
SWOT analysis of TATA motors
 
Review of "The anatomy of a large scale hyper textual web search engine"
Review of  "The anatomy of a large scale hyper textual web search engine" Review of  "The anatomy of a large scale hyper textual web search engine"
Review of "The anatomy of a large scale hyper textual web search engine"
 
Sorting manipulator
Sorting manipulatorSorting manipulator
Sorting manipulator
 
Temp based fan speed control
Temp based fan speed controlTemp based fan speed control
Temp based fan speed control
 
Impact of IT on environment
Impact of IT on environmentImpact of IT on environment
Impact of IT on environment
 
Adaptive delta modulation of Speech signal
Adaptive delta modulation of Speech signalAdaptive delta modulation of Speech signal
Adaptive delta modulation of Speech signal
 
Digital stop watch
Digital stop watchDigital stop watch
Digital stop watch
 
Bidirectional data flow
Bidirectional data flowBidirectional data flow
Bidirectional data flow
 
Mobile cell phone charger
Mobile cell phone charger Mobile cell phone charger
Mobile cell phone charger
 
Manufacturing of liquid propellant tank
Manufacturing of liquid propellant tankManufacturing of liquid propellant tank
Manufacturing of liquid propellant tank
 
Magneto rhelogical fluids
Magneto rhelogical fluidsMagneto rhelogical fluids
Magneto rhelogical fluids
 
LIQUID PENETRANT AND MAGNETIC PARTICLE INSPECTION
LIQUID PENETRANT AND MAGNETIC PARTICLE INSPECTIONLIQUID PENETRANT AND MAGNETIC PARTICLE INSPECTION
LIQUID PENETRANT AND MAGNETIC PARTICLE INSPECTION
 
POLYIMIDES
POLYIMIDESPOLYIMIDES
POLYIMIDES
 
PYROTECHNICS
PYROTECHNICSPYROTECHNICS
PYROTECHNICS
 

Kürzlich hochgeladen

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Pipeline stalling in VHDL

  • 1. VLSI DESIGN PROJECT IN VHDL PIPELINE STALLING WITH CLOCK KOSURU SAI MALLESWAR
  • 2. CONTENTS 1. Objectives 2. Pipelining – definition 3. Modules of the project 4. Coding technique 5. VHDL files i. Register.vhd ii. Multiplexer.vhd iii. Pipelinestalling.vhd iv. Alu.vhd v. Pipelinedmultiplier.vhd 6. User constraints file for FPGA 7. Simulation waveforms – Performance Analysis 8. Applications 9. Conclusions
  • 3. 1. Objectives
    1. To design the “pipeline stalling system” used in computers and other digital electronic devices to increase their instruction throughput, i.e., to program a series of registers that move data from one stage to the next on a common clock.
    2. To program an ALU that can fetch the opcode and operands in a pipelined sequence and execute the operations.
    3. To program a pipelined multiplier for the ALU that multiplies 32-bit numbers using the “partial multiply, shift and add” algorithm.
    2. Pipelining - definition
    Pipelining is an implementation technique in which multiple instructions are overlapped in execution. The computer pipeline is divided into stages. Each stage completes a part of an instruction, and the stages work in parallel. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once. The term pipeline refers to the fact that each step is carrying data at once, and each step is connected to the next.
    The transfer of data from one stage to the next can be scheduled with the help of a “clock”. Most modern CPUs are driven by a clock. Internally, the CPU consists of logic and registers (flip-flops). When the clock signal arrives, the flip-flops take their new values, and the logic then requires a period of time to decode the new values. Then the next clock pulse arrives, the flip-flops again take their new values, and so on.
    Pipelining does not decrease the execution time of an individual instruction. Instead, it increases instruction throughput. The throughput of the instruction pipeline is determined by how often an instruction exits the pipeline. Because the pipe stages are hooked together, all the stages must be ready to proceed at the same time.
    The performance of a pipelined processor may vary widely between different programs.
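The throughput claim above can be made concrete with the usual idealized timing model. The symbols are assumptions introduced here, not taken from the slides: k equal-length stages, a clock period τ, n instructions, and no stalls.

```latex
T_{\text{unpipelined}} = n\,k\,\tau, \qquad
T_{\text{pipelined}} = (k + n - 1)\,\tau, \qquad
S = \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}}
  = \frac{n\,k}{k + n - 1} \;\xrightarrow{\;n \to \infty\;}\; k
```

For a three-stage pipeline like the one in this project (k = 3) and n = 100 instructions, S = 300/102 ≈ 2.94. Each instruction still takes kτ from entry to exit, and every stall cycle adds one τ to the pipelined time.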
  • 4. 3. Modules of the project
    1. Stalling pipeline architecture with registers accepting data on rising edges (logic diagram).
    2. ALU that can fetch the opcode and operands in a pipelined sequence.
    3. Pipelined multiplier that multiplies two 32-bit numbers using the partial multiply, shift and add algorithm.
  • 5. 4. Coding technique
    When storage elements accept data on a rising clock edge, initialize the clock to 0 so that a transition does not occur at time zero. The three registers R1, R2 and R3 sit in the three stages of the processor: the fetch unit, the decode unit and the execute unit. The registers point to the location from which program code is being read. The stall clock is the OR of the clock and the stall signal. On the first rising edge of the stall clock, the data in R1 is sent to R2 and the data in R2 is sent to R3. On the next rising edge, R1 increments and points to the next location, and the data in R2 moves to R3. While stall is low, R1 updates R2 and R2 updates R3 at each rising edge of the clock. When stall goes high, the stall clock is held high, so R1 and R2 see no rising edge and keep their contents until stall is released. R3, which is clocked by the master clock, does not receive an instruction during the stall; it receives zeros from the multiplexer. This is useful for executing instructions that involve a forward jump.
    The ALU is programmed by making use of the pipelined increment of the pointed-to memory locations. The code is stored in memory such that the contents of the first location specify the operation to be performed, and the following locations contain the operands. The ALU fetches the contents of 3 memory locations at a time. The arithmetic or logical operation is selected by the 16 most significant bits of the instruction, which is presented by the ir register. The operands are held temporarily in the ALU registers ar and br while calculations are performed. The output of the ALU is given by alu_out.
    In the pipelined multiplier, the inputs a and b are two unsigned 32-bit numbers. On each rising edge the partial products of a and b are latched and the output y is updated from the partial products latched on the previous edge, so a result appears two rising edges after its operands. Starting from the right end, a is multiplied by the 8 least significant bits of b; b is then shifted right by 8 bits and a is again multiplied by the 8 least significant bits, and so on until the multiplication is complete. The four partial products are shifted into position and added to produce the output.
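The stall described above gates the clock (`sclk <= clk or stall`). As a point of comparison, the following is a minimal sketch of a different technique, a clock enable, which is often preferred on FPGAs because the register stays on the ungated clock. The entity name `regist_en` and its `stall` port are hypothetical and are not part of the original project.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity (not in the original project): a pipeline register
-- that stalls through a clock enable instead of a gated clock.
entity regist_en is
  port (
    clk   : in  std_logic;
    clear : in  std_logic;                      -- asynchronous clear, active high
    stall : in  std_logic;                      -- '1' holds the current value
    ip    : in  std_logic_vector(31 downto 0);
    op    : out std_logic_vector(31 downto 0)
  );
end entity regist_en;

architecture rtl of regist_en is
  signal temp : std_logic_vector(31 downto 0) := (others => '0');
begin
  reg : process (clk, clear)
  begin
    if clear = '1' then
      temp <= (others => '0');
    elsif rising_edge(clk) then
      -- Load a new value only when the stage is not stalled.
      if stall = '0' then
        temp <= ip;
      end if;
    end if;
  end process reg;

  op <= temp;
end architecture rtl;
```

With this variant every stage would run from the master clock, and only the stages that must hold their contents during a stall would receive the `stall` signal.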
  • 6. 5. VHDL files
    1. Register.vhd:
      library IEEE;
      use IEEE.std_logic_1164.all;

      entity regist is
        port(clk   : in  std_logic;
             clear : in  std_logic;
             ip    : in  std_logic_vector (31 downto 0);
             op    : out std_logic_vector (31 downto 0));
      end entity regist;

      architecture beh of regist is
        signal temp : std_logic_vector(31 downto 0);
      begin
        reg: process(clk, clear)
        begin
          if clear='1' then
            temp <= (others=>'0');
          --elsif rising_edge(clk) then
          elsif clk='1' then
            temp <= ip;
          end if;
        end process reg;
        op <= temp;
      end architecture beh;

    2. Multiplexer.vhd:
      library IEEE;
      use IEEE.std_logic_1164.all;

      entity mux is
        port(in0    : in  std_logic_vector (31 downto 0);
             in1    : in  std_logic_vector (31 downto 0);
             ctl    : in  std_logic;
             result : out std_logic_vector (31 downto 0));
      end entity mux;
  • 7.
      architecture beh of mux is
      begin
        result <= in1 when ctl='1' else in0 after 1 ns;
      end architecture beh;

    3. Pipelinestalling.vhd:
      library IEEE;
      use IEEE.std_logic_1164.all;
      use IEEE.std_logic_textio.all;
      use IEEE.std_logic_arith.all;

      entity pipe is
        port(reg1, reg2, reg3 : inout std_logic_vector(31 downto 0));
      end entity pipe;

      architecture beh of pipe is
        signal clk   : std_logic := '0';  -- master clock
        signal stall : std_logic := '0';  -- stall signal
        signal sclk  : std_logic := '0';  -- stall clock
        signal clear : std_logic := '1';  -- one shot clear
        subtype word is std_logic_vector(31 downto 0);
        signal zeros  : word := (others=>'0');
        signal R1     : word;
        signal R1_a   : word;
        signal R2     : word;
        signal R2_mux : word;
        signal R3     : word;
        signal cnt    : word := "00000000000000000000000000000000";
      begin
        clock: process(clk)
        begin
          if clear='1' then
            clear <= '0' after 500 ps;
          end if;
          clk <= not clk after 5 ns;
        end process clock;
  • 8.
        cnt <= unsigned(cnt)+unsigned'("00000000000000000000000000000001") after 1 ns
               when sclk'event and sclk='1';
        stall <= '1' after 1 ns when R2="00000000000000000000000000000010"
                                 and R3="00000000000000000000000000000001"
                 else '0' after 1 ns;
        sclk <= clk or stall after 1 ns;

        -- pipeline stages
        R1_reg: entity work.regist port map(sclk, clear, cnt, R1);
        R1_a <= R1 or "00000000000000000000000000000000" after 1 ns;  -- logic
        R2_reg: entity work.regist port map(sclk, clear, R1_a, R2);
        A2_mux: entity work.mux port map(R2, zeros, stall, R2_mux);
        R3_reg: entity work.regist port map(clk, clear, R2_mux, R3);

        reg1 <= R1;
        reg2 <= R2;
        reg3 <= R3;
      end beh;

    4. ALU.vhd:
      library IEEE;
      use IEEE.std_logic_1164.all;
      use IEEE.std_logic_unsigned.all;
      use IEEE.std_logic_arith.all;
      use ieee.numeric_std.all;

      entity alu is
        port (clk        : in    std_logic;
              ir, ar, br : inout std_logic_vector(31 downto 0);
              alu_sig    : out   std_logic;
  • 9.
              alu_out    : out   std_logic_vector(63 downto 0));
      end alu;

      architecture beh of alu is
        signal alu_st     : std_logic;
        signal alu_output : std_logic_vector(63 downto 0);
        type mem is array (0 to 31) of std_logic_vector(31 downto 0);
        constant content : mem := (
          0 => "00000000000000000000000000000001",
          1 => "00000000000000000000000000000100",
          2 => "00000000000000000000000000000110",
          others => "11111111111111111111111111111111");
        -- Memory read by the ALU; initialized from content, which the
        -- original declared but never used.
        signal temp_mem : mem := content;

        component pipe is
          port(reg1, reg2, reg3 : inout std_logic_vector(31 downto 0));
        end component;
      begin
        reg: pipe port map(reg1=>br, reg2=>ar, reg3=>ir);

        clocked_alu: process(clk, ir)
        begin
          if (rising_edge(clk)) then
            alu_output <= (others =>'0');
            alu_st <= '1';
            case ir(31 downto 16) is
  • 10.
              -- 32-bit results are zero-extended to the 64-bit alu_output;
              -- the product is already 64 bits wide.
              when "0000000000000000" =>
                alu_output <= x"00000000" & (temp_mem(conv_integer(ar)) + temp_mem(conv_integer(br)));
              when "0000000000000001" =>
                alu_output <= temp_mem(conv_integer(ar)) * temp_mem(conv_integer(br));
              when "0000000000000010" =>
                alu_output <= x"00000000" & (temp_mem(conv_integer(ar)) - temp_mem(conv_integer(br)));
              when "0000000000000011" =>
                alu_output <= x"00000000" & (temp_mem(conv_integer(br)) - temp_mem(conv_integer(ar)));
              when "0000000000000100" =>
                alu_output <= x"00000000" & (temp_mem(conv_integer(ar)) and temp_mem(conv_integer(br)));
              when "0000000000000101" =>
                alu_output <= x"00000000" & (temp_mem(conv_integer(ar)) or temp_mem(conv_integer(br)));
              when "0000000000000110" =>
                alu_output <= x"00000000" & (temp_mem(conv_integer(ar)) xor temp_mem(conv_integer(br)));
              when "0000000000000111" =>
                alu_output <= x"00000000" & (temp_mem(conv_integer(ar)) nand temp_mem(conv_integer(br)));
              when "0000000000001000" =>
                alu_output <= x"00000000" & (temp_mem(conv_integer(ar)) nor temp_mem(conv_integer(br)));
              when "0000000000001001" =>
                alu_output <= x"00000000" & (not temp_mem(conv_integer(ar)));
              when others => null;
            end case;
          end if;
          alu_sig <= alu_st;
          alu_out <= alu_output;
        end process clocked_alu;
      end beh;
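The opcode decoding above can also be written with `ieee.numeric_std` alone, which makes the operand widths explicit. The following is only a sketch under stated assumptions: the entity `alu_sketch` and its ports are hypothetical, the operands arrive directly as ports rather than through the pipe and memory of the project, and only four of the opcodes from the listing are shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity, not the project's alu: operands are plain input ports.
entity alu_sketch is
  port (
    clk    : in  std_logic;
    opcode : in  std_logic_vector(15 downto 0);
    a, b   : in  std_logic_vector(31 downto 0);
    result : out std_logic_vector(63 downto 0)
  );
end entity alu_sketch;

architecture rtl of alu_sketch is
begin
  process (clk)
    variable ua, ub : unsigned(31 downto 0);
  begin
    if rising_edge(clk) then
      ua := unsigned(a);
      ub := unsigned(b);
      case opcode is
        -- add: widen first so the carry out of bit 31 is kept
        when x"0000" => result <= std_logic_vector(resize(ua, 64) + resize(ub, 64));
        -- multiply: a 32 x 32 bit product is already 64 bits wide
        when x"0001" => result <= std_logic_vector(ua * ub);
        -- subtract: wraps modulo 2**32, then zero-extended
        when x"0002" => result <= std_logic_vector(resize(ua - ub, 64));
        -- bitwise and, zero-extended
        when x"0004" => result <= std_logic_vector(resize(ua and ub, 64));
        when others  => result <= (others => '0');
      end case;
    end if;
  end process;
end architecture rtl;
```

Using a single arithmetic package avoids having two different `unsigned` types visible at once, which happens when `std_logic_arith` and `numeric_std` are both in scope.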
  • 11. 5. Pipelinedmultiplier.vhd:
      library ieee;
      use ieee.std_logic_1164.all;
      use ieee.std_logic_arith.all;
      use ieee.std_logic_unsigned.all;

      entity pipemult is
        port (clk1 : in  std_logic;
              a, b : in  unsigned(31 downto 0);
              y    : out unsigned(63 downto 0));
      end pipemult;

      architecture rtl3 of pipemult is
        signal y1, y2, y3, y4, y5 : unsigned (39 downto 0);
        constant z : unsigned (63 downto 0) := (others => '0');
      begin
        process(clk1)
        begin
          if (rising_edge(clk1)) then
            -- partial products: a times each byte of b
            y1 <= a * b( 7 downto  0);
            y2 <= a * b(15 downto  8);
            y3 <= a * b(23 downto 16);
            y4 <= a * b(31 downto 24);
            -- shift each partial product into position and add
            y <= (z(63 downto 40) & y1) +
                 (z(63 downto 48) & y2 & z( 7 downto 0)) +
                 (z(63 downto 56) & y3 & z(15 downto 0)) +
                 (y4 & z(23 downto 0));
          end if;
        end process;
      end rtl3;
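A self-checking testbench is one way to confirm the two-edge latency of the multiplier. The sketch below assumes that the `pipemult` entity from the listing above has been compiled into library `work`. The testbench name `pipemult_tb`, the 10 ns clock period and the operand values are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;  -- same unsigned type as the pipemult ports

entity pipemult_tb is
end entity pipemult_tb;

architecture sim of pipemult_tb is
  signal clk1 : std_logic := '0';
  signal a, b : unsigned(31 downto 0) := (others => '0');
  signal y    : unsigned(63 downto 0);
  signal done : boolean := false;
begin
  -- Device under test: the multiplier from Pipelinedmultiplier.vhd.
  dut : entity work.pipemult
    port map (clk1 => clk1, a => a, b => b, y => y);

  -- 10 ns clock (assumed period) that stops once the checks are finished.
  clock : process
  begin
    while not done loop
      clk1 <= '0'; wait for 5 ns;
      clk1 <= '1'; wait for 5 ns;
    end loop;
    wait;
  end process clock;

  stimulus : process
  begin
    -- Case 1: only the lowest byte of b is non-zero.
    a <= conv_unsigned(3, 32);
    b <= conv_unsigned(5, 32);
    wait until rising_edge(clk1);  -- edge 1: partial products latched
    wait until rising_edge(clk1);  -- edge 2: partial products summed into y
    wait for 1 ns;
    assert y = conv_unsigned(15, 64)
      report "pipemult: expected 3 * 5 = 15" severity error;

    -- Case 2: b = 2**24 + 1 exercises the lowest and the highest byte of b.
    a <= conv_unsigned(100, 32);
    b <= conv_unsigned(16777217, 32);
    wait until rising_edge(clk1);
    wait until rising_edge(clk1);
    wait for 1 ns;
    assert y = conv_unsigned(1677721700, 64)
      report "pipemult: expected 100 * 16777217 = 1677721700" severity error;

    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

The expected values are kept below 2**31 because `conv_unsigned` takes an integer argument. Larger products would have to be written as bit-string constants.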
  • 12. 6. User constraints file for FPGA
      NET "clk" LOC = "AJ15";
      NET "clear" LOC = "AC11"; #SW0
      NET "R3(31)" LOC = "T7";
      NET "R3(30)" LOC = "T8";
      NET "R3(29)" LOC = "U4";
      NET "R3(28)" LOC = "U5";
      NET "R3(27)" LOC = "V2";
      NET "R3(26)" LOC = "W2";
      NET "R3(25)" LOC = "T9";
      NET "R3(24)" LOC = "U9";
      NET "R3(23)" LOC = "V3";
      NET "R3(22)" LOC = "V4";
      NET "R3(21)" LOC = "W1";
      NET "R3(20)" LOC = "Y1";
      NET "R3(19)" LOC = "U7";
      NET "R3(18)" LOC = "U8";
      NET "R3(17)" LOC = "V5";
      NET "R3(16)" LOC = "V6";
      NET "R3(15)" LOC = "W3";
      NET "R3(14)" LOC = "W4";
      NET "R3(13)" LOC = "AA1";
      NET "R3(12)" LOC = "AB1";
      NET "R3(11)" LOC = "W5";
      NET "R3(10)" LOC = "W6";
      NET "R3(9)" LOC = "Y4";
      NET "R3(8)" LOC = "Y5";
      NET "R3(7)" LOC = "AA3";
      NET "R3(6)" LOC = "AA4";
      NET "R3(5)" LOC = "W7";
      NET "R3(4)" LOC = "W8";
      NET "R3(3)" LOC = "AB3";
      NET "R3(2)" LOC = "AB4";
      NET "R3(1)" LOC = "AB2";
      NET "R3(0)" LOC = "AC2";
  • 13. 7. Simulation waveforms – Performance Analysis
    Output waveforms of the simulations in ModelSim (figures): pipeline simulation; pipelined multiplier simulation.
  • 14. 8. Applications
    Pipelining for multicore computers: Using a pipeline architecture is a common and effective method of increasing throughput and reducing loop execution times on multicore computers. Pipelining can be used when data must go through multiple processes that can be broken into stages. Pipelining is a type of task parallelism that can be implemented for a series of serial tasks that have data dependencies.
    Operating systems design: In Unix-like computer operating systems (and, to some extent, Windows), a pipeline is the original software pipeline: a set of processes chained by their standard streams, so that the output of each process (stdout) feeds directly as input (stdin) to the next one. Each connection is implemented by an anonymous pipe. Filter programs are often used in this configuration.
  • 15. Superscalar pipelining: Superscalar pipelining involves multiple pipelines in parallel. Internal components of the processor are replicated so that it can launch multiple instructions in some or all of its pipeline stages. The RISC System/6000 has a forked pipeline with different paths for floating-point and integer instructions. If a program contains a mixture of both types, the processor can keep both forks running simultaneously. Both types of instructions share two initial stages (Instruction Fetch and Instruction Dispatch) before they fork. Often, however, superscalar pipelining refers to multiple copies of all pipeline stages (in terms of laundry, this would mean four washers, four dryers, and four people who fold clothes). Many of today's machines attempt to find two to six instructions that they can execute in every pipeline stage. If some of the instructions are dependent, however, only the first instruction or instructions are issued.
    Pipelining in firmware: Pipelining at the firmware level of machine organization can provide significant execution time benefits for certain types of instructions. The essential concept of this approach is the pipelining of operations within the hardware under direct control of the firmware, rather than the pipelining of microinstructions.
    Dynamic pipelining: Dynamic pipelines have the capability to schedule around stalls. A dynamic pipeline is divided into three units: the instruction fetch and decode unit, five to ten execute or functional units, and a commit unit. Each execute unit has reservation stations, which act as buffers and hold the operands and operations.
  • 16. 9. Conclusions
    To summarize, pipelining is a technique that programmers can use to gain a performance increase in inherently serial applications (on multicore machines). The CPU industry trend of increasing cores per chip means that strategies such as pipelining will become essential to application development in the near future. To gain the largest possible performance increase from pipelining, the individual stages must be carefully balanced so that no single stage takes much longer to complete than the others. The project covers a pipelined execution unit, a pipelined multiplier and a pipelined ALU. Handling of structural, data and control hazards can also be programmed to improve design efficiency, since it is very important for the physical implementation of the design. Cache miss handling and exception handling are also required to improve pipelining performance on RISC-like systems.