DSP Architectures




Rensselaer at Hartford
ECSE 6620 - Fall 2001
     Lecture 16
         Jason M. Stripinis
    jasonstripinis@engineer.com
Basic Processor Structure




• Here we see a very simple processor structure - such as
  might be found in a small 8-bit microprocessor.
12 DEC 01           ECSE 6620 - Jason Stripinis
Basic Processor Functions
• ALU
  – Arithmetic Logic Unit - this circuit takes two operands on the
    inputs (labeled A and B) and produces a result on the output
    (labeled Y).
  – The operations will usually include, as a minimum:
      •   add, subtract
      •   and, or, not
      •   shift right, shift left
      •   ALUs in more complex processors will execute many more
          instructions.
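
The minimum operation set listed above can be sketched as a small function. This is only an illustrative model, not a description of any particular processor: the function name `alu`, the operation names, and the 8-bit default width are assumptions made for the example.

```python
def alu(op, a, b=0, width=8):
    """Toy ALU: return (y, carry) for `width`-bit operands a and b."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    if op == "add":
        full = a + b
        return full & mask, full >> width      # carry out of the top bit
    if op == "sub":
        full = a - b
        return full & mask, int(full < 0)      # carry reused as a borrow flag
    if op == "and":
        return a & b, 0
    if op == "or":
        return a | b, 0
    if op == "not":
        return ~a & mask, 0                    # b is ignored
    if op == "shl":
        return (a << 1) & mask, a >> (width - 1)
    if op == "shr":
        return a >> 1, a & 1
    raise ValueError(f"unknown operation: {op}")
```

For example, `alu("add", 200, 100)` wraps around to `(44, 1)` because the result does not fit in 8 bits.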




Basic Processor Functions
• Register File
   – A set of storage locations (registers) for storing temporary results.
     Early machines had just one register (accumulator). Modern RISC
     processors will have at least 32 registers.
• Instruction Register
   – The instruction currently being executed by the processor is stored
     here.
• Control Unit
   – The control unit decodes the instruction in the instruction register
     and sets signals which control the operation of most other units of
     the processor. For example, the operation code (opcode) in the
     instruction will be used to determine the settings of control signals
     for the ALU which determine which operation (+,-,^,v,~,shift,etc)
     it performs.
Basic Processor Functions
• Clock
   – The vast majority of processors are synchronous, that is, they use a
     clock signal to determine when to capture the next data word and
     perform an operation on it. In a globally synchronous processor, a
     common clock needs to be routed (connected) to every unit in the
     processor.
• Program counter
   – The program counter holds the memory address of the next
     instruction to be executed. It is updated every instruction cycle to
     point to the next instruction in the program. Branch instructions
     change the program counter by other than a simple increment.
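
How the instruction register, control unit and program counter interact can be sketched with a toy fetch-decode-execute loop. The four-instruction set (`LOAD`, `ADD`, `JNZ`, `HALT`), the single accumulator, and the function name `run` are hypothetical and exist only for this illustration.

```python
def run(program, max_steps=100):
    """Execute a toy program of (opcode, operand) tuples; return (acc, steps)."""
    pc = 0       # program counter: address of the next instruction
    acc = 0      # a single accumulator register
    steps = 0
    while pc < len(program) and steps < max_steps:
        ir = program[pc]          # fetch into the instruction register
        pc += 1                   # default update: a simple increment
        opcode, operand = ir      # decode
        if opcode == "LOAD":
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "JNZ":     # branch: replace pc when acc is non-zero
            if acc != 0:
                pc = operand
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        steps += 1
    return acc, steps
```

A countdown such as `[("LOAD", 3), ("ADD", -1), ("JNZ", 1), ("HALT", 0)]` shows the branch changing the program counter by something other than a simple increment until the accumulator reaches zero.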




Basic Processor Functions
• Memory Address Register
   – This register is loaded with the address of the next data word to be
     fetched from or stored into main memory.
• Address Bus
   – Transfers addresses to memory and memory-mapped peripherals.
     It is driven by the processor acting as a bus master.
• Data Bus
   – Carries data to and from the processor, memory and peripherals. It
     will be driven by the data source, i.e. processor, memory, etc.
• Multiplexed Bus
   – To limit device pin counts and bus complexity, some processors
     MUX address and data onto the same bus, with an adverse effect
     on performance.
DSP Implementations
• DSP Algorithm
   – Series of mathematical operations that are applied to process a
     sequence of digital signals sampled from the real (analog) world
• Application examples
   –   Filtering
   –   FFT
   –   Noise cancellation
   –   Spectral Processing




Why is special architecture good for
    digital signal processing?
• DSPs are tailored to run DSP algorithms efficiently.
• Special functions to handle DSP algorithm demands:
   – Unique data access patterns
        • Streams of data requiring high bandwidth
        • Low data repetition but high code repetition
   –   Math operation focus (“number cruncher”)
   –   Real-time constraints
   –   Power and size constraints
   –   Cost requirement
   –   Attention to numeric effects (limited fixed point error)




DSP Functional Characteristics
• Typically require a few specific operations
• Consider a FIR filter:




       This requires:
          –additions & multiplications
          –delays
          –array handling
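
The three requirements above can be seen in a direct-form FIR filter, y[n] = sum over k of h[k] * x[n - k]. The following is a minimal sketch under that standard definition; the names `fir_filter`, `delay_line` and `acc` are chosen for the example and do not come from the slides.

```python
def fir_filter(x, h):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k]."""
    delay_line = [0.0] * len(h)          # delays: the most recent inputs
    y = []
    for sample in x:
        # Array handling: shift the delay line and insert the new sample.
        delay_line = [sample] + delay_line[:-1]
        acc = 0.0
        for coeff, past in zip(h, delay_line):
            acc += coeff * past          # one multiplication and one addition
        y.append(acc)
    return y
```

With `h = [0.5, 0.5]` this is a two-point moving average, so `fir_filter([2, 4, 6], [0.5, 0.5])` gives `[1.0, 3.0, 5.0]`. The inner loop is the multiply-and-add pattern that DSP hardware is built to speed up.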


DSP Typical Operations
• Additions & Multiplications
   – fetch two operands
   – perform the addition or multiplication (or both)
   – store the result


• Delays
   – store the result for later use


• Array Handling
   – fetch values from consecutive memory locations
   – copy data from register to register


DSP Typical Operations
• To perform these basic operations most DSPs:
   – have a parallel multiply and add
   – have multiple memory accesses (to fetch two operands and store the
     result)
   – have sufficient registers to hold data temporarily
   – have efficient address generation for array handling
   – have special features such as delays or circular addressing
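
Circular addressing can be sketched in software as a delay line whose pointer wraps around modulo the buffer length, so that no data has to be copied when a new sample arrives. The class name `CircularDelayLine` and its methods are hypothetical; on a DSP the wrap-around would be done by the address generation hardware rather than by a `%` operation.

```python
class CircularDelayLine:
    """Fixed-length delay line addressed modulo its length."""

    def __init__(self, length):
        self.buf = [0] * length
        self.head = 0                     # index of the newest sample

    def push(self, sample):
        # Advance the pointer with wrap-around instead of moving any data.
        self.head = (self.head + 1) % len(self.buf)
        self.buf[self.head] = sample

    def tap(self, delay):
        """Return the sample pushed `delay` steps ago (0 = newest)."""
        return self.buf[(self.head - delay) % len(self.buf)]
```

After pushing 10, 20, 30 and 40 into a line of length 3, `tap(0)` returns 40 and `tap(2)` returns 20; the oldest value, 10, has been overwritten.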




DSP Arithmetic Logic Unit
• Most DSP operations require additions and multiplications
  together. So DSP processors usually have parallel
  hardware adders and multipliers which can be used with a
  single instruction:




Register Structure
• Delays require that intermediate values be held for later
  use.
• For example, when keeping a running total - the total can
  be kept within the processor to avoid wasting repeated
  reads from and writes to memory.
• For this reason DSP processors have lots of registers which
  can be used to hold intermediate values.
• Registers may be fixed-point or floating-point.




Memory Addressing
• Array handling requires that data can be fetched efficiently
  from consecutive memory locations.
• For this reason DSP processors have address registers
  which are used to hold addresses and can be used to
  generate the next needed address efficiently.
• Usually, the next needed address can be generated during
  the data fetch or store operation, and with no overhead.




Memory Addressing
• Example DSP address generation operations:

Instruction  Name                    Description
*rP          register indirect       read the data pointed to by the address in
                                     register rP
*rP++        postincrement           having read the data, postincrement the address
                                     pointer to point to the next value in the array
*rP--        postdecrement           having read the data, postdecrement the address
                                     pointer to point to the previous value in the
                                     array
*rP++rI      register postincrement  having read the data, postincrement the address
                                     pointer by the amount held in register rI to point
                                     to rI values further down the array
*rP++rIr     bit reversed            having read the data, postincrement the address
                                     pointer to point to the next value in the array, as
                                     if the address bits were in bit reversed order
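
The addressing operations in the table can be modelled in a few lines. This is a software sketch only: the helper names `read_postincrement`, `bit_reverse` and `bit_reversed_order` are assumptions for the example, and a real DSP performs these pointer updates in its address generator during the memory access itself.

```python
def read_postincrement(memory, pointer, step=1):
    """Model *rP++ (step=1), *rP-- (step=-1) or *rP++rI (step=rI).

    Returns the value read and the updated pointer.
    """
    return memory[pointer], pointer + step


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Addresses visited in bit reversed order for an n-point array.

    n is assumed to be a power of two, as in a radix-2 FFT.
    """
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]
```

For an 8-point array, `bit_reversed_order(8)` visits the addresses `[0, 4, 2, 6, 1, 5, 3, 7]`, which is the reordering an FFT needs.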


Memory Architectures for DSP
• For arithmetic the DSP needs to fetch two operands in a
  single instruction cycle.
• Since we also need to store the result and to read the
  instruction itself, more than two memory accesses per
  instruction cycle are needed.
• Even the simplest DSP operation - an addition involving
  two operands and a store of the result to memory - requires
  four memory accesses (three to fetch the two operands and
  the instruction, plus a fourth to write the result).




Memory Architectures for DSP
• DSP processors usually support multiple memory accesses
  in the same instruction cycle.
• It is not possible to access two different memory addresses
  simultaneously over a single memory bus.
• There are two common methods to achieve multiple
  memory accesses per instruction cycle:
            • Harvard architecture
            • modified von Neumann architecture




Memory Architectures for DSP
                (Harvard Architecture)
• The Harvard architecture has two separate physical
  memory buses, allowing two simultaneous memory
  accesses.
• The true Harvard architecture dedicates one bus for
  fetching instructions, with the other available to fetch
  operands.
• This is inadequate for DSP operations, which usually
  involve at least two operands. So DSP Harvard
  architectures usually permit the 'program' bus to be used
  also for access of operands.



Memory Architectures for DSP
                (Harvard Architecture)
• Note that it is often necessary to fetch three things - the
  instruction plus two operands - and the Harvard
  architecture is inadequate to support this.
• So DSP Harvard architectures often also include a cache
  memory which can be used to store instructions which will
  be reused, leaving both Harvard buses free for fetching
  operands.
• The Harvard architecture plus cache is sometimes called
  an extended Harvard architecture or Super Harvard
  ARChitecture (SHARC).


Memory Architectures for DSP
                (Harvard Architecture)
• The Harvard architecture requires two memory buses. This
  makes it expensive to bring off the chip - for example a
  DSP using 32 bit words and with a 32 bit address space
  requires at least 64 pins for each memory bus - a total of
  128 pins if the Harvard architecture is brought off the chip.
  This results in very large chips, which are difficult to
  design into a circuit.




Memory Architectures for DSP
            (von Neumann Architecture)
• The von Neumann architecture uses only a single memory
  bus. This is relatively cheap, requiring fewer pins than the
  Harvard architecture, and simple to use because the
  programmer can place instructions or data anywhere
  throughout the available memory.
• But a single bus does not permit multiple simultaneous
  memory accesses.
• The modified von Neumann architecture allows multiple
  memory accesses per instruction cycle by running the
  memory clock faster than the instruction cycle.




Memory Architectures for DSP
            (von Neumann Architecture)
• Each instruction cycle is divided into multiple 'machine
  states' and a memory access can be made in each machine
  state, permitting multiple memory accesses per
  instruction cycle.
• The modified von Neumann architecture permits all the
  memory accesses needed to support addition or
  multiplication: fetch of the instruction; fetch of the two
  operands; and storage of the result.
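
The trade-off between the memory architectures can be put into rough numbers with a deliberately simplified model: each bus completes a fixed number of accesses per instruction cycle, and one instruction needs a given number of accesses. The function name `instruction_cycles` and its parameters are assumptions for this sketch; it ignores pipelining, wait states and bus conflicts.

```python
import math


def instruction_cycles(accesses_needed, buses=1, accesses_per_bus_per_cycle=1):
    """Instruction cycles needed to complete one instruction's memory traffic."""
    per_cycle = buses * accesses_per_bus_per_cycle
    return math.ceil(accesses_needed / per_cycle)


# An addition with a store needs four accesses:
# instruction fetch, two operand fetches, one result write.
single_bus = instruction_cycles(4)                                  # 4 cycles
harvard = instruction_cycles(4, buses=2)                            # 2 cycles
harvard_with_cache = instruction_cycles(3, buses=2)                 # instruction comes from cache
modified_von_neumann = instruction_cycles(4, accesses_per_bus_per_cycle=4)  # 1 cycle
```

In this model the modified von Neumann case reaches one cycle only because its memory is assumed to run four machine states per instruction cycle.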




Why use a special architecture for
     digital signal processing?
                         The Answers
      Unique data access patterns      Bit reversed addressing (FFT)
      Streams of data requiring high   Multiple access memory
      bandwidth                        architecture
      Low data repetition but high     Eliminate data cache (save $$)
      code repetition
      Math operation focus             MAC instruction
                                       Vector processing unit
      Real-time constraints            Zero-overhead loops
      Power and size constraints       Limited additional functional
                                       units (unlike GPP)
      Cost requirement                 On-board peripherals (SOC)
      Attention to numeric effects     ALU with 16-bit operands and
      (limited fixed point error)      32-bit result
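
The last row of the table, 16-bit operands with a 32-bit result, can be illustrated with a multiply-accumulate in the common Q15 fixed-point format. The Q15 format and the helper names `to_q15`, `mac_q15` and `acc_to_q15` are assumptions for this sketch and are not taken from the slides. Python integers do not overflow, so overflow of the accumulator itself is not modelled here.

```python
Q15_MIN, Q15_MAX = -32768, 32767


def to_q15(value):
    """Convert a float in [-1, 1) to a 16-bit Q15 integer, saturating at the limits."""
    scaled = int(round(value * 32768))
    return max(Q15_MIN, min(Q15_MAX, scaled))


def mac_q15(acc, a, b):
    """Add the full-precision product of two Q15 operands to the accumulator.

    The product of two 16-bit values is kept whole (Q30), so nothing is
    rounded away until the final result is stored.
    """
    return acc + a * b


def acc_to_q15(acc):
    """Round a Q30 accumulator back to Q15 and saturate to 16 bits."""
    rounded = (acc + (1 << 14)) >> 15
    return max(Q15_MIN, min(Q15_MAX, rounded))
```

For example, `acc_to_q15(mac_q15(0, to_q15(0.5), to_q15(0.25)))` gives 4096, which is 0.125 in Q15. Keeping the wide product until the end limits the rounding error to a single rounding step per output sample.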



DSP Generations
• 1st Generation (1979-1982)
   – Transition from experimental signal processors
• 2nd Generation (1985-1986)
   – Move from co-processor to stand-alone processor
• 3rd Generation (1987-1989)
   – Major hardware improvements to speed
• 4th Generation (1990-1996)
   – More on-chip integration (ADC, DAC, memory, multi-processor)
• 5th Generation (1997-)




DSP Generations
              1st Generation (1979-1982)
• Primarily targeted at digital filtering
• Specialized co-processor for signal processing
• NMOS (n-Channel Metal Oxide Semi) fabrication

•   16-bit fixed point
•   fast multiplier (and adder)
•   Harvard architecture
•   Specialized Instruction set




DSP Generations
               1st Generation (1979-1982)
• Example = Texas Instruments TMS32010
   –   16-bit fixed point
   –   Harvard architecture
   –   two Address registers
   –   one A register (adder)
   –   one P register (multiplier)
   –   one T register (data shift on delay line)
   –   No zero-overhead loop
   –   Specialized Instruction set
   –   MAC Time 400 ns (<100 ns today)
   –   50 ms per 1024-FFT



DSP Generations
            1st Generation (1979-1982)
• Example = Texas Instruments TMS32010




DSP Generations
            2nd Generation (1985-1986)
• Move from co-processor to stand-alone processor
• CMOS (Complementary Metal Oxide Semi) fabrication
• Double the speed of first generation

•   Advances in memory architecture (more internal RAM)
•   better pipelining of functional units
•   address generators (bit-reversing)
•   Zero-overhead loop HW
•   Limited floating point in SW



DSP Generations
              2nd Generation (1985-1986)
• Example = Texas Instruments TMS32020 (1985)
   –   16-bit fixed point
   –   Harvard architecture
   –   Improved TMS32010
   –   RPTS allows a pipelined instruction to be performed in a single cycle
   –   Specialized Instruction set
   –   MAC Time 200 ns
   –   10 ms per 1024-FFT




DSP Generations
              3rd Generation (1987-1989)
• Increased floating point support
   – 32-bit floating point hardware DSPs released
   – Floating point emulation on fixed point processors
   – IEEE754 support
• Hardware enhancements (large speed increase)
   –   dense CMOS fabrication
   –   on chip DMA
   –   instruction caches
   –   increased clock rates (first cores above 10 MHz)
• Increased complexity of SW



DSP Generations
              3rd Generation (1987-1989)
• Example = Motorola DSP56001 (1988)
   –   24-bit data, instructions
   –   24-bit fixed point
   –   3 memory spaces (P, X, Y)
   –   parallel moves
   –   circular addressing
   –   MAC Time 75 ns (21 ns today)
   –   ~3 ms per 1024-FFT
• Other Examples:
   – AT&T DSP16A
   – Analog Devices ADSP-2100
   – TI TMS320C50

DSP Generations
              4th Generation (1990-1996)
• Hardware integration
   –   ADC
   –   DAC
   –   more memory
   –   multiple DSPs on one chip
• Decreasing power consumption
   – 5.0 VDC → 3.3 VDC → 3.0 VDC → 2.7 VDC
• GPPs start to get DSP functions
   – SIMD
   – Leads to Intel introducing MMX (MultiMedia eXtensions) for x86




DSP Generations
               4th Generation (1990-1996)
• Example = TI TMS320C541 (1995)
   –   Enhanced architecture
   –   Low voltage (3.3 VDC)
   –   More on-chip memory
   –   Application specific functional units
   –   MAC Time 20 ns (10 ns today)
   –   ~1 ms per 1024-FFT


• Example = TI TMS320C80
   – multiple processors per chip




The GPP Option
• High-performance general-purpose processors for PCs and
  workstations are increasingly suitable for some DSP
  applications.
• E.g., Intel MMX Pentium, Motorola/IBM PowerPC 604e
• These processors achieve excellent to outstanding floating
  and/or fixed-point DSP performance via:
   –   Very high clock rates (200-500 MHz)
   –   Superscalar architectures
   –   Single-cycle multiplication and arithmetic operations
   –   Good memory bandwidth
   –   Branch prediction
   –   In some cases, single-instruction, multiple-data (SIMD) ops

DSP Generations
                  5th Generation (1997-)
• Not the classic DSP architectures
   – SIMD (Single Instruction Multiple Data stream) instructions
   – VLIW (Very Long Instruction Words) allows RISC processing
       • High parallelism
       • Increased clock speeds
       • No longer application specific functional units (no MAC FU)
• Low voltage (2.5 VDC or less, even 1.2 VDC cores)
• MAC Time 3 ns (but can be power hungry)
• GPPs start to get DSP functions
   – Intel introduces MMX (MultiMedia eXtensions) for x86 in 1997
• Increased integration
   – MCU and DSP cores on same chip
   – MCU functions/ports added to DSPs
DSP Generations
                   5th Generation (1997-)
• SIMD (Single Instruction Multiple Data) instructions
   –   Enhance throughput by allowing parallelism
   –   Requires multiple functional units and wider buses
   –   May support multiple data widths (different functional groups)
   –   Example = DSP16000
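
The idea behind a SIMD instruction can be sketched by treating one 32-bit word as two independent 16-bit lanes. The function name `simd_add16x2` and the two-lane layout are hypothetical and are not a description of the DSP16000 instruction set.

```python
def simd_add16x2(a, b):
    """Add two 32-bit words as two independent 16-bit lanes (wrap-around).

    A carry out of the low lane is discarded rather than passed into the
    high lane, which is what separates this from an ordinary 32-bit add.
    """
    lo = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF
    hi = (((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF)) & 0xFFFF
    return (hi << 16) | lo
```

`simd_add16x2(0x0001FFFF, 0x00010001)` returns `0x00020000`, whereas a plain 32-bit addition of the same words would give `0x00030000`. In hardware the two lane additions happen at the same time, which is where the throughput gain comes from.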




   (Figure with two panels: WAS, SIMD)


DSP Generations
                  5th Generation (1997-)
• VLIW (Very Long Instruction
  Words)
   – Instruction Level Parallelism (ILP) can
     be a major performance gain
       • Superscalar implementation requires
         larger die and more power to
         dynamically pipeline instructions
   – VLIW can be used to statically pipeline
     instructions at compile time (or even by
     hand!)
   – VLIW instruction words have fixed
     "slots" for instructions that map to the
     functional units available.
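
The fixed slots of a VLIW word can be sketched with a very simple packer. The three-slot layout in `SLOTS` and the function name `pack_bundles` are hypothetical. The sketch assumes that operations placed in the same word do not depend on each other; a real compiler would have to check data dependencies before packing.

```python
SLOTS = ("ALU", "MUL", "MEM")   # hypothetical layout: one slot per functional unit


def pack_bundles(ops):
    """Pack (unit, text) operations, in program order, into fixed-slot words."""
    bundles = []
    current = {}
    for unit, text in ops:
        if unit not in SLOTS:
            raise ValueError(f"no slot for unit: {unit}")
        if unit in current:        # slot already used: close this word
            bundles.append(current)
            current = {}
        current[unit] = text
    if current:
        bundles.append(current)
    # Pad unused slots with explicit NOPs so every word has the same shape.
    return [tuple(b.get(u, "nop") for u in SLOTS) for b in bundles]
```

Packing `[("MEM", "load x"), ("MUL", "mul"), ("ALU", "add"), ("MEM", "load y"), ("MEM", "store")]` yields one full word followed by two words that each hold a single memory operation and two NOPs. Those padded slots are one reason VLIW code size tends to grow.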


DSP Generations
                 5th Generation (1997-)
• VLIW Advantages
   – huge theoretical pay off
       • Less than 1 ns per MAC!
       • Less than 75 µs per 1024-FFT


• VLIW Drawbacks
   – Can be very difficult to program and debug
   – High power consumption if VLIW is not filled
   – Code size dramatically increases requiring more program memory




DSP Generations
            5th Generation (1997-)
• VLIW Example = TI TMS320C6201




   32-bit Functional Units:
       • Lx = ALU
       • Sx = Branching and shifting
       • Mx = Multiplier
       • Dx = Data Store




DSP Generational Development
• DSP processor performance has increased by a factor of
  about 400 over the past 20 years
   (Chart: MAC time in ns, on a 0 to 400 scale, for the 1st through 5th generations)

• DSP architectures will be increasingly specialized for
  applications, especially communications applications
• General-purpose processors will become viable for many
  DSP applications
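
The performance factor quoted on this slide can be checked against the MAC times given for each generation in the earlier slides. The dictionary below only collects those figures; the names `MAC_TIME_NS` and `speedup` are chosen for this sketch.

```python
# MAC times in ns as listed for the example device of each generation.
MAC_TIME_NS = {"1st": 400, "2nd": 200, "3rd": 75, "4th": 20, "5th": 3}


def speedup(slow_ns, fast_ns):
    """Ratio of two MAC times: how many times faster the second one is."""
    return slow_ns / fast_ns


measured = speedup(MAC_TIME_NS["1st"], MAC_TIME_NS["5th"])   # about 133
theoretical = speedup(MAC_TIME_NS["1st"], 1)                 # 400, using 1 ns per MAC
```

The listed 3 ns MAC time gives a factor of roughly 133. A factor of 400 corresponds to a MAC time of 1 ns, the upper bound of the VLIW figure quoted as "less than 1 ns per MAC".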


Weitere ähnliche Inhalte

Was ist angesagt?

DSP Memory Architecture
DSP Memory ArchitectureDSP Memory Architecture
DSP Memory ArchitecturePriyanka Anni
 
Arm programmer's model
Arm programmer's modelArm programmer's model
Arm programmer's modelv Kalairajan
 
Programmable Logic Devices
Programmable Logic DevicesProgrammable Logic Devices
Programmable Logic DevicesMadhusudan Donga
 
Design challenges in embedded systems
Design challenges in embedded systemsDesign challenges in embedded systems
Design challenges in embedded systemsmahalakshmimalini
 
FPGA TECHNOLOGY AND FAMILIES
FPGA TECHNOLOGY AND FAMILIESFPGA TECHNOLOGY AND FAMILIES
FPGA TECHNOLOGY AND FAMILIESrevathilakshmi2
 
Microcontrollers 8051 MSP430 notes
Microcontrollers 8051 MSP430 notesMicrocontrollers 8051 MSP430 notes
Microcontrollers 8051 MSP430 notesNiteesh Shanbog
 
Digital signal processing
Digital signal processingDigital signal processing
Digital signal processingVedavyas PBurli
 
Addressing modes of 8086
Addressing modes of 8086Addressing modes of 8086
Addressing modes of 8086saurav kumar
 
Unit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisationUnit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisationPavithra S
 
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and SystemsDSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and SystemsAmr E. Mohamed
 
Introduction to Digital Signal Processing
Introduction to Digital Signal ProcessingIntroduction to Digital Signal Processing
Introduction to Digital Signal Processingop205
 
Discrete Fourier Transform
Discrete Fourier TransformDiscrete Fourier Transform
Discrete Fourier TransformAbhishek Choksi
 
Characteristics of Embedded Systems
Characteristics of Embedded SystemsCharacteristics of Embedded Systems
Characteristics of Embedded SystemsShreyaBhoje
 
Introduction to Embedded System I: Chapter 2 (5th portion)
Introduction to Embedded System I: Chapter 2 (5th portion)Introduction to Embedded System I: Chapter 2 (5th portion)
Introduction to Embedded System I: Chapter 2 (5th portion)Moe Moe Myint
 
Unit II Study of Onchip Peripherals
Unit II Study of Onchip PeripheralsUnit II Study of Onchip Peripherals
Unit II Study of Onchip PeripheralsDr. Pankaj Zope
 
Cmos design
Cmos designCmos design
Cmos designMahi
 

Was ist angesagt? (20)

DSP Memory Architecture
DSP Memory ArchitectureDSP Memory Architecture
DSP Memory Architecture
 
Arm programmer's model
Arm programmer's modelArm programmer's model
Arm programmer's model
 
Programmable Logic Devices
Programmable Logic DevicesProgrammable Logic Devices
Programmable Logic Devices
 
Design challenges in embedded systems
Design challenges in embedded systemsDesign challenges in embedded systems
Design challenges in embedded systems
 
FPGA
FPGAFPGA
FPGA
 
FPGA TECHNOLOGY AND FAMILIES
FPGA TECHNOLOGY AND FAMILIESFPGA TECHNOLOGY AND FAMILIES
FPGA TECHNOLOGY AND FAMILIES
 
ARM Processors
ARM ProcessorsARM Processors
ARM Processors
 
Microcontrollers 8051 MSP430 notes
Microcontrollers 8051 MSP430 notesMicrocontrollers 8051 MSP430 notes
Microcontrollers 8051 MSP430 notes
 
Digital signal processing
Digital signal processingDigital signal processing
Digital signal processing
 
Addressing modes of 8086
Addressing modes of 8086Addressing modes of 8086
Addressing modes of 8086
 
Unit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisationUnit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisation
 
Pin diagram 8085
Pin diagram 8085 Pin diagram 8085
Pin diagram 8085
 
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and SystemsDSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
 
Introduction to Digital Signal Processing
Introduction to Digital Signal ProcessingIntroduction to Digital Signal Processing
Introduction to Digital Signal Processing
 
Discrete Fourier Transform
Discrete Fourier TransformDiscrete Fourier Transform
Discrete Fourier Transform
 
Characteristics of Embedded Systems
Characteristics of Embedded SystemsCharacteristics of Embedded Systems
Characteristics of Embedded Systems
 
Embedded System Basics
Embedded System BasicsEmbedded System Basics
Embedded System Basics
 
Introduction to Embedded System I: Chapter 2 (5th portion)
Introduction to Embedded System I: Chapter 2 (5th portion)Introduction to Embedded System I: Chapter 2 (5th portion)
Introduction to Embedded System I: Chapter 2 (5th portion)
 
Unit II Study of Onchip Peripherals
Unit II Study of Onchip PeripheralsUnit II Study of Onchip Peripherals
Unit II Study of Onchip Peripherals
 
Cmos design
Cmos designCmos design
Cmos design
 

Andere mochten auch

Digital Signal Processors - DSP's
Digital Signal Processors - DSP'sDigital Signal Processors - DSP's
Digital Signal Processors - DSP'sHicham Berkouk
 
Digital Signal Processor ( DSP ) [French]
Digital Signal Processor ( DSP )  [French]Digital Signal Processor ( DSP )  [French]
Digital Signal Processor ( DSP ) [French]Assia Mounir
 
DIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGDIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGSnehal Hedau
 
Real time DSP algorithms for Mobile communication
Real time DSP algorithms for Mobile communicationReal time DSP algorithms for Mobile communication
Real time DSP algorithms for Mobile communicationEmbedded Plus Trichy
 
07 - MartinièreMonplaisir - Lyon - F2000 - 2012
07 - MartinièreMonplaisir - Lyon - F2000 - 201207 - MartinièreMonplaisir - Lyon - F2000 - 2012
07 - MartinièreMonplaisir - Lyon - F2000 - 2012Cédric Frayssinet
 
Lecture: Digital Signal Processing Batch 2009
Lecture: Digital Signal Processing Batch 2009Lecture: Digital Signal Processing Batch 2009
Lecture: Digital Signal Processing Batch 2009ubaidis
 
Digital Signal Processing
Digital Signal Processing Digital Signal Processing
Digital Signal Processing Sri Rakesh
 
presentation on digital signal processing
presentation on digital signal processingpresentation on digital signal processing
presentation on digital signal processingsandhya jois
 
Digital Image Processing
Digital Image ProcessingDigital Image Processing
Digital Image ProcessingSahil Biswas
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017NVIDIA
 

Andere mochten auch (18)

Digital Signal Processors - DSP's
Digital Signal Processors - DSP'sDigital Signal Processors - DSP's
Digital Signal Processors - DSP's
 
Digital Signal Processor ( DSP ) [French]
Digital Signal Processor ( DSP )  [French]Digital Signal Processor ( DSP )  [French]
Digital Signal Processor ( DSP ) [French]
 
DIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGDIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSING
 
Dsp ppt
Dsp pptDsp ppt
Dsp ppt
 
Real time DSP algorithms for Mobile communication
Real time DSP algorithms for Mobile communicationReal time DSP algorithms for Mobile communication
Real time DSP algorithms for Mobile communication
 
Dsp algorithms 02
Dsp algorithms 02Dsp algorithms 02
Dsp algorithms 02
 
Chap1 dsp
Chap1 dspChap1 dsp
Chap1 dsp
 
Dsp book
Dsp bookDsp book
Dsp book
 
Coursdsp tdi
Coursdsp tdiCoursdsp tdi
Coursdsp tdi
 
07 - MartinièreMonplaisir - Lyon - F2000 - 2012
07 - MartinièreMonplaisir - Lyon - F2000 - 201207 - MartinièreMonplaisir - Lyon - F2000 - 2012
07 - MartinièreMonplaisir - Lyon - F2000 - 2012
 
CV
CVCV
CV
 
Chap2 dsp
Chap2 dspChap2 dsp
Chap2 dsp
 
Lecture: Digital Signal Processing Batch 2009
Lecture: Digital Signal Processing Batch 2009Lecture: Digital Signal Processing Batch 2009
Lecture: Digital Signal Processing Batch 2009
 
Digital Signal Processing
Digital Signal Processing Digital Signal Processing
Digital Signal Processing
 
presentation on digital signal processing
presentation on digital signal processingpresentation on digital signal processing
presentation on digital signal processing
 
Digital Image Processing
Digital Image ProcessingDigital Image Processing
Digital Image Processing
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

DSP architecture

  • 1. DSP Architectures Rensselaer at Hartford ECSE 6620 - Fall 2001 Lecture 16 Jason M. Stripinis jasonstripinis@engineer.com
  • 2. Basic Processor Structure
      • Here we see a very simple processor structure, such as might be found in a small 8-bit microprocessor.
  • 3. Basic Processor Functions
      • ALU
          – Arithmetic Logic Unit: this circuit takes two operands on the inputs (labeled A and B) and produces a result on the output (labeled Y).
          – The operations will usually include, as a minimum:
              • add, subtract
              • and, or, not
              • shift right, shift left
          – ALUs in more complex processors will execute many more instructions.
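The ALU description above can be illustrated with a small software model. This is only a sketch: the function name `alu`, the operation mnemonics and the default 8-bit width are assumptions made for the example, not part of the lecture.

```python
def alu(op, a, b=0, width=8):
    """Toy ALU: operands A and B in, result Y out, truncated to `width` bits."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    results = {
        "add": a + b,
        "sub": a - b,
        "and": a & b,
        "or": a | b,
        "not": ~a,      # unary: B is ignored
        "shl": a << 1,  # shift left by one bit
        "shr": a >> 1,  # logical shift right by one bit
    }
    # The mask models the fixed word width: carries and borrows fall off the top.
    return results[op] & mask


print(alu("add", 200, 100))  # 300 does not fit in 8 bits, so the result wraps
print(alu("not", 0x0F))
```

A real ALU also produces status flags (carry, zero, overflow); they are left out here to keep the sketch short.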
  • 4. Basic Processor Functions
      • Register File
          – A set of storage locations (registers) for storing temporary results. Early machines had just one register (accumulator). Modern RISC processors will have at least 32 registers.
      • Instruction Register
          – The instruction currently being executed by the processor is stored here.
      • Control Unit
          – The control unit decodes the instruction in the instruction register and sets signals which control the operation of most other units of the processor. For example, the operation code (opcode) in the instruction is used to set the ALU control signals that select which operation (+, -, AND, OR, NOT, shift, etc.) it performs.
  • 5. Basic Processor Functions
      • Clock
          – The vast majority of processors are synchronous, that is, they use a clock signal to determine when to capture the next data word and perform an operation on it. In a globally synchronous processor, a common clock needs to be routed (connected) to every unit in the processor.
      • Program counter
          – The program counter holds the memory address of the next instruction to be executed. It is updated every instruction cycle to point to the next instruction in the program. Branch instructions change the program counter by other than a simple increment.
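To make the program counter behaviour concrete, here is a minimal fetch-and-execute loop. The four-instruction machine (`load`, `add`, `jnz`, `halt`), the single accumulator and the function name `run` are all hypothetical and exist only for this sketch.

```python
def run(program, max_steps=100):
    """Execute a list of (opcode, argument) pairs; return (accumulator, pc)."""
    pc = 0       # program counter: index of the next instruction
    acc = 0      # single accumulator register
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, arg = program[pc]  # fetch into the "instruction register"
        pc += 1                # default update: simple increment
        if op == "load":
            acc = arg
        elif op == "add":
            acc += arg
        elif op == "jnz":      # branch: replace the PC instead of incrementing
            if acc != 0:
                pc = arg
        elif op == "halt":
            break
        steps += 1
    return acc, pc


# Count down from 3 to 0 by looping back to instruction 1.
countdown = [("load", 3), ("add", -1), ("jnz", 1), ("halt", 0)]
print(run(countdown))
```

Every instruction increments the counter except a taken branch, which overwrites it with the branch target.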
  • 6. Basic Processor Functions
      • Memory Address Register
          – This register is loaded with the address of the next data word to be fetched from or stored into main memory.
      • Address Bus
          – Transfers addresses to memory and memory-mapped peripherals. It is driven by the processor acting as a bus master.
      • Data Bus
          – Carries data to and from the processor, memory and peripherals. It will be driven by the data source, i.e. processor, memory, etc.
      • Multiplexed Bus
          – To limit device pin counts and bus complexity, some processors MUX address and data onto the same bus, with an adverse effect on performance.
  • 7. DSP Implementations
      • DSP Algorithm
          – Series of mathematical operations that are applied to process a sequence of digital signals sampled from the real (analog) world
      • Application examples
          – Filtering
          – FFT
          – Noise cancellation
          – Spectral Processing
  • 8. Why is special architecture good for digital signal processing?
      • DSPs are tailored to run DSP algorithms efficiently.
      • Special functions to handle DSP algorithm demands:
          – Unique data access patterns
              • Streams of data requiring high bandwidth
              • Low data repetition but high code repetition
          – Math operation focus (“number cruncher”)
          – Real-time constraints
          – Power and size constraints
          – Cost requirement
          – Attention to numeric effects (limited fixed point error)
  • 9. DSP Functional Characteristics
      • Typically require a few specific operations
      • Consider an FIR filter. This requires:
          – additions & multiplications
          – delays
          – array handling
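A direct-form FIR filter computes y[n] = h[0]·x[n] + h[1]·x[n-1] + ... + h[N-1]·x[n-N+1], and the sketch below shows where the three requirements listed above appear. The function name `fir_filter` and the plain-list delay line are choices made for this example.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k]."""
    delay_line = [0.0] * len(coeffs)  # delays: the most recent inputs
    out = []
    for x in samples:
        # Shift the delay line by one and insert the new sample (array handling).
        delay_line = [x] + delay_line[:-1]
        acc = 0.0
        for h, d in zip(coeffs, delay_line):
            acc += h * d  # one multiplication and one addition per tap
        out.append(acc)
    return out


# An impulse input reproduces the coefficients at the output.
print(fir_filter([1, 2, 3], [1, 0, 0, 0]))
```

The inner loop is a multiply followed by an add on every tap, which is why the multiply-accumulate is the operation DSPs are built around.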
  • 10. DSP Typical Operations
      • Additions & Multiplications
          – fetch two operands
          – perform the addition or multiplication (or both)
          – store the result
      • Delays
          – store the result for later use
      • Array Handling
          – fetch values from consecutive memory locations
          – copy data from register to register
  • 11. DSP Typical Operations
      • To perform these basic operations most DSPs:
          – have a parallel multiply and add
          – have multiple memory accesses (to fetch two operands and store the result)
          – have sufficient registers to hold data temporarily
          – have efficient address generation for array handling
          – have special features such as delays or circular addressing
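Circular addressing lets a delay line be updated without copying any data: only an index moves, and it wraps at the end of the buffer. The class below is a software sketch of that idea. The name `CircularDelayLine` and its methods are hypothetical, and on a DSP the modulo step is done by the address generator rather than by an instruction.

```python
class CircularDelayLine:
    """Fixed-length delay line addressed modulo its length."""

    def __init__(self, length):
        self.buf = [0.0] * length
        self.head = 0  # index of the newest sample

    def push(self, x):
        # Step the pointer back one slot, wrapping at the buffer start,
        # and overwrite the oldest sample with the newest.
        self.head = (self.head - 1) % len(self.buf)
        self.buf[self.head] = x

    def tap(self, k):
        """Return the sample pushed k steps ago (k = 0 is the newest)."""
        return self.buf[(self.head + k) % len(self.buf)]


line = CircularDelayLine(3)
for sample in (1, 2, 3, 4):
    line.push(sample)
print([line.tap(k) for k in range(3)])  # newest first
```

Compared with shifting every element on each new sample, only one memory location is written per input.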
  • 12. DSP Arithmetic Logic Unit
      • Most DSP operations require additions and multiplications together. So DSP processors usually have parallel hardware adders and multipliers which can be used with a single instruction.
  • 13. Register Structure
      • Delays require that intermediate values be held for later use.
      • For example, when keeping a running total, the total can be kept within the processor to avoid wasting repeated reads from and writes to memory.
      • For this reason DSP processors have lots of registers which can be used to hold intermediate values.
      • Registers may be fixed-point or floating-point.
  • 14. Memory Addressing
      • Array handling requires that data can be fetched efficiently from consecutive memory locations.
      • For this reason DSP processors have address registers which are used to hold addresses and can be used to generate the next needed address efficiently.
      • Usually, the next needed address can be generated during the data fetch or store operation, and with no overhead.
  • 15. Memory Addressing
      • Example DSP address generation operations (instruction, name, description):
          – *rP (register indirect): read the data pointed to by the address in register rP
          – *rP++ (postincrement): having read the data, postincrement the address pointer to point to the next value in the array
          – *rP-- (postdecrement): having read the data, postdecrement the address pointer to point to the previous value in the array
          – *rP++rI (register postincrement): having read the data, postincrement the address pointer by the amount held in register rI to point to rI values further down the array
          – *rP++rIr (bit reversed): having read the data, postincrement the address pointer to point to the next value in the array, as if the address bits were in bit reversed order
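Bit reversed addressing is the least obvious of these modes, so a short software illustration may help. The helper names `bit_reverse` and `bit_reversed_order` are made up for this sketch, and the array length is assumed to be a power of two, as in a radix-2 FFT.

```python
def bit_reverse(index, n_bits):
    """Reverse the lowest n_bits bits of index."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)  # move the lowest bit across
        index >>= 1
    return result


def bit_reversed_order(n_points):
    """Access order for an n_points array (n_points must be a power of two)."""
    n_bits = n_points.bit_length() - 1
    return [bit_reverse(i, n_bits) for i in range(n_points)]


print(bit_reversed_order(8))
```

A DSP address generator produces this sequence as a side effect of the data access, so reordering FFT data costs no extra instructions.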
  • 16. Memory Architectures for DSP
      • For arithmetic the DSP needs to fetch two operands in a single instruction cycle.
      • Since we also need to store the result and to read the instruction itself, more than two memory accesses per instruction cycle are needed.
      • Even the simplest DSP operation, an addition involving two operands and a store of the result to memory, requires four memory accesses (three to fetch the two operands and the instruction, plus a fourth to write the result).
  • 17. Memory Architectures for DSP
      • DSP processors usually support multiple memory accesses in the same instruction cycle.
      • It is not possible to access two different memory addresses simultaneously over a single memory bus.
      • There are two common methods to achieve multiple memory accesses per instruction cycle:
          – Harvard architecture
          – modified von Neumann architecture
  • 18. Memory Architectures for DSP (Harvard Architecture)
      • The Harvard architecture has two separate physical memory buses, allowing two simultaneous memory accesses.
      • The true Harvard architecture dedicates one bus for fetching instructions, with the other available to fetch operands.
      • This is inadequate for DSP operations, which usually involve at least two operands. So DSP Harvard architectures usually permit the 'program' bus to be used also for access of operands.
  • 19. Memory Architectures for DSP (Harvard Architecture)
      • Note that it is often necessary to fetch three things (the instruction plus two operands), and the Harvard architecture is inadequate to support this.
      • So DSP Harvard architectures often also include a cache memory which can be used to store instructions which will be reused, leaving both Harvard buses free for fetching operands.
      • The Harvard architecture plus cache is sometimes called an extended Harvard architecture or Super Harvard ARChitecture (SHARC).
  • 20. Memory Architectures for DSP (Harvard Architecture)
      • The Harvard architecture requires two memory buses. This makes it expensive to bring off the chip. For example, a DSP using 32 bit words and with a 32 bit address space requires at least 64 pins for each memory bus, a total of 128 pins if the Harvard architecture is brought off the chip. This results in very large chips, which are difficult to design into a circuit.
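The pin count in this example is simple arithmetic, written out here as a sketch. The function name `bus_pins` is hypothetical, and control, power and ground pins are ignored, which is why the slide says "at least".

```python
def bus_pins(data_bits, address_bits):
    """Pins for one external memory bus: data lines plus address lines."""
    return data_bits + address_bits


single_bus = bus_pins(32, 32)      # 32 data lines + 32 address lines
harvard_off_chip = 2 * single_bus  # two complete buses brought off chip
print(single_bus, harvard_off_chip)
```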
  • 21. Memory Architectures for DSP (von Neumann Architecture)
      • The von Neumann architecture uses only a single memory bus. This is relatively cheap, requiring fewer pins than the Harvard architecture, and simple to use because the programmer can place instructions or data anywhere throughout the available memory.
      • But it does not permit multiple memory accesses.
      • The modified von Neumann architecture allows multiple memory accesses per instruction cycle by running the memory clock faster than the instruction cycle.
  • 22. Memory Architectures for DSP (von Neumann Architecture)
      • Each instruction cycle is divided into multiple 'machine states' and a memory access can be made in each machine state, permitting multiple memory accesses per instruction cycle.
      • The modified von Neumann architecture permits all the memory accesses needed to support addition or multiplication: fetch of the instruction; fetch of the two operands; and storage of the result.
  • 23. Why use a special architecture for digital signal processing? The Answers
      • Unique data access patterns: Bit reversed addressing (FFT)
      • Streams of data requiring high bandwidth: Multiple access memory architecture
      • Low data repetition but high code repetition: Eliminate data cache (save $$)
      • Math operation focus: MAC instruction, Vector processing unit
      • Real-time constraints: Zero-overhead loops
      • Power and size constraints: Limited addition function units (unlike GPP)
      • Cost requirement: On-board peripherals (SOC)
      • Attention to numeric effects (limited fixed point error): ALU with 16-bit operands and 32-bit result
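The last item, 16-bit operands with a 32-bit result, can be sketched with fixed-point arithmetic. The slide does not name a number format, so the Q15 convention used here (a signed 16-bit integer representing value / 32768) is an assumption, as are the function names.

```python
Q15_ONE_HALF = 16384  # 0.5 in Q15


def q15_mac(acc, a, b):
    """Add the full-width product of two Q15 operands to the accumulator.

    The product of two 16-bit values needs up to 32 bits, so the accumulator
    keeps it unrounded (Q30 scale) to limit fixed-point error.
    """
    return acc + a * b


def q30_to_q15(acc):
    """Round a Q30 accumulator to Q15 and saturate to the 16-bit range."""
    y = (acc + (1 << 14)) >> 15  # add half an LSB, then drop 15 fraction bits
    return max(-32768, min(32767, y))


acc = q15_mac(0, Q15_ONE_HALF, Q15_ONE_HALF)  # 0.5 * 0.5
print(q30_to_q15(acc))                        # 0.25 in Q15
```

Rounding happens once, when the accumulator is converted back, rather than after every product. Real DSP accumulators usually carry extra guard bits beyond 32, which this sketch omits.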
  • 24. DSP Generations
      • 1st Generation (1979-1982)
          – Transition from experimental signal processors
      • 2nd Generation (1985-1986)
          – Move from co-processor to stand-alone processor
      • 3rd Generation (1987-1989)
          – Major hardware improvements to speed
      • 4th Generation (1990-1996)
          – More on-chip integration (ADC, DAC, memory, multi-processor)
      • 5th Generation (1997-)
  • 25. DSP Generations: 1st Generation (1979-1982)
      • Primarily targeted at digital filtering
      • Specialized co-processor for signal processing
      • NMOS (n-Channel Metal Oxide Semi) fabrication
      • 16-bit fixed point
      • fast multiplier (and adder)
      • Harvard architecture
      • Specialized Instruction set
  • 26. DSP Generations: 1st Generation (1979-1982)
      • Example = Texas Instruments TMS32010
          – 16-bit fixed point
          – Harvard architecture
          – two Address registers
          – one A register (adder)
          – one P register (multiplier)
          – one T register (data shift on delay line)
          – No zero-overhead loop
          – Specialized Instruction set
          – MAC Time 400 ns (<100 ns today)
          – 50 ms per 1024-FFT
  • 27. DSP Generations: 1st Generation (1979-1982)
      • Example = Texas Instruments TMS32010
  • 28. DSP Generations: 2nd Generation (1985-1986)
      • Move from co-processor to stand-alone processor
      • CMOS (Complementary Metal Oxide Semi) fabrication
      • Double the speed of first generation
      • Advances in memory architecture (more internal RAM)
      • better pipelining of functional units
      • address generators (bit-reversing)
      • Zero-overhead loop HW
      • Limited floating point in SW
  • 29. DSP Generations: 2nd Generation (1985-1986)
      • Example = Texas Instruments TMS32020 (1985)
          – 16-bit fixed point
          – Harvard architecture
          – Improved TMS32010
          – RPTS allows pipelined instruction performed in single cycle
          – Specialized Instruction set
          – MAC Time 200 ns
          – 10 ms per 1024-FFT
  • 30. DSP Generations: 3rd Generation (1987-1989)
      • Increased floating point support
          – 32-bit floating point hardware DSPs released
          – Floating point emulation on fixed point processors
          – IEEE754 support
      • Hardware enhancements (large speed increase)
          – dense CMOS fabrication
          – on chip DMA
          – instruction caches
          – increased clock rates (first cores above 10 MHz)
      • Increased complexity of SW
  • 31. DSP Generations: 3rd Generation (1987-1989)
      • Example = Motorola DSP56001 (1988)
          – 24-bit data, instructions
          – 24-bit fixed point
          – 3 memory spaces (P, X, Y)
          – parallel moves
          – circular addressing
          – MAC Time 75 ns (21 ns today)
          – ~3 ms per 1024-FFT
      • Other Examples:
          – AT&T DSP16A
          – Analog Devices ADSP-2100
          – TI TMS320C50
  • 32. DSP Generations: 4th Generation (1990-1996)
      • Hardware integration
          – ADC
          – DAC
          – more memory
          – multiple DSPs on one chip
      • Decreasing power consumption
          – 5.0 VDC → 3.3 VDC → 3.0 VDC → 2.7 VDC
      • GPPs start to get DSP functions
          – SIMD
          – Leads to Intel introducing MMX (MultiMedia eXtensions) for x86
  • 33. DSP Generations: 4th Generation (1990-1996)
      • Example = TI TMS320C541 (1995)
          – Enhanced architecture
          – Low voltage (3.3 VDC)
          – More on-chip memory
          – Application specific functional units
          – MAC Time 20 ns (10 ns today)
          – ~1 ms per 1024-FFT
      • Example = TI TMS320C80
          – multiple processors per chip
  • 34. The GPP Option
      • High-performance general-purpose processors for PCs and workstations are increasingly suitable for some DSP applications.
      • E.g., Intel MMX Pentium, Motorola/IBM PowerPC 604e
      • These processors achieve excellent to outstanding floating and/or fixed-point DSP performance via:
          – Very high clock rates (200-500 MHz)
          – Superscalar architectures
          – Single-cycle multiplication and arithmetic operations
          – Good memory bandwidth
          – Branch prediction
          – In some cases, single-instruction, multiple-data (SIMD) ops
  • 35. DSP Generations: 5th Generation (1997-)
      • Not the classic DSP architectures
          – SIMD (Single Instruction Multiple Data stream) instructions
          – VLIW (Very Long Instruction Words) allows RISC processing
      • High parallelism
      • Increased clock speeds
      • No longer application specific functional units (no MAC FU)
      • Low voltage (2.5 VDC or less, even 1.2 VDC cores)
      • MAC Time 3 ns (but can be power hungry)
      • GPPs start to get DSP functions
          – Intel introduces MMX (MultiMedia eXtensions) for x86 in 1997
      • Increased integration
          – MCU and DSP cores on same chip
          – MCU functions/ports added to DSPs
  • 36. DSP Generations: 5th Generation (1997-)
      • SIMD (Single Instruction Multiple Data) instructions
          – Enhance throughput by allowing parallelism
          – Requires multiple functional units and wider buses
          – May support multiple data widths (different functional groups)
          – Example = DSP16000 WAS SIMD
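One way to picture a SIMD instruction is as a single operation applied to several narrow lanes packed into one wide word. The sketch below adds two 16-bit lanes held in a 32-bit word. The function name `simd_add16x2` and the wraparound (non-saturating) lane behaviour are assumptions for illustration and do not describe any particular DSP's instruction.

```python
def simd_add16x2(a, b):
    """Add two 32-bit words as two independent 16-bit lanes."""
    lo = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF
    hi = (((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF)) & 0xFFFF
    # The low lane's carry is discarded rather than rippling into the high lane.
    return (hi << 16) | lo


print(hex(simd_add16x2(0x0001FFFF, 0x00010001)))
```

In hardware both lane additions happen in the same cycle, which is where the throughput gain comes from. Here they run one after the other and only the result is the same.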
  • 37. DSP Generations: 5th Generation (1997-)
      • VLIW (Very Long Instruction Words)
          – Instruction Level Parallelism (ILP) can be a major performance gain
              • Superscalar implementation requires larger die and more power to dynamically pipeline instructions
          – VLIW can be used to statically pipeline instructions at compile time (or even by hand!)
          – VLIW instruction words have fixed "slots" for instructions that map to the functional units available.
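The fixed-slot idea can be sketched as a toy packer that assigns operations to slots ahead of time, the way a compiler would. Everything here is hypothetical: the four slot names, the mnemonics, the function `pack_vliw`, and the simplifying assumption that all operations are independent of one another.

```python
SLOTS = ("L", "S", "M", "D")  # one slot per functional unit in this toy machine


def pack_vliw(ops):
    """Pack (unit, mnemonic) pairs, in order, into fixed-slot instruction words.

    A new word is started whenever the required unit's slot is already taken.
    Unused slots are filled with NOP.
    """
    words = []
    current = {}
    for unit, name in ops:
        if unit in current:  # slot conflict: close this word, start another
            words.append(current)
            current = {}
        current[unit] = name
    if current:
        words.append(current)
    return [tuple(word.get(slot, "NOP") for slot in SLOTS) for word in words]


program = [("M", "MPY"), ("L", "ADD"), ("D", "LDW"), ("M", "MPY2")]
for word in pack_vliw(program):
    print(word)
```

The second word comes out mostly NOPs, which shows how code size grows when the slots cannot all be filled.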
  • 38. DSP Generations: 5th Generation (1997-)
      • VLIW Advantages
          – huge theoretical pay off
              • Less than 1 ns per MAC!
              • Less than 75 ns per 1024-FFT
      • VLIW Drawbacks
          – Can be very difficult to program and debug
          – High power consumption if VLIW is not filled
          – Code size dramatically increases requiring more program memory
  • 39. DSP Generations: 5th Generation (1997-)
      • VLIW Example = TI TMS320C6201
      • 32-bit Functional Units
          – Lx = ALU
          – Sx = Branching and shifting
          – Mx = Multiplier
          – Dx = Data Store
  • 40. DSP Generational Development
      • DSP processor performance has increased by a factor of about 400x over the past 20 years
      • [Figure: MAC time (ns) by DSP generation, 1st Gen through 5th Gen; vertical axis 0 to 400 ns]
      • DSP architectures will be increasingly specialized for applications, especially communications applications
      • General-purpose processors will become viable for many DSP applications