1. Field-Programmable Gate Arrays
as tracking devices
Roberto Rodríguez Osorio
Javier Díaz Bruguera
Group of Computer Architecture
Dept. of Electronics and Computer Science
University of Santiago de Compostela
3. Application-specific computing machines
[Figure: block diagrams of a microprocessor (code memory, data memory, PC, IR, register file, control logic, functional units) and of an application-specific integrated circuit (MAC unit, control logic); each is split into a control section and a datapath section]
Microprocessor
Performance: 10 cycles @ 3 GHz
Dissipated power: ~35 W
Application-specific integrated circuit (ASIC)
Performance: 1 cycle @ 1 GHz
Dissipated power: ~mW
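The comparison on this slide can be made concrete as time and energy per operation. The following is a rough sketch only: the function names are illustrative, and 1 mW is an assumed stand-in for the slide's "~mW" figure.

```python
def time_per_op_ns(cycles: int, clock_hz: float) -> float:
    """Time for one operation in nanoseconds: cycles / clock frequency."""
    return cycles / clock_hz * 1e9


def energy_per_op_nj(power_w: float, time_ns: float) -> float:
    """Energy per operation in nanojoules: power (W) x time (ns)."""
    return power_w * time_ns


# Figures taken from the slide.
cpu_ns = time_per_op_ns(cycles=10, clock_hz=3e9)   # about 3.3 ns
asic_ns = time_per_op_ns(cycles=1, clock_hz=1e9)   # about 1 ns

# Assumption: "~mW" is taken as 1 mW purely for illustration.
cpu_nj = energy_per_op_nj(35.0, cpu_ns)
asic_nj = energy_per_op_nj(1e-3, asic_ns)

print(f"microprocessor: {cpu_ns:.2f} ns, {cpu_nj:.1f} nJ per operation")
print(f"ASIC:           {asic_ns:.2f} ns, {asic_nj:.4f} nJ per operation")
```

Despite the lower clock, the ASIC finishes first because it needs a single cycle, and the energy gap is several orders of magnitude under the stated assumption.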
6. FPGA technology basics – Computing
Full adder (FA): inputs a, b, c in; outputs s, c out
a  b  c in | s  c out
0  0  0    | 0  0
0  0  1    | 1  0
0  1  0    | 1  0
0  1  1    | 0  1
1  0  0    | 1  0
1  0  1    | 0  1
1  1  0    | 0  1
1  1  1    | 1  1
[Figure: full adder block symbol and its gate-level schematic]
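The "computing" view of a full adder can be sketched with logic operators. This is one common gate-level formulation (XOR for the sum, majority for the carry); the schematic on the slide may use a different but equivalent gate arrangement, and the function name is illustrative.

```python
from itertools import product


def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Return (s, c_out) for three 1-bit inputs, computed with gates."""
    s = a ^ b ^ c_in                           # sum: three-input XOR
    c_out = (a & b) | (a & c_in) | (b & c_in)  # carry: majority of the inputs
    return s, c_out


# Print the truth table and check it against ordinary integer addition.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert a + b + c_in == s + 2 * c_out
    print(a, b, c_in, "|", s, c_out)
```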
7. FPGA technology basics – Do not compute
Logic blocks
[Figure: inputs a, b and c in address two 8x1-bit SRAM memories; one outputs s, the other c out]
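The "do not compute" idea is that a logic block simply looks up a precomputed answer. A minimal sketch, assuming the inputs form the memory address in the order a, b, c in (most significant bit first); the names `S_LUT` and `COUT_LUT` are illustrative.

```python
from itertools import product

# Contents of the two 8x1-bit memories, indexed by the address (a, b, c_in).
S_LUT = (0, 1, 1, 0, 1, 0, 0, 1)      # sum output
COUT_LUT = (0, 0, 0, 1, 0, 1, 1, 1)   # carry output


def lut_full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Return (s, c_out) by reading two lookup tables; no logic is evaluated."""
    address = (a << 2) | (b << 1) | c_in
    return S_LUT[address], COUT_LUT[address]


# The lookup result matches ordinary addition for every input combination.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = lut_full_adder(a, b, c_in)
    assert a + b + c_in == s + 2 * c_out
```

Loading different contents into the same two memories yields any other pair of 3-input functions, which is what makes the block reconfigurable.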
11. FPGA technology basics – Interconnect + memory
FPGA fabric consists of a huge number of simple memory
elements connected by means of a reconfigurable network
Design software must break every computing task into
1-bit operations with no more than 4, 5 or 6 variables
Operations are spatially distributed according to proximity
criteria
Routing may be troublesome
Long paths are slow
Routing through logic blocks increases area
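As an illustration of breaking a task into small 1-bit operations, the sketch below maps an 8-input parity function onto lookup tables with at most 4 inputs each. The helper names (`make_lut`, `read_lut`, `parity8`) are hypothetical, and real design software uses far more elaborate mapping algorithms.

```python
def make_lut(func, n_inputs: int) -> tuple:
    """Precompute the truth table of a 1-bit function of n_inputs variables."""
    return tuple(
        func(*[(addr >> i) & 1 for i in range(n_inputs)])
        for addr in range(2 ** n_inputs)
    )


def read_lut(table: tuple, bits) -> int:
    """Look up the output for the given input bits (bit i is address bit i)."""
    addr = sum(bit << i for i, bit in enumerate(bits))
    return table[addr]


XOR4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)  # fits one 4-input LUT
XOR2 = make_lut(lambda a, b: a ^ b, 2)                # uses part of another


def parity8(bits) -> int:
    """8-input parity built from three small LUTs arranged in two levels."""
    low = read_lut(XOR4, bits[0:4])
    high = read_lut(XOR4, bits[4:8])
    return read_lut(XOR2, [low, high])


print(parity8([1, 0, 1, 1, 0, 0, 1, 0]))  # four ones, so even parity
```

The second level adds delay, and the three tables must be placed close together to keep the connecting paths short.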
12. Hard cores in FPGAs
Memory blocks
Multipliers
DSP blocks
Microprocessors
Floating point units?
13. Memory blocks
Hundreds or thousands of small memory blocks
Dual-port blocks
18 Kbit each in Xilinx devices
Flexible configurations
Many short words or a few long words
Independent access
Huge aggregated bandwidth
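A back-of-the-envelope sketch of the two points above: how word width trades against depth in one 18 Kbit block, and how independent access adds up to a large aggregate bandwidth. The block count, port width and clock frequency in the example are hypothetical, and real devices offer only a fixed set of depth and width configurations.

```python
BLOCK_BITS = 18 * 1024  # one 18 Kbit memory block


def depth_for_width(width_bits: int) -> int:
    """Number of words that fit in one block at the given word width."""
    return BLOCK_BITS // width_bits


def aggregate_bandwidth_gbit_s(n_blocks: int, ports_per_block: int,
                               width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth in Gbit/s when every port of every block is used each cycle."""
    return n_blocks * ports_per_block * width_bits * clock_hz / 1e9


# Many short words or a few long words in the same block.
for width in (9, 18, 36):
    print(f"{width:2d}-bit words: {depth_for_width(width)} entries")

# Hypothetical device: 500 dual-port blocks, 36-bit ports, 300 MHz clock.
peak = aggregate_bandwidth_gbit_s(500, 2, 36, 300e6)
print(f"peak aggregate bandwidth: {peak:.0f} Gbit/s")
```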
14. Multipliers and DSP blocks
As FPGAs grew larger, designers began to
implement DSP algorithms on them
However: Multipliers take too much area
Therefore: Hardwired multipliers were introduced
DSP algorithms are often based on
multiply & add
multiply & accumulate
DSP blocks in modern FPGAs implement hardwired:
multiply, multiply & add, multiply & accumulate
optional addition before multiplying
three-input add
1 large, 2 medium or 4 small operations on the same hardware
shifting, comparisons, bit-wise operations,…
Up to 2000 DSP blocks in current FPGAs for massive
parallelism
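To show why multiply & accumulate matters, here is a minimal software sketch of one FIR filter output built from MAC steps, plus the pre-adder trick for a symmetric filter. The function names are illustrative; in an FPGA each MAC step would map onto a DSP block and many of them would run in parallel.

```python
def mac(acc: int, a: int, b: int) -> int:
    """Multiply & accumulate: one DSP-block style operation."""
    return acc + a * b


def pre_add_mac(acc: int, a: int, d: int, b: int) -> int:
    """Add before multiplying, then accumulate: acc + (a + d) * b."""
    return acc + (a + d) * b


def fir_output(coeffs, samples) -> int:
    """One FIR output sample as a chain of MAC operations."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc = mac(acc, c, x)
    return acc


def symmetric_fir4(c0: int, c1: int, x) -> int:
    """4-tap filter with coefficients [c0, c1, c1, c0] using 2 multiplications."""
    acc = pre_add_mac(0, x[0], x[3], c0)
    return pre_add_mac(acc, x[1], x[2], c1)


x = [4, 5, 6, 7]
print(fir_output([1, 2, 2, 1], x), symmetric_fir4(1, 2, x))  # same result
```

The pre-adder halves the number of multiplications for symmetric coefficients, which is the usual motivation for placing an adder ahead of the multiplier.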
15. Microprocessors
Xilinx:
IBM PowerPC processors
Virtex-II Pro
Virtex-4 FX
Virtex-5 FX
MicroBlaze soft processors
Altera:
ARM RISC processors
Nios soft processor
16. Floating point units
Not implemented so far
• Suggested as a way to accelerate scientific computing
• For engineering, fixed point arithmetic is usually enough
Will it happen?
☺ It happened with multipliers, transceivers, DSP blocks, …
GPUs already hold a strong position in this field
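For readers unfamiliar with fixed-point arithmetic, the sketch below shows the idea using a 15-bit fractional format. The format choice and helper names are assumptions made for illustration, and overflow and saturation handling are deliberately omitted.

```python
FRAC_BITS = 15            # assumed format: 15 fractional bits
SCALE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Convert a real value to its nearest fixed-point integer code."""
    return int(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a fixed-point integer code back to a real value."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Integer multiply, then shift right to restore the scale."""
    return (a * b) >> FRAC_BITS


product = fixed_mul(to_fixed(0.5), to_fixed(0.25))
print(to_float(product))  # 0.125

# Values that are not exactly representable give a small rounding error.
approx = to_float(fixed_mul(to_fixed(0.3), to_fixed(0.2)))
print(abs(approx - 0.06))
```

Only an integer multiplier and a shift are needed, which is why this style of arithmetic maps well onto hardwired multipliers and DSP blocks.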
17. Performance
Compared to an ASIC
About 10 times slower, larger and more power hungry
Compared to a microprocessor
Fast, depending on:
Potential parallelism
Required bandwidth
Small and simple, even standalone
Reduced power consumption (< 1 W), so they may run on batteries
18. Design effort
Several scenarios:
Pure VHDL or Verilog coding
Higher flexibility, efficiency and performance
Long design time
Costly debugging
Use macros combined with VHDL or Verilog
Libraries of IP blocks ease the design process
There is no guarantee that the required functionality is available
High-level languages (DSP logic (Matlab), Impulse-C,
Handel-C, …)
Efficient and simple implementation for simple algorithms
Lack of expressiveness for complex algorithms
21. In the context of this application
Device choice
• Logic bound
• Standard logic
• Multipliers
• I/O bound
Parallel acquisition
• Switching memory blocks for acquisition and computation
High computing speed
• Via pipelining
Results storage
• Internal or external memory
Power consumption
Configuration
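The idea of switching memory blocks between acquisition and computation is often called ping-pong (double) buffering. The following is a software sketch under that assumption; the class and method names are hypothetical. In hardware both banks are accessed at the same time, whereas this model runs the two steps one after the other.

```python
class PingPongBuffer:
    """Two memory banks: one is filled while the other is processed."""

    def __init__(self, size: int):
        self.banks = [[0] * size, [0] * size]
        self.write_bank = 0  # bank currently used for acquisition

    def acquire(self, samples) -> None:
        """Store a new frame in the acquisition bank."""
        self.banks[self.write_bank] = list(samples)

    def swap(self) -> None:
        """Exchange the roles of the two banks."""
        self.write_bank ^= 1

    @property
    def compute_bank(self):
        """Bank holding the most recently completed frame."""
        return self.banks[self.write_bank ^ 1]


buf = PingPongBuffer(size=3)
results = []
for frame in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
    buf.acquire(frame)                      # acquisition writes one bank
    buf.swap()                              # roles are exchanged
    results.append(sum(buf.compute_bank))   # computation reads the other bank

print(results)  # [6, 15, 24]
```

Because acquisition never touches the bank being processed, neither side has to wait for the other as long as computation on one frame finishes before the next frame is complete.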