12. A closer look inside the FPGA
The overhead of the machinery that lets you "build logic circuits into the device"
Ratio = FPGA/ASIC, geometric mean over a range of benchmarks
                                     Logic Only  Logic & DSP  Logic & Memory  Logic, Memory & DSP
Area Ratio                           40          28           37              21
Critical Path Delay (Fastest Grade)  3.2         3.4          2.3             2.1
Critical Path Delay (Slowest Grade)  4.3         4.5          3.1             2.8
Dynamic Power Consumption            12          12           9.2             9.0
[7] I. Kuon and J. Rose, “Measuring the gap between FPGAs and ASICs,” Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays (FPGA ’06), pp. 21–30, ACM, New York, NY, USA, 2006.
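The table summarizes each metric as a geometric mean of per-benchmark FPGA/ASIC ratios. A minimal sketch of that kind of summary follows; the function name `geometric_mean` and the sample measurements are assumptions for illustration and are not data from [7].

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed in log space."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (FPGA value, ASIC value) pairs for one metric, one pair per
# benchmark. These numbers are made up and do not come from the cited paper.
measurements = [(80.0, 2.0), (90.0, 3.0), (100.0, 2.5)]

# Ratio = FPGA / ASIC for each benchmark, then one summary value.
ratios = [fpga / asic for fpga, asic in measurements]
summary = geometric_mean(ratios)

print(f"per-benchmark ratios: {ratios}")
print(f"geometric mean ratio: {summary:.1f}")
```

A geometric mean is a common choice for summarizing ratios because the geometric mean of the ratios equals the ratio of the geometric means, so the result does not depend on which side is treated as the baseline.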
16. "Applications" come in many forms
EE Times - Google's Project ARA Smartphones to Use Lattice ECP5 FPGAs
http://www.eetimes.com/document.asp?doc_id=1321936
Introduction to FPGAs (FPGA入門) - Where are they used?
http://www.fpga.co.jp/nyumon2.html
18. Application research examples
@ FCCM 2014, FPL 2014, FPGA 2014
- Fast, Power-Efficient Biophotonic Simulations for Cancer Treatment Using FPGAs
- SMCGen: Generating Reconfigurable Design for Sequential Monte Carlo Applications
- FPGA Gaussian Random Number Generators with Guaranteed Statistical Accuracy
- FPGA Implementation of EM Algorithm for 3D CT Reconstruction
- A Scalable Multi-engine Xpress9 Compressor with Asynchronous Data Transfer
- FPGA Accelerated Online Boosting for Multi-target Tracking
- High-Throughput Implementation of a Million-Point Sparse Fourier Transform
- Power-efficient Re-gridding Architecture for Accelerating Non-uniform Fast Fourier Transform
- Radix-4 and Radix-8 Booth Encoded Interleaved Modular Multipliers Over General Fp
- Dataflow Acceleration of Krylov Subspace Sparse Banded Problems
- A Highly-efficient and Green Data Flow Engine for Solving Euler Atmospheric Equations
- An Efficient FPGA-based Hardware Framework for Natural Feature Extraction and Related Computer Vision Tasks
- An Efficient Sparse Conjugate Gradient Solver Using a Benes Permutation Network
- Efficient 3D Triangulation in Hardware for Dense Structure-from-Motion in Low-Speed Automotive Scenarios
- FPGA-based Biophysically-Meaningful Modeling of Olivocerebellar Neurons
- Square-Rich Fixed Point Polynomial Evaluation on FPGAs
- Hardware Acceleration of Database Operations
- A Scalable Sparse Matrix-Vector Multiplication Kernel For Energy-Efficient Sparse-BLAS On FPGAs
- Binary Stochastic Implementation of Digital Logic
- Accelerating Parameter Estimation for Multivariate Self-Exciting Point Processes
- Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation
- …..
19. Application research examples
Examples from conferences that are not FPGA-centric
- The Click2NetFPGA Toolchain @USENIX ATC2012
- SURF Algorithm in FPGA: a Novel Architecture for High Demanding Industrial Applications @DATE2012
- Achieving 10Gbps line-rate key-value stores with FPGAs @USENIX HotCloud 2013
- FPGA Acceleration for the Frequent Item Problem @ICDE 2010
- An FPGA-based pattern classifier using data compression @IEEEI 2010
- A reconfigurable fabric for accelerating large-scale datacenter services @ISCA 2014
- LINQits: big data on little clients @ISCA 2013
- Parallel Real-time Garbage Collection of Multiple Heaps in Reconfigurable Hardware @ISMM2014
- Accelerating Machine-Learning Algorithms on FPGAs using Pattern-Based Decomposition @J. of Sig. Process. Syst.
- Willow: A User-Programmable SSD @USENIX OSDI2014
- Hardware Enforcement of Application Security Policies Using Tagged Memory @USENIX OSDI2008
- Histograms as a Side Effect of Data Movement for Big Data @SIGMOD2014
- Flexible Query Processor on FPGAs @VLDB2013
- Complex Event Detection at Wire Speed with FPGAs @VLDB2010
- Data Processing on FPGAs @VLDB2009
- …..
and many, many more
27. Memcached@Xilinx, ETH Zurich
[Figure: block diagram. Labels: 10G interface, network stack, Memcached, DRAM (hash table, value store), FPGA, network adapter, x86, DRAM, motherboard]
✔ The Memcached part is a dataflow architecture
✔ Latency = 481 cycles @ 156 MHz
https://www.usenix.org/sites/default/files/conference/protected-files/blott_hotcloud13_slides.pdf
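To put the quoted latency in wall-clock terms, the cycle count can be divided by the clock frequency. The sketch below uses only the two figures on the slide (481 cycles, 156 MHz); the helper name `cycles_to_microseconds` is an assumption for illustration.

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle count at a given clock frequency to microseconds."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return cycles / clock_hz * 1e6


# Figures taken from the slide: 481 cycles at 156 MHz.
latency_us = cycles_to_microseconds(481, 156e6)
print(f"{latency_us:.2f} us")  # prints "3.08 us"
```

So the slide's figure corresponds to roughly 3 microseconds.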
28. Memcached@Xilinx, ETH Zurich
https://www.usenix.org/sites/default/files/conference/protected-files/blott_hotcloud13_slides.pdf
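49. Why process data on an FPGA?
The data has to be moved anyway
It can be processed as a side effect while it is in transit
[Figure: block diagram. Labels: 10G interface, network stack, Memcached, DRAM (hash table, value store), FPGA, network adapter, x86, DRAM, motherboard]
Figure from https://www.usenix.org/sites/default/files/conference/protected-files/blott_hotcloud13_slides.pdf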
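The idea of processing data as a side effect of moving it can be illustrated with a software analogy. This is only a sketch, not a description of how any of the cited FPGA designs work: the names `forward_with_histogram` and `source` are hypothetical, and a Python generator stands in for logic sitting on the data path.

```python
from collections import Counter


def forward_with_histogram(chunks, histogram):
    """Pass each chunk through unchanged, counting byte values on the way."""
    for chunk in chunks:
        histogram.update(chunk)  # iterating over bytes yields ints 0..255
        yield chunk              # the data itself is forwarded untouched


def source():
    """Hypothetical data source standing in for a network or storage stream."""
    yield b"aab"
    yield b"bc"


hist = Counter()
received = b"".join(forward_with_histogram(source(), hist))

print(received)
print({chr(value): count for value, count in sorted(hist.items())})
```

The consumer receives exactly the bytes that were sent, and the histogram is available once the transfer finishes, with no second pass over the data.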