10. ASIC vs. FPGA
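■ ASIC (Application-Specific Integrated Circuit)
● A dedicated circuit is designed for each application
• Dedicated pipelines and high clock frequencies give high performance
■ FPGA (Field-Programmable Gate Array)
● Any application can be implemented on just a few types of FPGA, so low-volume production is feasible
● The circuit configuration can be modified after product release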
[Figure: cost versus the number of units for ASIC and FPGA. FPGA is cheaper at low volumes; ASIC is cheaper at high volumes.]
2015-03-11 Shinya T-Y, NAIST 10
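The crossover in the cost figure can be made concrete with a simple linear cost model. This is a minimal sketch, not taken from the slides: it assumes total cost = one-time engineering (NRE) cost + per-unit cost × volume, the function names are hypothetical, and every number is a placeholder chosen only to show the shape of the two curves.

```python
def total_cost(nre, unit_cost, units):
    """Linear cost model: one-time NRE cost plus per-unit cost times volume."""
    return nre + unit_cost * units


def break_even_units(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Volume at which the ASIC and FPGA total costs are equal.

    Assumes the ASIC has the higher NRE and the lower unit cost.
    """
    if not (nre_asic > nre_fpga and unit_fpga > unit_asic):
        raise ValueError("model assumes ASIC: higher NRE, lower unit cost")
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)


# Hypothetical figures for illustration only (not from the slides).
ASIC = {"nre": 2_000_000, "unit_cost": 5}
FPGA = {"nre": 50_000, "unit_cost": 200}

crossover = break_even_units(
    ASIC["nre"], ASIC["unit_cost"], FPGA["nre"], FPGA["unit_cost"]
)
print(f"break-even volume: {crossover:,.0f} units")

for units in (100, 5_000, 100_000):
    asic = total_cost(ASIC["nre"], ASIC["unit_cost"], units)
    fpga = total_cost(FPGA["nre"], FPGA["unit_cost"], units)
    cheaper = "FPGA" if fpga < asic else "ASIC"
    print(f"{units:>7} units: ASIC={asic:>10,} FPGA={fpga:>10,} -> {cheaper} is cheaper")
```

Below the break-even volume the FPGA's low up-front cost wins; above it the ASIC's low unit cost wins.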
16. How to Develop Software?
Write the software in a programming language
Compiler flow: Preprocess → Compile → Assemble → Link
Execution on a CPU
#include <stdio.h>

int main() {
    int a = 1 + 2;
    printf("Hello %d\n", a);
    return 0;
}
add $t0, $t1, $t2
li $v0, 1
syscall
ELF01ABF00F1...
Executable Binary
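The four stages of the compiler flow can be run one at a time to inspect each intermediate file. A minimal sketch, assuming a GCC-style driver on the PATH (the slide does not name a compiler); `stage_commands` and `run_flow` are hypothetical helper names, and the assembly produced on a given machine will differ from the MIPS-style lines shown on the slide.

```python
import subprocess


def stage_commands(stem, cc="gcc"):
    """Return one (stage name, command) pair per stage for `<stem>.c`."""
    return [
        # Expand #include and #define into a single translation unit.
        ("Preprocess", [cc, "-E", f"{stem}.c", "-o", f"{stem}.i"]),
        # Translate preprocessed C into assembly.
        ("Compile", [cc, "-S", f"{stem}.i", "-o", f"{stem}.s"]),
        # Translate assembly into an object file.
        ("Assemble", [cc, "-c", f"{stem}.s", "-o", f"{stem}.o"]),
        # Combine object files and libraries into an executable.
        ("Link", [cc, f"{stem}.o", "-o", stem]),
    ]


def run_flow(stem, cc="gcc"):
    """Run every stage in order; raises CalledProcessError if a stage fails."""
    for name, cmd in stage_commands(stem, cc):
        print(f"{name}: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)


# Only print the commands here; nothing is executed.
for name, cmd in stage_commands("hello"):
    print(f"{name:<10} {' '.join(cmd)}")
```

Calling `run_flow("hello")` would execute the stages, provided `hello.c` exists and the compiler is installed.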
17. How to Develop (FPGA) Hardware?
Write a hardware design in an HDL (Hardware Description Language)
EDA flow: Synthesis → Technology Mapping → Place and Route → Bitstream Generation
Configuration of an FPGA with the bitstream
module top
  (input CLK, RST,
   output reg [7:0] LED);
  // Increment the 8-bit LED register on every rising clock edge.
  // RST is declared but not used here.
  always @(posedge CLK) begin
    LED <= LED + 1;
  end
endmodule
1A0C021E...
Original HW on an FPGA
Bitstream
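To see what the `top` module does once configured, its behavior can be modeled cycle by cycle in software. A minimal sketch: the class name `LedCounter` is hypothetical, and the initial value of 0 is an assumption, since the module on the slide never uses RST to initialize LED.

```python
class LedCounter:
    """Cycle-level model of `top`: an 8-bit register incremented per clock."""

    WIDTH = 8
    MASK = (1 << WIDTH) - 1  # 0xFF, keeps values within [7:0]

    def __init__(self, initial=0):
        # Assumption: LED starts at 0; the slide's module leaves RST unused.
        self.led = initial & self.MASK

    def posedge_clk(self):
        """Model `LED <= LED + 1` on a rising clock edge, truncated to 8 bits."""
        self.led = (self.led + 1) & self.MASK
        return self.led


counter = LedCounter()
trace = [counter.posedge_clk() for _ in range(258)]
print(trace[:3], trace[253:258])  # prints [1, 2, 3] [254, 255, 0, 1, 2]
```

The wraparound from 255 to 0 comes from the 8-bit register width, not from any explicit logic in the design.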
32. PyCoRAM [Takamaeda+, CARL’13]: CoRAM for Modern EDKs
■ Goal: use CoRAM's memory abstraction with current EDKs
● Connect it to standard interconnects (AXI4/Avalon)
● That should also make it easy to coexist with other ordinary IP cores
[Figure: block diagram with the labels Standard On-chip Interconnect, CoRAM Abstraction, Accelerator logic, Standard IP-core, Device-dependent Interfaces, and CPU core. Captions: "Portable application design with CoRAM" and "Cooperation with standard IP-cores".]
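As a rough mental model of the memory abstraction: the accelerator logic touches only on-chip RAMs, while a separate control thread moves data between those RAMs and off-chip memory. The sketch below is plain Python under that assumption; every class and function name is hypothetical and none of it is the PyCoRAM API.

```python
class OnChipRam:
    """Hypothetical stand-in for an on-chip RAM visible to the accelerator."""

    def __init__(self, size):
        self.words = [0] * size


def load_block(ram, dram, dram_addr, length):
    """Control-thread side: copy `length` words from off-chip memory.

    In a real system this transfer would travel over the on-chip
    interconnect (e.g. AXI4 or Avalon); here it is a plain list copy.
    """
    ram.words[:length] = dram[dram_addr:dram_addr + length]


def accumulate(ram, length):
    """Accelerator side: reads only the on-chip RAM, never `dram`."""
    return sum(ram.words[:length])


def control_thread(dram, block_words=128):
    """Stream all of `dram` through one on-chip RAM, block by block."""
    ram = OnChipRam(block_words)
    total = 0
    for addr in range(0, len(dram), block_words):
        length = min(block_words, len(dram) - addr)
        load_block(ram, dram, addr, length)
        total += accumulate(ram, length)
    return total


dram = list(range(1000))  # pretend off-chip memory contents
print(control_thread(dram))
```

Because `accumulate` never sees `dram`, the same accelerator code would work unchanged whatever sits behind `load_block`, which is the portability the slide is pointing at.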
41. CNN and DNN
■ CNN: Convolutional Neural Network
● Convolution and pooling (selection) operations are stacked in many layers
From “DaDianNao: A Machine-Learning Supercomputer (MICRO’14)”
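One convolution layer followed by one pooling layer can be written in a few lines. A minimal sketch in pure Python, assuming stride-1 convolution without padding and non-overlapping max pooling; the input image and kernel are hypothetical values, and real CNNs add multiple channels, biases, and nonlinearities.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D convolution as used in CNNs.

    Like most CNN code, this is cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value per window."""
    return [
        [
            max(fmap[y + i][x + j] for i in range(size) for j in range(size))
            for x in range(0, len(fmap[0]) - size + 1, size)
        ]
        for y in range(0, len(fmap) - size + 1, size)
    ]


image = [
    [1, 2, 3, 0, 1],
    [0, 1, 2, 3, 0],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 1],
]
kernel = [[1, 0], [0, 1]]  # hypothetical 2x2 kernel: sums the main diagonal

features = conv2d(image, kernel)  # 5x5 input -> 4x4 feature map
pooled = max_pool2d(features)     # 4x4 feature map -> 2x2 after pooling
print(features)
print(pooled)
```

Stacking layers means feeding `pooled` into another `conv2d`, then pooling again, and so on.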
42. Deep Learning on FPGAs
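■ Adopts a dataflow architecture
● Neuron and synapse values flow through multiply-add pipelines
■ This is not a hardware implementation of the neuron network itself
● The goal is simply to accelerate a DNN model in order to make predictions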
Zhang+, Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, FPGA’15
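The idea of values flowing through a multiply-add pipeline can be imitated in software with chained generators. This is only an analogy under stated assumptions, not the accelerator design from the cited paper: the stage names are hypothetical, the values are placeholders, and real hardware runs the stages concurrently rather than lazily.

```python
def multiply_stage(pairs):
    """Multiplier stage: consume (neuron, synapse) pairs, emit products."""
    for neuron, synapse in pairs:
        yield neuron * synapse


def accumulate_stage(products):
    """Adder stage: emit the running sum after each incoming product."""
    total = 0
    for product in products:
        total += product
        yield total


def neuron_output(inputs, weights):
    """Stream one output neuron's inputs and weights through the pipeline."""
    result = 0
    for result in accumulate_stage(multiply_stage(zip(inputs, weights))):
        pass  # the last running sum is the neuron's pre-activation value
    return result


inputs = [1, 2, 3, 4]     # input neuron values (hypothetical)
weights = [2, 0, -1, 3]   # synapse weights (hypothetical)
print(neuron_output(inputs, weights))  # prints 11
```

No neuron is built as a physical unit here; the same two stages are reused for every output, which matches the point that the network itself is not what gets implemented in hardware.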
43. FPGA-based Machine Learning System
■ Microsoft Bing Search Engine (Catapult)
● A commercial FPGA-based web search engine
● Machine-learning (DNN) algorithms implemented as a hardware pipeline
Putnam+, A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services, ISCA'14
44. DaDianNao: A Machine-Learning Supercomputer
■ A programmable accelerator for CNNs/DNNs
■ Best paper at IEEE/ACM MICRO’14
● A best paper award at the top conference on microarchitecture
We show that, on a subset of the largest known neural network layers,
it is possible to achieve a speedup of 450.65x over a GPU, and reduce
the energy by 150.31x on average for a 64-chip system.
45. Chip Layout of DaDianNao