Memory Requirements for Convolutional Neural Network Hardware Accelerators
Kevin Siu, Dylan Malone Stuart, Mostafa Mahmoud, and Andreas Moshovos
University of Toronto
2018 IEEE International Symposium on Workload Characterization (IISWC)
Presented by Sepideh Shirkhanzadeh
WHY DO WE NEED HARDWARE ACCELERATORS?
 Convolutional neural networks (CNNs) have been highly successful in image processing and image classification.
 Specialized hardware architectures have been designed to accelerate the computations in CNNs.
 The key concerns are memory, bandwidth, and performance.
 The main challenge in designing an efficient memory system is sizing the on-chip memory so as to minimize off-chip access costs.
Types of Memory Systems
 There are three types of memory systems:
1. centralized on-chip global memory
2. specialized partitioned memories
3. storage partitioned into space for weights and activations
 The hierarchy can be fixed or flexible.
Benefit of a flexible hierarchy: optimal energy for each layer of each network.
Disadvantage: extracting the configuration is a time-consuming process.
Basics of Convolutional Neural Networks
The input activations I are a block of size X × Y × C.
There are K filters Fk, each of size R × S × C.
The output activations O are a block of size P × Q × K, where
P = (X − R)/m + 1
Q = (Y − S)/m + 1
and m is the stride length.
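These dimensions can be checked with a small helper (a sketch; the function name is illustrative, not from the paper):

```python
def output_dims(X, Y, C, R, S, m):
    """Output plane size P x Q for a valid (no-padding) convolution with stride m."""
    P = (X - R) // m + 1
    Q = (Y - S) // m + 1
    return P, Q

# e.g. a 224 x 224 x 3 input with 7 x 7 filters and stride 2 (no padding)
print(output_dims(224, 224, 3, 7, 7, 2))  # (109, 109)
```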
Convolutional Computations
• Filter F0 is multiplied element-wise with the upper-leftmost values of the input to produce O(0, 0, 0).
• In the next step, the filter is shifted by stride m across the input to produce O(0, 1, 0).
• This process is repeated over the entire input block to compute the first output activation plane (P × Q values).
• To compute the other planes, we apply the same process using filters F1 through FK-1.
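The steps above can be sketched as a direct (naive) convolution, assuming NumPy and the X × Y × C input and K × R × S × C filter layouts defined earlier:

```python
import numpy as np

def conv_layer(I, F, m=1):
    """Direct convolution: slide each filter across the input with stride m
    and sum the element-wise products per window."""
    X, Y, C = I.shape
    K, R, S, _ = F.shape
    P = (X - R) // m + 1
    Q = (Y - S) // m + 1
    O = np.zeros((P, Q, K))
    for k in range(K):              # one output plane per filter
        for p in range(P):
            for q in range(Q):
                window = I[p * m:p * m + R, q * m:q * m + S, :]
                O[p, q, k] = np.sum(window * F[k])
    return O
```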
Characterization of the On-Chip Memory Storage Requirements
 The computation of each output window is independent of the others, which permits very wide parallelism across the computation.
 The same input activation and filter values are accessed multiple times throughout the computation; this is the opportunity for data reuse.
 Each calculation in a convolutional layer is independent, so the computation order does not affect the final outcome, but it does affect when the operands are accessed from memory.
 Data reuse and locality of access therefore become important.
Computation Orders & Data Reuse
Computation Orders
Order 1 (Input-Major Order)
• Each input window of size R × S × C is multiplied with each of the K filters, producing a 1 × 1 × K column of output activations.
• The weights are re-accessed P × Q times.
Order 2 (Filter-Major Order)
• One filter is convolved across the entire input activation block.
• The input activation values are re-accessed K times.
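A rough model of the re-fetch traffic implied by each order, assuming only the resident operand stays on-chip (names and byte counts are illustrative):

```python
def refetch_traffic(P, Q, K, weight_bytes, input_bytes):
    """Off-chip bytes re-read under each computation order.
    Order 1 (input-major): the full set of K filters (weight_bytes)
      is streamed once per output position, i.e. P * Q times.
    Order 2 (filter-major): the full input (input_bytes) is streamed
      once per filter, i.e. K times."""
    order1 = P * Q * weight_bytes
    order2 = K * input_bytes
    return order1, order2
```

Which order is cheaper depends on the layer shape: early layers (large P × Q, few filters) favor keeping weights resident, late layers the reverse.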
Four On-Chip Memory Schemes
1. Everything On-Chip:
• There are two on-chip memories: an activation memory (AM) and a weight memory (WM).
• The WM is sized such that the weights from all layers fit on-chip simultaneously.
• The AM is sized such that the input and output activations of any layer (one layer at a time) also fit on-chip.
• "Zero" off-chip bandwidth, because everything fits on-chip: the weights are loaded once, and the only per-inference off-chip traffic is the input image and the final output.
• Generally infeasible.
2. Working Set of Activations + All Filters (Off-Chip Activations):
• The WM is sized such that all weights for all layers fit on-chip.
• The AM holds one "row" of input windows, namely a block of size X × S × C.
• Each input activation needs to be read only once from off-chip.
• The next set of X × m × C activations is loaded in parallel, so that by the end the next "row" of activation windows is on-chip.
• Total off-chip traffic = the sum of the input and output activations across all layers.
(Figure: weights resident on-chip; the AM buffers an X × S × C row of activations.)
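The AM sizing for this scheme can be sketched as follows (element counts only; word width left out, names illustrative):

```python
def scheme2_am_elems(X, S, C, m):
    """Scheme 2 AM: one row of input windows (X * S * C) plus the next
    X * m * C slice being double-buffered in parallel."""
    return X * S * C + X * m * C

# e.g. a 224-wide input, 3-tall filters, 64 channels, stride 1
print(scheme2_am_elems(224, 3, 64, 1))  # 57344 activation values
```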
3. Working Set of Filters + All Activations (Off-Chip Weights):
• The WM holds only as many filters as needed to satisfy the parallel computation.
• Each filter needs to be fetched from off-chip only once per layer.
• The AM is sized to hold both the input and the output activations of a layer, i.e. sized for the largest layer.
• The off-chip traffic for this scheme is simply the size of the weights across all layers of the network.
(Figure: activations resident on-chip.)
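A sketch of the sizing rule for this scheme (element counts; names are illustrative):

```python
def scheme3_am_elems(layers):
    """Scheme 3 AM: must hold the input plus output activations of the
    largest layer. layers: (input_elems, output_elems) per layer."""
    return max(i + o for i, o in layers)

def scheme3_offchip_elems(weights_per_layer):
    """Off-chip traffic under Scheme 3: just the weights, summed over layers."""
    return sum(weights_per_layer)
```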
4. Working Set of Filters + Working Set of Activations (Both Off-Chip):
• The WM holds one set of filters, as consumed by the on-chip execution engine.
• The AM is sized to store only one row of activations.
• To minimize off-chip accesses, we can either re-fetch the activation values K times (as in Order 2) or re-fetch the weight values Q times (as in Order 1).
• The evaluation always opts for the order that is most favorable to the metric under study.
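Choosing the cheaper order under this scheme might be sketched as follows (a simplification; the compulsory first read of each operand is included, and the K/Q re-fetch counts follow the slide above):

```python
def scheme4_offchip_elems(input_elems, weight_elems, K, Q):
    """Scheme 4: pick whichever order re-fetches less.
    Order 1 re-reads the weights Q times; Order 2 re-reads the
    activations K times."""
    order1 = Q * weight_elems + input_elems
    order2 = K * input_elems + weight_elems
    return min(order1, order2)
```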
Evaluation on Benchmarks
CNN Benchmarks
Benchmark Networks
 Image classification networks = low resolution
 Computational imaging networks = high resolution
Total Storage for CNNs – Scheme 1
MobileNet = 10.5 MB
This is certainly expensive for the mobile devices this network targets, even though it sacrifices accuracy compared to others such as ResNet.
On-Chip Activation Memory Requirements – Scheme 2
• Assuming that all the weights are stored in on-chip memory, only a single "row" of windows of the activations needs to be stored on-chip.
(Chart: AM requirements per network; the computational imaging networks stand out.)
Weight Memory Requirements – Scheme 3
• Assuming that all the activations are stored in on-chip memory.
• The working set of filters is defined as the number of filters that are computed in parallel and kept on-chip at the same time.
(Chart: WM requirements for the image classification networks.)
Summary
• Scheme 2, which buffers all weights on-chip, is impractical for the classification networks.
• Scheme 3, which buffers all activations per layer on-chip plus either all or a subset of the filters, is practical for the classification models; VGG-19 and DPNet are outliers.
• Scheme 4, which processes only 64, 16, or 1 filters concurrently, has vastly lower on-chip storage requirements, but much higher off-chip bandwidth requirements.
Bandwidth
Computational Intensity
• Computational intensity is typically much larger in the early layers of convolutional neural networks.
• Early layers have much larger input dimensions, so each filter is reused many times over the input.
• Lower computational intensity (less reuse) implies larger bandwidth requirements in the later layers.
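Computational intensity for one layer can be estimated as MAC operations per byte of off-chip traffic (a sketch under the assumption that each input, weight, and output crosses the chip boundary once; names are illustrative):

```python
def computational_intensity(P, Q, K, R, S, C, offchip_bytes):
    """MACs per byte moved off-chip for one convolutional layer.
    A P*Q*K output block needs R*S*C multiply-accumulates per element."""
    macs = P * Q * K * R * S * C
    return macs / offchip_bytes
```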
Peak Bandwidth and Memory Requirements
• Under Scheme 2 (all weights on-chip), the image classification networks have very low bandwidth.
• Under Scheme 3 (all activations on-chip), the super-resolution networks have very low bandwidth.
• Under Scheme 4, with only one working set on-chip at a time, memory is reduced at the cost of higher bandwidth.
Related Works
• Yang et al. show how to optimize CNN loop blocking in order to minimize total memory energy expenditure.
• DaDianNao used large on-chip eDRAM to store all activations and weights.
• SCNN sizes its activation RAMs to capture the capacity requirements of nearly all layers in the networks.
• The TPU used a multi-megabyte on-chip AM and 64 KB double buffers for the weights.

Weitere ähnliche Inhalte

Was ist angesagt?

multi processors
multi processorsmulti processors
multi processorsAcad
 
A Simplex Architecture for Intelligent and Safe Unmanned Aerial Vehicles
A Simplex Architecture for Intelligent and Safe Unmanned Aerial VehiclesA Simplex Architecture for Intelligent and Safe Unmanned Aerial Vehicles
A Simplex Architecture for Intelligent and Safe Unmanned Aerial VehiclesHeechul Yun
 
eTPU to GTM Migration Presentation
eTPU to GTM Migration PresentationeTPU to GTM Migration Presentation
eTPU to GTM Migration PresentationParker Mosman
 
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...Michael Gschwind
 
Inference accelerators
Inference acceleratorsInference accelerators
Inference acceleratorsDarshanG13
 
Presentation aix performance updates & issues
Presentation   aix performance updates & issuesPresentation   aix performance updates & issues
Presentation aix performance updates & issuessolarisyougood
 
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...Rahul Jain
 
eTPU to GTM Migration
eTPU to GTM MigrationeTPU to GTM Migration
eTPU to GTM MigrationParker Mosman
 
Linux memory consumption
Linux memory consumptionLinux memory consumption
Linux memory consumptionhaish
 
Q2.12: Implications of Per CPU switching in a big.LITTLE system
Q2.12: Implications of Per CPU switching in a big.LITTLE systemQ2.12: Implications of Per CPU switching in a big.LITTLE system
Q2.12: Implications of Per CPU switching in a big.LITTLE systemLinaro
 
Multiple processor (ppt 2010)
Multiple processor (ppt 2010)Multiple processor (ppt 2010)
Multiple processor (ppt 2010)Arth Ramada
 
Lecture 3
Lecture 3Lecture 3
Lecture 3Mr SMAK
 
Advanced High-Performance Computing Features of the Open Power ISA
Advanced High-Performance Computing Features of the Open Power ISAAdvanced High-Performance Computing Features of the Open Power ISA
Advanced High-Performance Computing Features of the Open Power ISAGanesan Narayanasamy
 
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsC-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsQian Lin
 
CPU Performance Enhancements
CPU Performance EnhancementsCPU Performance Enhancements
CPU Performance EnhancementsDilum Bandara
 
Benchmark Processors- VAX 8600,MC68040,SPARC and Superscalar RISC
Benchmark Processors- VAX 8600,MC68040,SPARC and Superscalar RISCBenchmark Processors- VAX 8600,MC68040,SPARC and Superscalar RISC
Benchmark Processors- VAX 8600,MC68040,SPARC and Superscalar RISCPriyodarshini Dhar
 
Pipelining, processors, risc and cisc
Pipelining, processors, risc and ciscPipelining, processors, risc and cisc
Pipelining, processors, risc and ciscMark Gibbs
 

Was ist angesagt? (20)

multi processors
multi processorsmulti processors
multi processors
 
A Simplex Architecture for Intelligent and Safe Unmanned Aerial Vehicles
A Simplex Architecture for Intelligent and Safe Unmanned Aerial VehiclesA Simplex Architecture for Intelligent and Safe Unmanned Aerial Vehicles
A Simplex Architecture for Intelligent and Safe Unmanned Aerial Vehicles
 
Debate on RISC-CISC
Debate on RISC-CISCDebate on RISC-CISC
Debate on RISC-CISC
 
eTPU to GTM Migration Presentation
eTPU to GTM Migration PresentationeTPU to GTM Migration Presentation
eTPU to GTM Migration Presentation
 
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
 
Inference accelerators
Inference acceleratorsInference accelerators
Inference accelerators
 
Presentation aix performance updates & issues
Presentation   aix performance updates & issuesPresentation   aix performance updates & issues
Presentation aix performance updates & issues
 
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...
 
MaPU-HPCA2016
MaPU-HPCA2016MaPU-HPCA2016
MaPU-HPCA2016
 
eTPU to GTM Migration
eTPU to GTM MigrationeTPU to GTM Migration
eTPU to GTM Migration
 
Linux memory consumption
Linux memory consumptionLinux memory consumption
Linux memory consumption
 
Esd module2
Esd module2Esd module2
Esd module2
 
Q2.12: Implications of Per CPU switching in a big.LITTLE system
Q2.12: Implications of Per CPU switching in a big.LITTLE systemQ2.12: Implications of Per CPU switching in a big.LITTLE system
Q2.12: Implications of Per CPU switching in a big.LITTLE system
 
Multiple processor (ppt 2010)
Multiple processor (ppt 2010)Multiple processor (ppt 2010)
Multiple processor (ppt 2010)
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
Advanced High-Performance Computing Features of the Open Power ISA
Advanced High-Performance Computing Features of the Open Power ISAAdvanced High-Performance Computing Features of the Open Power ISA
Advanced High-Performance Computing Features of the Open Power ISA
 
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsC-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
 
CPU Performance Enhancements
CPU Performance EnhancementsCPU Performance Enhancements
CPU Performance Enhancements
 
Benchmark Processors- VAX 8600,MC68040,SPARC and Superscalar RISC
Benchmark Processors- VAX 8600,MC68040,SPARC and Superscalar RISCBenchmark Processors- VAX 8600,MC68040,SPARC and Superscalar RISC
Benchmark Processors- VAX 8600,MC68040,SPARC and Superscalar RISC
 
Pipelining, processors, risc and cisc
Pipelining, processors, risc and ciscPipelining, processors, risc and cisc
Pipelining, processors, risc and cisc
 

Ähnlich wie Memory Requirements for Convolutional Neural Network Hardware Accelerators

Morph : a novel accelerator
Morph : a novel acceleratorMorph : a novel accelerator
Morph : a novel acceleratorBaharJV
 
Advanced performance troubleshooting using esxtop
Advanced performance troubleshooting using esxtopAdvanced performance troubleshooting using esxtop
Advanced performance troubleshooting using esxtopAlan Renouf
 
Maxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorialMaxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorialmadhuinturi
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 
Process synchronization in multi core systems using on-chip memories
Process synchronization in multi core systems using on-chip memoriesProcess synchronization in multi core systems using on-chip memories
Process synchronization in multi core systems using on-chip memoriesArun Joseph
 
High Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsHigh Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsYinghai Lu
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture Haris456
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitJinwon Lee
 
unit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptxunit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptxKandavelEee
 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAAiman Hud
 
Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Shien-Chun Luo
 
Brief Introduction.ppt
Brief Introduction.pptBrief Introduction.ppt
Brief Introduction.pptMollyZolly
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxAkshitAgiwal1
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overviewlambertt
 
Ch1Intro.pdf Computer organization and org.
Ch1Intro.pdf Computer organization and org.Ch1Intro.pdf Computer organization and org.
Ch1Intro.pdf Computer organization and org.gadisaAdamu
 
4.1 Introduction 145• In this section, we first take a gander at a.pdf
4.1 Introduction 145• In this section, we first take a gander at a.pdf4.1 Introduction 145• In this section, we first take a gander at a.pdf
4.1 Introduction 145• In this section, we first take a gander at a.pdfarpowersarps
 

Ähnlich wie Memory Requirements for Convolutional Neural Network Hardware Accelerators (20)

Morph : a novel accelerator
Morph : a novel acceleratorMorph : a novel accelerator
Morph : a novel accelerator
 
Advanced performance troubleshooting using esxtop
Advanced performance troubleshooting using esxtopAdvanced performance troubleshooting using esxtop
Advanced performance troubleshooting using esxtop
 
Maxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorialMaxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorial
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
 
Process synchronization in multi core systems using on-chip memories
Process synchronization in multi core systems using on-chip memoriesProcess synchronization in multi core systems using on-chip memories
Process synchronization in multi core systems using on-chip memories
 
module01.ppt
module01.pptmodule01.ppt
module01.ppt
 
High Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsHigh Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and Solutions
 
Pentium iii
Pentium iiiPentium iii
Pentium iii
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
unit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptxunit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptx
 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIA
 
Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)
 
Brief Introduction.ppt
Brief Introduction.pptBrief Introduction.ppt
Brief Introduction.ppt
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overview
 
Ch1Intro.pdf Computer organization and org.
Ch1Intro.pdf Computer organization and org.Ch1Intro.pdf Computer organization and org.
Ch1Intro.pdf Computer organization and org.
 
4.1 Introduction 145• In this section, we first take a gander at a.pdf
4.1 Introduction 145• In this section, we first take a gander at a.pdf4.1 Introduction 145• In this section, we first take a gander at a.pdf
4.1 Introduction 145• In this section, we first take a gander at a.pdf
 
mTCP使ってみた
mTCP使ってみたmTCP使ってみた
mTCP使ってみた
 

Kürzlich hochgeladen

Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Escorts Service Daryaganj - 9899900591 College Girls & Models 24/7
Escorts Service Daryaganj - 9899900591 College Girls & Models 24/7Escorts Service Daryaganj - 9899900591 College Girls & Models 24/7
Escorts Service Daryaganj - 9899900591 College Girls & Models 24/7shivanni mehta
 
Vip Mumbai Call Girls Andheri East Call On 9920725232 With Body to body massa...
Vip Mumbai Call Girls Andheri East Call On 9920725232 With Body to body massa...Vip Mumbai Call Girls Andheri East Call On 9920725232 With Body to body massa...
Vip Mumbai Call Girls Andheri East Call On 9920725232 With Body to body massa...amitlee9823
 
Bommasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Bommasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Bommasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Bommasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...Naicy mandal
 
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)kojalkojal131
 
Introduction-to-4x4-SRAM-Memory-Block.pptx
Introduction-to-4x4-SRAM-Memory-Block.pptxIntroduction-to-4x4-SRAM-Memory-Block.pptx
Introduction-to-4x4-SRAM-Memory-Block.pptxJaiLegal
 
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Call Girls Kothrud Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Kothrud Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Kothrud Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Kothrud Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Top Rated Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated  Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Top Rated  Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Call Girls in Nagpur High Profile
 
Top Rated Pune Call Girls Chakan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated  Pune Call Girls Chakan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Top Rated  Pune Call Girls Chakan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls Chakan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Call Girls in Nagpur High Profile
 
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Th...
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Th...Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Th...
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Th...Pooja Nehwal
 
Call Girls Chickpet ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Call Girls Chickpet ☎ 7737669865☎ Book Your One night Stand (Bangalore)Call Girls Chickpet ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Call Girls Chickpet ☎ 7737669865☎ Book Your One night Stand (Bangalore)amitlee9823
 
Shikrapur Call Girls Most Awaited Fun 6297143586 High Profiles young Beautie...
Shikrapur Call Girls Most Awaited Fun  6297143586 High Profiles young Beautie...Shikrapur Call Girls Most Awaited Fun  6297143586 High Profiles young Beautie...
Shikrapur Call Girls Most Awaited Fun 6297143586 High Profiles young Beautie...tanu pandey
 
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...drmarathore
 
➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men 🔝Vijayawada🔝 E...
➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men  🔝Vijayawada🔝   E...➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men  🔝Vijayawada🔝   E...
➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men 🔝Vijayawada🔝 E...amitlee9823
 
Call Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ kakinada Call-girls in Women Seeking Men 🔝kakinada🔝 Escor...
➥🔝 7737669865 🔝▻ kakinada Call-girls in Women Seeking Men  🔝kakinada🔝   Escor...➥🔝 7737669865 🔝▻ kakinada Call-girls in Women Seeking Men  🔝kakinada🔝   Escor...
➥🔝 7737669865 🔝▻ kakinada Call-girls in Women Seeking Men 🔝kakinada🔝 Escor...amitlee9823
 
Book Paid Lohegaon Call Girls Pune 8250192130Low Budget Full Independent High...
Book Paid Lohegaon Call Girls Pune 8250192130Low Budget Full Independent High...Book Paid Lohegaon Call Girls Pune 8250192130Low Budget Full Independent High...
Book Paid Lohegaon Call Girls Pune 8250192130Low Budget Full Independent High...ranjana rawat
 

Kürzlich hochgeladen (20)

Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
 
Escorts Service Daryaganj - 9899900591 College Girls & Models 24/7
Escorts Service Daryaganj - 9899900591 College Girls & Models 24/7Escorts Service Daryaganj - 9899900591 College Girls & Models 24/7
Escorts Service Daryaganj - 9899900591 College Girls & Models 24/7
 
Vip Mumbai Call Girls Andheri East Call On 9920725232 With Body to body massa...
Vip Mumbai Call Girls Andheri East Call On 9920725232 With Body to body massa...Vip Mumbai Call Girls Andheri East Call On 9920725232 With Body to body massa...
Vip Mumbai Call Girls Andheri East Call On 9920725232 With Body to body massa...
 
Bommasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Bommasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Bommasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Bommasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...
 
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
 
Introduction-to-4x4-SRAM-Memory-Block.pptx
Introduction-to-4x4-SRAM-Memory-Block.pptxIntroduction-to-4x4-SRAM-Memory-Block.pptx
Introduction-to-4x4-SRAM-Memory-Block.pptx
 
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Girls Kothrud Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Kothrud Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Kothrud Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Kothrud Call Me 7737669865 Budget Friendly No Advance Booking
 
Top Rated Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated  Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Top Rated  Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
 
Top Rated Pune Call Girls Chakan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated  Pune Call Girls Chakan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Top Rated  Pune Call Girls Chakan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls Chakan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
 
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Th...
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Th...Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Th...
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Th...
 
Call Girls Chickpet ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Call Girls Chickpet ☎ 7737669865☎ Book Your One night Stand (Bangalore)Call Girls Chickpet ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Call Girls Chickpet ☎ 7737669865☎ Book Your One night Stand (Bangalore)
 
Shikrapur Call Girls Most Awaited Fun 6297143586 High Profiles young Beautie...

Memory Requirements for Convolutional Neural Network Hardware Accelerators

  • 1. Memory Requirements for Convolutional Neural Network Hardware Accelerators. Kevin Siu, Dylan Malone Stuart, Mostafa Mahmoud, and Andreas Moshovos, University of Toronto. 2018 IEEE International Symposium on Workload Characterization (IISWC). Presented by Sepideh Shirkhanzadeh.
  • 2. Why Do We Need Hardware Accelerators?  Convolutional neural networks (CNNs) have been highly successful in image processing and image classification.  Specialized hardware architectures have been designed to accelerate the computations in CNNs, balancing memory capacity, bandwidth, and performance.  The main challenge in designing an efficient memory system is sizing the on-chip memory so as to minimize off-chip access costs.
  • 3. Types of Memory Systems.  There are three types of memory system: 1. a centralized on-chip global memory; 2. specialized partitioned memories; 3. storage partitioned into separate space for weights and activations.  The hierarchy can be fixed or flexible. The benefit of a flexible hierarchy is optimal energy for each layer of each network; the disadvantage is that extracting the configuration is a time-consuming process.
  • 4. Basics of Convolutional Neural Networks. The input activations I are a block of size X * Y * C. There are K filters Fk, each of size R * S * C. The output activations O are a block of size P * Q * K, where P = (X - R)/m + 1 and Q = (Y - S)/m + 1, with m the stride length.
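As a quick check of these dimension formulas, here is a minimal Python helper (the function name and example shapes are illustrative, not from the paper):

```python
def conv_output_dims(X, Y, R, S, m):
    """Height P and width Q of a convolution output with stride m (no padding)."""
    P = (X - R) // m + 1
    Q = (Y - S) // m + 1
    return P, Q

# Example: a 224 x 224 input with a 3 x 3 filter and stride 1 gives a 222 x 222 output.
print(conv_output_dims(224, 224, 3, 3, 1))  # (222, 222)
```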
  • 5. Convolutional Computations. The filter F0 is multiplied element-wise with the upper-leftmost values of the input to produce O(0, 0, 0). In the next step, the filter is shifted by stride m across the input to produce O(0, 1, 0). This process is repeated over the entire input block to compute the first output activation plane, of size P * Q. To compute the other planes, the same process is applied using filters F1 to FK.
  • 6. Characterization of On-Chip Memory Storage Requirements.  The computation of each output window is independent of the others, which permits very wide parallelism across the computation.  The same input activation and filter values are accessed multiple times throughout the computation, which creates opportunities for data reuse.  Because every calculation in a convolutional layer is independent, the order of computation does not affect the final outcome, but it does affect when operands are accessed from memory.  Data reuse and locality of access therefore become important.
  • 7. Characterization of On-Chip Memory Storage Requirements: Computation Orders & Data Reuse
  • 8. Computation Order. In Order 1 (input-major order), each input window of size R x S x C is multiplied with each of the K filters, producing a 1 x 1 column of output activations of length K; the weights are re-accessed P*Q times. In Order 2 (filter-major order), one filter is convolved across the entire input activation; the input activation values are re-accessed K times.
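The two orders differ only in loop nesting, not in the set of computations performed; a minimal sketch (function names are illustrative):

```python
# Order 1 (input-major): visit each input window once; the full set of K
# filter weights is re-accessed once per window, i.e. P*Q times in total.
def input_major(P, Q, K):
    for p in range(P):
        for q in range(Q):
            for k in range(K):
                yield (p, q, k)   # compute O[p][q][k]

# Order 2 (filter-major): convolve one filter over the whole input; the
# input activations are re-accessed once per filter, i.e. K times in total.
def filter_major(P, Q, K):
    for k in range(K):
        for p in range(P):
            for q in range(Q):
                yield (p, q, k)

# Both orders enumerate exactly the same output computations.
assert set(input_major(2, 2, 3)) == set(filter_major(2, 2, 3))
```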
  • 9. Characterization of On-Chip Memory Storage Requirements: Four On-Chip Memory Schemes
  • 10. Scheme 1, Everything On-Chip: There are two on-chip memories, an activation memory (AM) and a weight memory (WM). The WM is sized so that the weights of all layers fit on-chip simultaneously. The AM is sized so that the input and output activations of any single layer (one layer at a time) also fit on-chip. This gives "zero" off-chip bandwidth because everything fits on-chip: the weights are loaded once, and for each inference the only off-chip traffic is the input image and the final output. In practice this scheme is generally infeasible.
  • 11. Scheme 2, Working Set of Activations + All Filters (weights on-chip, activations off-chip): The WM is sized so that all weights of all layers fit on-chip. The AM is sized to hold one "row" of input windows, that is, a block of size X * S * C, so each input activation needs to be read from off-chip only once. While a row is processed, the next set of X * m * C activations is loaded in parallel, so that by the end the next "row" of activation windows is on-chip. The total off-chip traffic is the sum of the input and output activations across all layers.
  • 12. Scheme 3, Working Set of Filters + All Activations (activations on-chip, weights off-chip): The WM holds only as many filters as needed to satisfy the parallel computation, so each filter needs to be fetched from off-chip only once per layer. The AM is sized to hold both the input and output activations of a layer, i.e., it is sized for the largest layer. The off-chip traffic for this scheme is simply the total size of the weights across all layers of the network.
  • 13. Scheme 4, Working Set of Filters + Working Set of Activations (both off-chip): The WM holds one set of filters, as consumed by the on-chip execution engine, and the AM is sized to store only one row of activations. To minimize off-chip bandwidth, we can either re-fetch the activation values K times (as in Order 2) or re-fetch the weight values Q times (as in Order 1), always opting for the order that is most favorable to the metric under study.
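The activation-memory sizing for Schemes 2 and 3 can be estimated with a short sketch, under stated assumptions (2 bytes per value, hypothetical layer shapes; helper names are not from the paper):

```python
BYTES = 2  # assumed 16-bit activations and weights

def scheme2_am(X, S, C):
    """Scheme 2: AM holds one row of input windows, a block of X * S * C values."""
    return X * S * C * BYTES

def scheme3_am(layers):
    """Scheme 3: AM holds input + output activations of the largest layer.
    Each layer is a tuple (X, Y, C, P, Q, K)."""
    return max(X * Y * C + P * Q * K for (X, Y, C, P, Q, K) in layers) * BYTES

# Hypothetical early layer: 224x224x3 input producing 222x222x64 outputs.
print(scheme2_am(224, 3, 3))                      # 4032 bytes (one window row)
print(scheme3_am([(224, 224, 3, 222, 222, 64)]))  # 6609408 bytes (~6.3 MB)
```

The gap between these two numbers illustrates the trade-off the slides describe: keeping whole-layer activations on-chip (Scheme 3) costs orders of magnitude more SRAM than buffering a single row (Scheme 2), in exchange for eliminating off-chip activation traffic.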
  • 14. Characterization of On-Chip Memory Storage Requirements: Evaluation on Benchmarks
  • 16. Benchmark Networks:  image classification networks use low-resolution inputs, while  computational imaging networks use high-resolution inputs.
  • 17. Total Storage for CNNs (Scheme 1): MobileNet needs 10.5 MB in total, which is still expensive for the mobile devices this network targets, even though it sacrifices accuracy compared to larger networks such as ResNet.
  • 18. On-Chip Activation Memory Requirements (Scheme 2): Assuming all weights are stored in on-chip memory, only a single "row" of activation windows needs to be stored on-chip; the requirement is largest for the computational imaging networks.
  • 19. Weight Memory Requirements (Scheme 3): Assuming all activations are stored in on-chip memory, we define the working set of filters as the number of filters that are computed in parallel and kept on-chip at the same time; results are shown for the image classification networks.
  • 20. Summary: Scheme 2, which buffers all weights on-chip, is impractical for the classification networks. Scheme 3, which buffers all per-layer activations on-chip along with either all or a subset of the filters, is practical for the classification models; VGG-19 and DPNet are outliers. Scheme 4, processing only 64, 16, or 1 filters concurrently, has vastly lower on-chip storage requirements but much higher off-chip bandwidth requirements.
  • 21. Characterization of On-Chip Memory Storage Requirements: Bandwidth
  • 22. Computational Intensity: Computational intensity is typically much larger in the early layers of convolutional neural networks. Because the input dimensions are much larger in early layers, each filter is reused many times over the input. Lower computational intensity (less reuse) implies larger bandwidth requirements in the later layers.
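One way to see this reuse effect, using illustrative VGG-like layer shapes (not figures from the paper), is to count how many output positions each filter weight contributes to:

```python
def filter_reuse(X, Y, R, S, m):
    """Times each filter weight is used in a layer = number of output positions P*Q."""
    P = (X - R) // m + 1
    Q = (Y - S) // m + 1
    return P * Q

# Early layer (224 x 224 input): each weight is reused ~49k times.
print(filter_reuse(224, 224, 3, 3, 1))  # 49284
# Late layer (14 x 14 input): each weight is reused only 144 times, so later
# layers need far more off-chip weight bandwidth per unit of computation.
print(filter_reuse(14, 14, 3, 3, 1))    # 144
```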
  • 23. Peak Bandwidth and Memory Requirements: Under Scheme 2 (all weights on-chip), the image classification networks have very low bandwidth requirements. Under Scheme 3 (all activations on-chip), the super-resolution networks have very low bandwidth requirements. Under Scheme 4, with only one working set on-chip at a time, memory is reduced at the cost of higher bandwidth.
  • 24. Related Works: Yang et al. show how to optimize CNN loop blocking to minimize total memory energy expenditure. DaDianNao used a large on-chip eDRAM to store all activations and weights. SCNN sizes its activation RAMs to capture the capacity requirements of nearly all of the layers in the networks. The TPU used a multi-megabyte on-chip AM and 64 KB double buffers for the weights.