High Level Synthesis Framework For a Coarse Grain Reconfigurable Architecture
Omer Malik, Ahmed Hemani and Muhammad Ali Shami
Dept. of Electronic Systems, School of ICT
Royal Institute of Technology, KTH, Stockholm, Sweden
Email: {omerm, hemani, shami}@kth.se


978-1-4244-8971-8/10$26.00 © 2010 IEEE

Abstract—A High Level Synthesis Framework for mapping DSP algorithms on a Coarse Grain Reconfigurable Architecture is presented. The behavioral specification of the algorithm is written in C, with pragmas in comments, and the tool generates configware after performing timing and synchronization synthesis. Pragmas identify SIMD type concurrency and sweep the architectural space with allocation and binding annotations to produce implementations from fully serial to fully parallel. This allows the user to stay at the algorithmic level and guide the HLS tool to search a restricted architectural space bounded by the pragmas, thus making the synthesis process more efficient and predictable.

Index Terms—High level synthesis; CGRA; Symbolic Assembler; High Level Language;

I. INTRODUCTION

DRRA (Dynamically Reconfigurable Resource Array) is a CGRA (Coarse Grain Reconfigurable Architecture) for implementing DSP applications. DRRA offers DLP (Data Level Parallelism), where a large set of data is processed by the same set of instructions in parallel threads, as in SIMD (Single Instruction Multiple Data). DRRA also allows MIMD (Multiple Instructions Multiple Data), where multiple different SIMD clusters operate in parallel. Modems and codecs represent the physical layer in the ISO (International Organization for Standardization) 7 layer model, and the functions used in these applications are DSP functions characterized by a high degree of regularity. Due to their regular structure, the computation can be divided into threads of data parallel tasks, and we call this a pattern in a DSP function.

VESYLA (VEctorizing SYmbolic Language Assembler) is a semi-automatic framework for implementing DSP functions on DRRA. VESYLA takes an untimed C specification of a DSP function and generates configware for DRRA after performing timing and synchronization synthesis, an activity that is cumbersome, time consuming and error prone. The developer makes and explicitly expresses critical implementation decisions on how many resources are allocated and how the operators and operands are mapped to the allocated resources. These activities are very much like HLS (High Level Synthesis), but in this case the VESYLA user knows the targeted RTL structure and guides the tool towards it with the allocation and binding pragmas, allocation and binding being well known concepts from the HLS domain. VESYLA performs the scheduling, synthesizes the control and does the code generation.

The algorithm developer guides VESYLA towards a specific architectural style by using VESYLA pragmas. A small change in these pragmas results in a different architectural implementation, and this change can also manipulate the serial/parallel structure of the implementation; thus the user is always in full control and can capture the architectural space effectively. These properties make VESYLA an interactive design tool that utilizes the human developer’s guidance and yet follows the “push button” methodology to generate the RTL implementation.

The main contributions of this paper are: (a) The design space is easily explorable and various architectural solutions can be implemented with minimum effort. (b) VESYLA hides unnecessary low level details from the user by allowing them to work at a higher abstraction level; this results in fewer chances of making mistakes and in reduced design time. (c) The developer can exploit the available parallelization options with ease using pragmas. (d) A controlled HLS approach that would result in an optimal solution (discussed in the “Related Work” Section).

Section II presents the related work, Section III outlines the DRRA fabric, Section IV describes VESYLA, Section V consists of experimental results and Section VI presents the conclusion and future work.

II. RELATED WORK

In this section we review some industry standard HLS tools, followed by a few schemes for mapping algorithms on CGRAs.

GAUT [5] is an open source HLS tool for DSP applications, which takes input in bit-accurate C format along with some design constraints. The C specification is converted into a DFG (Data Flow Graph) for extracting potential parallelism and data dependencies. Lastly it generates the RTL after going through allocation, scheduling and binding tasks.

[3] is an automatic synthesis tool which accepts code described in ANSI C++ and a few synthesis constraints which are used to explore the design space. The designer can guide the synthesis procedure towards an optimal solution. The tool generates RTL suitable for the targeted hardware.

[4] takes SystemC modules as input and produces optimized RTL for a specified target technology identified by the user in the form of a .lib file.

[6] is based on a design environment where the designer controls the HLS and can change the synthesis decisions about scheduling, allocation and binding by using a GUI (Graphical User Interface) at any stage.
DRESC [8] focuses on loop level parallelization for different segments of application code and maps them on a CGRA by using modulo scheduling algorithms to achieve ILP for optimal performance.

[7] is based on mapping the hyperops obtained from a DFG on to the CGRA. Code acceleration is achieved by run time reconfiguration of the CGRA to accommodate these hyperops.

[9] uses the SUIF compiler framework for partitioning the input code w.r.t. the available resources, and Native Mapping Language generates XPP’s PEs, which are automatically placed by their tool.

Our approach is similar to a traditional HLS tool, but in our case the search space is significantly reduced by explicitly identifying architectural elements with the help of pragmas, and automating the synthesis procedure is much easier than in traditional HLS tools, which face an extremely large search space. Another key differentiator is that VESYLA generates distributed control, i.e., it generates multiple FSMs implemented in the micro-coded sequencer, with each thread having its own control and being able to execute independently or synchronize with other threads if required. Traditional HLS techniques generate a thread as a single control scheme or a single FSM.

III. DRRA ARCHITECTURE

This section briefly describes the DRRA architecture. Figure 1 shows an instance of a 7x2 DRRA fabric. Every DRRA cell consists of the following components:

a. LATAAs (Logic And morphable daTA path unit) are 16-bit data path units which consist of: 1) Logic Partitions to deal with logical functions. 2) An Arithmetic Partition to implement commonly used signal processing operations like MAC, Symmetric FIR, FFT butterfly, Sum of difference, Difference of sum, 4/2 input add/subtract etc. [2]. The mDPU also has a 17 bit counter and a 13 bit status register. The values of the status registers are read by micro-coded state machines, which take decisions accordingly.

b. REFIs (REgister FIle) are 64 word, 16 bit Register Files with 2 read and 2 write ports. A REFI has an AGU (Address Generation Unit) that can generate addresses in vector mode, in circular buffer mode and in bit reversing mode for FFT. These modes can execute once, a limited number of times or in an infinite loop, using an arbitrary initial delay, a middle delay between each read/write and an end delay before the loop iterates.

c. MAUSEEQAARs (Micro-coded hierarchical sequencing machine) control all the resources in a DRRA cell. A MAUSEEQAAR sends instructions to the AGUs of the register file, selects LATAA modes, and configures the interconnects for proper operation. MAUSEEQAARs can send output signals to each other for control communication. They can also receive status bits from the mDPU. The MAUSEEQAAR has configurable interrupt handling capabilities which allow the user to configure the inputs from the LATAA or inputs from other MAUSEEQAARs as interrupts.

d. SILSILAY (a Seamless, sliding window, circuit switched interconnect fabric) is a regular, non-blocking, point-to-point and point-to-multipoint, low latency interconnection network with sliding window connectivity, which allows arbitrary parallelism among large subsystems. The DRRA interconnect fabric allows every resource to receive input from and send output to any other resource in its own column and in 3 columns on each side [1], i.e., every LATAA and REFI is connected to 28 LATAAs and REFIs, including itself.

Figure 1. The DRRA PHY Layer Fabric Fragment

IV. VESYLA

VESYLA is based on a subset of the C language with some restrictions imposed; for example, the user is not allowed to use the FILE handling features, creation of dynamic data structures, or dynamic functions available in C. The operators that are allowed in VESYLA are also restricted to those that correspond to the instructions offered by the DRRA LATAA. As the instructions of DRRA’s LATAA already correspond to typical DSP operations like MAC, butterfly etc., and as the objective is to specify the RTL implementation of an algorithm and not the algorithm itself, this restriction is natural and not intrusive; the implementation in any case requires composing the algorithms in terms of DRRA operations.

Information about allocation and binding is provided to VESYLA with the help of pragmas, which are essentially directives that guide VESYLA in generating a specific RTL architectural style. These pragmas are in the form of parameters and constraints, and after analyzing them VESYLA decides how operands and operators are mapped to the DRRA resources involved in the implementation. The topological relationship of the DRRA resources is also part of the mapping specification in the pragmas. With the help of these pragmas, the user specifies a
pattern of a particular implementation. This pattern sweeps the implementation space in terms of degree of parallelism, from fully serial to fully parallel, and number of SIMD/MIMD threads.

These pragmas are categorized into dimensional and positional generics. a) Dimensional generics identify the dimension of the problem and the architecture. For example, an FFT can be characterized in multiple ways by setting its parameters to match the dimension required by a specific context (WLAN would need a 64 point FFT, whereas DVB would need a 4096 point FFT). Similarly, one has to decide the specific micro-architecture for that implementation (like radix-2/4/mixed, the number of butterflies etc.). Lastly, the number of resources to be used should also be taken into consideration. b) Positional generics decide the exact location or range of locations to be used in the DRRA fabric for that particular implementation (these are discussed later in the section).

Figure 2 shows the complete flow of the VESYLA HLS framework. The developer specifies the behavioral specification as an untimed C model and is responsible for defining the allocation and binding constraints. VESYLA analyzes each statement of the code and builds a CDFG (Control Data Flow Graph). VESYLA performs an extensive DDA (Data Dependency Analysis) and RDA (Resource Dependency Analysis) phase using scheduling information implied by the relative ordering of the statements.

Figure 2. VESYLA Framework

VESYLA resolves dependencies while considering the allocation/binding pragmas in the resource/data dependency analysis phase and makes the critical decision of which sections of the algorithm/code can be executed in parallel and which parts have to be sequentialized. After this step, VESYLA schedules the sequences of operations and synchronizes the execution by calculating the exact number of cycles required for each operation. This synchronization is achieved by inserting wait instructions and by using the delays provided by the AGU and the counters present in LATAAs. VESYLA resolves all these issues (related to synchronization of statements and dependencies) and generates configware for programming the MAUSEEQAARs; the configware contains the instruction set for configuring REFIs/LATAAs/SILSILAY in their respective modes.

Consider the VESYLA code in Figure 3, which generates an asymmetric FIR filter with N taps. This small piece of code can generate an Nth order filter with M degrees of parallelism. A fully parallel implementation is achieved when M=N, and when M=1 the filter is fully serial, while anything in between is partially parallel.

Figure 3. VESYLA pattern for Asymmetric FIR filters

VESYLA code is structured in two parts: the first part consists of declaration statements and the second part consists of functional statements.

A. Declarative Statements

Statements in the declarative section involve the pragmas and resources to be used.

1) Generics Declaration: Statement 1 declares VESYLA generics. Usage of generics N, M has already been discussed above. Generics r and c specify the row and column indices in the DRRA fabric. All indices of DRRA resources in this pattern are specified relative to r and c. Statement 3 identifies the number of columns to be used in the DRRA architecture.

2) Resource Declaration: These statements allocate and bind operands and operations using pragmas. The “_REFI” and “_LATAA” pragmas identify the location and number of resources to be used in the DRRA fabric. Statements 4 to 8 are resource declaration statements. Statements 4, 5 and 6 use the “_REFI” pragmas to specify that x (the delay line) and c (the coefficients) are to be placed in row r and will occupy columns c to c+M-1; i.e., x and c are distributed across M REFIs, and the distribution is as equitable as possible. Statement 11 specifies the number of LATAAs (mDPUs) required in the DRRA fabric. The generic r and the constant colRange indicate the exact location of the mDPUs in the DRRA fabric. This scheme allows a) serial/parallel trade-offs to be captured in terms of generics like N and M and b) positioning a specific implementation anywhere in the fabric for an arbitrary value of r (and c).
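The placement described in the resource declarations can be illustrated with a small Python sketch. Python is used only for illustration and is not VESYLA syntax; the function name place_slices and the returned tuple layout are hypothetical. The sketch assumes that any remainder of N/M is spread, one element at a time, over the trailing REFIs, which is one possible reading of “as equitable as possible”; the policy actually used by VESYLA is not specified in the text.

```python
def place_slices(n_taps, m_cols, r=0, c=0):
    """Return (row, column, element_count) for each of the M REFIs.

    Hypothetical helper, not part of VESYLA. Assumption: the remainder of
    n_taps / m_cols is spread, one element each, over the trailing REFIs.
    """
    if not 1 <= m_cols <= n_taps:
        raise ValueError("need 1 <= M <= N")
    base, extra = divmod(n_taps, m_cols)
    placement = []
    for i in range(m_cols):
        # The last `extra` columns receive one additional element.
        count = base + (1 if i >= m_cols - extra else 0)
        placement.append((r, c + i, count))  # row r, columns c .. c+M-1
    return placement


if __name__ == "__main__":
    for row, col, count in place_slices(126, 5):
        print(f"REFI[{row}][{col}] holds {count} elements")
```

Changing only M moves the same N elements between a single column (M=1) and one element per column (M=N), and changing r or c shifts the whole placement without touching the per-column counts.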
B. Functional Statements

Functional statements consist of pre-defined functions for performing various operations. Statements 12 to 15 are the functional statements of VESYLA. Statement 12 computes the convolution sum using M MACs, and the values are held in the intermediate variables Lout, which are fed to an adder tree that sums up these values. Statement 15 implements a shift line for shifting the samples in the REFIs when the new sample X0 arrives. vAsymMac is a pre-defined function that performs the asymmetric MAC operations on a vector or slice of max k size created by the pre-defined (r)slice functions. VESYLA creates M parallel threads (using statement 12), each performing max k MACs. Similarly, adderTree is a pre-defined function that corresponds to the 4 input adderTree mode of LATAAs.

Figure 4. Partially Parallel 125 Taps Asymmetric FIR Filter
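The behavior of the functional statements can be modelled in a few lines of Python. This is a behavioral sketch only, not VESYLA code: slice_bounds, adder_tree_4 and fir_step are hypothetical names that loosely mirror the (r)slice functions, the adderTree mode and the shift line plus vAsymMac. Two assumptions are made: the remainder of N/M goes to the trailing slices, and more than four partial sums are reduced by repeated levels of 4 input additions.

```python
def slice_bounds(n_taps, m_threads):
    """Split indices 0..n_taps-1 into m_threads contiguous slices (assumed policy)."""
    if not 1 <= m_threads <= n_taps:
        raise ValueError("need 1 <= M <= N")
    base, extra = divmod(n_taps, m_threads)
    bounds, start = [], 0
    for i in range(m_threads):
        size = base + (1 if i >= m_threads - extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def adder_tree_4(values):
    """Sum a non-empty list level by level, adding at most 4 inputs per node."""
    level = list(values)
    while len(level) > 1:
        level = [sum(level[i:i + 4]) for i in range(0, len(level), 4)]
    return level[0]


def fir_step(delay_line, coeffs, x0, m_threads):
    """Shift in the new sample x0, then compute one output with M MAC threads."""
    shifted = [x0] + list(delay_line[:-1])  # shift line: the oldest sample drops out
    partial_sums = [
        sum(x * h for x, h in zip(shifted[a:b], coeffs[a:b]))  # one MAC thread
        for a, b in slice_bounds(len(coeffs), m_threads)
    ]
    return adder_tree_4(partial_sums), shifted


if __name__ == "__main__":
    line = [0] * 5
    for sample in [1, 0, 0, 0, 0]:
        y, line = fir_step(line, [1, 2, 3, 4, 5], sample, m_threads=2)
        print(y)
```

The output value does not depend on M; only the number of partial sums, and therefore the number of parallel threads and adder tree inputs, changes.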
C. VESYLA Configware Generation

VESYLA generates configware corresponding to specific values of generics like N, M, r and c. This configware is at the register transfer level of abstraction and has the absolute timing and synchronization details that VESYLA synthesizes. Suppose we want to compute the convolution sum of a 126 tap asymmetric FIR filter using only 5 computation threads (a partially parallel implementation) in the DRRA fabric. This can be achieved by using M=5 and N=126, as shown in Figure 4. Additionally, assume r and c to be zero, which implies that columns 0 - 4 are being used. Each REFI stores 25 samples, except the last one, which contains 26 samples. VESYLA will program the micro-coded state machines in row 0 and columns (0-4) to generate the REFI instructions (Read, Write using port A/B in streaming modes) and the LATAA instructions (Asym. MAC mode, Adder Tree) accordingly. VESYLA will similarly deal with the allocation of REFI locations for the coefficients, ensuring that they are aligned with their corresponding samples. Instructions in these MAUSEEQAARs are issued sequentially and are synchronized by VESYLA using the delays provided by the AGU. In each MAUSEEQAAR multiple FSMs are created, which include streaming data to/from REFIs and consuming data for computation in LATAAs, including the Adder Trees.

D. VESYLA Optional Pragmas and Inferences

There are some optional parameters which the programmer can omit, and VESYLA will still generate the correct functionality by inferring these parameters. It can be seen from the FIR filter code that there is no information about the ports of the REFI. VESYLA chooses by itself and assigns the unused ports of that particular REFI. Similarly, the address ranges mentioned in the code are logical addresses, which are resolved to physical addresses by VESYLA. If a user wants to use specific physical addresses, they should inform VESYLA by using some additional syntax in the pragmas.

E. Flexibility offered by VESYLA

The primary target of any HLS tool is to meet the performance target while optimizing under area/power constraints. VESYLA helps with this through a controlled mechanism for HLS, where the user can choose any resource in the DRRA fabric. The user can easily change the allocation parameters in the DRRA fabric using pragmas (positional generics). In the earlier example, coefficients and samples share the same REFIs, but this can be easily changed by making the small alteration shown in Figure 5. Although this change looks very simple, it has a significant effect at the low level: in each SIMD thread the bit stream patterns for the MAUSEEQAARs change, and the interconnect between storage and functional units is affected as well. Manually altering these bit streams at the lower level is a very difficult task (as shown in “Experimental Results”), and VESYLA unburdens the user from these tedious jobs by using this simple mechanism.

Figure 5. Simple Mechanism to Select Resources in DRRA

One can switch between two different architectures without making any significant changes in the code and with full control over the degree of parallelism. This can be achieved by making a few changes in the VESYLA code to generate a different architectural solution for the same algorithm. Consider the code presented in Figure 6, which generates a symmetric FIR filter instead of an asymmetric FIR filter. A few changes are made to the previous VESYLA code, as in a symmetric FIR filter samples are summed together and then multiplied by the coefficients. Similarly, distribution of samples and coefficients over REFI
is different compared to the code for the asymmetric FIR filter. Samples for symmetric FIRs are divided into two halves and are distributed over the same resource, while the coefficients use different REFIs. A 64 tap symmetric FIR filter using M=1 and N=64 will generate the configuration shown in Figure 6.

Figure 6. Symmetric FIR filters in VESYLA

V. EXPERIMENTAL RESULTS

Some of the above mentioned algorithms were implemented in VESYLA and were simulated using the generated configware. Figure 8 shows the partial configware generated for a single MAUSEEQAAR (FIR filter) with its instruction set and the corresponding FSMs being generated. This MAUSEEQAAR represents an instance of a single thread out of multiple SIMD threads, and Figure 9 elaborates only one of the FSMs (instr. num. 5) executed using VESYLA. FSMs related to Read/Write operations are similar to each other. It should also be noted that the manually mapped design would produce the same output as VESYLA.
   VESYLA can also exploit TLP, where multiple execution units perform computations on different data sets with different sets of instructions, just as in MIMD. Consider the code presented in Figure 7, which generates two threads that execute independently of each other, each computing different instructions. Here the binding/allocation pragmas point to the exact row/column in the DRRA fabric. The same code can be used to generate multiple SIMD threads within each MIMD thread by replacing the 0’s and 1’s in the pragmas with generic values, which will sweep the architectural space accordingly. There can be other variants, where multiple MIMD threads communicate with each other, and VESYLA can handle that complexity.

                       Figure 8.           Configware For N-Taps FIR Filter

   VESYLA programs each MAUSEEQAAR involved by executing and synchronizing these FSMs properly. Imagine how complex the job would be if this configware had to be generated manually for each architectural choice with multiple SIMD/MIMD threads. The chance of making mistakes and the time spent debugging the code go up as the complexity of the algorithm increases.
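The idea of SIMD lanes nested inside independent MIMD threads can be pictured with a small software model. The sketch below is plain C and is not VESYLA syntax: the types `cell_t` and `thread_t`, the functions `op_mac`, `op_sum_of_diff` and `run_thread`, and the contiguous-slice split are all assumptions made for illustration. Each thread has its own operation and its own base row/column, and changing its lane count changes only the binding, not the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; none of this is VESYLA syntax. */
typedef struct { int row; int col; } cell_t;

/* One operation shared by all SIMD lanes of a thread. */
typedef long (*lane_op_t)(const int *a, const int *b, size_t len);

/* Multiply-accumulate over one lane's slice. */
long op_mac(const int *a, const int *b, size_t len)
{
    long acc = 0;
    for (size_t i = 0; i < len; i++)
        acc += (long)a[i] * b[i];
    return acc;
}

/* Sum of differences over one lane's slice. */
long op_sum_of_diff(const int *a, const int *b, size_t len)
{
    long acc = 0;
    for (size_t i = 0; i < len; i++)
        acc += (long)a[i] - b[i];
    return acc;
}

/* One MIMD thread: its own operation, base cell and number of SIMD lanes. */
typedef struct {
    cell_t    base;
    size_t    lanes;
    lane_op_t op;
} thread_t;

/* Runs one thread sequentially (a software stand-in for parallel lanes).
 * Lane k is bound to cell (base.row, base.col + k) and processes a
 * contiguous slice; the per-lane partial results are summed. */
long run_thread(const thread_t *t, const int *a, const int *b,
                size_t len, cell_t *bound)
{
    assert(t->lanes >= 1 && t->lanes <= len);
    long total = 0;
    size_t start = 0;
    for (size_t k = 0; k < t->lanes; k++) {
        /* Slice sizes differ by at most one element. */
        size_t chunk = len / t->lanes + (k < len % t->lanes ? 1 : 0);
        bound[k].row = t->base.row;
        bound[k].col = t->base.col + (int)k;
        total += t->op(a + start, b + start, chunk);
        start += chunk;
    }
    return total;
}
```

Two `thread_t` values with different `op` and `base.row` fields stand in for the two MIMD threads, and raising `lanes` from 1 to a larger value mimics replacing a fixed position with a generic one. The model says nothing about timing or synchronization, which is the part VESYLA automates.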

                       Figure 7.           VESYLA MIMD Code and Threads

                       Figure 9.           Single Read FSM

   Due to lack of space we cannot show the complete con-

  • 1. High Level Synthesis Framework For a Coarse Grain Reconfigurable Architecture

Omer Malik, Ahmed Hemani and Muhammad Ali Shami
Dept. of Electronic Systems, School of ICT, Royal Institute of Technology, KTH, Stockholm, Sweden
Email: {omerm, hemani, shami}@kth.se

Abstract: A High Level Synthesis Framework for mapping DSP algorithms on a Coarse Grain Reconfigurable Architecture is presented. The behavioral specification of the algorithm is written in C with pragmas in comments, and the tool generates configware after performing timing and synchronization synthesis. Pragmas identify SIMD type concurrency and sweep the architectural space with allocation and binding annotations to produce implementations from fully serial to fully parallel. This allows the user to stay at the algorithmic level and guide the HLS tool to search a restricted architectural space bounded by the pragmas, thus making the synthesis process more efficient and predictable.

Index Terms: High level synthesis; CGRA; Symbolic Assembler; High Level Language

I. INTRODUCTION

DRRA (Dynamically Reconfigurable Resource Array) is a CGRA (Coarse Grain Reconfigurable Architecture) for implementing DSP applications. DRRA offers DLP (Data Level Parallelism), where a large set of data is processed by the same set of instructions in parallel threads, as in SIMD (Single Instruction Multiple Data). DRRA also allows MIMD (Multiple Instructions Multiple Data), where multiple different SIMD clusters operate in parallel. Modems and codecs represent the physical layer in the ISO (International Organization for Standardization) 7-layer model, and the functions used in these applications are DSP functions characterized by a high degree of regularity. Due to their regular structure, the computation can be divided into threads of data-parallel tasks, which we call a pattern in a DSP function.

VESYLA (VEctorizing SYmbolic Language Assembler) is a semi-automatic framework for implementing DSP functions on DRRA. VESYLA takes an untimed C specification of a DSP function and generates configware for DRRA after performing timing and synchronization synthesis, an activity that is cumbersome, time consuming and error prone. The developer makes and explicitly expresses the critical implementation decisions on how many resources are allocated and how the operators and operands are mapped to the allocated resources. These activities are very much like HLS (High Level Synthesis), but in this case the VESYLA user knows the targeted RTL structure and guides the tool towards it with the allocation and binding pragmas, allocation and binding being well-known concepts from the HLS domain. VESYLA performs the scheduling, synthesizes the control and does the code generation.

The algorithm developer guides VESYLA towards a specific architectural style by using VESYLA pragmas. A small change in these pragmas results in a different architectural implementation, and this change can also manipulate the serial/parallel structure of the implementation; thus the user is always in full control and can capture the architectural space effectively. These properties make VESYLA an interactive design tool that utilizes the human developer’s guidance and yet follows the “push button” methodology to generate the RTL implementation.

The main contributions of this paper are: (a) The design space is easily explorable and various architectural solutions can be implemented with minimum effort. (b) VESYLA hides unnecessary low-level details from the user by letting the user work at a higher abstraction level, which reduces the chance of making mistakes and shortens design time. (c) The developer can easily exploit the available parallelization options using pragmas. (d) A controlled HLS approach that would result in an optimal solution (discussed in the “Related Work” section).

Section II presents the related work, Section III outlines the DRRA fabric, Section IV describes VESYLA, Section V consists of experimental results and Section VI presents the conclusion and future work.

II. RELATED WORK

In this section we review some industry-standard HLS tools, followed by a few schemes for mapping algorithms on CGRAs.

GAUT [5] is an open source HLS tool for DSP applications, which takes input in bit-accurate C format along with some design constraints. The C specification is converted into a DFG (Data Flow Graph) for extracting potential parallelism and data dependencies. Lastly, it generates the RTL after going through the allocation, scheduling and binding tasks.

[3] is an automatic synthesis tool which accepts code described in ANSI C++ and a few synthesis constraints, which are used to explore the design space. The designer can guide the synthesis procedure towards an optimal solution. The tool generates RTL suitable for the targeted hardware.

[4] takes SystemC modules as input and produces optimized RTL for a specified target technology, identified by the user in the form of a .lib file.

[6] is based on a design environment where the designer controls the HLS and can change the synthesis decisions about

978-1-4244-8971-8/10$26.00 © 2010 IEEE
  • 2. scheduling, allocation and binding by using a GUI (Graphical User Interface) at any stage.

DRESC [8] focuses on loop level parallelization for different segments of application code and maps them onto a CGRA by using modulo scheduling algorithms to achieve ILP for optimal performance. [7] is based on mapping the hyperops obtained from a DFG onto the CGRA. Code acceleration is achieved by run-time reconfiguration of the CGRA to accommodate these hyperops. [9] uses the SUIF compiler framework for partitioning the input code w.r.t. the available resources, and Native Mapping Language generates XPP’s PEs, which are automatically placed by their tool.

Our approach is similar to a traditional HLS tool, but in our case the search space is significantly reduced by explicitly identifying architectural elements with the help of pragmas, and automating the synthesis procedure is much easier than in traditional HLS tools, which have an extremely large search space. Another key differentiator is that VESYLA generates distributed control, i.e. it generates multiple FSMs implemented in the micro-coded sequencers, with each thread having its own control and being able to execute independently or to synchronize with other threads if required. Traditional HLS techniques generate a thread as a single control scheme or a single FSM.

Figure 1. The DRRA PHY Layer Fabric Fragment

III. DRRA ARCHITECTURE

This section briefly describes the DRRA architecture. Figure 1 shows an instance of a 7x2 DRRA fabric. Every DRRA cell consists of the following components:

a. LATAAs (Logic And morphable daTA path unit) are 16-bit data path units which consist of: 1) Logic Partitions to deal with logical functions; 2) an Arithmetic Partition to implement commonly used signal processing operations such as MAC, Symmetric FIR, FFT butterfly, Sum of difference, Difference of sum, 4/2 input add/subtract, etc. [2]. The mDPU also has a 17-bit counter and a 13-bit status register. The values of the status registers are read by the micro-coded state machines, which take decisions accordingly.

b. REFIs (REgister FIle) are 64-word, 16-bit register files with 2 read and 2 write ports. A REFI has an AGU (Address Generation Unit) that can generate addresses in vector mode, in circular buffer mode and in bit-reversed mode for FFT. These modes can execute once, a limited number of times or in an infinite loop, using an arbitrary initial delay, a middle delay between each read/write and an end delay before the loop iterates.

c. MAUSEEQAARs (Micro-coded hierarchical sequencing machine) control all the resources in a DRRA cell. A MAUSEEQAAR sends instructions to the AGUs of the register file, selects LATAA modes and configures the interconnects for proper operation. MAUSEEQAARs can send output signals to each other for control communication. They can also receive status bits from the mDPU. The MAUSEEQAAR has configurable interrupt handling capabilities, which allow the user to configure the inputs from the LATAA or inputs from other MAUSEEQAARs as interrupts.

d. SILSILAY (a Seamless, sliding window, circuit switched interconnect fabric) is a regular, non-blocking, point-to-point and point-to-multipoint, low latency interconnection network with sliding window connectivity, which allows arbitrary parallelism among large subsystems. The DRRA interconnect fabric allows every resource to receive input from and send output to any other resource in its own column and 3 columns on each side [1], i.e., every LATAA and REFI is connected to 28 LATAAs and REFIs, including itself.

IV. VESYLA

VESYLA is based on a subset of the C language with some restrictions imposed: for example, the user is not allowed to use the FILE handling features, dynamic data structures or dynamic functions available in C. The operators allowed in VESYLA are also restricted to those that correspond to the instructions offered by the DRRA LATAA. As the instructions of DRRA’s LATAA already correspond to typical DSP operations like MAC, butterfly, etc., and as the objective is to specify the RTL implementation of an algorithm and not the algorithm itself, this restriction is natural and not intrusive; the implementation in any case requires composing the algorithm in terms of DRRA operations.

Information about allocation and binding is provided to VESYLA with the help of pragmas, which are essentially directives that guide VESYLA towards generating a specific RTL architectural style. These pragmas take the form of parameters and constraints; after analyzing them, VESYLA decides how operands and operators are mapped to the DRRA resources involved in the implementation. The topological relationship of the DRRA resources is also part of the mapping specification in the pragmas. With the help of these pragmas, the user specifies a
pattern of a particular implementation. This pattern sweeps the implementation space in terms of degree of parallelism, from fully serial to fully parallel, and number of SIMD/MIMD threads.

These pragmas are categorized into dimensional and positional generics. a) Dimensional generics identify the dimension of the problem and the architecture. For example, an FFT can be characterized in multiple ways by setting its parameters to match the dimension required by a specific context (WLAN would need a 64-point FFT, whereas DVB would need a 4096-point FFT). Similarly, one has to decide the specific micro-architecture for that implementation (radix-2/4/mixed, the number of butterflies, etc.). Lastly, the number of resources to be used should also be taken into consideration. b) Positional generics decide the exact location or range of locations to be used in the DRRA fabric for that particular implementation (these are discussed later in the section).

Figure 2 shows the complete flow of the VESYLA HLS framework. The developer specifies the behavioral specification as an untimed C model and is responsible for defining the allocation and binding constraints. VESYLA analyzes each statement of the code and builds a CDFG (Control Data Flow Graph). VESYLA performs an extensive DDA (Data Dependency Analysis) and RDA (Resource Dependency Analysis) phase using scheduling information implied by the relative ordering of the statements.

Figure 2. VESYLA Framework

VESYLA resolves dependencies while considering the allocation/binding pragmas in the resource/data dependency analysis phase and makes the critical decision of which sections of the algorithm/code can be executed in parallel and which parts of the algorithm have to be sequentialized. After this step, VESYLA schedules the sequences of operations and synchronizes the execution by calculating the exact number of cycles required for each operation. This synchronization is achieved by inserting wait instructions, using the delays provided by the AGU and the counters present in LATAAs. VESYLA resolves all these issues (related to synchronization of statements and dependencies) and generates configware for programming the MAUSEEQAARs; the configware contains the instruction set for configuring REFIs/LATAAs/SILSILAY in their respective modes.

Consider the VESYLA code in Figure 3, which generates an asymmetric FIR filter with N taps. This small piece of code can generate an Nth order filter with M degrees of parallelism. A fully parallel implementation is achieved when M=N; when M=1 the filter is fully serial, and anything in between is partially parallel.

Figure 3. VESYLA pattern for Asymmetric FIR filters

VESYLA code is structured in two parts: the first part consists of declaration statements and the second part consists of functional statements.

A. Declarative Statements

Statements in the declarative section involve the pragmas and resources to be used.

1) Generics Declaration: Statement 1 declares VESYLA generics. The usage of generics N and M has already been discussed above. Generics r and c specify the row and column indices in the DRRA fabric. All indices of DRRA resources in this pattern are specified relative to r and c. Statement 3 identifies the number of columns to be used in the DRRA architecture.

2) Resource Declaration: These statements allocate and bind operands and operations using pragmas. The "_REFI" and "_LATAA" pragmas identify the location and number of resources to be used in the DRRA fabric. Statements 4 to 8 are resource declaration statements. Statements 4, 5 and 6 use the "_REFI" pragmas to specify that x (the delay line) and c (coefficients) are to be placed in row r and will occupy columns c to c+M-1; i.e., x and c are distributed across M REFIs, as equitably as possible. Statement 11 specifies the number of LATAAs (mDPUs) required in the DRRA fabric. Generic r and constant colRange indicate the exact location of the mDPUs in the DRRA fabric. This
scheme allows a) serial/parallel trade-offs to be captured in terms of generics like N and M and b) positioning a specific implementation anywhere in the fabric for an arbitrary value of r (and c).

B. Functional Statements

Functional statements consist of pre-defined functions for performing various operations. Statements 12 to 15 are the functional statements of VESYLA. Statement 12 computes the convolution sum using M MACs, and the values are held in the intermediate variables Lout, which are fed to an adder tree that sums up these values. Statement 15 implements a shift line for shifting the samples in the REFIs when the new sample X0 arrives. vAsymMac is a pre-defined function that performs the asymmetric MAC operations on a vector or slice of max k size created by the pre-defined (r)slice functions. VESYLA creates M parallel threads (using statement 12), each performing max k MACs. Similarly, adderTree is also a pre-defined function that corresponds to the 4-input adderTree mode of LATAAs.

C. VESYLA Configware Generation

VESYLA generates configware corresponding to specific values of generics like N, M, r and c. This configware is at the register transfer level of abstraction and has the absolute timing and synchronization details that VESYLA synthesizes. Suppose we want to compute the convolution sum of a 126-tap asymmetric FIR filter using only 5 computation threads (a partially parallel implementation) in the DRRA fabric. This can be achieved by using M=5 and N=126, as shown in Figure 4. Additionally, assume r and c to be zero, which implies that columns 0 to 4 are being used. Each REFI stores 25 samples except the last one, which contains 26 samples (4 x 25 + 26 = 126). VESYLA will program the micro-coded state machines in row 0, columns 0 to 4, to generate the REFI instructions (Read, Write using port A/B in streaming modes) and the LATAA instructions (Asym. MAC mode, Adder Tree) accordingly. VESYLA similarly deals with the allocation of REFI locations for coefficients, ensuring that they are aligned with their corresponding samples. Instructions in these MAUSEEQAARs are issued sequentially and are synchronized by VESYLA using the delays provided by the AGU. In each MAUSEEQAAR multiple FSMs are created, which include streaming data to/from REFIs and consuming data for computation in LATAAs, including the adder trees.

D. VESYLA Optional Pragmas Inferences

There are some optional parameters which the programmer can omit; VESYLA will still generate the correct functionality by inferring these parameters. It can be seen from the FIR filter code that there is no information about the ports of the REFIs. VESYLA chooses by itself and assigns the unused ports of that particular REFI. Similarly, the address ranges mentioned in the code are logical addresses, which are resolved to physical addresses by VESYLA. If a user wants to use some specific physical addresses, the user should inform VESYLA by using some additional syntax in the pragmas.

Figure 4. Partially Parallel 125 Taps Asymmetric FIR Filter

E. Flexibility offered by VESYLA

The primary target of any HLS tool is to meet the performance requirements while optimizing under area/power constraints. VESYLA supports this through a controlled mechanism for HLS, where the user can choose any resource in the DRRA fabric. The user can easily change the allocation parameters in the DRRA fabric using pragmas (positional generics). In the earlier example, coefficients and samples share the same REFIs, but this can be easily changed by making the small alteration shown in Figure 5. Although this change looks very simple, it has a significant effect at the low level: in each SIMD thread the bit stream pattern for the MAUSEEQAARs changes, and the interconnect mechanism between storage and functional units is affected as well. Manually altering these bit streams at the lower level is a very difficult task (as shown in "Experimental Results"), and VESYLA unburdens the user from these tedious jobs through this simple mechanism.

Figure 5. Simple Mechanism to Select Resources in DRRA

One can switch between two different architectures without making any significant changes in the code and with full control over the degree of parallelism. This is achieved by making a few changes in the VESYLA code to generate a different architectural solution for the same algorithm. Consider the code presented in Figure 6, which generates a symmetric FIR filter instead of an asymmetric FIR filter. A few changes are made to the previous VESYLA code, since in a symmetric FIR filter samples are summed together and then multiplied by the coefficients. Similarly, the distributions of samples and coefficients over REFI
are different compared to the code for the asymmetric FIR filter. Samples for symmetric FIRs are divided into two halves and are distributed over the same resource, while coefficients use different REFIs. A 64-tap symmetric FIR filter using M = 1 and N=64 will generate the configuration shown in Figure 6.

Figure 6. Symmetric FIR filters in VESYLA

VESYLA can also exploit TLP, where multiple execution units perform computations on data sets with different sets of instructions, just as in MIMD. Consider the code presented in Figure 7, which will generate two threads executing independently of each other and computing different instructions. Here the binding/allocation pragmas point to the exact row/column in the DRRA fabric. We can use the same code to generate multiple SIMD threads within each MIMD thread by replacing the 0's and 1's in the pragmas with generic values, which will sweep the architectural space accordingly. There can be other variants, where multiple MIMD threads communicate with each other, and VESYLA can handle the complexity.

Figure 7. VESYLA MIMD Code Threads

V. EXPERIMENTAL RESULTS

Some of the above mentioned algorithms were implemented in VESYLA and were simulated using the generated configware. Figure 8 shows the partial configware generated for a single MAUSEEQAAR (FIR filter) with its instruction set and the corresponding FSMs being generated. This MAUSEEQAAR represents an instance from a single thread of multiple SIMD threads, and Figure 9 elaborates only one of the FSMs (instr. num. 5) executed by using VESYLA. FSMs related to Read/Write operations are similar to each other. It should also be noted that the manually mapped design would produce the same output as VESYLA.

Figure 8. Configware For N-Taps FIR Filter

VESYLA programs each MAUSEEQAAR involved by executing and synchronizing these FSMs properly. Imagine how complex the job would be when manually generating this configware for each architectural choice with multiple SIMD/MIMD threads. The chance of making mistakes and the time for debugging the code go up as the complexity of the algorithm increases.

Figure 9. Single Read FSM

Due to lack of space we cannot show the complete con-