
Multi-Application Multi-Step Mapping Method
for Many-Core Network-on-Chips
Bo Yang∗, Liang Guang∗‡, Thomas Canhao Xu∗‡, Alexander Wei Yin∗‡, Tero Säntti∗†, Juha Plosila∗†
∗ Department of Information Technology, University of Turku, Finland
† Academy of Finland, Research Council for Natural Sciences and Engineering
‡ Turku Center for Computer Science, Turku, Finland

{boyan, liagua, canxu, yinwei, teansa, juplos}@utu.fi

978-1-4244-8971-8/10/$26.00 ©2010 IEEE

Abstract—Massive parallel computing performed on many-core Network-on-Chips (NoCs) is the future of computing. One feasible approach to implementing parallel computing is to deploy multiple applications on the NoC simultaneously. In this paper, we propose a multi-application mapping method that starts with application mapping, which finds a region on the NoC for each application, followed by task mapping, which maps all tasks of the application into that region. In the application mapping step, several strategies based on the maximal empty rectangle (MER) technique are introduced for finding an optimal region for each application. In the task mapping step, a tree-model based algorithm is used to reduce the communication latency and energy consumption. The experimental results show that the proposed method can achieve a considerable reduction of network latency and energy consumption (up to 18%) for a given set of applications.

I. INTRODUCTION

Over the last 40 years, we have witnessed a series of remarkable developments in the computer industry. One of them is the increasing processing capability of systems. This increase comes not only from the performance improvements between generations of uniprocessors, but also from the advent of multi-core and many-core architectures, where tens to hundreds of processors or cores can be integrated on a single chip. Examples of such architectures are [6] and [17]. A recent study at the University of California, Berkeley [1] suggests that it will soon be possible to integrate more than 1000 cores on a single chip, since Moore's Law is still generously doubling the number of available transistors every couple of years. As the number of on-chip cores increases, the communication among them becomes critical to system performance and energy consumption. In the last decade, the NoC has been proposed as an alternative to the traditional bus and point-to-point ad hoc connections in order to address the challenge of increasing concurrent communication requirements as well as the difficulty of global synchronization [4].

Based on NoC platforms, a large body of research addressing the mapping problem has been undertaken in the last couple of years [9] [12] [10] [13]. In [9], Hu et al. presented a branch-and-bound algorithm which maps the tasks of a single application to nodes and generates a suitable deadlock-free routing function such that the total communication energy consumption is minimized under specified performance constraints. Several widely used task mapping algorithms in the literature are analyzed and compared in [12]. Tang et al. proposed a two-step genetic algorithm and the related software for mapping concurrent applications on a fixed NoC architecture [10]. Murali et al. presented a methodology to map multiple use-cases onto the NoC architecture while satisfying the constraints of each use-case [13]. In these works, multiple applications reuse the same platform in different time slots. The main drawback of these systems is the timing overhead incurred by reconfiguring the NoC and loading new applications. Also, since the various communication constraints and traffic characteristics of the applications have to be satisfied using limited processing elements (PEs), the system design is more complicated and the optimal mapping for each application may not be achieved [13].

While the traditional approaches to maximizing the serial performance of processors, namely raising the clock speed and increasing instruction-level parallelism (ILP), have been shown to reach their limits [7] [8], many-core NoC architectures offer a more feasible way to deliver higher performance through parallel computing. Massive parallel computing performed on many-core NoCs is the future of computing [1]. With the increasing number and computational power of on-chip PEs, parallel computing on many-core NoCs can be realized not only at the instruction level, but also at the higher task and/or application levels. To realize this higher-level parallelism and make full use of the abundant resources on the NoC, it is no longer reasonable to focus only on implementing a single application when abundant PEs are available. Instead, the design focus should shift from single-application to multi-application scenarios. More precisely, multiple applications could be deployed on different regions of the NoC and executed in parallel.

In this paper, we propose a novel mapping method whereby multiple applications can be simultaneously mapped on many-core NoCs. The mapping method consists of application mapping and task mapping. The two-step method first finds a region on the NoC for each application and then maps all tasks of the application into that region. Several strategies based on the MER technique are introduced for finding an objective MER for each application. Following the application mapping, a tree-model based algorithm is used to map all tasks of the application into the objective MER. By optimizing the layout of both multiple applications and tasks
within applications, the proposed method aims at achieving lower network latency and energy consumption for multiple applications on many-core NoCs.

II. PROBLEM FORMULATION

A. System Model

The target system, shown in Figure 1, consists of a Real-time Operating System (RTOS) and a NoC platform. The NoC provides the computation and communication resources to implement multiple applications. The RTOS schedules the given set of applications (e.g., $A_1$ to $A_6$ in Figure 1) and manages the resources on the NoC. The mapper runs the proposed mapping algorithm to map each application onto a feasible region, and the loader loads all tasks onto PEs according to the mapping solution. This work deals with on-line scenarios, i.e., the RTOS does not know in advance when each application arrives and how many PEs it needs. In this paper, we focus on the mapping algorithm of the mapper.

Fig. 1: System Model

B. Problem Description

In single-application mapping scenarios, the mapping problem is how to find an appropriate position for each task of the application subject to particular performance or cost metrics. In multi-application scenarios, the problem is extended to searching for the optimal positions of both the applications and the tasks of each individual application. We first give the definitions regarding the target application and NoC architecture used in this paper.

Definition 1: We assume that each application has already been implemented as a set of tasks. The application is modeled by a task graph (TG). A TG is a directed graph $TG = \langle T, C \rangle$, where $T = \{t_1, t_2, \ldots, t_p\}$ represents the set of tasks, corresponding to the set of TG vertices, and $C = \{(t_i, t_j, w_{ij})\}$ denotes the set of communications between tasks, corresponding to the set of TG edges. The edge weight $w_{ij}$ in $(t_i, t_j, w_{ij})$ represents the total amount of data sent from $t_i$ to $t_j$. The number of tasks $p$ in the TG is denoted as the size of the given application.

Definition 2: A NoC is modeled as a communication resource graph (CRG). A CRG is a directed graph $CRG = \langle N, L \rangle$, where $N = \{n_1, n_2, \ldots, n_q\}$ denotes the set of nodes on the NoC, corresponding to the set of CRG vertices, and $L = \{(n_i, n_j, |l_{ij}|)\}$ designates the set of routing paths from node $n_i$ to node $n_j$, corresponding to the edges of the CRG. $|l_{ij}|$ represents the communication length from node $n_i$ to node $n_j$. The number of nodes $q$ in the CRG is denoted as the size of the NoC. For the sake of simplicity, in this paper the NoC is assumed to be a homogeneous 2-D mesh using a deterministic X-Y routing strategy.

Using these definitions, the multi-application mapping problem can be described as follows: given a set of TGs and a CRG, find a mapping area (MA) on the CRG for each TG that can accommodate all tasks of the TG, and find a position within the MA for each task, such that the lowest overall network delay and communication energy consumption are achieved for the given set of TGs.

C. Objective Formulation

Since the network delay is proportional to the communication distance between the source and destination nodes on the NoC, one feasible way to reduce network delay is to shorten the communication distance among tasks as much as possible. This can be achieved in the process of finding the optimal MA for an application. We use the nodes average distance (NAD) mentioned in [10] to evaluate the average communication distance within the MA. NAD is defined as the average distance between two randomly selected nodes in the NoC architecture. For an $X \times Y$ mesh NoC, the NAD is:

$$\mathrm{NAD} = \frac{X+Y}{3} \times \left(1 - \frac{1}{X \times Y}\right) \tag{1}$$

Equation (1) implies that, for a given application, the average communication distance among tasks varies when different areas are used to map its tasks. The more compact the area, the smaller its NAD.

The energy consumption of a communication between tasks $t_i$ and $t_j$ is determined by both the communication weight $w_{ij}$ and the distance $|l_{ij}|$. To reduce the communication energy consumption, minimizing the weighted communication of the application (WCA) has been shown to be effective [18]. The WCA is defined as the sum of the products of $w_{ij}$ and $|l_{ij}|$ over all communications in an application:

$$\mathrm{WCA} = \sum_{\forall i,j} w_{ij} \times |l_{ij}| \tag{2}$$

Based on these formulations, the objectives of the proposed method become seeking the most compact mapping area MA, i.e., the one with the smallest NAD, and the optimized task mapping solution, i.e., the one with minimized WCA.

III. MULTI-APPLICATION MULTI-STEP MAPPING

To reach the two goals mentioned in the previous section, we propose a two-step multi-application mapping method. The mapping consists of two sequential phases: application mapping (AM) and task mapping (TM). AM deals with the mapping of multiple applications; its purpose is to optimize the layout of the multiple applications mapped on the NoC and to find the optimal MA with the minimal NAD for each application. TM works after AM to perform the task mapping of an individual application and achieve the minimized WCA.

A. Application Mapping (AM)

On a 2-D mesh NoC, any sub-mesh or rectangle can be regarded as a compact area. Thus, the AM problem is turned into the problem of managing the rectangles on the NoC. To do this, AM adopts the concept of maximal
empty rectangle (MER), which was originally used to solve the placement problem in FPGA design [2].

1) MER Technique: A MER is an empty rectangle that is not contained in any other empty rectangle. In our case, a MER represents a cluster of free nodes on the NoC that is used to map an application. Figure 2 shows an example of application mapping using the MER technique. At first, the whole surface of the NoC is represented by one MER $R_0$ (Figure 2a). After application $A_1$ is mapped, $R_0$ is split into $R_1$ and $R_2$ (Figure 2b). In Figure 2c, $R_1$ is further fragmented into $R_3$ and $R_4$ after application $A_2$ has been mapped. The MERs $R_2$, $R_3$ and $R_4$ can be used for future application mapping. Let $w(R)$ and $h(R)$ be the width and height of the MER $R$; the normalized aspect ratio $A(R)$ of $R$ is defined as:

$$A(R) = \frac{\max\{w(R), h(R)\}}{\min\{w(R), h(R)\}} \tag{3}$$

The aspect ratio $A(R)$ describes the shape of the MER. If it equals 1, the MER is a square; otherwise, it is a non-square rectangle.

Fig. 2: Application Mapping Using MER ((a), (b), (c))

2) Objective MER Selection: For a given application of size $p$, AM tries to find an optimal or near-optimal objective MER $R_m$ to map the application. Depending on the state of the MERs on the NoC, AM may face one of three cases: (1) the total number of PEs in all MERs is not adequate to accommodate the given application; (2) there is at least one candidate MER that can accommodate the given application; (3) the total number of PEs in all MERs is adequate to accommodate the given application, but none of the MERs can fit the application alone.

In the first case, the mapping request is rejected for now and the RTOS can retry the mapping later. For the second case, we propose the following strategies for finding the objective MER.

• Best Size (BS): BS chooses the candidate MER with the smallest size as the objective MER $R_m$. Intuitively, this strategy tries to keep the big rectangles for future application mapping.
• Best Shape (BSh): It is noteworthy from Equation (1) that, for a fixed number of nodes $X \times Y$, the factor $1 - 1/(X \times Y)$ is constant and $X + Y$ is smallest when $X$ and $Y$ are equal, so a square area has the minimal NAD among all areas of that size. Taking this into consideration, the BSh strategy chooses the candidate MER with the minimal $A(R)$ as the objective MER $R_m$. The reasoning behind BSh is that in such a MER, the application is more likely to be mapped onto an area close to a square, so that a smaller NAD can be achieved.
• Best Size Best Shape (BSBSh): BSBSh is extended from BS. If there are several candidate MERs with the same smallest size, the one with the minimal $A(R)$ is selected.
• Best Shape Best Size (BShBS): Similar to the previous one, among all candidate MERs with the same minimal $A(R)$, the one with the smallest size is selected.

Whenever an objective MER $R_m$ is selected, AM chooses a mapping area MA with minimal $A(MA)$ in $R_m$ to map the given application. In this paper, we use the corner of the objective MER $R_m$ that is closest to any corner of the NoC as the starting point for creating the MA. The reasons are to reduce fragmentation along the borders of the NoC and to reduce congestion in the middle area of the NoC by leaving free MERs there. The created area MA is returned as an input for the TM phase.

To deal with the third case, the LS+C strategy is applied.

• Largest Size + Combining (LS+C): In this case, the application has to be mapped on separate MERs. To avoid increasing the communication cost between more distant MERs of small size, LS+C chooses the free MER with the largest number of PEs as the primary area and then combines the nearest free MERs to obtain adequate PEs for the application. The combined mapping area MA is returned as an input for the TM phase.

3) MER Merging: When the execution of an application completes, the area occupied by the application can be released and merged with neighboring free MERs to obtain larger MERs for future mappings.

Combining these techniques and strategies, the AM algorithm is described in Algorithm 1.

Algorithm 1: Multi-Application Mapping
Input: TGs: a set of applications; CRG: a 2-D mesh of size $W \times H$
Output: The mapping areas for the applications in TGs
1 Initialize the MER list with the single MER $R_0$ of size $W \times H$.
2 if the free PEs on the NoC cannot accommodate the arriving application $A_i$ then
3   Reject the mapping request.
4 else if at least one MER can accommodate $A_i$ then
5   Use the chosen strategy to select one objective MER and create the mapping area MA.
6 else
7   Use the LS+C strategy to find a mapping area MA.
8 if application $A_j$ has completed then
9   Merge the area occupied by $A_j$ with neighboring free MERs.
10 Repeat 2–9 until an MA is found for each application.

Figure 3 is an example of application mapping using Algorithm 1. Four applications with sizes 25, 16, 16 and 9, used in the experiment in Section IV and denoted as FFT(25), X264(16), TPCH(16) and FFT(9) respectively, are mapped sequentially on a NoC of size 10 × 7. Figures 3a and 3b show the final mapping under the BS and BSh strategies, respectively. The main difference between these two mapping results is the transposed locations of applications X264(16) and TPCH(16). Under both
strategies, the LS+C strategy is used for application FFT(9).

Fig. 3: Application Mapping Using BS and BSh ((a) BS Mapping, (b) BSh Mapping)

Fig. 4: Application Mapping Using WNAD ((a) BSh Mapping for TPCH(16), (b) WNAD Mapping)

B. Weighted NAD

In Algorithm 1, the MERs which cannot accommodate the given application are never selected as the objective MER $R_m$ as long as candidate MERs exist, even though some of them have a smaller $A(R)$ than the selected $R_m$ and can accommodate most tasks of the application. Figure 4a is an example of application mapping under the BSh strategy. After applications FFT(25) and X264(16) have been mapped, the candidate MER $R_1$ is selected (shown in Figure 3b), although the non-candidate MER $R_2$ has a better shape and a close size (15) for application TPCH(16). This is because the combination of several separate MERs is likely to induce a higher NAD and WCA than a monolithic MER. However, if the task mapping algorithm presented in the following section is taken into account, it is reasonable to accept a non-candidate MER as the objective MER if most of the tasks can be mapped onto it. Since the task mapping always chooses the task that affects the WCA most and maps it before the other tasks, the last selected tasks have limited impact on the overall WCA even if they are mapped onto separate MERs. Therefore, we propose another strategy for objective MER selection, termed weighted NAD (WNAD). The WNAD of a MER is defined as follows:

$$\mathrm{WNAD} = \frac{N_{tasks}}{N_{nodes}} \times \mathrm{NAD} \tag{4}$$

where the first factor is the weighted ratio, $N_{tasks}$ is the number of tasks in the application, and $N_{nodes}$ is the number of nodes occupied by the tasks if the application is mapped on the MER. For a candidate MER, the weighted ratio equals 1 and the WNAD strategy is equivalent to the BSh strategy. A MER with a lower WNAD can accommodate more tasks with a smaller NAD. Using the WNAD strategy, both the candidate and the non-candidate MERs considered by the previous strategies can be evaluated together to find the objective MER. Figure 4b is an example of using the WNAD strategy to map the same set of applications as in Figure 3.

The major responsibility of the AM algorithm is to manage the MER list. As mentioned in [2], the algorithm for managing MERs is $O(n^2)$ for $n$ mapped applications.

C. Task Mapping (TM)

After the mapping area MA for a given application has been obtained in the AM phase, the role of TM is to map the tasks of the application with the purpose of minimizing the WCA of the application. To address the task mapping, we propose a tree-model based mapping algorithm. The mapping algorithm consists of two parts: the abstraction of a mapping area MA into an extended tree structure, and the mapping of an application onto the extended tree. Figure 5 is an example of mapping the tasks of application $A_2$ (shown in Figure 2c) onto the selected MA.

Fig. 5: Tree-Model Based Task Mapping [panels: Mapping Area (MA) of $A_2$, Task Graph (TG) with tasks T1 to T7 and weighted edges, Abstracted Tree; Step 1: abstraction; Steps 2 to 7: task mapping; legend: mapped node, spare node]

1) Tree Model of MA: The abstraction of an MA into an extended tree structure follows Algorithm 2. Simply put, the center point of the MA, which has the shortest average distance to the other nodes in the MA, is chosen as the root node of the tree. The neighbors of the center point are added as the children of the root node. The procedure continues until all nodes in the MA are on the tree (bottom right of Figure 5). The structure is called an extended tree since some children may have more than one parent node. This extended tree structure places the network nodes with a shorter average distance (to other nodes) onto higher-level tree nodes. Intuitively, the tasks of the application with a large communication volume should be placed as high on the tree as possible, in order to minimize the total communication cost, which is proportional to the average communication distance.

Algorithm 2: Tree Abstraction Algorithm
Input: mapping area MA
Output: An extended tree abstraction
1 Select the center network node as the root node of the tree;
2 Traverse the NoC from the center node, recording all its neighbors as child nodes;
3 Repeat 2 for each child node until all nodes are in the tree.

2) TM Algorithm: The mapping of an application onto the tree follows Algorithm 3. We calculate the communication volume (CV, Definition 3) of each task in the task graph, and place the task with the largest communication volume
onto the root node of the tree. Then we calculate the weighted communication volume between each remaining task and the tasks already mapped onto the tree, termed the affinity to partial tree (APT, Definition 4), and place the task with the largest APT onto the highest available node in the tree. This procedure iterates until all tasks have been mapped onto the tree.

Definition 3: Let $t_i$ be a task in the task graph TG, and let $w_{ij}$ be the communication volume from $t_i$ to $t_j$; then

$$CV_{t_i} = \sum_{\forall t_j \in T} (w_{ij} + w_{ji})$$

$CV_{t_i}$ is the communication volume of $t_i$.

Definition 4: Let $T_m$ be the set of tasks already mapped onto the tree, and let $t_i$ be a task not yet mapped; then

$$APT_{t_i} = \sum_{\forall t_j \in T_m} (w_{ij} + w_{ji})$$

$APT_{t_i}$ is the affinity of $t_i$ to the partial (mapped) tree.

Algorithm 3: Task Mapping Algorithm
Input: TG, abstracted tree of MA
Output: Task mapping on the tree
1 Calculate CV for all tasks, and map the task with the largest CV onto the root node;
2 Calculate APT for all unmapped tasks, and map the task with the largest APT onto the highest-level tree node available;
3 Repeat 2 until all tasks have been mapped.

The tree-model based mapping has low complexity and high efficiency. For instance, compared to the greedy incremental (GI) algorithm presented in [12], the tree-based mapping has an algorithmic complexity of $O(N)$, where $N$ is the number of tasks in the TG, while the GI algorithm has a complexity of $O(N^2)$. By mapping tasks starting from the root of the tree, the algorithm aims to minimize the WCA using the APT method and consequently reduces the energy consumption and network delay.

IV. EXPERIMENT

A. Experiment Setup

Full-system simulations were performed to evaluate the proposed method under different mapping strategies. Since the comparison in [12] shows that the GI algorithm achieves good results compared with several other algorithms, the GI algorithm was chosen as a reference to evaluate the tree-model based algorithm used in task mapping. The tree-model based and GI algorithms were combined with the BS, BSh and WNAD application mapping strategies to compare the performance of these strategies. Four benchmark applications were selected. Three of them are from the SPLASH-2 [15] and PARSEC [5] suites: FFT with 25 and 9 cores (FFT(25) and FFT(9), respectively) and X264 with 16 cores (X264(16)). The fourth, TPC-H with 16 cores (TPCH(16)), is an ad hoc decision support benchmark from TPC [16]. The mapping was conducted in the order FFT(25), X264(16), TPCH(16) and FFT(9).

A cycle-accurate NoC simulator, Noxim [14], was extended and used to simulate the four applications' traffic on a 10 × 7 NoC and to produce the network delay and communication energy consumption under the different mapping strategies. The workload traces of these four applications were gathered from Simics [11], where the NoC was configured to model a chip multiprocessor (CMP). Each PE has a core, a private L1 cache and a shared L2 cache bank. Memory controllers are connected to the top and bottom sides of the chip. The static non-uniform cache architecture (NUCA) [3] is implemented in our memory/cache architecture, in which data are mapped to cache banks statically.

The average communication distance (ACD), WCA, average network latency (ANL) and energy consumption (EC) under the different strategies were compared. The ACD is the average communication distance among all tasks when the application is mapped on the selected objective MER. The ANL is the average number of cycles needed to transfer one packet on the NoC.

B. Results Analysis

Fig. 6: ACD Using Different Strategies [y-axis: Average Communication Distance (ACD, hops), 0 to 4; x-axis: FFT(25), X264(16), TPCH(16), FFT(9); series: BS+GI, BS+Tree, BSh+GI, BSh+Tree, WNAD+GI, WNAD+Tree]

Fig. 7: WCA Using Different Strategies [y-axis: normalized Weighted Communication of Application (WCA), 0% to 100%; x-axis: FFT(25), X264(16), TPCH(16), FFT(9); series: BS+GI, BS+Tree, BSh+GI, BSh+Tree, WNAD+GI, WNAD+Tree]

Figure 6 shows the ACD for each application under the different strategies. BS+GI and BS+Tree denote the cases where BS is used for application mapping and, respectively, the GI or the tree-model based algorithm is used for task mapping, and so forth. The differing ACDs of X264, TPCH and FFT(9) under the different strategies show the impact of the objective MER on the ACD. For each of them, the optimal ACD is achieved when it is mapped on an objective MER with minimal aspect ratio $A(R)$, or intuitively, a rectangle close to a square. Non-optimized mappings result in the highest ACD for X264 (3.14, 25% higher than the optimal 2.49), TPCH (3.14, 25% higher than the optimal 2.49) and FFT(9) (2.49, 23% higher than the optimal 2.03). For most applications, WNAD obtains the same or a better solution than the BS and BSh strategies. The only exception is TPCH, where a primary MER combined with another MER is selected for the mapping under the WNAD strategy, instead of a monolithic MER under the BS and BSh strategies. This confirms the negative impact
of separate MERs on the ACD. Also note that the task mapping algorithm has a negligible impact on the ACD.

The normalized WCA of each application under the different strategies is displayed in Figure 7. The impact of application mapping on the WCA is consistent with its impact on the ACD (shown in Figure 6). However, it is noteworthy that task mapping has a great impact on the WCA. In all cases, the tree-model based algorithm outperforms the GI algorithm. For example, under the WNAD strategy, the tree-model based algorithm achieves 35%, 17%, 16% and 32% lower WCA than the GI algorithm for each application. Furthermore, WNAD+Tree yields the lowest WCA for each application among all strategies.

Fig. 8: ANL Using Different Strategies [y-axis: normalized Average Network Latency (ANL), 0% to 100%; x-axis: BS, BSh, WNAD; series: GI, Tree-Model Based]

Fig. 9: EC Using Different Strategies [y-axis: normalized Energy Consumption (EC), 0% to 100%; x-axis: BS, BSh, WNAD; series: GI, Tree-Model Based]

The normalized simulation results for the ANL and the EC are shown in Figures 8 and 9. As anticipated by the WCA in Figure 7, the tree-model based algorithm achieves lower ANL and EC than the GI algorithm. The ANL of the tree-model based algorithm is 12%, 15% and 13% lower than that of GI under the BS, BSh and WNAD strategies, respectively. Similar reductions are achieved for the EC. Furthermore, the WNAD strategy outperforms BS and BSh and achieves the lowest ANL and EC (about 5% lower on average). For this set of applications, the difference between BS and BSh is negligible with respect to the ANL and EC. The lowest ANL and EC are achieved by WNAD+Tree, 18% lower than the worst case, BSh+GI.

V. CONCLUSION

An innovative method for mapping multiple applications on future many-core NoCs is proposed. The two-step mapping method first finds a region on the NoC for a given application and then maps all tasks of the application into that region. Several strategies based on the MER technique, e.g., BS, BSh and WNAD, are introduced for the application mapping. Using these strategies, the algorithm can efficiently find the optimal objective MER to map the target application. Following the application mapping, a tree-model based algorithm is proposed for the task mapping and compared against an existing GI algorithm. The experiments show that, in a common case, the MER with the minimal aspect ratio is ideal for mapping a given application. Among the proposed strategies for application mapping, WNAD is likely to obtain a better solution than BS and BSh. For the task mapping, the proposed tree-model based algorithm outperforms the GI algorithm in achieving lower network latency and energy consumption. The WNAD+Tree strategy achieves the lowest network latency and energy consumption among all strategies.

VI. ACKNOWLEDGEMENT

The authors would like to thank the Academy of Finland for the financial support for this work.

REFERENCES

[1] Krste Asanovic, Ras Bodik, Bryan C. Catanzaro, Joseph J. Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William L. Plishker, John Shalf, Samuel W. Williams, and Katherine A. Yelick. The landscape of parallel computing research: a view from Berkeley. Technical Report UCB/EECS-2006-183, December 2006.
[2] K. Bazargan, R. Kastner, and M. Sarrafzadeh. Fast template placement for reconfigurable computing systems. IEEE Design & Test of Computers, 17(1):68–83, Jan–Mar 2000.
[3] Bradford M. Beckmann and David A. Wood. Managing wire delay in large chip-multiprocessor caches. In Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture, pages 319–330, December 2004.
[4] L. Benini and G. De Micheli. Networks on chips: a new SoC paradigm. Computer, 35(1):70–78, January 2002.
[5] Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. The PARSEC benchmark suite: characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pages 72–81, October 2008.
[6] M. Denneau and H. S. Warren, Jr. 64-bit Cyclops: Principles of operation. IBM Technical Report, 2005.
[7] P. P. Gelsinger. Microprocessors for the new millennium: Challenges, opportunities, and new frontiers. In Proceedings of the International Solid-State Circuits Conference (ISSCC), pages 22–25, 2001.
[8] J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach, 4th Edition. Morgan Kaufmann, 2007.
[9] Jingcao Hu and Radu Marculescu. Energy- and performance-aware mapping for regular NoC architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(4):551–562, 2005.
[10] Tang Lei and Shashi Kumar. A two-step genetic algorithm for mapping task graphs to a network on chip architecture. In DSD '03: Proceedings of the Euromicro Symposium on Digital Systems Design, page 180, Washington, DC, USA, 2003. IEEE Computer Society.
[11] P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. Computer, 35(2):50–58, February 2002.
[12] C. A. M. Marcon, E. I. Moreno, N. L. V. Calazans, and F. G. Moraes. Evaluation of algorithms for low energy mapping onto NoCs. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS 2007), pages 389–392, 2007.
[13] Srinivasan Murali, Martijn Coenen, Andrei Radulescu, Kees Goossens, and Giovanni De Micheli. Mapping and configuration methods for multi-use-case networks on chips. In ASP-DAC '06: Proceedings of the 2006 Asia and South Pacific Design Automation Conference, pages 146–151, Piscataway, NJ, USA, 2006. IEEE Press.
[14] University of Catania. Noxim. http://www.noxim.org/.
[15] Jaswinder Pal Singh, Anoop Gupta, Moriyoshi Ohara, Evan Torrie, and Steven Cameron Woo. The SPLASH-2 programs: Characterization and methodological considerations. In International Symposium on Computer Architecture (ISCA), page 24, 1995.
[16] TPC. TPC-H. http://www.tpc.org/tpch/.
[17] S. R. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, A. Singh, T. Jacob, S. Jain, V. Erraguntla, C. Roberts, Y. Hoskote, N. Borkar, and S. Borkar. An 80-tile sub-100-W teraflops processor in 65-nm CMOS. IEEE Journal of Solid-State Circuits, 43(1):29–41, 2008.
[18] Bo Yang, Thomas Canhao Xu, Tero Säntti, and Juha Plosila. Tree-model based mapping for energy-efficient and low-latency network-on-chip. In Design and Diagnostics of Electronic Circuits and Systems (DDECS), pages 189–192, 2010.

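Equation (1) can be checked numerically with the short Python sketch below. It assumes that the two nodes are drawn independently and uniformly, so a node may be paired with itself, and that the hop count under deterministic X-Y routing equals the Manhattan distance; under these assumptions the brute-force average coincides with the closed form. The function names are illustrative only.

```python
from itertools import product


def nad_closed_form(x: int, y: int) -> float:
    """Nodes average distance of an x-by-y mesh, Equation (1)."""
    return (x + y) / 3 * (1 - 1 / (x * y))


def nad_brute_force(x: int, y: int) -> float:
    """Mean Manhattan distance over all ordered node pairs, self-pairs included."""
    nodes = list(product(range(x), range(y)))
    total = sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a in nodes for b in nodes)
    return total / len(nodes) ** 2


if __name__ == "__main__":
    # Three 16-node shapes plus the 10 x 7 NoC used in the experiments.
    for x, y in [(4, 4), (8, 2), (16, 1), (10, 7)]:
        print(f"{x} x {y}: Eq. (1) = {nad_closed_form(x, y):.4f}, "
              f"brute force = {nad_brute_force(x, y):.4f}")
```

For a fixed budget of 16 nodes, the square 4 × 4 area gives NAD = 2.5, compared with 3.125 for 8 × 2 and 5.3125 for 16 × 1, which is the property the BSh strategy exploits.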

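A minimal Python sketch of Equation (2) follows. It assumes $|l_{ij}|$ is the X-Y routing hop count between the nodes hosting $t_i$ and $t_j$; the helper names and the three-task graph are hypothetical and serve only to show that shortening heavy edges lowers the WCA.

```python
from typing import Dict, Tuple

Node = Tuple[int, int]   # (x, y) position of a PE in the mesh
Edge = Tuple[str, str]   # (source task t_i, destination task t_j)


def hop_distance(a: Node, b: Node) -> int:
    """Hops on a deterministic X-Y route: the Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def wca(weights: Dict[Edge, float], placement: Dict[str, Node]) -> float:
    """Equation (2): sum of w_ij * |l_ij| over every communication (t_i, t_j, w_ij)."""
    return sum(w * hop_distance(placement[src], placement[dst])
               for (src, dst), w in weights.items())


if __name__ == "__main__":
    # Hypothetical task graph: data volume sent from the first task to the second.
    weights = {("T1", "T2"): 80, ("T2", "T3"): 10, ("T1", "T3"): 30}
    compact = {"T1": (0, 0), "T2": (1, 0), "T3": (1, 1)}
    stretched = {"T1": (0, 0), "T2": (1, 1), "T3": (1, 0)}
    print(wca(weights, compact), wca(weights, stretched))
```

The compact placement gives a WCA of 150 and the stretched one 200, because the heaviest edge (T1 to T2, weight 80) spans one hop in the first case and two in the second.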

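The MER concept from Section III-A can be illustrated by brute force. The sketch below is not the rectangle-management algorithm of [2]; it enumerates every empty rectangle on a small occupancy grid and keeps those that cannot absorb an adjacent free row or column, which is equivalent to not being contained in any other empty rectangle. The grid encoding (a set of busy (x, y) nodes), the function names and the example occupancy are assumptions.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (x, y, w, h): lower-left node, width, height


def is_free(busy: Set[Node], x: int, y: int, w: int, h: int) -> bool:
    """True if no node in the w-by-h block starting at (x, y) is occupied."""
    return all((i, j) not in busy for i in range(x, x + w) for j in range(y, y + h))


def maximal_empty_rects(busy: Set[Node], width: int, height: int) -> List[Rect]:
    """Brute-force MER list for a width-by-height mesh with occupied nodes `busy`."""
    mers = []
    for x in range(width):
        for y in range(height):
            for w in range(1, width - x + 1):
                for h in range(1, height - y + 1):
                    if not is_free(busy, x, y, w, h):
                        continue
                    # An empty rectangle is maximal iff it cannot absorb the
                    # adjacent column or row on any of its four sides.
                    grows = ((x > 0 and is_free(busy, x - 1, y, 1, h))
                             or (x + w < width and is_free(busy, x + w, y, 1, h))
                             or (y > 0 and is_free(busy, x, y - 1, w, 1))
                             or (y + h < height and is_free(busy, x, y + h, w, 1)))
                    if not grows:
                        mers.append((x, y, w, h))
    return mers


if __name__ == "__main__":
    print(maximal_empty_rects(set(), 10, 7))        # the whole NoC is one MER
    a1 = {(0, 0), (1, 0), (0, 1), (1, 1)}           # hypothetical application A1
    print(maximal_empty_rects(a1, 4, 3))
```

With the lower-left 2 × 2 block of a 4 × 3 mesh occupied, the free space splits into the 4 × 1 top row and the 2 × 3 right-hand block, analogous to the split of $R_0$ into two MERs in Figure 2b.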

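Equation (3) and the four candidate-selection rules of Section III-A (BS, BSh, BSBSh and BShBS) can be written as sort keys over the candidate MERs, as in this Python sketch. A candidate is taken to be any MER with at least p nodes, ties the rules leave open are broken by list order, and the `Rect` type, the `STRATEGIES` table and the sample rectangles are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    """A free rectangle of mesh nodes: lower-left node (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int

    @property
    def size(self) -> int:
        return self.w * self.h

    @property
    def aspect(self) -> Fraction:
        """Equation (3): max(w, h) / min(w, h); exactly 1 for a square."""
        return Fraction(max(self.w, self.h), min(self.w, self.h))


# Sort keys for the selection rules; min() keeps the first of several equal keys.
STRATEGIES = {
    "BS": lambda r: r.size,
    "BSh": lambda r: r.aspect,
    "BSBSh": lambda r: (r.size, r.aspect),
    "BShBS": lambda r: (r.aspect, r.size),
}


def select_objective_mer(mers: List[Rect], p: int, strategy: str) -> Optional[Rect]:
    """Return R_m among the MERs with at least p nodes, or None if no single MER fits."""
    candidates = [r for r in mers if r.size >= p]
    if not candidates:
        return None
    return min(candidates, key=STRATEGIES[strategy])


if __name__ == "__main__":
    mers = [Rect(0, 5, 8, 2), Rect(0, 0, 5, 5), Rect(5, 0, 5, 4),
            Rect(6, 3, 4, 4), Rect(7, 0, 3, 6)]
    for name in STRATEGIES:
        print(name, select_objective_mer(mers, 16, name))
```

For p = 16, BS returns the 8 × 2 rectangle (the first 16-node MER), BSh returns the 5 × 5 square (the first with A(R) = 1), and both BSBSh and BShBS return the 4 × 4 square.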

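The three cases handled by Algorithm 1 can be sketched as a single dispatch function. This is a simplified, hypothetical version: MER merging (steps 8 and 9) is left out, free PEs are counted over the union of the MERs because MERs may overlap, case 2 returns the objective MER itself rather than carving the MA out of it, the selection strategy is passed in as a function, and the "nearest" MER in LS+C is measured by the Manhattan distance between rectangle centres, a metric the text does not specify.

```python
from typing import Callable, List, Set, Tuple, Union

Node = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (x, y, w, h): lower-left node, width, height
Result = Tuple[str, Union[None, Rect, List[Rect]]]


def nodes_of(r: Rect) -> Set[Node]:
    x, y, w, h = r
    return {(i, j) for i in range(x, x + w) for j in range(y, y + h)}


def centre_gap(a: Rect, b: Rect) -> float:
    """Manhattan distance between rectangle centres (hypothetical 'nearest' metric)."""
    return (abs((a[0] + a[2] / 2) - (b[0] + b[2] / 2))
            + abs((a[1] + a[3] / 2) - (b[1] + b[3] / 2)))


def best_size(candidates: List[Rect]) -> Rect:
    """Example selection rule (BS): the candidate MER with the fewest nodes."""
    return min(candidates, key=lambda r: r[2] * r[3])


def map_application(mers: List[Rect], p: int,
                    select: Callable[[List[Rect]], Rect]) -> Result:
    """Dispatch one arriving application of size p over the current MER list."""
    free = set().union(*(nodes_of(r) for r in mers))  # MERs may overlap
    if len(free) < p:                                 # case 1: reject, retry later
        return "reject", None
    candidates = [r for r in mers if r[2] * r[3] >= p]
    if candidates:                                    # case 2: one MER is enough
        return "single", select(candidates)
    # Case 3 (LS+C): the largest MER first, then the nearest MERs until p PEs fit.
    primary = max(mers, key=lambda r: r[2] * r[3])
    chosen, covered = [primary], nodes_of(primary)
    for r in sorted((m for m in mers if m != primary),
                    key=lambda m: centre_gap(primary, m)):
        if len(covered) >= p:
            break
        chosen.append(r)
        covered |= nodes_of(r)
    return "combine", chosen


if __name__ == "__main__":
    mers = [(0, 0, 3, 2), (5, 0, 2, 2), (0, 5, 1, 1)]
    for p in (4, 9, 12):
        print(p, map_application(mers, p, best_size))
```

For the three sample rectangles, a 4-task application fits the 2 × 2 MER under BS, a 9-task application triggers LS+C and combines the 3 × 2 and 2 × 2 MERs, and a 12-task application is rejected because only 11 PEs are free.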

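Equation (4) does not state over which area the NAD term is measured. The Python sketch below assumes it is the MER's own $w \times h$ area and that $N_{nodes} = \min(N_{tasks}, w \times h)$. Because a very small MER would otherwise score a near-zero NAD, it also ignores MERs holding fewer than a hypothetical fraction `min_coverage` of the tasks, following the text's restriction to MERs that can accommodate most tasks. The 5 × 3 shape for a 15-node MER is illustrative, not taken from Figure 4.

```python
from typing import List, Tuple


def nad(x: int, y: int) -> float:
    """Equation (1) for an x-by-y area."""
    return (x + y) / 3 * (1 - 1 / (x * y))


def wnad(n_tasks: int, w: int, h: int) -> float:
    """Equation (4), assuming N_nodes = min(N_tasks, w*h) and NAD over the w-by-h MER."""
    n_nodes = min(n_tasks, w * h)
    return n_tasks / n_nodes * nad(w, h)


def select_by_wnad(n_tasks: int, mers: List[Tuple[int, int]],
                   min_coverage: float = 0.5) -> Tuple[int, int]:
    """Lowest-WNAD (w, h) among MERs holding at least min_coverage of the tasks.

    min_coverage is a hypothetical threshold, not a value from the text;
    raises ValueError if no MER qualifies.
    """
    eligible = [(w, h) for w, h in mers if w * h >= min_coverage * n_tasks]
    return min(eligible, key=lambda wh: wnad(n_tasks, *wh))


if __name__ == "__main__":
    # Sixteen tasks: 8 x 2 is a candidate MER, 5 x 3 (15 nodes) is not.
    for w, h in [(8, 2), (5, 3), (4, 4)]:
        print(f"{w} x {h}: WNAD = {wnad(16, w, h):.3f}")
    print(select_by_wnad(16, [(8, 2), (5, 3), (1, 1)]))
```

For 16 tasks, the candidate 8 × 2 MER scores 3.125 while the non-candidate 5 × 3 MER scores about 2.655, so WNAD prefers the latter, as in the TPCH(16) discussion; a free 4 × 4 MER (2.5) would beat both.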

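Algorithm 2 can be sketched as a breadth-first traversal from the center node. In this Python version, the center is the node with the smallest total Manhattan distance to the rest of the MA (ties go to the smallest coordinate), the MA is assumed to be a connected set of mesh nodes, and every neighbor on the previous level is recorded as a parent, which produces the extended tree in which a node may have several parents. Names and data layout are assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]


def neighbours(n: Node) -> List[Node]:
    x, y = n
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def centre_node(area: Set[Node]) -> Node:
    """Node with the smallest total Manhattan distance to the other nodes of the MA."""
    return min(sorted(area),
               key=lambda n: sum(abs(n[0] - m[0]) + abs(n[1] - m[1]) for m in area))


def extended_tree(area: Set[Node]) -> Tuple[List[List[Node]], Dict[Node, List[Node]]]:
    """Breadth-first levels from the centre node, plus every parent of each node."""
    root = centre_node(area)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in neighbours(node):
            if nb in area and nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    if len(depth) != len(area):
        raise ValueError("this sketch assumes a connected mapping area")
    levels: List[List[Node]] = [[] for _ in range(max(depth.values()) + 1)]
    for node in sorted(depth):
        levels[depth[node]].append(node)
    # A node may have several neighbours one level up: the "extended" tree.
    parents = {n: [p for p in neighbours(n) if depth.get(p) == depth[n] - 1]
               for n in depth}
    return levels, parents


if __name__ == "__main__":
    ma = {(x, y) for x in range(4) for y in range(4)}   # a 4 x 4 mapping area
    levels, parents = extended_tree(ma)
    print([len(level) for level in levels])
    print(parents[(2, 2)])
```

On a 4 × 4 area rooted at (1, 1), the levels hold 1, 4, 6, 4 and 1 nodes, and node (2, 2) has two parents, (1, 2) and (2, 1).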

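Algorithm 3, together with Definitions 3 and 4, can be sketched as a greedy loop over tree levels supplied root-first, for example the levels of the extended tree of a 3 × 3 MA. The text does not say which free node within the highest available level receives a task; this sketch uses a hypothetical tie-break that picks the node with the smallest weighted hop distance to already-mapped partners. It recomputes APT on every iteration for readability and does not aim for the complexity stated in Section III-C. The four-task graph is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]
Edge = Tuple[str, str]


def hops(a: Node, b: Node) -> int:
    """X-Y routing hop count (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def traffic(weights: Dict[Edge, float], a: str, b: str) -> float:
    """w_ab + w_ba: data exchanged between two tasks in either direction."""
    return weights.get((a, b), 0) + weights.get((b, a), 0)


def tree_task_mapping(tasks: List[str], weights: Dict[Edge, float],
                      levels: List[List[Node]]) -> Dict[str, Node]:
    """Greedy CV/APT mapping of tasks onto tree levels (root level first)."""
    if sum(len(level) for level in levels) < len(tasks):
        raise ValueError("the tree has fewer nodes than the application has tasks")
    placement: Dict[str, Node] = {}
    used: Set[Node] = set()
    while len(placement) < len(tasks):
        unmapped = [t for t in tasks if t not in placement]
        if not placement:
            # Definition 3: CV(t_i) = sum over the other tasks t_j of (w_ij + w_ji).
            score = {t: sum(traffic(weights, t, u) for u in tasks if u != t) for t in unmapped}
        else:
            # Definition 4: APT(t_i) = sum over mapped tasks t_j of (w_ij + w_ji).
            score = {t: sum(traffic(weights, t, m) for m in placement) for t in unmapped}
        task = max(unmapped, key=lambda t: score[t])
        # Free nodes of the highest tree level that still has any.
        free = next([n for n in level if n not in used]
                    for level in levels if any(n not in used for n in level))
        # Hypothetical tie-break within the level: nearest to mapped partners.
        node = min(free, key=lambda n: sum(traffic(weights, task, m) * hops(n, placement[m])
                                           for m in placement))
        placement[task] = node
        used.add(node)
    return placement


if __name__ == "__main__":
    levels = [[(1, 1)], [(0, 1), (1, 0), (1, 2), (2, 1)],
              [(0, 0), (0, 2), (2, 0), (2, 2)]]          # extended tree of a 3 x 3 MA
    weights = {("A", "B"): 50, ("B", "C"): 40, ("A", "D"): 10, ("C", "D"): 5}
    mapping = tree_task_mapping(["A", "B", "C", "D"], weights, levels)
    print(mapping)
    print(sum(w * hops(mapping[s], mapping[d]) for (s, d), w in weights.items()))
```

Task B has the largest CV (90) and is placed on the root; A, C and D then follow in APT order, giving a WCA of 120 for this example.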