Design and Performance Analysis of 8 x 8 Network on Chip Router
Public Seminar_Final 18112014
1. Hossam El-Sayed Abdel-Fadeel
M.Sc. Student, ECE department, E-JUST,
Research Assistant, NTI
email: hossam.fadeel@ejust.edu.eg
hossam.fadeel@nti.sci.eg
Supervised by:
Prof. M. Ragab, Assoc. Prof. Maha El-Sabarouty,
Assoc. Prof. V. Goulart, and Assist. Prof. Mohammed Sharaf
December 2,
2013
2. • MOTIVATION
• RELATED WORK
• BASE ROUTER ARCHITECTURE
• FLEXIBLE ROUTER ARCHITECTURE
• EVALUATION AND EXPERIMENTS
• CONCLUSION
Outline
3. • Process technology scales
– Transistor density increases.
– Many Processing Elements fit on a single chip.
– BUT global wiring delay also increases (wire speed does not scale).
– Performance of digital systems increases in terms of computation.
• Design concept
– Many Processing Elements (PEs) need to be interconnected.
– This needs a structured and scalable on-chip communication architecture.
– Shift from computation-centric design to communication-centric design.
Why Network on Chip?
4. • NoC = Routers + Links.
– Network topology (how the nodes are connected to each other)
– Routing algorithm (how packets move from source to destination)
– Flow control (controls the transmission of packets between routers)
– Router architecture (buffers, arbiters, crossbars, etc.)
• Buffer Requirements in a Router
– Stores arriving Packets or flits.
What is Network on Chip (NoC)?
5. Buffers in NoC Routers
• Why buffering?
– Packets wait for routing decisions.
– Contention for the same output channel.
– Congested downstream router.
• Large buffers improve throughput and latency.
• BUT at the cost of
– Area: high hardware resource overhead.
– Power: buffers are large energy consumers, accounting for about 64% of the total router leakage power.
• Efficient ways to use buffer resources are needed
– Through efficient management of the available buffers.
• Several architectures and implementations have been proposed.
6. Related Work
• Central Buffer Sharing Method
– All ports share a central buffer.
– Improves performance, but at the cost of
• Area overhead
• Control complexity
• Distributed Shared Buffer
– Improves throughput, but at the cost of
• Power overhead
• Area overhead
7. • Improve the performance of the overall network.
– Modifying the Router Architecture
• Using the same amount of available buffers in a more efficient way.
– If there is contention at any input port, the Flexible Router tries to allocate a suitable free buffer in one of the other input ports of the router.
– No need to increase the buffer size or to use extra virtual channels (VCs).
Flexible Router Approach (1/3)
8. Flexible Router Approach (2/3)
Base Router Congestion Problem
[Figure: base router with E, W, S, and N input ports; two buffers are marked Busy.]
Packets requesting a busy buffer will be blocked.
9. Flexible Router Approach (3/3)
Instead of waiting for a busy buffer to become free, look for another one.
This increases the number of packets moving through the router.
The design of the Flexible Router is similar to the base router, except for the functionality and modules added to the input ports.
Features of Flexible Router
• Efficient buffer utilization
• Enhanced packet throughput
• Low hardware resource overhead
[Figure: Flexible Router with E, W, S, and N input ports; busy buffers are marked Busy.]
10. Base Router Architecture
11. Input Port Module
[Block diagram: FIFO buffer, FIFO Controller, and Routing Computation (RC) unit.
Signals: ReqUpStr, GntUpStr, ReqInt(3:0), GntInt(3:0), ReqInCnt, GntInCnt, Empty, Full, PacketIn, PacketInCnt, IntPacket, ReadEn, WriteEn, ReadAddr, WriteIAddr.]
12. Output Port Module
[Block diagram: Round Robin Arbiter and Mux.
Signals: ReqInt(3:0), GntInt(3:0), gnt[1:0], ReqDnStr, GntDnStr, fullDnStr.
Mux inputs: PacketIn 0, PacketIn 1, PacketIn 2, PacketIn 3; output: PacketOut.]
13. Basic operation of Base Router
[Figure: sending flowchart of the upstream router and receiving flowchart of the downstream router.]
[Diagram labels: OutputPort, UpStreamRouter (US), InputPort, DownStreamRouter (DS); signals Full_US, Request_US, Grant_US, PacketIn_US.]
[Diagram labels: InputPort, UpStreamRouter (US), OutputPort, DownStreamRouter (DS); signals Full_DS, Request_DS, Grant_DS, PacketOut_DS.]
14. Flexible Router Architecture
15. Input Port Module
[Block diagram: FIFO buffer, FIFO Flexibility Controller, Routing Computation (RC) unit, and a MUX fed by EastPacket and Packets From Other Ports.
Signals: ReqUpStr, GntUpStr, ReqInt(3:0), GntInt(3:0), ReqInCnt, GntInCnt, Empty, Full, PacketIn, IntPacket, ReadEn, WriteEn, ReadAddr, WriteIAddr, Req_FFCE_FIFO_W,N,S, Gnt_FFCE_FIFO_W,N,S.]
16. Basic operation of Flexible Router
The FFC requests the other FIFOs in sequential order.
Pseudo code for the East FFC:
if (FIFO West is not full)
    { send request and wait for grant; }
else if (FIFO North is not full)
    { send request and wait for grant; }
else if (FIFO South is not full)
    { send request and wait for grant; }
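The sequential search in the pseudo code can be sketched in Python as follows. This is only an illustration under stated assumptions: `pick_alternate_fifo` and the `fifo_full` dictionary are hypothetical names, and the request/grant handshake is not modeled. Only the East search order (West, North, South) comes from the slide.

```python
# Search order used by the East FFC on the slide: West, then North, then South.
EAST_FFC_ORDER = ("W", "N", "S")


def pick_alternate_fifo(fifo_full, order=EAST_FFC_ORDER):
    """Return the first FIFO in `order` that is not full, or None.

    fifo_full maps a port name ("W", "N", "S") to True when that port's
    FIFO is full. Hypothetical helper for illustration, not the RTL.
    """
    for port in order:
        if not fifo_full[port]:
            # In hardware, a request would be sent here, followed by a
            # wait for the matching grant.
            return port
    # Every candidate FIFO is full: the packet stays blocked.
    return None


if __name__ == "__main__":
    print(pick_alternate_fifo({"W": True, "N": False, "S": False}))
```

Because the order is fixed, the West FIFO is always preferred when it has space; a rotating order would be a different design choice that the slide does not describe.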
17. • By applying the turn model on the Flexible router working under XY
routing we can avoid deadlock.
• Under XY routing, possible packet directions that each buffer can store
in the Flexible router are as follows:
– North buffer:
• Can contain packets directed to Local or South.
– South buffer:
• Can contain packets directed to Local or North.
– East buffer:
• Can contain packets directed to Local, North, South, or West.
– West buffer:
• Can contain packets directed to Local, North, South, or East.
Possible Packet Directions
Packets directed to the local port have reached their destination and are absorbed directly by the local port.
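The per-buffer direction lists can be cross-checked with a small enumeration. The sketch below is an illustration, not the author's method: it walks every source/destination pair of a small mesh under plain XY routing and records, for each input port, which output directions its packets request. The coordinate convention (North is increasing y), the 4 x 4 mesh size, and all function names are assumptions.

```python
from collections import defaultdict

STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}


def xy_next_hop(cur, dst):
    """Output port chosen by XY routing at router `cur` ("L" = local)."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "L"


def directions_per_input_buffer(size=4):
    """Map each input port to the set of output ports its packets request."""
    seen = defaultdict(set)
    nodes = [(x, y) for x in range(size) for y in range(size)]
    for src in nodes:
        for dst in nodes:
            cur, in_port = src, "L"
            while True:
                out = xy_next_hop(cur, dst)
                if in_port != "L":
                    seen[in_port].add(out)
                if out == "L":
                    break
                cur = (cur[0] + STEP[out][0], cur[1] + STEP[out][1])
                # A packet moving east arrives on the next router's West port.
                in_port = OPPOSITE[out]
    return dict(seen)


if __name__ == "__main__":
    for port, outs in sorted(directions_per_input_buffer().items()):
        print(port, sorted(outs))
```

Under these assumptions the North and South buffers only ever hold packets heading to Local or straight on, while the East and West buffers can hold packets for any of the other four directions.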
18. • NoC parameters used in this work :
– A 64-bit 5-input-buffer router
System Architecture
Parameter | Selected Logic | Why used
Arbitration | Round Robin | Fairness
Switching | Store-And-Forward | For simplicity and proof of concept
Routing Algorithm | XY-DOR Routing | Minimizes area and control overhead; deadlock-free routing
Topology | Mesh | Most common for 2D chips
Packet/Flit size | 64 bits | Can vary from 32 to 264 bits
Buffer Size | 2, 4, 8 packets | Small, to see the utilization
Traffic Patterns | Uniform Random, Hot-Spot, and Nearest-Neighbor | For performance evaluation
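Round-robin arbitration is listed for fairness. A minimal behavioral sketch of that idea is shown below. The class name, the default of four requesters (suggested by the ReqInt(3:0) signal width), and the reset priority are assumptions; the actual arbiter in this work is Verilog RTL and may differ.

```python
class RoundRobinArbiter:
    """Grant one of n requesters per call, rotating priority for fairness."""

    def __init__(self, n=4):
        self.n = n
        # Start so that requester 0 has the highest priority first.
        self.last = n - 1

    def grant(self, requests):
        """requests: list of n booleans. Returns the granted index or None."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                # The winner gets the lowest priority on the next call.
                self.last = idx
                return idx
        return None


if __name__ == "__main__":
    arb = RoundRobinArbiter(4)
    print([arb.grant([True, False, True, False]) for _ in range(4)])
```

With requesters 0 and 2 both asserting continuously, the grants alternate between them, so neither input can starve the other.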
19. XY Dimension-Ordered Routing (XY-DOR)
[Figure: mesh with a source node S and a destination node D.]
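XY dimension-ordered routing moves a packet along the X dimension until the destination column is reached, and only then along the Y dimension. The sketch below traces such a route on a mesh; the `(x, y)` coordinate tuples and the function name `xy_route` are assumptions made for illustration.

```python
def xy_route(src, dst):
    """Return the list of (x, y) routers visited from src to dst, X first."""
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 3)))
```

The route is always minimal (its hop count equals the Manhattan distance), and because a packet never turns from Y back to X, the turns that could close a cycle are excluded.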
20. • Latency is the time elapsed from when a packet enters the network until it reaches its destination.
• Throughput is the rate at which packets are delivered by the network for a particular traffic pattern.
• Many factors affect these parameters:
– Topology: determines how the system is connected and its size (the number of nodes).
– Injection rate: the rate at which packets are injected into the network; it tells the simulator how many packets to inject per simulation cycle per node, on average.
– Flow control: refers to the number of virtual channels per physical channel and the depth of each virtual channel, in flits.
Performance Parameters
21. • A cycle-accurate NoC simulation system was developed in Verilog HDL to evaluate the performance of the Flexible Router.
• Synthesis environment:
– Xilinx ISE 14.1; target platform: Xilinx Virtex-5 xc5vfx70t-1ff1136 FPGA.
– Cadence SoC Encounter® Digital Implementation System (Encounter RTL Compiler®), with 180nm technology.
Evaluation Approach
22. Simulation Platform (1/3)
PE Information | Where | Function
Send Time | Sender Log | Cycle counter of each sent packet
Receive Time | Receiver Log | Cycle counter of each received packet
PE Module Sender ID | Sender and Receiver Log | The PE Module ID of the Sender
PE Module Receiver ID | Receiver Log | The PE Module ID of the Receiver
Packet ID | Sender and Receiver Log | The ID of the transmitted Packet
Packet Injector Flow Chart
23. Simulation Platform (2/3)
[Flow diagram: Verilog RTL Model and Verilog Testbench → Simulation Compiler (Modelsim or ISim) → Simulation Results (Log Files, Waveform) → Matlab → Simulation Graphs]
Matlab calculates the following:
• Average Latency for all the packets in the simulation system.
• Average Throughput for all the packets in the simulation system.
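The two averages that are computed from the log files can be sketched as follows. This is a Python stand-in for the Matlab step, not the author's script. The tuple layouts of the logs, the function names, and the normalization of throughput per cycle per PE (taken from the plot axis unit, Packets/Cycle/PE) are assumptions.

```python
def average_latency(send_log, recv_log):
    """Mean of (receive time - send time) over all received packets.

    send_log entries: (sender_id, packet_id, send_cycle)
    recv_log entries: (sender_id, receiver_id, packet_id, receive_cycle)
    Both layouts are hypothetical; packets are matched on (sender, packet).
    """
    send_time = {(s, p): t for (s, p, t) in send_log}
    latencies = [t_recv - send_time[(s, p)] for (s, _r, p, t_recv) in recv_log]
    return sum(latencies) / len(latencies)


def throughput(recv_log, total_cycles, num_pes):
    """Delivered packets per cycle per processing element."""
    return len(recv_log) / (total_cycles * num_pes)


if __name__ == "__main__":
    send_log = [(0, 1, 10), (0, 2, 12), (3, 1, 11)]
    recv_log = [(0, 5, 1, 22), (0, 5, 2, 26), (3, 5, 1, 25)]
    print(average_latency(send_log, recv_log))
    print(throughput(recv_log, total_cycles=100, num_pes=64))
```

Packets that were sent but never received within the simulated window are ignored by `average_latency`, which can understate latency near saturation; whether the original analysis handles this differently is not stated.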
24. • Most performance analyses use synthetic traffic patterns with different characteristics.
• Simulation was done under three different traffic patterns:
– Uniform (UNI): all the traffic is equally distributed between all nodes. This is the most commonly used traffic pattern for network evaluation because it is straightforward to implement and makes no assumptions about the application.
– Nearest-Neighbor (NN): each node sends only to its neighbor nodes.
– Hotspot (HS): 90% of the traffic is directed to the hotspot node at (2, 2), and the rest of the traffic is equally distributed between all other nodes.
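A destination generator for the three patterns might look like the sketch below. It is an illustration under assumptions, not the packet injector used in this work: the function names, the 8 x 8 default size, excluding the source from its own destinations, and the behavior when the hotspot node itself injects are all choices made here.

```python
import random


def neighbors(node, size):
    """Mesh neighbors of `node` that lie inside a size x size grid."""
    x, y = node
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for (a, b) in cand if 0 <= a < size and 0 <= b < size]


def pick_destination(src, pattern, size=8, hotspot=(2, 2),
                     hs_fraction=0.9, rng=random):
    """Choose a destination for a packet injected at `src`."""
    others = [(x, y) for x in range(size) for y in range(size) if (x, y) != src]
    if pattern == "UNI":
        return rng.choice(others)
    if pattern == "NN":
        return rng.choice(neighbors(src, size))
    if pattern == "HS":
        # Assumption: the hotspot node never sends to itself.
        if src != hotspot and rng.random() < hs_fraction:
            return hotspot
        return rng.choice([n for n in others if n != hotspot])
    raise ValueError(f"unknown traffic pattern: {pattern}")


if __name__ == "__main__":
    rng = random.Random(1)
    print([pick_destination((0, 0), "NN", rng=rng) for _ in range(3)])
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when the Base and Flexible routers are compared on identical traffic.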
Simulation Platform (3/3)
26. Nearest Neighbor Traffic
[Plots: Average Latency (Cycles) versus Packet Injection Rate (Packets/Cycle/PE) for Buffer Size = 2, 4, and 8; curves: Base Router and Flexible Router; injection rates from 0 to 0.2.]
[Plots: Throughput (Packets/Cycle/PE) versus Packet Injection Rate (Packets/Cycle/PE) for Buffer Size = 2, 4, and 8; curves: Base Router and Flexible Router; injection rates from 0 to 0.2.]
In Nearest Neighbor traffic, each injector sends packets only to its neighbors. As a result, throughput grows linearly with the injection rate: all injected packets are served by the routers, and no congestion occurs to limit the throughput.
27. Hot Spot Traffic
[Plots: Throughput (Packets/Cycle/PE) versus Packet Injection Rate (Packets/Cycle/PE) for Buffer Size = 2, 4, and 8; curves: Base Router and Flexible Router.]
[Plots: Average Latency (Cycles) versus Packet Injection Rate (Packets/Cycle/PE) for Buffer Size = 2, 4, and 8; curves: BR-BUF-n (Base Router) and FR-BUF-n (Flexible Router).]
The improvement in HS is slight, apart from the increased saturation point, because HS packets are injected faster than they can be collected; furthermore, HS packets occupy all network buffer space.
Modifying the FR architecture to suit this type of traffic is a possible direction for future work.
28. • Using Xilinx ISE® Synthesis Tool (XST) targeting the Virtex-5 FPGA xc5vfx70t-1ff1136.
– The area and maximum frequency results of both the Flexible and Base Routers are listed in the tables on this slide.
– The increase in area is acceptable given the logic added for flexibility and the available FPGA resources.
Synthesis Results (1/2)
FPGA resources (number used) | Base BUF2 | Base BUF4 | Base BUF8 | Flexible BUF2 | Flexible BUF4 | Flexible BUF8
LUTs | 657 | 776 | 836 | 1078 | 1111 | 1112
FFs | 425 | 430 | 440 | 473 | 474 | 493
AREA RESULTS OF XILINX FPGA
FPGA resources | Base BUF2 | Base BUF4 | Base BUF8 | Flexible BUF2 | Flexible BUF4 | Flexible BUF8
Max Frequency (MHz) | 164 | 150 | 150 | 141 | 139 | 141
FREQUENCY RESULTS OF XILINX FPGA
The maximum clock frequency decreases in the Flexible Router because of the flexibility units, but its gains in throughput and latency outweigh this impact.
29. Synthesis Results (2/2)
Configuration | Cell area (area in cells) | Leakage power (µW) | Switching power (µW)
Base | 557963.68 | 1.015 | 51421.04
Flexible | 661936.7 | 1.15 | 56372.29
Overhead | 18 % | 13 % | 9.6 %
• Using the Cadence Encounter RTL Compiler tool and a 180nm standard cell library.
– The power dissipation and area overhead are obtained for each case at typical operating conditions for 180nm technology.
• 25 °C, 1.8 V, typical transistor model.
– Both dynamic and leakage power estimates were extracted from the synthesized router implementation, assuming a 50% uniform switching activity on all router input ports.
AREA AND POWER RESULTS FOR 180NM TECHNOLOGY
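The overhead row can be checked from the Base and Flexible values as (Flexible - Base) / Base. The short calculation below uses only the numbers reported in the table; the helper name `overhead_percent` is an assumption.

```python
def overhead_percent(base, flexible):
    """Relative increase of the Flexible value over the Base value, in %."""
    return (flexible - base) / base * 100.0


# (Base, Flexible) pairs copied from the 180nm results table.
RESULTS = {
    "cell area": (557963.68, 661936.7),
    "leakage power": (1.015, 1.15),
    "switching power": (51421.04, 56372.29),
}

if __name__ == "__main__":
    for name, (base, flex) in RESULTS.items():
        print(f"{name}: {overhead_percent(base, flex):.1f} %")
```

The computed values are about 18.6 %, 13.3 %, and 9.6 %, which agrees with the reported 18 %, 13 %, and 9.6 % if the first two were truncated to whole percentages.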
30. • Experimental results show that the Flexible Router
– Increases the throughput
– Reduces the latency
• At low injection rates, the Base and Flexible routers have nearly the same performance.
• At high injection rates, the Flexible Router performs better, because its flexibility property is then exercised.
• The Flexible Router has a higher saturation point than the Base Router.
• For UNI traffic, it allows a 15% higher injection rate, in addition to improved performance at higher rates.
• For HS and NN, the improvement is small (an increased saturation point), especially for HS.
– For HS, this is because HS packets are injected faster than they can be collected; furthermore, HS packets occupy all network buffer space.
– For NN, each injector sends packets only to its neighbors, so throughput grows linearly with the injection rate: all injected packets are served by the routers, and no congestion occurs to limit the throughput.
Analysis
31. • Decrease the communication overhead due to the FFC.
• Support hot-spot traffic by modifying the FFC.
• Implement and evaluate the Flexible Router for:
– Virtual Channels.
– Other switching techniques, such as Virtual Cut-Through and Wormhole.
• Explore the Flexible Router for 3-D Network on Chip.
• More real-world example implementations.
• Support for dynamically reconfigurable systems.
Future Work
32. [1] Hossam El-Sayed, Mohammed Ragab, Mohammed S. Sayed, and Victor Goulart, “Hardware Implementation and Evaluation of the Flexible Router Architecture for NoCs,” 20th IEEE-ICECS International Conference on Electronics, Circuits, and Systems, UAE, Dec. 2013. (Accepted as Lecture).
[2] Hossam El-Sayed, Ahmed Shalaby, Mostafa Said, Mohammed S. Sayed, Mohammed Ragab, and Victor Goulart, “Performance Evaluation of Flexible Router Architecture for NoCs,” 24th International Conference on Field Programmable Logic and Applications, Munich, Germany, September 2-4, 2014. (Submitted).
Published Papers
33. Acknowledgment
Maher Abdelrasoul
Ahmed Shalaby
Mostafa Said