Hossam El-Sayed Abdel-Fadeel
M.Sc. Student, ECE department, E-JUST,
Research Assistant, NTI
email: hossam.fadeel@ejust.edu.eg
hossam.fadeel@nti.sci.eg
Supervised by:
Prof. M. Ragab, Assoc. Prof. Maha El-Sabarouty,
Assoc. Prof. V. Goulart, and Assist. Prof. Mohammed Sharaf
December 2, 2013
Outline
• MOTIVATION
• RELATED WORK
• BASE ROUTER ARCHITECTURE
• FLEXIBLE ROUTER ARCHITECTURE
• EVALUATION AND EXPERIMENTS
• CONCLUSION
Why Network on Chip?
• Process technology scales
– Transistor density increases.
– Many Processing Elements (PEs) fit on a single chip.
– BUT global wiring delay also increases (wire speed does not scale).
– Performance of digital systems increases in terms of computation.
• Design concept
– Many PEs need to be interconnected.
– A structured and scalable on-chip communication architecture is needed.
– Shift from computation-centric design to communication-centric design.
What is Network on Chip (NoC)?
• NoC = Routers + Links.
– Network topology (how the nodes are connected to each other)
– Routing algorithm (how packets move from source to destination)
– Flow control (controls the transmission of packets between routers)
– Router architecture (buffers, arbiters, crossbars, etc.)
• Buffer requirements in a router
– Buffers store arriving packets or flits.
Buffers in NoC Routers
• Why buffering?
– Packets wait for routing decisions.
– Contention for the same output channel.
– Congested downstream router.
• Large buffers improve throughput and latency,
• BUT at the cost of
– Area: high hardware resource overhead.
– Power: buffers are large energy consumers, about 64% of the total router leakage power.
• Efficient ways to use buffer resources are needed
– through careful management of the available buffers.
• Several architectures and implementations have been proposed.
Related Work
• Central Buffer Sharing Method
– All ports share a central buffer.
– Improves performance, but at the cost of
• Area overhead
• Complexity of control
• Distributed Shared Buffer
– Improves throughput, but at the cost of
• Power and
• Area overhead.
Flexible Router Approach (1/3)
• Improve the performance of the overall network
– by modifying the router architecture.
• Use the same amount of available buffering in a more efficient way.
– If there is contention at any input port, the Flexible Router tries to allocate any suitable free buffer in the other input ports of the router.
– No need to increase the size of the buffers or to use extra virtual channels (VCs).
Flexible Router Approach (2/3)
Base Router Congestion Problem
[Figure: base router with East (E), West (W), South (S), and North (N) input buffers; two buffers are marked Busy.]
Packets requesting a busy buffer will be blocked.
Flexible Router Approach (3/3)
Features of Flexible Router
Instead of waiting for a busy buffer to become free, look for another one.
The design of the Flexible Router is similar to that of the base router, except for the functionality and modules added to the input ports.
[Figure: flexible router with E, W, S, and N input buffers; several buffers are marked Busy.]
• Efficient buffer utilization
• Increases the packets moving through the router
• Enhanced packet throughput
• Low hardware resource overhead
Base Router Architecture
Input Port Module
[Figure: base router input port. Blocks: FIFO buffer, FIFO Controller, Routing Computation (RC). Signals: PacketIn, PacketInCnt, IntPacket, ReqUpStr, GntUpStr, ReqInt(3:0), GntInt(3:0), ReqInCnt, GntInCnt, Empty, Full, ReadEn, WriteEn, ReadAddr, WriteIAddr.]
Output Port Module
[Figure: base router output port. Blocks: Mux, Round Robin Arbiter. Signals: PacketIn 0 to PacketIn 3, PacketOut, ReqInt(3:0), GntInt(3:0), gnt[1:0], ReqDnStr, GntDnStr, fullDnStr.]
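The output port relies on a round-robin arbiter to choose among the four internal requests. The slides only name the arbiter, so the following is a minimal behavioral sketch in Python under an assumed policy (the most recently granted requester gets the lowest priority next time); the class and method names are hypothetical and do not come from the Verilog design.

```python
class RoundRobinArbiter:
    """Grant one of `n` requesters per call, rotating priority for fairness."""

    def __init__(self, n=4):
        self.n = n
        # Start so that requester 0 has the highest priority on the first call.
        self.last = n - 1

    def grant(self, requests):
        """`requests` is a list of n booleans; return the granted index or None."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx  # the winner gets the lowest priority next time
                return idx
        return None  # nobody is requesting


# Ports 0, 2, and 3 keep requesting; port 1 stays idle.
arb = RoundRobinArbiter(4)
grants = [arb.grant([True, False, True, True]) for _ in range(4)]
```

With persistent requests the grant rotates over the requesting ports, so no port can starve another, which is the fairness property listed for arbitration in the system parameters.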
Basic operation of Base Router
[Figure: sending flowchart of the upstream router and receiving flowchart of the downstream router. Port-to-port signals between the upstream router (US) and the downstream router (DS): Full_US, Request_US, Grant_US, PacketIn_US; Full_DS, Request_DS, Grant_DS, PacketOut_DS.]
Flexible Router Architecture
Input Port Module
[Figure: flexible router input port (East port shown). Blocks: FIFO buffer, FIFO Flexibility Controller, Routing Computation (RC), MUX selecting between EastPacket and packets from other ports. Signals: PacketIn, IntPacket, ReqUpStr, GntUpStr, ReqInt(3:0), GntInt(3:0), ReqInCnt, GntInCnt, Empty, Full, ReadEn, WriteEn, ReadAddr, WriteIAddr, Req_FFCE_FIFO_W,N,S, Gnt_FFCE_FIFO_W,N,S.]
Basic operation of Flexible Router
The FFC requests the other FIFOs in a sequential order.
Pseudocode for the East FFC:
if (FIFO West is not full)
{ Send Request and wait for Grant; }
else if (FIFO North is not full)
{ Send Request and wait for Grant; }
else if (FIFO South is not full)
{ Send Request and wait for Grant; }
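The sequential search in the pseudocode can be sketched in Python as follows. This only models the selection order (West, then North, then South) taken from the slide; the dictionary interface and the function name are hypothetical, and the request/grant handshake of the real hardware is reduced to a comment.

```python
# Search order taken from the East FFC pseudocode on the slide.
EAST_FFC_ORDER = ("West", "North", "South")


def east_ffc_pick(fifo_full):
    """Return the first neighbor FIFO that is not full, or None if all are full.

    `fifo_full` maps a port name to True when that FIFO is full
    (hypothetical interface; the design itself uses request/grant signals).
    """
    for port in EAST_FFC_ORDER:
        if not fifo_full[port]:
            # This is where the FFC would send Request and wait for Grant.
            return port
    return None  # no free buffer anywhere: the packet keeps waiting


choice_a = east_ffc_pick({"West": True, "North": False, "South": False})
choice_b = east_ffc_pick({"West": True, "North": True, "South": True})
```

Here `choice_a` is the North FIFO because West is full, and `choice_b` is `None` because every candidate is full.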
Possible Packet Directions
• By applying the turn model to the Flexible Router working under XY routing, deadlock can be avoided.
• Under XY routing, the possible packet directions that each buffer can store in the Flexible Router are as follows:
– North buffer:
• Can contain packets directed to Local or South.
– South buffer:
• Can contain packets directed to Local or North.
– East buffer:
• Can contain packets directed to Local, North, South, or West.
– West buffer:
• Can contain packets directed to Local, North, South, or East.
Packets directed to the local port have reached their destination and are absorbed directly by the local port.
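The direction table above can be reproduced by tracing every source/destination pair through plain XY routing on a mesh. The sketch below is an illustration under assumptions: a 4x4 mesh, North meaning increasing y, and ordinary XY routing without the buffer borrowing of the Flexible Router. All names are hypothetical.

```python
from itertools import product

STEP = {"East": (1, 0), "West": (-1, 0), "North": (0, 1), "South": (0, -1)}
OPPOSITE = {"East": "West", "West": "East", "North": "South", "South": "North"}


def xy_next(cur, dst):
    """Output direction chosen by XY routing at router `cur`: X first, then Y."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        return "East" if dx > x else "West"
    if y != dy:
        return "North" if dy > y else "South"
    return "Local"


def directions_per_input_buffer(size=4):
    """Collect, per input buffer, every output direction a stored packet may take."""
    seen = {port: set() for port in STEP}
    nodes = list(product(range(size), repeat=2))
    for src, dst in product(nodes, nodes):
        cur, in_port = src, None  # None: the packet was injected locally
        while True:
            out = xy_next(cur, dst)
            if in_port is not None:
                seen[in_port].add(out)
            if out == "Local":
                break
            sx, sy = STEP[out]
            cur = (cur[0] + sx, cur[1] + sy)
            in_port = OPPOSITE[out]  # arrives on the side facing the sender
    return seen


table = directions_per_input_buffer(4)
```

The North and South buffers never hold a packet that turns back into the X dimension, which is the restriction the turn-model argument relies on.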
System Architecture
• NoC parameters used in this work:
– A 64-bit, 5-input-buffer router

Parameter | Selected Logic | Why used
Arbitration | Round Robin | Fairness
Switching | Store-And-Forward | For simplicity and proof of concept
Routing Algorithm | XY-DOR routing | Minimizes area and control overhead; deadlock-free routing
Topology | Mesh | Most common for 2D chips
Packet/Flit size | 64 bits | Can vary from 32 to 264 bits
Buffer Size | 2, 4, 8 packets | Small, to observe the utilization
Traffic Patterns | Uniform Random, Hot-Spot, and Nearest-Neighbor | For performance evaluation
XY Dimension-Ordered Routing (XY-DOR)
[Figure: mesh showing an XY route from a source node (S) to a destination node (D).]
Performance Parameters
• Latency is the time elapsed from when a packet enters the network until it is fully received at its destination.
• Throughput is the rate at which packets are delivered by the network for a particular traffic pattern.
• Many factors affect these parameters:
– Topology: determines how the nodes of the system are connected and the size of the network (the number of nodes).
– Injection rate: the rate at which packets are injected into the simulator, i.e., how many packets are injected per simulation cycle per node on average.
– Flow control: here, the number of virtual channels per physical channel and the depth of each virtual channel, measured in flits.
Evaluation Approach
• A cycle-accurate NoC simulation system was developed in Verilog HDL to evaluate the performance of the Flexible Router.
• Synthesis environment:
– Xilinx ISE 14.1, targeting a Xilinx Virtex-5 xc5vfx70t-1ff1136 FPGA.
– Cadence SoC Encounter® Digital Implementation System (Encounter RTL Compiler®), with 180nm technology.
Simulation Platform (1/3)
PE Information | Where | Function
Send Time | Sender Log | Cycle counter of each sent packet
Receive Time | Receiver Log | Cycle counter of each received packet
PE Module Sender ID | Sender and Receiver Log | The PE module ID of the sender
PE Module Receiver ID | Receiver Log | The PE module ID of the receiver
Packet ID | Sender and Receiver Log | The ID of the transmitted packet
[Figure: packet injector flow chart.]
Simulation Platform (2/3)
[Figure: simulation flow. Verilog RTL model and Verilog testbench, then simulation compiler (ModelSim or ISim), then simulation results (log files and waveform), then Matlab, then simulation graphs.]
Matlab calculates the following:
– Average latency for all the packets in the simulation system.
– Average throughput for all the packets in the simulation system.
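The post-processing step can be sketched as follows. The slides state that Matlab computes the averages from the sender and receiver logs but do not give the formulas, so the definitions below are assumptions chosen to match the plot units (cycles, and packets/cycle/PE). The sketch is in Python rather than Matlab, and the log layout (packet ID mapped to a cycle count) is hypothetical.

```python
def average_latency(send_log, recv_log):
    """Mean of (receive cycle - send cycle) over all received packets."""
    latencies = [recv_log[pid] - send_log[pid] for pid in recv_log]
    return sum(latencies) / len(latencies)


def throughput(recv_log, total_cycles, num_pes):
    """Received packets per cycle per PE."""
    return len(recv_log) / (total_cycles * num_pes)


# Toy logs: packet ID -> cycle counter. Packet 2 was sent but never received.
send_log = {0: 10, 1: 12, 2: 15}
recv_log = {0: 30, 1: 40}

avg = average_latency(send_log, recv_log)                 # (20 + 28) / 2
thr = throughput(recv_log, total_cycles=100, num_pes=16)  # 2 / (100 * 16)
```

Matching packets by Packet ID is what makes the Send Time and Receive Time columns of the two logs comparable.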
Simulation Platform (3/3)
• Most performance analyses use synthetic traffic patterns with different characteristics.
• Simulation was done under 3 different traffic patterns:
– Uniform (UNI): all the traffic is equally distributed between all nodes. This is the most commonly used traffic pattern for network evaluation because it is straightforward to implement and makes no assumptions about the application.
– Nearest-Neighbor (NN): each node sends only to its neighboring nodes.
– Hotspot (HS): 90% of the traffic is directed to the hotspot node at (2, 2), and the rest of the traffic is equally distributed between all other nodes.
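The three traffic patterns can be sketched as destination generators. This is an illustrative Python model, not the Verilog packet injector: the 4x4 mesh size, the function names, and the handling of the case where the source is itself the hotspot are assumptions; only the 90% share and the hotspot position (2, 2) come from the slide.

```python
import random


def uniform_dest(src, size, rng):
    """UNI: any node other than the source, with equal probability."""
    nodes = [(x, y) for x in range(size) for y in range(size) if (x, y) != src]
    return rng.choice(nodes)


def nearest_neighbor_dest(src, size, rng):
    """NN: one of the mesh neighbors of the source."""
    x, y = src
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    inside = [(cx, cy) for cx, cy in candidates
              if 0 <= cx < size and 0 <= cy < size]
    return rng.choice(inside)


def hotspot_dest(src, size, rng, hotspot=(2, 2), fraction=0.9):
    """HS: `fraction` of packets go to the hotspot, the rest are spread evenly."""
    if src != hotspot and rng.random() < fraction:
        return hotspot
    others = [(x, y) for x in range(size) for y in range(size)
              if (x, y) not in (src, hotspot)]
    return rng.choice(others)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
samples = [hotspot_dest((0, 0), 4, rng) for _ in range(10000)]
hot_share = samples.count((2, 2)) / len(samples)
nn_samples = [nearest_neighbor_dest((1, 1), 4, rng) for _ in range(100)]
```

Over many samples `hot_share` stays close to 0.9, while every nearest-neighbor destination is exactly one hop from the source.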
Uniform Random Traffic
[Figure: six plots comparing the Base Router and the Flexible Router under uniform random traffic for buffer sizes 2, 4, and 8. For each buffer size, one plot shows average latency (cycles) and one shows throughput (packets/cycle/PE) against packet injection rate (packets/cycle/PE).]
Nearest Neighbor Traffic
[Figure: six plots comparing the Base Router and the Flexible Router under nearest-neighbor traffic for buffer sizes 2, 4, and 8. For each buffer size, one plot shows average latency (cycles) and one shows throughput (packets/cycle/PE) against packet injection rate (packets/cycle/PE).]
Under nearest-neighbor traffic each injector sends packets only to its neighbors, so the throughput grows linearly with the injection rate: all injected packets are served by the routers, and no congestion occurs to limit the throughput.
Hot Spot Traffic
[Figure: six plots comparing the Base Router (BR) and the Flexible Router (FR) under hot-spot traffic for buffer sizes 2, 4, and 8. For each buffer size, one plot shows average latency (cycles) and one shows throughput (packets/cycle/PE) against packet injection rate (packets/cycle/PE).]
The improvement under HS traffic is slight, apart from a higher saturation point, because HS packets are injected faster than they can be collected; furthermore, HS packets occupy all the network buffer space.
Modifying the FR architecture to suit this type of traffic could be part of our future work.
Synthesis Results (1/2)
• Using the Xilinx ISE® Synthesis Tool (XST), targeting a Virtex-5 FPGA, xc5vfx70t-1ff1136.
– The area and maximum frequency results of both the Flexible and Base Routers are listed below.
– The increase in area is acceptable given the logic added for flexibility and the FPGA resources.

AREA RESULTS OF XILINX FPGA (number of resources used)
FPGA resources | Base BUF2 | Base BUF4 | Base BUF8 | Flexible BUF2 | Flexible BUF4 | Flexible BUF8
LUTs | 657 | 776 | 836 | 1078 | 1111 | 1112
FFs | 425 | 430 | 440 | 473 | 474 | 493

FREQUENCY RESULTS OF XILINX FPGA
FPGA resources | Base BUF2 | Base BUF4 | Base BUF8 | Flexible BUF2 | Flexible BUF4 | Flexible BUF8
Max Frequency (MHz) | 164 | 150 | 150 | 141 | 139 | 141

The maximum clock frequency decreases in the Flexible Router because of the flexibility units, but the Flexible Router's gains in throughput and latency outweigh this impact.
Synthesis Results (2/2)
• Using the Cadence Encounter RTL Compiler tool and a 180nm standard cell library.
– The power dissipation and area overhead are obtained for each case at typical operating conditions for 180nm technology.
• 25 °C, 1.8 V, typical transistor model.
– Both dynamic and leakage power estimates were extracted from the synthesized router implementation, assuming 50% uniform switching activity on all router input ports.

AREA AND POWER RESULTS FOR 180NM TECHNOLOGY
Configuration | Cell area (cells) | Leakage power (µW) | Switching power (µW)
Base | 557963.68 | 1.015 | 51421.04
Flexible | 661936.7 | 1.15 | 56372.29
Overhead | 18 % | 13 % | 9.6 %
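The overhead row can be checked from the Base and Flexible rows. The short Python sketch below assumes the overhead is defined as (Flexible - Base) / Base; the slides do not state the formula, and the helper name is hypothetical.

```python
def overhead_percent(base, flexible):
    """Relative increase of the Flexible Router over the Base Router, in percent."""
    return (flexible - base) / base * 100.0


# (Base, Flexible) pairs copied from the 180nm results table.
results = {
    "cell area": (557963.68, 661936.7),
    "leakage power": (1.015, 1.15),
    "switching power": (51421.04, 56372.29),
}

overheads = {name: overhead_percent(b, f) for name, (b, f) in results.items()}
for name, pct in overheads.items():
    print(f"{name}: {pct:.1f} %")
```

Under this definition the values come out at roughly 18.6 %, 13.3 %, and 9.6 %, which agrees with the table if its first two entries are taken as truncated to whole percentages.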
Analysis
• Experimental results show that the Flexible Router
• increases the throughput and
• reduces the latency.
• At low injection rates, both the Base and Flexible routers have nearly the same performance.
• At high injection rates, the Flexible Router performs better, because the flexibility property is used.
• The Flexible Router has a higher saturation point than the Base Router.
• For UNI traffic, it allows a 15% higher injection rate, in addition to improved performance at higher rates.
• For HS and NN traffic the improvement is small (a higher saturation point), especially for HS.
– For HS, this is because HS packets are injected faster than they can be collected; furthermore, HS packets occupy all the network buffer space.
– For NN, each injector sends packets only to its neighbors, so the throughput grows linearly with the injection rate: all injected packets are served by the routers, and no congestion occurs to limit the throughput.
Future Work
• Decrease the communication overhead due to the FFC.
• Support hot-spot traffic by modifying the FFC.
• Implement and evaluate the Flexible Router for:
– Virtual channels.
– Other switching techniques such as Virtual Cut-Through and Wormhole.
• Extend the Flexible Router to support 3-D Network on Chip.
• More real-world example implementations.
• Support for dynamically reconfigurable systems.
Published Papers
[1] Hossam El-Sayed, Mohammed Ragab, Mohammed S. Sayed, and Victor Goulart, “Hardware Implementation and Evaluation of the Flexible Router Architecture for NoCs,” 20th IEEE-ICECS International Conference on Electronics, Circuits, and Systems, UAE, Dec. 2013. (Accepted as lecture.)
[2] Hossam El-Sayed, Ahmed Shalaby, Mostafa Said, Mohammed S. Sayed, Mohammed Ragab, and Victor Goulart, “Performance Evaluation of Flexible Router Architecture for NoCs,” 24th International Conference on Field Programmable Logic and Applications, Munich, Germany, September 2-4, 2014. (Submitted.)
Acknowledgment
Maher Abdelrasoul
Ahmed Shalaby
Mostafa Said

Understanding Network Routing Problem and Study of Routing Algorithms and Heu...
 
Chapter07
Chapter07Chapter07
Chapter07
 
PERFORMANCE STUDIES ON THE VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKS
PERFORMANCE STUDIES ON THE  VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKSPERFORMANCE STUDIES ON THE  VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKS
PERFORMANCE STUDIES ON THE VARIOUS ROUTING PROTOCOLS IN AD-HOC NETWORKS
 
Ppt seminar noc
Ppt seminar nocPpt seminar noc
Ppt seminar noc
 
Experimental Analysis Of On Demand Routing Protocol
Experimental Analysis Of On Demand Routing ProtocolExperimental Analysis Of On Demand Routing Protocol
Experimental Analysis Of On Demand Routing Protocol
 
Simulation Based EIGRP with two Autonomous systems Performance Analysis
Simulation Based EIGRP with two Autonomous systems Performance Analysis Simulation Based EIGRP with two Autonomous systems Performance Analysis
Simulation Based EIGRP with two Autonomous systems Performance Analysis
 
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
 
FPGA IMPLEMENTATION OF PRIORITYARBITER BASED ROUTER DESIGN FOR NOC SYSTEMS
FPGA IMPLEMENTATION OF PRIORITYARBITER BASED ROUTER DESIGN FOR NOC SYSTEMSFPGA IMPLEMENTATION OF PRIORITYARBITER BASED ROUTER DESIGN FOR NOC SYSTEMS
FPGA IMPLEMENTATION OF PRIORITYARBITER BASED ROUTER DESIGN FOR NOC SYSTEMS
 
FPGA IMPLEMENTATION OF PRIORITYARBITER BASED ROUTER DESIGN FOR NOC SYSTEMS
FPGA IMPLEMENTATION OF PRIORITYARBITER BASED ROUTER DESIGN FOR NOC SYSTEMSFPGA IMPLEMENTATION OF PRIORITYARBITER BASED ROUTER DESIGN FOR NOC SYSTEMS
FPGA IMPLEMENTATION OF PRIORITYARBITER BASED ROUTER DESIGN FOR NOC SYSTEMS
 
A NOVEL ROBUST ROUTER ARCHITECTURE
A NOVEL ROBUST ROUTER ARCHITECTURE A NOVEL ROBUST ROUTER ARCHITECTURE
A NOVEL ROBUST ROUTER ARCHITECTURE
 
The core skills of 4g wireless industrial router with ethernet
The core skills of 4g wireless industrial router with ethernetThe core skills of 4g wireless industrial router with ethernet
The core skills of 4g wireless industrial router with ethernet
 
Crowd management system
Crowd management systemCrowd management system
Crowd management system
 
Routing simulator
Routing simulatorRouting simulator
Routing simulator
 
5 ijcse-01219
5 ijcse-012195 ijcse-01219
5 ijcse-01219
 
Fpga based design and implementation of noc torus
Fpga based design and implementation of noc torusFpga based design and implementation of noc torus
Fpga based design and implementation of noc torus
 
A018120105
A018120105A018120105
A018120105
 
Implementation of switching controller for the internet router
Implementation of switching controller for the internet routerImplementation of switching controller for the internet router
Implementation of switching controller for the internet router
 
Design and Performance Analysis of 8 x 8 Network on Chip Router
Design and Performance Analysis of 8 x 8 Network on Chip RouterDesign and Performance Analysis of 8 x 8 Network on Chip Router
Design and Performance Analysis of 8 x 8 Network on Chip Router
 

Public Seminar_Final 18112014

  • 1. Hossam El-Sayed Abdel-Fadeel, M.Sc. Student, ECE Department, E-JUST; Research Assistant, NTI. Email: hossam.fadeel@ejust.edu.eg, hossam.fadeel@nti.sci.eg. Supervised by: Prof. M. Ragab, Assoc. Prof. Maha El-Sabarouty, Assoc. Prof. V. Goulart, and Assist. Prof. Mohammed Sharaf. December 2, 2013.
  • 2. Outline: Motivation; Related Work; Base Router Architecture; Flexible Router Architecture; Evaluation and Experiments; Conclusion.
  • 3. Why Network on Chip?
      – Process technology scaling:
          • Transistor density increases, so many Processing Elements (PEs) fit on a single chip.
          • The computational performance of digital systems increases.
          • BUT global wiring delay also increases (wire speed does not scale).
      – Design concept:
          • Many PEs need to be interconnected.
          • A structured and scalable on-chip communication architecture is needed.
          • The design focus shifts from computation-centric to communication-centric.
  • 4. What is Network on Chip (NoC)?
      – NoC = Routers + Links.
          • Network topology: how the nodes are connected to each other.
          • Routing algorithm: how packets move from source to destination.
          • Flow control: controls the transmission of packets between routers.
          • Router architecture: buffers, arbiters, crossbars, etc.
      – Buffer requirements in a router: buffers store arriving packets or flits.
  • 5. Buffers in NoC Routers
      – Why buffering?
          • Packets wait for routing decisions.
          • Packets contend for the same output channel.
          • The downstream router may be congested.
      – Large buffers improve throughput and latency, but at the cost of:
          • Area: high hardware resource overhead.
          • Power: buffers are large energy consumers, accounting for about 64% of the total router leakage power.
      – Buffer resources therefore need to be used efficiently, through careful management of the available buffers.
      – Several architectures and implementations have been proposed.
  • 6. Related Work
      – Central Buffer Sharing Method: all ports share a central buffer. It improves performance, but at the cost of area overhead and control complexity.
      – Distributed Shared Buffer: it improves throughput, but at the cost of power and area overhead.
  • 7. Flexible Router Approach (1/3)
      – Goal: improve the performance of the overall network by modifying the router architecture so that the same amount of buffering is used more efficiently.
      – If there is contention at an input port, the Flexible Router tries to allocate a suitable free buffer in one of the router's other input ports.
      – There is no need to increase the buffer size or to add extra virtual channels (VCs).
  • 8. Flexible Router Approach (2/3): Base Router Congestion Problem
      – [Figure: base router with East, West, South, and North input ports; two buffers are marked busy.]
      – Packets requesting a busy buffer are blocked.
  • 9. Flexible Router Approach (3/3)
      – Instead of waiting for a busy buffer to become free, the router looks for another one, which increases the number of packets moving through the router.
      – The design of the Flexible Router is similar to the base router except for the functionality and modules added to the input ports.
      – Features of the Flexible Router: efficient buffer utilization, enhanced packet throughput, and low hardware resource overhead.
      – [Figure: flexible router with East, West, South, and North input ports; busy buffers are marked.]
  • 10. Base Router Architecture
  • 11. Input Port Module (Base Router)
      – [Figure: block diagram of the input port.]
      – Blocks: Routing Computation (RC), FIFO buffer, FIFO Controller.
      – Signals: ReqUpStr, GntUpStr, ReqInt(3:0), GntInt(3:0), ReqInCnt, GntInCnt, Empty, Full, PacketIn, PacketInCnt, IntPacket, ReadEn, WriteEn, ReadAddr, WriteIAddr.
  • 12. Output Port Module
      – [Figure: block diagram of the output port.]
      – Blocks: Mux, Round Robin Arbiter.
      – Signals: gnt[1:0], ReqInt(3:0), GntInt(3:0), ReqDnStr, GntDnStr, fullDnStr, PacketIn 0, PacketIn 1, PacketIn 2, PacketIn 3, PacketOut.
  • 13. Basic Operation of Base Router
      – [Figure: receiving flowchart of the downstream (DS) router and sending flowchart of the upstream (US) router.]
      – Signals shown between the output port of the upstream router and the input port of the downstream router: Full_US, Request_US, Grant_US, PacketIn_US.
      – Signals shown between the input port of the upstream router and the output port of the downstream router: Full_DS, Request_DS, Grant_DS, PacketOut_DS.
  • 14. Flexible Router Architecture
  • 15. Input Port Module (Flexible Router)
      – [Figure: block diagram of the East input port.]
      – Blocks: Routing Computation (RC), FIFO buffer, FIFO Flexibility Controller, MUX.
      – Signals: ReqUpStr, GntUpStr, ReqInt(3:0), GntInt(3:0), ReqInCnt, GntInCnt, Empty, Full, PacketIn, IntPacket, ReadEn, WriteEn, ReadAddr, WriteIAddr, Req_FFCE_FIFO_W,N,S, Gnt_FFCE_FIFO_W,N,S.
      – MUX inputs: EastPacket and packets from other ports.
  • 16. Basic Operation of Flexible Router
      – The FIFO Flexibility Controller (FFC) requests the other FIFOs in sequential order.
      – Pseudocode for the East FFC:
            if (FIFO West is not full)       { send request and wait for grant; }
            else if (FIFO North is not full) { send request and wait for grant; }
            else if (FIFO South is not full) { send request and wait for grant; }
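
The sequential search in the East FFC pseudocode can be sketched in software as follows. This is only an illustrative model, not the authors' Verilog: the function name `pick_alternative_fifo` and the `SEARCH_ORDER` table are hypothetical, and only the East order (West, North, South) comes from the slide.

```python
from typing import Dict, Optional

# Search order per input port. Only the East entry is taken from the slide;
# other ports would need their own (unspecified) orders.
SEARCH_ORDER = {
    "E": ["W", "N", "S"],
}


def pick_alternative_fifo(port: str, full: Dict[str, bool]) -> Optional[str]:
    """Return the first non-full FIFO in the port's search order, or None."""
    for candidate in SEARCH_ORDER[port]:
        if not full[candidate]:
            # In hardware the FFC would send a request here and wait for a grant.
            return candidate
    # Every alternative FIFO is full, so the packet stays blocked.
    return None


if __name__ == "__main__":
    # West is full, so the East FFC falls through to North.
    print(pick_alternative_fifo("E", {"W": True, "N": False, "S": False}))
```

The sketch models only the priority order; the request/grant handshake and the per-buffer direction restrictions are left out.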
  • 17. Possible Packet Directions
      – Applying the turn model to the Flexible Router under XY routing avoids deadlock.
      – Under XY routing, the packet directions that each buffer in the Flexible Router can store are:
          • North buffer: packets directed to Local or South.
          • South buffer: packets directed to Local or North.
          • East buffer: packets directed to Local, North, South, or West.
          • West buffer: packets directed to Local, North, South, or East.
      – Packets directed to the local port have reached their destination and are absorbed directly by the local port.
  • 18. NoC parameters used in this work: a 64-bit, 5-input-buffer router.
      System Architecture | Selected Logic | Why used
      Arbitration | Round Robin | Fairness
      Switching | Store-And-Forward | Simplicity and proof of concept
      Routing Algorithm | XY-DOR routing | Minimizes area and control overhead; deadlock-free routing
      Topology | Mesh | Most common for 2D chips
      Packet/Flit size | 64 bits | Can vary from 32 to 264 bits
      Buffer Size | 2, 4, 8 packets | Small, to see the utilization
      Traffic Patterns | Uniform Random, Hot-Spot, and Nearest-Neighbor | Performance evaluation
  • 19. XY Dimension-Ordered Routing (XY-DOR)
      – [Figure: mesh showing the route from source S to destination D.]
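
As a rough illustration of XY dimension-ordered routing, the sketch below moves a packet along X until the column matches, then along Y. The coordinate convention (x grows toward East, y grows toward North) and the names `xy_route` and `xy_path` are assumptions made for this example; the slides do not define them.

```python
# Assumed convention: x increases toward East, y increases toward North.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_route(cur, dst):
    """Return the output port for one hop: resolve X first, then Y."""
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "L"  # already at the destination: deliver to the local port


def xy_path(src, dst):
    """Return the list of ports taken from src to dst, ending with 'L'."""
    hops = []
    cur = src
    while True:
        port = xy_route(cur, dst)
        hops.append(port)
        if port == "L":
            return hops
        cur = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])


if __name__ == "__main__":
    print(xy_path((0, 0), (2, 1)))
```

Because every X hop precedes every Y hop, a packet never turns from the Y dimension back into the X dimension, which is the property the turn-model argument relies on.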
  • 20. Performance Parameters
      – Latency is the time elapsed from when a packet enters the network until its last flit reaches its destination.
      – Throughput is the rate at which packets are delivered by the network for a particular traffic pattern.
      – Many factors affect these parameters:
          • Topology: determines how the system is connected and its size (the number of nodes).
          • Injection rate: the rate at which packets are injected into the simulator, that is, how many packets are injected per simulation cycle per node on average.
          • Flow control: refers to the number of virtual channels per physical channel and the depth of each virtual channel, measured in flits.
  • 21. Evaluation Approach
      – A cycle-accurate NoC simulation system was developed in Verilog HDL to evaluate the performance of the Flexible Router.
      – Synthesis environment:
          • Xilinx ISE 14.1; target platform: Xilinx Virtex-5 xc5vfx70t-1ff1136 FPGA.
          • Cadence SoC Encounter® Digital Implementation System (Encounter RTL Compiler®) with 180nm technology.
  • 22. Simulation Platform (1/3)
      PE Information | Where | Function
      Send Time | Sender Log | Cycle counter of each sent packet
      Receive Time | Receiver Log | Cycle counter of each received packet
      PE Module Sender ID | Sender and Receiver Log | The PE Module ID of the sender
      PE Module Receiver ID | Receiver Log | The PE Module ID of the receiver
      Packet ID | Sender and Receiver Log | The ID of the transmitted packet
      – [Figure: packet injector flow chart.]
  • 23. Simulation Platform (2/3)
      – [Figure: simulation flow.] Verilog RTL model and Verilog testbench → simulation compiler (ModelSim or ISim) → simulation results (log files, waveform) → Matlab → simulation graphs.
      – Matlab calculates the following:
          • Average latency for all the packets in the simulation system.
          • Average throughput for all the packets in the simulation system.
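
The post-processing step can be sketched as below. The slides use Matlab and do not show the script, so this Python version is an assumption-laden stand-in: the logs are modelled as dictionaries from packet ID to cycle count, latency is taken as receive time minus send time, and throughput is taken as delivered packets per cycle per PE to match the axis units of the plots. The names `average_latency` and `throughput` are hypothetical.

```python
def average_latency(send_log, recv_log):
    """Mean of (receive cycle - send cycle) over all delivered packets.

    Both logs map packet ID -> cycle count; packets that were sent but
    never received are ignored.
    """
    latencies = [recv_log[pid] - send_log[pid] for pid in recv_log]
    return sum(latencies) / len(latencies)


def throughput(recv_log, total_cycles, num_pes):
    """Delivered packets per cycle per PE (assumed definition)."""
    return len(recv_log) / (total_cycles * num_pes)


if __name__ == "__main__":
    send = {1: 0, 2: 5, 3: 10}   # packet 3 is still in flight
    recv = {1: 12, 2: 19}
    print(average_latency(send, recv))
    print(throughput(recv, total_cycles=100, num_pes=4))
```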
  • 24. Simulation Platform (3/3)
      – Most performance analyses use synthetic traffic patterns with different characteristics.
      – Simulation was done under three traffic patterns:
          • Uniform (UNI): all traffic is equally distributed between all nodes. This is the most commonly used pattern for network evaluation because it is straightforward to implement and makes no assumptions about the application.
          • Nearest-Neighbor (NN): each node sends only to its neighbor nodes.
          • Hotspot (HS): 90% of the traffic is directed to the hotspot node at (2, 2), and the rest is equally distributed between all other nodes.
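
One way to picture the three patterns is as a destination-selection function for a packet injector. The sketch below is an assumption, not the testbench used in the work: the mesh size is a free parameter (the slides do not state it here), a node never sends to itself, and the hotspot node itself falls back to uniform traffic. Only the hotspot coordinates (2, 2) and the 90% share come from the slide.

```python
import random

HOTSPOT = (2, 2)          # hotspot node given on the slide
HOTSPOT_FRACTION = 0.9    # share of traffic directed to the hotspot


def mesh_nodes(size):
    """All (x, y) coordinates of a size x size mesh."""
    return [(x, y) for x in range(size) for y in range(size)]


def neighbors(node, size):
    """Nodes one hop away from `node` inside the mesh."""
    x, y = node
    cands = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(nx, ny) for nx, ny in cands if 0 <= nx < size and 0 <= ny < size]


def pick_destination(pattern, src, size, rng):
    """Choose a destination for one packet under the given traffic pattern."""
    others = [n for n in mesh_nodes(size) if n != src]
    if pattern == "UNI":
        return rng.choice(others)
    if pattern == "NN":
        return rng.choice(neighbors(src, size))
    if pattern == "HS":
        if src != HOTSPOT and rng.random() < HOTSPOT_FRACTION:
            return HOTSPOT
        # Remaining traffic is spread evenly over the non-hotspot nodes.
        return rng.choice([n for n in others if n != HOTSPOT])
    raise ValueError(f"unknown traffic pattern: {pattern}")


if __name__ == "__main__":
    rng = random.Random(0)
    print([pick_destination("NN", (1, 1), 4, rng) for _ in range(3)])
```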
  • 25. Uniform Random Traffic
      – [Figure: six plots of average latency (cycles) and throughput (packets/cycle/PE) versus packet injection rate (packets/cycle/PE), for buffer sizes 2, 4, and 8, each comparing the Base Router and the Flexible Router.]
  • 26. Nearest Neighbor Traffic
      – [Figure: six plots of average latency (cycles) and throughput (packets/cycle/PE) versus packet injection rate (packets/cycle/PE), for buffer sizes 2, 4, and 8, each comparing the Base Router and the Flexible Router.]
      – Under Nearest Neighbor traffic each injector sends packets only to its neighbors, so all injected packets are served by the routers and no congestion occurs. Throughput therefore grows linearly with the injection rate.
  • 27. Hot Spot Traffic
      – [Figure: six plots of average latency (cycles) and throughput (packets/cycle/PE) versus packet injection rate (packets/cycle/PE), for buffer sizes 2, 4, and 8, each comparing the Base Router (BR) and the Flexible Router (FR).]
      – Apart from a higher saturation point, the improvement under HS traffic is slight, because HS packets are injected faster than they can be collected and come to occupy all network buffer space.
      – Modifying the FR architecture to suit this type of traffic is a possible direction for future work.
  • 28. Synthesis Results (1/2)
      – Area and maximum frequency of the Flexible and Base Routers, obtained with the Xilinx ISE® Synthesis Tool (XST) targeting the Virtex-5 FPGA xc5vfx70t-1ff1136.
      AREA RESULTS OF XILINX FPGA (number of resources used)
      FPGA resource | Base BUF2 | Base BUF4 | Base BUF8 | Flexible BUF2 | Flexible BUF4 | Flexible BUF8
      LUTs | 657 | 776 | 836 | 1078 | 1111 | 1112
      FFs | 425 | 430 | 440 | 473 | 474 | 493
      FREQUENCY RESULTS OF XILINX FPGA
      Metric | Base BUF2 | Base BUF4 | Base BUF8 | Flexible BUF2 | Flexible BUF4 | Flexible BUF8
      Max Frequency (MHz) | 164 | 150 | 150 | 141 | 139 | 141
      – The increase in area is considered acceptable given the logic added for flexibility and the available FPGA resources.
      – The maximum clock frequency decreases in the Flexible Router because of the flexibility units, but its gains in throughput and latency outweigh this impact.
  • 29. Synthesis Results (2/2)
      – Obtained with the Cadence Encounter RTL Compiler tool and a 180nm standard cell library.
      – Power dissipation and area overhead are obtained for each case at typical operating conditions for 180nm technology: 25 °C, 1.8 V, typical transistor model.
      – Both dynamic and leakage power estimates were extracted from the synthesized router implementation, assuming 50% uniform switching activity on all router input ports.
      AREA AND POWER RESULTS FOR 180NM TECHNOLOGY
      Configuration | Cell area (cells) | Leakage power (µW) | Switching power (µW)
      Base | 557963.68 | 1.015 | 51421.04
      Flexible | 661936.7 | 1.15 | 56372.29
      Overhead | 18% | 13% | 9.6%
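
The overhead row can be checked from the Base and Flexible rows with a one-line relative-difference formula. The helper name `overhead_percent` is introduced here only for illustration; the input values are copied from the 180nm table.

```python
def overhead_percent(base, flexible):
    """Relative increase of the Flexible Router over the Base Router, in percent."""
    return (flexible - base) / base * 100


if __name__ == "__main__":
    results = {
        "cell area": (557963.68, 661936.7),
        "leakage power": (1.015, 1.15),
        "switching power": (51421.04, 56372.29),
    }
    for name, (base, flexible) in results.items():
        print(f"{name}: {overhead_percent(base, flexible):.1f}%")
```

Computed this way, the overheads come out at roughly 18.6%, 13.3%, and 9.6%, which agrees with the 18%, 13%, and 9.6% reported in the table if the first two are rounded down.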
  • 30. Analysis
      – Experimental results show that the Flexible Router increases throughput and reduces latency.
      – At low injection rates, the Base and Flexible Routers have nearly the same performance.
      – At high injection rates, the Flexible Router performs better because its flexibility is exercised.
      – The Flexible Router has a higher saturation point than the Base Router.
      – For UNI traffic, the Flexible Router allows a 15% higher injection rate, in addition to improved performance at higher rates.
      – For HS and NN traffic the improvement is small (a higher saturation point), especially for HS.
          • HS: packets are injected faster than they can be collected, and HS packets come to occupy all network buffer space.
          • NN: each injector sends packets only to its neighbors, so all injected packets are served by the routers, no congestion occurs, and throughput grows linearly with the injection rate.
  • 31. Future Work
      – Decrease the communication overhead due to the FFC.
      – Support hot-spot traffic by modifying the FFC.
      – Implement and evaluate the Flexible Router with virtual channels and with other switching techniques such as Virtual Cut-Through and Wormhole.
      – Explore the Flexible Router for 3-D Network on Chip.
      – Add more real-world example implementations.
      – Support dynamically reconfigurable systems.
  • 32. Published Papers
      [1] Hossam El-Sayed, Mohammed Ragab, Mohammed S. Sayed, and Victor Goulart, "Hardware Implementation and Evaluation of the Flexible Router Architecture for NoCs," 20th IEEE-ICECS International Conference on Electronics, Circuits, and Systems, UAE, Dec. 2013. (Accepted as lecture.)
      [2] Hossam El-Sayed, Ahmed Shalaby, Mostafa Said, Mohammed S. Sayed, Mohammed Ragab, and Victor Goulart, "Performance Evaluation of Flexible Router Architecture for NoCs," 24th International Conference on Field Programmable Logic and Applications, Munich, Germany, September 2-4, 2014. (Submitted.)
  • 33. Acknowledgment: Maher Abdelrasoul, Ahmed Shalaby, Mostafa Said.