3. The General Design Flow
Design flow is a set of procedures that allows
designers to progress from a specification to the
final implementation in an error-free manner
The general design flow is shown below:
[Figure: general design flow chart. Callouts mark the heart of the front-end and the heart of the back-end.]
4. RTL Synthesis Flow
RTL synthesis flow chart:
[Figure: RTL synthesis flow chart, with the following callouts]
Convert RTL description to generic gates and registers and then optimize the logic to improve speed and area
This block is often used in ASIC (cell-based design) but not FPGA
Insert or modify logic and registers to aid in manufacturing test
Static timing analysis checks the temporal requirement of the design and power analysis estimates the power consumption of the circuit
5. Physical Synthesis Flow
Divided into placement and routing stages:
Stages and sub-divisions:
Placement (logic cells are placed at fixed positions to minimise total area and wire length)
  Partitioning: Partitions the circuit into parts
  Floorplanning: Determines the location of each module in a rectangular chip area
  Placement: Finds the best position of each module
Routing (completes the connections of signal nets among the cell modules placed by placement)
  Global: Decomposes large routing problems into small manageable sub-problems
  Detailed: Carries out the actual connections of signal nets among modules
6. Timing-Driven Placement
In place and route (PAR), timing information can only be obtained
after layout has been completed
If the post-layout timing requirement is not met,
we have to go back to logic synthesis and start
again
Timing-driven placement solves this problem by
incorporating timing analysis into the placement
stage
The critical path can then be placed into the
layout with priority
Some terms used:
Term: Definition
Arrival time: Time elapsed for a signal to arrive at a certain point
Required time: Latest time at which a signal can arrive without making the clock cycle longer than desired
Slack time: Difference between required time and arrival time
8. Timing-Driven Placement
Features of the slack time include:
The path with the smallest slack is called the
critical path
A negative slack means the signal arrives later than required,
so the associated path is critical
As a result, the slack time is used to analyze the
critical path of the design
Another use of slack time is in timing-dependent
algorithms, one of which is known as the zero-slack
algorithm
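A minimal Python sketch of the slack definitions above. The path names and all the numbers are hypothetical values chosen for illustration; slack is taken as required time minus arrival time.

```python
def slack(required_time, arrival_time):
    """Slack = required time - arrival time; negative means the signal is late."""
    return required_time - arrival_time

# Hypothetical (required, arrival) pairs, for illustration only.
paths = {"p1": (14, 11), "p2": (16, 10), "p3": (10, 12)}

slacks = {name: slack(req, arr) for name, (req, arr) in paths.items()}
critical = min(slacks, key=slacks.get)            # path with the smallest slack
violations = [name for name, s in slacks.items() if s < 0]
print(slacks, critical, violations)
```

Here p3 has the smallest (and negative) slack, so it is the path a timing-driven placer would treat with priority.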
9. Zero-slack algorithm
The assumptions made include:
Signal-arriving time at each primary input and the
time a signal is required at the primary outputs are
known
Algorithm:
begin
  repeat
    compute all slacks;
    find the minimum positive slack;
    find a path segment whose slacks all equal the minimum slack;
    distribute the slack along the path segment;
  until (no positive slack exists);
end
The purpose of the algorithm is to compute
and distribute the slack time evenly in the
interconnect along the paths from the primary
inputs to the primary outputs
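One way to read the pseudocode above, as a Python sketch rather than a reference implementation. Assumptions not in the slides: the circuit is a DAG whose nodes are listed in topological order, the slack is handed out as extra delay budget on the nodes of the chosen path segment, and the names timing and zero_slack and the four-node example are made up for illustration.

```python
from fractions import Fraction

def timing(delay, edges, pi_arrival, po_required):
    """Return (arrival, slack, preds, succs); keys of `delay` must be in topological order."""
    nodes = list(delay)
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
    arrival, required = {}, {}
    for n in nodes:                      # forward pass: arrival time at each node output
        start = max((arrival[p] for p in preds[n]), default=pi_arrival.get(n, 0))
        arrival[n] = start + delay[n]
    for n in reversed(nodes):            # backward pass: required time at each node output
        required[n] = min((required[s] - delay[s] for s in succs[n]),
                          default=po_required.get(n))
    slack = {n: required[n] - arrival[n] for n in nodes}
    return arrival, slack, preds, succs

def zero_slack(delay, edges, pi_arrival, po_required):
    """Grow node delay budgets until no node has positive slack."""
    delay = {n: Fraction(d) for n, d in delay.items()}   # exact arithmetic
    while True:
        arrival, slack, preds, succs = timing(delay, edges, pi_arrival, po_required)
        positive = [n for n in delay if slack[n] > 0]
        if not positive:
            return delay
        s_min = min(slack[n] for n in positive)
        path = [next(n for n in positive if slack[n] == s_min)]
        while True:                      # extend the segment towards the outputs
            tail = path[-1]
            nxt = [w for w in succs[tail]
                   if slack[w] == s_min and arrival[w] == arrival[tail] + delay[w]]
            if not nxt:
                break
            path.append(nxt[0])
        while True:                      # extend the segment towards the inputs
            head = path[0]
            prv = [p for p in preds[head]
                   if slack[p] == s_min and arrival[head] == arrival[p] + delay[head]]
            if not prv:
                break
            path.insert(0, prv[0])
        share = s_min / len(path)        # distribute the slack evenly along the segment
        for n in path:
            delay[n] += share

# Hypothetical example: a and b feed c, c feeds the output d (required at time 10).
EDGES = [("a", "c"), ("b", "c"), ("c", "d")]
budget = zero_slack({"a": 2, "b": 1, "c": 3, "d": 1}, EDGES, {"a": 0, "b": 0}, {"d": 10})
final_slack = timing(budget, EDGES, {"a": 0, "b": 0}, {"d": 10})[1]
```

In this example the first pass spreads a slack of 4 over the segment a, c, d, and the second pass gives the remaining slack to b, after which every slack is zero.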
10. Zero-slack algorithm
An example of the zero-slack algorithm:
[Figure: the zero-slack algorithm worked on an example network with signals u, v, w, x, y’ and z’, shown over several iterations. The two output labels appear as 14/-/- and 16/-/-, 14/11/3 and 16/10/6, 14/14/0 and 16/10/6, and 14/14/0 and 16/16/0; internal labels include 8/5/3, 15/9/6, 12/9/3, 5/5/0, 10/10/0 and 12/2/10. The three values in each label are consistent with required time / arrival time / slack.]
11. Zero-slack algorithm
Final placement results from the zero-slack algorithm:
[Figure: final placement of the example network, with labels v, f1, f2, u, z’, x, y’ and w]
Nets with a higher slack time may use longer
wires, and nets with a lower slack time may use
shorter wires
12. Design Environment and Constraints
Both design environment and constraints are
required for a design to be synthesized
They must be provided along with the RTL codes and
technology library to the synthesis tool
Component: Description
Environment: Provides process parameters and I/O port attributes
Constraints: Provide clock-related constraints, I/O delays and timing exceptions
13. Logic Synthesis
Architecture of logic synthesizers:
[Figure: architecture of a logic synthesizer, with the following callouts]
Checks the syntax of the source code and creates internal components to be used in the next phase
Connects all internal components, unrolls loops, expands generate-loops, initializations etc
Manages design hierarchy, extracts FSMs, explores resource sharing
This is the heart of the synthesizer
It creates a new gate network which computes the functions specified by a set of Boolean functions, one per primary output
14. Logic Synthesis
The general operations involved in technology-independent
synthesis are grouped into:
Restructuring operations: includes operations that
modify the structure of the Boolean network by
introducing new nodes and eliminating others
Node minimization: includes operations that simplify
the logic equations associated with nodes
16. Restructuring Operations
Extraction: this is related to decomposition but
operates on a number of given functions
With extraction, the given functions are expressed
in terms of newly created intermediate functions
and variables
For example
f = xyz+uw
g = xyz+uv
xyz can be extracted and denoted by a new function
h
therefore:
h = xyz
f = h+uw
g = h+uv
17. Restructuring Operations
Factorization: takes three steps
Generate all potential common factors
Choose which factors to substitute into the network
Reconstruct the network by adding the new factors
For example
f = wyz+xyz+uv can be factored into:
f = yz(w+x)+uv
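The extraction and factorization examples above can be checked by exhaustive truth-table comparison. This is only a sanity check written for these two examples; the helper name equivalent is hypothetical.

```python
from itertools import product

def equivalent(f, g, names):
    """True if two Boolean functions agree on every assignment of `names`."""
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if f(**env) != g(**env):
            return False
    return True

def h(x, y, z):
    """Intermediate function introduced by extraction: h = xyz."""
    return x and y and z

# Extraction: f = xyz+uw and g = xyz+uv rewritten in terms of h.
extraction_f_ok = equivalent(
    lambda x, y, z, u, v, w: (x and y and z) or (u and w),
    lambda x, y, z, u, v, w: h(x, y, z) or (u and w),
    "xyzuvw",
)
extraction_g_ok = equivalent(
    lambda x, y, z, u, v, w: (x and y and z) or (u and v),
    lambda x, y, z, u, v, w: h(x, y, z) or (u and v),
    "xyzuvw",
)

# Factorization: f = wyz+xyz+uv rewritten as yz(w+x)+uv.
factorization_ok = equivalent(
    lambda w, x, y, z, u, v: (w and y and z) or (x and y and z) or (u and v),
    lambda w, x, y, z, u, v: (y and z and (w or x)) or (u and v),
    "wxyzuv",
)
print(extraction_f_ok, extraction_g_ok, factorization_ok)
```

Exhaustive comparison is only practical for small examples like these, since the number of assignments doubles with every variable.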
18. Multilevel Logic Synthesis
The motivation behind multilevel logic synthesis is
that:
Many functions are too
expensive in terms of area and propagation time to
implement in two-level logic
For example, consider the two logic functions:
f(w,x,y,z) = wx+xy+xz
g(w,x,y,t) = wx’+x’y+x’t
The above functions contain 12 literals and need a
total of 8 gates to implement
19. Multilevel Logic Synthesis
The hardware requirement is reduced by using a three-level
logic structure, obtained by factoring both functions
and extracting the common factor
f(w,x,y,z) = wx+xy+xz = x(w+y)+xz = x.k+xz
g(w,x,y,t) = wx’+x’y+x’t = x’(w+y)+x’t = x’.k+x’t
k(w,y) = w+y
The result has 10 literals and needs only 7 gates
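The literal and gate counts quoted for this example can be reproduced with a small Python sketch. The counting rules are assumptions made for illustration: every variable occurrence is one literal, each product term with more than one literal costs one AND gate, each sum of several terms costs one OR gate, and inverters are not counted.

```python
import re

def literal_count(sop):
    """Count literal occurrences in a sum-of-products string of single-letter variables."""
    return len(re.findall(r"[a-z]", sop))

def gate_count(sop):
    """One AND gate per multi-literal product term, plus one OR gate if there are several terms."""
    terms = sop.split("+")
    and_gates = sum(1 for t in terms if literal_count(t) > 1)
    or_gates = 1 if len(terms) > 1 else 0
    return and_gates + or_gates

two_level = ["wx+xy+xz", "wx'+x'y+x't"]          # f and g before factoring
three_level = ["xk+xz", "x'k+x't", "w+y"]        # f, g and the shared factor k

before = (sum(map(literal_count, two_level)), sum(map(gate_count, two_level)))
after = (sum(map(literal_count, three_level)), sum(map(gate_count, three_level)))
print(before, after)
```

Under these rules the two-level form costs 12 literals and 8 gates, and the factored form costs 10 literals and 7 gates.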
20. Multilevel Logic Synthesis
The general operations involved in multilevel
synthesis are:
Minimizing two-level logic function
Finding common sub-expressions
Substituting one expression into another
Factoring single functions
One fundamental approach used in multilevel logic
synthesis is the kernel approach
21. Multilevel Logic Synthesis
The kernel approach (terminology):
The divisors of f are defined as the set:
D(f) = {g | f/g ≠ ∅}
The primary divisors of f are defined as the set:
P(f) = {f/c | c is a cube}
For example, if f = wxy+wxzt, then
f/w = xy+xzt is a primary divisor
Every divisor of f is contained in a primary
divisor
An expression is cube-free if no cube divides the
expression evenly; for example, xy+z is cube-free but xy+xz
is not
22. Multilevel Logic Synthesis
The kernels of f are the cube-free primary divisors of
f:
K(f) = {k | k ∈ P(f), k is cube-free}
For f = wxy+wxzt, f/w = xy+xzt is a
primary divisor but is not cube-free, since x is a
common factor: f/w = x(y+zt)
A cube c used to obtain the kernel k = f/c is
called a cokernel of k
Example: f/wx = y+zt is a kernel and wx is its
cokernel
23. Multilevel Logic Synthesis
Kernel and cokernel:
Consider the function f(w,x,y,z) = xz+yz+wxy
There are 7 literals; the cokernels and
kernels are found as follows:
f/w = xy
f/x = z+wy
f/y = z+wx
f/z = x+y
w is not a cokernel because xy is not cube free
The cokernel set is: {x,y,z}
The kernel set is: {z+wy, z+wx, x+y}
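The division and cube-free test used above can be sketched in Python. This is an illustration under assumptions: expressions use only uncomplemented single-letter variables, only single-literal cubes are tried as cokernels (enough to reproduce this example), and the helper names parse, divide, is_cube_free and show are hypothetical.

```python
def parse(sop):
    """'xz+yz+wxy' -> set of cubes, each cube a frozenset of single-letter literals."""
    return {frozenset(term) for term in sop.split("+")}

def divide(f, cube):
    """Quotient f/cube: the cubes of f that contain `cube`, with `cube` removed."""
    return {c - cube for c in f if cube <= c}

def is_cube_free(f):
    """Cube-free: at least two cubes and no literal common to all of them."""
    return len(f) > 1 and not frozenset.intersection(*f)

def show(f):
    """Render a set of cubes as a canonical sum-of-products string."""
    return "+".join(sorted("".join(sorted(c)) for c in f))

f = parse("xz+yz+wxy")
quotients = {lit: show(divide(f, frozenset(lit))) for lit in "wxyz"}
kernels = {lit: show(divide(f, frozenset(lit)))
           for lit in "wxyz" if is_cube_free(divide(f, frozenset(lit)))}
print(quotients)
print(kernels)
```

The quotient f/w = xy is a single cube and is rejected, leaving the cokernels x, y and z with kernels wy+z, wx+z and x+y, in agreement with the sets listed above.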
24. Multilevel Logic Synthesis
Consider: f(t,u,v,w,x,y,z) = twy+txy+uwy+uxy+vwy+vxy+z
There are 19 literals; use the tables of pairwise cube intersections below to find
the cokernels:
      twy  txy  uwy  uxy  vwy  vxy
twy   *
txy   ty   *
uwy   wy   y    *
uxy   y    xy   uy   *
vwy   wy   y    wy   y    *
vxy   y    xy   y    xy   vy   *

      ty   uy   vy   wy   xy
ty    *
uy    y    *
vy    y    y    *
wy    y    y    y    *
xy    y    y    y    y    *
Combining the two cokernel sets, we get:
{ty, uy, vy, wy, xy, y}
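The two intersection tables can be generated mechanically. The Python sketch below is only an illustration of that table-building step: cubes are modelled as sets of single-letter literals, and the function name intersections is hypothetical.

```python
from itertools import combinations

def intersections(cubes):
    """All non-empty pairwise intersections of a collection of cubes."""
    return {a & b for a, b in combinations(cubes, 2) if a & b}

cubes = [frozenset(term) for term in "twy+txy+uwy+uxy+vwy+vxy+z".split("+")]

first_table = intersections(cubes)           # intersections of the cubes of f
second_table = intersections(first_table)    # intersections of those results
candidates = first_table | second_table

cokernels = sorted("".join(sorted(c)) for c in candidates)
print(cokernels)
```

The cube z shares no literal with the other cubes, so it contributes nothing, and the combined set is ty, uy, vy, wy, xy and y.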
25. Multilevel Logic Synthesis
The kernels corresponding to the cokernels are as
follows:
f/ty = w+x = K1
f/uy = w+x = K1
f/vy = w+x = K1
f/wy = t+u+v = K2
f/xy = t+u+v = K2
f/y = tw+tx+uw+ux+vw+vx
= t(w+x)+u(w+x)+v(w+x) = (w+x)(t+u+v) = K3 = K1K2
The function can be reduced to:
f = K3y+z = (w+x)(t+u+v)y+z
27. Multilevel Logic Synthesis
A multi-output example:
Consider the following functions:
F1(t,u,v,w) = tv+tw+uv+uw
F2(v,w,x,y) = vxy’+wxy’
F3(u,v,w,x,y,z) = uv+uw+z’
There are 19 literals, the cokernel set is as
follows:
Cf1 = {t,u,v,w}
Cf2 = {xy’}
Cf3 = {u}
The kernel set is as follows
Kf1 = {v+w,t+u}
Kf2 = {v+w}
Kf3 = {v+w}
28. Multilevel Logic Synthesis
From the kernel list, the common divisor is v+w
Therefore the functions can be modified to:
f1(t,u,v,w) = g(t+u)
f2(v,w,x,y) = gxy’
f3(u,v,w,x,y,z) = gu+z’
Where g = v+w
The logic circuit is given below
[Figure: logic circuit with inputs t, u, v, w, x, y’ and z’, the shared node g, and outputs f1, f2 and f3]
29. Technology Dependent Synthesis
Involves finding minimum cost covering of boolean
network by choosing from the collection of
primitive logic elements in the target library
Optimization is done for both area and delay
A simple approach for LUT-based FPGA architecture
is introduced
30. Technology Dependent Synthesis
For LUT-based FPGA architecture, we assume that
each LUT has at most 4 inputs
The following steps are then followed:
The network is decomposed into nodes with at most 4
inputs
Reduce the number of nodes by combining some of them
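A rough Python sketch of the first step only (decomposition into nodes with at most 4 inputs), for the special case of a single wide AND or OR gate. The function name decompose, the node names n0, n1, ... and the 10-input example are assumptions made for illustration; the node-combining step is not modelled.

```python
def decompose(inputs, k=4):
    """Split one wide associative gate into a tree of nodes with at most k inputs.

    Returns a list of (node_name, fan_in) pairs; the last node drives the output.
    """
    nodes = []
    signals = list(inputs)
    while len(signals) > k:
        group, signals = signals[:k], signals[k:]
        name = f"n{len(nodes)}"
        nodes.append((name, group))
        signals.append(name)          # the new node feeds a later node
    nodes.append((f"n{len(nodes)}", signals))
    return nodes

# Hypothetical 10-input gate mapped onto 4-input LUTs.
luts = decompose([f"i{j}" for j in range(10)])
for name, fan_in in luts:
    print(name, fan_in)
```

For 10 inputs this produces three nodes, each of which fits in one 4-input LUT.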
31. Language Structure Synthesis
Synthesis tools perform the following tasks:
Detect and eliminate redundant logic
Detect combinational feedback loops
Detect unused states
Detect and collapse equivalent states
Make state assignments
32. Synthesis of Assignment Statements
Assignment statements, including continuous and
procedural assignments, are the most straightforward
language structures in Verilog
A continuous assignment is basically an expression
composed of operands and operators
Almost all operators in Verilog HDL are
synthesizable
The exceptions commonly cited are the case equality, arithmetic
shift, exponent and modulus operators, although support varies
between synthesis tools
33. Synthesis of Assignment Statements
Synthesizable and non-synthesizable entities:
Synthesizable:
  Module instances, primitive gates, tasks
  Procedural assignments: always, if-else, case, casex, casez
  Procedural blocks: begin-end, named blocks, disable statements
  Loop statements: for, while and forever
Non-synthesizable:
  Timing constraints
  Initial statements
34. Synthesis of Selection Statements
Selection statements and their synthesis results:
Selection statement: Synthesis result
If-else: 2-to-1 multiplexer
Nested if-else: Cascaded combination of multiplexers
Case: Multiplexer
Incomplete if-else and case statements: Latch
Depending on whether a design is combinational or
sequential logic, only a part of the selection
statement will be needed
For combinational logic, an incomplete selection
statement infers a latch
For sequential logic, there is no need to specify a
complete selection statement
35. Synthesis of Selection Statements
This shows that a latch is inferred by the
synthesis tool because the else part is missing from
an if-else statement in combinational logic
Code and RTL schematic
36. Synthesis of Selection Statements
A complete if-else statement infers a multiplexer
Code and RTL schematic:
37. Synthesis of Selection Statements
An incomplete case statement (without default)
infers a latch
Code and RTL schematic:
38. Synthesis of Selection Statements
To prevent a latch from being inferred in a case
statement, the default must be included
Code and RTL schematic:
39. Delay Values
Synthesis tools ignore delay values
This is because the ultimate delays of the network
will be determined by the actual delays of the
gates used to implement the gate-level netlist
Delays are only used during simulations
40. Delay Values
Ignored delay values – non synthesizable
The example is a module of a four-phase clock
generator
Code and testbench:
42. Delay Values
A four phase clock generator – synthesizable
version
Code and testbench
43. Delay Values
A four phase clock generator – synthesizable
version
RTL schematic and waveform
44. Synthesis of Negative and Positive Signals
Positive and negative-edge clock signals are used
to perform operations in sequence
Most synthesis tools support the mixed use of two or
more different edge-triggered signals but do not
accept the mixed use of edge-triggered and level-sensitive
signals in the same always block
45. Synthesis of Negative and Positive Signals
This code shows the mixed use of edge-triggered and
level-sensitive signals, which is not accepted by
synthesis tools
Attempting to synthesize this code generates the error
below:
46. Synthesis of Negative and Positive Signals
An example of the mixed use of posedge and negedge
signals
Code and RTL schematic:
47. Synthesis of Loop Statements
Loop statements include for, while, repeat and
forever
For, while and forever are synthesizable, except
that while and forever must contain the timing control
@(posedge) or @(negedge)
Repeat is generally not synthesizable
To synthesize a for loop:
The elaborator unrolls the for loop
The synthesizer proceeds with analysis/translation
and logic optimization
48. Synthesis of Loop Statements
An example illustrating the loop statement
This example adds two n-bit operands and produces
an (n+1)-bit sum
Code:
[Figure callouts]
This is what happens at the elaboration phase
Each statement corresponds to a full adder. The four full adders are then cascaded together as a 4-bit ripple-carry adder
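The code for this slide is not reproduced here, so the following Python model only illustrates the idea: a for loop whose body is one full adder, which an elaborator would unroll into cascaded full adders. The function names full_adder and ripple_add and the default width of 4 are assumptions.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_add(x, y, n=4):
    """Add two n-bit operands and return the (n+1)-bit sum."""
    carry = 0
    total = 0
    for i in range(n):   # each iteration stands for one full-adder instance after unrolling
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total | (carry << n)   # the final carry becomes bit n of the sum

print(ripple_add(9, 7))
```

Because the loop bound is fixed at elaboration time, the loop describes hardware structure (n cascaded adders) rather than behaviour that runs over time.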