Digital Design With SystemC (with notes)

Specification Languages:
Part 2
Marc Engels
e-mail: marc.engels@flandersmake.be
Welcome to the second part of the course on specification languages.

2
Specification Languages
➢ Part 1: Specification Models
➢ Part 2: Model based system design
- Show how the models of part 1 can be used for architectural design
- Provide hands-on experience with SystemC v2.3.2 (released in October 2017)
- Introduce OO techniques for design of hardware systems
➢ Part 3: Project
The course on specification languages consists of 3 parts:
- First, an extensive overview was given of various specification models, ranging from dataflow to finite state machines.
- In this second part, I will focus on the use of a subset of these models for the architectural design of digital embedded systems. The main goal of this part of the course is to learn how the specification models of part 1 can be used for the architectural design of embedded systems. For this purpose, we will rely on SystemC version 2.3.2. The language was standardized by the IEEE as IEEE 1666-2011 (language reference manual, published in January 2012), and version 2.3.2 of the simulation library was released in October 2017. SystemC is a class library on top of C++. As such, all object oriented (OO) constructs of C++ can be used in the design of an architecture. These OO techniques can bring the same re-use benefits to architectural design as they have brought to software design.
- Finally, you will apply the acquired skills in a small, but realistic, project.

3
Course Material for part 2
➢ Prerequisites:
- Part 1 of specification languages
- C++ (good tutorial at www.cplusplus.com)
- Coding and debugging programs
- RTL description of synchronous digital circuits
➢ Material for part 2:
- Slides with notes.
- IEEE Standard SystemC Language Reference Manual, IEEE Std 1666-2011.
As prerequisites for this course, I expect the following:
- Quite obviously, you should have a good understanding of the first part of this course, and particularly of the presented models.
- Next, as SystemC is based on C++, a decent knowledge of this programming language is also required. Basic OO concepts like classes, inheritance and templates should be familiar to you. If not, review the C++ tutorial at www.cplusplus.com.
- In general, a structured methodology for developing and debugging programs is essential for executing the exercises and the project. Familiarity with an Integrated Development Environment (IDE) like Eclipse is a benefit.
- When writing SystemC code, you should be able to describe the hardware that will be generated from this code. Therefore a basic knowledge of register transfer level (RTL) description of synchronous digital circuits is necessary. An RTL description of a circuit consists of registers (e.g. D flip-flops) and combinatorial logic. The registers synchronize the operation of the circuit to the clock signal while the combinatorial logic describes the calculations performed by the circuit. RTL descriptions are used in hardware description languages like Verilog or VHDL.
For part 2, the following material is available:
(1) The slides with notes can be found on iCorsi (icorsi.ch).
(2) The SystemC language reference manual, which can be downloaded from the IEEE standards website (http://standards.ieee.org/getieee/1666/download/1666-2011.pdf).

Model Based System
Design
Class 1: constructing a
functional model
Marc Engels
e-mail:
marc.engels@flandersmake.be
In this first class we will focus on the functional modeling of a digital
embedded system. A functional model will describe the functionality of the
embedded system, independent of the platform or architecture on which this
functionality is executed. Therefore it is sometimes called a platform
independent model (PIM). In this class we will focus on the data flow
modeling paradigm for describing the functional model.
At the end of the class, you will be able to program a functional model of a
digital embedded system in SystemC.

5
Functional modeling in
SystemC
➢ Introduction to design of digital embedded systems
➢ SystemC introduction
➢ SystemC functional model syntax
➢ Exercise 1: building a functional model in SystemC
This class covers 4 topics:
(1) A general introduction to the design of digital embedded systems
(2) The role of SystemC in the design of digital embedded systems
(3) The syntax of the SystemC language for functional modeling (with the
dataflow paradigm)
(4) And finally an exercise to build a functional model in SystemC
Let's start with the general introduction.

6
Consumer devices become
increasingly more intelligent
Consumer as well as professional equipment is becoming increasingly smarter. A few examples:
- Your car is being converted into a multimedia theater. The value of the electronics in a car has increased consistently, resulting in almost 100 electronic units in a luxury model. Recently a lot of new safety functions (ABS, ESP, parking sensors, anti-collision systems, etc.) have been introduced.
- It is hard to find a mobile phone with which you can only make a call. Taking pictures, playing music, surfing the web, reading e-mail, etc. are also features of a state-of-the-art mobile phone. Most phones even have GPS functionality and run office software.
- Gaming becomes more interactive (e.g. Nintendo Wii, Microsoft Kinect) and mobile.
- Photography has dramatically changed over the last decade: it has become fully digital. Digital cameras are currently extended with features like wireless connections, automatic picture enhancements (e.g. red eye correction), etc.
- The era of service robots is coming. Robots to vacuum clean the house, mow the lawn in the garden, etc. are already on the market.

7
… as well as professional
equipment
The evolution towards smart products is not limited to consumer devices. We observe, for instance, the same trends in production machines.
- Harvesters have a growing number of functions for quality control, obstacle detection and precision farming. To realize these smart functions, the electronic control units become increasingly complex. Especially the software content is growing very fast (20% average growth per year). The long term vision for combine harvesters is to evolve towards fully autonomous machines that can work without any operator on board and just receive a command of the job to be done. Many more smart functionalities will be needed to reach this goal.
- In compressors, functions are introduced to optimize the energy consumption based on the instantaneous demand of air.
- Weaving looms can adapt their speed to the quality (strength) of the textile fibers.
- Professional washing machines automatically detect the load, hardness of the water, etc. and adapt their washing program.

8
Characteristics of embedded
systems
➢ Optimize for power, cost, and size
➢ Robust design
➢ Provide the ability for evolution and mass customization
➢ Minimize time to market
➢ Some functionality might be safety-critical
➢ Interfacing with the real world, leading to real time constraints
To realize these smart functionalities, electronic systems and software have to be embedded in consumer and professional devices. Such embedded systems must minimize power, cost and size, and hence work on a minimal platform. For instance, 8-bit and 16-bit processors are still extensively used in embedded devices. They must be robust. For instance, a mobile phone must survive rough treatment. A car has an operating life of 7000 hours and some machines are expected to work up to 100000 hours. Over their lifetime, products are increasingly expected to evolve. Also, more variants are designed from the same platform. A typical example is the customization of mobile phones. Time to market matters as well: the product needs to be on the market before the Christmas shopping season. In many cases the system even has safety-critical functionality, such as an anti-lock braking system (ABS) or emergency buttons, which requires a guarantee on the reliability of the system. For the development of such safety-critical functions, specific standards have to be followed. The main distinctive characteristic of an embedded system, however, is that it has to interact with the real world, necessitating real-time behavior.

9
Embedded systems combine various types of real-time behavior
[Figure: block diagram with the labels: real world process, sensors, signal conditioning, ADC, processing, DAC, actuator powering, actuators, user, signal, event, action.]
A system is said to be real-time if the correctness of an operation depends not only upon its
logical correctness, but also upon the time in which it is performed. In a hard real-time
system, the completion of an operation after its deadline is considered useless - ultimately,
this may lead to a critical failure of the complete system. A soft real-time system on the other
hand will tolerate such lateness, and may respond with decreased service quality (e.g. bank
terminal).
Depending on the inputs, two types of hard real-time constraints are distinguished in embedded systems:
- Signal processing systems process inputs that arrive at regular intervals and the system must be ready after a fixed time to process the next input. Signal processing systems typically interact with their environment through sensors (observe the environment) and actuators (control/influence the environment). Sensors are components that translate non-electrical quantities (e.g. temperature, pressure, ...) into electrical quantities (voltage, current). Since most observable quantities are analog signals, sensors usually produce analog electrical signals. In most cases signal conditioning is required to compensate for the non-idealities in the sensors and to prepare the sensor signals for the actual signal processing. Because the signal processing is done digitally, an Analog to Digital Converter (ADC) puts the sensor signal in the right format. Actuators perform the reverse operation of sensors: they translate electrical quantities into non-electrical quantities. Actuators also need analog signals and therefore a Digital to Analog Converter (DAC) is needed. Because actuators need to influence the physical environment they often require high power, hence power electronics circuits are introduced to condition the control signal.
- When the input is an event and the system has to react within a certain time, this is called a reactive system. Examples of reactive parts of an embedded system are the interaction with the user or responses to external alarms.
As shown on the picture, embedded systems often combine various types of real-time
behavior.

10
Digital embedded systems
combine hard- and software
[Figure: example of a digital embedded system with the labels: user interface, NVM, ROM, µP or DSP core, RAM, configurable logic, memories, peripheral, modem, buffers, video/graphics processor, protocol, speech processing, analysis of channel; plus analog, sensors and actuators.]
An embedded system can be separated into a digital part and an analog
part. The analog part contains for instance signal conditioning, ADCs and
DACs. In high-frequency applications, like radios or radars, it will be a large
part of the embedded system. Also sensors and actuators are part of the
embedded system. Traditionally these were discrete external components,
but recently they are increasingly integrated, when power permits, in a
package and even on chips.
The digital part is where the actual “intelligence” is. A growing part of the
functionality of embedded systems is implemented in software called
“embedded software”. This offers the advantage of increased flexibility
(functionality can be changed after production). As a consequence, the
digital part of an embedded system consists of 3 components:
- Programmable processor cores. They can be general-purpose micro-processors or more specialized digital signal processors (DSPs).
- Volatile and non-volatile memories.
- Configurable (through parameters) dedicated logic.
The digital part can be implemented as a PCB with discrete components, a multi-chip package, an FPGA or a fully integrated chip. In the latter case this is often referred to as a System-on-Chip (or SoC).
In these classes we will mainly focus on the design of the configurable logic (on FPGA or chip), although SystemC is also extensively used for the modeling of SoCs.

11
Design flow for digital embedded
systems
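[Figure: Y-model design flow. Functional requirements and performance requirements lead to the system functionality; architectural requirements and non-functional requirements lead to the architecture template; the mapping step combines both and results in a dedicated architecture and C-code.]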
For the design of a digital embedded system, we use a design flow that consists of the following elements:
• During the functional design of the system, the designer determines what the system has to do, based on the performance requirements (e.g. bit error rates in communication systems) and functional requirements (e.g. specified protocols). The designer also determines all algorithms. The system functionality is expressed in a platform independent way.
• A reusable architecture template, or platform, consisting of processors, memories, and dedicated logic, is defined or selected. The architecture template should guarantee architectural requirements (e.g. interface formats) and non-functional requirements (e.g. power or cost).
• Each function in the functionality is mapped on an element in the architecture template.
• For the dedicated logic a circuit corresponding to the required functionality is created, resulting in a dedicated architecture. Finally, by means of RTL synthesis the designer generates a gate level netlist. The place and route step then transforms this netlist into a physical layout for this dedicated architecture, which can be manufactured by a foundry. Alternatively, the design is mapped to a configuration file for a programmable platform (e.g. field programmable gate array or FPGA).
• For the functions mapped on processors, C-code is generated and compiled.
The Y-model is represented as a top-down approach, but in a realistic design flow, multiple iterations are performed before reaching the final embedded system.

12
Function to architecture
conversion follows three axes
[Figure: the refinement from system functionality to dedicated architecture along three axes.]
- Computations: from operations to operators (resource allocation, scheduling)
- Data: from variables, arrays and floating point to memories and fixed point (memory allocation, address generation, word sizing)
- Communication: from point-to-point queues to busses with a detailed protocol (bus allocation, introduce arbiters, include protocols)
In this course we concentrate on the architectural design of dedicated logic, where the algorithms are mapped onto an optimal architecture. The algorithm will typically be specified as a functional model, e.g. data flow and asynchronous state machines. The architecture needs a timed model, e.g. register transfer level (RTL). To obtain the RTL description, a refinement needs to be done for the computations, communications, and data. The order of these refinements is not fixed. However, it is good practice to take the most important design decisions first. Note that for parts of the system that are implemented in software, the complete refinement does not need to be performed. However, a processor and a memory structure have to be selected. For this purpose, certain refinements, like the conversion to fixed point, can be useful.
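To make the data refinement from floating point to fixed point more concrete, here is a minimal sketch in plain C++ (standard library only, not SystemC's own fixed-point types). It quantizes a floating-point value to a signed fixed-point word with a chosen word size. The function names `to_fixed` and `to_double`, and the choice of round-to-nearest with saturation, are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize x to a signed fixed-point code of `total_bits` bits, of which
// `frac_bits` are fractional. Rounds to nearest and saturates on overflow.
inline std::int64_t to_fixed(double x, int total_bits, int frac_bits) {
    assert(total_bits >= 2 && total_bits <= 32);
    assert(frac_bits >= 0 && frac_bits < total_bits);
    const std::int64_t max_code = (std::int64_t{1} << (total_bits - 1)) - 1;
    const std::int64_t min_code = -(std::int64_t{1} << (total_bits - 1));
    const double scaled = std::round(std::ldexp(x, frac_bits)); // x * 2^frac_bits
    if (scaled > static_cast<double>(max_code)) return max_code;
    if (scaled < static_cast<double>(min_code)) return min_code;
    return static_cast<std::int64_t>(scaled);
}

// Convert a fixed-point code back to the real value it represents.
inline double to_double(std::int64_t code, int frac_bits) {
    return std::ldexp(static_cast<double>(code), -frac_bits);
}
```

With an 8-bit word and 4 fractional bits, for example, 0.3 is stored as the code 5, which represents 0.3125. Comparing the floating-point and the quantized results of an algorithm in this way is one possible approach to word sizing.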

13
SystemC
We now take a closer look at the role of SystemC in the design of digital
embedded systems.

14
SystemC bridges gap between
function and architecture
[Figure: MATLAB and C/C++ are positioned at the system functionality, VHDL and Verilog at the dedicated architecture; SystemC spans both.]
Traditionally, a system functionality is expressed in MATLAB
(SIMULINK/STATEFLOW) or a standard computer language (C/C++). To
express the RTL description of the system, VHDL or Verilog is used. As a
consequence the transformation from functionality into architecture does not
only involve a change in semantics but also in syntax. Moreover, because of
the different languages, this transformation cannot be done incrementally.
SystemC resolves this issue, by offering a language that can express both
functionality and architecture.

15
What is SystemC?
➢ A modeling framework in C++ for the refinement of a system from a functional description into an architecture
➢ Contributions:
- hardware modeling with C++: OCAPI (IMEC) and SCENIC (Synopsys/UC Irvine)
- fixed-point data types: Frontier Design
- hardware-software co-design: CoWare (IMEC/CoWare)
➢ Language first standardized in December 2005 as IEEE 1666, revised in 2011 as IEEE 1666-2011
➢ Extensions of SystemC:
- Verification library.
- Transaction level modeling library (integrated in IEEE 1666-2011).
- Analog and mixed-signal modeling.
➢ More info: www.accellera.org
SystemC is a C++ library that allows a system to be refined from a functional description into an architecture.
Three contributions were essential to the creation of SystemC:
- The modeling of RTL hardware with C++ was demonstrated in the OCAPI framework of IMEC, as well as the SCENIC project of UC Irvine in cooperation with Synopsys.
- Frontier Design (an IMEC spin-off) contributed the fixed-point data types.
- CoWare (another IMEC spin-off) introduced concepts of hardware-software co-design.
The SystemC language was first standardized in December 2005 by the IEEE. A revision (IEEE 1666-2011) was made in 2011.
More recently a number of extensions of the SystemC language were proposed:
- The verification library adds random generation and transaction recording.
- Transaction level modeling is a high-level approach to modeling digital systems where details of communication among modules are separated from the details of the implementation of functional units or of the communication architecture. This extension is included in the revised IEEE standard.
- The analog and mixed-signal library extends SystemC with the following modeling paradigms: timed data flow, linear signal flow modeling, and electrical linear network modeling.
All information about SystemC can be downloaded from the www.accellera.org website.

16
Which tools are available for
SystemC?
➢ Open source simulation library available
➢ Open source translators from Verilog or VHDL to SystemC
➢ Commercial synthesis tools:
- Cadence (Stratus HLS).
- Mentor (Catapult C).
- NEC (CyberWorkBench).
- SystemCrafter (SC).
- Xilinx (Vivado Design Suite).
With respect to tool support, the Accellera System Initiative
(www.accellera.org) makes an open-source simulation library available.
Various academic institutes also offer translators from Verilog or VHDL to
SystemC. For synthesis however, we have to rely on commercial tools.

17
SystemC language
architecture
[Figure: layered diagram of the SystemC language architecture with the following blocks.]
- C++ language
- Core language: modules, ports, exports, processes, interfaces, channels, events, event-driven simulation kernel
- Data types: 4-valued logic type, 4-valued logic vectors, bit-vectors, finite-precision integers, limited-precision integers, fixed-point types
- Predefined channels: signal, clock, FIFO, mutex, semaphore
- Libraries for specific models of computation and/or methodologies, e.g. TLM interfaces, bus models, SystemC verification library
- Utilities: report handling, tracing
- User application
The classes of the SystemC library fall into four categories: the core language, the SystemC data types, the predefined channels, and the utilities. The core language and the data types may be used independently of one another.
At the core of SystemC is a simulation engine containing a process scheduler. Processes are executed in response to the notification of events. Events are notified at specific points in simulated time. Events at different points in simulated time are handled in time order, so in that case the scheduler is deterministic. For processes that become runnable at the same point in simulation time, the order of execution is not specified, so a model should not rely on it. The scheduler is non-preemptive, which means that once the execution of a process has started, it cannot be interrupted by the scheduler: the process runs until it returns or suspends itself by calling wait().

18
SystemC core language
sc_module
sc_port
sc_prim_channel
sc_process
sc_interface
sc_event
sc_export
The SystemC core language contains a number of primitives to define parallelism. A system is split into a number of modules (sc_module). A module communicates with the external world through ports (sc_port). Two ports are connected through a channel. SystemC predefines some primitive channels (sc_prim_channel), but more complex channels can be user defined. An export (sc_export) makes a channel that resides inside a module accessible from outside that module.
A hierarchical module consists of a structure of other modules. A non-hierarchical module contains one or more processes (sc_process). A process is executed when an event (sc_event) occurs. A process interacts with a channel through an interface (sc_interface), which is a collection of functions that are implemented by the channel and called through an sc_port.

19
SystemC
SystemC contains all necessary constructs to model the functionality of a
system. We will focus on activity-oriented models, although SystemC can
also express other modeling paradigms. Let’s review these constructs.

20
Kahn Process Networks in SystemC
[Figure: two processes connected by a FIFO.]
➢ (Modules to structure design)
➢ Functional processes
➢ First-In-First-Out queues
➢ Simulation engine
SystemC supports the modeling of Kahn process networks, with the limitation that the queues are bounded. A Kahn process network is a directed network of processes that are interconnected by first-in-first-out (FIFO) queues of infinite size. Each time a process is executed, tokens are consumed from the input queues and new ones are produced in the output queues. If a token is not present on an input queue, the consumption of the token will block. Kahn process networks exhibit deterministic behavior that does not depend on computation or communication delays. In SystemC the constructs are available to define the processes and the queues. These constructs interact with a simulation engine, which schedules the execution of the processes. The simulation engine stops when there is no longer any activity in the network.
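The determinism of such a network can be illustrated without SystemC. The following plain C++ sketch is a hypothetical, single-threaded stand-in for the simulation engine: a process "fires" only when all its input queues hold a token and its output queue has room, which models the blocking read and write. The names `Fifo`, `fire_adder`, `fire_source`, `fire_sink` and `run_network` are assumptions made for this illustration. The network is run with two different firing orders to show that the produced token stream does not depend on the schedule.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// A bounded FIFO of ints (stand-in for a queue in the process network).
struct Fifo {
    std::size_t capacity;
    std::deque<int> tokens;
    bool can_read() const { return !tokens.empty(); }
    bool can_write() const { return tokens.size() < capacity; }
};

// Adder process: fires only if both inputs hold a token and the output has
// room. Returns true if it fired; not firing models a blocked read or write.
inline bool fire_adder(Fifo& a, Fifo& b, Fifo& c) {
    if (!a.can_read() || !b.can_read() || !c.can_write()) return false;
    const int va = a.tokens.front(); a.tokens.pop_front();
    const int vb = b.tokens.front(); b.tokens.pop_front();
    c.tokens.push_back(va + vb);
    return true;
}

// Source process: writes the next value of `data` if the FIFO has room.
inline bool fire_source(const std::vector<int>& data, std::size_t& next, Fifo& out) {
    if (next >= data.size() || !out.can_write()) return false;
    out.tokens.push_back(data[next++]);
    return true;
}

// Sink process: moves one token from the FIFO into `result`.
inline bool fire_sink(Fifo& in, std::vector<int>& result) {
    if (!in.can_read()) return false;
    result.push_back(in.tokens.front());
    in.tokens.pop_front();
    return true;
}

// Run the network until no process can fire anymore. `reverse_order`
// changes the order in which the processes are tried, i.e. the schedule.
inline std::vector<int> run_network(const std::vector<int>& xs,
                                    const std::vector<int>& ys,
                                    std::size_t fifo_size, bool reverse_order) {
    assert(fifo_size > 0);
    Fifo a{fifo_size, {}}, b{fifo_size, {}}, c{fifo_size, {}};
    std::size_t nx = 0, ny = 0;
    std::vector<int> result;
    bool active = true;
    while (active) {
        if (reverse_order) {
            active = fire_sink(c, result);
            active = fire_adder(a, b, c) || active;
            active = fire_source(ys, ny, b) || active;
            active = fire_source(xs, nx, a) || active;
        } else {
            active = fire_source(xs, nx, a);
            active = fire_source(ys, ny, b) || active;
            active = fire_adder(a, b, c) || active;
            active = fire_sink(c, result) || active;
        }
    }
    return result;
}
```

Calling `run_network` with the same input streams but a different `reverse_order` or `fifo_size` yields the same output sequence. Only the order in which the processes are tried changes, which is the property described in the notes. The loop ends when no process can fire, mirroring the simulation engine stopping for lack of activity.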

21
Modules are used for structural partitioning of the functionality
➢ Each module has its own class, derived from the sc_module class.
➢ Every constructor of a module class shall have exactly one parameter of class sc_module_name.
- It is good practice to make this name for an instance of the module the same as the C++ variable name through which the module is referenced.
➢ A module can be hierarchical or contain processes. In the latter case, the SC_HAS_PROCESS(“class name”) macro is used to indicate that the module contains processes.
Modules are used to partition the functionality in the design. You should not use too many modules, as this complicates the design, but also not too few. In general, functionality that is implemented in a different architectural style (e.g. software or dedicated hardware) or at a different location should be in different modules.
Every module is derived from the base class sc_module and should have a name, which is used for debugging purposes.
The macro SC_HAS_PROCESS(“class name”) indicates that the module is not hierarchical and contains processes.

22
Example of a functional model of
an adder
With macros:
SC_MODULE(adder) {
    // define ports
    // define processes, internal data, etc.
    SC_CTOR(adder) {
        // body of constructor;
        // process declaration, sensitivities, etc.
    }
};
Explicit:
class adder : public sc_module {
public:
    // define ports
    // define processes, internal data, etc.
    SC_HAS_PROCESS(adder);
    adder(sc_module_name name) : sc_module(name) {
        // body of constructor
    }
};
The slide shows an explicit definition of a module, consisting of the class definition, the SC_HAS_PROCESS macro and the constructor.
To compact the definition, two more macros are provided:
- SC_MODULE(“class name”) is equivalent to the first two lines of the explicit definition.
- SC_CTOR(“class name”) equals the SC_HAS_PROCESS macro and the first lines of the constructor. It can be used when only a name is passed to the constructor. If you also want to pass parameters, an explicit declaration is needed.

23
Ports are used to communicate
with a FIFO channel
➢ General port definition: sc_port<interface>
➢ Predefined ports are: sc_fifo_in<T> and sc_fifo_out<T>.
- sc_fifo_in<T> is derived from sc_port<sc_fifo_in_if<T>,0> with interface functions read(), nb_read(), and num_available().
- sc_fifo_out<T> is derived from sc_port<sc_fifo_out_if<T>,0> with interface functions write(), nb_write(), and num_free().
➢ Blocking read and write interface functions (automatic synchronization with implicit wait() operations)
    int a = f1.read(); // read a token
    f1.write(a); // write a token
➢ Inspecting queues
    int a = f1.num_available(); // number of tokens in a queue
    int a = f1.num_free(); // number of free places in a queue
In SystemC the sc_port object is used to communicate with a channel. Ports provide the means by which a module can be coded such that it is independent of the context in which it is instantiated. A port forwards interface method calls to the channel to which the port is bound.
For functional modeling, processes communicate through FIFO ports. Two port types are supported for the sc_fifo<T> channel, where T is the type of the elements in the FIFO channel:
- Input: sc_fifo_in<T>, which is basically equivalent to sc_port<sc_fifo_in_if<T>,0>, where the first parameter is the input interface of a FIFO and the second parameter specifies that multiple channels can be bound to the port. However, the practical use of these multiple bindings is not clear. Therefore it could be useful to define your own FIFO port with a restriction to a single binding.
- Output: sc_fifo_out<T>, which is equivalent to sc_port<sc_fifo_out_if<T>,0>. Also here, the use of multiple bindings is not recommended.
Several functions are associated with the sc_fifo class:
- read() gets a token from the queue. It blocks when no tokens are available.
- write() puts a token on a queue. It blocks when there are no free spaces in the queue.
There are also inspecting functions available to look at the number of tokens or free spaces.
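The meaning of the non-blocking and inspecting functions can be sketched with a small stand-in class in plain C++. `BoundedFifo` below is a hypothetical illustration and not the sc_fifo implementation. It only mirrors the function names listed on the slide and ignores the simulation kernel, so there is no blocking and no delta-cycle update. The default size of 16 follows the default FIFO length mentioned in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in that mimics the queue semantics described above.
template <typename T>
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t size = 16) : size_(size) { assert(size > 0); }

    // Non-blocking write: returns false instead of blocking when the queue is full.
    bool nb_write(const T& value) {
        if (queue_.size() == size_) return false;
        queue_.push_back(value);
        return true;
    }

    // Non-blocking read: returns false instead of blocking when the queue is empty.
    bool nb_read(T& value) {
        if (queue_.empty()) return false;
        value = queue_.front();
        queue_.pop_front();
        return true;
    }

    // Number of tokens currently in the queue.
    int num_available() const { return static_cast<int>(queue_.size()); }

    // Number of free places in the queue.
    int num_free() const { return static_cast<int>(size_ - queue_.size()); }

private:
    std::size_t size_;
    std::deque<T> queue_;
};
```

In this sketch, a blocking read() corresponds to waiting until nb_read() would succeed, and a blocking write() to waiting until nb_write() would succeed. In SystemC that waiting is done through implicit wait() calls inside the channel.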

24
an adder (continued)
SC_MODULE(adder) {
    sc_fifo_in<int> a, b;
    sc_fifo_out<int> c;
    // define processes, internal data, etc.
    SC_CTOR(adder) {
    }
};
When we add the definition of the ports to the adder module, we obtain the code on the slide.

25
SC_THREAD processes are used
to model functional processes
➢ SC_THREAD processes are started once and typically contain an infinite loop, so that they keep running.
➢ SC_THREAD processes can be suspended by means of the wait(event) function. In functional modeling the wait statements are hidden in the read() and write() functions to the queues.
➢ Multiple processes per module are possible
➢ Processes can also be dynamically created.
The actual computation in the application is performed in the processes. As
a consequence, they also define the parallelism in the application.
SystemC supports three types of processes. For functional modeling we use the SC_THREAD process. An SC_THREAD process is started once and keeps running as long as its body does not return, which is why the body typically contains an infinite loop. It can be suspended by a wait(event) function. Often the wait(event) function is implicitly present in the communication functions.
Processes are executed on events. These events can be statically or dynamically defined. Static sensitivity is set by means of the variable sensitive of sc_module. Dynamic sensitivity to a certain event is set by wait(event) for an SC_THREAD process.
A module can have multiple processes.
Processes might be dynamically created during simulation. However, no
synthesis support exists for dynamic processes. Therefore, we do not use
them in this course.

26
SC_MODULE(adder) {
    sc_fifo_in<int> a, b;
    sc_fifo_out<int> c;
    void compute() {
        while (true) {
            int valuea = a.read();
            int valueb = b.read();
            c.write(valuea + valueb);
        }
    }
    SC_CTOR(adder) {
        SC_THREAD(compute);
    }
};
Adding the definition of an SC_THREAD process to the adder results in the
code on the slide. This adder waits for data on both its input queues
sequentially and next produces a token on its output queue.

27
Define the main program
➢ The SystemC library must be included in the main program:
- #include <systemc.h>
➢ In sc_main() the following actions are taken:
- Instantiate channels with:
• sc_fifo<T> ("name", length); // default length 16
• e.g. sc_fifo<int> f1("f1", 2);
- Instantiate the modules.
- Bind ports of modules to channels:
• positional
• named
- Call sc_start() to start simulation and run until end of any activity.
The global structure of the system is defined in the main function. Because main() is already used by the SystemC library, the main function for the user application is sc_main().
In sc_main(), the following actions are taken:
1. Instantiation of the channels. The basic channel that we use in functional modeling is sc_fifo. A FIFO queue is defined by means of the template class sc_fifo<T>. T can take on any basic data type, e.g. int, float, etc. The sc_fifo class declares a finite length buffer of tokens. The default length is 16 elements. The queue also has a name for debugging and statistics retrieval purposes. The constructor for the queue is sc_fifo<T> f1("name f1", length); An sc_fifo can only be written from one process.
2. Instantiation of the modules. A module can be instantiated multiple times.
3. Binding the ports of the modules to the channels. This can be done in two ways: positional or named. Named binding is preferred because it is less prone to errors than positional port binding.
4. Start the simulation.

28
int sc_main(int argc, char *argv[]) {
    sc_fifo<int> fifo_a, fifo_b, fifo_c; // channel instantiation
    … // instantiate signal generation and evaluation module
    adder my_adder("my_adder"); // module instantiation
    my_adder.a(fifo_a); // binding of port to channel
    my_adder.b(fifo_b);
    my_adder.c(fifo_c);
    … // other modules and test bench, which drive fifo_a and fifo_b.
    sc_start(); // start simulation
    return 0;
}
[Slide annotation: elaboration phase]
The sc_main() function for the adder is shown on the slide.
Note that the arguments of sc_main() are identical to those of main().
To connect the ports to the channels, named bindings are used.
29
Modules can also be used to create hierarchy
SC_MODULE(superfunc) {
    // IO ports
    sc_fifo_in<float> in;
    sc_fifo_out<float> out;
    // internal queues
    sc_fifo<float> d;
    // internal modules
    function func1;
    function *func2;
    // Module constructor
    SC_CTOR(superfunc) : func1("func1") {
        func1.in(in);
        func1.out(d);
        func2 = new function("func2");
        func2->in(d);
        func2->out(out);
    }
};
[Figure: module superfunc contains func1 and func2, both instances of sc_module(function), connected by the internal queue d.]
In a functional model, hierarchy will be used to make the design more readable. The hierarchy is fully transparent: it basically acts as a container for the basic modules, but does not add any functionality or synchronization.
The definition of a hierarchical module consists of the definition of the ports and internal queues. Next the internal modules are defined. Care must be taken that the module objects still exist after execution of the constructor. Two alternatives exist to guarantee this: either declare them as members and construct them in the initializer list of the constructor (func1), or create them with the new operator (func2).
The constructor creates the two modules and binds the ports to the channels.

30
Simulation engine
➢ In an un-timed model, the simulator only advances in delta-cycles:
- If it is started to run for a finite amount of time, it will never stop.
- We therefore run it until no events are present: sc_start();
➢ Ways of stopping the simulator:
- Terminate a process (return from SC_THREAD): the simulator will stop due to the lack of events.
- Call sc_stop() when a termination condition is fulfilled.
In a functional model no notion of time is present. Every action executes infinitely fast. As a consequence, the simulation kernel only advances in delta cycles, which are infinitesimally small time steps. If we started the simulation kernel with a finite amount of time to run, it would never reach that time as long as the model stays active, and hence it would run forever. Therefore we run the simulation kernel until no events are present any more. This is achieved with the sc_start() command.
With this approach, there are two ways of stopping the simulation:
1. We can exit an SC_THREAD. By doing so, no events will be produced anymore and the simulation will finally stop because of the lack of events.
2. We can check for a termination condition and explicitly call sc_stop(). This approach is used in the exercise of class 1. When the whole image is processed and written to file, the simulation is explicitly stopped. In general this is also the safest and most elegant way of controlling the simulation.

31
SystemC
Finally, let’s exercise what we have learned so far.

32
Goal of this exercise
➢ use a simplified JPEG block diagram to practice functional
modeling
➢ develop a functional process that fits into a system
➢ simulate a functional model
➢ observe the overall behavior of a system
The goal of this exercise is to practice functional modeling. We will use a
simplified JPEG block diagram for this purpose. A process will be defined
and integrated in a JPEG functional model. Next the functional model will be
simulated and the overall behavior of the system will be observed.

33
What is JPEG?
➢ “JPEG” stands for
“Joint Photographic Experts Group”
➢ “JPEG” is a standard for color image compression
➢ “JPEG” is widely used (e.g. on the WWW)
➢ More information?
 http://www.jpeg.org/
JPEG stands for “Joint Photographic Experts Group” and is a compression
standard for color images. It is widely used. More information can be found
on www.jpeg.org

34
(Partial) JPEG: a simple block
diagram
JPEG-ENCODER: Original Image -> R2B -> DCT -> Quantize (+table) -> ZIGZAG SCAN -> RUN-LENGTH ENCODER
JPEG-DECODER: RUN-LENGTH DECODER -> ZIGZAG SCAN -> Normalize (+table) -> IDCT -> B2R -> Reconstructed Image
Parameters: width, height, #bits
A simplified block diagram of a JPEG encoder and decoder is shown on the
slide.
First, an original image is read and split into 8x8 blocks (R2B). Together
with the pixel data, the width, height and number of bits per pixel are also
extracted from the image.
Next, on each 8x8 block, a discrete cosine transform (DCT) is performed,
resulting in 8x8 DCT coefficients. These DCT coefficients are quantized and
reordered in the zigzag scan module. The resulting coefficient stream is
run-length encoded. This last block differs from the JPEG standard,
where a Huffman encoder is used.
In the decoder the inverse operations are performed in the reverse order.

35
2D Discrete Cosine Transform
➢ Non-optimized equation
➢ DCT can be separated in consecutive 1-D operations
➢ There are many optimized DCT-algorithms available
$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j)\, \cos\frac{(2i+1)u\pi}{16} \cdot \cos\frac{(2j+1)v\pi}{16}$$
$$f(i,j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v)\, \cos\frac{(2i+1)u\pi}{16} \cdot \cos\frac{(2j+1)v\pi}{16}$$
$$\text{where } C(l) = \begin{cases} \frac{1}{\sqrt{2}} & l = 0 \\ 1 & l \neq 0 \end{cases}$$
The discrete cosine transform (DCT) is performed on an 8x8 pixel block and
returns an 8x8 block of DCT coefficients. Each DCT coefficient indicates the
amplitude of a horizontal and vertical frequency component. The inverse
discrete cosine transform (IDCT) returns pixel values from DCT coefficients.
The formal definitions of the DCT and IDCT are shown on the slide. Instead
of this straightforward 2D operation, the calculation can be split into
consecutive 1D operations, which is more efficient. There is also a large set
of optimized DCT algorithms that exploit the regular structure of the cosine
values.
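As an illustration, the non-optimized definition can be written down directly in plain C++. This is only a sketch: the names Block, dct_c, dct_basis, dct8x8 and idct8x8 are chosen for this example and are not the dct and idct modules of the exercise library.

```cpp
#include <array>
#include <cmath>

using Block = std::array<std::array<double, 8>, 8>;

// Normalization factor C(l): 1/sqrt(2) for l == 0, 1 otherwise.
inline double dct_c(int l) { return l == 0 ? 1.0 / std::sqrt(2.0) : 1.0; }

// cos((2*i + 1) * u * pi / 16), the basis function shared by DCT and IDCT.
inline double dct_basis(int i, int u) {
    const double pi = std::acos(-1.0);
    return std::cos((2 * i + 1) * u * pi / 16.0);
}

// Forward 2D DCT, taken literally from the non-optimized definition.
Block dct8x8(const Block& f) {
    Block F{};
    for (int u = 0; u < 8; ++u)
        for (int v = 0; v < 8; ++v) {
            double sum = 0.0;
            for (int i = 0; i < 8; ++i)
                for (int j = 0; j < 8; ++j)
                    sum += f[i][j] * dct_basis(i, u) * dct_basis(j, v);
            F[u][v] = 0.25 * dct_c(u) * dct_c(v) * sum;
        }
    return F;
}

// Inverse 2D DCT: returns pixel values from DCT coefficients.
Block idct8x8(const Block& F) {
    Block f{};
    for (int i = 0; i < 8; ++i)
        for (int j = 0; j < 8; ++j) {
            double sum = 0.0;
            for (int u = 0; u < 8; ++u)
                for (int v = 0; v < 8; ++v)
                    sum += dct_c(u) * dct_c(v) * F[u][v] *
                           dct_basis(i, u) * dct_basis(j, v);
            f[i][j] = 0.25 * sum;
        }
    return f;
}
```

Each coefficient costs 64 multiply-accumulate operations in this form, which is why the separable 1D formulation and the optimized algorithms are preferred in practice.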

36
Quantization
➢ Each DCT coefficient is divided by the coefficient amplitude
that is just detectable by the human eye (table)
➢ The result is rounded to an integer
➢ This reduces the number of bits needed to represent the DCT
coefficient
➢ The quantization is the place where information of the image
might be lost, resulting in lossy compression.
Next the DCT coefficients are quantized. To this end each DCT coefficient is
divided by the corresponding value in the quantization table.
The result is rounded to the nearest integer, reducing the number of bits
needed to represent the DCT coefficient.
In the quantization step image information might be lost, resulting in lossy
compression.
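The quantization step can be sketched in a few lines of plain C++. The function names are hypothetical, and the assumption that the Normalize block of the decoder multiplies by the same table entry is mine; the exercise library may organize this differently.

```cpp
#include <cmath>

// Encoder side: divide the DCT coefficient by its table entry and
// round the result to the nearest integer.
inline int quantize_coeff(double coeff, int table_entry) {
    return static_cast<int>(std::lround(coeff / table_entry));
}

// Decoder side (assumed behavior of the Normalize block): multiply the
// quantized level by the same table entry again.
inline double normalize_coeff(int level, int table_entry) {
    return static_cast<double>(level) * table_entry;
}
```

For a coefficient of -35 and a table entry of 11, the quantized level is -3 and the reconstructed coefficient is -33: the difference of 2 is the information that is lost.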

37
Quantization Table
N =
 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99
An example of a typical quantization table is shown on the slide. It can be
remarked that the quantization values grow for higher horizontal or vertical
frequencies.
JPEG contains a number of predefined quantization tables. If a custom
quantization table is used, it must be sent to the decoder.

38
The coefficients are zigzag
scanned
0 1 5 6 14 15 27 28
2 4 7 13 16 26 29 42
3 8 12 17 25 30 41 43
9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
The resulting quantized DCT coefficients are next zigzag scanned. This is
done in such an order that statistically long sequences of zero coefficients
can be expected.
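The scan order on the slide follows the anti-diagonals of the block, alternating the direction on every diagonal. A minimal sketch that generates the table and reorders a block is given here; the names ZigzagTable, make_zigzag_table and zigzag_scan are chosen for this example only.

```cpp
#include <array>

using ZigzagTable = std::array<std::array<int, 8>, 8>;

// Builds the table pos[row][col] = position of that coefficient in the
// zigzag stream, by walking the 15 anti-diagonals (row + col = s).
ZigzagTable make_zigzag_table() {
    ZigzagTable pos{};
    int next = 0;
    for (int s = 0; s <= 14; ++s) {
        const int row_min = s > 7 ? s - 7 : 0;
        const int row_max = s < 7 ? s : 7;
        if (s % 2 == 1) {
            // Odd diagonals run from top-right to bottom-left.
            for (int r = row_min; r <= row_max; ++r) pos[r][s - r] = next++;
        } else {
            // Even diagonals run from bottom-left to top-right.
            for (int r = row_max; r >= row_min; --r) pos[r][s - r] = next++;
        }
    }
    return pos;
}

// Reorders an 8x8 block of quantized coefficients into a 64-entry stream.
std::array<int, 64> zigzag_scan(const std::array<std::array<int, 8>, 8>& block) {
    const ZigzagTable pos = make_zigzag_table();
    std::array<int, 64> stream{};
    for (int r = 0; r < 8; ++r)
        for (int c = 0; c < 8; ++c)
            stream[pos[r][c]] = block[r][c];
    return stream;
}
```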

39
(Simplified) Run-length coding
➢ Send the DC value “as is”
➢ Represent the high frequency data with (zero run-length,
amplitude) combinations.
➢ End the stream with EOB (= 63).
➢ Example:
 in: 79, 0, -2, -1, 3, -1, 0, 0, -1, 0, 0, 0, …
 out: 79, 1,-2, 0,-1, 0, 3, 0,-1,2,-1, 63
Next we use a non-JPEG run-length coder for our exercise. This coding
works as follows:
 The DC value is sent “as is”
 The high frequency data is split in sections consisting of a number of
zero’s followed by a non-zero coefficient. Each segment is
represented by a couple consisting of the number of subsequent
zero’s and the value of the non-zero coefficient.
 When all remaining coefficients for a block are zero, an end of block
(EOB=63) value is sent.
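The encoder side of this scheme can be sketched as follows in plain C++. The function name rl_encode is hypothetical and this is not the rl_enc module of the exercise library. One assumption is made: the EOB value is always appended, also when the last coefficient of the block is non-zero.

```cpp
#include <cstddef>
#include <vector>

constexpr int EOB = 63;  // end-of-block marker

// Simplified run-length encoder for one block in zigzag order.
std::vector<int> rl_encode(const std::vector<int>& zz) {
    std::vector<int> out;
    if (zz.empty()) return out;
    out.push_back(zz[0]);  // the DC value is sent "as is"
    int run = 0;           // number of zeros seen since the last non-zero value
    for (std::size_t k = 1; k < zz.size(); ++k) {
        if (zz[k] == 0) {
            ++run;
        } else {
            out.push_back(run);    // zero run-length
            out.push_back(zz[k]);  // amplitude
            run = 0;
        }
    }
    out.push_back(EOB);  // trailing zeros are not sent, only the marker
    return out;
}
```

A run of 63 zeros cannot occur among the 63 high frequency coefficients in front of a non-zero value, so the value 63 in a run-length position can only mean end of block.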

40
How to start?
➢ Download exercise files from http://www.icorsi.ch/
➢ Follow installation instructions of exercises.
➢ you will find:
 In /exercises/exercise1/: main.cpp to start from
 In /exercises/modules/: library with JPEG encoder modules
{r2b,dct,quantize,zz_enc,rl_enc}.{h,cpp}, JPEG decoder modules
{b2r,idct,normalize,zz_dec}.{h,cpp} and test bench modules {src,snk,test}.{h,cpp}
 In /exercises/images/: test images
 In /exercises/add2systemc additional functions (df_fork, fifo_stat)
➢ Things to be done:
 make rl_dec.h and rl_dec.cpp
 complete the main.cpp with the modules.
 Compile and execute the application.
 Inspect the number of reads and writes in the fifos
 Visualize resulting image
 Test if you can launch the application in the debugger.
 Optional: make a hierarchy for the encoder and decoder.
You will find all files for starting in the exercise1 directory.
Perform the actions as indicated on the slide.
To obtain information about the number of writes and reads in the fifos, use
the type fifo_stat<T> instead of sc_fifo<T>.
To prevent multiple bindings of a fifo_port, the classes my_fifo_in<T> and
my_fifo_out<T> are used in the exercises.

41
Using SystemC on
Linux/Cygwin
➢ Use g++ (I used version 4.5.3).
➢ Make a workspace in Eclipse:
 Add your source files to the project.
 Add libmodules.a
 Add libadd2systemc.a (for next exercises).
 Add libsystemc.a
 Put the right include paths and linker paths
➢ Build your application from within Eclipse.
➢ Execute your application from within Eclipse.
 Exercise1.exe –i ../images/mountain.pgm –o result.pgm
We will make the exercises in a Linux environment, using g++ and Eclipse.
Eclipse is an integrated development and debugging environment. In the
exercise directory there is a step-by-step guide of how to get started with the
exercises in Eclipse.
The recent sources of the exercises and libraries can be found at
http://www.icorsi.ch/
Libraries have to be compiled before starting the exercise session.

Model Based System
Design
Class 2: Fixed-point
refinement
Marc Engels
In this second class we will focus on the refinement of the data types of the
functional model. More in particular we will explain the definition of fixed-
point word lengths for the variables in the functional model. This action is
relevant both for mapping on embedded processors with limited data sizes,
e.g. 16-bit processors, or for mapping on a dedicated architecture.
At the end of the class, you will be able to perform fixed point refinement on a
functional model of an embedded system in SystemC.

43
Fixed point refinement
➢ Fixed word length optimization
 Overflow and quantization
 MSB determination
 LSB determination
➢ Fixed word length support in SystemC
➢ Exercise 2: fixed point refinement of IDCT
This lecture on fixed point refinement consists of three parts:
• In the first part we introduce the quantization and overflow effects of
fixed point representations. We also present some methods to
determine the most and least significant bits (MSB and LSB).
• Next, we introduce the fixed point support in SystemC. This consists of
an extensive set of fixed point types. In addition, SystemC also
supports 4-valued logic to define bus structures.
• Finally, we introduce the exercise on fixed point refinement.

44
Fixed point refinement is one of the
steps in architectural design
System Functionality -> Dedicated Architecture
Computations: operations -> operators (resource allocation, scheduling)
Data: variables, arrays, floating point -> memories, fixed point (memory allocation, address generation, word sizing)
Communication: point-to-point, queues -> busses, detailed protocol (bus allocation, introduce arbiters, include protocols)
Let’s concentrate on the architectural design step that translates an
algorithm into an optimal architecture. The algorithm will typically be
specified as a functional model, such as data flow. The architecture needs a
timed model, e.g. register transfer level (RTL). Initially the algorithm will be
modeled in floating point. Cost-effective implementation requires, however, a
refinement into fixed point types.

45
Finite word lengths are a must
for DSP applications
Floating-point
•powerful
•expensive (storage & ops)
•multiplication: * on 3 bytes (mantissa), + on 1 byte (exponent)
Fixed-point
•minimum area
•low power
•high speed
•multiplication: 8 bits * 6 bits gives 14 bits
Most signal processing algorithms are specified in floating point precision.
This is a very powerful signal representation with high accuracy, but is also
expensive in storage and operation cost. For instance, a typical
representation of a floating point number is a mantissa of 24 bits and an
exponent of 8 bits. As a consequence, a floating point multiplication is
equivalent to a 24-bit multiplication and an 8-bit addition.
However, many applications, like cable modems and wireless
communication devices, require low cost and low power for a high
processing speed. As a consequence, the DSP algorithms will be performed
in fixed-point arithmetic. With an 8-bit fixed point notation, for instance, the
cost will drop dramatically as the hardware cost for a multiplication is a
quadratic function of its input width.
This requires the designer to translate floating point types into fixed point
types, using a refinement strategy.

46
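How to model a fixed-point
signal?
[Figure: a word of WL bits from MSB to LSB with bit weights i.2^3, 2^2, 2^1, 2^0, 2^-1, 2^-2, 2^-3; IWL bits lie left of the binary point and WL-IWL bits right of it]
➢Total number of bits WL
➢Integer bits IWL
➢Value representation
•2’s complement (i=-1)
•unsigned (i=1)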
A fixed point type can be defined by three parameters:
• The total number of bits WL.
• The position of the binary point, indicated by the number of integer
bits IWL.
• The way in which the value is represented. In the case of a signed
number, 2’s complement notation is the most common because it
allows easy arithmetic. However, alternatives like sign-magnitude and
1’s complement are also feasible.

47
How do we quantize?
[Figure: four transfer curves of the fixed point value (fxp) versus the floating point value (flp): truncate (floor), round, magnitude truncate and ceil]
If the result of a calculation has more precision than available in the fixed
point format, the value has to be quantized. Several ways of quantization
exist:
• Truncate or floor is the cheapest approach because it requires no extra
hardware: the superfluous bits are simply dropped. However, it generally
gives the worst performance of the quantization techniques.
• Magnitude truncate realizes a floor function for positive values and a
ceil function for negative values. The technique is natural for sign
magnitude representations. The advantage is a symmetrical behavior
around the zero value.
• Applying the ceil function to the complete range is an alternative which
is seldom used.
• Rounding is the technique with the best performance for most cases.
However, it is also the most expensive one. In hardware this requires
the addition of half an LSB followed by a truncation
operation.
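The four quantization methods can be sketched on ordinary doubles, with the parameter step standing for the weight of the LSB. The function names are hypothetical and the code models the behavior only, not a hardware implementation.

```cpp
#include <cmath>

// Truncate (floor): always rounds towards minus infinity.
inline double q_truncate(double x, double step) {
    return std::floor(x / step) * step;
}

// Magnitude truncate: floor for positive values, ceil for negative values,
// i.e. rounding towards zero.
inline double q_magnitude_truncate(double x, double step) {
    return std::trunc(x / step) * step;
}

// Ceil: always rounds towards plus infinity.
inline double q_ceil(double x, double step) {
    return std::ceil(x / step) * step;
}

// Round: add half an LSB, then truncate.
inline double q_round(double x, double step) {
    return std::floor(x / step + 0.5) * step;
}
```

With a step of 0.25, the value 3.1875 becomes 3.0 with truncation and 3.25 with rounding, while -3.1875 becomes -3.25 with truncation and -3.0 with magnitude truncation.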

48
What happens on an overflow?
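[Figure: fixed point value (fxp) versus floating point value (flp) for wrap-around and for saturation; beyond the max. value the wrap-around curve restarts from the bottom of the range, while the saturation curve stays at the max. value]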
When the result of an operation is larger than the maximum value that can
be represented by the fixed point format (overflow), we have two
possibilities:
• Wrap-around: the overflow bits are neglected. For unsigned values, this
is equivalent to a modulo operation (see figure on slide). For 2’s
complement numbers, a one bit overflow results in the maximum
negative number. This is the standard behavior in a hardware
implementation.
• Saturation: when an overflow occurs, the signal is set to the maximum
value that can be represented. Additional hardware is necessary to
realize this behavior.
Remark that a similar situation can occur for the minimum value of a signal,
for instance when the subtraction of two unsigned signals results in a negative
value that must be represented in an unsigned format. For such underflow,
similar remedies are possible.
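Both overflow behaviors can be modeled on integers as follows. This is a sketch with hypothetical function names; it assumes a word length between 1 and 62 bits so that the shifts stay within a 64-bit integer.

```cpp
#include <cstdint>

// Saturation: clip to the range of a 'bits'-wide 2's complement number.
inline std::int64_t saturate_signed(std::int64_t v, int bits) {
    const std::int64_t max_val = (std::int64_t{1} << (bits - 1)) - 1;
    const std::int64_t min_val = -(std::int64_t{1} << (bits - 1));
    if (v > max_val) return max_val;
    if (v < min_val) return min_val;
    return v;
}

// Wrap-around for unsigned values: a modulo 2^bits operation.
inline std::int64_t wrap_unsigned(std::int64_t v, int bits) {
    const std::int64_t modulus = std::int64_t{1} << bits;
    std::int64_t r = v % modulus;  // lies in (-modulus, modulus)
    if (r < 0) r += modulus;       // now in [0, modulus)
    return r;
}

// Wrap-around for 2's complement: keep the lower 'bits' bits and
// reinterpret the top bit as the sign bit.
inline std::int64_t wrap_signed(std::int64_t v, int bits) {
    const std::int64_t modulus = std::int64_t{1} << bits;
    std::int64_t r = wrap_unsigned(v, bits);
    if (r >= modulus / 2) r -= modulus;
    return r;
}
```

For a 5-bit 2's complement number, the value 16 (one more than the maximum of 15) wraps to -16, the maximum negative number, whereas saturation keeps it at 15.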

49
Saturation Hardware
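[Figure: VALUE is compared with MAX_VAL and MIN_VAL by two comparators (comp); each comparator steers a multiplexer (mux) that replaces VALUE by the corresponding limit to produce RESULT]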
When we opt for a saturation strategy, the following hardware is needed. The
result of the operation must be compared to the maximum positive and
negative numbers. This can be done with an explicit comparator or with the
overflow flags from the adders. If overflow or underflow is reached, the result
of the operation is replaced by the maximum or minimum value respectively.
Remark that the hardware complexity of a comparator or multiplexer is
comparable to that of an adder. As a consequence, saturation hardware can
require a significant amount of area.

50
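During design we must specify
fixed-point formats for signals
[Figure: a floating-point algorithm (two multipliers, an adder and a delay z-1) between an ADC and a DAC. The numbers 8 and 7 mark word lengths that are already known; the question marks indicate the signal formats that still have to be chosen.]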
Going back to the need for fixed point representations, the designer is faced
with the following problem. He obtains a floating point algorithm and needs
to translate the floating point types into fixed point types, using a refinement
strategy. For each floating point number, a fixed point characteristic
(including total and integer word lengths, overflow and rounding behavior)
must be chosen. In most situations the input and output formats are defined
by the system context (e.g. analog-to-digital converter). Remark that
determining these ADC and DAC precisions is an important task in the
overall system design.

51
Fixed-point refinement is a
complex optimization problem
➢Minimize overall cost:
minimal word lengths
truncate and wrap-around
➢MSB determination:
goal: avoid unwanted overflows
method: find min, max signal values
result: MSB position, value
representation, overflow
➢LSB determination:
goal: keep required precision
method: evaluate difference
between flp and fxp behavior
result: LSB position, quantization
This fixed-point refinement is a complex optimization problem where the
search space grows exponentially with the number of signals. The goal of
the optimization is to minimize the overall implementation cost and power
consumption. At the same time the performance degradation (e.g.
implementation loss for telecom systems) must be small. Remark that it is
essential to define a performance degradation bound (e.g. implementation
loss for communication systems, visual performance measure for multimedia
systems) before starting the fixed point refinement.
The optimization problem can be separated in two parts:
1. Determination of the most significant bit (MSB). First, the minimum and
maximum signal value must be determined. From this the MSB
position, value representation and overflow behavior is selected such
that overflows are avoided as much as possible.
2. Determination of the least significant bit (LSB). By evaluating the
difference in performance between the fixed and floating point behavior
of the algorithm, the LSB position and quantization method are
determined for each signal. The goal is to stay within the performance
degradation bound.
In the next slides we will take a closer look at methods for MSB and LSB
determination.

52
MSB determination can be
based on range calculations
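➢Put range (min, max) on inputs
➢Propagate range over the operators
➢This gives a safe (pessimistic) estimate
[Figure: signal flow graph with m = 12*x, d = x delayed by z-1, and y = m + d. Range info on the input: x in [0,255]. Range calc.: d in [0,255], m in [0,3060], y in [0,3315].]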
MSB determination can be done by means of range propagation. This
analytical method works as follows:
1. On each input signal, the range, i.e. the minimum and maximum
values that occur in a signal, are specified.
2. Next, the signal flow graph of the algorithm is traversed and for each
operator, the range of its output is calculated based on its input ranges.
Because the method calculates the worst-case minimum and maximum
signal values, it results in a safe, but sometimes pessimistic, estimation of
the MSB position.

53
Range propagation is a simple
calculation
Operator  minc                                           maxc
c=a+b     mina+minb                                      maxa+maxb
c=a-b     mina-maxb                                      maxa-minb
c=a*b     MIN(mina*minb, mina*maxb, maxa*minb, maxa*maxb)  MAX(mina*minb, mina*maxb, maxa*minb, maxa*maxb)
Range propagation on the operators is a simple operation. The table on the
slides shows the rules for add, subtract and multiply operations.
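The rules of the table translate directly into code. The sketch below uses hypothetical names (Range, range_add, range_sub, range_mul, integer_bits) and adds a helper that derives the number of integer bits from a finite range; for signed formats the sign bit is counted as an integer bit.

```cpp
#include <algorithm>
#include <cmath>

struct Range {
    double min;
    double max;
};

Range range_add(const Range& a, const Range& b) {
    return {a.min + b.min, a.max + b.max};
}

Range range_sub(const Range& a, const Range& b) {
    return {a.min - b.max, a.max - b.min};
}

Range range_mul(const Range& a, const Range& b) {
    const double p1 = a.min * b.min, p2 = a.min * b.max;
    const double p3 = a.max * b.min, p4 = a.max * b.max;
    return {std::min({p1, p2, p3, p4}), std::max({p1, p2, p3, p4})};
}

// Smallest number of integer bits for which a finite range fits without
// overflow. For unsigned formats r.min is expected to be >= 0.
int integer_bits(const Range& r, bool is_signed) {
    int bits = is_signed ? 1 : 0;
    for (;;) {
        const double limit = std::ldexp(1.0, is_signed ? bits - 1 : bits);
        const bool fits = r.max < limit && (!is_signed || r.min >= -limit);
        if (fits) return bits;
        ++bits;
    }
}
```

Applied to an input x in [0,255], the product 12*x lies in [0,3060] and the sum with a second signal in [0,255] lies in [0,3315], which needs 12 unsigned integer bits.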

54
Range calculations can get
unstable with feedback
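[Figure: first-order feedback loop with input X(n), output Y(n), gain a, delay z-1 and feedback signal F(n); plot of the values minF and maxF against sample n, growing continuously]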
When applied to feedback signals, range propagation can become unstable
and cause continuous growth of the minimum and maximum values. An
example of such a situation is shown on the slide. In such a situation, a
statistical inspection of the real signals will be needed to determine a realistic
MSB position.
Remark that, because of the propagation mechanism, all signals within
this feedback loop or depending on the output of the feedback loop will
suffer from this range explosion. Once saturation logic is introduced at one
place in the loop, this problem is solved.
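The growth can be reproduced with a small experiment. The sketch assumes a loop of the form F(n+1) = X(n) + a*F(n), which is my reading of the figure; the names Range and feedback_range are hypothetical.

```cpp
#include <algorithm>

struct Range {
    double min;
    double max;
};

// Applies the range propagation rules 'iterations' times around the loop
// F(n+1) = X(n) + a * F(n), starting from an empty delay element.
Range feedback_range(const Range& x, double a, int iterations) {
    Range f{0.0, 0.0};
    for (int n = 0; n < iterations; ++n) {
        const double p1 = a * f.min, p2 = a * f.max;  // range of a * F(n)
        f = Range{x.min + std::min(p1, p2), x.max + std::max(p1, p2)};
    }
    return f;
}
```

With a = 1 (an accumulator) and an input in [-1,1], the calculated range widens by one on every pass and never settles. With a = 0.5 the same calculation converges towards [-2,2], so the instability depends on the loop gain.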

55
Collecting signal statistics from
simulations is an alternative
➢Perform simulation with realistic stimuli.
➢Collect minimum and maximum value on each signal during the
simulation
➢This gives an optimistic, stimuli dependent estimate
[Figure: the signal flow graph y = 12*x + d (d = x delayed by z-1) driven by stimuli; a probe records min and max on the internal signal m]
As an alternative to the analytical range propagation method, we can collect
the signal statistics during simulations. Because the obtained range
information will be stimuli-dependent, this will give an optimistic estimation of
the minimum and maximum values. As a consequence, to maximize the
confidence in the obtained results, the stimuli set should be large and
provide a complete coverage of the algorithm code.

56
Combine both methods for
accurate MSB determination
         signal statistic       range propagation
name     min    max    MSB1     min     max    MSB2
signal1  -1.5   1.6    2        -1.9    1.9    2
signal2  -1.3   1.4    2        -2.1    2.1    3
signal3  -1.2   1.2    2        -22.0   22.0   6
signal4  -1.2   1.2    2        -∞      ∞      ∞
➢If MSB1 == MSB2: wrap-around(MSB1)
➢If MSB1 < MSB2: wrap-around(MSB2)
➢If MSB1 << MSB2: saturation (MSB1)
➢If MSB2 is ∞: saturation (MSB1)
As can be expected, combining both methods gives the best results. Each
signal in the system will then be in one of the following situations:
• Both methods result in the same MSB position. Quite logically, the
signal can safely be specified with the resulting MSB position and wrap-
around overflow behavior.
• When the analytical MSB position is larger than the statistical MSB
position, we can make a trade-off between the analytical MSB with
wrap-around and the statistical MSB with saturation. In most cases
the wrap-around functionality will be the most economical. Only when
the statistical MSB position is much smaller, saturation logic can be
beneficial.
• In the case of a range growth because of feedback, the analytical MSB
position cannot be calculated (it goes to infinity). In this case, the
statistical MSB position is chosen together with a saturation behavior.
After introducing the saturation on one signal in the feedback loop, we
need to re-simulate to get useful results for the rest of the algorithm.
An example of each of these situations is shown on the slide.
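The decision rules can be captured in a small function. This is a sketch: the names are hypothetical, and the threshold margin that decides when MSB1 counts as "much smaller" than MSB2 is an assumption of this example (3 bits by default), since the slide does not quantify it.

```cpp
#include <limits>

enum class Overflow { Wrap, Saturate };

struct MsbChoice {
    int msb;
    Overflow mode;
};

// Marker for an analytical MSB that could not be calculated (range explosion).
constexpr int kUnbounded = std::numeric_limits<int>::max();

// msb_stat  = MSB1, obtained from signal statistics
// msb_range = MSB2, obtained from range propagation (or kUnbounded)
// margin    = assumed number of bits from which MSB1 is "much smaller"
MsbChoice choose_msb(int msb_stat, int msb_range, int margin = 3) {
    if (msb_range == kUnbounded) return {msb_stat, Overflow::Saturate};
    if (msb_range - msb_stat >= margin) return {msb_stat, Overflow::Saturate};
    return {msb_range, Overflow::Wrap};  // covers MSB1 == MSB2 and MSB1 < MSB2
}
```

For the four signals of the slide this gives wrap-around with 2 bits, wrap-around with 3 bits, saturation with 2 bits and saturation with 2 bits respectively.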

57
Quantization effects can be
modeled as additive noise
[Figure: a quantizer Q with B bits between input and output is replaced by an adder that adds a noise signal to the input]
➢Noise is approximated by a statistical model with the following
assumptions:
the noise is uncorrelated to the input.
the noise is white.
the probability distribution is uniform.
When we look at the LSB side, the question arises what the effect is of
quantization. Many authors approximate the quantization effect as an
additional noise source. They assume that:
• The noise sequence is a sample of a stationary random process (i.e.
whose statistical parameters do not change over time).
• The noise sequence is uncorrelated with the input sequence.
• The random variables of the noise process are uncorrelated, i.e. the
error is a white-noise process.
• The probability distribution of the error process is uniform over the
range of the quantization error.

58
Each quantization effect has
mean and variance
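➢ Rounding with step D:
$$m_n = 0 \quad\text{and}\quad \sigma_n^2 = \frac{D^2}{12}$$
➢ Truncation with step D:
$$m_n = -\frac{D}{2} \quad\text{and}\quad \sigma_n^2 = \frac{D^2}{12}$$
➢ Magnitude truncation with step D:
$$m_n = 0 \quad\text{and}\quad \sigma_n^2 = \frac{D^2}{3}$$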
The noise process can then be modeled by means of its mean and variance.
The expressions for mean and variance for the three most popular
quantization methods are shown on the slide. D is the quantization step.
Rounding and magnitude truncation result in a zero mean, but rounding has the
lower variance. Truncation and rounding have the same variance, but
only rounding has a zero mean. As can be expected, rounding introduces the
least quantization noise.
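These expressions can be checked numerically by quantizing a dense, evenly spaced set of input values and collecting the statistics of the error. The sketch below is such an experiment under the uniform-distribution assumption; the names NoiseStats and quantization_noise are hypothetical, and the error is defined as quantized value minus original value.

```cpp
#include <cmath>
#include <functional>

struct NoiseStats {
    double mean;
    double variance;
};

// Applies the quantizer q to 'samples' evenly spaced values in [lo, hi]
// and returns mean and variance of the error e = q(x) - x.
NoiseStats quantization_noise(const std::function<double(double)>& q,
                              double lo, double hi, int samples) {
    const double h = (hi - lo) / samples;
    double sum = 0.0, sum_sq = 0.0;
    for (int k = 0; k < samples; ++k) {
        const double x = lo + (k + 0.5) * h;  // midpoint of the k-th interval
        const double e = q(x) - x;
        sum += e;
        sum_sq += e * e;
    }
    const double mean = sum / samples;
    return {mean, sum_sq / samples - mean * mean};
}
```

With a step D = 1 and inputs spread over [-8, 8], rounding gives a mean near 0 and a variance near 1/12, flooring a mean near -1/2 with the same variance, and magnitude truncation a mean near 0 with a variance near 1/3.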

59
This results in an equivalent
linear network
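[Figure: the signal flow graph y = 12*x + d (d = x delayed by z-1, m = 12*x) with quantizer Q1 on the input x and quantizer Q2 on m; in the equivalent linear network Q1 and Q2 are replaced by adders that inject the noise signals e1(t) and e2(t)]
$$y(t) = \bigl(12\,x(t) + x(t-1)\bigr) + \bigl(12\,e_1(t) + e_2(t) + e_1(t-1)\bigr)$$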
Replacing the quantization by an additional noise source results in a linear
model of the quantized algorithm. This can then be analytically analyzed by
means of well-developed linear signal processing theory. For many
quantization effects, this linear model is a good approximation. It has, for
instance been used to determine the effects of quantizing the signals in FIR
filters.
As an exercise, calculate the resulting signal to noise ratio in the case that:
• x(t) ranges between 0 and 255 with a uniform distribution.
• both quantization steps are rounding the values to the nearest integer.

60
… but quantization is a non-
linear operation
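[Figure: first-order recursive filter with input X(n), output Y(n), feedback coefficient -0.96, delay z-1 and a B-bit quantizer Q; plots of the output with rounding and without rounding]
X(0) = 14, x(n) = 0 for n > 0
round to nearest integer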
However, quantization itself is a non-linear operation, and the linear noise
model does not capture all of its effects. In systems with feedback,
quantization can lead to non-intuitive behavior. In infinite impulse response
(IIR) filters, for instance, quantization can generate limit cycles. For a
stable floating-point IIR filter implementation, the output will decay
asymptotically to zero when the input becomes zero. For the same system,
implemented with finite precision, the output may continue to oscillate
indefinitely with a periodic pattern while the input remains equal to zero.
This effect is often referred to as zero-input limit cycle behavior. An example
of such behavior is shown on the slide.
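The limit cycle is easy to reproduce. The sketch below assumes the recursion y(n) = Q(x(n) + a*y(n-1)) with the quantizer placed on the sum, which is my reading of the figure; the function name impulse_response is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Response of y(n) = Q(x(n) + a * y(n-1)) to the input x(0) = x0,
// x(n) = 0 for n > 0. If 'quantize' is false, Q is omitted.
std::vector<double> impulse_response(double a, double x0, int samples,
                                     bool quantize) {
    std::vector<double> y;
    double prev = 0.0;
    for (int n = 0; n < samples; ++n) {
        const double x = (n == 0) ? x0 : 0.0;
        double v = x + a * prev;
        if (quantize) v = std::round(v);  // round to nearest integer
        y.push_back(v);
        prev = v;
    }
    return y;
}
```

With a = -0.96 and x0 = 14 the rounded output starts with 14, -13, 12, -12, 12, -12 and keeps alternating between 12 and -12, because -0.96 times 12 is -11.52, which rounds back to -12. Without rounding the output decays towards zero.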

61
LSB determination is based on
simulations
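[Figure: the same stimuli drive a reference model and a model with quantizers Q; the two outputs are compared. The flow is: simulate, check whether the output is ok (yes/no), and repeat until all signals are fixed-point.]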
Non-linear quantization effects are difficult to analyze analytically. Therefore,
mostly simulation based methods are used. To this end the output of a
reference simulation is compared to a simulation with the quantized signals.
Again, sufficiently large stimuli sets, which give complete code coverage,
must be used.

62
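Signal to quantization noise
ratio (SQNR)
$$SQNR_x = 10 \log_{10} \frac{\sigma_s^2 + m_s^2}{\sigma_e^2 + m_e^2}$$
[Figure: signal x passes through quantizer Q to give x_Q; the difference e between the two is formed. The statistics m_s, σ_s are collected on the signal and m_e, σ_e on the error.]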
To get a better insight into the optimization trade-off, the difference between
the floating-point and fixed-point values (e) and the resulting signal to
quantization noise ratio (SQNR) are a useful guide.
The SQNR for all signals is calculated as follows:
• During signal assignments the statistics (mean, standard deviation) for
the error signal as well as for the output signal are collected.
• At the end of the simulation, the signal to quantization noise ratio SQNR
is calculated for each signal.
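A minimal sketch of this calculation in plain C++ follows. It works on two recorded traces of the same signal, one from the floating-point reference and one from the fixed-point model, instead of collecting the statistics during assignment. The names Stats, collect_stats and sqnr_db are hypothetical, and taking the floating-point trace as the signal s is an assumption of this example. Both traces are expected to be non-empty and of equal length.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Stats {
    double mean;
    double variance;
};

// Mean and variance of a recorded trace.
Stats collect_stats(const std::vector<double>& v) {
    double sum = 0.0, sum_sq = 0.0;
    for (double x : v) {
        sum += x;
        sum_sq += x * x;
    }
    const double n = static_cast<double>(v.size());
    const double mean = sum / n;
    return {mean, sum_sq / n - mean * mean};
}

// SQNR in dB: 10 log10((sigma_s^2 + m_s^2) / (sigma_e^2 + m_e^2)).
double sqnr_db(const std::vector<double>& reference,
               const std::vector<double>& fixed) {
    std::vector<double> error(reference.size());
    for (std::size_t i = 0; i < reference.size(); ++i)
        error[i] = reference[i] - fixed[i];
    const Stats s = collect_stats(reference);
    const Stats e = collect_stats(error);
    return 10.0 * std::log10((s.variance + s.mean * s.mean) /
                             (e.variance + e.mean * e.mean));
}
```

When a signal is not quantized at all, the error power is zero and the result is infinite.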

63
LSB selection optimizes cost and
performance
quantization  SQNR   SQNR   SQNR   SQNR    SQNR   SQNR        SQNR    cost    SNR    PSNR
set           pi     accu   pix    coeffs  block  temp block  blocki
0             208    253    Inf    184     Inf    225         Inf     787968  27,64  31,49
1             45,5   59,76  Inf    174     Inf    Inf         Inf     759296  27,48  31,33
2             45,5   59,76  25,15  174     Inf    Inf         Inf     759296  22,66  26,51
3             45,5   59,76  38,77  174     Inf    Inf         Inf     759296  26,91  30,75
4             45,5   59,76  47,3   30,88   Inf    Inf         Inf     230912  27,35  31,19
5             45,5   59,8   47,3   30,88   29,38  Inf         Inf     230912  27,34  31,19
6             45,5   61,4   47,3   30,88   29,38  -1,93       Inf     41472   20,47  24,32
7             45,5   59,8   47,3   30,88   29,38  Inf         Inf     72192   27,34  31,19
8             45,5   60,23  47,3   30,88   29,38  16,73       Inf     56832   26,96  30,8
9             45,5   59,88  47,3   30,88   29,38  31,86       Inf     67072   27,31  31,16
The optimal LSB is determined by running the simulation multiple times with
various quantization sets. For each quantization set, the SQNR per signal,
the overall SNR and PSNR, and the cost is calculated. The goal is to find the
cheapest solution that realizes the specified performance. This procedure
can be automated by means of an optimization routine.
When changing the quantization for one signal at the time, the statistics give
an impression of the sensitivity of the cost and the performance to the
quantization of a signal. As a rule of thumb, the SQNR of a signal should be
higher than the overall SNR.
Remark that the SQNR and SNR statistics are dependent on the input. As a
consequence, the optimization should be performed on a representative set
of inputs.

64
In the next part we discuss the fixed point support in SystemC

65
SystemC introduces a number
of specific data types
Type Description
sc_logic 4 value {0,1,X,Z} single bit
sc_int 1 to 64 bit signed integer
sc_uint 1 to 64 bit unsigned integer
sc_bigint Arbitrary size signed integer
sc_biguint Arbitrary size unsigned integer
sc_bv Arbitrary sized 2 value vector
sc_lv Arbitrary sized 4 value vector
sc_fixed Signed fixed point
sc_ufixed Unsigned fixed point
sc_fix Untemplated signed fixed point
sc_ufix Untemplated unsigned fixed point
SystemC introduces a number of specific data types, which correspond to
data types that are frequently used in Hardware Description Languages
(HDLs). These types include sc_logic, a 4-valued single-bit type that
can be high (1), low (0), undefined (X) or in a high-impedance (Z) state.
Integers of up to 64 bits are provided by sc_int and sc_uint, integers of
arbitrary length by sc_bigint and sc_biguint. SystemC also supports vectors
of 2-valued or 4-valued logic with sc_bv and sc_lv. sc_fixed and sc_ufixed
define fixed point numbers where the characteristics of the number are defined
by template parameters. sc_fix and sc_ufix use run-time arguments to define
the fixed point characteristics. This is interesting to try out different
quantization settings without recompilation. However, these types cannot be
used in synthesis, while the others can.

66
SystemC templated fixed-point
types
➢ Two fixed point templates
 sc_fixed <wl, iwl, q_mode, o_mode, n_bits> x; // signed
 sc_ufixed <wl, iwl, q_mode, o_mode, n_bits> y; // unsigned
➢ Parameters:
 wl = number of bits
 iwl = number of integer bits
 q_mode = quantization method (SC_RND / SC_TRN / SC_TRN_ZERO
/ ...)
 o_mode = overflow method (SC_SAT / SC_WRAP / … )
 n_bits = number of saturated bits in case of wrapping (default 0)
➢ If quantization and overflow not specified the defaults (SC_TRN and
SC_WRAP) are used
Two data types provide full flexibility in representing fixed point numbers with
static parameters: sc_fixed (signed, 2’s complement numbers) and sc_ufixed
(unsigned numbers). The template parameters of these fixed-point types carry
the information on the word lengths and the quantization and overflow behavior:
• wl is the total number of bits
• iwl represents the number of integer bits, i.e. left from the binary point.
• q_mode specifies the quantization method to be rounding (SC_RND),
flooring (SC_TRN), or magnitude truncate (SC_TRN_ZERO). In
addition, some very particular, rarely used quantization modes are
specified.
• o_mode selects the overflow mode to be saturation (SC_SAT),
saturation to zero (SC_SAT_ZERO), symmetrical saturation
(SC_SAT_SYM), wrap-around (SC_WRAP), or sign-magnitude
wrapping (SC_WRAP_SM).
• n_bits specifies the number of saturated bits in case of wrapping. This
makes it possible to define some special wrapping methods that keep the sign
of the signal. By default n_bits is set to 0.

67
Fixed point lengths
(X = stored bit, 0 = appended zero, S = sign extension, . = binary point)
sc_fixed <5, 7> v;
X X X X X 0 0 .      [ -64 , 60 ]
sc_fixed <5, 3> v;
X X X . X X          [ -4 , 3.75 ]
sc_fixed <5, -2> v;
. S S X X X X X      [ -0.125 , 0.1171875 ]
Two of the arguments specified to the fixed point data type were word length
(wl) and integer word length (iwl). Word length must be greater than 0.
Integer word length can be positive or negative, and larger than the word
length.
For instance if the word length is specified as 5 bits but the integer word
length is 7 then two zeroes will be added to the end of the object.
If the integer word length is a negative value then sign bits after the binary
point will be extended. For instance if wl = 5 and iwl = -2 then two sign bits
will be added to the object. The sign bits are simply the most significant bit of
the 5 bit number. By extending the sign bits, the value of the number is
maintained.
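The range and precision that follow from wl and iwl can be calculated with a few lines of plain C++. This is a sketch for the signed (2's complement) case; the names FixedRange and signed_fixed_range are hypothetical and not part of SystemC.

```cpp
#include <cmath>

struct FixedRange {
    double min;
    double max;
    double lsb;  // weight of the least significant bit
};

// Range of a signed 2's complement format with wl bits in total,
// of which iwl are integer bits. iwl may be negative or larger than wl.
FixedRange signed_fixed_range(int wl, int iwl) {
    const double lsb = std::ldexp(1.0, iwl - wl);        // 2^(iwl - wl)
    const double min = -std::ldexp(1.0, iwl - 1);        // -2^(iwl - 1)
    const double max = std::ldexp(1.0, iwl - 1) - lsb;   // one LSB below 2^(iwl - 1)
    return {min, max, lsb};
}
```

For wl = 5 and iwl = 7 this gives an LSB weight of 4 and the range [-64, 60]; for wl = 5 and iwl = 3 it gives an LSB weight of 0.25 and the range [-4, 3.75].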

68
Quantization methods
[ 0 , 7.75 ] precision = 0.25
sc_ufixed <5, 3, SC_RND> v;
v = 3.1875 (011.0011) is stored as 0 1 1 0 1 = 3.25, quantization error 0.0625
sc_ufixed <5, 3, SC_TRN> v;
v = 3.1875 (011.0011) is stored as 0 1 1 0 0 = 3.0, quantization error 0.1875
This slide shows an example that illustrates the difference between rounding
and flooring functionality. As can be seen, rounding results in the smaller
quantization error here. In general the rounding error is at most half an LSB,
whereas the flooring error can approach a full LSB.

69
Overflow handling
[ -16 , 15 ]
sc_fixed <5, 5, SC_RND,SC_SAT> v;
v = 18; is stored as 0 1 1 1 1 = 15
sc_fixed <5, 5, SC_RND,SC_WRAP> v;
v = 18; is stored as 1 0 0 1 0 = -14
The slide shows an example with different overflow handling methods:
saturation and wrap-around for a two’s complement number. As can be seen,
largely different outputs are generated by these different overflow methods.

70
Fixed-point simulation
➢operations in floating-point
➢quantization and overflow handling during assignment
sc_fixed <4,3> a;
sc_fixed <4,1> b;
sc_fixed <4,2> c;
a = 1.6;
b = 0.9;
c = a * b;
variable  assigned value  stored value after Q  lsb precision
a         1.6             1.5                   0.5
b         0.9             0.875                 0.125
c         1.3125 (a*b)    1.25                  0.25
When working with fixed-point arithmetic, it is vital to have an efficient
representation of values and simulation of operations. For this purpose, all
operations are performed with floating point arithmetic. Only on assignment,
the quantization is performed. In case an intermediate result needs to be
quantized, an explicit assignment operation has to be used.
In the example above the multiplication a*b is a floating-point operation
having as input two fixed point values. During the assignment to c the
floating point result is automatically cast to the specified fixed point type of
variable c.
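The principle of computing in floating point and quantizing on assignment can be imitated with a small class template. The class Fixed below is a plain C++ stand-in written for this illustration, not the SystemC implementation of sc_fixed; it only mimics the default modes SC_TRN and SC_WRAP.

```cpp
#include <cmath>

// Stand-in for a signed fixed point type with WL bits in total and IWL
// integer bits. Values are kept as double; quantization (floor) and
// overflow handling (wrap-around) happen only on assignment.
template <int WL, int IWL>
class Fixed {
public:
    Fixed& operator=(double v) {
        const double lsb = std::ldexp(1.0, IWL - WL);  // weight of the LSB
        double steps = std::floor(v / lsb);            // truncate (SC_TRN)
        const double modulus = std::ldexp(1.0, WL);    // number of codes
        // Wrap the integer code into [-2^(WL-1), 2^(WL-1)).
        steps -= modulus * std::floor((steps + modulus / 2) / modulus);
        value_ = steps * lsb;
        return *this;
    }
    // Reading the variable yields a floating point value, so all
    // arithmetic on Fixed objects is carried out in floating point.
    operator double() const { return value_; }

private:
    double value_ = 0.0;
};
```

With Fixed<4,3> a, Fixed<4,1> b and Fixed<4,2> c, the assignments a = 1.6, b = 0.9 and c = a * b store 1.5, 0.875 and 1.25: the product 1.3125 is computed in floating point and only quantized when it is assigned to c.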

71
SystemC fixed point types with
non-static arguments
➢ Fixed point parameter values
 sc_fxtype_params my_type(wl,iwl,q_mode,o_mode,n_bits);
 x = my_type.wl();
 my_type.iwl(x-2);
➢ Two non-static fixed point types
 sc_fix x(my_type); // signed
 sc_ufix y(my_type); // unsigned
➢ For arrays, these types are used with a context
 sc_fxtype_context my_context(my_type);
 sc_fix z[64];
➢ Remark: for fixed point simulations, include in every file
 #define SC_INCLUDE_FX
 #include <systemc.h>
SystemC also allows fixed point types with non-static arguments to be defined:
sc_fix (signed, 2’s complement numbers) and sc_ufix (unsigned numbers).
Type sc_fxtype_params is used to configure the parameters of types sc_fix,
and sc_ufix. To set the parameters for these types declare an object of type
sc_fxtype_params, initialize the parameter values as desired, and pass the
sc_fxtype_params object as an argument to the sc_fix or sc_ufix
declarations.
The sc_fxtype_params object has the same arguments passed to an object
of type sc_fixed. These include:
• wl - word length
• iwl - integer word length
• q_mode - quantization mode
• o_mode - overflow mode
• n_bits - saturated bits
Any combination of arguments is allowed, but the order cannot be
changed. A variable of type sc_fxtype_params can be initialized by another
variable of type sc_fxtype_params. One variable of type sc_fxtype_params
can also be assigned to another.
Individual argument values can be read and written using methods with the
same name as the arguments shown above.

72
We now turn to the exercise, where we will perform fixed point refinement of
the IDCT operator in the JPEG decoder.

73
Goal of this exercise
➢ Perform fixed point refinement for all the internal variables of
the IDCT in the JPEG example
➢ determine the MSB to avoid internal overflows without overflow
logic.
➢ determine the LSB to have no more than 0.5 dB degradation on
the PSNR of the resulting image
The goal of this exercise is to get familiar with fixed point refinement, by
practicing it on the IDCT block of the JPEG decoder. To this end, we will
determine the LSB and MSB value for every variable in the IDCT function.
By observing the overall behavior it will be possible to optimize the LSB and
MSB values. The MSB should be determined in such a way that overflow is
avoided without introduction of overflow logic. To determine the LSB, the
degradation of the image quality (measured as peak signal to noise ratio,
PSNR) should be kept below 0.5 dB. The PSNR is defined as the ratio between
the maximum power of a signal and the power of the corrupting noise. In our
case the noise is the mean squared error (MSE) between the original and the
decompressed image. The maximum power of the signal is MAX^2, where
MAX is the maximum grey value of a pixel.
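Expressed in dB, this definition gives PSNR = 10·log10(MAX^2 / MSE). The sketch below computes it for two images stored as flat pixel vectors. The function names and the default MAX = 255 (8-bit grey values) are assumptions made for this illustration; they are not taken from the exercise code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Mean squared error between two images of equal size.
double mean_squared_error(const std::vector<int>& original,
                          const std::vector<int>& decoded) {
    assert(original.size() == decoded.size() && !original.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < original.size(); ++i) {
        const double diff = static_cast<double>(original[i]) - decoded[i];
        sum += diff * diff;
    }
    return sum / static_cast<double>(original.size());
}

// PSNR in dB; returns +infinity when both images are identical.
double psnr_db(const std::vector<int>& original,
               const std::vector<int>& decoded, int max_value = 255) {
    const double mse = mean_squared_error(original, decoded);
    if (mse == 0.0) return std::numeric_limits<double>::infinity();
    const double peak = static_cast<double>(max_value) * max_value;
    return 10.0 * std::log10(peak / mse);
}
```

In the exercise, the PSNR of the fixed point decoder output would be compared with that of the reference decoder output, both measured against the original image; the difference should stay below 0.5 dB.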

74
How to start?
➢ You find:
In .../exercises/exercise2/ : the functional model with a fixed point IDCT
implementation; types-file datatypes_original.txt
In /exercises/modules/: library of JPEG-encoder modules
{b2r,idct,normalize,zz_dec}.{h,cpp} and testbench modules
{src,snk,test}.{h,cpp}
Special fixed point support functions of directory
…/exercises/add2systemc/ are used
In /exercises/images/: test images
➢ Things to do:
inspect the code to understand the behavior
Make the application
change datatypes.txt file
syntax: exercise2 -i <inputfile> -o <outputfile> -t <typefile>

75
Model Based System Design
Class 3: Communication
Refinement
Marc Engels
In this third class we will focus on the refinement of the communication
between the modules of the functional model. More specifically, we will
explain how the FIFO communication channels can be replaced by protocols
on simple wires.

76
Communication refinement
➢ Communication refinement
➢ Communication refinement in SystemC
➢ Exercise 3: communication refinement for
the JPEG decoder
This lecture on communication refinement consists of three parts:
• In the first part we introduce the concept of refining the inter process
FIFO communication into real protocols.
• Next, we review the support in SystemC for communication refinement.
• Finally we introduce the exercise to practice what we have learned.

77
Communication refinement is one of the steps in architectural design
[Figure: refinement from System Functionality to Dedicated Architecture along three axes]
Computations: operations → operators (resource allocation, scheduling)
Data: variables, arrays, floating point → memories, fixed point (memory allocation, address generation, word sizing)
Communication: point-to-point queues → busses, detailed protocol (bus allocation, introduce arbiters, include protocols)
In the architectural design process that translates an algorithm into an
optimal architecture, communication refinement is an important step. The
algorithm will typically be specified into a functional model, like data flow. In
this data flow model, the communication between processes is performed
via point-to-point queues. The architecture needs a model with explicit
protocols. In addition, signals could be multiplexed on a bus to reduce the
wiring overhead.

78
Functional models use FIFO communication
➢ Queues guarantee consistent data passing
➢ Implementation could become expensive for large sizes
➢ Communication must be optimized
[Figure: Process1 → queue with (infinite) storage → Process2]
A FIFO is a very robust structure because it guarantees correct processing
of the data independently from the processing times of the functions and
communication times. However, queues require a large amount of storage
and also some addressing hardware. A typical implementation, for instance,
would be a memory array with modulo addressing and a read and write
pointer. Because of this large implementation cost, the communication must
be optimized.
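The typical implementation mentioned above, a memory array with modulo addressing and a read and a write pointer, can be sketched in plain C++ as follows. The class name RingFifo and its interface are hypothetical; the sketch only serves to show where the storage and addressing cost of a queue comes from.

```cpp
#include <array>
#include <cstddef>

// Bounded FIFO: a memory array with modulo addressing,
// a read pointer and a write pointer.
template <typename T, std::size_t N>
class RingFifo {
    static_assert(N > 0, "FIFO needs at least one storage element");
public:
    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == N; }

    // Store one element; returns false when the FIFO is full.
    bool write(const T& value) {
        if (full()) return false;
        mem_[wr_] = value;
        wr_ = (wr_ + 1) % N;  // modulo addressing
        ++count_;
        return true;
    }

    // Fetch the oldest element; returns false when the FIFO is empty.
    bool read(T& value) {
        if (empty()) return false;
        value = mem_[rd_];
        rd_ = (rd_ + 1) % N;  // modulo addressing
        --count_;
        return true;
    }

private:
    std::array<T, N> mem_{};
    std::size_t wr_ = 0;     // write pointer
    std::size_t rd_ = 0;     // read pointer
    std::size_t count_ = 0;  // number of stored elements
};
```

A RingFifo<int, 2> accepts two writes, refuses a third one until an element has been read, and always returns the elements in the order in which they were written.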

79
Many communications can be reduced to a single register
[Figure: Process1 → wire → Process2]
➢ Output of functions is registered
➢ No extra implementation cost
➢ No storage for data
➢ Consistency of communication needs to be guaranteed
Ideally, from an implementation point of view, a FIFO communication could
be reduced to a simple wire when the output signal is registered. This
requires no storage and no implementation cost for the addressing or
protocol. However, consistency of the communication must be guaranteed:
Process 2 should not use the data before it is generated and Process 1
should not produce new data before the previous has been read by Process
2.

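80
Example of correct wired communication
[Figure: two synchronous FSMs, Process 1 and Process 2, connected by a wire]
Process 1: filt1 → filt2 → filt3 → filt4, write() → back to filt1
Process 2: w=0 → (while w<4: w++) → when w=4: read(), op1 → op2 → op3 → op4 → back to read(), op1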
To analyze the behavior of a wired connection, we represent the two
processes with a Synchronous Finite State Machine (FSM). In such a
Synchronous FSM the transitions take place on a clock edge. In our analysis
we assume that both processes are running on the same clock. Process 1
will perform a filtering operation in 4 cycles and will also write the data in the
register in the 4th cycle. Process 2 will initially wait for 4 cycles. Next cycle, it
will read the data and perform a first operation, followed by three more
cycles of operation. This sequence will be repeated continuously.

81
Communication is perfectly aligned
Clock cycle   Process 1          Process 2
1             filt1              w=1
2             filt2              w=2
3             filt3              w=3
4             filt4 write()      w=4
5             filt1              read() op1
6             filt2              op2
7             filt3              op3
8             filt4 write()      op4
9             filt1              read() op1
10            filt2              op2
…             …                  …
We have to guarantee the condition that every write() comes before a read()
If we look at a timing diagram, we see that the timing is guaranteed. Every
read() happens after a write() of the signal. Also no data is lost.

82
Small changes to design can result in errors
➢ Increase (decrease) the number of operations in process 1 (2):
the same data will be consumed twice.
➢ Decrease (increase) the number of operations in process 1 (2):
data will be lost
➢ If the number of initial wait operations in process 2 is too low,
we will use undefined data
➢ If the number of initial wait operations in process 2 is too high,
we will lose the first data elements
However, small changes to the finite state machine of one of the two
processes can result in errors:
• If we increase the number of operations in process 1, process 2 will
read too early, so the same data is used twice.
• If we decrease the number of operations in process 2, the same
happens.
• If we decrease the number of operations in process 1, process 2 will be
relatively too slow and some data will be overwritten before it has been
used.
• Increasing the number of operations in process 2 will have the same
effect.
• Also note that the number of initial wait operations in process 2
should be neither too low nor too high.
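These failure modes can be checked with a small cycle-based model in plain C++. The sketch below rests on simplifying assumptions: process 1 writes in the last cycle of each period, process 2 reads in the first cycle of each period after its initial wait, and a value written in one cycle becomes visible in the next cycle (registered output). The names simulate_wire and WireStats are hypothetical.

```cpp
#include <cassert>

struct WireStats {
    int undefined_reads = 0;  // reads before the first write
    int duplicate_reads = 0;  // reads of a value that was already read
    int lost_values = 0;      // values overwritten before being read
};

// Process 1 writes in the last cycle of every `p1_period` cycles.
// Process 2 waits `p2_wait` cycles and then reads in the first cycle
// of every `p2_period` cycles.
WireStats simulate_wire(int p1_period, int p2_period, int p2_wait,
                        int cycles) {
    assert(p1_period > 0 && p2_period > 0 && p2_wait >= 0);
    WireStats stats;
    int last_read = 0;  // sequence number of the last value read
    for (int c = 1; c <= cycles; ++c) {
        // Number of writes completed before cycle c, which is also the
        // sequence number of the value visible on the wire.
        const int visible = (c - 1) / p1_period;
        const bool is_read =
            c > p2_wait && (c - p2_wait - 1) % p2_period == 0;
        if (!is_read) continue;
        if (visible == 0) {
            ++stats.undefined_reads;
        } else if (visible == last_read) {
            ++stats.duplicate_reads;
        } else {
            stats.lost_values += visible - last_read - 1;
            last_read = visible;
        }
    }
    return stats;
}
```

With matching periods of 4 cycles and an initial wait of 4 cycles, no errors are counted. A receiver with a period of 2 cycles and no initial wait reads undefined data twice and then starts reading values twice, while a transmitter that is twice as fast as the receiver loses every second value.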

83
Example of wrong wired communication
[Figure: Process 1 and Process 2 connected by a wire]
Process 1: filt1 → filt2 → filt3 → filt4, write() → back to filt1
Process 2: read(), op1 → op2 → back to read(), op1
In the slide an example is shown where process 2 has only two states. As a
consequence it can be expected that the data produced by process 1 is
used multiple times. Because no initial wait operations are present in
process 2, we also expect that undefined data will be used.

84
The example results in undesired behavior
Clock cycle   Process 1          Process 2
1             filt1              read() op1
2             filt2              op2
3             filt3              read() op1
4             filt4 write()      op2
5             filt1              read() op1
6             filt2              op2
7             filt3              read() op1
8             filt4 write()      op2
9             filt1              read() op1
10            filt2              op2
…             …                  …
? marks the reads of undefined data (cycles 1 and 3)
Adapt cycle budget or introduce handshake protocol
The expected behavior is confirmed by the timing diagram. As can be seen
on the diagram, the first two data elements for process 2 will be undefined.
Next, the read() operation of process 2 will use each data element
produced by process 1 twice.
To guarantee correct behavior, two approaches exist:
• Adapt the cycle budget of process 2, for instance by introducing two
dummy cycles. However, this breaks the general approach of making
modules independent from the environment in which they operate.
• Introduce a handshake protocol that automatically synchronizes on the
data transfers. This is the most robust and reliable approach. On the
other hand, handshake protocols introduce some overhead and should
be performed on larger units.

85
Simple handshake protocol is more robust
➢ The flag “a” (ask) indicates that the receiver is ready to read
data in the next cycle.
➢ The flag “r” (ready) indicates that data has been written
➢ Safe communication requires at least two cycles.
Many different handshake protocols are feasible. Let’s illustrate the concept
with a very simple one with two handshake lines. The handshake line “a”
(ask) is generated by the receiver and indicates that the receiver is ready to
read in the next cycle. The handshake line “r” (ready) is generated by the
transmitter and indicates that it has written data in the cycle in which the
flag is raised. At least two cycles are needed for a reliable communication
of a value. Note that this protocol is only suited for synchronous designs
where both processes are executed on the same clock.

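86
Simple handshake protocol is more robust
[Figure: FSMs of Process 1 and Process 2 extended with the protocol operations; flag r goes from Process 1 to Process 2, flag a from Process 2 to Process 1]
Process 1: filt1, r=0 → filt2 → filt3 → wait while !a; if (a==1) { filt4; write(); r=1 } → back to filt1
Process 2: a=1, wait while !r; if (r==1) { read(); op1; a=0 } → op2, a=1 → back to waiting for r
Initial values: a=1, r=0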
The finite state machines enhanced with the protocol operations (in red) are
shown in this picture. When “a” is set, process 2 waits for the “r” flag to be
raised. Next it reads the data, lowers “a”, performs its operations, and sets
“a” again for the next sequence. Process 1 performs its operations and then
waits for flag “a” before it writes its data and raises flag “r”. The basic
assumption of this protocol is that when data is written it is read in the next
cycle.

87
… and effectively synchronizes the communication
Clock cycle   Process 1              Process 2
1             r=0 filt1              a=1
2             r=0 filt2              a=1
3             r=0 filt3              a=1
4             r=1 filt4 write()      a=1
5             r=0 filt1              a=0 read() op1
6             r=0 filt2              a=1 op2
7             r=0 filt3              a=1
8             r=1 filt4 write()      a=1
9             r=0 filt1              a=0 read() op1
10            r=0 filt2              a=1 op2
…             …                      …
The timing diagram shows that the operations of the two processes
are automatically synchronized by this protocol.

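88
… also when receiver is slower than transmitter
[Figure: FSMs of Process 1 and Process 2 with the protocol operations; flag r goes from Process 1 to Process 2, flag a from Process 2 to Process 1]
Process 1: filt1, r=0 → wait while !a; if (a==1) { filt2; write(); r=1 } → back to filt1
Process 2: a=1, wait while !r; if (r==1) { read(); op1; a=0 } → op2 → op3, a=1 → back to waiting for r
Initial values: a=1, r=0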
When we add a state in process 2 and reduce the number of states in
process 1 to two, we make the receiving process slower than the
transmitting one.

89
… but then introduces one extra wait cycle at the receiver
Clock cycle   Process 1              Process 2
1             r=0 filt1              a=1
2             r=1 filt2 write()      a=1
3             r=0 filt1              a=0 read() op1
4             r=0                    a=0 op2
5             r=0                    a=1 op3
6             r=1 filt2 write()      a=1
7             r=0 filt1              a=0 read() op1
8             r=0                    a=0 op2
9             r=0                    a=1 op3
10            r=1 filt2 write()      a=1
…             …                      …
The extra wait cycle can be avoided by already putting a=1 during op2
Here too, the protocol synchronizes the two processes automatically.
However, after “op3” in process 2, an extra clock cycle is introduced.
This is caused by the fact that process 1 has to observe that
“a” is raised before it can write the data and raise “r”. The extra cycle can be
avoided by raising “a” already during “op2”.
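The behavior of the simple handshake protocol can be reproduced with a cycle-based model of the two FSMs in plain C++. The sketch below assumes that the flags and the data wire are registered, so a value written in one cycle is observed by the other process in the next cycle, and that each process has at least two states. The function name simulate_handshake and its parameters are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Process 1 has n1 states (filt1 .. filt<n1>) and writes in the last one.
// Process 2 has n2 states (read/op1 .. op<n2>) and raises "a" in the last one.
// Returns (cycle, value) for every read() performed by process 2.
std::vector<std::pair<int, int>> simulate_handshake(int n1, int n2,
                                                    int cycles) {
    assert(n1 >= 2 && n2 >= 2);
    std::vector<std::pair<int, int>> reads;
    bool a = false, r = false;  // flags as observed in the current cycle
    int wire = 0;               // registered data
    int s1 = 0, s2 = 0;         // FSM states of process 1 and process 2
    int next_value = 1;         // process 1 sends 1, 2, 3, ...
    for (int c = 1; c <= cycles; ++c) {
        bool a_next = a, r_next = r;
        int wire_next = wire;
        // Process 1 (transmitter)
        if (s1 < n1 - 1) {        // filt1 .. filt<n1-1>
            r_next = false;
            ++s1;
        } else if (a) {           // last state: write when asked
            wire_next = next_value++;
            r_next = true;
            s1 = 0;
        }                         // otherwise: wait for "a"
        // Process 2 (receiver)
        if (s2 == 0) {
            if (r) {              // read() and op1
                reads.emplace_back(c, wire);
                a_next = false;
                s2 = 1;
            } else {              // wait for "r"
                a_next = true;
            }
        } else if (s2 == n2 - 1) {  // last operation: ask for new data
            a_next = true;
            s2 = 0;
        } else {
            ++s2;
        }
        a = a_next;
        r = r_next;
        wire = wire_next;
    }
    return reads;
}
```

With n1 = 4 and n2 = 2 the model reads in cycles 5 and 9, as in the first timing diagram. With n1 = 2 and n2 = 3 it reads in cycles 3, 7 and 11, which includes the extra wait cycle at the receiver. In both cases every value is read exactly once.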

90
Most general protocol: 4-phase handshake protocol
[Figure: asynchronous FSMs of the two sides, with transitions guarded by the levels of Req and Ack]
Requesting side: Req=1 → Get Data → Req=0
Responding side: Execute → Put Data, Ack=1 → Ack=0
The simple handshake protocol of previous slides is just one of the many
possibilities. The most general protocol is the 4-phase handshake protocol
that can synchronize two systems, independent of a clock signal. The 4
phase handshake protocol consists of 4 phases:
1. Initially, both request (Req) and acknowledgement (Ack) signals
are low.
2. Next, the Req signal is raised and the operation is executed.
3. After the execution of the operation, the Ack signal is raised. Here
starts the third phase.
4. When the Ack signal is detected, the Req signal is turned off. This
phase continues until the low Req signal is detected and the Ack
signal is turned off.
The picture on the slide shows the asynchronous FSM for the four-phase
handshake protocol. In an asynchronous FSM the transitions are not
clocked and happen as soon as the guard statement is valid.
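The four phases can be made explicit with two small state machines that only react to the level of the other side's signal. The following plain C++ sketch is an illustration under assumptions: all type and function names are hypothetical, the "execute" step is reduced to putting one data value on the wires, and the asynchronous behavior is approximated by letting both sides take steps in turn until nothing changes anymore.

```cpp
#include <utility>
#include <vector>

struct Wires { bool req = false; bool ack = false; int data = 0; };

enum class Requester { Idle, WaitAckHigh, WaitAckLow, Done };
enum class Responder { WaitReqHigh, WaitReqLow };

// One step of the requesting side; returns true if it made progress.
bool step_requester(Requester& s, Wires& w, int& received) {
    switch (s) {
    case Requester::Idle:         // raise Req
        w.req = true; s = Requester::WaitAckHigh; return true;
    case Requester::WaitAckHigh:  // Ack seen: get data, lower Req
        if (!w.ack) return false;
        received = w.data; w.req = false;
        s = Requester::WaitAckLow; return true;
    case Requester::WaitAckLow:   // Ack low again: transfer finished
        if (w.ack) return false;
        s = Requester::Done; return true;
    case Requester::Done:
        return false;
    }
    return false;
}

// One step of the responding side; returns true if it made progress.
bool step_responder(Responder& s, Wires& w, int value) {
    switch (s) {
    case Responder::WaitReqHigh:  // Req seen: execute, put data, raise Ack
        if (!w.req) return false;
        w.data = value; w.ack = true;
        s = Responder::WaitReqLow; return true;
    case Responder::WaitReqLow:   // Req low again: lower Ack
        if (w.req) return false;
        w.ack = false; s = Responder::WaitReqHigh; return true;
    }
    return false;
}

// Runs one transfer and returns the sequence of (Req, Ack) levels.
std::vector<std::pair<bool, bool>> run_transfer(int value, int& received) {
    Wires w;
    Requester rq = Requester::Idle;
    Responder rs = Responder::WaitReqHigh;
    std::vector<std::pair<bool, bool>> trace{{w.req, w.ack}};
    auto record = [&] {
        if (trace.back() != std::make_pair(w.req, w.ack))
            trace.emplace_back(w.req, w.ack);
    };
    bool progress = true;
    while (progress) {
        progress = false;
        if (step_requester(rq, w, received)) { record(); progress = true; }
        if (step_responder(rs, w, value)) { record(); progress = true; }
    }
    return trace;
}
```

One transfer produces the level sequence (Req, Ack) = (0,0), (1,0), (1,1), (0,1), (0,0), after which both signals are low again and a new transfer can start.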

91
Multiple variations on these handshake protocols exist
➢ Instead of signal levels, the protocol can be based on signal
transitions.
➢ The protocol can be simplified if both systems run on the same
clock.
➢ Protocols can be simplified if one knows that the receiver or
the transmitter is fastest.
➢ Synchronization can be performed on the basis of a block:
 Set-up communication for first element of a block
 Next, communicate every cycle
➢ Some protocols are based on typical FIFO signals: full and
empty.
Besides the 4-phase handshake protocol, many other protocols exist.
For example a protocol can be constructed that is based on signal
transitions rather than signal levels.
Handshake protocols can also be simplified when both systems run on the
same clock, or when the receiver or the transmitter is known to be
the fastest.
Also, the efficiency of the communication can be improved by block based
handshake protocols. In such a protocol, the communication is set-up for the
first element of the block. Next, a data element is communicated every cycle.
There also exists a set of protocols based on typical FIFO signals.

92
In some cases buffered communication is required
[Figure: process1 connected to process2 through two queues, Q1 and Q2]
Step   process1      process2
1      write(Q1)
2      write(Q1)
3      write(Q2)
4                    read(Q2)
5                    read(Q1)
6                    read(Q1)
Queue size can be determined by monitoring the maximum
number of elements in a queue during simulation.
The replacement of the FIFO by protocols is only possible if no intermediate
storage is needed. This is not always the case. For example, the system on
the slide needs storage for at least two data elements on queue 1. In
most cases, the required amount of storage can be derived from the
maximum number of elements in a queue during functional simulations.
Also remark that changing the order in which data is produced in process 1
or consumed in process 2 will change the storage requirements.
Another option is to integrate the required storage in one of the two
processes and match the production and consumption sequences.
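Monitoring the maximum number of elements per queue is easy to do on a trace of queue accesses. The sketch below assumes that such a trace has been recorded during a functional simulation; the Access record and the function max_occupancy are hypothetical names for this illustration.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// One queue access in simulation order.
struct Access {
    std::string queue;  // name of the queue
    bool is_write;      // true for write(), false for read()
};

// Returns, per queue, the maximum number of elements it ever held.
std::map<std::string, int> max_occupancy(const std::vector<Access>& trace) {
    std::map<std::string, int> level;  // current number of elements
    std::map<std::string, int> peak;   // maximum seen so far
    for (const Access& access : trace) {
        int& current = level[access.queue];
        current += access.is_write ? 1 : -1;
        peak[access.queue] = std::max(peak[access.queue], current);
    }
    return peak;
}
```

For the access sequence of the slide (write Q1, write Q1, write Q2, read Q2, read Q1, read Q1), the result is 2 for Q1 and 1 for Q2, in line with the two storage elements needed on queue 1.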

93
Queues must be introduced explicitly in hardware
[Figure: Process1 → (r, a) → FIFO process of size N with an fsm → (r, a) → Process2; both links use the wired handshake protocol]
If intermediate storage is needed, a FIFO must be explicitly introduced in
hardware. A FIFO will be a module with storage, a finite state machine and
communication protocols for the producing and consuming processes.
The FIFO structure can be defined once and then reused in many designs.

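94
Several communications can also be multiplexed on a bus
[Figure: on one side the point-to-point links Process1 → Process2 and Process3 → Process4; on the other side Process1 to Process4 connected to a shared bus with an arbiter, each through r and a handshake lines]
Bus and arbiter classes can be reused!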
Up till now, we have considered point-to-point communications. Each
channel in the functional model is then mapped to a physical channel in the
hardware.
However, when this communication structure becomes complicated it might
be advantageous to multiplex multiple communications on a bus structure.
Communication with off-chip devices might also take advantage of a bus
structure because of the limited amount of available pins.
The bus can be modeled as a set of multiplexers. To decide when a module
is allowed to communicate on this bus, an arbiter is needed. The arbiter
communicates with the processes through handshake protocols. If we reuse
our simple protocol, the arbiter would react to the ask signals from the
receiving processes, reserve the bus, and pass the ask signal on to the
sending process when the bus is free for data transfer.
The bus and arbiter are modules that can be designed once and reused in
multiple designs.
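The decision taken by the arbiter can be sketched as a small function in plain C++. The notes do not prescribe an arbitration policy; the round-robin policy used below, as well as the function name round_robin_grant, are assumptions made for this illustration.

```cpp
#include <cassert>
#include <vector>

// ask[i] is the ask flag of receiving process i.
// last_grant is the index that was served last (-1 if none yet).
// Returns the index of the process that gets the bus, or -1 if
// no process is asking.
int round_robin_grant(const std::vector<bool>& ask, int last_grant) {
    const int n = static_cast<int>(ask.size());
    assert(last_grant >= -1 && (n == 0 || last_grant < n));
    // Start searching just after the process that was served last,
    // so that every asking process is served in turn.
    for (int i = 1; i <= n; ++i) {
        const int candidate = (last_grant + i) % n;
        if (ask[candidate]) return candidate;
    }
    return -1;
}
```

In a complete model, the arbiter would call such a function whenever the bus is free and forward the ask signal of the selected receiver to the corresponding sender.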

95
results in behavioral model
➢ Model that defines the relative ordering of input and outputs
➢ A clock signal is used for ordering
➢ Pins are accurate to the final implementation
➢ Internal resources are not mapped on clock cycles
(scheduling) and functional units (resource binding)
After communication refinement of a functional model, we obtain a
behavioral model. A behavioral model defines the functionality and also the
relative ordering of inputs and outputs. To perform this ordering, a clock
signal is used. Also, the pins of a module are identical to the final
implementation. On the other hand, the internal operations are functionally
modeled. They are not mapped on clock cycles and no functional units are
allocated.
Synthesis tools are increasingly moving up from register transfer level
(RTL) synthesis toward behavioral synthesis. In the latter, the synthesis tool
autonomously decides on the number and types of functional units and
schedules the operations on these functional units.

96
We now take a look at the support for communication refinement in SystemC

97
In SystemC behavioral models
use (clocked) threads
➢ Modeled with thread processes SC_THREAD or with clocked
thread processes SC_CTHREAD
➢ Every module has a clock input:
 sc_in_clk clk;
➢ The SC_THREAD process is made statically sensitive to a clock edge
 sensitive << clk.pos();
➢ To separate clock cycles wait() statements are used.
➢ A synchronous or asynchronous reset signal can be specified:
 reset_signal_is(reset, true);
 async_reset_signal_is(reset, true);
➢ Simulation must be run for a finite time (or will not stop!) or halted
explicitly.
Representing behavioral models in SystemC is straightforward. The
processes are represented with (clocked) thread processes (SC_CTHREAD
or SC_THREAD). To order the inputs and outputs, every module has a clock
input. An SC_THREAD process must be made statically
sensitive to this clock.
To separate clock cycles, wait() statements are used in the SC_THREAD
or SC_CTHREAD process.
It is possible to assign a synchronous reset signal to the thread processes.
In the case that the reset signal is active at a clock event, the current
process will be stopped, and called again from the start of the function. Also
an asynchronous reset is supported.
Remark that because of the introduction of the clock we cannot run until the
end of activity (this would never stop). Therefore we must run the simulation
for a finite time or halt it explicitly.

98
Behavioral models communicate via standard signals
➢ All inputs and outputs are standard signals
➢ Define signals with:
 sc_signal<T> a;
➢ Predefined ports for sc_signal<T> channels:
 sc_in<T> with interface function read() or assignment operator.
 sc_out<T> with interface function write() or assignment operator.
 sc_inout<T> that combines both interface functions.
Standard signals are used to communicate between behavioral processes. A
signal can only be written from one process.
For the sc_signal<T> channel, three ports are predefined:
 sc_in <T> is essentially equivalent to sc_port<sc_signal_in_if <T> >
 sc_inout <T> is essentially equivalent to sc_port<sc_signal_inout_if
<T> >
 sc_out <T> is identical to sc_inout<T>
The write() operation on a signal overwrites the present value. The read()
operation reads the current value. The assignment operators are also
available for signals. Each of these three ports must be bound to exactly one
signal.

99
Clocks in SystemC
➢ Create clock
 sc_clock clock1("clock_label", period, time_unit, duty_ratio, offset, offset_time_unit, first_value);
 sc_clock clock2("clock_label", period, time_unit, duty_ratio);
 sc_clock clock3("clock_label", period, time_unit);
➢ Clock binding
 f1.clk(clock1);
➢ Clocks are typically defined in sc_main()
➢ Example
sc_clock clock1("clock1", 20, SC_NS, 0.5, 2, SC_NS, true);
[Waveform: clock edges at 2, 12, 22, 32 and 42 ns]
Finally, we also need a clock in a behavioral model. SystemC offers a
dedicated clock class, sc_clock, for which you can choose the period, duty
ratio, initial offset and first value. An example is shown on the slide.
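How the clock parameters determine the edge times can be worked out with a few lines of plain C++. The sketch below does not use the SystemC library; the function clock_edges and the Edge record are hypothetical, and the duty ratio is interpreted as the fraction of the period during which the clock is high.

```cpp
#include <cassert>
#include <vector>

struct Edge {
    double time;  // in the time unit of the clock, e.g. ns
    bool rising;  // true for a rising edge, false for a falling edge
};

// Lists all clock edges up to and including time `until`.
std::vector<Edge> clock_edges(double period, double duty_ratio,
                              double offset, bool first_value,
                              double until) {
    assert(period > 0.0 && duty_ratio > 0.0 && duty_ratio < 1.0);
    std::vector<Edge> edges;
    const double high = duty_ratio * period;  // time spent high
    const double low = period - high;         // time spent low
    double t = offset;
    bool rising = first_value;  // polarity of the first edge
    while (t <= until) {
        edges.push_back({t, rising});
        t += rising ? high : low;
        rising = !rising;
    }
    return edges;
}
```

For a period of 20 ns, a duty ratio of 0.5, an offset of 2 ns and a rising first edge, the function returns rising edges at 2, 22 and 42 ns and falling edges at 12 and 32 ns, the edge times drawn on the slide.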

100
Example: summing 3 values on an input
#include <systemc.h>

SC_MODULE(sum3) {
    sc_in_clk CLOCK;
    sc_in<bool> RESET;
    sc_in<unsigned> A;
    sc_out<unsigned> D;

    void compute();

    SC_CTOR(sum3) {
        SC_CTHREAD(compute, CLOCK.pos());
        reset_signal_is(RESET, true); // synchronous, active-high reset
    }
};

void sum3::compute() {
    unsigned tmp;
    // reset section (empty here): code placed before the first wait()
    // is executed when the process starts and after every reset
    while (true) { // main loop
        tmp = A.read();
        wait(); // end first I/O cycle
        tmp += A.read();
        wait(); // end second I/O cycle
        tmp += A.read();
        D.write(tmp);
        wait(); // end third I/O cycle
    }
}
On the slide an example is shown where three values are read in
sequentially and summed. The resulting sum is put on the output.
The example is modeled with a clocked thread. It could also be implemented
with a thread process.

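101
Gradual Communication refinement (1/2)
[Figure, before: Process1 → queue → Process2]
[Figure, after: Process1 → Q1 → converter C1 → (r, a) → converter C2 → Q2 → Process2; the converters are clocked, and the two halves are labeled Behavioral_process1 and Behavioral_process2]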
To replace the queues it is advocated to follow a gradual approach. First,
converters (between sc_fifo and protocol) are introduced between the
processes.

102
Gradual Communication refinement (2/2)
[Figure, first step: Behavioral_process1 (Process1 → Q1 → converter C1, clocked) → (r, a) → Behavioral Process2 (clocked)]
[Figure, second step: Behavioral Process1 → (r, a) → Behavioral Process2, both clocked]
Next the protocol can be integrated in each process separately.
At each moment the correct operation of the system can be validated
through simulations.

103
Converter SystemC code
#include <systemc.h>

// FIFO to protocol: reads values from an sc_fifo and sends them
// over the ask/ready handshake protocol.
template <class T> SC_MODULE(FF2P) {
    sc_fifo_in<T> input;
    sc_out<T> output;
    sc_in<bool> ask;
    sc_out<bool> ready;
    sc_in_clk clk;

    SC_CTOR(FF2P) {
        SC_THREAD(process);
        sensitive << clk.pos();
    }

    void process() {
        T value;
        enum ctrl_state {READINPUT, WRITEOUTPUT};
        ctrl_state state;
        // reset cycle
        ready.write(false); state = READINPUT; wait();
        while (true) {
            if (state == READINPUT) {
                // blocking read: waits until the FIFO holds a value
                ready.write(false); value = input.read();
                state = WRITEOUTPUT;
            } else {
                if (ask.read() == true) {
                    // receiver is ready: put the value on the wire
                    output.write(value); ready.write(true);
                    state = READINPUT;
                } else {
                    ready.write(false); state = WRITEOUTPUT;
                }
            }
            wait();
        }
    }
};

// Protocol to FIFO: receives values over the ask/ready handshake
// protocol and writes them into an sc_fifo.
template <class T> SC_MODULE(P2FF) {
    sc_fifo_out<T> output;
    sc_in<T> input;
    sc_in<bool> ready;
    sc_out<bool> ask;
    sc_in_clk clk;

    SC_CTOR(P2FF) {
        SC_THREAD(process);
        sensitive << clk.pos();
    }

    void process() {
        T value;
        enum ctrl_state {READINPUT, WRITEOUTPUT};
        ctrl_state state;
        // reset cycle
        ask.write(true); state = READINPUT; wait();
        while (true) {
            if (state == READINPUT) {
                if (ready.read() == true) {
                    // transmitter has written: take the value
                    value = input.read(); ask.write(false);
                    output.write(value); state = WRITEOUTPUT;
                } else {
                    ask.write(true); state = READINPUT;
                }
            } else {
                ask.write(true); state = READINPUT;
            }
            wait();
        }
    }
};
On the slide we show an example of the converters that translate between an
sc_fifo and the simple synchronization protocol and vice versa.

104
The exercise is intended to get you familiar with communication refinement.
We turn again to the simplified JPEG decoder.

105
Exercise 3: communication refinement for the JPEG encoder
➢ Goal: Replace the FIFO between the run-length encoder and decoder by
a handshake protocol
➢ You will find:
 In /exercises/exercise3/ : solution of exercise2
 In /exercises/modules/: JPEG-encoder modules
{b2r,idct,normalize,zz_dec}.{h,cpp} and test bench modules
{src,snk,test}.{h,cpp}
 In /exercises/images/: test images
 In /exercises/add2systemc: FIFO to protocol conversion functions in
add2systemc: {FF2P, P2FF}.h
➢ Things to be done:
 Introduce a handshake protocol between rl_enc and rl_dec.
 introduce refined versions of rl_dec in jpeg_dec.h and main.cpp.
 simulate and verify correct operation.
The goal of this exercise is to replace the FIFO channel between the run-length
encoder and decoder by a handshake protocol. To this end, we first
add converters between the two blocks to obtain a behavioral model. Next, we
integrate the protocol functionality in the run-length decoder process,
integrate the resulting behavioral model in the application, simulate the
system, and verify correct operation.

In this 4th class we focus on the refinement of the computations, resulting in
RTL description of the circuit. This model should be synthesizable with an
RTL synthesis tool.
Model Based System
Design
Class 4: computation
refinement
Marc Engels
e-mail:
marc.engels@flandersmake.be

The class consists of three parts:
First, we describe the conceptual steps to transform from a behavioral
into an RTL description of the circuit.
Next we introduce the constructs that are available in SystemC to
support this RTL modeling.
Finally we exercise the new knowledge on the JPEG decoder.
107
Computation refinement in
SystemC
➢ Computation refinement
➢ Computation refinement in SystemC
➢ Exercise 4: computation refinement of a JPEG decoder

Next to fixed point and communication refinement, computation refinement is
an important step in architectural design (from functional model towards RTL
model). Remark that the order in which these three steps are performed is
not defined. Refinements along these three axes can even be intermixed.
There also exist interdependences between these operations. For instance if
two operations share a common operator they will use the same word size.
108
RTL refinement is the 3rd step in architectural design
[Figure: refinement from System Functionality to System Architecture along three axes]
Computations: operations → operators (resource allocation, scheduling)
Data: variables, arrays, floating point → memories, fixed point (memory allocation, address generation, word sizing)
Communication: point-to-point queues → busses, detailed protocol (bus allocation, introduce arbiters, include protocols)

109
For synthesis all blocks need to be transformed to RTL
➢ Transformation is a gradual refinement process
 switch a behavioral block with an RTL block
 verify by system simulation
[Figure: SYSTEM with sub-systems S1, S2 and S3 inside a TESTBENCH; block labels func1, beh2, RTL2, beh3, RTL3, beh4, RTL4]
At the start of the computation refinement the embedded system is modeled
with behavioral blocks, where both the data types and communications are
refined. The test bench is not evolved and is still the original functional
model.
The RTL modeling can be introduced gradually by replacing individual
behavioral blocks with RTL descriptions. The correctness of the system can
be verified during this process by simulating the combination of functional,
behavioral, and RTL models.

Behavioral models are represented as threads which wait on clock edges to
synchronize their inputs and outputs (IO).
As a consequence, they can be represented by a clocked finite state
machine (FSM). In the slide a Moore-type state machine, whose outputs are
only determined by the state, is used.
110
Behavioral model can be represented by an FSM
Process_behavioral { // SC_CTHREAD
    ask.write(true);
    while (ready.read() == false) { wait(); }
    wait();
    while (true) {
        ask.write(false);
        x = input.read();
        wait();
        d = x * b1;
        y = d * b2;
        output.write(y);
        ask.write(true);
        wait();
    }
}
=
[Figure: equivalent Moore FSM; transitions are guarded by ready and !ready]
States: ask=1 → ask=0, x=input → ask=1, d = x * b1, y = d * b2, output = y

111
Behavioral to RTL: scheduling of operations in FSM
[Figure: the FSM before and after scheduling; transitions are guarded by ready and !ready]
Before: ask=1 → ask=0, x=input → ask=1, d = x * b1, y = d * b2, output = y
After: ask=1 → ask=0, x=input, d=x*b1 → ask=1, y = d * b2, output = y
The transformation from behavioral to RTL can conceptually be represented
by the scheduling of operations on this FSM. In this scheduling activity
additional states can be introduced.
Remark also that the scheduling of the operations can have major impact on
the inter-process communication:
• Additional states can introduce errors in synchronized communication.
• Protocol based communication is more robust but the settings of the
protocol signals might have to be adapted
Separation of operator scheduling and communication refinement is a desire
in many design flows but is rarely achieved completely.

112
Rescheduled FSM is represented in RTL code
[Figure: rescheduled FSM; transitions are guarded by ready and !ready]
States: ask=1 → ask=0, x=input, d=x*b1 → ask=1, y = d * b2, output = y
=
Process_RTL { // SC_CTHREAD
    ask.write(true);
    wait();
    while (true) {
        ask.write(false);
        x = input.read();
        d = x * b1;
        wait();
        ask.write(true);
        y = d * b2;
        output.write(y);
        wait();
    }
}
The resulting FSM can be transformed back into code. The resulting RTL
model can be represented either with an SC_METHOD or an SC_CTHREAD.
Both can be synthesized into gate level circuits. For simplicity, we will use
SC_CTHREAD processes.
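The alternative style, with an explicit state register that is updated once per clock edge, can be illustrated in plain C++. The sketch below mirrors the Process_RTL code of the slide, including the fact that it does not look at the ready flag. The class RtlModel, its state names and its members are hypothetical, and integer data is assumed for simplicity.

```cpp
// Explicit-state version of the rescheduled FSM: one call to clock()
// corresponds to one rising clock edge.
class RtlModel {
public:
    RtlModel(int b1, int b2) : b1_(b1), b2_(b2) {}

    bool ask = false;  // protocol flag towards the transmitter
    int output = 0;    // registered output

    void clock(int input) {
        switch (state_) {
        case State::Init:       // first cycle: ask for data
            ask = true;
            state_ = State::ReadMul1;
            break;
        case State::ReadMul1:   // ask=0, x=input, d=x*b1
            ask = false;
            d_ = input * b1_;
            state_ = State::Mul2Write;
            break;
        case State::Mul2Write:  // ask=1, y=d*b2, output=y
            ask = true;
            output = d_ * b2_;
            state_ = State::ReadMul1;
            break;
        }
    }

private:
    enum class State { Init, ReadMul1, Mul2Write };
    State state_ = State::Init;
    int d_ = 0;  // register holding the intermediate product
    int b1_;
    int b2_;
};
```

Each case of the switch statement corresponds to one state of the rescheduled FSM. With b1 = 2 and b2 = 3, an input value of 5 sampled in the second cycle results in the output 30 one cycle later.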
