Page 1
Specification Languages:
Part 2
Marc Engels
e-mail: marc.engels@flandersmake.be
Welcome to the second part of the course on specification languages.
Page 2
2
Specification Languages
➢ Part 1: Specification Models
➢ Part 2: Model based system design
 Show how the models of part 1 can be used for
architectural design
Provide hands-on experience with SystemC v2.3.2 (released
in October 2017).
Introduce OO techniques for design of hardware systems
➢ Part 3: Project
The course on specification languages consists of 3 parts:
 First, an extensive overview was given of various specification models,
ranging from dataflow to finite state machines.
 In this second part, I will focus on the use of a subset of these models for
the architectural design of digital embedded systems. The main goal of
this part of the course is to learn how the specification models of part 1
can be used for the architectural design of embedded systems. For this
purpose, we will rely on SystemC version 2.3.2, the simulation library
that was released in October 2017. It implements the SystemC language as
standardized by the IEEE (IEEE 1666-2011 language reference manual,
published in January 2012).
SystemC is a class library on top of C++. As such, all object oriented
(OO) constructs of C++ can be used in the design of an architecture.
These OO techniques can bring the same benefits with respect to re-use
to architectural design as they have brought to software design.
 Finally, you will apply the acquired skills in a small, but realistic, project.
Page 3
3
Course Material for part 2
➢ Prerequisite:
 part 1 of specification languages
 C++ (good tutorial at www.cplusplus.com)
 Coding and debugging programs
 RTL description of synchronous digital circuits
➢ Material for part 2:
 Slides with notes.
 IEEE Standard SystemC Language Reference Manual, IEEE
Std 1666-2011.
As prerequisites for this course, I expect the following:
 Quite obviously, you should have a good understanding of the first part of this course, and
particularly the presented models.
 Next, as SystemC is based on C++, also a decent knowledge of this programming
language is required. Basic OO concepts like classes, inheritance and templates should
be familiar to you. If not, review the C++ tutorial at www.cplusplus.com.
 In general, a structured methodology for developing and debugging programs is
essential for executing the exercises and the project. Familiarity with Integrated Development
Environments (IDEs) like Eclipse is a benefit.
 When writing SystemC code, you should be able to describe the hardware that will be
generated from this code. Therefore a basic knowledge of register transfer level (RTL)
description of synchronous digital circuits is necessary. An RTL description of a circuit
consists of registers (e.g. D flip-flops) and combinatorial logic. The registers synchronize
the operation of the circuit to the clock signal while the combinatorial logic describes the
calculations performed by the circuit. RTL descriptions are used in hardware description
languages like Verilog or VHDL.
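To make the split between registers and combinatorial logic concrete, here is a minimal sketch in plain C++ (not SystemC, and not synthesizable). The names Accumulator, next_state, clock_edge and run_cycles are hypothetical and chosen only for this illustration. The combinatorial part is a pure function of the current state and the inputs; the register only changes on a clock edge.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ sketch of an RTL-style accumulator (illustration only).
struct Accumulator {
    std::uint8_t reg = 0;  // the register: state held between clock edges

    // Combinatorial logic: computes the next register value from the
    // current state and the inputs, without changing any state.
    static std::uint8_t next_state(std::uint8_t current, std::uint8_t in, bool clear) {
        // The 8-bit result wraps around modulo 256, like an 8-bit adder.
        return clear ? std::uint8_t{0} : static_cast<std::uint8_t>(current + in);
    }

    // Rising clock edge: the register samples the combinatorial result.
    void clock_edge(std::uint8_t in, bool clear) {
        reg = next_state(reg, in, clear);
    }
};

// Helper for experiments: apply n clock edges with a constant input.
inline std::uint8_t run_cycles(Accumulator& acc, std::uint8_t in, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) acc.clock_edge(in, false);
    return acc.reg;
}
```

In an actual RTL description the same structure appears as a clocked process for the register and a combinatorial expression for the adder.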
For part 2, the following material is available:
(1) The slides with notes can be found on the icorsi (icorsi.ch).
(2) The SystemC language reference manual, which can be downloaded from the IEEE
standards website (http://standards.ieee.org/getieee/1666/download/1666-2011.pdf)
Page 4
Model Based System
Design
Class 1: constructing a
functional model
Marc Engels
e-mail:
marc.engels@flandersmake.be
In this first class we will focus on the functional modeling of a digital
embedded system. A functional model will describe the functionality of the
embedded system, independent of the platform or architecture on which this
functionality is executed. Therefore it is sometimes called a platform
independent model (PIM). In this class we will focus on the data flow
modeling paradigm for describing the functional model.
At the end of the class, you will be able to program a functional model of a
digital embedded system in SystemC.
Page 5
5
Functional modeling in
SystemC
➢ Introduction to design of digital embedded systems
➢ SystemC introduction
➢ SystemC functional model syntax
➢ Exercise 1: building a functional model in SystemC
This class covers 4 topics:
(1) A general introduction to the design of digital embedded systems
(2) The role of SystemC in the design of digital embedded systems
(3) The syntax of the SystemC language for functional modeling (with the
dataflow paradigm)
(4) And finally an exercise to build a functional model in SystemC
Let's start with the general introduction.
Page 6
6
Consumer devices become
increasingly more intelligent
Consumer as well as professional equipment is becoming increasingly
smarter. A few examples:
 Your car is being converted into a multimedia theater. The value of
the electronics in a car has increased consistently, resulting in
almost 100 electronic units in a luxury model. Recently a lot of new
safety functions (ABS, ESP, parking sensors, anti-collision systems,
etc.) have been introduced.
 It is hard to find a mobile phone with which you can only make a call.
Taking pictures, playing music, surfing the web, reading e-mail, etc.
are also features of a state-of-the-art mobile phone. Most phones
even have GPS functionality and run office software.
 Gaming becomes more interactive (e.g. Nintendo Wii, Microsoft
Kinect) and mobile.
 Photography has dramatically changed over the last decade: it has
become fully digital. Digital cameras are currently extended with
features like wireless connections, automatic picture enhancements
(e.g. red eye correction), etc.
 The era of service robots is coming. Robots to vacuum clean the
house, mow the lawn in the garden, etc. are already on the market.
Page 7
7
… as well as professional
equipment
The evolution towards smart products is not limited to consumer devices.
We observe, for instance, the same trends in production machines.
 Harvesters have a growing number of functions for quality control,
obstacle detection and precision farming. To realize these smart
functions, the electronic control units become increasingly more
complex. Especially the software content is growing very fast
(20% average growth per year). The long term vision for combine
harvesters is to evolve towards fully autonomous machines that
can work without any operator on board and just receive a
command of the job to be done. Many more smart functionalities
will be needed to reach this goal.
 In compressors functions are introduced to optimize the energy
consumption based on the instantaneous demand of air.
 Weaving looms can adapt their speed to the quality (strength) of
the textile fibers.
 Professional washing machines automatically detect the load,
hardness of the water, etc. and adapt their washing program.
Page 8
8
Characteristics of embedded
systems
➢ Optimize for power, cost, and size
➢ Robust design
➢ Provide the ability for evolution and mass customization
➢ Minimize time to market
➢ Some functionality might be safety-critical
➢ Interfacing with the real world, leading to real time constraints
To realize these smart functionalities, electronic systems and software have to
be embedded in consumer and professional devices. Such embedded
systems must minimize power, cost and size, and hence run on a minimal
platform. For instance, 8-bit and 16-bit processors are still extensively used
in embedded devices. They must also be robust. For instance, a mobile phone
must survive rough treatment. A car has an operational life of 7000 hours and
some machines are expected to work up to 100000 hours. Over their
lifetime, products are increasingly expected to evolve. Also, more variants are
designed from the same platform. A typical example is the customization of
mobile phones. Time to market matters as well: the product needs to be on the
market before the Christmas shopping season. In many cases the system even has
safety-critical functionality, such as an anti-lock braking system (ABS) or
emergency buttons, which requires a guarantee on the reliability of the system. For the
development of such safety-critical functions, specific standards have to be
followed. The main distinctive characteristic of an embedded system,
however, is that it has to interact with the real world, necessitating real-time
behavior.
Page 9
9
Embedded systems combine
various types of real-time behavior
[Figure: block diagram with the blocks real world process, sensors, signal conditioning, ADC, processing, DAC, actuator powering, actuators and user; the arrows are labeled signal, event and action.]
A system is said to be real-time if the correctness of an operation depends not only upon its
logical correctness, but also upon the time in which it is performed. In a hard real-time
system, the completion of an operation after its deadline is considered useless - ultimately,
this may lead to a critical failure of the complete system. A soft real-time system on the other
hand will tolerate such lateness, and may respond with decreased service quality (e.g. bank
terminal).
Depending on the inputs, two types of hard real-time constraints are distinguished in
embedded systems:
 Signal processing systems process inputs that arrive at regular intervals and the
system must be ready after a fixed time to process the next input. Signal
processing systems typically interact with their environment through sensors
(observe the environment) and actuators (control/influence the environment).
Sensors are components that translate non-electrical quantities (e.g. temperature,
pressure, ...) into electrical quantities (voltage, current). Since most observable
quantities are analog signals, sensors usually produce analog electrical signals. In
most cases signal conditioning is required to compensate the non-idealities in the
sensors and to prepare the sensor signals for the actual signal processing.
Because the signal processing is done digitally, an Analog to Digital Converter
(ADC) puts the sensor signal in the right format. Actuators perform the reverse
operation of sensors: they translate electrical quantities into non-electrical
quantities. Also actuators need analog signals and therefore a Digital to Analog
Converter (DAC) is needed. Because actuators need to influence the physical
environment they often require high power, hence power electronics circuits are
introduced to condition the control signal.
 When the input is an event and the system has to react within a certain time, this
is called a reactive system. Examples of reactive parts of an embedded system are
the interaction with the user or responses to external alarms.
As shown on the picture, embedded systems often combine various types of real-time
behavior.
Page 10
10
Digital embedded systems
combine hard- and software
[Figure: block diagram of a digital embedded system. Labels: user interface, NVM, ROM, RAM, µP or DSP core, configurable logic, memories, peripheral, modem, buffers, video/graphics processor, protocol, speech processing, analysis of channel; plus analog, sensors and actuators.]
An embedded system can be separated into a digital part and an analog
part. The analog part contains for instance signal conditioning, ADCs and
DACs. In high-frequency applications, like radios or radars, it will be a large
part of the embedded system. Also sensors and actuators are part of the
embedded system. Traditionally these were discrete external components,
but recently they are increasingly integrated, when power permits, in a
package and even on chips.
The digital part is where the actual “intelligence” is. A growing part of the
functionality of embedded systems is implemented in software called
“embedded software”. This offers the advantage of increased flexibility
(functionality can be changed after production). As a consequence, the
digital part of an embedded system consists of 3 components:
Programmable processor cores. They can be general-purpose
micro-processors or more specialized digital signal processors
(DSPs).
Volatile and non-volatile memories.
Configurable (through parameters) dedicated logic.
The digital part can be implemented as a PCB with discrete components, a
multi-chip package, an FPGA or a fully integrated chip. In the latter case this
is often referred to as a System-on-Chip (or SOC).
In these classes we will mainly focus on the design of the configurable logic
(on FPGA or chip), although SystemC is also extensively used for the
modeling of SOCs.
Page 11
11
Design flow for digital embedded
systems
[Figure: Y-model of the design flow. Functional and performance requirements feed the system functionality; architectural and non-functional requirements feed the architecture template; the mapping step combines both and results in a dedicated architecture and C-code.]
For the design of a digital embedded system, we use a design flow that
consists of the following elements:
•During the functional design of the system, the designer determines what
the system has to do, based on the performance requirements (e.g. bit error
rates in communication systems) and functional requirements (e.g. specified
protocols). He also determines all algorithms. The system functionality is
expressed in a platform independent way.
•A reusable architecture template, or platform, consisting of processors,
memories, and dedicated logic, is defined or selected. The architecture
template should guarantee architectural requirements (e.g. interface
formats) and non-functional requirements (e.g. power or cost).
•Each function in the functionality is mapped on an element in the
architecture template.
•For the dedicated logic a circuit corresponding to the required functionality
is created, resulting in a dedicated architecture. Finally, by means of RTL-
synthesis the designer generates a gate level netlist. By the place and route
step this netlist is next transformed into a physical layout for this dedicated
architecture, which can be manufactured by a foundry. Alternatively, the
design is mapped to a configuration file for a programmable platform (e.g.
field programmable gate array or FPGA).
•For the functions mapped on processors, C-code is generated and
compiled.
The Y-model is represented as a top-down approach, but in a realistic design flow,
multiple iterations are performed before reaching the final embedded system.
Page 12
12
Function to architecture
conversion follows three axes
From system functionality to dedicated architecture, along three axes:
➢ Computations: from operations to operators
 resource allocation
 scheduling
➢ Data: from variables, arrays and floating point to memories and fixed point
 memory allocation
 address generation
 word sizing
➢ Communication: from point-to-point queues to busses and a detailed protocol
 bus allocation
 introduce arbiters
 include protocols
In this course we concentrate on the architectural design of dedicated logic,
where the algorithms are mapped onto an optimal architecture. The algorithm
will typically be specified as a functional model, e.g. data flow and
asynchronous state machines. The architecture needs a timed model, e.g.
register transfer level (RTL). To obtain the RTL description, a refinement
needs to be done for the computations, communications, and data. The
order of these refinements is not fixed. However, it is good practice to take
the most important design decisions first. Note that for parts of the
system that are implemented in software, the complete refinement does not
need to be performed. However, a processor and a memory structure have to
be selected. For this purpose, certain refinements, such as the conversion to
fixed point, can be useful.
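As an illustration of the data refinement (word sizing, floating point to fixed point), the following plain C++ sketch quantizes a floating-point value to a signed fixed-point word. It does not use the SystemC fixed-point types; the names FixedFormat, to_fixed and to_double are hypothetical, and rounding to nearest with saturation is only one of several possible quantization and overflow choices.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Signed fixed-point format: word_bits total bits, frac_bits of them
// fractional. Values are stored as scaled integers (illustration only).
struct FixedFormat {
    int word_bits;
    int frac_bits;
};

// Quantize with rounding to nearest and saturation on overflow.
inline std::int32_t to_fixed(double x, FixedFormat f) {
    assert(f.word_bits >= 2 && f.word_bits <= 31);
    assert(f.frac_bits >= 0 && f.frac_bits < f.word_bits);
    const double scale = std::ldexp(1.0, f.frac_bits);  // 2^frac_bits
    const std::int32_t max_raw = (std::int32_t{1} << (f.word_bits - 1)) - 1;
    const std::int32_t min_raw = -max_raw - 1;
    const double scaled = std::round(x * scale);
    if (scaled > max_raw) return max_raw;  // saturate at the largest value
    if (scaled < min_raw) return min_raw;  // saturate at the smallest value
    return static_cast<std::int32_t>(scaled);
}

// Convert the stored integer back to the real value it represents.
inline double to_double(std::int32_t raw, FixedFormat f) {
    return std::ldexp(static_cast<double>(raw), -f.frac_bits);
}
```

With an 8-bit word and 4 fractional bits, the value 1.3 is stored as 21 and read back as 1.3125; comparing such values with the floating-point reference is a simple way to judge whether a chosen word size is sufficient.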
Page 13
13
Functional modeling in
SystemC
➢ Introduction to design of digital embedded systems
➢ SystemC introduction
➢ SystemC functional model syntax
➢ Exercise 1: building a functional model in SystemC
We now take a closer look at the role of SystemC in the design of digital
embedded systems.
Page 14
14
SystemC bridges gap between
function and architecture
[Figure: MATLAB and C/C++ describe the system functionality, VHDL and Verilog describe the dedicated architecture; SystemC covers both.]
Traditionally, a system functionality is expressed in MATLAB
(SIMULINK/STATEFLOW) or a standard computer language (C/C++). To
express the RTL description of the system, VHDL or Verilog is used. As a
consequence the transformation from functionality into architecture does not
only involve a change in semantics but also in syntax. Moreover, because of
the different languages, this transformation cannot be done incrementally.
SystemC resolves this issue, by offering a language that can express both
functionality and architecture.
Page 15
15
What is SystemC?
➢ A modeling framework in C++ for the refinement of system from a functional
description into an architecture
➢ Contributions:
 hardware modeling with C++: OCAPI (IMEC) and SCENIC (Synopsys/UC
Irvine)
 fixed-point data types: Frontier Design
 hardware-software co-design: CoWare (IMEC/CoWare)
➢ Language first standardized in December 2005 as IEEE 1666, revised in 2011 as
IEEE 1666-2011
➢ Extensions of SystemC:
 Verification library.
 Transaction level modeling library ( integrated in IEEE 1666-2011).
 Analog and mixed-signal modeling.
➢ More info: www.accellera.org
SystemC is a C++ library that allows a system to be refined from a functional description
into an architecture.
Three contributions were essential to the creation of SystemC:
 The modeling of RTL hardware with C++ was demonstrated in the OCAPI
framework of IMEC, as well as the SCENIC project of UC Irvine in
cooperation with Synopsys.
 Frontier Design (an IMEC spin-off) contributed to the fixed-point data
types.
 CoWare (another IMEC spin-off) introduced concepts of hardware-software
co-design.
The SystemC language was first standardized in December 2005 by the IEEE. A
revision (IEEE 1666-2011) was made in 2011.
More recently a number of extensions of the SystemC language were proposed:
 Verification library adds random generator and transaction recording.
 Transaction level modeling, a high-level approach to modeling digital
systems where details of communication among modules are separated
from the details of the implementation of functional units or of the
communication architecture. This extension is included in the revised IEEE
standard.
 Analog and mixed-signal library extends SystemC with the following
modeling paradigms: timed data flow, linear signal flow modeling, and
electrical linear network modeling.
All information about SystemC can be downloaded from the www.accellera.org website.
Page 16
16
Which tools are available for
SystemC?
➢ Open source simulation library available
➢ Open source translators from Verilog or VHDL to SystemC
➢ Commercial synthesis tools:
 Cadence (Stratus HLS).
 Mentor (Catapult C).
 NEC (CyberWorkBench).
 SystemCrafter (SC).
 Xilinx (Vivado Design Suite).
With respect to tool support, the Accellera System Initiative
(www.accellera.org) makes an open-source simulation library available.
Various academic institutes also offer translators from Verilog or VHDL to
SystemC. For synthesis however, we have to rely on commercial tools.
Page 17
17
SystemC language
architecture
[Figure: layered SystemC language architecture, from top to bottom]
➢ User application
➢ Libraries for specific models of computation and/or methodologies, e.g. TLM interfaces, bus models, SystemC verification library
➢ Core language: modules, ports, exports, processes, interfaces, channels, events, event-driven simulation kernel
➢ Data types: 4-valued logic type, 4-valued logic vectors, bit-vectors, finite-precision integers, limited-precision integers, fixed-point types
➢ Pre-defined channels: signal, clock, FIFO, mutex, semaphore
➢ Utilities: report handling, tracing
➢ C++ language
The classes of the SystemC library fall into four categories: the core
language, the SystemC data types, the predefined channels, and the
utilities. The core language and the data types may be used independently
of one another.
At the core of SystemC is a simulation engine containing a process
scheduler. Processes are executed in response to the notification of events.
Events are notified at specific points in simulated time. For events that are
ordered in time, the scheduler is deterministic. For events that occur at the
same point in simulation time, the order in which the processes are executed
is not defined. The scheduler is non-preemptive, which means that once the
execution of a process has started, the scheduler cannot interrupt it: the
process runs until it returns or suspends itself by calling wait().
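The behavior of such a scheduler can be sketched in a few lines of plain C++. This is not the SystemC kernel: the class MiniScheduler and its functions are hypothetical, and the sketch breaks ties between events at the same time by insertion order, which is an assumption made here only to obtain reproducible output.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Plain C++ sketch of an event-driven, non-preemptive scheduler.
class MiniScheduler {
public:
    using Action = std::function<void()>;

    // Notify an event `delay` time units from the current simulated time.
    void notify(std::uint64_t delay, Action action) {
        queue_.push(Event{now_ + delay, next_seq_++, std::move(action)});
    }

    // Run until no events are pending. Each action runs to completion
    // before the next one starts (no preemption).
    void run() {
        while (!queue_.empty()) {
            Event ev = queue_.top();
            queue_.pop();
            assert(ev.time >= now_);  // simulated time never goes backwards
            now_ = ev.time;
            ev.action();
        }
    }

    std::uint64_t now() const { return now_; }

private:
    struct Event {
        std::uint64_t time;
        std::uint64_t seq;  // insertion order, used only to break ties
        Action action;
    };
    // Orders the priority queue so that the earliest event is on top.
    struct Later {
        bool operator()(const Event& a, const Event& b) const {
            if (a.time != b.time) return a.time > b.time;
            return a.seq > b.seq;
        }
    };
    std::priority_queue<Event, std::vector<Event>, Later> queue_;
    std::uint64_t now_ = 0;
    std::uint64_t next_seq_ = 0;
};
```

An action may itself call notify(), which is how a running process schedules future activity; the simulation ends when the queue runs empty.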
Page 18
18
SystemC core language
sc_module
sc_port
sc_prim_channel
sc_process
sc_interface
sc_event
sc_export
The SystemC core language contains a number of primitives to define
parallelism. A system is split into a number of modules (sc_module). A module
communicates with the external world through ports (sc_port). Ports of
different modules are connected through a channel. SystemC predefines some
primitive channels (sc_prim_channel), but more complex channels can be user
defined. An export (sc_export) makes a channel inside a module accessible
from outside that module.
A hierarchical module consists of a structure of other modules. A
non-hierarchical module contains one or more processes (sc_process). A
process is executed when an event (sc_event) is notified. A process
interacts with a channel through an interface (sc_interface), which is the
collection of functions that the channel implements and that are called
through an sc_port.
Page 19
19
Functional modeling in
SystemC
➢ Introduction to design of digital embedded systems
➢ SystemC introduction
➢ SystemC functional model syntax
➢ Exercise 1: building a functional model in SystemC
SystemC contains all necessary constructs to model the functionality of a
system. We will focus on activity-oriented models, although SystemC can
also express other modeling paradigms. Let’s review these constructs.
Page 20
20
Kahn Process Networks in
SystemC
[Figure: two processes connected by a FIFO queue]
➢ (Modules to structure design)
➢ Functional processes
➢ First-In-First-Out queues
➢ Simulation engine
SystemC has support to model Kahn process networks, with the limitation of
bounded queues. A Kahn process network is a directed network of
processes that are interconnected by first-in-first-out (FIFO) queues of
infinite size. Each time that a process is executed, tokens are consumed
from the input queues and new ones are produced in the output queues. If a
token is not present on an input queue, the consumption of the token will
block. Kahn process networks exhibit deterministic behavior that does not
depend on computation or communication delays. In SystemC the
constructs are available to define the processes and the queues. These
constructs interact with a simulation engine, which schedules the execution
of the processes. The simulation engine stops when there is no longer
activity in the network.
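The following plain C++ sketch mimics this behavior without SystemC. All names (Fifo, Step, run_until_idle, simulate_adder) are hypothetical. As a simplification, each process is written as a step function that only fires when all its input tokens and output places are available, instead of blocking inside individual reads and writes. The small scheduler runs the steps round-robin until nothing can fire any more.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Bounded FIFO queue of integer tokens (illustration only).
struct Fifo {
    std::deque<int> tokens;
    std::size_t capacity;
    explicit Fifo(std::size_t cap) : capacity(cap) { assert(cap > 0); }
    bool can_read() const { return !tokens.empty(); }
    bool can_write() const { return tokens.size() < capacity; }
    int read() { int t = tokens.front(); tokens.pop_front(); return t; }
    void write(int t) { tokens.push_back(t); }
};

// A process step returns true if it fired, false if it would block.
using Step = std::function<bool()>;

// Run until no process can fire any more (no activity left).
inline void run_until_idle(const std::vector<Step>& steps) {
    bool active = true;
    while (active) {
        active = false;
        for (const Step& step : steps) active = step() || active;
    }
}

// Example network: source -> a and b, adder(a, b) -> c, sink(c).
inline std::vector<int> simulate_adder(bool reversed_order) {
    Fifo a(2), b(2), c(1);
    std::vector<int> result;
    int next = 0;  // the source emits the pairs (0,0), (1,10), (2,20)
    Step source = [&] {
        if (next >= 3 || !a.can_write() || !b.can_write()) return false;
        a.write(next); b.write(10 * next); ++next; return true;
    };
    Step adder = [&] {
        if (!a.can_read() || !b.can_read() || !c.can_write()) return false;
        c.write(a.read() + b.read()); return true;
    };
    Step sink = [&] {
        if (!c.can_read()) return false;
        result.push_back(c.read()); return true;
    };
    // The scheduling order changes, the produced token stream must not.
    std::vector<Step> steps = reversed_order
        ? std::vector<Step>{sink, adder, source}
        : std::vector<Step>{source, adder, sink};
    run_until_idle(steps);
    return result;
}
```

Calling simulate_adder(false) and simulate_adder(true) yields the same token sequence, which illustrates that the result does not depend on the order in which the processes are scheduled.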
Page 21
21
Modules are used for structural
partitioning of the functionality
➢ Each module has its own class, derived from the sc_module
class.
➢ Every constructor of a module class shall have exactly one
parameter of class sc_module_name.
 It is good practice to make this name for an instance of the
module the same as the C++ variable name through which
the module is referenced.
➢ A module can be hierarchical or contain processes. In the latter case,
the SC_HAS_PROCESS(“class name”) macro is used to indicate
that the module contains processes.
Modules are used to partition the functionality in the design. However, you
should not use too many modules, as this complicates the design, but also
not too few. In general, functionality that is implemented in a different
architectural style (e.g. software or dedicated hardware) or on a different
location should be in different modules.
Every module is derived from the base class sc_module and should have a
name, which is used for debugging purposes.
The macro SC_HAS_PROCESS(“class name”) indicates that the module is
not hierarchical and contains processes.
Page 22
22
Example of a functional model of
an adder
With macros:
SC_MODULE(adder) {
  // define ports
  // define processes, internal data, etc.
  SC_CTOR(adder) {
    // body of constructor:
    // process declaration, sensitivities, etc.
  }
};
Explicit:
class adder : public sc_module {
public:
  // define ports
  // define processes, internal data, etc.
  SC_HAS_PROCESS(adder);
  adder(sc_module_name name) : sc_module(name) {
    // body of constructor:
    // process declaration, sensitivities, etc.
  }
};
The slide shows an explicit definition of a module, consisting of the class
definition, the SC_HAS_PROCESS macro and the constructor.
To compact the definition, two more macros are provided:
 SC_MODULE(“class name”) is equivalent to the first two lines of the
explicit definition.
 SC_CTOR(“class name”) equals the SC_HAS_PROCESS macro
and the first lines of the constructor. It can be used if only a
name is passed to the constructor. If you also want to pass
other parameters, an explicit declaration is needed.
Page 23
23
Ports are used to communicate
with a FIFO channel
➢ General port definition: sc_port<interface>
➢ Predefined ports are: sc_fifo_in<T> and sc_fifo_out<T>.
 sc_fifo_in<T> is derived from sc_port<sc_fifo_in_if<T>,0> with interface
functions read(), nb_read(), and num_available().
 sc_fifo_out<T> is derived from sc_port<sc_fifo_out_if<T>,0> with interface
functions write(), nb_write(), and num_free().
➢ blocking read and write interface functions (automatic synchronization with
implicit wait() operations)
int a = f1.read(); // read a token
f1.write(a); // write a token
➢ Inspecting queues
int a = f1.num_available(); // number of tokens in a queue
int a = f1.num_free(); // number of free places in a queue
In SystemC the sc_port object is used to communicate with a channel. Ports
provide the means by which a module can be coded such that it is
independent of the context in which it is instantiated. A port forwards
interface method calls to the channel to which the port is bound.
For functional modeling, processes communicate through FIFO ports. Two port
types for the sc_fifo<T> channel, where T is the type of the elements in
the FIFO channel, are supported:
 Input: sc_fifo_in<T>, which is basically equivalent to
sc_port<sc_fifo_in_if<T>,0>, where the first parameter is the input
interface of a FIFO and the second parameter (0) specifies that any number
of channels can be bound to the port. However, the practical use of
these multiple bindings is not clear. Therefore it could be useful to
define your own FIFO port with a restriction to a single binding.
 Output: sc_fifo_out<T>, which is equivalent to
sc_port<sc_fifo_out_if<T>,0>. Also here, the use of multiple bindings
is not recommended.
Several functions are associated to the sc_fifo class:
 read() gets a token from the queue. It blocks when no tokens are
available.
 write() puts a token on a queue. It blocks when there are no free
spaces in the queue
There are also inspecting functions available to look at the number of tokens
or free spaces.
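The meaning of the non-blocking and inspection functions can be illustrated with a plain C++ stand-in. BoundedFifo is a hypothetical class, not sc_fifo: it only borrows the function names nb_read, nb_write, num_available and num_free, and it makes written tokens visible immediately, which is a simplification compared with an event-driven simulator. A blocking read() or write() behaves like waiting until the corresponding non-blocking call would succeed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Plain C++ stand-in for a bounded FIFO channel (illustration only).
template <typename T>
class BoundedFifo {
public:
    // Default size of 16 mirrors the default length mentioned in the course.
    explicit BoundedFifo(std::size_t size = 16) : size_(size) { assert(size > 0); }

    // Non-blocking write: returns false instead of waiting when full.
    bool nb_write(const T& value) {
        if (num_free() == 0) return false;
        data_.push_back(value);
        return true;
    }

    // Non-blocking read: returns false instead of waiting when empty.
    bool nb_read(T& value) {
        if (num_available() == 0) return false;
        value = data_.front();
        data_.pop_front();
        return true;
    }

    // Number of tokens currently stored in the queue.
    int num_available() const { return static_cast<int>(data_.size()); }

    // Number of free places left in the queue.
    int num_free() const { return static_cast<int>(size_ - data_.size()); }

private:
    std::deque<T> data_;
    std::size_t size_;
};
```

At any moment num_available() plus num_free() equals the queue length, so a process can inspect a queue before deciding whether a blocking call would suspend it.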
Page 24
24
Example of a functional model of
an adder (continued)
SC_MODULE(adder) {
sc_fifo_in<int> a,b;
sc_fifo_out<int> c;
//define processes, internal data, etc.
SC_CTOR(adder) {
// body of constructor;
// process declaration, sensitivities, etc.
};
};
When we add the definition of the ports to the adder module, we
obtain the code on the slide.
Page 25
25
SC_THREAD processes are used
to model functional processes
➢ SC_THREAD processes run forever once started.
➢ SC_THREAD processes can be suspended by means of the
wait(event) function. In functional modeling the wait
statements are hidden in the read() and write() functions to the
queues.
➢ Multiple processes per module are possible
➢ Processes can also be dynamically created.
The actual computation in the application is performed in the processes. As
a consequence, they also define the parallelism in the application.
SystemC supports three types of processes. For functional modeling we use
the SC_THREAD process. An SC_THREAD process runs forever when
started. It can be suspended by a wait(event) function. Often the wait(event)
function is implicitly present in the communication functions.
Processes are executed on events. These events can be statically or
dynamically defined. Static sensitivity is set by means of the variable
sensitive of sc_module. Dynamic sensitivity to a certain event is set by
wait(event) for an SC_THREAD process.
A module can have multiple processes.
Processes might be dynamically created during simulation. However, no
synthesis support exists for dynamic processes. Therefore, we do not use
them in this course.
Page 26
26
Example of a functional model of
an adder (continued)
SC_MODULE(adder) {
sc_fifo_in<int> a,b;
sc_fifo_out<int> c;
void compute() {
while(true) {
int valuea = a.read();
int valueb = b.read();
c.write(valuea+valueb);
}
}
SC_CTOR(adder) {
SC_THREAD(compute);
}
};
Adding the definition of an SC_THREAD process to the adder results in the
code on the slide. This adder waits for data on both its input queues
sequentially and next produces a token on its output queue.
Page 27
27
Define the main program
➢ The systemc library must be included in the main program:
 #include <systemc.h>
➢ In sc_main() the following actions are taken:
 Instantiate channels with:
• sc_fifo<T> f("name", length); // default length 16
• e.g. sc_fifo<int> f1("f1", 2);
 Instantiate the modules.
 Bind ports of modules to channels:
• Positional
• named.
 Call sc_start() to start simulation and run until end of any
activity.
The global structure of the system is defined in the main function. Because
main() is already used by the SystemC library, the main function for the
user application is sc_main().
In sc_main(), the following actions are taken:
1. Instantiation of the channels. The basic channel that we use in
functional modeling is sc_fifo. A FIFO queue is defined by means
of the template class sc_fifo<T>. T can be any basic data
type, e.g. int, float, etc. The sc_fifo class declares a finite-length
buffer of tokens. The default length is 16 elements. The queue
also has a name for debugging and statistics retrieval purposes.
The constructor for the queue is sc_fifo<T> f1("f1", length);
An sc_fifo can only be written from one process.
2. Instantiation of the modules. A module can be instantiated multiple
times.
3. Binding the ports of the modules to the channels. This can be
done in two ways: positional or named. Named binding is
preferred because it is less prone to errors than positional port
binding.
4. Start the simulation.
Page 28
28
Example of a functional model of
an adder (continued)
int sc_main(int argc, char *argv[]) {
  // Elaboration phase
  sc_fifo<int> fifo_a, fifo_b, fifo_c; // channel instantiation
  … // instantiate signal generation and evaluation module
  adder my_adder("my_adder"); // module instantiation
  my_adder.a(fifo_a); // binding of port to channel
  my_adder.b(fifo_b);
  my_adder.c(fifo_c);
  … // other modules and test bench, which drive fifo_a and fifo_b.
  sc_start(); // start simulation
  return 0;
}
The sc_main() function for the adder is shown on the slide.
Note that the arguments of sc_main() are identical to those of main().
To connect the ports to the channels, named bindings are used.
Page 29
29
Modules can also be used to
create hierarchy
SC_MODULE(superfunc) {
  // IO ports
  sc_fifo_in<float> in;
  sc_fifo_out<float> out;
  // internal queues
  sc_fifo<float> d;
  // internal modules (function is an sc_module)
  function func1;
  function *func2;
  // Module constructor
  SC_CTOR(superfunc) : func1("func1") {
    func1.in(in);
    func1.out(d);
    func2 = new function("func2");
    func2->in(d);
    func2->out(out);
  }
};
[Figure: hierarchical module superfunc containing func1 and func2, connected by the internal queue d]
In a functional model hierarchy will be used to make the design more
readable. The hierarchy is fully transparent: it basically acts as a container
for the basic modules, but does not add any functionality or synchronization.
The definition of a hierarchical module consists of the definition of the ports
and internal queues. Next the internal modules are defined. Care must be
taken that the module objects will still exist after execution of the constructor.
Two alternatives exist to guarantee this: either declare them as members and construct them
in the initializer list of the constructor, or allocate them dynamically with new.
The constructor creates the two modules and binds the ports to the
channels.
Page 30
30
Simulation engine
➢ In an un-timed model, the simulator only advances in delta-
cycles:
 If it is started to run for a finite amount of time, it will never
stop.
 We therefore run it until no events are present: sc_start();
➢ Ways of stopping the simulator:
 Terminate a process (return from SC_THREAD): the simulator
will stop due to the lack of events.
 Call sc_stop() when a termination condition is fulfilled.
In a functional model no notion of time is present. Every action executes
infinitely fast. As a consequence, the simulation kernel only advances in
delta cycles of infinitesimally small time steps. If we started the simulation
kernel with a finite amount of time to run, it would never reach that time and
hence run forever. Therefore we run the simulation kernel until no events are
present any more. This is achieved with the sc_start() command.
With this approach, there are two ways of stopping the simulation:
1. We can exit an SC_THREAD. By doing so, no events will be
produced anymore and the simulation will eventually stop because of the
lack of events.
2. We can check for a termination condition and explicitly call sc_stop().
This approach was used in the exercise of class 1. When the whole
image is processed and written to file, the simulation is explicitly
stopped. In general this is also the safest and most elegant way of
controlling the simulation.
Page 31
31
Functional modeling in
SystemC
➢ Introduction to design of digital embedded systems
➢ SystemC introduction
➢ SystemC functional model syntax
➢ Exercise 1: building a functional model in SystemC
Finally, let’s exercise what we have learned so far.
Page 32
32
Goal of this exercise
➢ use a simplified JPEG block diagram to practice functional
modeling
➢ develop a functional process that fits into a system
➢ simulate a functional model
➢ observe the overall behavior of a system
The goal of this exercise is to practice functional modeling. We will use a
simplified JPEG block diagram for this purpose. A process will be defined
and integrated in a JPEG functional model. Next the functional model will be
simulated and the overall behavior of the system will be observed.
Page 33
33
What is JPEG?
➢ “JPEG” stands for
“Joint Photographic Experts Group”
➢ “JPEG” is a standard for color image compression
➢ “JPEG” is widely used (e.g. on the WWW)
➢ More information?
 http://www.jpeg.org/
JPEG stands for “Joint Photographic Experts Group” and is a compression
standard for color images. It is widely used. More information can be found
on www.jpeg.org
Page 34
34
(Partial) JPEG: a simple block diagram
[Figure: JPEG-ENCODER: Original Image -> R2B -> DCT -> Quantize (+table) -> ZIGZAG SCAN -> RUN-LENGTH ENCODER. JPEG-DECODER: RUN-LENGTH DECODER -> ZIGZAG SCAN -> Normalize (+table) -> IDCT -> B2R -> Reconstructed Image. Parameters on both sides: width, height, #bits.]
A simplified block diagram of a JPEG encoder and decoder is shown on the
slide.
First, an original image is read and split into 8x8 blocks (R2B). Together
with the pixel data, the width, height and number of bits per pixel are also
extracted from the image.
Next, on each 8x8 block, a discrete cosine transform (DCT) is performed,
resulting in 8x8 DCT coefficients. These DCT coefficients are quantized and
reorganized in the zigzag scan module. The resulting coefficient stream is
run-length encoded. This last block differs from the JPEG standard,
which uses a Huffman encoder.
In the decoder the reverse operations are performed in the reverse order.
Page 35
35
2D Discrete Cosine Transform
➢ Non-optimized equation
➢ DCT can be separated in consecutive 1-D operations
➢ There are many optimized DCT-algorithms available
\[ F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16} \]
\[ f(i,j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16} \]
\[ \text{where } C(l) = \begin{cases} \dfrac{1}{\sqrt{2}} & l = 0 \\ 1 & l > 0 \end{cases} \]
The discrete cosine transform (DCT) is performed on an 8x8 pixel block and
returns an 8x8 block of DCT coefficients. Each DCT coefficient indicates the
amplitude of a horizontal and vertical frequency component. The inverse
discrete cosine transform (IDCT) returns pixel values from DCT coefficients.
The formal definitions of the DCT and IDCT are shown on the slide. Instead
of this straightforward 2D operation, the calculation can be split into
consecutive 1D operations, which is more efficient. There is also a large set
of optimized DCT-algorithms that exploit the regular structure of the cosine
values.
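As an illustration of the two equations on the slide, the following minimal sketch evaluates them directly with four nested loops. It is the non-optimized form, not the separable or fast variants mentioned above, and the names Block, basis, dct8x8 and idct8x8 are our own and not part of the exercise library.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Block = std::array<std::array<double, 8>, 8>;

// Normalization factor C(l): 1/sqrt(2) for l == 0, 1 otherwise.
inline double C(int l) { return l == 0 ? 1.0 / std::sqrt(2.0) : 1.0; }

// Cosine basis value cos((2*x + 1) * k * pi / 16).
inline double basis(int x, int k) {
    const double pi = std::acos(-1.0);
    assert(x >= 0 && x < 8 && k >= 0 && k < 8);
    return std::cos((2 * x + 1) * k * pi / 16.0);
}

// Forward 2D DCT, evaluated directly from the definition.
Block dct8x8(const Block& f) {
    Block F{};
    for (int u = 0; u < 8; ++u)
        for (int v = 0; v < 8; ++v) {
            double sum = 0.0;
            for (int i = 0; i < 8; ++i)
                for (int j = 0; j < 8; ++j)
                    sum += f[i][j] * basis(i, u) * basis(j, v);
            F[u][v] = 0.25 * C(u) * C(v) * sum;
        }
    return F;
}

// Inverse 2D DCT, evaluated directly from the definition.
Block idct8x8(const Block& F) {
    Block f{};
    for (int i = 0; i < 8; ++i)
        for (int j = 0; j < 8; ++j) {
            double sum = 0.0;
            for (int u = 0; u < 8; ++u)
                for (int v = 0; v < 8; ++v)
                    sum += C(u) * C(v) * F[u][v] * basis(i, u) * basis(j, v);
            f[i][j] = 0.25 * sum;
        }
    return f;
}
```

Applying idct8x8 to the result of dct8x8 should return the original block up to floating point round-off, which is a convenient sanity check before any quantization is introduced.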
Page 36
36
Quantization
➢ Each DCT coefficient is divided by the coefficient amplitude
that is just detectable by the human eye (table)
➢ The result is rounded to an integer
➢ This reduces the number of bits needed to represent the DCT
coefficient
➢ The quantization is the place where information of the image
might be lost, resulting in lossy compression.
Next the DCT coefficients are quantized. To this end each DCT coefficient is
divided by the corresponding value in the quantization table.
The result is rounded to the nearest integer, reducing the number of bits
needed to represent the DCT coefficient.
In the quantization step image information might be lost, resulting in lossy
compression.
Page 37
37
Quantization Table
N =
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
An example of a typical quantization table is shown on the slide. Note that
the quantization values grow for higher horizontal or vertical
frequencies.
JPEG contains a number of predefined quantization tables. If a custom
quantization table is used, it must be sent to the decoder.
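The quantization step and its counterpart in the decoder can be sketched as follows. This is only an illustration: the function names quantize and normalize and the array kQuant are our own choices, and the table values are those of the example table on the slide.

```cpp
#include <cassert>
#include <cmath>

// Example quantization table from the slide (row = vertical frequency,
// column = horizontal frequency; the assignment of axes is an assumption).
const int kQuant[8][8] = {
    {16, 11, 10, 16, 24, 40, 51, 61},
    {12, 12, 14, 19, 26, 58, 60, 55},
    {14, 13, 16, 24, 40, 57, 69, 56},
    {14, 17, 22, 29, 51, 87, 80, 62},
    {18, 22, 37, 56, 68, 109, 103, 77},
    {24, 35, 55, 64, 81, 104, 113, 92},
    {49, 64, 78, 87, 103, 121, 120, 101},
    {72, 92, 95, 98, 112, 100, 103, 99}};

// Encoder side: divide by the table entry and round to the nearest integer.
int quantize(double coeff, int q) {
    assert(q > 0);
    return static_cast<int>(std::lround(coeff / q));
}

// Decoder side: multiply the integer level by the same table entry.
double normalize(int level, int q) {
    assert(q > 0);
    return static_cast<double>(level) * q;
}
```

The difference between a coefficient and normalize(quantize(coeff, q), q) is the information that is lost, and it is bounded by half the table entry.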
Page 38
38
The coefficients are zigzag
scanned
0 1 5 6 14 15 27 28
2 4 7 13 16 26 29 42
3 8 12 17 25 30 41 43
9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
The resulting quantized DCT coefficients are next zigzag scanned. This is
done in such an order that statistically long sequences of zero coefficients
can be expected.
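The scan order on the slide does not have to be stored as a table: it can be generated by walking the anti-diagonals of the block in alternating directions. The sketch below, with the hypothetical helper name zigzag_index, returns for every (row, column) position its index in the zigzag stream.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// idx[r][c] is the position of coefficient (r, c) in the zigzag stream.
std::array<std::array<int, 8>, 8> zigzag_index() {
    std::array<std::array<int, 8>, 8> idx{};
    int n = 0;
    for (int s = 0; s <= 14; ++s) {          // s = r + c, one anti-diagonal
        const int lo = std::max(0, s - 7);   // smallest valid row on it
        const int hi = std::min(s, 7);       // largest valid row on it
        if (s % 2 == 0) {
            for (int r = hi; r >= lo; --r) idx[r][s - r] = n++;  // moving up
        } else {
            for (int r = lo; r <= hi; ++r) idx[r][s - r] = n++;  // moving down
        }
    }
    assert(n == 64);
    return idx;
}
```

Low-frequency coefficients end up at the start of the stream, so the coefficients that are most likely to be quantized to zero are grouped at the end.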
Page 39
39
(Simplified) Run-length coding
➢ Send the DC value “as is”
➢ Represent the high frequency data with (zero run-length,
amplitude) combinations.
➢ End the stream with EOB (= 63).
➢ Example:
 in: 79, 0, -2, -1, 3, -1, 0, 0, -1, 0, 0, 0, …
 out: 79, 1,-2, 0,-1, 0, 3, 0,-1,2,-1, 63
Next we use a non-JPEG run-length coder for our exercise. This coding
works as follows:
 The DC value is sent “as is”
 The high frequency data is split in sections consisting of a number of
zeros followed by a non-zero coefficient. Each section is
represented by a pair consisting of the number of preceding
zeros and the value of the non-zero coefficient.
 When all remaining coefficients for a block are zero, an end of block
(EOB=63) value is sent.
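The coding rules above can be sketched in a few lines. The function name rl_encode is our own and this is not the rl_enc module of the exercise library. We assume here that every block is terminated by EOB, as stated on the slide, even when the last coefficient is non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kEOB = 63;  // end-of-block marker from the slide

// coeffs holds one zigzag-ordered block; coeffs[0] is the DC value.
std::vector<int> rl_encode(const std::vector<int>& coeffs) {
    assert(!coeffs.empty());
    std::vector<int> out;
    out.push_back(coeffs[0]);          // DC value is sent "as is"
    int run = 0;                       // zeros seen since the last non-zero
    for (std::size_t k = 1; k < coeffs.size(); ++k) {
        if (coeffs[k] == 0) {
            ++run;
        } else {
            out.push_back(run);        // zero run-length
            out.push_back(coeffs[k]);  // amplitude
            run = 0;
        }
    }
    out.push_back(kEOB);               // trailing zeros collapse into EOB
    return out;
}
```

With 63 high frequency coefficients per block, a run of zeros in front of a non-zero value is at most 62, so the value 63 cannot be confused with a run-length.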
Page 40
40
How to start?
➢ Download exercise files from http://www.icorsi.ch/
➢ Follow installation instructions of exercises.
➢ You will find:
 In /exercises/exercise1/: main.cpp to start from
 In /exercises/modules/: library with JPEG encoder modules
{r2b,dct,quantize,zz_enc,rl_enc}.{h,cpp}, JPEG decoder modules
{b2r,idct,normalize,zz_dec}.{h,cpp} and test bench modules {src,snk,test}.{h,cpp}
 In /exercises/images/: test images
 In /exercises/add2systemc additional functions (df_fork, fifo_stat)
➢ Things to be done:
 make rl_dec.h and rl_dec.cpp
 complete the main.cpp with the modules.
 Compile and execute the application.
 Inspect the number of reads and writes in the fifos
 Visualize resulting image
 Test if you can launch the application in the debugger.
 Optional: make a hierarchy for the encoder and decoder.
You will find all files for starting in the exercise1 directory.
Perform the actions as indicated on the slide.
To obtain information about the number of writes and reads in the FIFOs, use
the type fifo_stat<T> instead of sc_fifo<T>.
To prevent multiple bindings of a fifo_port, the classes my_fifo_in<T> and
my_fifo_out<T> are used in the exercises.
Page 41
41
Using SystemC on
Linux/Cygwin
➢ Use g++ (I used version 4.5.3).
➢ Make a workspace in Eclipse:
 Add your source files to the project.
 Add libmodules.a
 Add libadd2systemc.a (for next exercises).
 Add libsystemc.a
 Put the right include paths and linker paths
➢ Build your application from within Eclipse.
➢ Execute your application from within Eclipse.
 Exercise1.exe -i ../images/mountain.pgm -o result.pgm
We will make the exercises in a Linux environment, using g++ and Eclipse.
Eclipse is an integrated development and debugging environment. In the
exercise directory there is a step-by-step guide of how to get started with the
exercises in Eclipse.
The most recent sources of the exercises and libraries can be found at
http://www.icorsi.ch/
Libraries have to be compiled before starting the exercise session.
Page 42
Model Based System
Design
Class 2: Fixed-point
refinement
Marc Engels
e-mail: marc.engels@flandersmake.be
In this second class we will focus on the refinement of the data types of the
functional model. More in particular we will explain the definition of fixed-
point word lengths for the variables in the functional model. This action is
relevant both for mapping on embedded processors with limited data sizes,
e.g. 16-bit processors, and for mapping on a dedicated architecture.
At the end of the class, you will be able to perform fixed point refinement on a
functional model of an embedded system in SystemC.
Page 43
43
Fixed point refinement
➢ Fixed word length optimization
 Overflow and quantization
 MSB determination
 LSB determination
➢ Fixed word length support in SystemC
➢ Exercise 2: fixed point refinement of IDCT
This lecture on fixed point refinement consists of three parts:
• In the first part we introduce the quantization and overflow effects of
fixed point representations. We also present some methods to
determine the most and least significant bits (MSB and LSB).
• Next, we introduce the fixed point support in SystemC. This consists of
an extensive set of fixed point types. In addition, SystemC also
supports 4-valued logic to define bus structures.
• Finally, we introduce the exercise on fixed point refinement.
Page 44
44
Fixed point refinement is one of the
steps in architectural design
[Figure: refinement from System Functionality to Dedicated Architecture along three axes.
Computations: operations -> operators (resource allocation, scheduling).
Data: variables, arrays, floating point -> memories, fixed point (memory allocation, address generation, word sizing).
Communication: point-to-point queues -> busses, detailed protocol (bus allocation, introduce arbiters, include protocols).]
Let’s concentrate on the architectural design step that translates an
algorithm into an optimal architecture. The algorithm will typically be
specified as a functional model, such as a data flow model. The architecture needs a
timed model, e.g. register transfer level (RTL). Initially the algorithm will be
modeled in floating point. Cost-effective implementation requires, however, a
refinement into fixed point types.
Page 45
45
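Finite word lengths are a must
for DSP applications
Floating-point
•powerful
•expensive (storage & ops)
[Figure: floating-point multiplier (*) on words of 3 bytes (mantissa) + 1 byte (exponent)]
Fixed-point
•minimum area
•low power
•high speed
[Figure: fixed-point multiplier (*) with 8-bit and 6-bit inputs and a 14-bit output]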
Most signal processing algorithms are specified in floating point precision.
This is a very powerful signal representation with high accuracy, but is also
expensive in storage and operation cost. For instance, a typical
representation of a floating point number is a mantissa of 24 bits and an
exponent of 8 bits. As a consequence, a floating point multiplication is
equivalent to a 24-bit multiplication and an 8-bit addition.
However, many applications, like cable modems and wireless
communication devices, require low cost and low power for a high
processing speed. As a consequence, the DSP algorithms will be performed
in fixed-point arithmetic. With an 8-bit fixed point notation, for instance, the
cost will drop dramatically as the hardware cost for a multiplication is a
quadratic function of its input width.
This requires the designer to translate floating point types into fixed point
types, using a refinement strategy.
Page 46
46
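How to model a fixed-point
signal?
[Figure: bit weights from MSB to LSB: i·2^3, 2^2, 2^1, 2^0, 2^-1, 2^-2, 2^-3. WL spans all bits, IWL spans the bits to the left of the binary point, and WL-IWL the bits to its right.]
➢Total number of bits WL
➢Integer bits IWL
➢Value representation
•2’s complement (i=-1)
•unsigned (i=1)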
A fixed point type can be defined by three parameters:
• The total number of bits WL.
• The position of the binary point, indicated by the number of integer
bits IWL.
• The way in which the value is represented. In the case of a signed
number, 2’s complement notation is the most common because it
allows easy arithmetic. However, alternatives like sign-magnitude and
1’s complement are also feasible.
Page 47
47
How do we quantize?
[Figure: four transfer curves of the fixed-point value (fxp) versus the floating-point value (flp): truncate (floor), round, magnitude truncate, and ceil.]
If the result of a calculation has more precision than available in the fixed
point format, the value has to be quantized. Several ways of quantization
exist:
• Truncate or floor is the cheapest approach because it is available by
default in hardware. However, it generally gives the worst
performance of the quantization techniques.
• Magnitude truncate realizes a floor function for positive values and a
ceil function for negative values. The technique is natural for sign
magnitude representations. The advantage is a symmetrical behavior
around the zero value.
• Applying the ceil function to the complete range is an alternative which
is seldom used.
• Rounding is the technique with the best performance for most cases.
However, it is also the most expensive one. In hardware this requires
the addition of half an LSB (0.5 LSB) followed by a truncation
operation.
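The four quantization methods can be written down compactly for a quantization step of arbitrary size. The sketch below uses our own function names and plain double arithmetic; note that q_round is implemented exactly as described for hardware: add half a step, then truncate.

```cpp
#include <cassert>
#include <cmath>

// Truncate (floor): round toward minus infinity.
double q_floor(double x, double step) {
    assert(step > 0.0);
    return std::floor(x / step) * step;
}

// Ceil: round toward plus infinity.
double q_ceil(double x, double step) {
    assert(step > 0.0);
    return std::ceil(x / step) * step;
}

// Magnitude truncate: round toward zero (floor for x > 0, ceil for x < 0).
double q_magnitude(double x, double step) {
    assert(step > 0.0);
    return std::trunc(x / step) * step;
}

// Round: add half a step, then truncate.
double q_round(double x, double step) {
    assert(step > 0.0);
    return std::floor(x / step + 0.5) * step;
}
```

For example, with a step of 0.25 the value 3.1875 becomes 3.0 with q_floor and 3.25 with q_round, while -0.3 becomes -0.5 with q_floor but -0.25 with q_magnitude.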
Page 48
48
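What happens on an overflow?
[Figure: transfer curves of the fixed-point value (fxp) versus the floating-point value (flp) for wrap-around and for saturation, with the max. value marked.]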
When the result of an operation is larger than the maximum value that can
be represented by the fixed point format (overflow), we have two
possibilities:
• Wrap-around: the overflow bits are neglected. For unsigned values, this
is equivalent to a modulo operation (see figure on slide). For 2’s
complement numbers, a one bit overflow results in the most
negative number. This is the standard behavior in a hardware
implementation.
• Saturation: when an overflow occurs, the signal is set to the maximum
value that can be represented. Additional hardware is necessary to
realize this behavior.
Note that a similar situation can occur at the minimum value of a signal,
for instance when the subtraction of two unsigned signals results in a negative
value that must be represented in an unsigned format. For such an underflow,
similar remedies are possible.
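Both overflow behaviors can be illustrated on 2's complement integers. The following sketch uses our own function names saturate and wrap and models an n-bit signed value with a 64-bit host integer.

```cpp
#include <cassert>
#include <cstdint>

// Saturation: clip to the range of an n-bit 2's complement number.
std::int64_t saturate(std::int64_t v, int bits) {
    assert(bits >= 1 && bits <= 62);
    const std::int64_t max = (std::int64_t{1} << (bits - 1)) - 1;
    const std::int64_t min = -max - 1;
    return v > max ? max : (v < min ? min : v);
}

// Wrap-around: drop the overflow bits and reinterpret the remaining
// n bits as a 2's complement number.
std::int64_t wrap(std::int64_t v, int bits) {
    assert(bits >= 1 && bits <= 62);
    const std::int64_t span = std::int64_t{1} << bits;
    std::int64_t r = v % span;     // now in (-span, span)
    if (r < 0) r += span;          // now in [0, span)
    if (r >= span / 2) r -= span;  // upper half maps to negative values
    return r;
}
```

For a 5-bit signal, saturate(18, 5) gives 15 and wrap(18, 5) gives -14, while wrap(16, 5) gives -16: a one bit overflow ends up at the most negative number.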
Page 49
49
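Saturation Hardware
[Figure: VALUE is checked by two comparators (comp) against MAX_VAL and MIN_VAL; two multiplexers (mux) then pass either VALUE, MAX_VAL or MIN_VAL on to RESULT.]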
When we opt for a saturation strategy, the following hardware is needed. The
result of the operation must be compared to the maximum positive and
negative numbers. This can be done with an explicit comparator or with the
overflow flags from the adders. If overflow or underflow is reached, the result
of the operation is replaced by the maximum or minimum value respectively.
Note that the hardware complexity of a comparator or multiplexer is
comparable to that of an adder. As a consequence, saturation hardware can require
a significant amount of area.
Page 50
50
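During design we must specify
fixed-point formats for signals
[Figure: an ADC feeds a floating-point algorithm built from two multipliers (*), an adder (+) and a delay (z-1), whose output drives a DAC. The labels 8 and 7 appear at the converter interfaces; the formats of the internal signals are still unknown (?).]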
Going back to the need for fixed point representations, the designer is faced
with the following problem. Given a floating point algorithm, the designer needs
to translate the floating point types into fixed point types, using a refinement
strategy. For each floating point number, a fixed point characteristic
(including total and integer word lengths, overflow and rounding behavior)
must be chosen. In most situations the input and output formats are defined
by the system context (e.g. analog-to-digital converter). Note that
determining these ADC and DAC precisions is an important task in the
overall system design.
Page 51
51
Fixed-point refinement is a
complex optimization problem
➢Minimize overall cost:
• minimal word lengths
• truncate and wrap-around
➢MSB determination:
• goal: avoid unwanted overflows
• method: find min, max signal values
• result: MSB position, value representation, overflow
➢LSB determination:
• goal: keep required precision
• method: evaluate difference between flp and fxp behavior
• result: LSB position, quantization
[Figure labels: safe range, quantization]
This fixed-point refinement is a complex optimization problem where the
search space grows exponentially with the number of signals. The goal of
the optimization is to minimize the overall implementation cost and power
consumption. At the same time the performance degradation (e.g.
implementation loss for telecom systems) must be small. Remark that it is
essential to define a performance degradation bound (e.g. implementation
loss for communication systems, visual performance measure for multimedia
systems) before starting the fixed point refinement.
The optimization problem can be separated in two parts:
1. Determination of the most significant bit (MSB). First, the minimum and
maximum signal value must be determined. From this the MSB
position, value representation and overflow behavior is selected such
that overflows are avoided as much as possible.
2. Determination of the least significant bit (LSB). By evaluating the
difference in performance between the fixed and floating point behavior
of the algorithm, the LSB position and quantization method are
determined for each signal. The goal is to stay within the performance
degradation bound.
In the next slides we will take a closer look at methods for MSB and LSB
determination.
Page 52
52
MSB determination can be
based on range calculations
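➢Put range (min, max) on inputs
➢Propagate range over the operators
➢This gives a safe (pessimistic) estimate
[Figure: range calculation on a network with a multiplier (*) by 12, a delay (z-1) and an adder (+), with signals x, m, d and y. Range info on the input x: [0,255]. Calculated ranges: [0,255] after the delay, [0,3060] after the multiplier, [0,3315] at the output y.]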
MSB determination can be done by means of range propagation. This
analytical method works as follows:
1. On each input signal, the range, i.e. the minimum and maximum
values that occur in a signal, are specified.
2. Next, the signal flow graph of the algorithm is traversed and for each
operator, the range of its output is calculated based on its input ranges.
Because the method calculates worst-case bounds on the minimum and maximum
signal values, it results in a safe, but sometimes pessimistic, estimate of
the MSB position.
Page 53
53
Range propagation is a simple
calculation
Operator | min_c | max_c
c=a+b | min_a+min_b | max_a+max_b
c=a-b | min_a-max_b | max_a-min_b
c=a*b | MIN(min_a*min_b, min_a*max_b, max_a*min_b, max_a*max_b) | MAX(min_a*min_b, min_a*max_b, max_a*min_b, max_a*max_b)
Range propagation on the operators is a simple operation. The table on the
slides shows the rules for add, subtract and multiply operations.
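The three rules of the table map directly onto overloaded operators on a small range type. The sketch below is only an illustration; the type name Range is our own.

```cpp
#include <algorithm>
#include <cassert>

struct Range {
    double min;
    double max;
};

// c = a + b
Range operator+(Range a, Range b) {
    assert(a.min <= a.max && b.min <= b.max);
    return {a.min + b.min, a.max + b.max};
}

// c = a - b
Range operator-(Range a, Range b) {
    assert(a.min <= a.max && b.min <= b.max);
    return {a.min - b.max, a.max - b.min};
}

// c = a * b: the extremes are found among the four corner products.
Range operator*(Range a, Range b) {
    assert(a.min <= a.max && b.min <= b.max);
    const double p1 = a.min * b.min, p2 = a.min * b.max;
    const double p3 = a.max * b.min, p4 = a.max * b.max;
    return {std::min({p1, p2, p3, p4}), std::max({p1, p2, p3, p4})};
}
```

With an input range of [0,255] and a constant 12 written as the range [12,12], the product has range [0,3060], and adding another [0,255] signal gives [0,3315], which are the values of the range calculation example given earlier.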
Page 54
54
Range calculations can get
unstable with feedback
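[Figure: feedback loop with input X(n), output Y(n) and feedback signal F(n), built from a multiplier (*) by coefficient a, an adder (+) and a delay (z-1). The accompanying plot of value versus sample n shows maxF and minF growing continuously.]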
When applied to feedback signals, range propagation can become unstable
and cause continuous growth of the minimum and maximum values. An
example of such a situation is shown on the slide. In such a situation, a
statistical inspection of the real signals will be needed to determine a realistic
MSB position.
Note that, because of the propagation mechanism, all signals within
this feedback loop or depending on the output of the feedback loop will also
suffer from this range explosion. Once saturation logic is introduced at one
place in the loop, this problem is solved.
Page 55
55
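Collecting signal statistics from
simulations is an alternative
➢Perform simulation with realistic stimuli.
➢Collect minimum and maximum value on each signal during the
simulation
➢This gives an optimistic, stimuli dependent estimate
[Figure: the network with a multiplier (*) by 12, a delay (z-1) and an adder (+), with signals x, m, d and y, driven by stimuli; a probe (q1) collects the min and max values on a signal.]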
As an alternative to the analytical range propagation method, we can collect
the signal statistics during simulations. Because the obtained range
information will be stimuli-dependent, this will give an optimistic estimation of
the minimum and maximum values. As a consequence, to maximize the
confidence in the obtained results, the stimuli set should be large and
provide a complete coverage of the algorithm code.
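Collecting the statistics requires little more than a probe that is updated on every assignment to a signal. A minimal sketch of such a probe is given below; the class name RangeMonitor is our own and this is not the fifo_stat class of the exercises.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

class RangeMonitor {
public:
    // Record one sample and pass it through unchanged, so that the probe
    // can be wrapped around any expression.
    double observe(double v) {
        min_ = std::min(min_, v);
        max_ = std::max(max_, v);
        ++count_;
        return v;
    }
    double min() const { assert(count_ > 0); return min_; }
    double max() const { assert(count_ > 0); return max_; }
    long count() const { return count_; }

private:
    double min_ = std::numeric_limits<double>::infinity();
    double max_ = -std::numeric_limits<double>::infinity();
    long count_ = 0;
};
```

In a simulation one would write, for instance, y = mon_y.observe(12 * x + d); for every monitored signal and read out min() and max() at the end of the run.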
Page 56
56
Combine both methods for
accurate MSB determination
name | signal statistic: min, max, MSB1 | range propagation: min, max, MSB2
signal1 | -1.5, 1.6, 2 | -1.9, 1.9, 2
signal2 | -1.3, 1.4, 2 | -2.1, 2.1, 3
signal3 | -1.2, 1.2, 2 | -22.0, 22.0, 6
signal4 | -1.2, 1.2, 2 | -∞, ∞, ∞
➢If MSB1 == MSB2: wrap-around (MSB1)
➢If MSB1 < MSB2: wrap-around (MSB2)
➢If MSB1 << MSB2: saturation (MSB1)
➢If MSB2 is ∞: saturation (MSB1)
As can be expected, combining both methods gives the best results. Each
signal in the system will then be in one of the following situations:
• Both methods result in the same MSB position. Quite logically, the
signal can safely be specified with the resulting MSB position and wrap-
around overflow behavior.
• When the analytical MSB position is larger than the statistical MSB
position, we can make a trade-off between the analytical MSB with
wrap-around and the statistical method with saturation. In most cases
the wrap-around functionality will be the most economical. Only when
the statistical MSB position is much smaller, saturation logic can be
beneficial.
• In the case of a range growth because of feedback, the analytical MSB
position cannot be calculated (it goes to infinity). In this case, the
statistical MSB position is chosen together with a saturation behavior.
After introducing the saturation on one signal in the feedback loop, we
need to re-simulate to get useful results for the rest of the algorithm.
An example of each of these situations is shown on the slide.
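The MSB columns of the table can be reproduced from the min and max columns if we interpret the MSB position as the number of integer bits of a signed format, sign bit included. That interpretation is our assumption; it matches all finite entries of the table. The helper name signed_msb is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Smallest number of integer bits iwl >= 1 (sign bit included) such that
// -2^(iwl-1) <= min and max < 2^(iwl-1).
int signed_msb(double min, double max) {
    assert(std::isfinite(min) && std::isfinite(max) && min <= max);
    int iwl = 1;
    while (min < -std::ldexp(1.0, iwl - 1) || max >= std::ldexp(1.0, iwl - 1)) {
        ++iwl;
    }
    return iwl;
}
```

For signal3, the statistical range [-1.2, 1.2] needs 2 bits while the propagated range [-22, 22] needs 6 bits, which is the kind of gap where saturation on the smaller format becomes worth considering.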
Page 57
57
Quantization effects can be
modeled as additive noise
[Figure: a quantizer Q (B bits) between input and output is replaced by an adder (+) that adds a noise source to the input.]
➢Noise is approximated by a statistical model with the following
assumptions:
• the noise is uncorrelated to the input.
• the noise is white.
• the probability distribution is uniform.
When we look at the LSB side, the question arises what the effect is of
quantization. Many authors approximate the quantization effect as an
additional noise source. They assume that:
• The noise sequence is a sample of a stationary random process (i.e.
whose statistical parameters do not change over time).
• The noise sequence is uncorrelated with the input sequence.
• The random variables of the noise process are uncorrelated, i.e. the
error is a white-noise process.
• The probability distribution of the error process is uniform over the
range of the quantization error.
Page 58
58
Each quantization effect has
mean and variance
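➢ Rounding with step D:
\[ m_n = 0 \quad\text{and}\quad \sigma_n^2 = \frac{D^2}{12} \]
➢ Truncation with step D:
\[ m_n = -\frac{D}{2} \quad\text{and}\quad \sigma_n^2 = \frac{D^2}{12} \]
➢ Magnitude truncation with step D:
\[ m_n = 0 \quad\text{and}\quad \sigma_n^2 = \frac{D^2}{3} \]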
The noise process can then be modeled by means of its mean and variance.
The expressions for mean and variance for the three most popular
quantization methods are shown on the slide. D is the quantization step.
Rounding and magnitude truncation result in a zero mean, but rounding has the
lower variance. Truncation and rounding have the same variance, but
only rounding has a zero mean. As can be expected, rounding introduces the
least quantization noise.
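The mean and variance expressions can be checked numerically. The sketch below sweeps a signal uniformly over an interval and collects the statistics of the error, defined here as the quantized value minus the original value. The step D is taken as 1, and all names are our own. The zero mean for magnitude truncation assumes a signal that is symmetric around zero.

```cpp
#include <cassert>
#include <cmath>

struct Stats {
    double mean;
    double variance;
};

// Statistics of the error q(x) - x for n samples spread evenly over (lo, hi).
template <typename Quantizer>
Stats error_stats(Quantizer q, double lo, double hi, int n) {
    assert(n > 0 && lo < hi);
    const double h = (hi - lo) / n;
    double sum = 0.0, sum_sq = 0.0;
    for (int k = 0; k < n; ++k) {
        const double x = lo + (k + 0.5) * h;  // midpoint of the k-th cell
        const double e = q(x) - x;
        sum += e;
        sum_sq += e * e;
    }
    const double mean = sum / n;
    return {mean, sum_sq / n - mean * mean};
}

// Quantizers with step D = 1.
inline double round_unit(double x) { return std::floor(x + 0.5); }
inline double trunc_unit(double x) { return std::floor(x); }
inline double magnitude_unit(double x) { return std::trunc(x); }
```

Running error_stats over the interval (-8, 8) should give a mean close to 0 and a variance close to 1/12 for round_unit, a mean close to -1/2 and a variance close to 1/12 for trunc_unit, and a mean close to 0 and a variance close to 1/3 for magnitude_unit.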
Page 59
59
This results in an equivalent
linear network
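[Figure: left, the network with a multiplier (*) by 12, a delay (z-1) and an adder (+), with signals x, m, d and y, and with quantizers Q1 and Q2 inserted; right, the equivalent linear network in which the quantizers are replaced by adders (+) that inject the noise sources e1(t) and e2(t).]
\[ y(t) = \bigl(12\,x(t) + x(t-1)\bigr) + \bigl(12\,e_1(t) + e_2(t) + e_1(t-1)\bigr) \]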
Replacing the quantization by an additional noise source results in a linear
model of the quantized algorithm. This can then be analytically analyzed by
means of well-developed linear signal processing theory. For many
quantization effects, this linear model is a good approximation. It has, for
instance been used to determine the effects of quantizing the signals in FIR
filters.
As an exercise, calculate the resulting signal to noise ratio in the case that:
• x(t) ranges between 0 and 255 with a uniform distribution.
• both quantization steps round the values to the nearest integer.
Page 60
60
… but quantization is a non-
linear operation
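[Figure: first-order recursive filter with input X(n), output Y(n), a delay (z-1), a multiplier (*) by -0.96, an adder (+) and a quantizer Q (B bits) that rounds to the nearest integer. Input: x(0) = 14, x(n) = 0 for n > 0. The output sequences with rounding and without rounding are compared.]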
However, not all applications are linear. Quantization in non-linear systems
can lead to non-intuitive behavior. In infinite impulse response (IIR) filters, for
instance, quantization can generate limit cycles. For a stable floating-point
IIR filter implementation, the output will decay asymptotically to zero when
the input becomes zero. For the same system, implemented with finite
precision, the output may continue to oscillate indefinitely with a periodic
pattern while the input remains equal to zero. This effect is often referred to
as zero-input limit cycle behavior. An example of such behavior is shown on
the slide.
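A zero-input limit cycle is easy to reproduce in a few lines. The sketch below assumes the recursion y(n) = Q(x(n) + a·y(n-1)) with the quantizer placed after the adder; the exact position of Q in the filter of the slide may differ. The function name impulse_response is our own.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Response of y(n) = x(n) + a * y(n-1) to the input x(0) = x0, x(n) = 0
// for n > 0. With quantize == true the result of every step is rounded
// to the nearest integer before it is fed back.
std::vector<double> impulse_response(double a, double x0, int n, bool quantize) {
    assert(n > 0);
    std::vector<double> y(n);
    double prev = 0.0;
    for (int k = 0; k < n; ++k) {
        const double x = (k == 0) ? x0 : 0.0;
        double v = x + a * prev;
        if (quantize) v = std::round(v);
        y[k] = v;
        prev = v;
    }
    return y;
}
```

With a = -0.96 and x0 = 14, the rounded sequence starts 14, -13, 12, -12, 12, -12 and keeps alternating between 12 and -12, whereas the unrounded sequence decays toward zero.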
Page 61
61
LSB determination is based on
simulations
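[Figure: flow chart (all fixed-point -> simulate -> output ok? yes / no) next to two copies of the network with a multiplier (*) by 12, a delay (z-1) and an adder (+), both driven by the same stimuli; one copy contains quantizers (Q), and the outputs of both copies are compared.]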
Non-linear quantization effects are difficult to analyze analytically. Therefore,
mostly simulation based methods are used. To this end the output of a
reference simulation is compared to a simulation with the quantized signals.
Again, sufficiently large stimuli sets with complete code coverage
must be used.
Page 62
62
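Signal to quantization noise
ratio (SQNR)
\[ \mathrm{SQNR}_x = 10 \log_{10} \frac{\sigma_s^2 + m_s^2}{\sigma_e^2 + m_e^2} \]
[Figure: signal x passes through the quantizer Q to give x_Q; the difference (-) between both is the error e. The statistics (m_s, σ_s) are collected on the signal and (m_e, σ_e) on the error.]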
To get a better insight in the optimization trade-off, the difference between
the floating-point and fixed-point values (e) and the resulting signal to
quantization noise (SQNR) is a useful guidance.
The SQNR for all signals is calculated as follows:
• During signal assignments the statistics (mean, standard deviation) for
the error signal as well as for the output signal are collected.
• At the end of the simulation, the signal to quantization noise ratio (SQNR)
is calculated for each signal.
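The two steps above can be sketched as a small accumulator. Because the variance plus the squared mean equals the mean square, it is sufficient to accumulate the squares of the signal and of the error. The class name SqnrMonitor is our own, and we assume that the signal statistics are taken on the floating-point reference value.

```cpp
#include <cassert>
#include <cmath>

class SqnrMonitor {
public:
    // Called on every assignment with the floating-point reference value
    // and the corresponding fixed-point value.
    void observe(double reference, double quantized) {
        const double e = quantized - reference;
        signal_sq_ += reference * reference;
        error_sq_ += e * e;
        ++count_;
    }

    // SQNR in dB; plus infinity when no quantization error was observed.
    double sqnr_db() const {
        assert(count_ > 0);
        const double signal_power = signal_sq_ / count_;  // sigma_s^2 + m_s^2
        const double error_power = error_sq_ / count_;    // sigma_e^2 + m_e^2
        return 10.0 * std::log10(signal_power / error_power);
    }

private:
    double signal_sq_ = 0.0;
    double error_sq_ = 0.0;
    long count_ = 0;
};
```

A signal that is represented without any error yields an infinite SQNR, which is how an entry such as Inf in a table of simulation results can be read.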
Page 63
63
LSB selection optimizes cost and
performance
quantization set | SQNR pi | SQNR accu | SQNR pix | SQNR coeffs | SQNR block | SQNR temp block | SQNR blocki | cost | SNR | PSNR
0 | 208 | 253 | Inf | 184 | Inf | 225 | Inf | 787968 | 27.64 | 31.49
1 | 45.5 | 59.76 | Inf | 174 | Inf | Inf | Inf | 759296 | 27.48 | 31.33
2 | 45.5 | 59.76 | 25.15 | 174 | Inf | Inf | Inf | 759296 | 22.66 | 26.51
3 | 45.5 | 59.76 | 38.77 | 174 | Inf | Inf | Inf | 759296 | 26.91 | 30.75
4 | 45.5 | 59.76 | 47.3 | 30.88 | Inf | Inf | Inf | 230912 | 27.35 | 31.19
5 | 45.5 | 59.8 | 47.3 | 30.88 | 29.38 | Inf | Inf | 230912 | 27.34 | 31.19
6 | 45.5 | 61.4 | 47.3 | 30.88 | 29.38 | -1.93 | Inf | 41472 | 20.47 | 24.32
7 | 45.5 | 59.8 | 47.3 | 30.88 | 29.38 | Inf | Inf | 72192 | 27.34 | 31.19
8 | 45.5 | 60.23 | 47.3 | 30.88 | 29.38 | 16.73 | Inf | 56832 | 26.96 | 30.8
9 | 45.5 | 59.88 | 47.3 | 30.88 | 29.38 | 31.86 | Inf | 67072 | 27.31 | 31.16
The optimal LSB is determined by running the simulation multiple times with
various quantization sets. For each quantization set, the SQNR per signal,
the overall SNR and PSNR, and the cost are calculated. The goal is to find the
cheapest solution that realizes the specified performance. This procedure
can be automated by means of an optimization routine.
When changing the quantization for one signal at a time, the statistics give
an impression of the sensitivity of the cost and the performance to the
quantization of a signal. As a rule of thumb, the SQNR of a signal should be
higher than the overall SNR.
Note that the SQNR and SNR statistics are dependent on the input. As a
consequence, the optimization should be performed on a representative set
of inputs.
Page 64
64
Fixed point refinement
➢ Fixed word length optimization
 Overflow and quantization
 MSB determination
 LSB determination
➢ Fixed word length support in SystemC
➢ Exercise 2: fixed point refinement of IDCT
In the next part we discuss the fixed point support in SystemC
Page 65
65
SystemC introduces a number
of specific data types
Type Description
sc_logic 4 value {0,1,X,Z} single bit
sc_int 1 to 64 bit signed integer
sc_uint 1 to 64 bit unsigned integer
sc_bigint Arbitrary size signed integer
sc_biguint Arbitrary size unsigned integer
sc_bv Arbitrary sized 2 value vector
sc_lv Arbitrary sized 4 value vector
sc_fixed Signed fixed point
sc_ufixed Unsigned fixed point
sc_fix Untemplated signed fixed point
sc_ufix Untemplated unsigned fixed point
SystemC introduces a number of specific data types, which correspond to
data types that are frequently used in Hardware Description Languages
(HDLs). These types include sc_logic, a 4-valued single-bit representation that
can be high (1), low (0), undefined (X) or in a high-impedance (Z) state.
Integers of up to 64 bits are provided by sc_int and sc_uint, and integers of
arbitrary length by sc_bigint and sc_biguint. SystemC also supports vectors
with 2- or 4-valued logic through sc_bv and sc_lv. sc_fixed and sc_ufixed define
fixed point numbers whose characteristics are defined by template
parameters. sc_fix and sc_ufix use run-time arguments to define the fixed point
characteristics. This is interesting for trying out different quantization
settings without recompilation. However, these types cannot be used in
synthesis, while the others can.
Page 66
66
SystemC templated fixed-point
types
➢ Two fixed point templates
 sc_fixed <wl, iwl, q_mode, o_mode, n_bits> x; // signed
 sc_ufixed <wl, iwl, q_mode, o_mode, n_bits> y; // unsigned
➢ Parameters:
 wl = number of bits
 iwl = number of integer bits
 q_mode = quantization method (SC_RND / SC_TRN / SC_TRN_ZERO
/ ...)
 o_mode = overflow method (SC_SAT / SC_WRAP / … )
 n_bits = number of saturated bits in case of wrapping (default 0)
➢ If quantization and overflow not specified the defaults (SC_TRN and
SC_WRAP) are used
Two data types provide full flexibility in representing fixed point numbers with
static parameters: sc_fixed (signed, 2’s complement numbers) and sc_ufixed
(unsigned numbers). The template parameters of these fixed-point types carry the
information on the word lengths and the quantization and overflow behavior:
• wl is the total number of bits
• iwl represents the number of integer bits, i.e. the bits to the left of the binary point.
• q_mode specifies the quantization method to be rounding (SC_RND),
flooring (SC_TRN), or magnitude truncate (SC_TRN_ZERO). In
addition, some very particular, rarely used quantization modes are
specified.
• o_mode selects the overflow mode to be saturation (SC_SAT),
saturation to zero (SC_SAT_ZERO), symmetrical saturation
(SC_SAT_SYM), wrap-around (SC_WRAP), or sign-magnitude
wrapping (SC_WRAP_SM).
• n_bits specifies the number of saturated bits in case of wrapping. This
makes it possible to define some special wrapping methods that keep the sign
of the signal. By default, n_bits is set to 0.
Page 67
67
Fixed point lengths
sc_fixed <5, 7> v; X X X X X 0 0 [ -64 , 60 ]
sc_fixed <5, 3> v; X X X X X [ -4 , 3.75 ]
sc_fixed <5, -2> v; S S X X X X X [ -0.125 , 0.1171875 ]
Two of the arguments specified to the fixed point data type were word length
(wl) and integer word length (iwl). Word length must be greater than 0.
Integer word length can be positive or negative, and larger than the word
length.
For instance if the word length is specified as 5 bits but the integer word
length is 7 then two zeroes will be added to the end of the object.
If the integer word length is a negative value then sign bits after the binary
point will be extended. For instance if wl = 5 and iwl = -2 then two sign bits
will be added to the object. The sign bits are simply the most significant bit of
the 5 bit number. By extending the sign bits, the value of the number is
maintained.
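The range and precision of a signed fixed point format follow directly from wl and iwl: the LSB has weight 2^(iwl-wl) and the sign bit has weight -2^(iwl-1). The sketch below, with our own names FixedRange and signed_fixed_range, computes them for any combination, including an iwl that is negative or larger than wl.

```cpp
#include <cassert>
#include <cmath>

struct FixedRange {
    double min;  // most negative representable value
    double max;  // most positive representable value
    double lsb;  // weight of the least significant bit
};

// Signed 2's complement format with wl bits in total and iwl integer bits.
FixedRange signed_fixed_range(int wl, int iwl) {
    assert(wl > 0);
    const double lsb = std::ldexp(1.0, iwl - wl);  // 2^(iwl - wl)
    const double top = std::ldexp(1.0, iwl - 1);   // 2^(iwl - 1)
    return {-top, top - lsb, lsb};
}
```

For wl = 5 this gives [-64, 60] with a precision of 4 when iwl = 7, and [-4, 3.75] with a precision of 0.25 when iwl = 3.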
Page 68
68
Quantization methods
sc_ufixed <5, 3, SC_RND> v; range [ 0 , 7.75 ], precision = 0.25
v = 3.1875 (011.0011) is stored as 0 1 1 0 1 = 3.25, quantization error 0.0625
sc_ufixed <5, 3, SC_TRN> v;
v = 3.1875 (011.0011) is stored as 0 1 1 0 0 = 3.0, quantization error 0.1875
This slide shows an example that illustrates the difference between rounding
and flooring functionality. In this example, rounding results in a smaller
quantization error than flooring. In general, the rounding error is never the larger of the two.
Page 69
69
Overflow handling
sc_fixed <5, 5, SC_RND, SC_SAT> v; range [ -16 , 15 ]
v = 18; is stored as 0 1 1 1 1 = 15
sc_fixed <5, 5, SC_RND, SC_WRAP> v;
v = 18; is stored as 1 0 0 1 0 = -14
The slide shows an example with different overflow handling methods:
saturation and wrap-around for a two’s complement number. As can be seen,
these overflow methods generate very different outputs.
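The difference between the two overflow modes can be sketched on plain integers. The functions saturate and wrap below are hypothetical helpers written for this illustration; they model a wl bit two's complement integer and do not use SystemC.

```cpp
#include <cassert>

// Saturation: clamp to the representable range of a wl bit signed integer.
long saturate(long v, int wl) {
    assert(wl > 0 && wl < 31);
    const long hi = (1L << (wl - 1)) - 1;
    const long lo = -(1L << (wl - 1));
    return v > hi ? hi : (v < lo ? lo : v);
}

// Wrap-around: keep only the wl least significant bits (two's complement).
long wrap(long v, int wl) {
    assert(wl > 0 && wl < 31);
    const long span = 1L << wl;   // number of representable values
    const long half = span >> 1;
    const long m = ((v + half) % span + span) % span;  // in [0, span)
    return m - half;
}
```

For the slide example, saturate(18, 5) returns 15 and wrap(18, 5) returns -14.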
Page 70
70
Fixed-point simulation
➢operations in floating-point
➢quantization and overflow handling during assignment
sc_fixed <4,3> a;
sc_fixed <4,1> b;
sc_fixed <4,2> c;
a = 1.6;
b = 0.9;
c = a * b;
variable   lsb precision   assigned value         value after quantization (Q)
a          0.5             1.6                    1.5
b          0.125           0.9                    0.875
c          0.25            1.5 * 0.875 = 1.3125   1.25
When working with fixed-point arithmetic, it is vital to have an efficient
representation of values and an efficient simulation of operations. For this
purpose, all operations are performed with floating point arithmetic. The
quantization is performed only on assignment. If an intermediate result
needs to be quantized, an explicit assignment has to be used.
In the example above, the multiplication a*b is a floating-point operation
with two fixed point values as inputs. During the assignment to c, the
floating point result is automatically cast to the fixed point type of
variable c.
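The numbers on the slide can be checked with a small plain C++ model of "compute in floating point, quantize on assignment". This sketch assumes flooring to a multiple of the LSB on every assignment, which is consistent with the values on the slide, and it ignores overflow. The names assign_trn and slide_example are made up for the illustration.

```cpp
#include <cassert>
#include <cmath>

// Quantize x to a multiple of lsb by flooring, as done on assignment.
double assign_trn(double x, double lsb) {
    assert(lsb > 0.0);
    return std::floor(x / lsb) * lsb;
}

// Reproduces the slide: the product is formed at full precision and
// quantized only when it is assigned to c.
double slide_example() {
    const double a = assign_trn(1.6, 0.5);    // sc_fixed<4,3>: lsb 0.5
    const double b = assign_trn(0.9, 0.125);  // sc_fixed<4,1>: lsb 0.125
    return assign_trn(a * b, 0.25);           // sc_fixed<4,2>: lsb 0.25
}
```

Here a becomes 1.5, b becomes 0.875, the unquantized product is 1.3125, and c ends up as 1.25.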
Page 71
71
SystemC fixed point types with
non-static arguments
➢ Fixed point parameter values
 sc_fxtype_params my_type(wl,iwl,q_mode,o_mode,n_bits);
 x = my_type.wl();
 my_type.iwl(x-2);
➢ Two non-static fixed point types
 sc_fix x(my_type); // signed
 sc_ufix y(my_type); // unsigned
➢ For arrays, these types are used with a context
 sc_fxtype_context my_context(my_type);
 sc_fix z[64];
➢ Remark: for fixed point simulations, include in every file
 #define SC_INCLUDE_FX
 #include <systemc.h>
SystemC also allows fixed point types to be defined with non-static arguments:
sc_fix (signed, 2’s complement numbers) and sc_ufix (unsigned numbers).
Type sc_fxtype_params is used to configure the parameters of types sc_fix,
and sc_ufix. To set the parameters for these types declare an object of type
sc_fxtype_params, initialize the parameter values as desired, and pass the
sc_fxtype_params object as an argument to the sc_fix or sc_ufix
declarations.
The sc_fxtype_params object has the same arguments passed to an object
of type sc_fixed. These include:
• wl - word length
• iwl - integer word length
• q_mode - quantization mode
• o_mode - overflow mode
• n_bits - saturated bits
Any combination of arguments is allowed, but the order cannot be
changed. A variable of type sc_fxtype_params can be initialized by another
variable of type sc_fxtype_params. One variable of type sc_fxtype_params
can also be assigned to another.
Individual argument values can be read and written using methods with the
same name as the arguments shown above.
Page 72
72
Fixed point refinement
➢ Fixed word length optimization
 Overflow and quantization
 MSB determination
 LSB determination
➢ Fixed word length support in SystemC
➢ Exercise 2: fixed point refinement of IDCT
We now turn to the exercise, where we will perform fixed point refinement of
the IDCT operator in the JPEG decoder.
Page 73
73
Goal of this exercise
➢ Perform fixed point refinement for all the internal variables of
the IDCT in the JPEG example
➢ Determine the MSB to avoid internal overflows without overflow
logic.
➢ Determine the LSB to have no more than 0.5 dB degradation of
the PSNR of the resulting image
The goal of this exercise is to get familiar with fixed point refinement, by
practicing it on the IDCT block of the JPEG decoder. To this end, we will
determine the LSB and MSB value for every variable in the IDCT function.
By observing the overall behavior it will be possible to optimize the LSB and
MSB values. The MSB should be determined in such a way that overflow is
avoided without introducing overflow logic. The LSB should be determined
such that the degradation of the image quality, measured as peak signal to
noise ratio (PSNR), stays below 0.5 dB. The PSNR is defined as the ratio
between the maximum power of a signal and the power of the corrupting noise.
In our case the noise power is the mean squared error (MSE) between the
original and the decompressed image. The maximum power of the signal is
MAX^2, where MAX is the maximum grey value of a pixel.
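Expressed in decibel, this definition corresponds to PSNR = 10 log10(MAX^2 / MSE). A minimal plain C++ sketch of the computation is given below. The function name psnr_db and the default MAX of 255 (8 bit grey values) are assumptions for this illustration; the exercise code may compute the PSNR differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// PSNR in dB between an original and a decoded image, both given as
// flat arrays of grey values. max_value is the maximum grey value (MAX).
double psnr_db(const std::vector<int>& original,
               const std::vector<int>& decoded, int max_value = 255) {
    assert(!original.empty() && original.size() == decoded.size());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < original.size(); ++i) {
        const double d = static_cast<double>(original[i]) - decoded[i];
        sum_sq += d * d;
    }
    const double mse = sum_sq / static_cast<double>(original.size());
    if (mse == 0.0) {
        return std::numeric_limits<double>::infinity();  // identical images
    }
    const double max_power = static_cast<double>(max_value) * max_value;
    return 10.0 * std::log10(max_power / mse);
}
```

To check the 0.5 dB budget, one would compute the PSNR of the floating point reference and of the fixed point version against the same original image and compare the two values.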
Page 74
74
How to start?
➢ You find:
In .../exercises/exercise2/ : the functional model with a fixed point IDCT
implementation; types-file datatypes_original.txt
In /exercises/modules/: library of JPEG-encoder modules
{r2b,dct,quantize,zz_enc,rl_enc}.{h,cpp}, JPEG decoder modules
{b2r,idct,normalize,zz_dec}.{h,cpp} and testbench modules
{src,snk,test}.{h,cpp}
Special fixed point support functions of directory
…/exercises/add2systemc/ are used
In /exercises/images/: test images
➢ Things to do:
inspect the code to understand the behavior
Make the application
change datatypes.txt file
syntax: exercise2 -i <inputfile> -o <outputfile> -t <typefile>
Page 75
75
Model Based System Design
Class 3: Communication
Refinement
Marc Engels
e-mail: marc.engels@flandersmake.be
In this third class we will focus on the refinement of the communication
between the modules of the functional model. More in particular we will
explain how the FIFO communication channels can be replaced by protocols
on simple wires.
Page 76
76
Communication refinement
➢ Communication refinement
➢ Communication refinement in SystemC
➢ Exercise 3: communication refinement for
the JPEG decoder
This lecture on communication refinement consists of three parts:
• In the first part we introduce the concept of refining the inter process
FIFO communication into real protocols.
• Next, we review the support in SystemC for communication refinement.
• Finally we introduce the exercise to practice what we have learned.
Page 77
77
Communication refinement is one
of the steps in architectural design
[Figure: refinement from System Functionality to Dedicated Architecture along three axes]
Axis            System Functionality                 Refinement steps                                        Dedicated Architecture
Computations    operations                           resource allocation, scheduling                         operators
Data            variables, arrays, floating point    memory allocation, address generation, word sizing      memories, fixed point
Communication   point-to-point queues                bus allocation, introduce arbiters, include protocols   busses, detailed protocol
In the architectural design process that translates an algorithm into an
optimal architecture, communication refinement is an important step. The
algorithm will typically be specified as a functional model, such as data
flow. In this data flow model, the communication between processes is
performed via point-to-point queues. The architecture needs a model with
explicit protocols. In addition, signals could be multiplexed on a bus to
reduce the wiring overhead.
Page 78
78
Functional models use FIFO
communication
➢ Queues guarantee consistent data passing
➢ Implementation could become expensive for large sizes
➢ Communication must be optimized
[Figure: Process1 -> queue with (infinite) storage -> Process2]
A FIFO is a very robust structure because it guarantees correct processing
of the data independently from the processing times of the functions and
communication times. However, queues require a large amount of storage
and also some addressing hardware. A typical implementation, for instance,
would be a memory array with modulo addressing and a read and write
pointer. Because of this large implementation cost, the communication must
be optimized.
Page 79
79
Many communications can be
reduced to a single register
[Figure: Process1 -> wire -> Process2]
➢ Output of functions is registered
➢ No extra implementation cost
➢ No storage for data
➢ Consistency of communication needs to be guaranteed
Ideally, from an implementation point of view, a FIFO communication could
be reduced to a simple wire when the output signal is registered. This
requires no storage and no implementation cost for the addressing or
protocol. However, consistency of the communication must be guaranteed:
Process 2 should not use the data before it is generated and Process 1
should not produce new data before the previous has been read by Process
2.
Page 80
80
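Example of correct wired
communication
[Figure: two synchronous FSMs, Process 1 and Process 2, connected by a wire]
Process 1: filt1 -> filt2 -> filt3 -> filt4 with write() -> back to filt1
Process 2: w=0, then w++ while w<4; when w=4: read() with op1 -> op2 -> op3 -> op4 -> back to read() with op1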
To analyze the behavior of a wired connection, we represent the two
processes with a Synchronous Finite State Machine (FSM). In such a
Synchronous FSM the transitions take place on a clock edge. In our analysis
we assume that both processes are running on the same clock. Process 1
will perform a filtering operation in 4 cycles and will also write the data
to the register in the 4th cycle. Process 2 will initially wait for 4 cycles.
In the next cycle, it will read the data and perform a first operation,
followed by three more cycles of operation. This sequence is repeated
continuously.
Page 81
81
Communication is perfectly
aligned
Clock cycle   Process 1        Process 2
1             filt1            w=1
2             filt2            w=2
3             filt3            w=3
4             filt4 write()    w=4
5             filt1            read() op1
6             filt2            op2
7             filt3            op3
8             filt4 write()    op4
9             filt1            read() op1
10            filt2            op2
…             …                …
We have to guarantee the condition that every write()
comes before a read()
If we look at a timing diagram, we see that the timing is guaranteed. Every
read() happens after a write() of the signal. Also no data is lost.
Page 82
82
Small changes to design can
result in errors
➢ Increase (decrease) the number of operations in process 1 (2):
the same data will be consumed twice.
➢ Decrease (increase) the number of operations in process 1 (2):
data will be lost
➢ If the number of initial wait operations in process 2 is too low,
we will use undefined data
➢ If the number of initial wait operations in process 2 is too high,
we will lose the first data elements
However, small changes to the finite state machines of one of the two
processes can result in errors:
• If we increase the number of operations in process 1, process 2 will
consume too early and hence twice the same data is used.
• If we decrease the number of operations in process 2, the same
happens.
• If we decrease the number of operations in process 1, process 2 will be
relatively too slow and some data will be overwritten before it has been
used.
• Increasing the number of operations in process 2 will have the same
effect.
• Also remark that the number of initial wait operations in process 2
should not be too low or too high.
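These failure modes can be explored with a small cycle-based model of the wire. The sketch below is plain C++ and makes several simplifying assumptions: process 1 writes in the last cycle of every producer_period cycles, process 2 waits consumer_wait cycles and then reads once every consumer_period cycles, and a value written in one cycle is visible from the next cycle on. The names WireStats and simulate_wire are made up for this illustration.

```cpp
#include <cassert>

struct WireStats {
    int undefined_reads;  // reads before anything was written
    int duplicate_reads;  // reads of a value that was already consumed
    int lost_values;      // values overwritten before they were read
};

WireStats simulate_wire(int producer_period, int consumer_wait,
                        int consumer_period, int cycles) {
    assert(producer_period > 0 && consumer_period > 0 && consumer_wait >= 0);
    bool written = false;  // the register holds defined data
    bool fresh = false;    // the register holds data that was not read yet
    WireStats s = {0, 0, 0};
    for (int cycle = 1; cycle <= cycles; ++cycle) {
        // The reader sees the register content of the previous cycle,
        // so it is evaluated before the writer updates the register.
        const bool reads = cycle > consumer_wait &&
                           (cycle - consumer_wait - 1) % consumer_period == 0;
        if (reads) {
            if (!written) ++s.undefined_reads;
            else if (!fresh) ++s.duplicate_reads;
            fresh = false;
        }
        if (cycle % producer_period == 0) {  // write in the last cycle
            if (fresh) ++s.lost_values;
            written = true;
            fresh = true;
        }
    }
    return s;
}
```

With simulate_wire(4, 4, 4, 10) all counters stay at zero. With simulate_wire(4, 0, 2, 10), a receiver with only two states and no initial wait, the model reports two undefined reads and one duplicate read in the first ten cycles.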
Page 83
83
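Example of wrong wired
communication
[Figure: two synchronous FSMs, Process 1 and Process 2, connected by a wire]
Process 1: filt1 -> filt2 -> filt3 -> filt4 with write() -> back to filt1
Process 2: read() with op1 -> op2 -> back to read() with op1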
In the slide an example is shown where process 2 has only two states. As a
consequence it can be expected that the data produced by process 1 is
used multiple times. Because no initial wait operations are present in
process 2, we also expect that undefined data will be used.
Page 84
84
The example results in
undesired behavior
Clock cycle   Process 1        Process 2
1             filt1            read() op1   (?)
2             filt2            op2
3             filt3            read() op1   (?)
4             filt4 write()    op2
5             filt1            read() op1
6             filt2            op2
7             filt3            read() op1
8             filt4 write()    op2
9             filt1            read() op1
10            filt2            op2
…             …                …
(?) = read before the first write(): undefined data
Adapt cycle budget or introduce handshake protocol
The expected behavior is confirmed by the timing diagram. As can be seen
on the diagram, the first two data elements for process 2 will be undefined.
Next, the read() operation of process 2 will use every data element
produced by process 1 twice.
To guarantee correct behavior, two approaches exist:
• Adapt the cycle budget of process 2, for instance by introducing two
dummy cycles. However, this breaks the general approach of making
modules independent from the environment in which they operate.
• Introduce a handshake protocol that automatically synchronizes on the
data transfers. This is the most robust and reliable approach. On the
other hand, handshake protocols introduce some overhead and should
be performed on larger units.
Page 85
85
Simple handshake protocol is
more robust
➢ The flag “a” (ask) indicates that the receiver is ready to read
data in the next cycle.
➢ The flag “r” (ready) indicates that data has been written
➢ Safe communication requires at least two cycles.
Many different handshake protocols are feasible. Let’s illustrate the concept
with a very simple one with two handshake lines. The handshake line “a”
(ask) is generated by the receiver and indicates that the receiver is ready to
read in the next cycle. The handshake line “r” (ready) is generated by the
transmitter and indicates that it has written data in the cycle in which the
flag is raised. At least two cycles are needed for a reliable communication
of a value. Remark that this protocol is only suited for synchronous designs
where both processes are executed on the same clock.
Page 86
86
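Simple handshake protocol is
more robust
[Figure: FSMs of Process 1 and Process 2, connected by a data wire and the flags r and a]
Process 1: filt1 (r=0) -> filt2 -> filt3 -> last state: if (a==1) { filt4; write(); r=1 } and back to filt1; on !a stay with r=0
Process 2: wait state (a=1), stay on !r; on r: if (r==1) { read(); op1; a=0 } -> op2 (a=1) -> on r back to read(), on !r back to the wait state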
The finite state machines enhanced with the protocol operations (in red) are
shown in this picture. When “a” is set, process 2 waits for the “r” flag to be
raised. Next it reads the data, lowers “a”, performs its operations, and sets
“a” again for the next sequence. Process 1 performs its operations and next
waits for flag “a” before it writes its data and raises flag “r”. The basic
assumption of this protocol is that when data is written it is read in the next
cycle.
Page 87
87
… and effectively synchronizes
the communication
Clock cycle   Process 1              Process 2
1             r=0 filt1              a=1
2             r=0 filt2              a=1
3             r=0 filt3              a=1
4             r=1 filt4 write()      a=1
5             r=0 filt1              a=0 read() op1
6             r=0 filt2              a=1 op2
7             r=0 filt3              a=1
8             r=1 filt4 write()      a=1
9             r=0 filt1              a=0 read() op1
10            r=0 filt2              a=1 op2
…             …                      …
The timing diagram shows that the operations of the two processes are
automatically synchronized by this protocol.
Page 88
88
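… also when receiver is slower
than transmitter
[Figure: FSMs of Process 1 and Process 2, connected by a data wire and the flags r and a]
Process 1: filt1 (r=0) -> last state: if (a==1) { filt2; write(); r=1 } and back to filt1; on !a stay with r=0
Process 2: wait state (a=1), stay on !r; on r: if (r==1) { read(); op1; a=0 } -> op2 -> op3 (a=1) -> on r back to read(), on !r back to the wait state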
When we add a state in process 2 and reduce the number of states in
process 1 to two, we make the receiving process slower than the
transmitting one.
Page 89
89
… but introduces then one
extra wait cycle at receiver
Cycle   Process 1              Process 2
1       r=0 filt1              a=1
2       r=1 filt2 write()      a=1
3       r=0 filt1              a=0 read() op1
4       r=0                    a=0 op2
5       r=0                    a=1 op3
6       r=1 filt2 write()      a=1
7       r=0 filt1              a=0 read() op1
8       r=0                    a=0 op2
9       r=0                    a=1 op3
10      r=1 filt2 write()      a=1
…       …                      …
The extra wait cycle can be avoided by already putting a=1 during op2
Here too, the protocol synchronizes the two processes automatically.
However, after “op3” in process 2, an extra clock cycle is introduced
automatically. This is caused by the fact that process 1 has to observe that
“a” is raised before it can write the data and raise “r”. The extra cycle can be
avoided by raising “a” already during “op2”.
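Both timing diagrams can be reproduced with a small cycle-based model of the simple handshake protocol. The sketch below is plain C++, not SystemC. It assumes that a flag or data value written in one cycle becomes visible in the next cycle, and that the receiver performs at least two operations. The names Transfer and simulate_handshake are made up for this illustration.

```cpp
#include <cassert>
#include <vector>

struct Transfer {
    int cycle;  // clock cycle in which the receiver executed read()
    int value;  // sequence number of the value that was read
};

// producer_states: number of states of process 1 (write in the last one)
// consumer_ops:    number of operations of process 2 after a read (>= 2)
std::vector<Transfer> simulate_handshake(int producer_states,
                                         int consumer_ops, int cycles) {
    assert(producer_states >= 1 && consumer_ops >= 2);
    bool a = false, r = false;  // flag values visible in the current cycle
    int data = 0;               // register value visible in the current cycle
    int p_state = 0, c_state = 0, next_value = 1;
    std::vector<Transfer> received;
    for (int cycle = 1; cycle <= cycles; ++cycle) {
        bool a_next = true, r_next = false;
        int data_next = data;
        // Process 1: filter states, then write once "a" is observed.
        if (p_state < producer_states - 1) {
            ++p_state;
        } else if (a) {
            data_next = next_value++;
            r_next = true;
            p_state = 0;
        }
        // Process 2: wait for "r", read with op1, then the other operations.
        if (c_state == 0) {
            if (r) {
                received.push_back({cycle, data});
                a_next = false;
                c_state = 1;
            }
        } else {
            a_next = (c_state == consumer_ops - 1);  // raise "a" in last op
            c_state = (c_state + 1) % consumer_ops;
        }
        a = a_next; r = r_next; data = data_next;  // clock edge
    }
    return received;
}
```

In this model, simulate_handshake(4, 2, 13) reads the values 1, 2, 3 in cycles 5, 9 and 13, and simulate_handshake(2, 3, 11) reads them in cycles 3, 7 and 11. In both cases every value is read exactly once.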
Page 90
90
Most general protocol: 4-phase
handshake protocol
[Figure: asynchronous FSMs of the two communicating sides, with transitions guarded by Req and Ack]
State actions that can be recovered from the figure: Req=1, Execute, Put Data, Ack=1, Get Data, Req=0, Ack=0
The simple handshake protocol of previous slides is just one of the many
possibilities. The most general protocol is the 4-phase handshake protocol
that can synchronize two systems, independent of a clock signal. The 4
phase handshake protocol consists of 4 phases:
1. Initially, both request (Req) and acknowledgement (Ack) signals
are low.
2. Next, the Req signal is raised and the operation is executed.
3. After the execution of the operation, the Ack signal is raised. Here
starts the third phase.
4. When the Ack signal is detected, the Req signal is turned off. This
phase continues until the low Req signal is detected and the Ack
signal is turned off.
The picture on the slide shows the asynchronous FSM for the four-phase
handshake protocol. In an asynchronous FSM the transitions are not
clocked and happen as soon as the guard statement is valid.
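The four phases can be traced with a very small model in which each side reacts only to the current levels of Req and Ack. This plain C++ sketch leaves out the data transfer and the execution of the operation. The names Lines and four_phase_transfer are made up for this illustration.

```cpp
#include <cassert>
#include <vector>

struct Lines {
    bool req;  // driven by the requesting side
    bool ack;  // driven by the acknowledging side
};

// Returns the sequence of (Req, Ack) levels during one complete transfer.
std::vector<Lines> four_phase_transfer() {
    Lines l = {false, false};       // phase 1: both lines are low
    std::vector<Lines> trace(1, l);
    for (int step = 0; step < 4; ++step) {
        if (!l.req && !l.ack)      l.req = true;   // request is raised
        else if (l.req && !l.ack)  l.ack = true;   // operation done
        else if (l.req && l.ack)   l.req = false;  // Ack was detected
        else                       l.ack = false;  // low Req was detected
        trace.push_back(l);
    }
    return trace;
}
```

The trace is (0,0), (1,0), (1,1), (0,1), (0,0). Only one line changes per step, which is why the protocol does not depend on a common clock.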
Page 91
91
Multiple variations on these
handshake protocols exist
➢ Instead of signal levels, the protocol can be based on signal
transitions.
➢ The protocol can be simplified if both systems run on the same
clock.
➢ Protocols can be simplified if one knows that the receiver or
the transmitter is fastest.
➢ Synchronization can be performed on the basis of a block:
 Set-up communication for first element of a block
 Next, communicate every cycle
➢ Some protocols are based on typical FIFO signals: full and
empty.
Besides the 4-phase handshake protocol, many other protocols exist.
For example a protocol can be constructed that is based on signal
transitions rather than signal levels.
Handshake protocols can also be simplified when both systems run on the
same clock, or when it is known whether the receiver or the transmitter is
the fastest.
Also, the efficiency of the communication can be improved by block based
handshake protocols. In such a protocol, the communication is set-up for the
first element of the block. Next, a data element is communicated every cycle.
There also exists a set of protocols based on typical FIFO signals.
Page 92
92
In some cases buffered
communication is required
[Figure: process1 and process2 connected by two queues, Q1 and Q2]
Cycle   process1      process2
1       write(Q1)
2       write(Q1)
3       write(Q2)
4                     read(Q2)
5                     read(Q1)
6                     read(Q1)
Queue size can be determined by monitoring the maximum
number of elements in a queue during simulation.
The replacement of the FIFO by protocols is only possible if no intermediate
storage is needed. This is not always the case. For example, the system on
the slide needs storage for at least two data elements on queue 1. In
most cases, the amount of required storage can be derived from the
maximum number of elements in a queue during functional simulations.
Also remark that changing the order in which data is produced in process 1
or consumed in process 2 will change the storage requirements.
Another option is to integrate the required storage in one of the two
processes and match the production and consumption sequences.
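Monitoring the maximum queue occupancy is easy to sketch. The plain C++ fragment below replays the schedule of the slide as a list of events and records the peak number of elements per queue. The names QueueEvent, max_occupancy and slide_schedule are made up for this illustration; in a real functional simulation the same bookkeeping would be done inside the queue model.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct QueueEvent {
    int cycle;
    std::string queue;
    int delta;  // +1 for a write, -1 for a read
};

// Peak occupancy per queue; the events are assumed to be sorted by cycle.
std::map<std::string, int> max_occupancy(const std::vector<QueueEvent>& events) {
    std::map<std::string, int> level, peak;
    for (const QueueEvent& e : events) {
        level[e.queue] += e.delta;
        assert(level[e.queue] >= 0);  // a read from an empty queue is an error
        if (level[e.queue] > peak[e.queue]) peak[e.queue] = level[e.queue];
    }
    return peak;
}

// The schedule of the slide: two writes to Q1, one to Q2, then the reads.
std::vector<QueueEvent> slide_schedule() {
    return {{1, "Q1", +1}, {2, "Q1", +1}, {3, "Q2", +1},
            {4, "Q2", -1}, {5, "Q1", -1}, {6, "Q1", -1}};
}
```

For this schedule the peak is 2 for Q1 and 1 for Q2, so Q1 needs storage for two elements.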
Page 93
93
Queues must be introduced
explicitly in hardware
[Figure: Process1 -> FIFO process (size N, with fsm) -> Process2; both connections use the wired handshake protocol with flags r and a]
If intermediate storage is needed, a FIFO must be explicitly introduced in
hardware. A FIFO will be a module with storage, a finite state machine, and
communication protocols for the producing and consuming processes.
The FIFO structure can be defined once and then reused in many designs.
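The storage part of such a FIFO is typically a memory array with modulo addressing and a read and a write pointer, as mentioned earlier. The plain C++ class below is a minimal sketch of that structure; the class name RingFifo is made up, and the handshake protocols and the clocked FSM of a hardware FIFO are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// FIFO of fixed size N built from a memory array with modulo addressing.
template <class T>
class RingFifo {
public:
    explicit RingFifo(std::size_t size)
        : mem_(size), rd_(0), wr_(0), count_(0) { assert(size > 0); }
    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == mem_.size(); }
    // Returns false (and stores nothing) when the FIFO is full.
    bool push(const T& v) {
        if (full()) return false;
        mem_[wr_] = v;
        wr_ = (wr_ + 1) % mem_.size();  // modulo addressing
        ++count_;
        return true;
    }
    // Returns false (and leaves v untouched) when the FIFO is empty.
    bool pop(T& v) {
        if (empty()) return false;
        v = mem_[rd_];
        rd_ = (rd_ + 1) % mem_.size();
        --count_;
        return true;
    }
private:
    std::vector<T> mem_;
    std::size_t rd_, wr_, count_;
};
```

The full() and empty() functions correspond to the typical FIFO signals on which some handshake protocols are based.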
Page 94
94
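Several communications can
also be multiplexed on a bus
[Figure: on the left, two point-to-point connections (Process1 to Process2 and Process3 to Process4); on the right, Process1 to Process4 share one bus controlled by an arbiter, and every process is connected through r and a flags]
Bus and arbiter classes
can be reused!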
Up to now, we have considered point-to-point communications. Each
channel in the functional model is then mapped to a physical channel in the
hardware.
However, when this communication structure becomes complicated, it might
be advantageous to multiplex multiple communications on a bus structure.
Communication with off-chip devices might also take advantage of a bus
structure because of the limited number of available pins.
The bus can be modeled as a set of multiplexers. To decide when a module
is allowed to communicate on this bus, an arbiter is needed. The arbiter
communicates with the processes through handshake protocols. If we reuse
our simple protocol, the arbiter would react to the ask signals from the
receiving processes, reserve the bus, and forward the ask signal to the
sending process when the bus is free for data transfer.
The bus and arbiter are modules that can be designed once and reused in
multiple designs.
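The decision logic of an arbiter can be sketched in a few lines. The slides do not prescribe an arbitration policy, so the round-robin policy below is an assumption chosen for illustration, and the class name RoundRobinArbiter is made up. The sketch is plain C++ and only selects which ask signal is served; forwarding the signals over the bus is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Selects one of the pending ask signals per bus transfer (round robin).
class RoundRobinArbiter {
public:
    explicit RoundRobinArbiter(std::size_t n) : n_(n), last_(n - 1) {
        assert(n > 0);
    }
    // ask[i] is true when receiver i asks for data.
    // Returns the index of the granted receiver, or -1 if nobody asks.
    int grant(const std::vector<bool>& ask) {
        assert(ask.size() == n_);
        for (std::size_t k = 1; k <= n_; ++k) {
            const std::size_t i = (last_ + k) % n_;  // start after last grant
            if (ask[i]) {
                last_ = i;
                return static_cast<int>(i);
            }
        }
        return -1;
    }
private:
    std::size_t n_;
    std::size_t last_;  // index that received the previous grant
};
```

With two receivers that ask continuously, the grants alternate between 0 and 1, so neither communication can starve the other.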
Page 95
95
Communication refinement
results in behavioral model
➢ Model that defines the relative ordering of input and outputs
➢ A clock signal is used for ordering
➢ Pins are accurate to the final implementation
➢ Internal resources are not mapped on clock cycles
(scheduling) and functional units (resource binding)
After communication refinement of a functional model, we obtain a
behavioral model. A behavioral model defines the functionality and also the
relative ordering of inputs and outputs. To perform this ordering, a clock
signal is used. Also, the pins of a module are identical to the final
implementation. On the other hand, the internal operations are functionally
modeled. They are not mapped on clock cycles and no functional units are
allocated.
Increasingly, synthesis tools are moving up from register transfer level
(RTL) synthesis toward behavioral synthesis. In the latter, the synthesis tool
autonomously decides on the number and types of functional units and
schedules the operations on these functional units.
Page 96
96
Communication refinement
➢ Communication refinement
➢ Communication refinement in SystemC
➢ Exercise 3: communication refinement for
the JPEG decoder
We now take a look at the support for communication refinement in SystemC
Page 97
97
In SystemC behavioral models
use (clocked) threads
➢ Modeled with thread processes SC_THREAD or with clocked
thread processes SC_CTHREAD
➢ Every module has a clock input:
 sc_in_clk clk;
➢ The SC_THREAD process is made statically sensitive to a clock edge
 sensitive << clk.pos();
➢ To separate clock cycles wait() statements are used.
➢ A synchronous or asynchronous reset signal can be specified:
 reset_signal_is(reset, true);
 async_reset_signal_is(reset, true);
➢ Simulation must be run for a finite time (or will not stop!) or halted
explicitly.
Representing behavioral models in SystemC is straightforward. The
processes are represented with (clocked) thread processes (SC_CTHREAD
or SC_THREAD). To order the inputs and outputs, every module has a clock
input. In the case of an SC_THREAD process, it must be made statically
sensitive to this clock.
To separate clock cycles, wait() statements will be used in the SC_THREAD
or SC_CTHREAD process.
It is possible to assign a synchronous reset signal to the thread processes.
In the case that the reset signal is active at a clock event, the current
process will be stopped, and called again from the start of the function. Also
an asynchronous reset is supported.
Remark that because of the introduction of the clock we cannot run until the
end of activity (this would never stop). Therefore we must run the simulation
for a finite time or halt it explicitly.
Page 98
98
Behavioral models communi-
cate via standard signals
➢ All input and outputs are standard signals
➢ Define signals with:
 sc_signal<T> a;
➢ Predefined ports for sc_signal<T> channels:
 sc_in<T> with interface function read() or assignment operator.
 sc_out<T> with interface function write() or assignment operator.
 sc_inout<T> that combines both interface functions.
Standard signals are used to communicate between behavioral processes. A
signal can only be written from one process.
For the sc_signal<T> channel, three ports are predefined:
 sc_in <T> is essentially equivalent to sc_port<sc_signal_in_if <T> >
 sc_inout <T> is essentially equivalent to sc_port<sc_signal_inout_if
<T> >
 sc_out <T> is identical to sc_inout<T>
The write() operation on a signal overwrites the present value. The read()
operation reads the current value. The assignment operators are also
available for signals. Each of these three ports must be bound to exactly
one signal.
Page 99
99
Clocks in SystemC
➢ Create clock
 sc_clock clock1 ( "clock_label", period, time_unit, duty_ratio, offset, offset_time_unit, first_value );
 sc_clock clock2 ( "clock_label", period, time_unit, duty_ratio);
 sc_clock clock3 ( "clock_label", period, time_unit);
➢ Clock Binding
• f1.clk( clock1 );
➢ Clocks are typically defined in sc_main();
➢ Example
sc_clock clock1 ("clock1", 20, SC_NS, 0.5, 2, SC_NS, true);
[Waveform: first rising edge at 2 ns, clock edges at 2, 12, 22, 32, 42 ns]
Finally we need also a clock in a behavioral model. SystemC offers special
clock functions, where you can choose the period, duty ratio, initial offset
and first value. An example is shown on the slide.
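The edge times of the example follow from the period, the duty ratio and the offset. The plain C++ sketch below computes them without SystemC. It assumes that the first edge is a rising edge, and the function name clock_edges is made up for this illustration.

```cpp
#include <cassert>
#include <vector>

// Times of the first 'count' clock edges (rising, falling, rising, ...).
std::vector<double> clock_edges(double period, double duty, double offset,
                                int count) {
    assert(period > 0.0 && duty > 0.0 && duty < 1.0 && count >= 0);
    std::vector<double> edges;
    for (int i = 0; i < count; ++i) {
        const double cycle_start = offset + (i / 2) * period;  // rising edge
        edges.push_back(i % 2 == 0 ? cycle_start
                                   : cycle_start + duty * period);
    }
    return edges;
}
```

For a period of 20 ns, a duty ratio of 0.5 and an offset of 2 ns, clock_edges(20, 0.5, 2, 5) yields 2, 12, 22, 32 and 42, the values shown on the slide.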
Page 100
100
Example: summing 3 values on
an input
SC_MODULE(sum3) {
  sc_in_clk CLOCK;
  sc_in<bool> RESET;
  sc_in<unsigned> A;
  sc_out<unsigned> D;
  void compute();
  SC_CTOR(sum3) {
    SC_CTHREAD(compute, CLOCK.pos());
    reset_signal_is(RESET, true);  // synchronous reset, active high
  }
};
void sum3::compute() {
  unsigned tmp;
  // reset section
  while (true) {       // main loop
    tmp = A.read();
    wait();            // end first I/O cycle
    tmp += A.read();
    wait();            // end second I/O cycle
    tmp += A.read();
    D.write(tmp);
    wait();            // end third I/O cycle
  }
}
On the slide an example is shown where three values are read in
sequentially and summed. The resulting sum is put on the output.
The example is modeled with a clocked thread. It could also be implemented
with a thread process.
Page 101
101
Gradual Communication
refinement (1/2)
Process1 Process2
queue
Process1 Process2C1 C2
r a
Behavioral_process1 Behavioral_process2
clock
Converters
Q1 Q2
To replace the queues it is advocated to follow a gradual approach. First,
converters (between sc_fifo and protocol) are introduced between the
processes.
Page 102
102
Gradual Communication
refinement (2/2)
Process1 Behavioral
Process2
C1
r a
Behavioral_process1
clock
Q1
Behavioral
Process2r a
clock
Behavioral
Process1
Next the protocol can be integrated in each process separately.
At each moment the correct operation of the system can be validated
through simulations.
Page 103
103
Converter SystemC code
// FIFO to protocol converter
template <class T> SC_MODULE(FF2P) {
  sc_fifo_in<T> input;
  sc_out<T> output;
  sc_in<bool> ask;
  sc_out<bool> ready;
  sc_in_clk clk;
  SC_CTOR(FF2P) {
    SC_THREAD(process);
    sensitive << clk.pos();
  }
  void process() {
    T value;
    enum ctrl_state {READINPUT, WRITEOUTPUT};
    ctrl_state state;
    // reset cycle
    ready.write(false); state = READINPUT; wait();
    while (true) {
      if (state == READINPUT) {
        // blocking read from the FIFO
        ready.write(false); value = input.read();
        state = WRITEOUTPUT;
      } else {
        if (ask.read() == true) {
          // receiver is ready: put the data on the wire and raise ready
          output.write(value); ready.write(true);
          state = READINPUT;
        } else {
          ready.write(false); state = WRITEOUTPUT;
        }
      }
      wait();
    }
  }
};
// Protocol to FIFO converter
template <class T> SC_MODULE(P2FF) {
  sc_fifo_out<T> output;
  sc_in<T> input;
  sc_in<bool> ready;
  sc_out<bool> ask;
  sc_in_clk clk;
  SC_CTOR(P2FF) {
    SC_THREAD(process);
    sensitive << clk.pos();
  }
  void process() {
    T value;
    enum ctrl_state {READINPUT, WRITEOUTPUT};
    ctrl_state state;
    // reset cycle
    ask.write(true); state = READINPUT; wait();
    while (true) {
      if (state == READINPUT) {
        if (ready.read() == true) {
          // data is available: read it, lower ask, blocking write to the FIFO
          value = input.read(); ask.write(false);
          output.write(value); state = WRITEOUTPUT;
        } else {
          ask.write(true); state = READINPUT;
        }
      } else {
        ask.write(true); state = READINPUT;
      }
      wait();
    }
  }
};
On the slide we show an example of the converters that translate between an
sc_fifo and the simple synchronization protocol, in both directions.
Page 104
104
Communication refinement
➢ Communication refinement
➢ Communication refinement in SystemC
➢ Exercise 3: communication refinement for
the JPEG decoder
The exercise is intended to get you familiar with communication refinement.
We turn again to the simplified JPEG decoder.
Page 105
105
Exercise 3: communication
refinement for the JPEG encoder
➢ Goal: Replace the FIFO between the run-length encoder and decoder by
a handshake protocol
➢ You will find:
 In /exercises/exercise3/ : solution of exercise2
 In /exercises/modules/: JPEG-encoder modules
{r2b,dct,quantize,zz_enc,rl_enc}.{h,cpp}, JPEG decoder modules
{b2r,idct,normalize,zz_dec}.{h,cpp} and test bench modules
{src,snk,test}.{h,cpp}
 In /exercises/images/: test images
 In /exercises/add2systemc: FIFO to protocol conversion functions
{FF2P, P2FF}.h
➢ Things to be done:
 Introduce a handshake protocol between rl_enc and rl_dec.
 Introduce refined versions of rl_dec in jpeg_dec.h and main.cpp.
 Simulate and verify correct operation.
The goal of this exercise is to replace the FIFO channel between the run-
length encoder and decoder by a handshake protocol. To this end, we first
add converters between the two blocks to obtain a behavioral model. Next,
we integrate the protocol functionality in the run-length decoder process,
integrate the resulting behavioral model in the application, simulate the
system, and verify correct operation.
Page 106
In this 4th class we focus on the refinement of the computations, resulting in
an RTL description of the circuit. This model should be synthesizable with an
RTL synthesis tool.
Model Based System
Design
Class 4: computation
refinement
Marc Engels
e-mail:
marc.engels@flandersmake.be
Page 107
The class consists of three parts:
First, we describe the conceptual steps to transform from a behavioral
into an RTL description of the circuit.
Next we introduce the constructs that are available in SystemC to
support this RTL modeling.
Finally we exercise the new knowledge on the JPEG decoder.
107
Computation refinement in
SystemC
➢ Computation refinement
➢ Computation refinement in SystemC
➢ Exercise 4: computation refinement of a JPEG decoder
Page 108
Next to fixed point and communication refinement, computation refinement is
an important step in architectural design (from functional model towards RTL
model). Remark that the order in which these three steps are performed is
not defined. Refinements along these three axes can even be intermixed.
There also exist interdependences between these operations. For instance if
two operations share a common operator they will use the same word size.
108
RTL refinement is the 3rd step in
architectural design
[Figure: refinement from System Functionality to System Architecture along three axes]
Axis            System Functionality                 Refinement steps                                        System Architecture
Computations    operations                           resource allocation, scheduling                         operators
Data            variables, arrays, floating point    memory allocation, address generation, word sizing      memories, fixed point
Communication   point-to-point queues                bus allocation, introduce arbiters, include protocols   busses, detailed protocol
Page 109
109
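For synthesis all blocks need
to be transformed to RTL
➢ Transformation is a gradual refinement process
 switch a behavioral block with an RTL block
 verify by system simulation
[Figure: a TESTBENCH around the SYSTEM with subsystems S1, S2 and S3; block func1 is functional, the behavioral blocks beh2, beh3 and beh4 are paired with their RTL replacements RTL2, RTL3 and RTL4]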
At the start of the computation refinement the embedded system is modeled
with behavioral blocks, where both the data types and communications are
refined. The test bench is not evolved and is still the original functional
model.
The RTL modeling can be introduced gradually by replacing individual
behavioral blocks with RTL descriptions. The correctness of the system can
be verified during this process by simulating the combination of functional,
behavioral, and RTL models.
Page 110
Behavioral models are represented as threads which wait on clock edges to
synchronize their inputs and outputs (IO).
As a consequence, they can be represented by a clocked finite state
machine (FSM). In the slide a Moore-type state machine, whose outputs are
only determined by the state, is used.
110
Behavioral model can be
represented by an FSM
void process_behavioral() { // SC_CTHREAD
  ask.write(true);
  while (ready.read() == false) { wait(); }
  wait();
  while (true) {
    ask.write(false);
    x = input.read();
    wait();
    d = x * b1;
    y = d * b2;
    output.write(y);
    ask.write(true);
    while (ready.read() == false) { wait(); }
    wait();
  }
}
[Figure: equivalent Moore FSM with the states (ask=1), (ask=0; x=input) and (ask=1; d = x * b1; y = d * b2; output = y); the transitions are guarded by ready and !ready]
Page 111
111
Behavioral to RTL: scheduling of
operations in FSM
[Figure: the FSM before and after scheduling; the transitions are guarded by ready and !ready]
Before: states (ask=1), (ask=0; x=input), (ask=1; d = x * b1; y = d * b2; output = y)
After:  states (ask=1), (ask=0; x=input; d=x*b1), (ask=1; y = d * b2; output = y)
The transformation from behavioral to RTL can conceptually be represented
by the scheduling of operations on this FSM. In this scheduling activity
additional states can be introduced.
Remark also that the scheduling of the operations can have major impact on
the inter-process communication:
• Additional states can introduce errors in synchronized communication.
• Protocol based communication is more robust but the settings of the
protocol signals might have to be adapted
Separation of operator scheduling and communication refinement is a desire
in many design flows but is rarely achieved completely.
Page 112
112
Rescheduled FSM is
represented in RTL code
[Figure: rescheduled FSM with the states (ask=1), (ask=0; x=input; d=x*b1) and (ask=1; y = d * b2; output = y); the transitions are guarded by ready and !ready]
void process_RTL() { // SC_CTHREAD
  ask.write(true);
  while (ready.read() == false) { wait(); }
  wait();
  while (true) {
    ask.write(false);
    x = input.read();
    d = x * b1;
    wait();
    ask.write(true);
    y = d * b2;
    output.write(y);
    while (ready.read() == false) { wait(); }
    wait();
  }
}
The resulting FSM can be transformed back into code. The resulting RTL
model can be represented with either an SC_METHOD or an SC_CTHREAD.
Both can be synthesized into gate-level circuits. For simplicity, we will use
SC_CTHREADs.
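To give an idea of what the SC_METHOD alternative implies, the sketch below writes the rescheduled FSM in the style such a description typically takes: an explicit state register plus a function that computes the next register values. It is plain C++ rather than SystemC, and RtlState, Regs, next_regs and simulate are hypothetical names chosen for this illustration. The function next_regs plays the role of the combinational logic, and the assignment in simulate plays the role of the register update at the clock edge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical state encoding: one state per wait() in the RTL thread.
enum class RtlState { Start, WaitReady, Sync, Compute };

// All registers of the design (illustrative names).
struct Regs {
    RtlState state = RtlState::Start;
    int x = 0, d = 0;
    bool ask = false;
    int output = 0;
};

// Combinational logic: next register values from current registers and inputs.
Regs next_regs(const Regs& cur, bool ready, int input, int b1, int b2) {
    Regs nxt = cur;
    switch (cur.state) {
    case RtlState::Start:
        nxt.ask = true;
        nxt.state = ready ? RtlState::Sync : RtlState::WaitReady;
        break;
    case RtlState::WaitReady:
        nxt.state = ready ? RtlState::Sync : RtlState::WaitReady;
        break;
    case RtlState::Sync:
        nxt.ask = false;
        nxt.x = input;
        nxt.d = input * b1;        // first multiplier, scheduled in this state
        nxt.state = RtlState::Compute;
        break;
    case RtlState::Compute:
        nxt.ask = true;
        nxt.output = cur.d * b2;   // second multiplier, uses the registered d
        nxt.state = ready ? RtlState::Sync : RtlState::WaitReady;
        break;
    }
    return nxt;
}

// Register update: one assignment per active clock edge.
Regs simulate(const std::vector<bool>& ready, const std::vector<int>& input,
              int b1, int b2) {
    assert(ready.size() == input.size());
    Regs regs;
    for (std::size_t i = 0; i < ready.size(); ++i) {
        regs = next_regs(regs, ready[i], input[i], b1, b2);
    }
    return regs;
}
```

In this form each state contains at most one multiplication, which is the effect of the rescheduling shown on the slide. The SC_CTHREAD version expresses the same machine more compactly because the state register is implied by the position of the wait() calls.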
Digital Design With Systemc (with notes)

Digital Design With Systemc (with notes)

  • 1. Page 1 Specification Languages: Part 2 Marc Engels e-mail: marc.engels@flandersmake.be Welcome to the second part of the course on specification languages.
  • 2. Page 2 2 Specification Languages ➢ Part 1: Specification Models ➢ Part 2: Model based system design  Show how the models of part 1 can be used for architectural design Provide hands-on experience with SystemC v2.3.2 (released in October 2017). Introduce OO techniques for design of hardware systems ➢ Part 3: Project The course on specification languages consists of 3 parts:  First, an extensive overview was given of various specification models, ranging from dataflow to finite state machines.  In this second part, I will focus on the use of a subset of these models for the architectural design of digital embedded systems. The main goal of this part of the course is to learn how the specification models of part 1 can be used for the architectural design of embedded systems. For this purpose, we will rely on SystemC version 2.3.2, which was standardized by the IEEE in January 2012 (IEEE 1666-2011 language reference manual) and for which the simulation library was released in April 2014. SystemC is a class library on top of C++. As such, all object oriented (OO) constructs of C++ can be used in the design of an architecture. These OO techniques can bring the same benefits with respect to re-use to architectural design as that they have brought to software design.  Finally, you will apply the acquired skills in a small, but realistic, project.
  • 3. Page 3 3 Course Material for part 2 ➢ Prerequisite:  part 1 of specification languages  C++ (good tutorial at www.cplusplus.com)  Coding and debugging programs  RTL description of synchronous digital circuits ➢ Material for part 2:  Slides with notes.  IEEE Standard SystemC Language Reference Manual, IEEE Std 1666-2011. As prerequisites for this course, I expect the following:  Quite obvious you should have a good understanding of the first part of this course, and particularly the presented models.  Next, as SystemC is based on C++, also a decent knowledge of this programming language is required. Basic OO concepts like classes, inheritance and templates should be familiar to you. If not, review the C++ tutorial at www.cplusplus.com.  In general, a structured methodology for developing and debugging programs is essential for executing the exercises and the project. Familiarity with Integrated Design Environments (IDE) like Eclipse is a benefit.  When writing SystemC code, you should be able to describe the hardware that will be generated from this code. Therefore a basic knowledge of register transfer level (RTL) description of synchronous digital circuits is necessary. An RTL description of a circuit consists of registers (e.g. D flip-flops) and combinatorial logic. The registers synchronize the operation of the circuit to the clock signal while the combinatorial logic describes the calculations performed by the circuit. RTL descriptions are used in hardware description languages like Verilog or VHDL. For part 2, the following material is available: (1) The slides with notes can be found on the icorsi (icorsi.ch). (2) The SystemC language reference manual, which can be downloaded from the IEEE standards website (http://standards.ieee.org/getieee/1666/download/1666-2011.pdf)
  • 4. Page 4 Model Based System Design Class 1: constructing a functional model Marc Engels e-mail: marc.engels@flandersmake.be In this first class we will focus on the functional modeling of a digital embedded system. A functional model will describe the functionality of the embedded system, independent of the platform or architecture on which this functionality is executed. Therefore it is sometimes called a platform independent model (PIM). In this class we will focus on the data flow modeling paradigm for describing the functional model. At the end of the class, you will be able to program a functional model of a digital embedded system in SystemC.
  • 5. Page 5 5 Functional modeling in SystemC ➢ Introduction to design of digital embedded systems ➢ SystemC introduction ➢ SystemC functional model syntax ➢ Exercise 1: building a functional model in SystemC This class covers 4 topics: (1) A general introduction to the design of digital embedded systems (2) The role of SystemC in the design of digital embedded systems (3) The syntax of the SystemC language for functional modeling (with the dataflow paradigm) (4) And finally an exercise to build a functional model in SystemC Lets start with the general introduction.
  • 6. Page 6 6 Consumer devices become increasingly more intelligent Consumer as well as professional equipment is becoming increasingly smarter. A few examples:  Your car is being converted into a multimedia theater. The value of the electronics in a car has increased consistently, resulting in almost 100 electronic units in a luxury model. Recently a lot of new safety functions (ABS, ESP, parking sensors, anti-collision systems, etc.) have been introduced.  It is hard to find a mobile phone with which you can only make a call. Taking pictures, playing music, surfing the web, reading e-mail, etc. are also features of a state-of-the-art mobile phone. Most phones even have GPS functionality and run office software.  Gaming becomes more interactive (e.g. Nintendo Wii, Microsoft Kinect) and mobile.  Photography has dramatically changed over the last decade: it has become fully digital. Digital cameras are currently extended with features like wireless connections, automatic picture enhancements (e.g. red eye correction), etc.  The era of service robots is coming. Robots to vacuum clean the house, mown the lawn in the garden, etc. are already on the market.
  • 7. Page 7 7 … as well as professional equipment The evolution towards smart products is not limited to consumer devices. We observe, for instance, the same trends in production machines.  Harvesters have a growing number of functions for quality control, obstacle detection and precision farming. To realize these smart functions, the electronic control units become increasingly more complex. Especially the software content is growing very fast (20% average growth per year). The long term vision for combine harvesters is to evolve towards full autonomous machines, that can work without any operator on board and just receive a command of the job to be done. Many more smart functionalities will be needed to reach this goal.  In compressors functions are introduced to optimize the energy consumption based on the instantaneous demand of air.  Weaving looms can adapt their speed to the quality (strength) of the textile fibers.  Professional washing machines automatically detect the load, hardness of the water, etc. and adapt their washing program.
  • 8. Page 8 8 Characteristics of embedded systems ➢ Optimize for power, cost, and size ➢ Robust design ➢ Provide the ability for evolution and mass customization ➢ Minimize time to market ➢ Some functionality might be safety-critical ➢ Interfacing with the real world, leading to real time constraints To realize this smart functionalities, electronic systems and software have to be embedded in consumer and professional devices. Such embedded systems are minimizing power, cost and size, and hence work on a minimal platform. For instance, 8-bit and 16-bit processors are still extensively used in embedded devices. They must be robust. For instance, a mobile phone must survive rude treatment. A car has an operation life of 7000 hours and some machines are expected to work up to 100000 hours. Over their lifetime, products are increasingly expected to evolve. Also more variants are designed from the same platform. A typical example is the customization of the mobile phones. And the product needs to be on the market before the Christmas shopping. In many cases the system has even safety-critical functionality, think about automatic braking system (ABS) or emergency buttons, which require a guarantee on the reliability of the system. For the development of such safety-critical functions, specific standards have to be followed. The main distinctive characteristic of an embedded system, however is that it has to interact with the real world, necessitating real-time behavior.
  • 9. Page 9 9 Embedded systems combine various types of real-time behavior [Figure: a real-world process is observed by sensors, followed by signal conditioning and an ADC, and is influenced through a DAC, actuator powering and actuators; the processing block also receives events from the user and responds with actions] A system is said to be real-time if the correctness of an operation depends not only upon its logical correctness, but also upon the time in which it is performed. In a hard real-time system, the completion of an operation after its deadline is considered useless; ultimately, this may lead to a critical failure of the complete system. A soft real-time system, on the other hand, tolerates such lateness and may respond with decreased service quality (e.g. a bank terminal). Depending on the inputs, two types of hard real-time constraints are distinguished in embedded systems: • Signal processing systems process inputs that arrive at regular intervals, and the system must be ready after a fixed time to process the next input. Signal processing systems typically interact with their environment through sensors (which observe the environment) and actuators (which control or influence the environment). Sensors are components that translate non-electrical quantities (e.g. temperature, pressure, ...) into electrical quantities (voltage, current). Since most observable quantities are analog, sensors usually produce analog electrical signals. In most cases signal conditioning is required to compensate for the non-idealities of the sensors and to prepare the sensor signals for the actual signal processing. Because the signal processing is done digitally, an Analog to Digital Converter (ADC) puts the sensor signal in the right format. Actuators perform the reverse operation of sensors: they translate electrical quantities into non-electrical quantities. Actuators also need analog signals, and therefore a Digital to Analog Converter (DAC) is needed. Because actuators need to influence the physical environment they often require high power, hence power electronics circuits are introduced to condition the control signal. • Reactive systems receive events as input and have to react to each event within a certain time. Examples of reactive parts of an embedded system are the interaction with the user or responses to external alarms. As shown on the picture, embedded systems often combine various types of real-time behavior.
  • 10. Page 10 10 Digital embedded systems combine hard- and software [Figure: block diagram of a digital embedded system with a µP or DSP core, ROM, RAM and NVM memories, configurable logic, peripherals, user interface, modem buffers, video/graphics processor, protocol, speech processing and analysis of channel, plus analog parts, sensors and actuators] An embedded system can be separated into a digital part and an analog part. The analog part contains, for instance, signal conditioning, ADCs and DACs. In high-frequency applications, like radios or radars, it forms a large part of the embedded system. Sensors and actuators are also part of the embedded system. Traditionally these were discrete external components, but they are increasingly integrated, when power permits, in a package and even on chip. The digital part is where the actual “intelligence” is. A growing part of the functionality of embedded systems is implemented in software, called “embedded software”. This offers the advantage of increased flexibility (functionality can be changed after production). As a consequence, the digital part of an embedded system consists of 3 components: • Programmable processor cores. They can be general-purpose microprocessors or more specialized digital signal processors (DSPs). • Volatile and non-volatile memories. • Dedicated logic that is configurable through parameters. The digital part can be implemented as a PCB with discrete components, a multi-chip package, an FPGA or a fully integrated chip. The latter is often referred to as a System-on-Chip (SoC). In these classes we will mainly focus on the design of the configurable logic (on FPGA or chip), although SystemC is also extensively used for the modeling of SoCs.
  • 11. Page 11 11 Design flow for digital embedded systems [Figure: Y-model. Functional and performance requirements lead to the system functionality; architectural and non-functional requirements lead to the architecture template; mapping the functionality on the template results in a dedicated architecture and C-code] For the design of a digital embedded system, we use a design flow that consists of the following elements: • During the functional design of the system, the designer determines what the system has to do, based on the performance requirements (e.g. bit error rates in communication systems) and functional requirements (e.g. specified protocols). The designer also determines all algorithms. The system functionality is expressed in a platform-independent way. • A reusable architecture template, or platform, consisting of processors, memories, and dedicated logic, is defined or selected. The architecture template should guarantee the architectural requirements (e.g. interface formats) and non-functional requirements (e.g. power or cost). • Each function in the functionality is mapped on an element in the architecture template. • For the dedicated logic, a circuit corresponding to the required functionality is created, resulting in a dedicated architecture. By means of RTL synthesis the designer then generates a gate-level netlist. The place and route step next transforms this netlist into a physical layout for the dedicated architecture, which can be manufactured by a foundry. Alternatively, the design is mapped to a configuration file for a programmable platform (e.g. a field programmable gate array or FPGA). • For the functions mapped on processors, C-code is generated and compiled. The Y-model is represented as a top-down approach, but in a realistic design flow, multiple iterations are performed before reaching the final embedded system.
  • 13. Page 12 12 Function to architecture conversion follows three axes [Figure: refinement from System Functionality to Dedicated Architecture along three axes. Computations: from operations to operators, through resource allocation and scheduling. Data: from variables, arrays and floating point to memories and fixed point, through memory allocation, address generation and word sizing. Communication: from point-to-point queues to busses and detailed protocols, through bus allocation, introducing arbiters and including protocols] In this course we concentrate on the architectural design of dedicated logic, where the algorithms are mapped onto an optimal architecture. The algorithm will typically be specified as a functional model, e.g. data flow and asynchronous state machines. The architecture needs a timed model, e.g. register transfer level (RTL). To obtain the RTL description, a refinement needs to be done for the computations, communications, and data. The order of these refinements is not fixed. However, it is good practice to take the most important design decisions first. Note that for parts of the system that are implemented in software, the complete refinement does not need to be performed. However, a processor and a memory structure have to be selected. For this purpose, certain refinements, like fixed point, can be useful.
  • 14. Page 13 13 Functional modeling in SystemC ➢ Introduction to design of digital embedded systems ➢ SystemC introduction ➢ SystemC functional model syntax ➢ Exercise 1: building a functional model in SystemC We now take a closer look at the role of SystemC in the design of digital embedded systems.
  • 15. Page 14 14 SystemC bridges gap between function and architecture MATLAB C/C++ VHDL Verilog SystemC System Functionality Dedicated Architecture Traditionally, a system functionality is expressed in MATLAB (SIMULINK/STATEFLOW) or a standard computer language (C/C++). To express the RTL description of the system, VHDL or Verilog is used. As a consequence the transformation from functionality into architecture does not only involve a change in semantics but also in syntax. Moreover, because of the different languages, this transformation cannot be done incrementally. SystemC resolves this issue, by offering a language that can express both functionality and architecture.
  • 16. Page 15 15 What is SystemC? ➢ A modeling framework in C++ for the refinement of a system from a functional description into an architecture ➢ Contributions: • hardware modeling with C++: OCAPI (IMEC) and SCENIC (Synopsys/UC Irvine) • fixed-point data types: Frontier Design • hardware-software co-design: CoWare (IMEC/CoWare) ➢ Language first standardized in December 2005 as IEEE 1666, revised in 2011 as IEEE 1666-2011 ➢ Extensions of SystemC: • Verification library. • Transaction level modeling library (integrated in IEEE 1666-2011). • Analog and mixed-signal modeling. ➢ More info: www.accellera.org SystemC is a C++ library that allows a system to be refined from a functional description into an architecture. Three contributions were essential to the creation of SystemC: • The modeling of RTL hardware with C++ was demonstrated in the OCAPI framework of IMEC, as well as in the SCENIC project of UC Irvine in cooperation with Synopsys. • Frontier Design (an IMEC spin-off) contributed the fixed-point data types. • CoWare (another IMEC spin-off) introduced concepts of hardware-software co-design. The SystemC language was first standardized in December 2005 by the IEEE. A revision (IEEE 1666-2011) was made in 2011. A number of extensions of the SystemC language have also been proposed: • The verification library adds random generators and transaction recording. • Transaction level modeling is a high-level approach to modeling digital systems in which the details of communication among modules are separated from the details of the implementation of functional units or of the communication architecture. This extension is included in the revised IEEE standard. • The analog and mixed-signal library extends SystemC with the following modeling paradigms: timed data flow, linear signal flow modeling, and electrical linear network modeling. All information about SystemC can be downloaded from the www.accellera.org website.
  • 18. Page 16 16 Which tools are available for SystemC? ➢ Open source simulation library available ➢ Open source translators from Verilog or VHDL to SystemC ➢ Commercial synthesis tools:  Cadence (Stratus HLS).  Mentor(Catapult C).  NEC(CyberWorkBench).  SystemCrafter (SC).  Xilinx (Vivado Design Suite). With respect to tool support, the Accellera System Initiative (www.accellera.org) makes an open-source simulation library available. Various academic institutes also offer translators from Verilog or VHDL to SystemC. For synthesis however, we have to rely on commercial tools.
  • 19. Page 17 17 SystemC language architecture [Figure: layered architecture. The user application sits on top of libraries for specific models of computation and/or methodologies (e.g. TLM interfaces, bus models, SystemC verification library). Below are the pre-defined channels (signal, clock, FIFO, mutex, semaphore), the utilities (report handling, tracing), the core language (modules, ports, exports, processes, interfaces, channels, events, event-driven simulation kernel) and the data types (4-valued logic type, 4-valued logic vectors, bit-vectors, finite-precision integers, limited-precision integers, fixed-point types), all built on the C++ language] The classes of the SystemC library fall into four categories: the core language, the SystemC data types, the predefined channels, and the utilities. The core language and the data types may be used independently of one another. At the core of SystemC is a simulation engine containing a process scheduler. Processes are executed in response to the notification of events. Events are notified at specific points in simulated time. In the case of time-ordered events, the scheduler is deterministic. In the case of events occurring at the same point in simulation time, the scheduler is non-deterministic. The scheduler is non-preemptive, which means that once the execution of a process has started, the scheduler cannot interrupt it: the process runs until it returns or suspends itself with wait().
  • 20. Page 18 18 SystemC core language [Figure: relations between sc_module, sc_port, sc_export, sc_prim_channel, sc_interface, sc_process and sc_event] The SystemC core language contains a number of primitives to define parallelism. A system is split into a number of modules (sc_module). A module communicates with the external world through ports (sc_port). Two ports are connected through a channel. SystemC predefines some primitive channels (sc_prim_channel), but more complex channels can be user defined. A module can also make a channel that it contains accessible from the outside through an export (sc_export). A hierarchical module consists of a structure of other modules. A non-hierarchical module contains one or more processes (sc_process). A process is executed when an event (sc_event) occurs. A process interacts with a channel through an interface (sc_interface), which is a collection of functions that the channel implements and that can be called through an sc_port.
  • 21. Page 19 19 Functional modeling in SystemC ➢ Introduction to design of digital embedded systems ➢ SystemC introduction ➢ SystemC functional model syntax ➢ Exercise 1: building a functional model in SystemC SystemC contains all necessary constructs to model the functionality of a system. We will focus on activity-oriented models, although SystemC can also express other modeling paradigms. Let’s review these constructs.
  • 22. Page 20 20 Kahn Process Networks in SystemC [Figure: two processes connected by a FIFO] ➢ (Modules to structure design) ➢ Functional processes ➢ First-In-First-Out queues ➢ Simulation engine SystemC supports the modeling of Kahn process networks, with the limitation that the queues are bounded. A Kahn process network is a directed network of processes that are interconnected by first-in-first-out (FIFO) queues of infinite size. Each time a process is executed, tokens are consumed from the input queues and new ones are produced on the output queues. If no token is present on an input queue, the read blocks until one arrives. Kahn process networks exhibit deterministic behavior that does not depend on computation or communication delays. SystemC provides the constructs to define the processes and the queues. These constructs interact with a simulation engine, which schedules the execution of the processes. The simulation engine stops when there is no more activity in the network.
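The behavior described above can be imitated in a few lines of plain C++, without the SystemC library. This is only a sketch under stated assumptions: `BoundedFifo`, `Process`, `make_adder` and `run_until_idle` are hypothetical names invented for this illustration, not SystemC classes, and a blocked read or write is simplified to "the process does not fire in this round". The toy engine keeps firing processes until none can make progress, which mirrors the rule that the simulation stops when there is no more activity.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical bounded FIFO: a plain C++ stand-in for a finite-length queue.
struct BoundedFifo {
    std::size_t capacity;
    std::deque<int> tokens;
    bool can_read() const { return !tokens.empty(); }
    bool can_write() const { return tokens.size() < capacity; }
    int read() { int v = tokens.front(); tokens.pop_front(); return v; }
    void write(int v) { tokens.push_back(v); }
};

// A process is modeled as a step function: it fires once if it is not
// blocked and reports whether it fired.
using Process = std::function<bool()>;

// Adder process: consumes one token from each input, produces their sum.
Process make_adder(BoundedFifo& a, BoundedFifo& b, BoundedFifo& c) {
    return [&a, &b, &c]() {
        if (!a.can_read() || !b.can_read() || !c.can_write()) {
            return false;  // blocked: missing input token or full output
        }
        const int va = a.read();
        const int vb = b.read();
        c.write(va + vb);
        return true;
    };
}

// Toy engine: keep firing processes until a full round makes no progress.
// Returns the total number of firings.
std::size_t run_until_idle(const std::vector<Process>& procs) {
    std::size_t firings = 0;
    bool active = true;
    while (active) {
        active = false;
        for (const Process& p : procs) {
            if (p()) {
                active = true;
                ++firings;
            }
        }
    }
    return firings;
}
```

With three tokens on one input and two on the other, the adder fires twice and then stays blocked on the empty input, so the engine stops. The result does not depend on the order in which the engine visits the processes, which is the determinism property of Kahn process networks.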
  • 23. Page 21 21 Modules are used for structural partitioning of the functionality ➢ Each module has its own class, derived from the sc_module class. ➢ Every constructor of a module class shall have exactly one parameter of class sc_module_name. • It is good practice to make this name for an instance of the module the same as the C++ variable name through which the module is referenced. ➢ A module can be hierarchical or contain processes. In the latter case, the SC_HAS_PROCESS(class_name) macro, with class_name the name of the module class, is used to indicate that the module contains processes. Modules are used to partition the functionality in the design. You should use neither too many modules, as this complicates the design, nor too few. In general, functionality that is implemented in a different architectural style (e.g. software or dedicated hardware) or at a different location should be in different modules. Every module is derived from the base class sc_module and should have a name, which is used for debugging purposes. The macro SC_HAS_PROCESS(class_name) indicates that the module is not hierarchical and contains processes.
  • 24. Page 22 22 Example of a functional model of an adder
    With macros:
      SC_MODULE(adder) {
        // define ports
        // define processes, internal data, etc.
        SC_CTOR(adder) {
          // body of constructor
          // process declaration, sensitivities, etc.
        }
      };
    Explicit:
      class adder : public sc_module {
      public:
        // define ports
        // define processes, internal data, etc.
        SC_HAS_PROCESS(adder);
        adder(sc_module_name name) : sc_module(name) {
          // body of constructor
          // process declaration, sensitivities, etc.
        }
      };
    The slide shows an explicit definition of a module, consisting of the class definition, the SC_HAS_PROCESS macro and the constructor. To compact the definition, two more macros are provided: • SC_MODULE(class_name) is equivalent to the first two lines of the explicit definition. • SC_CTOR(class_name) replaces the SC_HAS_PROCESS macro and the first line of the constructor. It can only be used if just a name is passed to the constructor. If you also want to pass other parameters, an explicit declaration is needed.
  • 25. Page 23 23 Ports are used to communicate with a FIFO channel ➢ General port definition: sc_port<interface> ➢ Predefined ports are: sc_fifo_in<T> and sc_fifo_out<T>. • sc_fifo_in<T> is derived from sc_port<sc_fifo_in_if<T>,0> with interface functions read(), nb_read(), and num_available(). • sc_fifo_out<T> is derived from sc_port<sc_fifo_out_if<T>,0> with interface functions write(), nb_write(), and num_free(). ➢ Blocking read and write interface functions (automatic synchronization with implicit wait() operations):
      int a = f1.read();           // read a token
      f1.write(a);                 // write a token
    ➢ Inspecting queues:
      int n = f1.num_available();  // number of tokens in a queue
      int m = f1.num_free();       // number of free places in a queue
    In SystemC the sc_port object is used to communicate with a channel. Ports provide the means by which a module can be coded such that it is independent of the context in which it is instantiated. A port forwards interface method calls to the channel to which the port is bound. For functional modeling, processes communicate through FIFO ports. Two port types are supported for the sc_fifo<T> channel, where T is the type of the elements in the FIFO channel: • Input: sc_fifo_in<T>, which is basically equivalent to sc_port<sc_fifo_in_if<T>,0>, where the first parameter is the input interface of a FIFO and the second parameter specifies that multiple channels can be bound to the port. However, the practical use of these multiple bindings is not clear. Therefore it can be useful to define your own FIFO port that is restricted to a single binding. • Output: sc_fifo_out<T>, which is equivalent to sc_port<sc_fifo_out_if<T>,0>. Here too, the use of multiple bindings is not recommended. Several functions are associated with the sc_fifo class: • read() gets a token from the queue. It blocks when no tokens are available. • write() puts a token on a queue. It blocks when there are no free spaces in the queue. There are also inspecting functions available to look at the number of tokens or free spaces.
  • 26. Page 24 24 Example of a functional model of an adder (continued)
      SC_MODULE(adder) {
        sc_fifo_in<int> a, b;
        sc_fifo_out<int> c;
        // define processes, internal data, etc.
        SC_CTOR(adder) {
          // body of constructor
          // process declaration, sensitivities, etc.
        }
      };
    When we add the definition of the ports to the adder module, we obtain the code on the slide.
  • 27. Page 25 25 SC_THREAD processes are used to model functional processes ➢ SC_THREAD processes run forever once started. ➢ SC_THREAD processes can be suspended by means of the wait(event) function. In functional modeling the wait statements are hidden in the read() and write() functions to the queues. ➢ Multiple processes per module are possible ➢ Processes can also be dynamically created. The actual computation in the application is performed in the processes. As a consequence, they also define the parallelism in the application. SystemC supports three types of processes. For functional modeling we use the SC_THREAD process. An SC_THREAD process runs forever when started. It can be suspended by a wait(event) function. Often the wait(event) function is implicitly present in the communication functions. Processes are executed on events. These events can be statically or dynamically defined. Static sensitivity is set by means of the variable sensitive of sc_module. Dynamic sensitivity to a certain event is set by wait (event) for an SC_THREAD process. A module can have multiple processes. Processes might be dynamically created during simulation. However, no synthesis support exists for dynamic processes. Therefore, we do not use them in this course.
  • 28. Page 26 26 Example of a functional model of an adder (continued)
      SC_MODULE(adder) {
        sc_fifo_in<int> a, b;
        sc_fifo_out<int> c;
        void compute() {
          while (true) {
            int valuea = a.read();
            int valueb = b.read();
            c.write(valuea + valueb);
          }
        }
        SC_CTOR(adder) {
          SC_THREAD(compute);
        }
      };
    Adding the definition of an SC_THREAD process to the adder results in the code on the slide. This adder waits for data on both its input queues sequentially and next produces a token on its output queue.
  • 29. Page 27 27 Define the main program ➢ The SystemC library must be included in the main program: • #include <systemc.h> ➢ In sc_main() the following actions are taken: • Instantiate channels with sc_fifo<T> f1("name", length); the default length is 16. For example: sc_fifo<int> f1("f1", 2); • Instantiate the modules. • Bind ports of modules to channels, either positional or named. • Call sc_start() to start the simulation and run until all activity has ended. The global structure of the system is defined in the main function. Because main() is already used by the SystemC library, the main function for the user application is sc_main(). In sc_main(), the following actions are taken: 1. Instantiation of the channels. The basic channel that we use in functional modeling is sc_fifo. A FIFO queue is defined by means of the template class sc_fifo<T>. T can be any basic data type, e.g. int, float, etc. The sc_fifo class declares a finite-length buffer of tokens. The default length is 16 elements. The queue also has a name for debugging and statistics retrieval purposes. The constructor for the queue is sc_fifo<T> f1("name f1", length); An sc_fifo can only be written from one process. 2. Instantiation of the modules. A module can be instantiated multiple times. 3. Binding the ports of the modules to the channels. This can be done in two ways: positional or named. Named binding is preferred because it is less prone to errors than positional port binding. 4. Start the simulation.
  • 30. Page 28 28 Example of a functional model of an adder (continued)
      int sc_main(int argc, char *argv[]) {
        sc_fifo<int> fifo_a, fifo_b, fifo_c;  // channel instantiation
        … // instantiate signal generation and evaluation module
        adder my_adder("my_adder");           // module instantiation
        my_adder.a(fifo_a);                   // binding of port to channel
        my_adder.b(fifo_b);
        my_adder.c(fifo_c);
        … // other modules and test bench, which drive fifo_a and fifo_b.
        sc_start();                           // start simulation
        return 0;
      }
    Everything before the call to sc_start() belongs to the elaboration phase. The sc_main() function for the adder is shown on the slide. Note that the arguments of sc_main() are identical to those of main(). To connect the ports to the channels, named bindings are used.
  • 31. Page 29 29 Modules can also be used to create hierarchy [Figure: module superfunc containing func1 and func2, both instances of the module function, connected by the internal queue d]
      SC_MODULE(superfunc) {
        // IO ports
        sc_fifo_in<float> in;
        sc_fifo_out<float> out;
        // internal queues
        sc_fifo<float> d;
        // internal modules
        function func1;
        function *func2;
        // Module constructor
        SC_CTOR(superfunc) : func1("func1") {
          func1.in(in);
          func1.out(d);
          func2 = new function("func2");
          func2->in(d);
          func2->out(out);
        }
      };
    In a functional model, hierarchy is used to make the design more readable. The hierarchy is fully transparent: it basically acts as a container for the basic modules, but does not add any functionality or synchronization. The definition of a hierarchical module consists of the definition of the ports and internal queues. Next the internal modules are defined. Care must be taken that the module objects still exist after execution of the constructor. Two alternatives exist to guarantee this: either declare them as members and construct them in the initializer list of the constructor (func1), or allocate them with new (func2). The constructor creates the two modules and binds the ports to the channels.
  • 32. Page 30 30 Simulation engine ➢ In an un-timed model, the simulator only advances in delta cycles: • If it is started to run for a finite amount of time, it will never stop. • We therefore run it until no events are present: sc_start(); ➢ Ways of stopping the simulator: • Terminate a process (return from SC_THREAD): the simulator will stop due to the lack of events. • Call sc_stop() when a termination condition is fulfilled. In a functional model no notion of time is present. Every action executes infinitely fast. As a consequence, the simulation kernel only advances in delta cycles, which are infinitesimally small time steps. If we started the simulation kernel with a finite amount of time to run, it would never reach that time and hence run forever. Therefore we run the simulation kernel until no events are present any more. This is achieved with the sc_start() command. With this approach, there are two ways of stopping the simulation: 1. We can exit an SC_THREAD. By doing so, no events will be produced anymore and the simulation will finally stop because of the lack of events. 2. We can check for a termination condition and explicitly call sc_stop(). This approach was used in the exercise of class 1: when the whole image is processed and written to file, the simulation is explicitly stopped. In general this is also the safest and most elegant way of controlling the simulation.
  • 33. Page 31 31 Functional modeling in SystemC ➢ Introduction to design of digital embedded systems ➢ SystemC introduction ➢ SystemC functional model syntax ➢ Exercise 1: building a functional model in SystemC Finally, let’s exercise what we have learned so far.
  • 34. Page 32 32 Goal of this exercise ➢ use a simplified JPEG block diagram to practice functional modeling ➢ develop a functional process that fits into a system ➢ simulate a functional model ➢ observe the overall behavior of a system The goal of this exercise is to practice functional modeling. We will use a simplified JPEG block diagram for this purpose. A process will be defined and integrated in a JPEG functional model. Next, the functional model will be simulated and the overall behavior of the system will be observed.
  • 35. Page 33 33 What is JPEG? ➢ “JPEG” stands for “Joint Photographic Experts Group” ➢ “JPEG” is a standard for color image compression ➢ “JPEG” is widely used (e.g. on the WWW) ➢ More information?  http://www.jpeg.org/ JPEG stands for “Joint Photographic Experts Group” and is a compression standard for color images. It is widely used. More information can be found on www.jpeg.org
  • 36. Page 34 34 (Partial) JPEG: a simple block diagram [Figure: JPEG encoder: Original Image → R2B → DCT → Quantize (+table) → ZIGZAG SCAN → RUN-LENGTH ENCODER. JPEG decoder: RUN-LENGTH DECODER → ZIGZAG SCAN → Normalize (+table) → IDCT → B2R → Reconstructed Image. The parameters width, height and #bits are passed alongside the data] A simplified block diagram of a JPEG encoder and decoder is shown on the slide. First an original image is read and split into 8x8 blocks (R2B). Together with the pixel data, the width, height and number of bits per pixel are also extracted from the image. Next, a discrete cosine transform (DCT) is performed on each 8x8 block, resulting in 8x8 DCT coefficients. These DCT coefficients are quantized and reorganized in the zigzag scan module. The resulting coefficient stream is run-length encoded. This last block differs from the JPEG standard, where a Huffman encoder is used. In the decoder the reverse operations are performed in the reverse order.
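As an illustration of the first step in the block diagram, the following plain C++ sketch splits an image into 8x8 blocks. It is not the r2b module from the exercise library: the names `Block` and `raster_to_blocks` are hypothetical, and the sketch assumes a row-major grayscale image whose width and height are multiples of 8, with blocks emitted left to right and top to bottom.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Block = std::array<int, 64>;  // one 8x8 block, stored row by row

// Split a row-major image into 8x8 blocks, left to right, top to bottom.
// Assumption: width and height are multiples of 8 (no edge padding here).
std::vector<Block> raster_to_blocks(const std::vector<int>& pixels,
                                    std::size_t width, std::size_t height) {
    std::vector<Block> blocks;
    for (std::size_t by = 0; by < height; by += 8) {
        for (std::size_t bx = 0; bx < width; bx += 8) {
            Block blk{};
            for (std::size_t r = 0; r < 8; ++r) {
                for (std::size_t c = 0; c < 8; ++c) {
                    blk[r * 8 + c] = pixels[(by + r) * width + (bx + c)];
                }
            }
            blocks.push_back(blk);
        }
    }
    return blocks;
}
```

The reverse operation (B2R in the decoder) would write each block back to the same positions.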
  • 37. Page 35 35 2D Discrete Cosine Transform ➢ Non-optimized equation ➢ DCT can be separated in consecutive 1-D operations ➢ There are many optimized DCT-algorithms available
    $$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j)\, \cos\frac{(2i+1)u\pi}{16}\, \cos\frac{(2j+1)v\pi}{16}$$
    $$f(i,j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v)\, \cos\frac{(2i+1)u\pi}{16}\, \cos\frac{(2j+1)v\pi}{16}$$
    $$\text{where}\quad C(l) = \begin{cases} \dfrac{1}{\sqrt{2}} & l = 0 \\ 1 & l \neq 0 \end{cases}$$
    The discrete cosine transform (DCT) is performed on an 8x8 pixel block and returns an 8x8 block of DCT coefficients. Each DCT coefficient indicates the amplitude of a horizontal and vertical frequency component. The inverse discrete cosine transform (IDCT) returns pixel values from DCT coefficients. The formal definitions of the DCT and IDCT are shown on the slide. Instead of this straightforward 2D operation, the calculation can be split into consecutive 1D operations, which is more efficient. There is also a large set of optimized DCT algorithms that exploit the regular structure of the cosine values.
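The non-optimized definition can be transcribed almost literally into C++. The sketch below does exactly that, with four nested loops per transform, so it is meant as a reference model to compare optimized versions against, not as an efficient implementation. The names `Block8x8`, `dct8x8` and `idct8x8` are chosen for this illustration and are not those of the exercise library.

```cpp
#include <array>
#include <cmath>

using Block8x8 = std::array<std::array<double, 8>, 8>;

namespace {
const double kPi = 3.14159265358979323846;

// C(l) = 1/sqrt(2) for l = 0, and 1 otherwise.
double scale(int l) { return l == 0 ? 1.0 / std::sqrt(2.0) : 1.0; }

// cos((2x + 1) * k * pi / 16)
double basis(int x, int k) { return std::cos((2 * x + 1) * k * kPi / 16.0); }
}  // namespace

// Forward 2D DCT: F(u,v) = 1/4 C(u) C(v) sum_i sum_j f(i,j) cos(..) cos(..)
Block8x8 dct8x8(const Block8x8& f) {
    Block8x8 F{};
    for (int u = 0; u < 8; ++u) {
        for (int v = 0; v < 8; ++v) {
            double sum = 0.0;
            for (int i = 0; i < 8; ++i)
                for (int j = 0; j < 8; ++j)
                    sum += f[i][j] * basis(i, u) * basis(j, v);
            F[u][v] = 0.25 * scale(u) * scale(v) * sum;
        }
    }
    return F;
}

// Inverse 2D DCT: f(i,j) = 1/4 sum_u sum_v C(u) C(v) F(u,v) cos(..) cos(..)
Block8x8 idct8x8(const Block8x8& F) {
    Block8x8 f{};
    for (int i = 0; i < 8; ++i) {
        for (int j = 0; j < 8; ++j) {
            double sum = 0.0;
            for (int u = 0; u < 8; ++u)
                for (int v = 0; v < 8; ++v)
                    sum += scale(u) * scale(v) * F[u][v] * basis(i, u) * basis(j, v);
            f[i][j] = 0.25 * sum;
        }
    }
    return f;
}
```

For a block in which every pixel equals 128, only the DC coefficient F(0,0) is non-zero, and it equals 1/4 · 1/2 · 64 · 128 = 1024. Applying idct8x8 to the result of dct8x8 returns the original block up to floating-point rounding.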
  • 38. Page 36 36 Quantization ➢ Each DCT coefficient is divided by the coefficient amplitude that is just detectable by the human eye (table) ➢ The result is rounded to an integer ➢ This reduces the number of bits needed to represent the DCT coefficient ➢ The quantization is the place where information of the image might be lost, resulting in lossy compression. Next the DCT coefficients are quantized. To this end each DCT coefficient is divided by the corresponding value in the quantization table. The result is rounded to the nearest integer, reducing the number of bits needed to represent the DCT coefficient. In the quantization step image information might be lost, resulting in lossy compression.
  • 39. Page 37 37 Quantization Table
    N =
      16  11  10  16  24  40  51  61
      12  12  14  19  26  58  60  55
      14  13  16  24  40  57  69  56
      14  17  22  29  51  87  80  62
      18  22  37  56  68 109 103  77
      24  35  55  64  81 104 113  92
      49  64  78  87 103 121 120 101
      72  92  95  98 112 100 103  99
    An example of a typical quantization table is shown on the slide. It can be remarked that the quantization values grow for higher horizontal or vertical frequencies. JPEG contains a number of predefined quantization tables. If a custom quantization table is used, it must be sent to the decoder.
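Quantization with such a table comes down to one division and one rounding per coefficient. The sketch below is a minimal plain C++ illustration using the example table. The function names `quantize` and `normalize` follow the block names of the diagram, but their signatures are invented here, and it is an assumption that the Normalize block in the decoder simply multiplies each level by the same table entry. Rounding uses std::lround, which rounds halfway cases away from zero.

```cpp
#include <array>
#include <cmath>

// Example quantization table; row = vertical, column = horizontal frequency.
const std::array<std::array<int, 8>, 8> kQuantTable = {{
    {{16, 11, 10, 16, 24, 40, 51, 61}},
    {{12, 12, 14, 19, 26, 58, 60, 55}},
    {{14, 13, 16, 24, 40, 57, 69, 56}},
    {{14, 17, 22, 29, 51, 87, 80, 62}},
    {{18, 22, 37, 56, 68, 109, 103, 77}},
    {{24, 35, 55, 64, 81, 104, 113, 92}},
    {{49, 64, 78, 87, 103, 121, 120, 101}},
    {{72, 92, 95, 98, 112, 100, 103, 99}},
}};

// Encoder side: divide by the table entry and round to the nearest integer.
int quantize(double coeff, int step) {
    return static_cast<int>(std::lround(coeff / step));
}

// Decoder side (assumed behavior of "Normalize"): multiply back by the entry.
double normalize(int level, int step) {
    return static_cast<double>(level) * step;
}
```

A coefficient of 100 quantized with step 16 becomes level 6 and is reconstructed as 96: the difference of 4 is the information that is lost, which is why the compression is lossy.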
  • 40. Page 38 38 The coefficients are zigzag scanned
       0   1   5   6  14  15  27  28
       2   4   7  13  16  26  29  42
       3   8  12  17  25  30  41  43
       9  11  18  24  31  40  44  53
      10  19  23  32  39  45  52  54
      20  22  33  38  46  51  55  60
      21  34  37  47  50  56  59  61
      35  36  48  49  57  58  62  63
    The resulting quantized DCT coefficients are next zigzag scanned. This is done in such an order that statistically long sequences of zero coefficients can be expected.
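The scan order on the slide does not have to be typed in as a table: it can be generated by walking the 15 anti-diagonals of the block and alternating the direction. The sketch below shows one way to do this in plain C++. The names `ZigzagTable`, `make_zigzag_table` and `zigzag_scan` are hypothetical and not taken from the exercise library.

```cpp
#include <algorithm>
#include <array>

using ZigzagTable = std::array<std::array<int, 8>, 8>;

// zz[r][c] is the position of coefficient (row r, column c) in the scan.
ZigzagTable make_zigzag_table() {
    ZigzagTable zz{};
    int index = 0;
    for (int s = 0; s <= 14; ++s) {            // s = r + c, one anti-diagonal
        const int r_min = std::max(0, s - 7);
        const int r_max = std::min(7, s);
        if (s % 2 == 1) {                      // odd diagonal: top to bottom
            for (int r = r_min; r <= r_max; ++r) zz[r][s - r] = index++;
        } else {                               // even diagonal: bottom to top
            for (int r = r_max; r >= r_min; --r) zz[r][s - r] = index++;
        }
    }
    return zz;
}

// Reorder an 8x8 block into a 64-element stream following the zigzag order.
std::array<int, 64> zigzag_scan(const std::array<std::array<int, 8>, 8>& block) {
    const ZigzagTable zz = make_zigzag_table();
    std::array<int, 64> out{};
    for (int r = 0; r < 8; ++r)
        for (int c = 0; c < 8; ++c)
            out[zz[r][c]] = block[r][c];
    return out;
}
```

The inverse scan in the decoder uses the same table in the other direction, that is block[r][c] = in[zz[r][c]].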
  • 41. Page 39 39 (Simplified) Run-length coding ➢ Send the DC value “as is” ➢ Represent the high frequency data with (zero run-length, amplitude) combinations. ➢ End the stream with EOB (= 63). ➢ Example: • in: 79, 0, -2, -1, 3, -1, 0, 0, -1, 0, 0, 0, … • out: 79, 1, -2, 0, -1, 0, 3, 0, -1, 2, -1, 63 For our exercise we use a non-JPEG run-length coder. This coding works as follows: • The DC value is sent “as is”. • The high frequency data is split into segments consisting of a number of zeros followed by a non-zero coefficient. Each segment is represented by a pair consisting of the number of consecutive zeros and the value of the non-zero coefficient. • When all remaining coefficients of a block are zero, an end of block (EOB = 63) value is sent.
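The encoder side of this simplified scheme fits in a short plain C++ function, sketched below. This is not the rl_enc module of the exercise library: the name `run_length_encode` is hypothetical, and the sketch assumes that EOB is always emitted at the end of a block, also when the last coefficient is non-zero. Note that a zero run in front of a non-zero coefficient can be at most 62 long, so the value 63 in a run-length position can only mean EOB.

```cpp
#include <array>
#include <vector>

const int kEndOfBlock = 63;  // EOB marker, sent in place of a run length

// Encode one zigzag-scanned block of 64 coefficients.
// Output: DC value, then (zero run-length, amplitude) pairs, then EOB.
std::vector<int> run_length_encode(const std::array<int, 64>& coeffs) {
    std::vector<int> out;
    out.push_back(coeffs[0]);  // the DC value is sent "as is"

    // Find the last non-zero high frequency coefficient (0 if there is none).
    int last_nonzero = 0;
    for (int k = 63; k >= 1; --k) {
        if (coeffs[k] != 0) {
            last_nonzero = k;
            break;
        }
    }

    int run = 0;  // number of zeros seen since the previous non-zero value
    for (int k = 1; k <= last_nonzero; ++k) {
        if (coeffs[k] == 0) {
            ++run;
        } else {
            out.push_back(run);
            out.push_back(coeffs[k]);
            run = 0;
        }
    }

    out.push_back(kEndOfBlock);  // all remaining coefficients are zero
    return out;
}
```

Applied to the input of the example (79, 0, -2, -1, 3, -1, 0, 0, -1 followed by zeros), this produces 79, 1, -2, 0, -1, 0, 3, 0, -1, 2, -1, 63. The decoder reverses the process: it copies the DC value, expands each pair into a run of zeros followed by the amplitude, and fills the rest of the block with zeros when it reads EOB.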
  • 42. Page 40 40 How to start? ➢ Download exercise files from http://www.icorsi.ch/ ➢ Follow installation instructions of exercises. ➢ You will find: • In /exercises/exercise1/: main.cpp to start from • In /exercises/modules/: library with JPEG encoder modules {r2b,dct,quantize,zz_enc,rl_enc}.{h,cpp}, JPEG decoder modules {b2r,idct,normalize,zz_dec}.{h,cpp} and test bench modules {src,snk,test}.{h,cpp} • In /exercises/images/: test images • In /exercises/add2systemc: additional functions (df_fork, fifo_stat) ➢ Things to be done: • Make rl_dec.h and rl_dec.cpp. • Complete main.cpp with the modules. • Compile and execute the application. • Inspect the number of reads and writes in the FIFOs. • Visualize the resulting image. • Test if you can launch the application in the debugger. • Optional: make a hierarchy for the encoder and decoder. You will find all files to start from in the exercise1 directory. Perform the actions as indicated on the slide. To obtain information about the number of writes and reads in the FIFOs, use the type fifo_stat<T> instead of sc_fifo<T>. To prevent multiple bindings of a FIFO port, the classes my_fifo_in<T> and my_fifo_out<T> are used in the exercises.
  • 43. Page 41 41 Using SystemC on Linux/Cygwin ➢ Use g++ (I used version 4.5.3). ➢ Make a workspace in Eclipse: • Add your source files to the project. • Add libmodules.a • Add libadd2systemc.a (for next exercises). • Add libsystemc.a • Set the right include paths and linker paths ➢ Build your application from within Eclipse. ➢ Execute your application from within Eclipse. • Exercise1.exe -i ../images/mountain.pgm -o result.pgm We will make the exercises in a Linux environment, using g++ and Eclipse. Eclipse is an integrated development and debugging environment. In the exercise directory there is a step-by-step guide on how to get started with the exercises in Eclipse. The recent sources of the exercises and libraries can be found at http://www.icorsi.ch/ The libraries have to be compiled before starting the exercise session.
  • 44. Page 42 Model Based System Design Class 2: Fixed-point refinement Marc Engels e-mail: marc.engels@flandersmake.be In this second class we will focus on the refinement of the data types of the functional model. More in particular, we will explain how to define fixed-point word lengths for the variables in the functional model. This action is relevant both for mapping on embedded processors with limited data sizes, e.g. 16-bit processors, and for mapping on a dedicated architecture. At the end of the class, you will be able to perform fixed-point refinement on a functional model of an embedded system in SystemC.
  • 45. Page 43 43 Fixed point refinement ➢ Fixed word length optimization  Overflow and quantization  MSB determination  LSB determination ➢ Fixed word length support in SystemC ➢ Exercise 2: fixed point refinement of IDCT This lecture on fixed point refinement consists of three parts: • In the first part we introduce the quantization and overflow effects of fixed point representations. We also present some methods to determine the most and least significant bits (MSB and LSB). • Next, we introduce the fixed point support in SystemC. This consists of an extensive set of fixed point types. In addition, SystemC also supports 4-valued logic to define bus structures. • Finally, we introduce the exercise on fixed point refinement.
  • 46. Page 44 44 Fixed point refinement is one of the steps in architectural design
    [Figure: refinement from System Functionality to Dedicated Architecture. Functional side: computations (operations), data (variables, arrays, floating point), communication (point-to-point queues). Architecture side: operators, memories, fixed point, busses, detailed protocol. Refinement steps: resource allocation, scheduling, memory allocation, address generation, word sizing, bus allocation, introduce arbiters, include protocols.]
    Let's concentrate on the architectural design step that translates an algorithm into an optimal architecture. The algorithm is typically specified as a functional model, such as data flow. The architecture needs a timed model, e.g. register transfer level (RTL). Initially the algorithm is modeled in floating point. A cost-effective implementation, however, requires a refinement into fixed point types.
  • 47. Page 45 45 Finite word lengths are a must for DSP applications
    Floating-point: powerful, but expensive (storage & ops); 3 bytes (mantissa) + 1 byte (exponent).
    Fixed-point: minimum area, low power, high speed; e.g. an 8-bit by 6-bit multiplication gives a 14-bit result.
    Most signal processing algorithms are specified in floating point precision. This is a very powerful signal representation with high accuracy, but it is also expensive in storage and operation cost. For instance, a typical representation of a floating point number has a mantissa of 24 bits and an exponent of 8 bits. As a consequence, a floating point multiplication is equivalent to a 24-bit multiplication and an 8-bit addition. However, many applications, like cable modems and wireless communication devices, require low cost and low power at a high processing speed. As a consequence, their DSP algorithms are executed in fixed-point arithmetic. With an 8-bit fixed point notation, for instance, the cost drops dramatically because the hardware cost of a multiplication is a quadratic function of its input width: going from 24 to 8 bits reduces the multiplier cost by roughly a factor (24/8)² = 9. This requires the designer to translate floating point types into fixed point types, using a refinement strategy.
  • 48. Page 46 46 How to model a fixed-point signal?
    ➢ Total number of bits WL
    ➢ Integer bits IWL (the remaining WL-IWL bits are fractional)
    ➢ Value representation
       - 2's complement (i = -1)
       - unsigned (i = 1)
    [Figure: a 7-bit word with IWL = 4 and bit weights i·2^3, 2^2, 2^1, 2^0, 2^-1, 2^-2, 2^-3 from MSB to LSB.]
    A fixed point type can be defined by three parameters:
    • The total number of bits WL.
    • The position of the binary point, indicated by the number of integer bits IWL.
    • The way in which the value is represented. For signed numbers, 2's complement notation is the most common because it allows easy arithmetic. However, alternatives like sign-magnitude and 1's complement are also feasible.
  • 49. Page 47 47 How do we quantize?
    [Figure: fixed point (fxp) versus floating point (flp) transfer curves for truncate (floor), round, magnitude truncate and ceil.]
    If the result of a calculation has more precision than is available in the fixed point format, the value has to be quantized. Several quantization methods exist:
    • Truncate or floor is the cheapest approach because it comes for free in hardware: the excess bits are simply dropped. However, it generally gives the worst performance of the quantization techniques.
    • Magnitude truncate realizes a floor function for positive values and a ceil function for negative values. The technique is natural for sign-magnitude representations. The advantage is a symmetrical behavior around zero.
    • Applying the ceil function to the complete range is an alternative that is seldom used.
    • Rounding is the technique with the best performance in most cases. However, it is also the most expensive one. In hardware it requires adding half an LSB, followed by a truncation.
  • 50. Page 48 48 What happens on an overflow?
    [Figure: fxp versus flp transfer curves for wrap-around and for saturation at the maximum value.]
    When the result of an operation is larger than the maximum value that can be represented by the fixed point format (overflow), we have two possibilities:
    • Wrap-around: the overflow bits are discarded. For unsigned values, this is equivalent to a modulo operation (see figure on slide). For 2's complement numbers, exceeding the maximum by one LSB results in the most negative number. This is the standard behavior of a hardware implementation.
    • Saturation: when an overflow occurs, the signal is set to the maximum value that can be represented. Additional hardware is necessary to realize this behavior.
    Note that a similar situation can occur at the minimum value of a signal, for instance if the subtraction of two unsigned signals yields a negative value that must be represented in an unsigned format. For such an underflow, similar remedies are possible.
  • 51. Page 49 49 Saturation Hardware
    [Figure: VALUE is compared against MAX_VAL and MIN_VAL by two comparators, which control two multiplexers that produce RESULT.]
    When we opt for a saturation strategy, the following hardware is needed. The result of the operation must be compared to the maximum positive and negative numbers. This can be done with an explicit comparator or with the overflow flags of the adders. If overflow or underflow is detected, the result of the operation is replaced by the maximum or minimum value respectively. Note that the hardware complexity of a comparator or multiplexer is comparable to that of an adder. As a consequence, saturation hardware can require a significant amount of area.
  • 52. Page 50 50 During design we must specify fixed-point formats for signals
    [Figure: a floating-point algorithm (multipliers, an adder and a delay z^-1) between an ADC and a DAC. The ADC and DAC word lengths are given (8 and 7 bits on the slide); the word lengths of all internal signals are marked "?".]
    Going back to the need for fixed point representations, the designer is faced with the following problem. The designer receives a floating point algorithm and needs to translate the floating point types into fixed point types, using a refinement strategy. For each floating point number, a fixed point characteristic (including total and integer word lengths, overflow and rounding behavior) must be chosen. In most situations the input and output formats are defined by the system context (e.g. the analog-to-digital converter). Note that determining these ADC and DAC precisions is an important task in the overall system design.
  • 53. Page 51 51 Fixed-point refinement is a complex optimization problem
    ➢ Minimize overall cost: minimal word lengths, truncate and wrap-around
    ➢ MSB determination:
       - goal: avoid unwanted overflows
       - method: find min, max signal values
       - result: MSB position, value representation, overflow
    ➢ LSB determination:
       - goal: keep required precision
       - method: evaluate difference between flp and fxp behavior
       - result: LSB position, quantization
    Fixed-point refinement is a complex optimization problem in which the search space grows exponentially with the number of signals. The goal of the optimization is to minimize the overall implementation cost and power consumption. At the same time the performance degradation must be small. It is therefore essential to define a performance degradation bound (e.g. implementation loss for communication systems, a visual performance measure for multimedia systems) before starting the fixed point refinement. The optimization problem can be separated into two parts:
    1. Determination of the most significant bit (MSB). First, the minimum and maximum signal values must be determined. From these, the MSB position, value representation and overflow behavior are selected such that overflows are avoided as much as possible.
    2. Determination of the least significant bit (LSB). By evaluating the difference in performance between the fixed and floating point behavior of the algorithm, the LSB position and quantization method are determined for each signal. The goal is to stay within the performance degradation bound.
    In the next slides we take a closer look at methods for MSB and LSB determination.
  • 54. Page 52 52 MSB determination can be based on range calculations
    ➢ Put range (min, max) on inputs
    ➢ Propagate range over the operators
    ➢ This gives a safe (pessimistic) estimate
    [Figure: signal flow graph with m = 12·x and y = m + d, where d is the input delayed by z^-1. Range info on the input x: [0,255]. Calculated ranges: d [0,255], m [0,3060], y [0,3315].]
    MSB determination can be done by means of range propagation. This analytical method works as follows:
    1. On each input signal, the range, i.e. the minimum and maximum values that can occur in the signal, is specified.
    2. Next, the signal flow graph of the algorithm is traversed and, for each operator, the range of its output is calculated from its input ranges.
    Because the method calculates worst-case minimum and maximum signal values, it results in a safe, but sometimes pessimistic, estimate of the MSB position.
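The four quantization methods described in the notes can be sketched in plain C++. This is an illustrative sketch, not SystemC code: the function names are hypothetical, and `step` is assumed to be the weight of the LSB.

```cpp
#include <cmath>

// Each function maps x onto a multiple of `step` (the weight of the LSB).
// The names are illustrative; they do not come from SystemC.

// Truncate (floor): always rounds toward minus infinity.
double q_floor(double x, double step) { return std::floor(x / step) * step; }

// Ceil: always rounds toward plus infinity.
double q_ceil(double x, double step) { return std::ceil(x / step) * step; }

// Magnitude truncate: floor for positive values, ceil for negative values.
double q_mag_trunc(double x, double step) { return std::trunc(x / step) * step; }

// Rounding as done in hardware: add half an LSB, then truncate.
double q_round(double x, double step) { return std::floor(x / step + 0.5) * step; }
```

With a step of 0.25, `q_floor(3.1875, 0.25)` gives 3.0 while `q_round(3.1875, 0.25)` gives 3.25.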
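The two overflow behaviors can be illustrated on integer codes. The sketch below assumes a 2's complement word of `bits` bits (with 1 ≤ bits ≤ 62); the function names are hypothetical.

```cpp
#include <cstdint>

// Wrap-around: keep only the lowest `bits` bits and reinterpret them
// as a 2's complement number.
std::int64_t wrap_signed(std::int64_t v, int bits) {
    const std::int64_t span = std::int64_t{1} << bits;  // 2^bits codes
    std::int64_t r = v % span;                          // in (-span, span)
    if (r < 0) r += span;                               // now in [0, span)
    if (r >= span / 2) r -= span;                       // upper half is negative
    return r;
}

// Saturation: clip to the largest or smallest representable value.
std::int64_t saturate_signed(std::int64_t v, int bits) {
    const std::int64_t max_val = (std::int64_t{1} << (bits - 1)) - 1;
    const std::int64_t min_val = -max_val - 1;
    if (v > max_val) return max_val;
    if (v < min_val) return min_val;
    return v;
}
```

For a 5-bit word, the value 16 (one above the maximum of 15) wraps to -16, the most negative number, whereas saturation keeps it at 15.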
  • 55. Page 53 53 Range propagation is a simple calculation
    Operator   min_c                                               max_c
    c = a+b    min_a + min_b                                       max_a + max_b
    c = a-b    min_a - max_b                                       max_a - min_b
    c = a*b    MIN(min_a*min_b, min_a*max_b,                       MAX(min_a*min_b, min_a*max_b,
                   max_a*min_b, max_a*max_b)                           max_a*min_b, max_a*max_b)
    Range propagation over the operators is a simple operation. The table on the slide shows the rules for add, subtract and multiply operations.
  • 56. Page 54 54 Range calculations can get unstable with feedback
    [Figure: feedback loop with a multiplier (coefficient a), an adder and a delay z^-1; input X(n), feedback signal F(n), output Y(n). The plot shows minF and maxF diverging with the sample index n.]
    When applied to feedback signals, range propagation can become unstable and cause continuous growth of the minimum and maximum values. An example of such a situation is shown on the slide. In that case, a statistical inspection of the real signals is needed to determine a realistic MSB position. Note that, because of the propagation mechanism, all signals within the feedback loop or depending on its output suffer from this range explosion. Once saturation logic is introduced at one place in the loop, the problem is solved.
  • 57. Page 55 55 Collecting signal statistics from simulations is an alternative
    ➢ Perform simulation with realistic stimuli.
    ➢ Collect minimum and maximum value on each signal during the simulation
    ➢ This gives an optimistic, stimuli dependent estimate
    [Figure: the signal flow graph of the previous example driven by stimuli, with the min and max recorded on each signal.]
    As an alternative to the analytical range propagation method, we can collect the signal statistics during simulations. Because the obtained range information is stimuli-dependent, this gives an optimistic estimate of the minimum and maximum values. Consequently, to maximize the confidence in the results, the stimuli set should be large and provide complete coverage of the algorithm code.
  • 58. Page 56 56 Combine both methods for accurate MSB determination
                 signal statistics         range propagation
    name         min     max     MSB1      min     max     MSB2
    signal1      -1.5    1.6     2         -1.9    1.9     2
    signal2      -1.3    1.4     2         -2.1    2.1     3
    signal3      -1.2    1.2     2         -22.0   22.0    6
    signal4      -1.2    1.2     2         -∞      ∞       ∞
    ➢ If MSB1 == MSB2: wrap-around (MSB1)
    ➢ If MSB1 < MSB2: wrap-around (MSB2)
    ➢ If MSB1 << MSB2: saturation (MSB1)
    ➢ If MSB2 is ∞: saturation (MSB1)
    As can be expected, combining both methods gives the best results. Each signal in the system will then be in one of the following situations:
    • Both methods result in the same MSB position. The signal can then safely be specified with this MSB position and wrap-around overflow behavior.
    • When the analytical MSB position is larger than the statistical MSB position, we can make a trade-off between the analytical MSB with wrap-around and the statistical MSB with saturation. In most cases the wrap-around solution is the most economical. Only when the statistical MSB position is much smaller can saturation logic be beneficial.
    • In the case of range growth because of feedback, the analytical MSB position cannot be calculated (it goes to infinity). In this case, the statistical MSB position is chosen together with saturation behavior. After introducing the saturation on one signal in the feedback loop, we need to re-simulate to get useful results for the rest of the algorithm.
    An example of each of these situations is shown on the slide.
  • 59. Page 57 57 Quantization effects can be modeled as additive noise
    ➢ Noise is approximated by a statistical model with the following assumptions:
       - the noise is uncorrelated to the input.
       - the noise is white.
       - the probability distribution is uniform.
    [Figure: a quantizer Q with a B-bit output is replaced by an adder that adds a noise source to the input.]
    On the LSB side, the question arises what the effect of quantization is. Many authors approximate the quantization effect as an additional noise source. They assume that:
    • The noise sequence is a sample of a stationary random process (i.e. one whose statistical parameters do not change over time).
    • The noise sequence is uncorrelated with the input sequence.
    • The random variables of the noise process are uncorrelated, i.e. the error is a white-noise process.
    • The probability distribution of the error process is uniform over the range of the quantization error.
  • 60. Page 58 58 Each quantization effect has mean and variance
    ➢ Rounding with step $\Delta$: $m_n = 0$ and $\sigma_n^2 = \Delta^2/12$
    ➢ Truncation with step $\Delta$: $m_n = -\Delta/2$ and $\sigma_n^2 = \Delta^2/12$
    ➢ Magnitude truncation with step $\Delta$: $m_n = 0$ and $\sigma_n^2 = \Delta^2/3$
    The noise process can then be modeled by means of its mean and variance. The expressions for the mean and variance of the three most popular quantization methods are shown on the slide, with $\Delta$ the quantization step and the error taken as the quantized value minus the exact value. Rounding and magnitude truncation both have zero mean, but rounding has the lower variance. Truncation and rounding have the same variance, but rounding has zero mean whereas truncation has a bias of half a step. As can be expected, rounding introduces the least quantization noise.
  • 61. Page 59 59 This results in an equivalent linear network
    [Figure: the signal flow graph y = 12·x + d (d is the input delayed by z^-1) with quantizers Q1 and Q2, and the equivalent network in which Q1 and Q2 are replaced by additive noise sources e1(t) and e2(t).]
    $y(t) = \big(12\,x(t) + x(t-1)\big) + \big(12\,e_1(t) + e_2(t) + e_1(t-1)\big)$
    Replacing the quantization by an additional noise source results in a linear model of the quantized algorithm. This model can then be analyzed analytically by means of well-developed linear signal processing theory. For many quantization effects, this linear model is a good approximation. It has, for instance, been used to determine the effects of quantizing the signals in FIR filters. As an exercise, calculate the resulting signal to noise ratio in the case that:
    • x(t) ranges between 0 and 255 with a uniform distribution.
    • both quantization steps round the values to the nearest integer.
  • 62. Page 60 60 … but quantization is a non-linear operation
    [Figure: first-order recursive filter with feedback coefficient -0.96, a delay z^-1 and a quantizer Q (round to nearest integer, B bits); input X(0) = 14, x(n) = 0 for n > 0. The slide compares the output sequences with and without rounding.]
    However, quantization is in reality a non-linear operation, and in systems with feedback this can lead to non-intuitive behavior. In infinite impulse response (IIR) filters, for instance, quantization can generate limit cycles. For a stable floating-point IIR filter implementation, the output decays asymptotically to zero when the input becomes zero. For the same system implemented with finite precision, the output may continue to oscillate indefinitely with a periodic pattern while the input remains equal to zero. This effect is often referred to as zero-input limit cycle behavior. An example of such behavior is shown on the slide.
  • 63. Page 61 61 LSB determination is based on simulations
    [Figure: the same stimuli drive the floating-point reference and the all-fixed-point version (with quantizers Q) of the algorithm. The outputs are compared; if the result is not ok, the quantization is adapted and the simulation is repeated.]
    Non-linear quantization effects are difficult to analyze analytically. Therefore, simulation based methods are mostly used. To this end, the output of a reference simulation is compared to the output of a simulation with the quantized signals. Again, sufficiently large stimuli sets with complete code coverage must be used.
  • 64. Page 62 62 Signal to quantization noise ratio (SQNR)
    $\mathrm{SQNR} = 10\log_{10}\dfrac{m_s^2 + \sigma_s^2}{m_e^2 + \sigma_e^2}$
    [Figure: a signal x passes through the quantizer Q to give x_Q; the error e is the difference between the two. The statistics (m_e, σ_e) are collected on e and (m_s, σ_s) on the signal.]
    To get better insight into the optimization trade-off, the difference between the floating-point and fixed-point values (e) and the resulting signal to quantization noise ratio (SQNR) are a useful guide. The SQNR of each signal is calculated as follows:
    • During signal assignments, the statistics (mean, standard deviation) of the error signal as well as of the output signal are collected.
    • At the end of the simulation, the signal to quantization noise ratio is calculated for each signal.
  • 65. Page 63 63 LSB selection optimizes cost and performance
    quantization  SQNR   SQNR    SQNR    SQNR    SQNR    SQNR        SQNR    cost     SNR     PSNR
    set           pi     accu    pix     coeffs  block   temp block  blocki
    0             208    253     Inf     184     Inf     225         Inf     787968   27.64   31.49
    1             45.5   59.76   Inf     174     Inf     Inf         Inf     759296   27.48   31.33
    2             45.5   59.76   25.15   174     Inf     Inf         Inf     759296   22.66   26.51
    3             45.5   59.76   38.77   174     Inf     Inf         Inf     759296   26.91   30.75
    4             45.5   59.76   47.3    30.88   Inf     Inf         Inf     230912   27.35   31.19
    5             45.5   59.8    47.3    30.88   29.38   Inf         Inf     230912   27.34   31.19
    6             45.5   61.4    47.3    30.88   29.38   -1.93       Inf     41472    20.47   24.32
    7             45.5   59.8    47.3    30.88   29.38   Inf         Inf     72192    27.34   31.19
    8             45.5   60.23   47.3    30.88   29.38   16.73       Inf     56832    26.96   30.8
    9             45.5   59.88   47.3    30.88   29.38   31.86       Inf     67072    27.31   31.16
    The optimal LSB is determined by running the simulation multiple times with various quantization sets. For each quantization set, the SQNR per signal, the overall SNR and PSNR, and the cost are calculated. The goal is to find the cheapest solution that realizes the specified performance. This procedure can be automated by means of an optimization routine. When the quantization is changed for one signal at a time, the statistics give an impression of the sensitivity of the cost and the performance to the quantization of that signal. As a rule of thumb, the SQNR of a signal should be higher than the overall SNR. Note that the SQNR and SNR statistics depend on the input. As a consequence, the optimization should be performed on a representative set of inputs.
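The propagation rules of the table translate directly into code. The following is a minimal sketch; the `Range` type and the function names are hypothetical, chosen only for this illustration.

```cpp
#include <algorithm>

// A signal range [min, max]; the type and function names are illustrative.
struct Range {
    double min;
    double max;
};

Range add(Range a, Range b) { return {a.min + b.min, a.max + b.max}; }

Range sub(Range a, Range b) { return {a.min - b.max, a.max - b.min}; }

Range mul(Range a, Range b) {
    // The extremes of a product are always found among the four corner products.
    const double p1 = a.min * b.min, p2 = a.min * b.max;
    const double p3 = a.max * b.min, p4 = a.max * b.max;
    return {std::min({p1, p2, p3, p4}), std::max({p1, p2, p3, p4})};
}
```

Applied to the earlier range calculation example, an input range of [0,255] multiplied by the constant 12 (the range {12, 12}) gives [0,3060], and adding the delayed input gives [0,3315].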
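The decision rules on this slide can be sketched as follows. This is an illustration under assumptions: the MSB position is taken as the number of integer bits including the sign bit (which reproduces the MSB columns of the table), negative integer word lengths are ignored, and the threshold `max_gap` that decides when MSB1 is "much smaller" than MSB2 is a hypothetical tuning parameter, not a value from the course.

```cpp
#include <cmath>
#include <limits>

// Marker for a range that range propagation could not bound (feedback growth).
const int kUnboundedMsb = std::numeric_limits<int>::max();

// Smallest signed integer word length iwl (sign bit included) such that
// -2^(iwl-1) <= lo and hi < 2^(iwl-1).
int required_msb(double lo, double hi) {
    if (!std::isfinite(lo) || !std::isfinite(hi)) return kUnboundedMsb;
    int iwl = 1;
    while (lo < -std::ldexp(1.0, iwl - 1) || hi >= std::ldexp(1.0, iwl - 1)) ++iwl;
    return iwl;
}

enum class Overflow { Wrap, Saturate };

struct MsbChoice {
    int msb;
    Overflow mode;
};

// msb_stat: MSB from simulation statistics (MSB1).
// msb_range: MSB from range propagation (MSB2), assumed >= msb_stat.
// max_gap: assumed threshold above which MSB1 counts as "much smaller".
MsbChoice choose_msb(int msb_stat, int msb_range, int max_gap = 2) {
    if (msb_range == kUnboundedMsb || msb_range - msb_stat > max_gap) {
        return {msb_stat, Overflow::Saturate};
    }
    return {msb_range, Overflow::Wrap};
}
```

With these assumptions, signal1 and signal2 of the table get wrap-around with MSB 2 and 3 respectively, while signal3 and signal4 get saturation with MSB 2.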
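The mean and variance of the three quantization methods can be checked numerically. The sketch below sweeps the input uniformly over [-8, 8) with a quantization step of 1 and measures the error, taken here as the quantized value minus the exact value (a sign convention assumed for this illustration). All names are hypothetical.

```cpp
#include <cmath>

struct NoiseStats {
    double mean;
    double variance;
};

// Measures the error e = q(x) - x of quantizer q (step 1) for inputs x
// spread uniformly over [-8, 8).
template <typename Quantizer>
NoiseStats measure_noise(Quantizer q, int samples = 16000) {
    double sum = 0.0, sum_sq = 0.0;
    for (int k = 0; k < samples; ++k) {
        const double x = -8.0 + 16.0 * (k + 0.5) / samples;
        const double e = q(x) - x;
        sum += e;
        sum_sq += e * e;
    }
    const double mean = sum / samples;
    return {mean, sum_sq / samples - mean * mean};
}

double round_q(double x) { return std::floor(x + 0.5); }  // rounding
double trunc_q(double x) { return std::floor(x); }        // truncation (floor)
double mag_trunc_q(double x) { return std::trunc(x); }    // magnitude truncation
```

For a step of 1 the measured variances should be close to 1/12 for rounding and truncation and close to 1/3 for magnitude truncation, with a mean near -1/2 for truncation only.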
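The limit cycle can be reproduced with a few lines of code. The sketch assumes that the filter on the slide implements y(n) = a·y(n-1) + x(n) with a = -0.96 and that the quantizer rounds the result to the nearest integer; the function name is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Response of y(n) = a*y(n-1) + x(n) to the input x(0) = x0, x(n) = 0 for n > 0.
// If round_output is true, every result is rounded to the nearest integer
// before it is stored and fed back.
std::vector<double> impulse_response(double a, double x0, int n, bool round_output) {
    std::vector<double> y;
    double previous = 0.0;  // y(n-1)
    for (int i = 0; i < n; ++i) {
        const double x = (i == 0) ? x0 : 0.0;
        double value = a * previous + x;
        if (round_output) value = std::round(value);
        y.push_back(value);
        previous = value;
    }
    return y;
}
```

With a = -0.96 and x0 = 14, the rounded output starts as 14, -13, 12, -12, 12, -12 and then keeps alternating between 12 and -12, whereas the unrounded output decays toward zero.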
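The SQNR of one signal can be computed from a floating-point trace and the corresponding fixed-point trace. In this sketch the signal power is taken from the fixed-point (output) values, following the notes, and the mean of the squares is used because it equals m² + σ². The function name and the two-vector interface are assumptions made for this illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// SQNR in dB of a fixed-point trace fxp against its floating-point reference flp.
// Both vectors are assumed to have the same length.
double sqnr_db(const std::vector<double>& flp, const std::vector<double>& fxp) {
    double signal_power = 0.0;  // sum of s^2, proportional to m_s^2 + sigma_s^2
    double noise_power = 0.0;   // sum of e^2, proportional to m_e^2 + sigma_e^2
    for (std::size_t i = 0; i < flp.size(); ++i) {
        const double e = fxp[i] - flp[i];
        signal_power += fxp[i] * fxp[i];
        noise_power += e * e;
    }
    // A signal that is not quantized at all has zero noise power, so the
    // result is +infinity, like the "Inf" entries in a statistics table.
    return 10.0 * std::log10(signal_power / noise_power);
}
```

A trace with signal power 1 and a constant error of 0.1 has an SQNR of about 20 dB.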
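Selecting the cheapest quantization set that still meets the performance bound is a simple search. The sketch below assumes that the bound is expressed as a maximum PSNR loss relative to a reference set; the `QuantSet` type and the function name are hypothetical.

```cpp
#include <vector>

// One row of the statistics table, reduced to the columns needed here.
struct QuantSet {
    int id;
    long cost;
    double psnr_db;
};

// Returns the id of the cheapest set whose PSNR is at most max_loss_db below
// reference_psnr_db, or -1 if no set meets the bound.
int cheapest_within_budget(const std::vector<QuantSet>& sets,
                           double reference_psnr_db, double max_loss_db) {
    int best = -1;
    long best_cost = 0;
    for (const QuantSet& s : sets) {
        if (reference_psnr_db - s.psnr_db > max_loss_db) continue;  // too much loss
        if (best == -1 || s.cost < best_cost) {
            best = s.id;
            best_cost = s.cost;
        }
    }
    return best;
}
```

Fed with the cost and PSNR columns of the table, taking set 0 (PSNR 31.49 dB) as reference and allowing 0.5 dB of loss, this search selects set 9 with cost 67072.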
  • 66. Page 64 64 Fixed point refinement ➢ Fixed word length optimization  Overflow and quantization  MSB determination  LSB determination ➢ Fixed word length support in SystemC ➢ Exercise 2: fixed point refinement of IDCT In the next part we discuss the fixed point support in SystemC
  • 67. Page 65 65 SystemC introduces a number of specific data types
    Type         Description
    sc_logic     4 value {0,1,X,Z} single bit
    sc_int       1 to 64 bit signed integer
    sc_uint      1 to 64 bit unsigned integer
    sc_bigint    Arbitrary size signed integer
    sc_biguint   Arbitrary size unsigned integer
    sc_bv        Arbitrary sized 2 value vector
    sc_lv        Arbitrary sized 4 value vector
    sc_fixed     Signed fixed point
    sc_ufixed    Unsigned fixed point
    sc_fix       Untemplated signed fixed point
    sc_ufix      Untemplated unsigned fixed point
    SystemC introduces a number of specific data types, which correspond to data types that are frequently used in Hardware Description Languages (HDLs). These types include sc_logic, a 4-valued representation that can be high (1), low (0), undefined (X) or in a high-impedance (Z) state. Integers of up to 64 bits are provided by sc_int and sc_uint, and integers of arbitrary length by sc_bigint and sc_biguint. SystemC also supports vectors of 2-valued or 4-valued logic with sc_bv and sc_lv. sc_fixed and sc_ufixed define fixed point numbers whose characteristics are defined by template arguments. sc_fix and sc_ufix use run-time arguments to define the fixed point characteristics. This is interesting for trying out different quantization settings without recompilation. However, these two types cannot be used in synthesis, while the others can.
  • 68. Page 66 66 SystemC templated fixed-point types
    ➢ Two fixed point templates
       - sc_fixed <wl, iwl, q_mode, o_mode, n_bits> x; // signed
       - sc_ufixed <wl, iwl, q_mode, o_mode, n_bits> y; // unsigned
    ➢ Parameters:
       - wl = number of bits
       - iwl = number of integer bits
       - q_mode = quantization method (SC_RND / SC_TRN / SC_TRN_ZERO / ...)
       - o_mode = overflow method (SC_SAT / SC_WRAP / … )
       - n_bits = number of saturated bits in case of wrapping (default 0)
    ➢ If quantization and overflow are not specified, the defaults (SC_TRN and SC_WRAP) are used
    Two data types provide full flexibility in representing fixed point numbers with static parameters: sc_fixed (signed, 2's complement numbers) and sc_ufixed (unsigned numbers). The template arguments of these fixed-point types carry the information on the word lengths and on the quantization and overflow behavior:
    • wl is the total number of bits.
    • iwl is the number of integer bits, i.e. the bits to the left of the binary point.
    • q_mode specifies the quantization method: rounding (SC_RND), flooring (SC_TRN), or magnitude truncate (SC_TRN_ZERO). In addition, some very particular, rarely used quantization modes are defined.
    • o_mode selects the overflow mode: saturation (SC_SAT), saturation to zero (SC_SAT_ZERO), symmetrical saturation (SC_SAT_SYM), wrap-around (SC_WRAP), or sign-magnitude wrapping (SC_WRAP_SM).
    • n_bits specifies the number of saturated bits in case of wrapping. This makes it possible to obtain some special wrapping methods that keep the sign of the signal. By default, n_bits is set to 0.
  • 69. Page 67 67 Fixed point lengths
    sc_fixed <5, 7> v;     X X X X X 0 0        range [ -64 , 60 ]
    sc_fixed <5, 3> v;     X X X . X X          range [ -4 , 3.75 ]
    sc_fixed <5, -2> v;    . S S X X X X X      range [ -0.125 , 0.1171875 ]
    Two of the arguments of the fixed point data type are the word length (wl) and the integer word length (iwl). The word length must be greater than 0. The integer word length can be positive or negative, and it can be larger than the word length. For instance, if the word length is 5 bits but the integer word length is 7, two zeroes are added to the end of the object. If the integer word length is negative, sign bits are extended after the binary point. For instance, if wl = 5 and iwl = -2, two sign bits are added to the object. The sign bits are simply copies of the most significant bit of the 5-bit number. By extending the sign bits, the value of the number is maintained.
  • 70. Page 68 68 Quantization methods
    sc_ufixed <5, 3, SC_RND> v; v = 3.1875;    3.1875 = 011.0011 becomes 011.01 = 3.25   (quantization error 0.0625)
    sc_ufixed <5, 3, SC_TRN> v; v = 3.1875;    3.1875 = 011.0011 becomes 011.00 = 3.0    (quantization error 0.1875)
    Range [ 0 , 7.75 ], precision = 0.25
    This slide shows an example that illustrates the difference between rounding and flooring. As can be seen, rounding gives the smaller quantization error here; in general, the error of rounding is never larger in magnitude than that of flooring.
  • 71. Page 69 69 Overflow handling
    sc_fixed <5, 5, SC_RND, SC_SAT> v; v = 18;     18 becomes 01111 = 15
    sc_fixed <5, 5, SC_RND, SC_WRAP> v; v = 18;    18 becomes 10010 = -14
    Range [ -16 , 15 ]
    The slide shows an example with two different overflow handling methods, saturation and wrap-around, for a two's complement number. As can be seen, these overflow methods generate very different outputs.
  • 72. Page 70 70 Fixed-point simulation
    ➢ operations in floating-point
    ➢ quantization and overflow handling during assignment
    sc_fixed <4,3> a;    // lsb precision 0.5
    sc_fixed <4,1> b;    // lsb precision 0.125
    sc_fixed <4,2> c;    // lsb precision 0.25
    a = 1.6;             // stored as 1.5
    b = 0.9;             // stored as 0.875
    c = a * b;           // 1.5 * 0.875 = 1.3125, stored as 1.25
    When working with fixed-point arithmetic, it is vital to have an efficient representation of values and an efficient simulation of operations. For this purpose, all operations are performed with floating point arithmetic. The quantization is performed only on assignment. If an intermediate result needs to be quantized, an explicit assignment has to be used. In the example above, the multiplication a*b is a floating-point operation that takes two fixed point values as input. During the assignment to c, the floating point result is automatically cast to the fixed point type of variable c.
  • 73. Page 71 71 SystemC fixed point types with non-static arguments
    ➢ Fixed point parameter values
       - sc_fxtype_params my_type(wl,iwl,q_mode,o_mode,n_bits);
       - x = my_type.wl();
       - my_type.iwl(x-2);
    ➢ Two non-static fixed point types
       - sc_fix x(my_type); // signed
       - sc_ufix y(my_type); // unsigned
    ➢ For arrays, these types are used with a context
       - sc_fxtype_context my_context(my_type);
       - sc_fix z[64];
    ➢ Remark: for fixed point simulations, include in every file
       - #define SC_INCLUDE_FX
       - #include <systemc.h>
    SystemC also allows fixed point types with non-static arguments to be defined: sc_fix (signed, 2's complement numbers) and sc_ufix (unsigned numbers). Type sc_fxtype_params is used to configure the parameters of types sc_fix and sc_ufix. To set the parameters for these types, declare an object of type sc_fxtype_params, initialize the parameter values as desired, and pass the sc_fxtype_params object as an argument to the sc_fix or sc_ufix declaration. The sc_fxtype_params object takes the same arguments as the sc_fixed template:
    • wl - word length
    • iwl - integer word length
    • q_mode - quantization mode
    • o_mode - overflow mode
    • n_bits - saturated bits
    Any combination of arguments is allowed, but the order cannot be changed. A variable of type sc_fxtype_params can be initialized with another variable of type sc_fxtype_params, and one such variable can be assigned to another. Individual parameter values can be read and written using methods with the same names as the arguments listed above: called without an argument the method returns the value, called with an argument it sets the value.
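The representable range of a signed fixed point type follows directly from wl and iwl. The helper below is a plain C++ sketch with hypothetical names; it does not use SystemC itself.

```cpp
#include <cmath>

struct FixedRange {
    double min;  // most negative representable value
    double max;  // most positive representable value
    double lsb;  // weight of the least significant bit
};

// Range of a 2's complement fixed point number with wl bits in total,
// of which iwl are integer bits (iwl may be negative or larger than wl).
FixedRange signed_fixed_range(int wl, int iwl) {
    const double lsb = std::ldexp(1.0, iwl - wl);    // 2^(iwl - wl)
    const double min = -std::ldexp(1.0, iwl - 1);    // -2^(iwl - 1)
    return {min, -min - lsb, lsb};
}
```

For wl = 5 this gives [-64, 60] with iwl = 7, [-4, 3.75] with iwl = 3 and [-0.125, 0.1171875] with iwl = -2.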
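The quantize-on-assignment principle can be imitated without the SystemC library. The sketch below is an emulation written for this illustration, not the SystemC implementation: it assumes the default modes (truncation and wrap-around) of a signed type, and the function name is hypothetical.

```cpp
#include <cmath>

// Emulates assigning `value` to a signed fixed point variable with wl bits,
// iwl integer bits, truncation (floor) and wrap-around overflow.
double assign_fixed(double value, int wl, int iwl) {
    const double lsb = std::ldexp(1.0, iwl - wl);  // weight of the LSB
    const double span = std::ldexp(1.0, wl);       // number of codes, 2^wl
    double code = std::floor(value / lsb);         // quantization: truncate
    // Overflow: wrap the integer code into [-span/2, span/2).
    code -= span * std::floor((code + span / 2.0) / span);
    return code * lsb;
}
```

Following the example of the slide, `assign_fixed(1.6, 4, 3)` gives 1.5 and `assign_fixed(0.9, 4, 1)` gives 0.875; their floating point product 1.3125 becomes 1.25 when assigned with `assign_fixed(1.3125, 4, 2)`.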
  • 74. Page 72 72 Fixed point refinement ➢ Fixed word length optimization  Overflow and quantization  MSB determination  LSB determination ➢ Fixed word length support in SystemC ➢ Exercise 2: fixed point refinement of IDCT We now turn to the exercise, where we will perform fixed point refinement of the IDCT operator in the JPEG decoder.
  • 75. Page 73 73 Goal of this exercise
    ➢ Perform fixed point refinement for all the internal variables of the IDCT in the JPEG example
    ➢ Determine the MSB to avoid internal overflows without overflow logic.
    ➢ Determine the LSB to have no more than 0.5 dB degradation of the PSNR of the resulting image
    The goal of this exercise is to get familiar with fixed point refinement by practicing it on the IDCT block of the JPEG decoder. To this end, we will determine the LSB and MSB position for every variable in the IDCT function. By observing the overall behavior it is possible to optimize the LSB and MSB values. The MSB should be determined in such a way that overflow is avoided without introducing overflow logic. For the LSB, the degradation of the image quality, measured as the peak signal to noise ratio (PSNR), should be kept below 0.5 dB. The PSNR is defined as the ratio between the maximum power of a signal and the power of the corrupting noise. In our case the noise power is the mean squared error (MSE) between the original and the decompressed image. The maximum power of the signal is MAX², where MAX is the maximum grey value of a pixel. Expressed in dB, this gives $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$.
  • 76. Page 74 74 How to start?
    ➢ You find:
       - In .../exercises/exercise2/: the functional model with a fixed point IDCT implementation; types file datatypes_original.txt
       - In /exercises/modules/: library of JPEG encoder modules {r2b,dct,quantize,zz_enc,rl_enc}.{h,cpp}, JPEG decoder modules {b2r,idct,normalize,zz_dec}.{h,cpp} and testbench modules {src,snk,test}.{h,cpp}
       - Special fixed point support functions from directory …/exercises/add2systemc/ are used
       - In /exercises/images/: test images
    ➢ Things to do:
       - Inspect the code to understand the behavior
       - Make the application
       - Change the datatypes.txt file
       - Syntax: exercise2 -i <inputfile> -o <outputfile> -t <typefile>
  • 77. Page 75 Model Based System Design Class 3: Communication Refinement Marc Engels e-mail: marc.engels@flandersmake.be
    In this third class we focus on the refinement of the communication between the modules of the functional model. More specifically, we explain how the FIFO communication channels can be replaced by protocols on simple wires.
  • 78. Page 76 7 6 76 Communication refinement ➢ Communication refinement ➢ Communication refinement in SystemC ➢ Exercise 3: communication refinement for the JPEG decoder This lecture on communication refinement consists of three parts: • In the first part we introduce the concept of refining the inter process FIFO communication into real protocols. • Next, we review the support in SystemC for communication refinement. • Finally we introduce the exercise to practice what we have learned.
  • 79. Page 77 7 7 77 Communication refinement is one of the steps in architectural design Computations operations Data variables, arrays floating point memories fixed point operators Communication point-to-point queues busses detailed protocol resource allocation scheduling memory allocation address generation word sizing bus allocation introduce arbiters include protocols System Functionality Dedicated Architecture In the architectural design process that translates an algorithm into an optimal architecture, communication refinement is an important step. The algorithm will typically be specified into a functional model, like data flow. In this data flow model, the communication between processes is performed via point-to-point queues. The architecture needs a model with explicit protocols. In addition, signals could be multiplexed on a bus to reduce the wiring overhead.
  • 80. Page 78 7 8 78 Functional models use FIFO communication ➢ Queues guarantee consistent data passing ➢ Implementation could become expensive for large sizes ➢ communication must be optimized Process1 Process2 (infinite) storage A FIFO is a very robust structure because it guarantees correct processing of the data independently from the processing times of the functions and communication times. However, queues require a large amount of storage and also some addressing hardware. A typical implementation, for instance, would be a memory array with modulo addressing and a read and write pointer. Because of this large implementation cost, the communication must be optimized.
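The typical FIFO implementation mentioned in the notes, a memory array with modulo addressing and a read and a write pointer, can be sketched as a small class. The class name and interface are hypothetical; a real functional model would rather use a ready-made FIFO channel.

```cpp
#include <array>
#include <cstddef>

// Bounded FIFO built from a memory array of N words, a read pointer and a
// write pointer that both advance modulo N.
template <typename T, std::size_t N>
class RingFifo {
public:
    // Stores v and returns true, or returns false when the FIFO is full.
    bool write(const T& v) {
        if (count_ == N) return false;
        buf_[wr_] = v;
        wr_ = (wr_ + 1) % N;  // modulo addressing
        ++count_;
        return true;
    }

    // Copies the oldest element into out and returns true,
    // or returns false when the FIFO is empty.
    bool read(T& out) {
        if (count_ == 0) return false;
        out = buf_[rd_];
        rd_ = (rd_ + 1) % N;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<T, N> buf_{};
    std::size_t rd_ = 0;
    std::size_t wr_ = 0;
    std::size_t count_ = 0;
};
```

The storage array, the two pointers and the fill counter are exactly the hardware cost that the refinement to a single register tries to avoid.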
  • 81. Page 79 7 9 79 wire Process1 Process2 Many communications can be reduced to a single register ➢ Output of functions is registered ➢ No extra implementation cost ➢ No storage for data ➢ Consistency of communication needs to be guaranteed Ideally, from an implementation point of view, a FIFO communication could be reduced to a simple wire when the output signal is registered. This requires no storage and no implementation cost for the addressing or protocol. However, consistency of the communication must be guaranteed: Process 2 should not use the data before it is generated and Process 1 should not produce new data before the previous has been read by Process 2.
  • 82. Page 80 8 0 80 w=4 Example of correct wired communication wire Process 1 Process 2 w=0 w<4 filt1 filt2 filt3 filt4 write() w++ read() op1 op2 op3 op4 To analyze the behavior of a wired connection, we represent the two processes with a Synchronous Finite State Machine (FSM). In such a Synchronous FSM the transitions take place on a clock edge. In our analysis we assume that both processes are running on the same clock. Process 1 will perform a filtering operation in 4 cycles and will also write the data in the register in the 4th cycle. Process 2 will initially wait for 4 cycles. Next cycle, it will read the data and perform a first operation, followed by three more cycles of operation. This sequence will be repeated continuously.
  • 83. Page 81 8 1 81 1 w=1 2 w=2 3 w=3 4 w=4 5 read() op1 6 op2 7 op3 8 op4 9 read() op1 10 op2 Communication is perfectly aligned 1 filt1 2 filt2 3 filt3 4 filt4 write() 5 filt1 6 filt2 7 filt3 8 filt4 write() 9 filt1 10 filt2 … … We have to guarantee the condition that every write() comes before a read() ClockCycle If we look at a timing diagram, we see that the timing is guaranteed. Every read() happens after a write() of the signal. Also no data is lost.
  • 84. Page 82 8 2 82 Small changes to design can result in errors ➢ Increase (decrease) the number of operations in process 1 (2): the same data will be consumed twice. ➢ Decrease (increase) the number of operations in process 1 (2): data will be lost ➢ If the number of initial wait operations in process 2 is too low, we will use undefined data ➢ If the number of initial wait operations in process 2 is too high, we will loose the first data elements) However, small changes to the finite state machines of one of the two processes can result in errors: • If we increase the number of operations in process 1, process 2 will consume too early and hence twice the same data is used. • If we decrease the number of operations in process 2, the same happens. • If we decrease the number of operations in process 1, process 2 will be relatively too slow and some data will be overwritten before it has been used. • Increasing the number of operations in process 2 will have the same effect. • Also remark that the number of initial wait operations in process 2 should not be too low or too high.
  • 85. Page 83 Example of wrong wired communication [Figure: process 1 cycles through filt1, filt2, filt3 and filt4 with write() in filt4; process 2 has only two states, read()/op1 and op2, and no initial wait states. The processes are connected by a wire.] The slide shows an example where process 2 has only two states. As a consequence, we can expect the data produced by process 1 to be used multiple times. Because process 2 has no initial wait operations, we also expect undefined data to be used.
  • 86. Page 84 The example results in undesired behavior
      Cycle  Process 1       Process 2
      1      filt1           read(), op1
      2      filt2           op2
      3      filt3           read(), op1
      4      filt4, write()  op2
      5      filt1           read(), op1
      6      filt2           op2
      7      filt3           read(), op1
      8      filt4, write()  op2
      9      filt1           read(), op1
      10     filt2           op2
      ...    ...             ...
    Adapt the cycle budget or introduce a handshake protocol. The timing diagram confirms the expected behavior. As can be seen in the diagram, the first two data elements read by process 2 are undefined. After that, the read() operation of process 2 uses every data element produced by process 1 twice. To guarantee correct behavior, two approaches exist: • Adapt the cycle budget of process 2, for instance by introducing two dummy cycles. However, this breaks the general approach of making modules independent of the environment in which they operate. • Introduce a handshake protocol that automatically synchronizes the data transfers. This is the most robust and reliable approach. On the other hand, handshake protocols introduce some overhead and are best applied to larger units of data.
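The two timing diagrams above can be reproduced with a few lines of plain C++. The following is only a minimal sketch, not part of the course material: the function name `simulate_wire`, its parameters and the `kUndefined` marker are assumptions introduced for illustration, and the model assumes that a value written in one cycle becomes visible to the reader in the next cycle.

```cpp
#include <cassert>
#include <vector>

// Marker for a read that happens before the first write (undefined data).
constexpr int kUndefined = -1;

// Cycle-based model of two synchronous FSMs sharing one register ("wire").
// Process 1 writes the values 0, 1, 2, ... in the last cycle of every
// `producer_cycles`-long round. Process 2 idles for `initial_wait` cycles and
// then reads in the first cycle of every `consumer_cycles`-long round.
// Returns the sequence of values seen by process 2.
std::vector<int> simulate_wire(int producer_cycles, int consumer_cycles,
                               int initial_wait, int total_cycles) {
  assert(producer_cycles > 0 && consumer_cycles > 0 && initial_wait >= 0);
  int wire = kUndefined;  // register contents visible in the current cycle
  int next_value = 0;     // next value process 1 will produce
  std::vector<int> seen;
  for (int cycle = 1; cycle <= total_cycles; ++cycle) {
    // Process 2 samples the register as it was at the start of the cycle.
    const int since_start = cycle - initial_wait - 1;
    if (since_start >= 0 && since_start % consumer_cycles == 0) {
      seen.push_back(wire);
    }
    // Process 1 updates the register at the end of its last filter cycle.
    if (cycle % producer_cycles == 0) {
      wire = next_value++;
    }
  }
  return seen;
}
```

With 4 producer cycles, 4 consumer cycles and 4 initial wait cycles, process 2 sees each value exactly once. With only 2 consumer cycles and no initial wait, it first sees undefined data and then every value twice. With 2 producer cycles and 4 consumer cycles, every other value is overwritten before it is read.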
  • 87. Page 85 Simple handshake protocol is more robust ➢ The flag "a" (ask) indicates that the receiver is ready to read data in the next cycle. ➢ The flag "r" (ready) indicates that data has been written. ➢ Safe communication requires at least two cycles. Many different handshake protocols are feasible. Let's illustrate the concept with a very simple one that uses two handshake lines. The handshake line "a" (ask) is generated by the receiver and indicates that the receiver is ready to read in the next cycle. The handshake line "r" (ready) is generated by the transmitter and indicates that it has written data in the cycle in which the flag is raised. At least two cycles are needed for a reliable communication of a value. Note that this protocol is only suited for synchronous designs in which both processes run on the same clock.
  • 88. Page 86 Simple handshake protocol is more robust [Figure: the two FSMs extended with the protocol operations. Process 1: filt1 with r=0, filt2, filt3, then filt4 with "if (a==1) { write(); r=1 }", looping in filt4 while !a. Process 2: a wait state with a=1 that loops while !r; on r it executes "if (r==1) { read(); op1; a=0 }", followed by op2 with a=1.] The finite state machines extended with the protocol operations (in red) are shown in this picture. Once "a" is set, process 2 waits for the "r" flag to be raised. Next it reads the data, lowers "a", performs its operations, and sets "a" again for the next sequence. Process 1 performs its operations and then waits for flag "a" before it writes its data and raises flag "r". The basic assumption of this protocol is that data that has been written is read in the next cycle.
  • 89. Page 87 … and effectively synchronizes the communication
      Cycle  Process 1            Process 2
      1      r=0, filt1           a=1
      2      r=0, filt2           a=1
      3      r=0, filt3           a=1
      4      r=1, filt4, write()  a=1
      5      r=0, filt1           a=0, read(), op1
      6      r=0, filt2           a=1, op2
      7      r=0, filt3           a=1
      8      r=1, filt4, write()  a=1
      9      r=0, filt1           a=0, read(), op1
      10     r=0, filt2           a=1, op2
      ...    ...                  ...
    The timing diagram shows that the operations of the two processes are automatically synchronized by this protocol.
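The simple ask/ready protocol can also be checked with a small cycle-based model in plain C++. This is a hedged sketch under stated assumptions, not the course's implementation: the names `simulate_handshake` and `HandshakeTrace` are hypothetical, both flags are modeled as registers (a value written in one cycle is visible in the next), and process 1 is assumed to keep "r" low while it waits for "a".

```cpp
#include <cassert>
#include <vector>

struct HandshakeTrace {
  std::vector<int> read_cycles;  // clock cycles in which process 2 read data
  std::vector<int> values;       // the values it read
};

// Process 1 has `producer_states` states and writes in the last one once it
// observes a==1. Process 2 reads (and executes op1) when it observes r==1,
// then executes op2..opN and raises "a" again during its last operation.
HandshakeTrace simulate_handshake(int producer_states, int consumer_ops,
                                  int total_cycles) {
  assert(producer_states >= 2 && consumer_ops >= 2);
  bool a = false, r = false;  // flag values visible in the current cycle
  int data = 0;               // data register visible in the current cycle
  int p = 0;                  // process 1 state (0 = filt1)
  int q = 0;                  // process 2 state (0 = wait/read, k = op(k+1))
  int next_value = 0;
  HandshakeTrace trace;
  for (int cycle = 1; cycle <= total_cycles; ++cycle) {
    bool a_next = true;
    bool r_next = false;
    int data_next = data;
    // Process 1: filter states, then a guarded write in the last state.
    if (p < producer_states - 1) {
      ++p;
    } else if (a) {
      data_next = next_value++;
      r_next = true;
      p = 0;
    }
    // Process 2: wait for r, read, then run the remaining operations.
    if (q == 0) {
      if (r) {
        trace.read_cycles.push_back(cycle);
        trace.values.push_back(data);
        a_next = false;
        q = 1;
      }
    } else {
      const bool last_op = (q == consumer_ops - 1);
      a_next = last_op;  // "a" is raised again only in the last operation
      q = last_op ? 0 : q + 1;
    }
    // Clock edge: registered values become visible in the next cycle.
    a = a_next;
    r = r_next;
    data = data_next;
  }
  return trace;
}
```

Calling `simulate_handshake(4, 2, 10)` yields reads in cycles 5 and 9, as in the timing diagram above. Calling `simulate_handshake(2, 3, 10)` models a receiver that is slower than the transmitter and yields reads in cycles 3 and 7, each value being read exactly once.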
  • 90. Page 88 … also when receiver is slower than transmitter [Figure: process 1 now has two states, filt1 with r=0 and filt2 with "if (a==1) { write(); r=1 }", looping in filt2 while !a. Process 2 has a wait state with a=1 that loops while !r; on r it executes "if (r==1) { read(); op1; a=0 }", followed by op2 and op3, with a=1 set in op3.] When we add a state to process 2 and reduce the number of states in process 1 to two, the receiving process becomes slower than the transmitting one.
  • 91. Page 89 … but introduces then one extra wait cycle at receiver
      Cycle  Process 1            Process 2
      1      r=0, filt1           a=1
      2      r=1, filt2, write()  a=1
      3      r=0, filt1           a=0, read(), op1
      4      r=0, wait            a=0, op2
      5      r=0, wait            a=1, op3
      6      r=1, filt2, write()  a=1
      7      r=0, filt1           a=0, read(), op1
      8      r=0, wait            a=0, op2
      9      r=0, wait            a=1, op3
      10     r=1, filt2, write()  a=1
      ...    ...                  ...
    The extra wait cycle can be avoided by already setting a=1 during op2. Here too, the protocol synchronizes the two processes automatically. However, after "op3" in process 2, an extra clock cycle is introduced automatically. The reason is that process 1 has to observe that "a" is raised before it can write the data and raise "r". The extra cycle can be avoided by raising "a" already during "op2".
  • 92. Page 90 Most general protocol: 4-phase handshake protocol [Figure: two asynchronous FSMs. The requesting side sets Req=1, waits for Ack, gets the data, sets Req=0 and waits for Ack to go low. The responding side starts with Ack=0, waits for Req, executes, puts the data, sets Ack=1, waits for Req to go low and sets Ack=0.] The simple handshake protocol of the previous slides is just one of many possibilities. The most general protocol is the 4-phase handshake protocol, which can synchronize two systems independently of a clock signal. It consists of 4 phases: 1. Initially, both the request (Req) and acknowledgement (Ack) signals are low. 2. The Req signal is raised and the operation is executed. 3. After the operation has been executed, the Ack signal is raised. 4. When the raised Ack signal is detected, the Req signal is lowered. This phase continues until the low Req signal is detected and the Ack signal is lowered in turn. The picture on the slide shows the asynchronous FSMs for the four-phase handshake protocol. In an asynchronous FSM the transitions are not clocked; they happen as soon as the guard condition is valid.
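The four phases can be sketched as two guarded state machines in plain C++. This is an illustrative model only: the class names `Receiver` and `Sender`, the `Link` struct and the `transfer` helper are assumptions made for this example. Following the slide, the requesting side gets the data and the responding side puts it. The schedule argument decides which side is allowed to take a step, which mimics two systems running at unrelated speeds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shared wires between the two sides.
struct Link {
  bool req = false;
  bool ack = false;
  int data = 0;
};

// Requesting side: Req=1, wait for Ack, get data, Req=0, wait for !Ack.
class Receiver {
 public:
  void step(Link& link) {
    if (!waiting_for_ack_) {
      if (!link.ack) {  // previous handshake has fully completed
        link.req = true;
        waiting_for_ack_ = true;
      }
    } else if (link.ack) {  // data is valid while Ack is high
      received_.push_back(link.data);
      link.req = false;
      waiting_for_ack_ = false;
    }
  }
  const std::vector<int>& received() const { return received_; }

 private:
  bool waiting_for_ack_ = false;
  std::vector<int> received_;
};

// Responding side: wait for Req, put data, Ack=1, wait for !Req, Ack=0.
class Sender {
 public:
  void step(Link& link) {
    if (!waiting_for_req_low_) {
      if (link.req) {
        link.data = next_value_++;
        link.ack = true;
        waiting_for_req_low_ = true;
      }
    } else if (!link.req) {
      link.ack = false;
      waiting_for_req_low_ = false;
    }
  }

 private:
  bool waiting_for_req_low_ = false;
  int next_value_ = 0;
};

// Runs both sides until `count` values have arrived. `receiver_turn` is a
// repeating pattern: true lets the receiver step, false lets the sender step.
std::vector<int> transfer(const std::vector<bool>& receiver_turn,
                          std::size_t count) {
  assert(!receiver_turn.empty());
  Link link;
  Receiver rx;
  Sender tx;
  const std::size_t max_steps = 100000;  // guard against one-sided patterns
  for (std::size_t i = 0; i < max_steps && rx.received().size() < count; ++i) {
    if (receiver_turn[i % receiver_turn.size()]) {
      rx.step(link);
    } else {
      tx.step(link);
    }
  }
  return rx.received();
}
```

Whatever the relative speed of the two sides, for example `transfer({true, false}, 5)` or `transfer({false, false, false, false, true}, 5)`, the receiver obtains the values 0 to 4 in order, each exactly once. A step in which no guard holds simply does nothing, which corresponds to an asynchronous FSM waiting in its current state.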
  • 93. Page 91 Multiple variations on these handshake protocols exist ➢ Instead of signal levels, the protocol can be based on signal transitions. ➢ The protocol can be simplified if both systems run on the same clock. ➢ Protocols can be simplified if one knows whether the receiver or the transmitter is fastest. ➢ Synchronization can be performed on the basis of a block:  Set up communication for the first element of a block.  Next, communicate every cycle. ➢ Some protocols are based on typical FIFO signals: full and empty. Besides the 4-phase handshake protocol, many other protocols exist. For example, a protocol can be based on signal transitions rather than signal levels. Handshake protocols can also be simplified when both systems run on the same clock, or when it is known whether the receiver or the transmitter is fastest. The efficiency of the communication can also be improved with block-based handshake protocols. In such a protocol, the communication is set up for the first element of the block. Next, a data element is communicated every cycle. Finally, there is a set of protocols based on the typical FIFO signals full and empty.
  • 94. Page 92 In some cases buffered communication is required [Figure: process1 and process2 connected by the queues Q1 and Q2. Process 1 executes write(Q1), write(Q1) and write(Q2) in steps 1 to 3; process 2 executes read(Q2), read(Q1) and read(Q1) in steps 4 to 6.] Queue size can be determined by monitoring the maximum number of elements in a queue during simulation. Replacing a FIFO by a protocol is only possible if no intermediate storage is needed. This is not always the case. For example, the system on the slide needs storage for at least two data elements on queue 1. In most cases, the required amount of storage can be derived from the maximum number of elements in a queue during functional simulations. Also note that changing the order in which data is produced in process 1 or consumed in process 2 will change the storage requirements. Another option is to integrate the required storage in one of the two processes and to match the production and consumption sequences.
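Monitoring the maximum queue occupancy during a functional simulation can be sketched in plain C++ as follows. The wrapper class `MonitoredFifo` and the function `run_slide_schedule` are hypothetical names introduced for this example (this is not how `sc_fifo` is implemented), and the payload values are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Unbounded FIFO that records its high-water mark.
template <typename T>
class MonitoredFifo {
 public:
  void write(const T& value) {
    items_.push(value);
    if (items_.size() > max_occupancy_) max_occupancy_ = items_.size();
  }
  T read() {
    assert(!items_.empty());  // a real blocking read would wait instead
    T value = items_.front();
    items_.pop();
    return value;
  }
  std::size_t max_occupancy() const { return max_occupancy_; }

 private:
  std::queue<T> items_;
  std::size_t max_occupancy_ = 0;
};

struct QueueSizes {
  std::size_t q1;
  std::size_t q2;
};

// Replays the access order of the slide: process 1 writes Q1, Q1, Q2 and
// process 2 then reads Q2, Q1, Q1.
QueueSizes run_slide_schedule() {
  MonitoredFifo<int> q1;
  MonitoredFifo<int> q2;
  q1.write(10);
  q1.write(11);
  q2.write(20);
  const int first = q2.read();
  const int second = q1.read();
  const int third = q1.read();
  assert(first == 20 && second == 10 && third == 11);  // FIFO order preserved
  (void)first; (void)second; (void)third;
  return {q1.max_occupancy(), q2.max_occupancy()};
}
```

For this access order the recorded maxima are 2 for Q1 and 1 for Q2, which matches the storage requirement stated above. Because process 2 reads Q2 first, both Q1 values must be buffered until Q2 has been written.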
  • 95. Page 93 Queues must be introduced explicitly in hardware [Figure: Process1 and Process2 connected through a FIFO process of size N with its own FSM. Both sides of the FIFO use the wired handshake protocol with the signals r and a.] If intermediate storage is needed, a FIFO must be introduced explicitly in hardware. The FIFO is a module with storage, a finite state machine, and communication protocols towards the producing and consuming processes. The FIFO structure can be defined once and then reused in many designs.
  • 96. Page 94 Several communications can also be multiplexed on a bus [Figure: on the left, Process1 to Process4 with point-to-point connections; on the right, the same four processes connected to a shared bus, each with r and a handshake lines to an arbiter.] Bus and arbiter classes can be reused! Up to now, we have considered point-to-point communication, where each channel in the functional model is mapped to a physical channel in the hardware. However, when this communication structure becomes complicated, it can be advantageous to multiplex multiple communications on a bus structure. Communication with off-chip devices can also benefit from a bus structure because of the limited number of available pins. The bus can be modeled as a set of multiplexers. To decide when a module is allowed to communicate on this bus, an arbiter is needed. The arbiter communicates with the processes through handshake protocols. If we reuse our simple protocol, the arbiter would react to the ask signals from the receiving processes, reserve the bus, and forward the ask signal to the sending process when the bus is free for data transfer. The bus and arbiter are modules that can be designed once and reused in multiple designs.
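The decision logic of such an arbiter can be sketched in plain C++. The slides do not prescribe an arbitration policy, so the round-robin policy below is purely an assumption for illustration, as are the class name `RoundRobinArbiter` and its interface. The sketch only selects which pending ask signal is served next; forwarding the signal and switching the bus multiplexers are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Grants the bus to one of n requesters, starting the search just after the
// requester that was served last (round-robin, an assumed policy).
class RoundRobinArbiter {
 public:
  explicit RoundRobinArbiter(std::size_t n) : n_(n), last_(n - 1) {
    assert(n > 0);
  }

  // `ask[i]` is true when receiver i has raised its ask flag.
  // Returns the index of the granted receiver, or -1 if nobody asks.
  int arbitrate(const std::vector<bool>& ask) {
    assert(ask.size() == n_);
    for (std::size_t k = 1; k <= n_; ++k) {
      const std::size_t candidate = (last_ + k) % n_;
      if (ask[candidate]) {
        last_ = candidate;
        return static_cast<int>(candidate);
      }
    }
    return -1;  // bus stays idle
  }

 private:
  std::size_t n_;
  std::size_t last_;  // index of the most recently granted requester
};
```

With two receivers that both keep asking, successive calls grant the bus to receiver 0, then 1, then 0 again, so neither communication is starved. A fixed-priority policy would be simpler but can starve the lower-priority channel.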
  • 97. Page 95 Communication refinement results in behavioral model ➢ Model that defines the relative ordering of inputs and outputs ➢ A clock signal is used for ordering ➢ Pins are accurate to the final implementation ➢ Internal resources are not mapped on clock cycles (scheduling) and functional units (resource binding) After communication refinement of a functional model, we obtain a behavioral model. A behavioral model defines the functionality and also the relative ordering of inputs and outputs. A clock signal is used to perform this ordering. Also, the pins of a module are identical to those of the final implementation. On the other hand, the internal operations are modeled functionally: they are not mapped on clock cycles and no functional units are allocated. Synthesis tools are increasingly moving up from register transfer level (RTL) synthesis towards behavioral synthesis. In the latter, the synthesis tool autonomously decides on the number and types of functional units and schedules the operations on these functional units.
  • 98. Page 96 Communication refinement ➢ Communication refinement ➢ Communication refinement in SystemC ➢ Exercise 3: communication refinement for the JPEG decoder We now take a look at the support for communication refinement in SystemC.
  • 99. Page 97 In SystemC behavioral models use (clocked) threads ➢ Modeled with thread processes SC_THREAD or with clocked thread processes SC_CTHREAD ➢ Every module has a clock input:  sc_in_clk clk; ➢ The SC_THREAD process is made statically sensitive to a clock edge:  sensitive << clk.pos(); ➢ To separate clock cycles, wait() statements are used. ➢ A synchronous or asynchronous reset signal can be specified:  reset_signal_is(reset, true);  async_reset_signal_is(reset, true); ➢ Simulation must be run for a finite time (or it will not stop!) or halted explicitly. Representing behavioral models in SystemC is straightforward. The processes are represented with (clocked) thread processes (SC_CTHREAD or SC_THREAD). To order the inputs and outputs, every module has a clock input. An SC_THREAD process must be made statically sensitive to this clock. To separate clock cycles, wait() statements are used in the SC_THREAD or SC_CTHREAD process. It is possible to assign a synchronous reset signal to the thread processes. If the reset signal is active at a clock event, the current process is stopped and called again from the start of the function. An asynchronous reset is also supported. Note that, because of the introduction of the clock, we cannot run until the end of activity (the simulation would never stop). Therefore we must run the simulation for a finite time or halt it explicitly.
  • 100. Page 98 Behavioral models communicate via standard signals ➢ All inputs and outputs are standard signals ➢ Define signals with:  sc_signal<T> a; ➢ Predefined ports for sc_signal<T> channels:  sc_in<T> with interface function read() or assignment operator.  sc_out<T> with interface function write() or assignment operator.  sc_inout<T> that combines both interface functions. Standard signals are used to communicate between behavioral processes. A signal can only be written from one process. For the sc_signal<T> channel, three ports are predefined:  sc_in<T> is essentially equivalent to sc_port<sc_signal_in_if<T> >  sc_inout<T> is essentially equivalent to sc_port<sc_signal_inout_if<T> >  sc_out<T> is identical to sc_inout<T> The write() operation on a signal overwrites the present value. The read() operation reads the current value. The assignment operators are also available for signals. Each of these three ports must be bound to exactly one signal.
  • 101. Page 99 Clocks in SystemC ➢ Create clock:  sc_clock clock1("clock_label", period, time_unit, duty_ratio, offset, offset_time_unit, first_value);  sc_clock clock2("clock_label", period, time_unit, duty_ratio);  sc_clock clock3("clock_label", period, time_unit); ➢ Clock binding:  f1.clk(clock1); ➢ Clocks are typically defined in sc_main(). ➢ Example: sc_clock clock1("clock1", 20, SC_NS, 0.5, 2, SC_NS, true); [Figure: waveform with edges at 2, 12, 22, 32 and 42 ns.] Finally, we also need a clock in a behavioral model. SystemC offers a dedicated clock class, for which you can choose the period, duty ratio, initial offset and first value. An example is shown on the slide.
  • 102. Page 100 Example: summing 3 values on an input
      SC_MODULE(sum3) {
        sc_in_clk CLOCK;
        sc_in<bool> RESET;
        sc_in<unsigned> A;
        sc_out<unsigned> D;

        void compute();

        SC_CTOR(sum3) {
          SC_CTHREAD(compute, CLOCK.pos());
          reset_signal_is(RESET, true);
        }
      };

      void sum3::compute() {
        unsigned tmp;
        // reset section
        while (true) {  // main loop
          tmp = A.read();
          wait();  // end first I/O cycle
          tmp += A.read();
          wait();  // end second I/O cycle
          tmp += A.read();
          D.write(tmp);
          wait();  // end third I/O cycle
        }
      }
    The slide shows an example in which three values are read sequentially and summed. The resulting sum is put on the output. The example is modeled with a clocked thread. It could also be implemented with a thread process.
  • 103. Page 101 Gradual communication refinement (1/2) [Figure: at the top, Process1 and Process2 connected by a queue. Below, Process1 and Process2 are wrapped with converters C1 and C2 into Behavioral_process1 and Behavioral_process2; the converters attach to the queues Q1 and Q2 and talk to each other over the clocked r/a handshake protocol.] To replace the queues, a gradual approach is recommended. First, converters (between sc_fifo and the protocol) are introduced between the processes.
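  • 104. Page 102 Gradual communication refinement (2/2) [Figure: at the top, Behavioral_process1 (Process1 with queue Q1 and converter C1) connected over the clocked r/a protocol to Behavioral Process2, which now contains the protocol itself. Below, Behavioral Process1 and Behavioral Process2 both contain the protocol and are connected directly over r and a.] Next, the protocol can be integrated into each process separately. At every step, the correct operation of the system can be validated through simulation.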
  • 105. Page 103 Converter SystemC code
      // FIFO to protocol converter (transmitter side)
      template <class T>
      SC_MODULE(FF2P) {
        sc_fifo_in<T> input;
        sc_out<T> output;
        sc_in<bool> ask;
        sc_out<bool> ready;
        sc_in_clk clk;

        SC_CTOR(FF2P) {
          SC_THREAD(process);
          sensitive << clk.pos();
        }

        void process() {
          T value;
          enum ctrl_state {READINPUT, WRITEOUTPUT};
          ctrl_state state;
          // reset cycle
          ready.write(false);
          state = READINPUT;
          wait();
          while (true) {
            if (state == READINPUT) {
              ready.write(false);
              value = input.read();  // blocking read from the FIFO
              state = WRITEOUTPUT;
            } else {
              if (ask.read() == true) {
                output.write(value);
                ready.write(true);
                state = READINPUT;
              } else {
                ready.write(false);
                state = WRITEOUTPUT;
              }
            }
            wait();
          }
        }
      };

      // Protocol to FIFO converter (receiver side)
      template <class T>
      SC_MODULE(P2FF) {
        sc_fifo_out<T> output;
        sc_in<T> input;
        sc_in<bool> ready;
        sc_out<bool> ask;
        sc_in_clk clk;

        SC_CTOR(P2FF) {
          SC_THREAD(process);
          sensitive << clk.pos();
        }

        void process() {
          T value;
          enum ctrl_state {READINPUT, WRITEOUTPUT};
          ctrl_state state;
          // reset cycle
          ask.write(true);
          state = READINPUT;
          wait();
          while (true) {
            if (state == READINPUT) {
              if (ready.read() == true) {
                value = input.read();
                ask.write(false);
                output.write(value);  // blocking write to the FIFO
                state = WRITEOUTPUT;
              } else {
                ask.write(true);
                state = READINPUT;
              }
            } else {
              ask.write(true);
              state = READINPUT;
            }
            wait();
          }
        }
      };
    The slide shows an example of the converters that translate between an sc_fifo and the simple synchronization protocol, and vice versa.
  • 106. Page 104 Communication refinement ➢ Communication refinement ➢ Communication refinement in SystemC ➢ Exercise 3: communication refinement for the JPEG decoder The exercise is intended to make you familiar with communication refinement. We turn again to the simplified JPEG decoder.
  • 107. Page 105 Exercise 3: communication refinement for the JPEG encoder ➢ Goal: Replace the FIFO between the run-length encoder and decoder by a handshake protocol ➢ You will find:  In /exercises/exercise3/: solution of exercise 2  In /exercises/modules/: JPEG encoder modules {r2b,dct,quantize,zz_enc,rl_enc}.{h,cpp}, JPEG decoder modules {b2r,idct,normalize,zz_dec}.{h,cpp} and test bench modules {src,snk,test}.{h,cpp}  In /exercises/images/: test images  In /exercises/add2systemc: FIFO to protocol conversion functions {FF2P, P2FF}.h ➢ Things to be done:  Introduce a handshake protocol between rl_enc and rl_dec.  Introduce refined versions of rl_dec in jpeg_dec.h and main.cpp.  Simulate and verify correct operation. The goal of this exercise is to replace the FIFO channel between the run-length encoder and decoder by a handshake protocol. To this end, we first add converters between the two blocks to obtain a behavioral model. Next, we integrate the protocol functionality into the run-length decoder process, integrate the resulting behavioral model into the application, simulate the system, and verify correct operation.
  • 108. Page 106 Model Based System Design Class 4: computation refinement Marc Engels e-mail: marc.engels@flandersmake.be In this 4th class we focus on the refinement of the computations, resulting in an RTL description of the circuit. This model should be synthesizable with an RTL synthesis tool.
  • 109. Page 107 Computation refinement in SystemC ➢ Computation refinement ➢ Computation refinement in SystemC ➢ Exercise 4: computation refinement of a JPEG decoder The class consists of three parts. First, we describe the conceptual steps to transform a behavioral description of the circuit into an RTL description. Next, we introduce the constructs that are available in SystemC to support this RTL modeling. Finally, we exercise the new knowledge on the JPEG decoder.
  • 110. Page 108 RTL refinement is the 3rd step in architectural design
      Axis           System functionality               Refinement steps                                       System architecture
      Computations   operations                         resource allocation, scheduling                        operators
      Data           variables, arrays, floating point  memory allocation, address generation, word sizing     memories, fixed point
      Communication  point-to-point queues              bus allocation, introduce arbiters, include protocols  busses, detailed protocol
    Next to fixed point and communication refinement, computation refinement is an important step in architectural design (from functional model towards RTL model). Note that the order in which these three steps are performed is not defined. Refinements along these three axes can even be intermixed. There are also interdependencies between these operations. For instance, if two operations share a common operator, they will use the same word size.
  • 111. Page 109 For synthesis all blocks need to be transformed to RTL ➢ Transformation is a gradual refinement process:  replace a behavioral block with an RTL block  verify by system simulation [Figure: a test bench around a system with stages S1, S2 and S3. The blocks are a mix of functional (func1), behavioral (beh2, beh3, beh4) and RTL (RTL2, RTL3, RTL4) models.] At the start of the computation refinement, the embedded system is modeled with behavioral blocks in which both the data types and the communication are refined. The test bench has not evolved and is still the original functional model. The RTL modeling can be introduced gradually by replacing individual behavioral blocks with RTL descriptions. The correctness of the system can be verified during this process by simulating the combination of functional, behavioral, and RTL models.
  • 112. Page 110 Behavioral model can be represented by an FSM Behavioral models are represented as threads that wait on clock edges to synchronize their inputs and outputs (I/O). As a consequence, they can be represented by a clocked finite state machine (FSM). The slide uses a Moore-type state machine, whose outputs are determined by the state only.
      Process_behavioral {  // SC_CTHREAD
        ask.write(true);
        while (ready.read() == false) { wait(); }
        wait();
        while (true) {
          ask.write(false);
          x = input.read();
          wait();
          d = x * b1;
          y = d * b2;
          output.write(y);
          ask.write(true);
          while (ready.read() == false) { wait(); }
          wait();
        }
      }
    [Figure: the equivalent FSM. An initial state with ask=1 loops while !ready; on ready it moves to a state with ask=0 and x=input, followed by a state with ask=1, d = x * b1, y = d * b2 and output = y, which loops while !ready and returns to the read state on ready.]
  • 113. Page 111 Behavioral to RTL: scheduling of operations in FSM [Figure: on the left, the behavioral FSM with the states (ask=1), (ask=0, x=input) and (ask=1, d = x * b1, y = d * b2, output = y). On the right, the rescheduled FSM with the states (ask=1), (ask=0, x=input, d = x * b1) and (ask=1, y = d * b2, output = y). In both, the transitions are guarded by ready and !ready.] The transformation from behavioral to RTL can conceptually be represented by the scheduling of operations on this FSM. During scheduling, additional states can be introduced. Note also that the scheduling of the operations can have a major impact on the inter-process communication: • Additional states can introduce errors in synchronized communication. • Protocol-based communication is more robust, but the settings of the protocol signals might have to be adapted. Separating operator scheduling from communication refinement is desirable in many design flows but is rarely achieved completely.
  • 114. Page 112 Rescheduled FSM is represented in RTL code [Figure: the rescheduled FSM with the states (ask=1), (ask=0, x=input, d = x * b1) and (ask=1, y = d * b2, output = y), with transitions guarded by ready and !ready.]
      Process_RTL {  // SC_CTHREAD
        ask.write(true);
        while (ready.read() == false) { wait(); }
        wait();
        while (true) {
          ask.write(false);
          x = input.read();
          d = x * b1;
          wait();
          ask.write(true);
          y = d * b2;
          output.write(y);
          while (ready.read() == false) { wait(); }
          wait();
        }
      }
    The resulting FSM can be transformed back into code. The resulting RTL model can be represented with either an SC_METHOD or an SC_CTHREAD. Both can be synthesized into gate-level circuits. For simplicity, we will use SC_CTHREADs.
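The same rescheduled FSM can also be written with an explicit state register, which is closer to what an SC_METHOD-style RTL description looks like. The following plain C++ sketch is only an illustration under stated assumptions: the class name `ScheduledFsm`, the state names and the `step` interface are hypothetical, the coefficients b1 and b2 are passed in as integers, and the delayed update of SystemC signals is not modeled (`step` returns the values driven in the current cycle).

```cpp
#include <cassert>

// Explicit-state model of the rescheduled FSM: one multiplication per state.
class ScheduledFsm {
 public:
  enum class State { Wait, Load, Store };

  struct Outputs {
    bool ask;
    int output;
  };

  ScheduledFsm(int b1, int b2) : b1_(b1), b2_(b2) {}

  // Evaluates one clock cycle with the sampled inputs `ready` and `input`.
  Outputs step(bool ready, int input) {
    switch (state_) {
      case State::Wait:   // ask=1, loop until ready is observed
        ask_ = true;
        if (ready) state_ = State::Load;
        break;
      case State::Load:   // ask=0, x=input, d=x*b1
        ask_ = false;
        d_ = input * b1_;
        state_ = State::Store;
        break;
      case State::Store:  // ask=1, y=d*b2, output=y
        ask_ = true;
        output_ = d_ * b2_;
        state_ = ready ? State::Load : State::Wait;
        break;
    }
    return {ask_, output_};
  }

  State state() const { return state_; }

 private:
  int b1_;
  int b2_;
  State state_ = State::Wait;
  bool ask_ = true;
  int d_ = 0;       // intermediate register introduced by the scheduling
  int output_ = 0;
};
```

With b1 = 2 and b2 = 3, feeding the input value 5 in the Load state produces the output 30 one cycle later, in the Store state. The intermediate value d has to be kept in a register between the two states, which is a direct consequence of spreading the two multiplications over two cycles so that a single multiplier can be shared.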