Low-Energy Hardware Platform for Evaluating Multiprocessor Embedded Systems

1262 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 62, NO. 2, FEBRUARY 2015
A Hardware Platform for Evaluating Low-Energy
Multiprocessor Embedded Systems
Based on COTS Devices
Mohammad Salehi and Alireza Ejlali
Abstract—Embedded systems are usually energy con-
strained. Moreover, in these systems, increased pro-
ductivity and reduced time to market are essential for
product success. To design complex embedded systems
while reducing the development time and cost, there is a
great tendency to use commercial off-the-shelf (“COTS”)
devices. At system level, dynamic voltage and frequency
scaling (DVFS) is one of the most effective techniques for
energy reduction. Nonetheless, many widely used COTS
processors either do not have DVFS or apply DVFS only
to processor cores. In this paper, an easy-to-implement
COTS-based evaluation platform for low-energy embedded
systems is presented. To achieve energy saving, DVFS
is provided for the whole microcontroller (including core,
phase-locked loop, memory, and I/O). In addition, facilities
are provided for experimenting with fault-tolerance tech-
niques. The platform is equipped with energy measurement
and debugging equipment. Physical experiments show that
applying DVFS on the whole microcontroller provides up to
47% and 12% energy saving compared with the sole use
of dynamic power management and applying DVFS only on
the core, respectively. Although the platform is designed for
ARM-based embedded systems, our approach is general
and can be applied to other types of systems.
Index Terms—Embedded systems, energy management,
hardware platform.
I. INTRODUCTION
Embedded systems are ubiquitous, and demand for these
systems is growing steadily. A wide range of embedded
systems are battery operated. Because the batteries of many of these
systems cannot be charged or replaced frequently, these systems are
highly energy constrained [1]–[3].
Therefore, for these systems, low energy consumption has
become one of the major design objectives. Examples include
mobile robots and handheld devices such as personal digital
assistants, cell phones, and portable medical care devices. Fur-
thermore, the complexity of embedded systems is increasing as
the number of parts and the number and types of interactions
among them are increasing [3], [4]. Therefore, embedded system
designers are routinely asked to design
complex embedded systems with several design objectives.
Manuscript received October 27, 2013; revised March 23, 2014 and
June 15, 2014; accepted July 21, 2014. Date of publication August 26,
2014; date of current version January 7, 2015.
The authors are with the Department of Computer Engineering,
Sharif University of Technology, Tehran 11365-11155, Iran (e-mail:
mohammad_salehi@ce.sharif.edu; ejlali@sharif.edu).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIE.2014.2352215
In dealing with today’s highly competitive embedded sys-
tems markets and time-to-market pressure and in order to
deliver correct-the-first-time products with multiple system re-
quirements, the use of commercial off-the-shelf (COTS) de-
vices [3], [5]–[7] is very beneficial in designing embedded
systems. Some vendors offer reconfigurable hardware solu-
tions to accelerate the design process and provide a variety of
programmable logic device (PLD)-based evaluation kits (e.g.,
Xilinx [8] and many others). However, instead of focusing
on embedded systems, these platforms are intended for functionally
testing the SoC or ASIC devices to be produced. Embedded
systems usually consist of a microcontroller that contains a
microprocessor integrated with memory elements and periph-
erals in a single chip [4]–[7]. Reference [5] has reported a
laboratory activity on a microcontroller-based platform. Refer-
ence [25] has presented a prototyping platform for ARM-based
embedded systems. However, these platforms do not provide
facilities to experiment with energy management techniques.
Reference [23] has presented a platform for dynamic voltage
and frequency scaling (DVFS) [11] in an ARM-based proces-
sor. However, this work exploits DVFS only for the processor
(and not for the other parts, e.g., phase-locked loop (PLL),
memory, and I/O).
In this paper, to meet the design requirements of multiob-
jective embedded systems, we propose a hardware platform
for experimenting with energy management techniques (i.e.,
dynamic power management (DPM) [12] and DVFS) (see
Section III) and fault-tolerance techniques (see Section VI).
Compared with previous related works (that proposed plat-
forms for embedded systems), our platform:
1) provides DVFS capability for the microcontrollers, in-
cluding not only the processor cores but also PLL, mem-
ory, and I/O; it should be noted that many existing designs
either do not have DVFS or apply DVFS only to processor
cores [11], [13], [14], [23], whereas our study in this
paper (see Section V) shows that applying DVFS to PLL,
memory, and I/O is quite effective;
2) includes circuitry to accurately and separately measure
energy/power consumption of different parts of the mi-
crocontroller, including the processor core, PLL, mem-
ory, and I/O; this provides the ability to determine the
most energy-consuming part for a given application;
3) is general and based on an ARM-based COTS micro-
controller; hence, it can be used for a wide range of
existing microcontrollers (e.g., [13], [14], and [18]–[20])
and many other COTS devices.
0278-0046 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

SALEHI AND EJLALI: HARDWARE PLATFORM TO EVALUATE EMBEDDED SYSTEMS BASED ON COTS DEVICES 1263
Another advantage of the proposed platform is that it is
suitable for research into energy management techniques in
parallel processing. Since the proposed platform is general and
is capable of implementing various design techniques and since
it has the capability of parallel processing (because of the use
of two ARM7-based and one AVR-based processors that can
operate in parallel), the proposed platform can be useful for
analyzing many design techniques (e.g., [1], [2], [12], and [22]),
which exploit parallelism in energy management.
Furthermore, we made new observations in our experiments
that could provide useful information for embedded system
designers. These are the five observations.
1) The high-to-low voltage scaling delay is greater than the
low-to-high delay (about 45% for the processor core and
PLL and about 110% for memory and I/O).
2) Voltage and frequency scaling is very effective in re-
ducing power consumption not only for the processor
core but also for the other parts of the microcontroller,
including PLL, memory, and I/O.
3) Although PLL, memory, and I/O have less power con-
sumption compared with the processor core, they have
comparable energy consumption to that of the core.
4) Although PLL has a very small contribution in the total
power consumption, as it is always operational, its energy
consumption is comparable with that of the others.
5) Applying DVFS on the whole microcontroller results in a
considerable energy savings compared with the sole use
of DPM or applying DVFS only on the processor core.
The remainder of this paper is organized as follows. In
Section II, the architecture of the proposed hardware platform
is described. The proposed energy management units and techniques
are presented in Section III. In Section IV, the power
measurement, debug, and test units are described. Experimental
results are given in Section V. In Section VI, we explain
the capability of the proposed platform in experimenting with
fault-tolerance techniques. Finally, we conclude this paper and
describe future work in Section VII.
II. HARDWARE PLATFORM DESIGN
ARM7TDMI is the most widely used COTS processor in
contemporary embedded systems because it is a low-cost, high-
performance, and versatile processor [4], [6]. Many vendors
(e.g., [9], [13], and [14]) combine the ARM7TDMI (hereafter
ARM7) processor with internal memory devices and a wide
range of peripherals on a single chip to obtain a microcontroller.
It is noteworthy that the computational power of ARM7 is
quite sufficient for the majority of embedded applications. For
example, ARM7 can easily execute all benchmarks in MiBench
benchmark suite [1], [21]. ARM7 can also execute fairly com-
plex operating systems (e.g., Real-Time Executive for Mul-
tiprocessor Systems (RTEMS) [1], [26] and Keil RTX [27]).
Nevertheless, for highly computation-intensive applications,
the performance of ARM7 might not be adequate. In this case,
it should be noted that our proposed platform is not inherently
dependent on ARM7. Indeed, any processor (e.g., i.MX27 [18]
and PXA270 [20]) whose operating frequency can be changed
and whose supply voltage can vary within an allowed range can be
used similarly in our design.
Fig. 1. ARM7-based microcontroller architecture [9].
A. Architecture Overview
Our design of the ARM7-based platform is founded on a
member of AT91SAM7x series of microcontrollers [9]. The
architecture of the microcontroller series is shown in Fig. 1.
The microcontroller is composed of an ARM7 processor core,
a system controller, memory elements, and peripheral devices.
Most ARM7-based microcontrollers adopt a similar
architecture, e.g., [18]–[20]. As shown in Fig. 1, the
microcontroller consists of Flash, ROM, and SRAM internal
memory devices connected via the memory controller, and a
wide range of peripherals, including universal synchronous/
asynchronous receiver–transmitter (USART), serial peripheral
interface (SPI), analog-to-digital converter (ADC), universal
serial bus (USB), Ethernet medium access control, controller
area network (CAN), two-wire interface (TWI), synchronous
serial controller (SSC), real-time timer (RTT), and pulsewidth-
modulation controller (PWMC). Most I/O lines of the peripher-
als are multiplexed with the parallel I/O (PIO) controller. Each
PIO line may be assigned to a peripheral or used as general-
purpose I/O. These features provide flexibility to designers and
assure effective use of the components.
B. Platform Architecture
The architecture and physical implementation of the
hardware platform are shown in Fig. 2(a) and (b), respectively.
The platform contains two AT91SAM7x256 microcontrollers
connected via a bus. Based on the facilities provided by
AT91SAM7x series, this bus can be easily configured as SPI,
UART, CAN, or a 16-bit parallel bus. AT91SAM7x256 contains
an ARM7TDMI processor with in-circuit emulation (ICE),
debug communication channel support, 64-KB internal SRAM,
and 256-KB internal Flash memory. Two controllable power
supplies are included in the board to provide power to the pe-
ripherals and the processor core of each of the microcontrollers.
The power supplies receive commands from the processors and
control the power applied to each part of the microcontrollers
(see Section III-B). The use of separate supply voltages not only
helps conduct experiments with various DVFS schemes (where
different supply voltages can be applied to each processor
separately) but also can be used to shut off one processor
to switch into a single-processor configuration. We have also
provided the flexibility to users in choosing arbitrary DVFS or
DPM schemes. The platform is also equipped with circuitry
to measure the current drawn by the processor cores, PLLs,
Flash memory devices, and I/O peripherals. From the
measured current and the supply voltage of each part of the
microcontrollers (the voltages are set by the controllable power
supplies and reported to the measurement unit), the power
consumption of each part is obtained. In addition, the execution
time of the applications run by the processors is reported so that
the consumed energy can be computed. The measurement data are sent
to the host computer through the data logging port. Two debugging
ports [RS232 and Joint Test Action Group (JTAG) ports]
provide debug capabilities for each of the microcontrollers.
JTAG is also used for ICE (see Section IV-B) and fault injection
purposes (see Section VI). After designing and evaluating the
target system, the platform can be customized for a specific
application.
Fig. 2. Hardware platform. (a) Block diagram. (b) Implementation.
III. ENERGY MANAGEMENT UNITS
To manage the energy consumption, DVFS [11] and DPM
[12] have been effectively used. DVFS varies the components’
voltage and, hence, frequency based on the system workload
and other run-time factors. DPM selectively turns off the sys-
tem components when they are idle. AT91SAM7x (like many
microcontrollers such as [18]–[20]) only supports DPM (only
controls the processor and peripheral clocks) and cannot exploit
DVFS (does not provide variable supply voltage to its processor
core and peripherals). In the following sections, we first explain
how DPM can be employed (as an existing capability of most
COTS microcontrollers), and then we introduce a methodology
for adding DVFS capability to the microcontrollers that are not
DVFS enabled.
A. DPM
The AT91SAM7x optimizes power consumption by con-
trolling (enabling/disabling or scaling) the clock of pro-
cessor and peripherals. The block diagram of the power
management controller is shown in Fig. 3(b). It uses
the clock outputs [see Fig. 3(a)] to supply clocks to the pro-
cessor, USB, peripherals, and master clock, which is the clock
provided to the memory controller and all the peripherals.
Table I summarizes the power management techniques, which
can be used for different parts of the microcontroller. As shown
in Fig. 3, the master clock can be generated through scaling one
of the clocks provided by the clock generator. A low-frequency
clock can be provided to the whole device by selecting the
slow clock, or the power consumption of the PLL can be saved by
selecting the main clock. The processor power consumption can
be reduced by switching off the processor clock when the processor
enters idle mode while waiting for an interrupt. After a device
reset or on any interrupt, the processor clock is automatically
re-enabled. To reduce the power of each peripheral, the user
can individually enable and disable the peripheral clock by
controlling the master clock on each peripheral through
the peripheral clock controller.
Fig. 3. Power management unit. (a) Clock generator. (b) Power management controller [9].
TABLE I
POWER MANAGEMENT TECHNIQUES IN AT91SAM7X
B. DVFS
DPM usually has only two operational states for system
components, namely, active and idle. The active power consumption
of a clock-enabled component, denoted by $P_{Active}$, is determined
by its operating frequency $f$ and supply voltage $V$ as [1]

$$P_{Active} = I_{Leakage} V + C_{eff} V^2 f \qquad (1)$$

where $I_{Leakage} V$ is the static leakage power, and $C_{eff} V^2 f$ is the
dynamic power consumption ($C_{eff}$ is the effective switched capacitance).
The dynamic power consumption can be efficiently
eliminated by putting the component into the idle state by disabling
the clock [12]. With special hardware support and under
software control, frequency scaling for system components can
be used to exploit idle times for power saving. The active energy
consumed by executing a task with $N$ cycles at frequency $f$ can
be computed as $P_{Active} N / f$. As a result, although frequency
scaling reduces the dynamic power consumption linearly, it
has no effect on the static leakage power consumption. Furthermore,
the static energy consumed for a given computation
increases because the task execution time increases when the
clock frequency is reduced. Hence, reduced energy consumption
cannot be achieved by frequency scaling alone. Frequency
scaling can be highly effective when employed in conjunction
with voltage scaling [1], [11]. Voltage scaling techniques
employ software-controlled adjustable voltage regulators to set
the supply voltage of the processor core and clock-enabled
components. Software-controlled clock generators and voltage
regulators allow the system to use DVFS. The basic idea behind
DVFS techniques is to determine the minimum frequency that
satisfies all timing constraints and then to select the lowest
possible voltage that allows this speed [1], [11].
TABLE II
POWER REQUIREMENTS IN AT91SAM7X
Fig. 4. Power supply setup. (a) Typical power supply. (b) Proposed controllable power supply.
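The argument that frequency scaling alone does not save energy can be checked numerically with a minimal sketch of (1). The leakage current, effective capacitance, and cycle count below are hypothetical placeholders, not measured values. The two operating points (1.8 V at 51 MHz, 1.65 V at 36 MHz) assume that the voltage and frequency levels reported in Section V pair up in ascending order.

```python
def active_power(v, f, i_leak, c_eff):
    """Eq. (1): static leakage power plus dynamic switching power, in watts."""
    return i_leak * v + c_eff * v ** 2 * f


def active_energy(n_cycles, v, f, i_leak, c_eff):
    """Energy of a task with N cycles at frequency f: P_Active * N / f, in joules."""
    return active_power(v, f, i_leak, c_eff) * n_cycles / f


# Hypothetical component parameters (assumptions for illustration only).
I_LEAK = 2e-3    # leakage current in amperes
C_EFF = 0.5e-9   # effective switched capacitance in farads
N = 5_000_000    # task length in cycles

e_full = active_energy(N, 1.80, 51e6, I_LEAK, C_EFF)       # nominal voltage and frequency
e_freq_only = active_energy(N, 1.80, 36e6, I_LEAK, C_EFF)  # lower frequency, same voltage
e_dvfs = active_energy(N, 1.65, 36e6, I_LEAK, C_EFF)       # lower frequency and voltage

print(f"nominal:        {e_full * 1e3:.3f} mJ")
print(f"frequency only: {e_freq_only * 1e3:.3f} mJ")
print(f"DVFS:           {e_dvfs * 1e3:.3f} mJ")
```

Under these assumed parameters, lowering only the frequency leaves the dynamic energy unchanged and increases the leakage energy, whereas lowering the voltage as well reduces the total.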
According to (1) and assuming a linear relationship between
frequency and voltage [1], [11], the combined effects of voltage
and frequency scaling result in decreasing the active power
consumption proportional to $V^3$ and reducing the energy
consumption proportional to $V^2$. Therefore, by scaling both the
voltage and frequency, the energy can be significantly reduced.
However, this achievement does not come for free because a
tradeoff exists between speed and energy consumption [1].
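The cubic and quadratic dependencies can be made explicit with a short derivation. It assumes the linear relationship $f = kV$, where the constant $k$ is introduced here only for illustration, and it treats $I_{Leakage}$ as independent of $V$.

```latex
\begin{align*}
P_{Active} &= I_{Leakage} V + C_{eff} V^{2} f
            = I_{Leakage} V + C_{eff}\, k\, V^{3}, \\
E_{Active} &= P_{Active}\,\frac{N}{f}
            = \left( \frac{I_{Leakage}}{k} + C_{eff} V^{2} \right) N .
\end{align*}
```

Under these assumptions, the dynamic part of the power scales with $V^3$ and the dynamic part of the energy scales with $V^2$, while the leakage energy term does not shrink with $V$.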
The AT91SAM7x microcontrollers have six power supply
pins and a built-in (fixed output) voltage regulator, allowing the
device to support a 3.3-V single-supply mode. Power specifications
of the power supply pins are shown in Table II. Fig. 4(a)
shows the schematic of a typical single-power-supply mode
where the 3.3-V power is supplied via a dc/dc voltage converter
to VFLASH, VIO, and VIN. The input of the built-in voltage
regulator is connected to the 3.3-V voltage source (i.e., the VIN
pin), and its output (i.e., the VOUT pin) supplies a fixed 1.8-V
voltage for the VCORE and VPLL pins. As Table II shows, the
power supply of the USB transceiver, Flash memory, and I/O lines can
range from 3.0 to 3.6 V, and the power supply of the processor core and
PLL can range from 1.65 to 1.95 V. This makes it possible
to vary the supply voltage of the device rather
than using a single fixed voltage.
Fig. 5. Proposed controllable power supply schematic.
To provide voltage scaling capability for this device, the
dc/dc converter and embedded voltage regulator in Fig. 4(a) are
replaced with the controllable power supply in Fig. 4(b) to feed
the power pins with variable voltages. As shown in Fig. 4(b),
a variable supply voltage is provided for the power inputs of the
microcontroller, except the embedded voltage regulator input,
which remains unconnected to disable the internal voltage
regulator. The schematic of the proposed controllable power
supply, which provides a dynamically scalable supply voltage, is shown
in Fig. 5. In this architecture, an adjustable version of a
low-dropout linear voltage regulator (e.g., LM1117) is used. This
regulator can provide an output voltage from 1.25 to 13.8 V
using only two external resistors (i.e., $R_{ref}$ and $R_{adj}$
in Fig. 5). This device maintains a 1.25-V reference voltage $V_{ref}$
between the output $V_{out}$ and the adjust pin. As shown in Fig. 5,
this voltage is applied across the resistor $R_{ref}$ to produce a
constant current that flows through the adjustment resistor $R_{adj}$
and fixes the output voltage $V_{out}$ to the desired level as

$$V_{out} = V_{ref}\left(1 + \frac{R_{adj}}{R_{ref}}\right) + I_{adj} R_{adj} \qquad (2)$$

where $I_{adj}$ is the adjust-pin current of the regulator.
Based on (2), to set Vout to a new voltage level, we need to
change the adjustment resistor Radj. To provide the capability
of dynamically adjusting the resistor, a digital potentiometer
(e.g., AD8403) is used to provide a digitally controlled vari-
able resistor that performs the same adjustment function as
a potentiometer or variable resistor. As we aim at control-
ling the voltage of the four power pins of AT91SAM7x256
[see Fig. 4(a)], a digital potentiometer, which includes four
independent variable resistors, is used. Each resistor can
be set separately by a digital code transferred into the de-
vice. The code is loaded into the device via the standard three-
wire SPI digital interface. The data bits clocked into the device
are decoded to determine the resistor and its value.
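The mapping from a target voltage to a potentiometer code can be sketched by inverting (2). This is only an illustration: the values of R_REF, I_ADJ, and the full-scale resistance R_FULL are hypothetical, and the potentiometer is modeled as an ideal 256-step resistor (wiper resistance and tolerances are ignored). Only the 1.25-V reference and the 256 levels come from the text.

```python
V_REF = 1.25      # V, reference voltage between output and adjust pin (from the text)
R_REF = 1000.0    # ohms, hypothetical reference resistor
I_ADJ = 60e-6     # A, hypothetical adjust-pin current
R_FULL = 1000.0   # ohms, hypothetical full-scale resistance of one potentiometer channel
STEPS = 256       # number of resistor settings (the platform offers 256 voltage levels)


def vout_from_radj(r_adj):
    """Eq. (2): regulator output voltage for a given adjustment resistance."""
    return V_REF * (1 + r_adj / R_REF) + I_ADJ * r_adj


def radj_for_vout(v_out):
    """Eq. (2) solved for R_adj."""
    return (v_out - V_REF) / (V_REF / R_REF + I_ADJ)


def code_for_vout(v_out):
    """Nearest digital code for the requested voltage (ideal potentiometer model)."""
    code = round(radj_for_vout(v_out) / R_FULL * STEPS)
    if not 0 <= code < STEPS:
        raise ValueError("requested voltage is outside the adjustable range")
    return code


def vout_for_code(code):
    """Voltage actually produced once the code is loaded (ideal potentiometer model)."""
    return vout_from_radj(code * R_FULL / STEPS)


for target in (1.65, 1.70, 1.75, 1.80, 1.85, 1.90, 1.95):
    code = code_for_vout(target)
    print(f"{target:.2f} V -> code {code:3d} -> {vout_for_code(code):.4f} V")
```

With these assumed component values, one code step changes the output by roughly 5 mV, so each of the seven core voltage levels can be approximated within a few millivolts. A real design would choose the resistors from the regulator and potentiometer datasheets.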
In summary, to dynamically scale the supply voltage of a
power pin of the microcontroller at run time, a digital code
indicating the resistor and its desired value is loaded by the
microcontroller into the digital potentiometer; after changing
the adjustment resistor, the voltage regulator’s output is scaled
and set to the desired voltage value. Therefore, by the use of
the proposed architecture, at run time, the microcontroller can
dynamically set the voltage of the peripherals and the processor
core power pins. Generally, the proposed technique can be used
to provide scalable voltages for COTS devices whose
supply voltage can vary within a range.
Fig. 6. Executing two tasks. (a) On a single-processor system. (b) On a dual-processor system.
C. Opportunities Offered by Parallel Processing
Since the proposed platform has multiple processing units
(i.e., two ARM7-based and one AVR-based microcontrollers)
and since it has the facilities for energy/power management
(i.e., DVFS and DPM), one advantage of the platform is that
it can be used to research into the possible opportunities for
energy management that may be offered by parallel processing.
To give an insight into this issue, we provide an example
illustrating that, when DVFS is used in executing parallel tasks, a
two-processor system consumes less energy than
a single-processor system. Suppose that the slack time that is
available to execute two tasks T1 and T2 (with N1 and N2 CPU
cycles) is S. Fig. 6 shows how the tasks are executed on a single
processor [see Fig. 6(a)] and on two processors [see Fig. 6(b)].
In Fig. 6, N1/fmax and N2/fmax are respectively the execution
times of T1 and T2 at the maximum frequency fmax. For the
single-processor system [see Fig. 6(a)], the minimum possible
frequency that stretches the two tasks as long as possible and
gives the minimum energy consumption can be calculated as
$$f_{SP} = \frac{N_1/f_{max} + N_2/f_{max}}{S}. \qquad (3)$$
Similarly, for the dual-processor system [see Fig. 6(b)], the
minimum possible frequencies to execute T1 and T2 (i.e.,
fDP,1 and fDP,2, respectively) that give the minimum energy
consumption can be calculated as
$$f_{DP,1} = \frac{N_1/f_{max}}{S}, \qquad f_{DP,2} = \frac{N_2/f_{max}}{S}. \qquad (4)$$
By the use of (1) (that gives the active power consumption
PActive) and considering that the energy consumed by execut-
ing a task with N cycles at frequency f can be computed as
PActiveN/f, the minimum energy consumption of the single-
processor system [see Fig. 6(a)] can be written as (VSP is the
minimum voltage that allows fSP)
$$E_{SP} = \left(I_{Leakage}\frac{V_{SP}}{f_{SP}} + C_{eff} V_{SP}^2\right) N_1 + \left(I_{Leakage}\frac{V_{SP}}{f_{SP}} + C_{eff} V_{SP}^2\right) N_2. \qquad (5)$$

Similarly, the minimum energy consumption of the dual-processor
system can be written as ($V_{DP,1}$ and $V_{DP,2}$ are the
minimum voltages that allow $f_{DP,1}$ and $f_{DP,2}$, respectively)

$$E_{DP} = \left(I_{Leakage}\frac{V_{DP,1}}{f_{DP,1}} + C_{eff} V_{DP,1}^2\right) N_1 + \left(I_{Leakage}\frac{V_{DP,2}}{f_{DP,2}} + C_{eff} V_{DP,2}^2\right) N_2. \qquad (6)$$

From (3) and (4), it follows that $f_{DP,1} < f_{SP}$ and $f_{DP,2} < f_{SP}$
(for $N_1, N_2 \neq 0$). Therefore, the minimum voltages that are
used in the dual-processor system can be less than the minimum
voltage that is used in the single-processor system, i.e.,
$V_{DP,1} < V_{SP}$ and $V_{DP,2} < V_{SP}$. In addition, assuming
an almost linear relationship between the voltage and frequency
[1], [3], [11], we can write $V_{SP}/f_{SP} \approx V_{DP,1}/f_{DP,1} \approx V_{DP,2}/f_{DP,2}$.
Therefore, from (5) and (6), it can be concluded
that $E_{DP} < E_{SP}$. This means that when DVFS is used in
executing parallel tasks, a dual-processor system could provide
more energy saving compared with a single-processor system.
Fig. 7. Power measurement setup.
Fig. 8. Debug and test schematic.
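The comparison between (5) and (6) can be illustrated with a small numerical sketch. It rests on several assumptions that are not taken from the measurements: the frequencies from (3) and (4) are treated as fractions of $f_{max}$, the voltage is taken as strictly proportional to that fraction (ignoring the minimum allowed supply voltage), and the leakage current, capacitance, cycle counts, and slack are hypothetical.

```python
F_MAX = 61e6     # Hz, highest frequency level used in the experiments
V_MAX = 1.95     # V, highest core voltage level used in the experiments
I_LEAK = 2e-3    # A, hypothetical leakage current
C_EFF = 0.5e-9   # F, hypothetical effective switched capacitance


def normalized_frequency(cycles, slack):
    """Eqs. (3)/(4): execution time at f_max divided by the available time S."""
    return sum(cycles) / F_MAX / slack


def task_energy(n_cycles, rho):
    """One term of eqs. (5)/(6), with V and f scaled linearly by the fraction rho."""
    v = rho * V_MAX   # assumed linear voltage-frequency relationship
    f = rho * F_MAX
    return (I_LEAK * v / f + C_EFF * v ** 2) * n_cycles


N1, N2, S = 3_000_000, 2_000_000, 0.1   # hypothetical cycle counts and slack (s)

rho_sp = normalized_frequency([N1, N2], S)   # both tasks share one processor
rho_dp1 = normalized_frequency([N1], S)      # task T1 on its own processor
rho_dp2 = normalized_frequency([N2], S)      # task T2 on its own processor

e_sp = task_energy(N1, rho_sp) + task_energy(N2, rho_sp)
e_dp = task_energy(N1, rho_dp1) + task_energy(N2, rho_dp2)

print(f"single processor: {e_sp * 1e3:.3f} mJ")
print(f"dual processor:   {e_dp * 1e3:.3f} mJ")
```

In this model the leakage terms are identical in both configurations because $V/f$ is constant, so the entire difference comes from the lower voltages in the dynamic terms.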
IV. POWER MEASUREMENT, DEBUG, AND TEST UNITS
A. Power Measurement Unit
To provide power measurement equipment to the platform, a
resistor is placed between each microcontroller power pin and
the power supply line, and the voltage drop across the resistor is
measured. The measured value gives the current drawn by the
power pin. The power measurement setup is shown in Fig. 7.
As the current drawn by the power pins of the microcontroller
is less than 100 mA, the corresponding voltage drop cannot be digitized
directly by the ADC of the microcontrollers; therefore, the voltage
is amplified using an operational amplifier. The amplified value is digitized by a
10-bit ADC, and the data are sent to the host computer.
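On the host side, the measurement chain described above can be turned into current, power, and energy values along the following lines. This is a sketch only: the shunt resistance, amplifier gain, and ADC reference voltage are hypothetical, since the text specifies only the 10-bit resolution.

```python
ADC_BITS = 10     # ADC resolution (from the text)
ADC_VREF = 3.3    # V, hypothetical ADC reference voltage
GAIN = 20.0       # hypothetical operational-amplifier gain
R_SHUNT = 1.0     # ohms, hypothetical sense resistor in series with the power pin


def adc_to_current(code):
    """Convert a raw ADC code to the current (A) drawn through the sense resistor."""
    v_amplified = code / (2 ** ADC_BITS - 1) * ADC_VREF
    v_shunt = v_amplified / GAIN
    return v_shunt / R_SHUNT


def power(code, v_supply):
    """Power (W) of one part: measured current times the reported supply voltage."""
    return v_supply * adc_to_current(code)


def energy(codes, v_supply, sample_period):
    """Energy (J) over equally spaced samples, using a simple rectangular sum."""
    return sum(power(c, v_supply) for c in codes) * sample_period


samples = [310, 305, 298, 312]   # made-up ADC codes for illustration
print(f"current: {adc_to_current(samples[0]) * 1e3:.2f} mA")
print(f"energy:  {energy(samples, 1.8, 1e-3) * 1e6:.2f} uJ")
```

With these assumed values the full-scale reading corresponds to 165 mA, which covers the sub-100-mA range mentioned in the text.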
B. Debug Units
The AT91SAM7x microcontrollers have a number of debug
and test features, shown as a block diagram in Fig. 8. The
UART debug unit provides a two-pin (i.e., TXD and RXD)
UART interface that can be employed for various purposes, e.g.,
debugging, tracing the running application, and uploading an application
into internal SRAM. A general JTAG/ICE (see [9]) port is
employed for commonly used operations, such as loading pro-
gram code, and for standard debugging functions, such as single
stepping through programs. IEEE 1149.1 JTAG Boundary Scan
allows pin-level access to the IEEE 1149.1 JTAG-compliant
devices independent of the device packaging technology and
is commonly used for test purposes. In a test environment
for multiple on-board devices, a number of JTAG-compliant
devices are connected to form a single scan chain, and test
vectors are generated, transferred, and interpreted by a tester.
TABLE III
POWER SUPPLY REQUIREMENTS FOR SOME WIDELY USED MICROCONTROLLERS
Fig. 9. Experimental setup and monitoring. (a) Setup. (b) Voltage of I/O. (c) Voltage of the processor. Coupling: ac.
V. EXPERIMENTAL RESULTS
A survey of some widely used ARM-based microcontrollers
suggests that most of them permit the power supply pins to be
fed by a wide range of voltages, as shown in Table III. This pro-
vides the opportunity of employing the proposed controllable
power supply (see Section III-B) for them to achieve energy
saving. In addition, all of the processors in Table III offer a
number of modes to manage power in the system. These modes
range widely in the level of power savings and the level of
functionality. For instance, the LPC11U6x series [14] provides four
power modes, namely, Sleep, Deep-sleep, Power-down, and
Deep power-down modes, and PXA270 [20] provides Turbo
mode (i.e., low latency operation), Run mode (i.e., normal full-
function mode), Idle and Deep-idle modes (allow stopping and
resuming the CPU clock), Standby mode (all PLLs are dis-
abled), Sleep mode (only keeps I/Os powered), and Deep-sleep
(I/Os are powered down). To the best of our knowledge, most of
the current embedded processors provide power management
only through controlling the clock of the processor core and
peripherals, and only a few of them (e.g., [13] and [14]) provide
variable supply voltages. As discussed in Section III-B,
lowering the clock frequency alone is not effective for energy
saving, and simultaneous frequency and voltage scaling is
required for this purpose.
Fig. 9(a) shows the experimental setup that includes an
oscilloscope (for displaying voltages), a JTAG device (for
programming), and two USB and RS232 connections (for data
transfer). In this platform, we have four different voltages (see
Fig. 4) for: 1) the processor; 2) PLL; 3) I/O peripheral; and
4) memory. These voltages can vary independently of one
another. In the experiments, the
processor and PLL voltages could be any value from the set
{1.65, 1.7, 1.75, 1.8, 1.85, 1.9, 1.95} V, and the I/O and
memory voltages could be any value from the set {3.0, 3.1, 3.2,
3.3, 3.4, 3.5, 3.6} V. Like the works [1], [11], and [23], we
have a set of voltage–frequency pairs to perform DVFS. Each
voltage has a corresponding frequency level, and hence, there
are seven levels, i.e., {36, 40, 45, 51, 55, 58, 61} MHz.
The corresponding frequency for each voltage was empirically
determined by measuring the highest frequency at which the
processor still worked correctly and then subtracting a 5% safety
margin (similar to [23]). It should be noted that this measurement
is carried out only once (by the board development team);
end users simply use the provided set and do not
need to repeat the measurements (although they can do so if
required). Although we provided only seven different voltage
levels, the platform can provide 256 voltage levels. As
an example of how the four voltages can vary independently,
Fig. 9(b) and (c) show the voltages of the I/O and the processor
when switching between 3.2, 3.3, and 3.4 V and between
1.75, 1.8, and 1.85 V, respectively.
Fig. 10. High-to-low and low-to-high voltage scaling delays. (a) I/O. (b) Processor. Coupling: ac.
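The way such a table of voltage–frequency pairs is typically used can be sketched as follows. The pairing of the seven voltages with the seven frequencies in ascending order is an assumption (the text states only that each voltage has a corresponding frequency), and the function names are illustrative rather than part of the platform software.

```python
# Core voltage (V) and frequency (Hz) levels from the text, paired in ascending
# order (assumed pairing).
CORE_LEVELS = [
    (1.65, 36e6), (1.70, 40e6), (1.75, 45e6), (1.80, 51e6),
    (1.85, 55e6), (1.90, 58e6), (1.95, 61e6),
]


def derate(f_highest_working, margin=0.05):
    """Apply the safety margin to the highest frequency that still worked correctly."""
    return f_highest_working * (1 - margin)


def select_level(n_cycles, deadline, levels=CORE_LEVELS):
    """Return the lowest (voltage, frequency) pair that finishes within the deadline.

    Returns None when even the fastest level is too slow.
    """
    for voltage, frequency in levels:   # levels are sorted from slowest to fastest
        if n_cycles / frequency <= deadline:
            return voltage, frequency
    return None


print(select_level(4_200_000, 0.1))    # needs at least 42 MHz
print(select_level(10_000_000, 0.1))   # needs 100 MHz, which no level provides
```

Selecting the slowest feasible pair follows the basic DVFS idea described in Section III-B: first the minimum frequency that meets the timing constraint, then the lowest voltage that supports it.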
We conducted a set of experiments to analyze the voltage
scaling delay in the proposed platform. For example, Fig. 10
shows a timing diagram of voltage scaling between two consec-
utive voltage levels, i.e., 1.75 and 1.8 V for the processor core
and 3.2 and 3.3 V for I/O. In Fig. 10, the high-to-low voltage
scaling delay is 34 and 118 μs, and the low-to-high voltage
scaling delay is 23 and 55 μs for the processor core and I/O,
respectively. In our experiments, we obtained almost the same
result for the other voltage levels. An interesting observation
from these experiments is that the high-to-low voltage scaling
delay is greater than the low-to-high voltage scaling delay (i.e.,
about 45% for the processor core and PLL and about 110%
for memory and I/O). To analyze the power consumption of
different parts of the microcontroller (including the processor
core, PLL, memory, and I/O) when operating at different voltage
levels, we executed a matrix multiplication task on the Keil
RTX operating system [27]. This task multiplies two randomly
generated matrices and sends the result to the host computer via
USB. Based on the power consumption results that are shown
in Fig. 11, for all the parts, lower supply voltage leads to lower
power consumption. In addition, Fig. 11 shows that voltage
scaling is very effective in reducing the power consumption of
both the processor and the other parts of the microcontroller.
Another set of experiments has been performed on the
MiBench benchmarks [21] (as real applications) to determine
the contribution of each part of the microcontroller in the total
power consumption, execution time, and energy consumption.
The results are shown in Fig. 12. In this experiment, a 1.8-V
supply is used for the processor and PLL, and a 3.3-V
supply is used for memory and I/O. As PLL is always operational
during the application execution, it is not included
in Fig. 12(b), and when we calculate energy consumption [in
Fig. 12(c)], the application's execution time is used for PLL.
From Fig. 12, we make two main observations: 1) although the
power consumption of PLL, memory, and I/O is less than that
of the processor, their energy consumption is comparable
with that of the processor; 2) although PLL makes a very small
contribution to the total power, as it is always operational, its
energy consumption is comparable in most cases with that of
the other parts.
Fig. 11. Power consumption of AT91SAM7x. (a) Processor and PLL. (b) Memory and I/O.
Fig. 12. Contribution of parts of AT91SAM7x in: (a) power consumption, (b) execution time, and (c) energy consumption.
To evaluate the effectiveness of applying voltage scaling
on the whole microcontroller, we measured and compared the
energy consumption of the microcontroller when using three
types of energy management techniques.
1) DPM: When there is idle time, the microcontroller
enters the low-power mode provided by the
microcontroller [9], in which the memory is in standby (is not
accessed at all), the processor core is idle (its clock is switched
off), the main clock is 500 Hz, and all peripheral clocks are
deactivated.
2) Core voltage and frequency scaling (CVFS): DVFS is
used only for the processor core, and DPM is used for
the other parts. In this case, the processor frequency is set
to the slowest frequency (and its corresponding voltage)
necessary to finish the application, selected from the set
of available voltage–frequency pairs.
3) Microcontroller voltage and frequency scaling (MVFS):
DVFS is used for the whole microcontroller, including
the processor core, PLL, memory, and I/O.
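The frequency selection rule described for CVFS (the slowest available frequency that still finishes the application in time) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `select_pair`, the cycle count, the deadline, and the seven voltage–frequency pairs in `PAIRS` are all hypothetical and are not the pairs used on the platform.

```python
# Hypothetical (voltage in volts, frequency in hertz) pairs; NOT the platform's
# actual voltage-frequency pairs, which are not listed in this section.
PAIRS = [
    (1.2, 10e6), (1.3, 18e6), (1.4, 24e6), (1.5, 30e6),
    (1.6, 36e6), (1.7, 42e6), (1.8, 48e6),
]


def select_pair(cycles, deadline_s, pairs):
    """Return the slowest pair whose frequency finishes `cycles` by the deadline."""
    feasible = [pair for pair in pairs if cycles / pair[1] <= deadline_s]
    if not feasible:
        raise ValueError("no available frequency meets the deadline")
    return min(feasible, key=lambda pair: pair[1])


# Assumed workload: 20 million cycles that must finish within 1 second.
voltage, frequency = select_pair(20_000_000, 1.0, PAIRS)
print(f"run at {frequency / 1e6:.0f} MHz and {voltage:.1f} V")
```

In this sketch, the 18-MHz pair would miss the assumed deadline, so the 24-MHz pair is chosen. Whether the selected voltage is applied to the core only (CVFS) or to the whole microcontroller (MVFS) is outside the scope of the sketch.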

TABLE IV
ENERGY CONSUMPTION (IN MILLIJOULES) OF DPM, CVFS, AND MVFS
TABLE V
ENERGY CONSUMPTION (IN MILLIJOULES) FOR EXECUTING THE
DUPLICATION TECHNIQUE ON A SINGLE PROCESSOR
OR ON TWO PROCESSORS

In this experiment, we analyzed the MiBench benchmarks,
and the results are shown in Table IV. For the applications
in this experiment, MVFS yields average energy savings of
about 35% and 11% (at least 31% and 10%) compared with
the sole use of DPM and with CVFS, respectively. In this experiment,
we did not fix a single voltage–frequency pair for any
benchmark. Rather, we executed each benchmark with all seven
voltage–frequency pairs, and the average results are reported in
Table IV.
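The average and minimum savings quoted above are relative reductions with respect to a baseline technique. A minimal sketch of that arithmetic is given below; the function names and the energy values are hypothetical placeholders and are not the entries of Table IV.

```python
def saving_percent(baseline_mj, technique_mj):
    """Relative energy saving of a technique over a baseline, in percent."""
    return 100.0 * (baseline_mj - technique_mj) / baseline_mj


def summarize_savings(baseline, technique):
    """Return (average saving, minimum saving) over all benchmarks, in percent."""
    savings = [saving_percent(baseline[name], technique[name]) for name in baseline]
    return sum(savings) / len(savings), min(savings)


# Hypothetical energy values in millijoules; NOT taken from Table IV.
dpm_mj = {"bench_a": 100.0, "bench_b": 200.0}
mvfs_mj = {"bench_a": 60.0, "bench_b": 140.0}

average, minimum = summarize_savings(dpm_mj, mvfs_mj)
print(f"average saving {average:.1f}%, minimum saving {minimum:.1f}%")
```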
We conducted another set of experiments to show how the
proposed platform can be used for parallelism in energy man-
agement. As an example, consider the duplication technique
[2] where each task is executed twice to detect possible errors.
These two executions of each task can be performed on a single
processor in series [see Fig. 6(a)] or on two processors in parallel
[see Fig. 6(b)]. As Table V shows, for this example, parallel
processing on two processors provides an average of 25%
(up to 29%) energy saving compared with implementing
the technique on a single processor (the reason is discussed
in Section III-C). To implement this application, we used the RTX
operating system [27] (other embedded operating systems, e.g.,
RTEMS [26], could also be used with the platform). We then
developed the source code of the application, using the
MailBox feature of RTX [27] for message passing and
synchronization. MailBox can use the common communication
protocols (e.g., SPI, UART, and CAN) that are supported by the
platform (see Section II). For this experiment, we used UART
for MailBox. Finally, we used Keil [27] (the compiler for RTX)
to compile the source code and to load the object files into the
platform through JTAG.
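The control flow of the duplication technique, in series and in parallel, can be sketched on a desktop machine as follows. This is only an analogy under stated assumptions: a Python thread stands in for the second microcontroller, `queue.Queue` stands in for a mailbox carried over UART, and `task` is a hypothetical workload. None of these names belong to the RTX API or to the source code of the experiment.

```python
import queue
import threading


def task(n):
    """Hypothetical deterministic workload whose result can be compared."""
    return sum(i * i for i in range(n))


def run_serial(n):
    """Execute the task twice on one 'processor' and compare the results."""
    first = task(n)
    second = task(n)
    return first, first == second


def run_parallel(n):
    """Execute the two copies concurrently and exchange one result by mailbox."""
    mailbox = queue.Queue()  # stand-in for a mailbox between the two processors

    def second_processor():
        mailbox.put(task(n))  # send the result of the second copy

    worker = threading.Thread(target=second_processor)
    worker.start()
    local = task(n)         # first copy runs meanwhile
    remote = mailbox.get()  # blocks until the second copy has reported
    worker.join()
    return local, local == remote


print(run_serial(10))
print(run_parallel(10))
```

A mismatch between the two results (a `False` second element) would indicate that an error was detected. The sketch shows only the error-detection logic and synchronization; it says nothing about energy, which depends on the voltage and frequency at which each copy is executed.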
VI. EXTENSIONS AND FUTURE WORK
The proposed platform can provide an experimental setup
for different research projects. For example, as a direction for
future work, the platform can be used to experiment with
fault-tolerance techniques by means of the following facilities.
1) The two microcontrollers are connected such that they
can interrupt, restart, and turn on/off each other.
2) Each of the microcontrollers can access the internal parts
of the other via JTAG. This is helpful for implementing fault
detection mechanisms that require comparing parts of one
processor with their counterparts in the other.
3) There are interconnections to transfer data, internal states,
and checkpoints between the microcontrollers.
4) A third smaller microcontroller is placed in the platform
that can be used (for example, as a voter [15]) to imple-
ment fault-tolerance techniques.
These facilities provide the possibility of implementing fault-
tolerance techniques such as standby sparing [1], duplication
[2], and “2 out of 2” hardware redundancy with a voter [15].
The platform can also be used to implement software fault-
tolerance techniques such as result checking [2] and N-version
programming [10]. Such software mechanisms usually require
communication and synchronization between the processors
[28], which is supported by the proposed platform.
The debugging features (see Section IV-B) can also be used
for fault injection purposes [16]. For example, JTAG lets us
arbitrarily change processor registers, flags, and data memory
at run time. This can be used for injecting soft errors that are
caused by transient faults (e.g., single event upset [17]) and
that cause one or more memory bits to change [1], [2], [17].
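The effect of such an injected soft error can be modeled in software as a single bit flip in a memory image. The following minimal sketch assumes a `bytearray` as a stand-in for data memory; the function name `flip_bit` is hypothetical, and the sketch does not use JTAG or any debugger interface.

```python
def flip_bit(memory, byte_index, bit_index):
    """Invert one bit of `memory` in place to emulate a single event upset."""
    if not 0 <= bit_index < 8:
        raise ValueError("bit_index must be between 0 and 7")
    memory[byte_index] ^= 1 << bit_index


# Hypothetical 4-byte memory image used only for illustration.
memory = bytearray([0x0F, 0x00, 0xAA, 0xFF])
golden = bytes(memory)  # fault-free copy kept for comparison

flip_bit(memory, 1, 7)  # inject a fault into bit 7 of byte 1

corrupted = [i for i in range(len(memory)) if memory[i] != golden[i]]
print(corrupted, hex(memory[1]))
```

Comparing the faulty image against a fault-free copy, as done here with `golden`, is one simple way to check whether a detection mechanism under test notices the injected change.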
Another possible extension is to adopt a motherboard–daughterboard
architecture for the board so that, with only slight changes,
it can be used with other microcontrollers.
VII. CONCLUSION
This paper has presented a hardware platform that consists
of two ARM-based microcontrollers, each fed separately by
variable voltages. This platform is very suitable for evaluating
embedded systems with low energy consumption and fault-
tolerance requirements. In this platform, we provide DVFS ca-
pability for the whole microcontroller (including the processor
core, PLL, memory, and I/O). Physical experiments show that
applying DVFS to the whole microcontroller is considerably
more effective at reducing power/energy consumption than
applying DVFS only to the processor core or using the
power-down policies that are currently used by most embedded
processors. In addition, the platform is equipped with accu-
rate energy/power measurement units, debugging ports, and
facilities for evaluating fault-tolerance techniques. Although
the platform is designed for ARM-based microcontrollers, it is
general, and other COTS devices and embedded processors can
be similarly used in the design of the platform.
REFERENCES
[1] A. Ejlali, B. M. Al-Hashimi, and P. Eles, “Low-energy standby-sparing
for hard real-time systems,” IEEE Trans. Comput.-Aided Design Integr.
Circuits Syst., vol. 31, no. 3, pp. 329–342, Mar. 2012.
[2] S. Aminzadeh and A. Ejlali, “A comparative study of system-level energy
management methods for fault-tolerant hard real-time systems,” IEEE
Trans. Comput., vol. 60, no. 9, pp. 1288–1299, Sep. 2011.
[3] A. Malinowski and H. Yu, “Comparison of embedded system design for
industrial applications,” IEEE Trans. Ind. Informat., vol. 7, no. 2, pp. 244–
254, May 2011.
[4] J. Henkel and S. Parameswaran, Designing Embedded Processors: A Low
Power Perspective. Berlin, Germany: Springer-Verlag, 2007.
[5] P. Marti, M. Velasco, J. M. Fuertes, A. Camacho, and G. Buttazzo,
“Design of an embedded control system laboratory experiment,” IEEE
Trans. Ind. Electron., vol. 57, no. 10, pp. 3297–3307, Oct. 2010.
[6] T. Yang, G. Zhang, and X. Hu, “System design of current transformer
accuracy tester based on ARM,” in Proc. 8th IEEE Conf. Ind. Electron.
Appl., Jun. 19–21, 2013, pp. 634–639.
[7] H. Guzman-Miranda, L. Sterpone, M. Violante, M. A. Aguirre, and
M. Gutierrez-Rizo, “Coping with the obsolescence of safety- or mission-
critical embedded systems using FPGAs,” IEEE Trans. Ind. Electron.,
vol. 58, no. 3, pp. 814–821, Mar. 2011.
[8] Virtex-6 FPGA ML605 Evaluation Kit, Xilinx, San Jose, CA, USA, 2012.
[9] ARM-based Flash MCU SAM7x Series, Atmel Corp., San Jose, CA, USA,
Feb. 11, 2014.
[10] R.-T. Wang, “A dependent model for fault tolerant software systems dur-
ing debugging,” IEEE Trans. Rel., vol. 61, no. 2, pp. 504–515, Jun. 2012.
[11] J. Pouwelse, K. Langendoen, and H. Sips, “Dynamic voltage scaling on a
low-power microprocessor,” in Proc. 7th ACM Int. Conf. MobiCom Netw.,
2001, pp. 251–259.
[12] Y. S. Hwang and K. S. Chung, “Dynamic power management technique
for multicore based embedded mobile devices,” IEEE Trans. Ind. Infor-
mat., vol. 9, no. 3, pp. 1601–1612, Aug. 2013.
[13] STM32L15x: Ultra-Low-Power 32-Bit MCU ARM-Based Cortex-M3,
STMicroelectronics, Geneva, Switzerland, Nov. 2013.
[14] LPC11U6x 32-Bit ARM Cortex-M0+ Microcontroller, NXP
Semiconductors, Eindhoven, The Netherlands, Mar. 2014.
[15] M. Idirin, X. Aizpurua, A. Villaro, J. Legarda, and J. Melendez, “Imple-
mentation details and safety analysis of a microcontroller-based SIL-4
software voter,” IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 822–829,
Mar. 2011.
[16] M. Portela-Garcia, C. Lopez-Ongil, M. Garcia-Valderas, and L. Entrena,
“Fault injection in modern microprocessors using on-chip debugging in-
frastructures,” IEEE Trans. Dependable Secure Comput., vol. 8, no. 2,
pp. 308–314, Mar./Apr. 2011.
[17] M. Grosso, H. Guzman-Miranda, and M. A. Aguirre, “Exploiting fault
model correlations to accelerate SEU sensitivity assessment,” IEEE Trans.
Ind. Informat., vol. 9, no. 1, pp. 142–148, Feb. 2013.
[18] i.MX27 and i.MX27L Multimedia Applications Processor, Freescale
Semiconductor Inc., Austin, TX, USA, 2011.
[19] High-Performance, Low-Power System-on-Chip with SDRAM and Digital
Audio, Cirrus Logic, Inc., Austin, TX, USA, 2011.
[20] Marvell PXA270 Processor: Electrical, Mechanical, Thermal Specifica-
tion, Marvell, Santa Clara, CA, USA, 2009.
[21] M. R. Guthaus et al., “MiBench: A free, commercially representative
embedded benchmark suite,” in Proc. IEEE Int. Workshop Workload
Characterization, Dec. 2001, pp. 3–14.
[22] J. Castrillon, R. Leupers, and G. Ascheid, “MAPS: Mapping concurrent
dataflow applications to heterogeneous MPSoCs,” IEEE Trans. Ind. Infor-
mat., vol. 9, no. 1, pp. 527–545, Feb. 2013.
[23] T. Phatrapornnant and M. J. Pont, “Reducing jitter in embedded systems
employing a time-triggered software architecture and dynamic voltage
scaling,” IEEE Trans. Comput., vol. 55, no. 2, pp. 113–124, Feb. 2006.
[24] H. Guo, K.-S. Low, and H.-A. Nguyen, “Optimizing the localization of a
wireless sensor network in real time based on a low-cost microcontroller,”
IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 741–749, Mar. 2011.
[25] R. Wang and S. Yang, “The design of a rapid prototype platform for ARM
based embedded system,” IEEE Trans. Consum. Electron., vol. 50, no. 2,
pp. 746–751, May 2004.
[26] RTEMS Operating System, 2010. [Online]. Available: http://www.
rtems.com
[27] RTX Real-Time Operating System, 2013. [Online]. Available: http://
www.keil.com
[28] Y. Jiang et al., “Bayesian-network-based reliability analysis of PLC
systems,” IEEE Trans. Ind. Electron., vol. 60, no. 11, pp. 5325–5336,
Nov. 2013.
Mohammad Salehi received the M.S. degree in computer engineering
from Sharif University of Technology, Tehran, Iran, in 2010, where he is
currently working toward the Ph.D. degree in computer engineering.
His current research interests include embedded systems, low-power
design, and the tradeoff between fault tolerance and energy efficiency
in real-time systems.
Alireza Ejlali received the Ph.D. degree in computer engineering from
Sharif University of Technology, Tehran, Iran, in 2006.
He is currently an Associate Professor of computer engineering with
Sharif University of Technology, where he is also the Director of the
Computer Architecture Group and the Embedded Systems Research
Laboratory, Department of Computer Engineering. His current research
interests include low-power design, real-time embedded systems, and
fault-tolerant embedded systems.
