SoC System Manager white paper delivered for the IP-SOC Conference in Grenoble, France (November 2010).
One of the key challenges associated with designing SoC system management schemes stems from the growing number of programmable devices on-chip. Programmable devices exponentially increase the number of combinations of software operations that drive hardware state changes in real time. This in turn complicates the system level testing needed to achieve reasonable test coverage. Optimizing the SoC design for a single operating system provides little relief, because the diversity of applications running on the SoC continues to multiply the testing complexities at the system level.
This paper will discuss design considerations and compare and contrast three system management architectures. The first is ad hoc system management, which consists of combinations of hardware and software elements that serve a dual purpose: normal operation and system management. The second includes system management as part of the on-chip interconnect implementation. The third architecture introduces a control plane approach for system management which complements the data centric global interconnect.
Optimizing System Management in the Platform SoC Era
Howard Pakosh, ChipStart and Phil Casini, Advance Tech Marketing
November 2010
Introduction
Consumer focused SoCs have evolved into platform architectures that are now being driven by requirements from operating systems such as Android, iPhone, Linux, and Windows and the thousands of applications they support. Over time, more of the system is moving into silicon. As a result, system management functions have moved into the SoC. Traditional feature based regression testing at the silicon level must now be increasingly complemented with complex system level testing in order to maintain a high level of system coverage across SoC road maps.

Balancing price-performance-power and high system level test coverage therefore creates complex system management design challenges that affect both hardware and software operation. System management must now be considered a central feature and responsibility of the SoC architecture, not just a tactical design consideration for the development of each individual SoC. System management should provide adequate synchronization of hardware state changes driven by software, maintain reasonable time to market, and maximize system test coverage and support.

The remainder of this paper will discuss design considerations and compare and contrast three system management architectures. The first is ad hoc system management, which consists of combinations of hardware and software elements that serve a dual purpose: normal operation and system management. The second includes system management as part of the on-chip interconnect implementation. The third architecture introduces a control plane approach for system management which complements the data centric global interconnect.

Finally, the paper will discuss the growing importance of integrated subsystem design and IP for SoCs and how system level partitioning will play a growing role in achieving efficient system management.

System management design considerations

One of the key challenges associated with designing SoC system management schemes stems from the growing number of programmable devices on-chip. Programmable devices exponentially increase the number of combinations of software operations that drive hardware state changes in real time. This in turn complicates the system level testing needed to achieve reasonable test coverage. Optimizing the SoC design for a single operating system provides little relief, because the diversity of applications running on the SoC continues to multiply the testing complexities at the system level.
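
As a rough illustration of this growth, the sketch below counts joint hardware states under a deliberately simple assumption: every programmable device can independently sit in one of a small, fixed set of states. The state names and both helper functions are hypothetical and are not taken from any particular SoC; real devices have many more states and dependencies between them.

```python
from itertools import product

# Hypothetical per-device states, used only for illustration.
DEVICE_STATES = ("off", "retention", "idle", "active")


def joint_state_count(num_devices: int, states_per_device: int = len(DEVICE_STATES)) -> int:
    """Number of joint states when every device changes state independently."""
    return states_per_device ** num_devices


def enumerate_joint_states(num_devices: int):
    """Iterate over every joint state as a tuple with one entry per device."""
    return product(DEVICE_STATES, repeat=num_devices)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} devices -> {joint_state_count(n):,} joint states")
```

Even with only four states per device, each added device multiplies the number of joint states a system level test suite would have to consider, which is the effect described above.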

System level testing via traditional silicon level functional and data path regressions must now be augmented by system functional test suites that include the programmable elements and their impact on hardware state changes. Each programmable core can be isolated and tested to achieve a high level of code coverage, and each execution path through the different core combinations can be tested, but the combinations of hardware state changes they require as a result of application behavior make it almost impossible to achieve adequate system level coverage solely from testing the cores and the buses in isolation or even in pseudo random combinations.

It is at this point that compromises are often made in the SoC design. How much risk is affordable when trading off the cost and time to build these complex system level regression suites against the actual test coverage achieved? As volumes grow, the answer is that risk must be mitigated, and therefore it becomes essential to minimize these tradeoffs.

This paper challenges the increasing “tax” on project costs required to balance adequate system level test coverage and risk under current system management architecture assumptions.

Specifically, instead of continuing to grow regression suites and make risk choices based on the assumption that the associations between the levels of hardware and system testing are tightly coupled, abstraction layers can be inserted into the architecture to decouple the hardware, operating system, and application support functions. Furthermore, each of these components can be tested through independent elements introduced into the SoC architecture.

In fact, this trend has already begun. The growing use of decoupled global interconnect structures, such as those that employ OCP or similar features, provides a proven example of how to ease chip architecture design as it evolves from single core to multicore or multi-layer. By “abstracting” the data plane, and allowing the associations between the IP cores to become linked through the independent global interconnect structure, system performance at the hardware level becomes more predictable and tunable (CPU to off chip memory, for example). This predictability affords opportunities to streamline the design process because these loosely coupled associations are less affected by specific design changes. This leads to more rapid timing closure even though the complexity of the data plane has grown significantly.

Similar abstraction techniques can be applied to system management. The software and hardware layers, the system management, and the functional operation of the SoC can be decoupled, making it easier to test each component of the system level architecture while considering the system level driven hardware state changes. This results in a system level design which is more easily understood and has better test coverage. This approach also abstracts the system management operational complexities between hardware and software even as the number of applications grows.

The next section of the paper will discuss three potential methods of abstraction that lead to varied degrees of optimization of system management.

System Management Scheme Comparisons

Given that the objective is to reduce overall system management complexity, there are three baseline characteristics by which system management schemes should be benchmarked:

1. How well does the approach achieve independence between the silicon, operating system, and application layers?
2. How flexible is the approach in adapting to each derivative design in an SoC road map?
3. How much test coverage does the resultant system management scheme achieve for the SoC architecture?

By applying these benchmark criteria, three methods can be evaluated.

Method 1: Using a single operating system hosted on a “master” CPU. This has been a popular approach to perform system management because silicon elements already required for real time operation also execute system management functions.

[Figure: block diagram for method 1 with the labels Host CPU, IP Core, IP Core, and I/O.]

When SoC complexities are relatively low, this scheme is very efficient: it requires no extra silicon and only some extra software development, which is very containable.

However, the complexity growth associated with multicore SoCs for consumer designs today has weakened the effectiveness of this approach. As system tasks become distributed, that is, more interdependent as more cores are added to the SoC, the visibility and control of any one core over any of the others is reduced with each new element added. Visibility and control become more dependent on the global interconnect as well as the cores, adding even more complexity to the execution of control functions. The global interconnect must be included in system testing in this case because it controls access to external memory, a key element in system operations.

If the master CPU can no longer manage and verify the hardware state changes of the other core elements, the increasing number of possible states results in unpredictable coverage and the methodology no longer has value. Extending the scheme to add system test then does not return meaningful dividends on the potentially massive investment of developing the tests and verification infrastructure.

Applying the criteria to this method for today’s platform SoCs:

1. This approach fundamentally breaks down for multicore SoCs because it will not adequately allow the economical construction of operating system and application level system test layers.
2. This criterion is considered inconsequential given that the method failed the first criterion.
3. This approach will yield extremely low system test coverage, and therefore its usefulness is directly dependent on the complexity of the SoC.

Method 2: Introducing global interconnect structures and additional logic to support pseudo-control plane system management functions. This approach is an extension of method 1 because often the host CPU continues to act as the system management master. Side band signaling, either contained in the interconnect or designed separately, is used for the control functions.

[Figure: block diagram for method 2 with the labels Host CPU, four IP Cores, and I/O.]

Mixing data plane and control functions introduces abstraction levels that aid in achieving higher system test coverage, as long as the SoC does not drive the interconnect requirements to become so complex that the control functions become a small and lower priority part of the overall mix of functions. When this occurs, the control tasks are executed sub-optimally: priority choices between functional operations and system management tasks introduce delays because of complex arbitration sequences and delayed communication through blocked hierarchical buses.

Applying the criteria to this method for today’s SoCs:

1. This approach introduces levels of abstraction which make the approach feasible for some multicore SoCs.
2. However, the approach also has a ceiling of usefulness, which is normally reached when extra logic is required to manage “special” cases for each of the derivatives in the SoC road map and the inefficiencies tolerated to minimize time to market mount. One area where this occurs is when the system management master, usually the host CPU, requests that another core power down. Inefficiencies sometimes occur when complex arbitration schemes and blocked requests delay the actual powering down of the core. These delays can often be measured in thousands of cycles, during which power is consumed for no useful system function and is therefore wasted.
3. As a result of the ceiling in the benefits of the approach, overall coverage is directly dependent on the complexity of the SoC, and as such the approach is useful only within a range of SoC complexity.

Method 3: Introducing a control plane that complements a data plane global interconnect.
This approach differs from the first two methods because it does not extend the traditional host CPU system master approach. Rather, it introduces a separate control plane and an independent system controller to perform system management tasks.

[Figure: block diagram for method 3 with the labels CPU, DSP, Media Engine, System Controller, Global Interconnect, Low Speed I/O, High Speed I/O, DRAM Controller, and Control Plane.]

An independent control plane essentially abstracts the system management tasks from any one entity. As such, it can be controlled by any or all SoC elements as required, and therefore offers multiple layers of abstraction. System testing can be developed by software, hardware, verification, and system engineers and applied using a common framework with equal effectiveness.

This approach is also advantageous because it separates targeted control tasks, ideally executed with low latency, from longer, more complex, and often performance sensitive data plane tasks. This separation is often necessary when complexity is high, because traditional approaches reach the ceiling of effectiveness discussed under method 2.

Applying the criteria to this method for today’s SoCs:

1. This approach creates maximum levels of abstraction for system management but introduces control plane functionality.
2. This approach introduces high levels of flexibility, as both control and data plane functions can be tuned for each SoC derivative without changing the base architecture.
3. This approach also maximizes the achievable coverage because any source can direct the system management, and as such operations (applications) can be isolated and tested within the approach without compromising overall coverage.

Summary:

While method 3 introduces new control plane functionality, it also enables SoCs of virtually any complexity to be tested and operated with maximum efficiency using the same approach. As such, it is best suited for road maps that contain a wide variety of complexity or when extreme flexibility is required for the SoC architecture. The ability to direct the system controller using any SoC core is especially noteworthy because it allows multiple applications to directly control the hardware states in real time when needed, without the overhead of channeling their requests through other entities, thus avoiding inter-function dependencies, complexities, and delays.

The Impact of SoC Subsystems on System Management

The basic theme in achieving better system management is successful partitioning in order to reach adequate levels of system test coverage. This is why method 3 was chosen as the most effective for today’s system management needs.
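
The delay argument behind this choice can be illustrated with a back-of-the-envelope model. The sketch below is illustrative only: the clock frequency, idle power, and cycle counts are hypothetical placeholders, not measurements from this paper or from any device. It estimates the energy a core consumes while its power-down request is still in flight, once for a request delayed by arbitration on a shared interconnect (as under method 2) and once for a request carried by a dedicated control plane (as under method 3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerDownPath:
    """A route that a power-down request takes to reach the target core."""
    name: str
    delay_cycles: int  # assumed cycles between the request and the actual power-down


def wasted_energy_joules(path: PowerDownPath, clock_hz: float, idle_power_watts: float) -> float:
    """Energy consumed by the waiting core while the request is still in flight."""
    delay_seconds = path.delay_cycles / clock_hz
    return idle_power_watts * delay_seconds


# Every number below is an assumption chosen only to show the calculation.
CLOCK_HZ = 200e6         # 200 MHz clock
IDLE_POWER_WATTS = 0.05  # 50 mW drawn by the core while it waits

shared = PowerDownPath("shared interconnect with arbitration", delay_cycles=4000)
dedicated = PowerDownPath("dedicated control plane", delay_cycles=20)

if __name__ == "__main__":
    for path in (shared, dedicated):
        energy = wasted_energy_joules(path, CLOCK_HZ, IDLE_POWER_WATTS)
        print(f"{path.name}: {path.delay_cycles} cycles, {energy * 1e6:.4f} uJ per request")
```

In this model the wasted energy scales linearly with the delay, so the ratio between the two paths is simply the ratio of their cycle counts. The absolute figures mean nothing outside the assumed parameters.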

It stands to reason, then, that the use of subsystems further abstracts the system management tasks. However, creating systems within systems also introduces hierarchies of complexity and, as such, further renders traditional methods of system management useless.

The growing use of subsystems over the next generations of SoC design will therefore accelerate the adoption of control plane based system management as the preferred architecture. Hierarchical levels of complexity can then be absorbed into the system management architecture while maintaining a common architecture that provides flexibility and scalability and minimizes the risks and costs of expensive architecture redesigns, which will accelerate as system requirements continue to become more complex.
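
One way to picture how a control plane might absorb such hierarchy is sketched below. This is a minimal model under stated assumptions, not the architecture of any product: the `Controller` class, its methods, the path syntax, and the state names are all hypothetical. Each controller manages its own blocks and delegates requests addressed to a nested subsystem, and any requester may issue a request.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Controller:
    """Hypothetical control plane node managing blocks and nested subsystems."""
    name: str
    block_states: Dict[str, str] = field(default_factory=dict)
    subsystems: Dict[str, "Controller"] = field(default_factory=dict)
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_block(self, block: str, state: str = "active") -> None:
        """Register a block managed directly by this controller."""
        self.block_states[block] = state

    def add_subsystem(self, child: "Controller") -> None:
        """Attach a subsystem controller one level down the hierarchy."""
        self.subsystems[child.name] = child

    def request(self, requester: str, path: str, state: str) -> None:
        """Apply a state change; path is 'block' or 'subsystem/.../block'."""
        head, _, rest = path.partition("/")
        if rest:
            # Delegate to the nested subsystem controller named by the first segment.
            self.subsystems[head].request(requester, rest, state)
            return
        if head not in self.block_states:
            raise KeyError(f"{self.name} does not manage block {head!r}")
        self.block_states[head] = state
        self.log.append((requester, head, state))  # record who asked for what


if __name__ == "__main__":
    soc = Controller("soc")
    soc.add_block("dsp")
    video = Controller("video")
    video.add_block("decoder")
    soc.add_subsystem(video)

    soc.request("cpu", "video/decoder", "off")  # a CPU task powers down a subsystem block
    soc.request("dsp", "dsp", "idle")           # another core issues its own request
    print(soc.block_states, video.block_states)
```

Because every level exposes the same `request` interface in this sketch, adding a subsystem does not change how requesters interact with the top-level controller, which reflects the common-architecture goal described above.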