SOC: Processors Used
Mr. A. B. Shinde
Assistant Professor,
Electronics Engineering,
PVPIT, Budhgaon, Sangli
shindesir.pvp@gmail.com
Unit-III Contents
• Unit-III: PROCESSORS
• Introduction,
• Processor Selection for SOC,
• Basic Concepts in Processor
Architecture,
• Study of IBM’s PowerPC,
• Study of Picoblaze processor,
• Study of Microblaze processor
2
Datasheets: PowerPC processor
Picoblaze processor
Microblaze processor
Introduction
• Processors come in many
types and with many
intended uses.
• Much attention is focused
on high-performance
processors used in servers
and workstations.
• The figure shows the processor
production profile by annual
production count.
3
Worldwide production of
microprocessors and controllers
Introduction
• Market growth shows that
the demand for SOC and
larger microcontrollers is
growing at almost three
times the rate of microprocessor
units.
• In SOC type applications,
the processor itself is a small
component occupying just a
few percent of the die.
• SOC designs often use
many different types of
processors suiting the
application.
4
Annual growth in demand for
microprocessors and controllers
Processor Selection For SOC
• Overview:
• For SOC designs, the selection of the processor is the most obvious
task and the most restricted.
• The processor must run specific system software, so at least a
core processor (usually a general-purpose processor (GPP)) must be
selected for this function.
• In computation-limited applications, the system includes a processor
configured and parameterized to meet the requirements.
5
Processor Selection For SOC
• Overview:
• In some cases, it may be possible to merge these processors, but
that is usually an optimization consideration.
• Memory and interconnect components are considered as simple
delay elements in calculating processor performance.
• These are referred to here as idealized components.
6
Processor Selection For SOC
• The figure shows the processor model used in the initial design process.
7
Processors in the SOC model
Processor Selection For SOC
• The process of selection is
different in the
compute-limited case,
as there can be
a real-time requirement
that must be met by one of
the selected processors.
• The processor selection
and parameterization
should result in an initial
SOC design that appears
to fully satisfy all functional
and performance
requirements set out in the
specifications.
8
Process of processor core selection
Processor Selection For SOC
• Soft Processors:
• A soft processor is an Intellectual Property (IP) core that is
implemented using the logic primitives of the FPGA.
• Being soft, it has a high degree of flexibility and configurability.
• A soft processor is a microprocessor core that can be entirely
implemented using logic synthesis.
• It can be implemented on different semiconductor devices
(e.g., ASIC, FPGA, CPLD).
9
Processor Selection For SOC
• Soft Processors:
• Most systems use a single soft processor. However, a few
designers place many soft cores on one FPGA.
• A sufficiently large FPGA can hold two or more soft
microprocessors, resulting in a multi-core processor.
• The number of soft processors on a single FPGA is limited only by
the size of the FPGA.
10
Processor Selection For SOC
• Soft Processors:
• The term “soft core” refers to an instruction processor design in
bitstream format that can be used to program an FPGA device.
• The 4 main reasons for using such designs, despite their large
area-power-time cost, are
1. Cost reduction in terms of system-level integration,
2. Design reuse in cases where multiple designs are really just
variations on one,
3. Creating an exact fit for a microcontroller/peripheral combination,
and
4. Providing future protection against discontinued microcontroller
variants.
11
Processor Selection For SOC
• rbe: register bit equivalent
• The register bit equivalent (rbe) is the unit of area measurement.
• It is defined to be a six-transistor register cell.
• This is significantly more than six times the area of a single
transistor, since it includes larger transistors, their interconnections,
and necessary inter-bit isolating spaces.
• Example:
• 1 register bit: 1.0 rbe
• 1 static RAM bit in an on-chip cache: 0.6 rbe
• 1 DRAM bit: 0.1 rbe
• Xilinx FPGA (Virtex-4):
• A slice (2 LUTs + 2 FFs + MUX): 700 rbe
• A configurable logic block (4 slices): 2800 rbe
• An 18-Kb block RAM: 12,600 rbe
13
Processor Selection For SOC
• Processor Core Selection (General Core Path):
• Assume that an initial design has a performance of 1 using 100K rbe
of area, and we would like to have additional speed and
functionality.
• So we double the performance (halve the cycle time T of the processor).
• This increases the area to 400K rbe and the power by a factor of 8.
• Each rbe now dissipates twice the power it did before.
• Doubling the performance (instruction execution rate) doubles the
number of cache misses per unit time.
14
Processor Selection For SOC
• Processor Core Selection (General Core Path):
• Cache misses significantly reduce the realized performance; to
recover this performance, we now need to increase the cache size.
• As a general rule, to halve the miss rate we need to double the cache
size.
• If the initial cache size was also 100K rbe, then the new design has a
total area of 600K rbe (400K rbe of processor plus 200K rbe of cache)
and probably dissipates about 10 times the power of the initial design.
• The faster processor-cache combination may provide important
functionality, such as additional security checking or input/output (I/O)
capability.
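The numbers in this example can be reproduced with a short sketch. It assumes, as the figures on the slide imply, that area grows with the square of the speedup and power with its cube, and that the cache must grow in proportion to the speedup. The function and key names are illustrative only.

```python
def scale_general_core(area_rbe, cache_rbe, speedup):
    """Scale a processor + cache design when cycle time T becomes T / speedup.

    Assumed rules (inferred from the slide's example, not a general law):
      processor area  ~ speedup ** 2
      processor power ~ speedup ** 3
      cache size      ~ speedup (halving the miss rate doubles the cache)
    """
    new_area = area_rbe * speedup ** 2
    power_factor = speedup ** 3
    # Power per rbe relative to the initial design.
    power_per_rbe = power_factor / (new_area / area_rbe)
    new_cache = cache_rbe * speedup
    return {
        "processor_area": new_area,
        "power_factor": power_factor,
        "power_per_rbe": power_per_rbe,
        "cache_area": new_cache,
        "total_area": new_area + new_cache,
    }


result = scale_general_core(area_rbe=100_000, cache_rbe=100_000, speedup=2)
print(result)
```

With a speedup of 2 this gives a 400K rbe processor, a 200K rbe cache, and 600K rbe in total, matching the bullets above.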
15
Processor Selection For SOC
• Processor Core Selection (Compute Core Path):
• Consider some trade-offs for the compute-limited path.
• Suppose the application is generally parallelizable, and we have
several different design approaches.
• One is a 10-stage pipelined vector processor; the other is multiple
simpler processors.
• The application has performance of 1 with the vector processor
(area is 300K rbe) and half of that performance with a single simpler
processor (area is 100K rbe).
• In order to satisfy the real-time compute requirements, we need to
increase the performance to 1.5.
16
Processor Selection For SOC
• Processor Core Selection (Compute Core Path):
• Now we must evaluate the various ways of achieving the target
performance.
• Approach 1 is to increase the pipeline depth and double the number
of vector pipelines; this satisfies the performance target.
• This increases the area to 600K rbe and doubles the power, while the
clock rate remains unchanged.
17
Processor Selection For SOC
• Processor Core Selection (Compute Core Path):
• Now we must evaluate the various ways of achieving the target
performance.
• Approach 2 is to use an “array” of simpler interconnected
processors.
• In order to achieve the target performance, we need to have at least
four processors: three for the basic target and one to account for
the overhead.
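A minimal sketch of the arithmetic behind the two approaches, using only the figures given on these slides (vector processor: 300K rbe, doubled to 600K rbe; simple processor: performance 0.5 and 100K rbe each). The single extra processor for overhead is taken from the slide; the function name is illustrative.

```python
import math


def simple_processors_needed(target_perf, perf_each, overhead_processors=1):
    """Processors needed to reach target_perf, plus extra ones for overhead."""
    return math.ceil(target_perf / perf_each) + overhead_processors


# Approach 1: deeper pipeline and twice the vector pipelines.
vector_area = 2 * 300_000

# Approach 2: array of simple processors.
count = simple_processors_needed(target_perf=1.5, perf_each=0.5)
array_area = count * 100_000

print(count, array_area, vector_area)
```

Under these assumptions the array needs four processors and 400K rbe, compared with 600K rbe for the enlarged vector processor. Interconnect area is not counted here.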
18
Basic Concepts In Processor Architecture
• Before studying the basic concepts in processor architecture, we first
look at how a GPP is designed.
• For example: EC-1 Microprocessor
19
Steps for Designing CPU
• Design of the instruction set (IS)
– Define the instruction set and the number of instructions
– Define each instruction
– Instruction encoding (opcode)
• Design of the data path
– Define the number of functional units required for the IS
– Decide the number of registers needed
– Use separate registers or a register file
– Define flow control registers, status registers, etc.
– Connect the functional units and registers to implement the IS
• Design of the control unit
– Program Counter (PC)
– Instruction Register (IR)
• Instruction cycle
– Step 1 fetches an instruction
– Step 2 decodes the instruction
– Step 3 executes the instruction
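The instruction cycle can be sketched as a small interpreter loop. The three-instruction set used here (LDI, ADD, HALT), the four 8-bit registers, and the tuple encoding are hypothetical choices for illustration; they are not the EC-1 instruction set.

```python
def run(program, max_steps=100):
    """Run a toy program through a fetch / decode / execute loop."""
    regs = [0] * 4      # small register file
    pc = 0              # Program Counter
    for _ in range(max_steps):
        ir = program[pc]            # Step 1: fetch into the Instruction Register
        pc += 1
        opcode, a, b = ir           # Step 2: decode the fields
        if opcode == "LDI":         # Step 3: execute
            regs[a] = b & 0xFF      # load immediate b into register a
        elif opcode == "ADD":
            regs[a] = (regs[a] + regs[b]) & 0xFF
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return regs


program = [("LDI", 0, 5), ("LDI", 1, 7), ("ADD", 0, 1), ("HALT", 0, 0)]
print(run(program))
```

The PC and IR play the roles listed under the control unit; the `if` chain stands in for the decoder and the data path.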
20
Basic Concepts In Processor Architecture
• The processor architecture consists of the instruction set of the
processor.
• Although the instruction set implies many implementation
(microarchitecture) details, the implementation itself is the synthesis
of the physical device limitations with area-time-power trade-offs to
optimize specified user requirements.
25
Basic Concepts In Processor Architecture
• Instruction Set:
• The instruction set for most processors is based upon a register set
to hold operands and addresses.
• The register set size can vary from 8 to 64 words or more,
where each word consists of 32 to 64 bits.
• An additional set of floating-point registers (32 to 128 bits) can also
be used.
• A typical instruction set specifies a program status word, which
consists of various types of control status information, including
condition codes (CCs) set by the instruction.
26
Basic Concepts In Processor Architecture
• Instruction Set:
• Common instruction sets can be classified into two basic types:
– load-store (L/S) architecture and
– register-memory (R/M) architecture.
27
Basic Concepts In Processor Architecture
• Instruction Set:
• The L/S class of instruction sets includes the RISC microprocessors.
• Arguments must be in registers before execution.
• An ALU instruction has both source operands and result specified as
registers.
• The advantages of the L/S architecture are:
– regularity of execution and
– ease of instruction decode.
28
Basic Concepts In Processor Architecture
• Instruction Set:
• The R/M architectures include instructions that operate on operands in
registers or with one of the operands in memory.
• In the R/M architecture, an ADD instruction might sum a register value
and a value contained in memory, with the result going to a register.
29
Basic Concepts In Processor Architecture
• Instruction Set:
• The trade-off in instruction sets is an area-time compromise.
• The R/M approach offers a program representation using fewer
instructions of variable size compared with L/S.
• The variable instruction size makes decoding more difficult.
• The decoding of multiple instructions requires predicting the starting
point of each. The R/M processors require more circuitry (and area) to
be devoted to instruction fetch and decode.
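The difference in instruction count between the two styles can be sketched for a single "add a memory value to a register" operation. The register and memory models below are plain Python containers, and the function names are illustrative rather than taken from any real instruction set.

```python
def add_register_memory(regs, mem, rd, addr):
    """R/M style: one ADD with a memory operand. Returns instructions used."""
    regs[rd] = regs[rd] + mem[addr]      # ADD rd, [addr]
    return 1


def add_load_store(regs, mem, rd, addr, tmp):
    """L/S style: the operand is loaded first. Returns instructions used."""
    regs[tmp] = mem[addr]                # LD  tmp, [addr]
    regs[rd] = regs[rd] + regs[tmp]      # ADD rd, rd, tmp
    return 2


mem = {0x40: 10}
rm_regs = [0, 5, 0, 0]
ls_regs = [0, 5, 0, 0]
rm_count = add_register_memory(rm_regs, mem, rd=1, addr=0x40)
ls_count = add_load_store(ls_regs, mem, rd=1, addr=0x40, tmp=2)
print(rm_count, ls_count, rm_regs[1], ls_regs[1])
```

Both versions leave the same sum in the destination register; the L/S version uses one more instruction and one more register, while each of its instructions stays simple to decode.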
30
Basic Concepts In Processor Architecture
31
Instruction size and format for typical processors
Basic Concepts In Processor Architecture
32
Instruction Set Mnemonic Operations
Basic Concepts In Processor Architecture
• Some Instruction Set Conventions:
• To indicate the data type that the operation specifies, the operation
mnemonic is extended by a data-type indicator:
• OP.W might indicate an OP for integers, while
• OP.F indicates a floating-point operation.
33
Typical data-type modifiers are shown in the table above.
A typical instruction has the form OP.M destination, source 1, source 2.
Each source and destination specification is either a register or
a memory location.
Basic Concepts In Processor Architecture
• Branches:
• Branches (or jumps) manage program control flow.
• They typically consist of unconditional branches (BR), conditional
branches (BC), and subroutine call and return (link).
• Typically, the condition code (CC) is set by an ALU instruction to record
one of several results, for example, specifying whether the instruction has generated
1. a positive result,
2. a negative result,
3. a zero result, or
4. an overflow.
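A minimal sketch of how an ALU add might set such a condition code. The 32-bit width, the single-valued CC, and the priority order (overflow checked first) are assumptions made for illustration; real processors usually keep several independent flag bits.

```python
def add_with_cc(a, b, bits=32):
    """Add two two's-complement bit patterns and report a condition code."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    raw = (a + b) & mask
    # Signed overflow: both operands have the same sign and the result differs.
    overflow = ((a ^ raw) & (b ^ raw) & sign) != 0
    if overflow:
        cc = "overflow"
    elif raw == 0:
        cc = "zero"
    elif raw & sign:
        cc = "negative"
    else:
        cc = "positive"
    return raw, cc


print(add_with_cc(1, 2))
print(add_with_cc(0x7FFFFFFF, 1))
```

A conditional branch (BC) would then test the recorded CC to decide whether to change the program flow.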
34
Basic Concepts In Processor Architecture
• Interrupts and Exceptions:
• Many embedded SOC controllers have external interrupts and internal
exceptions.
• These facilities can be managed and supported in various ways:
1. User Requested versus Coerced (forced): The former often covers
exceptions such as divide by zero, while the latter is usually triggered by
external events.
2. Maskable versus Nonmaskable: The former type of event can be
ignored, while the latter cannot.
3. Terminate versus Resume: An event such as divide by zero would
terminate ordinary processing, while after other events the processor
resumes operation.
4. Asynchronous versus Synchronous: Interrupt events can occur
asynchronously with the processor clock when raised by an external agent,
or synchronously when caused by a program’s execution.
5. Between versus Within Instructions: Interrupt events can be
recognized only between instructions or within an instruction execution.
35
Basic Concepts In Processor Architecture
• Interrupts and Exceptions:
• In general, the first alternative of most of these pairs is easier to
implement and may be handled after the completion of the current
instruction.
• Once the exception is handled, the later instructions are restarted from
scratch.
• Some of these events may occur simultaneously and may even be
nested.
36
• PowerPC stands for Performance Optimization With Enhanced RISC –
Performance Computing.
• PowerPC is sometimes abbreviated as PPC.
37
IBM’s PowerPC
• The IBM 405Fx is a 32-bit reduced instruction set computer (RISC)
processor core. Referred to as the PPC405Fx core, it implements the
PowerPC Architecture with extensions for embedded applications.
• PPC405Fx Features
– The PPC405Fx core provides high performance and low power
consumption.
– The PPC405Fx RISC CPU executes at sustained speeds
approaching one cycle per instruction.
– On-chip instruction and data cache arrays can be implemented to
reduce chip count and design complexity in systems.
38
PPC405Fx Embedded Processor
• PPC405Fx Features
• The PowerPC RISC fixed-point CPU features:
– Thirty-two, 32-bit general purpose registers (GPRs)
– Five-stage pipeline with single-cycle execution of most
instructions.
– Unaligned load/store support to cache arrays, main memory, and
on-chip memory (OCM)
– Hardware multiply/divide for faster integer arithmetic (4-cycle
multiply, 35-cycle divide)
– True little endian operation
– Parity detection and reporting for the instruction cache, data cache.
– Programmable Interval Timer (PIT), Fixed Interval Timer (FIT),
and watchdog timer
39
PPC405Fx Embedded Processor
• PPC405Fx Features
• Storage control :
– Separate, configurable, two-way set-associative instruction and
data cache units;
• Instruction cache array is 16KB and data cache array is
16KB
– 32 bytes per cache line
– Read and write line buffers
– Programmable ICU pre-fetching of next sequential line into line
buffer
– Programmable allocation on loads and stores
– Operand forwarding during cache line fills
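The cache parameters above (16KB, two-way set-associative, 32 bytes per line) fix the number of sets and the way an address splits into tag, index, and offset. The sketch below works this out for a 32-bit address; it assumes conventional power-of-two indexing, which may differ in detail from the actual PPC405Fx implementation.

```python
def cache_geometry(size_bytes=16 * 1024, ways=2, line_bytes=32, addr_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 for powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


sets, offset_bits, index_bits, tag_bits = cache_geometry()
print(sets, offset_bits, index_bits, tag_bits)
print([hex(x) for x in split_address(0x12345678, offset_bits, index_bits)])
```

For these parameters the cache has 256 sets, so an address uses 5 offset bits, 8 index bits, and 19 tag bits.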
40
PPC405Fx Embedded Processor
• PPC405Fx Features
• Memory Management
– Translation of the 4GB logical address space into physical
addresses
– Page level access control using the translation mechanism
– Software control of page replacement strategy
– WIU0GE (write-through, cachability, compressed user-defined 0,
guarded, endian) storage attribute control for each virtual memory
region
– Full floating-point unit (FPU) support using the auxiliary processor
unit (APU) interface
(the PPC405Fx does not include an FPU)
41
PPC405Fx Embedded Processor
• PPC405Fx Features
• PowerPC timer facilities
– 64-bit time base
– PIT, FIT, and watchdog timers
• Debug Support
– Enhanced debug support with logical operators
– Four instruction address compares (IACs)
– Two data address compares (DACs)
– Two data value compares (DVCs)
• Advanced power management support
42
PPC405Fx Embedded Processor
• PowerPC Architecture
• The PowerPC Architecture comprises three levels of standards:
• PowerPC User Instruction Set Architecture (UISA): including the
base user-level instruction set, user level registers, programming model,
data types, and addressing modes.
• PowerPC Virtual Environment Architecture (VEA): describing the
memory model, cache model, cache-control instructions, address
aliasing, and related issues.
• PowerPC Operating Environment Architecture (OEA): including the
memory management model, supervisor level registers, and the
exception model. These features are not accessible from the user level.
43
PPC405Fx Embedded Processor
• Processor Core Organization
• The processor core consists of a 5-stage pipeline, separate
instruction and data cache units, virtual memory management unit
(MMU), three timers, debug, and interfaces to other functions.
• Instruction and Data Cache Controllers
– The instruction cache unit (ICU) and data cache unit (DCU)
enable concurrent accesses and minimize pipeline stalls.
– The storage capacity of the cache units, which can range from
0KB to 32KB, depends upon the implementation.
– The instruction set provides cache control instructions, including
instructions to read tag information and data arrays.
45
PPC405Fx Embedded Processor
• Processor Core Organization
• Instruction Cache Unit
– The ICU provides one or two instructions per cycle to the
execution unit (EXU) over a 64-bit bus.
– A line buffer enables the ICU to be accessed only once for every
four instructions, to reduce power consumption by the array.
– The ICU can forward any or all of the words of a line fill to the
EXU to minimize pipeline stalls caused by cache misses.
46
PPC405Fx Embedded Processor
• Processor Core Organization
• Data Cache Unit
– The DCU transfers 1, 2, 3, 4, or 8 bytes per cycle, depending on
the request from the CPU.
– The DCU contains a single-element command and store data
queue to reduce pipeline stalls; this queue enables the DCU to
independently process load/store and cache control instructions.
– When the DCU is busy with a low-priority request while a
subsequent storage operation requested by the CPU is stalled, the
DCU automatically increases the priority of the current request
to the PLB.
47
PPC405Fx Embedded Processor
• Processor Core Organization
• Data Cache Unit
– The DCU uses a two-line flush queue to minimize pipeline stalls
caused by cache misses.
– Single queued flushes are non-blocking. When a flush operation
is pending, the DCU can continue to access the array to determine
whether subsequent loads or stores hit.
– The DCU can function in write-back or write-through mode, as
controlled by the Data Cache Write-through Register (DCWR) or
the translation look-aside buffer (TLB).
48
PPC405Fx Embedded Processor
• Processor Core Organization
• Memory Management Unit
– The 4GB address space of the PPC405Fx is a flat address space.
– The MMU provides address translation, protection functions,
and storage attribute control for embedded applications.
– MMU provides the following functions:
• Translation of the 4GB logical address space into physical
addresses
• Page level access control using the translation mechanism
• Software control of page replacement strategy
– The MMU can be disabled under software control.
49
PPC405Fx Embedded Processor
• The PicoBlaze microcontroller is a compact and cost-effective fully
embedded 8-bit RISC microcontroller core optimized for the Spartan-3
family.
• It also provides support for the Virtex-5, Spartan-6, and Virtex-6
FPGA families.
• It occupies just 96 FPGA slices (only 12.5% of an XC3S50 FPGA).
• A single FPGA block RAM stores up to 1024 program instructions,
which are automatically loaded during FPGA configuration.
• The PicoBlaze microcontroller performs a respectable 44 to 100
million instructions per second (MIPS), depending on the target
FPGA family and speed grade.
50
PicoBlaze
• The PicoBlaze microcontroller core is totally embedded within the
target FPGA and requires no external resources.
• The PicoBlaze peripheral set can be customized to meet the specific
features, function, and cost requirements of the target application.
• Because the PicoBlaze microcontroller is delivered as synthesizable VHDL
source code, the core is future-proof and can be migrated to future
FPGA architectures.
• Being integrated within the FPGA, the PicoBlaze microcontroller
reduces board space, design cost, and inventory.
51
PicoBlaze
• The PicoBlaze microcontroller is specifically designed and
optimized for the Spartan-3 family, and also supports the Spartan-6 and
Virtex-6 FPGA architectures.
• It is compact, and consumes considerably less FPGA resources than
comparable 8-bit microcontroller architectures within an FPGA.
• Because it is delivered as VHDL source, the PicoBlaze microcontroller
is immune to product obsolescence.
52
Why the PicoBlaze Microcontroller
• Before the advent of the PicoBlaze and MicroBlaze embedded
processors, the microcontroller resided externally to the FPGA,
limiting the connectivity to other FPGA functions and restricting overall
interface performance.
• By contrast, the PicoBlaze microcontroller is fully embedded in the
FPGA with flexible, extensive on-chip connectivity to other FPGA
resources.
• The PicoBlaze microcontroller reduces system cost because it is a
single-chip solution, integrated within the FPGA.
53
Why the PicoBlaze Microcontroller
• Microcontrollers and FPGAs can both successfully implement almost
any digital logic function. Each has unique advantages in cost,
performance, and ease of use.
• Microcontrollers are well suited to control applications, especially
with widely changing requirements.
• The same FPGA logic is re-used by the various microcontroller
instructions, conserving resources.
• Programming control sequences or state machines in assembly
code is often easier than creating similar structures in FPGA logic.
• As an application increases in complexity, the number of instructions
required to implement the application grows and system performance
decreases accordingly.
54
Why Use a Microcontroller within an FPGA?
• An FPGA is more flexible than a microcontroller.
• For example, an algorithm can be implemented sequentially or
completely in parallel, depending on the performance requirements.
• A completely parallel implementation is faster but consumes more FPGA
resources.
• A microcontroller embedded within the FPGA provides the best of
both.
• The microcontroller implements complex control functions that are
not timing-critical, while timing-critical or data path functions are best
implemented using FPGA logic.
• For example, a microcontroller cannot respond to events much
faster than a few microseconds, whereas the FPGA logic can respond to
multiple, simultaneous events in just a few to tens of nanoseconds.
55
Why Use a Microcontroller within an FPGA?
PicoBlaze Microcontroller vs. FPGA Logic
Strengths
• PicoBlaze Microcontroller:
– Easy to program, excellent for control and state machine applications
– Resource requirements remain constant with increasing complexity
– Re-uses logic resources, excellent for lower-performance functions
• FPGA Logic:
– Significantly higher performance
– Excellent at parallel operations
– Sequential vs. parallel implementation
– Fast response to multiple, simultaneous inputs
Weaknesses
• PicoBlaze Microcontroller:
– Executes sequentially
– Performance degrades with increasing complexity
– Program memory requirements increase with increasing complexity
– Slower response to simultaneous inputs
• FPGA Logic:
– Control and state machine applications more difficult to program
– Logic resources grow with increasing complexity
56
Why Use a Microcontroller within an FPGA?
• 16 byte-wide general-purpose data registers
• 1K instructions of programmable on-chip program store,
automatically loaded during FPGA configuration
• Byte-wide ALU with CARRY and ZERO indicator flags
• 64-byte internal scratchpad RAM
• 256 input and 256 output ports.
• Automatic 31-location CALL/RETURN stack
• Predictable performance, always two clock cycles per instruction, up
to 200 MHz or 100 MIPS in a Virtex-II Pro FPGA
• Fast interrupt response (worst-case 5 clock cycles)
• Optimized for Xilinx Spartan-3 architecture: just 96 slices and 0.5
to 1 block RAM
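Because every instruction takes exactly two clock cycles, throughput and worst-case interrupt latency follow directly from the clock frequency. A small sketch of that arithmetic follows; the function names are illustrative.

```python
def picoblaze_mips(clock_mhz, cycles_per_instruction=2):
    """Instruction throughput in MIPS for a fixed cycles-per-instruction core."""
    return clock_mhz / cycles_per_instruction


def worst_case_interrupt_ns(clock_mhz, cycles=5):
    """Worst-case interrupt response time in nanoseconds."""
    return cycles * 1000.0 / clock_mhz


print(picoblaze_mips(200))            # 200 MHz clock
print(worst_case_interrupt_ns(200))
```

At 200 MHz this gives 100 MIPS, as quoted above, and a worst-case interrupt response of 25 ns. The 44 MIPS lower figure quoted earlier would correspond to an 88 MHz clock.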
57
PicoBlaze Microcontroller Features
• General-Purpose Register
– The PicoBlaze microcontroller includes 16 byte-wide general-
purpose registers, designated as registers s0 through sF
– All register operations are completely interchangeable.
– There is no dedicated accumulator; each result is computed in a
specified register.
• 1,024-Instruction Program Store
– The PicoBlaze microcontroller executes up to 1,024 instructions
from memory within the FPGA. Each PicoBlaze instruction is 18
bits wide.
– Other memory organizations are possible to accommodate more
PicoBlaze controllers within a single FPGA.
59
PicoBlaze Microcontroller Functional Blocks
• Arithmetic Logic Unit (ALU)
– The byte-wide Arithmetic Logic Unit (ALU) performs all
microcontroller calculations, including:
• Basic arithmetic operations such as addition and subtraction
• Bitwise logic operations such as AND, OR, and XOR
• Arithmetic compare and Bitwise test operations
• Comprehensive shift and rotate operations
– All operations are performed using an operand provided by any
specified register (sX). The result is returned to the same specified
register (sX).
– If an instruction requires a second operand, then the second operand
is either a second register (sY) or an 8-bit immediate constant (kk).
• Flags
– ALU operations affect the ZERO and CARRY flags.
– The INTERRUPT_ENABLE flag enables the INTERRUPT input.
60
PicoBlaze Microcontroller Functional Blocks
• 64-Byte Scratchpad RAM
– The PicoBlaze microcontroller provides an internal general-
purpose 64-byte scratchpad RAM, directly or indirectly
addressable from the register file using the STORE and FETCH
instructions.
– The STORE instruction writes the contents of any of the 16
registers to any of the 64 RAM locations.
– The complementary FETCH instruction reads any of the 64
memory locations into any of the 16 registers.
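The STORE and FETCH behaviour can be modelled in a few lines. This is a behavioural sketch only: the class and method names are hypothetical, and rejecting out-of-range addresses with an error is a modelling choice, not a statement about what the hardware does.

```python
class PicoBlazeState:
    """Toy model of the 16 registers and the 64-byte scratchpad RAM."""

    def __init__(self):
        self.regs = [0] * 16       # s0 .. sF, one byte each
        self.scratch = [0] * 64    # scratchpad RAM

    def _check(self, addr):
        if not 0 <= addr < 64:
            raise ValueError("scratchpad address out of range")

    def store(self, sx, addr):
        """STORE sX, addr: direct addressing."""
        self._check(addr)
        self.scratch[addr] = self.regs[sx] & 0xFF

    def fetch(self, sx, addr):
        """FETCH sX, addr: direct addressing."""
        self._check(addr)
        self.regs[sx] = self.scratch[addr]

    def store_indirect(self, sx, sy):
        """STORE sX, (sY): the address is taken from register sY."""
        self.store(sx, self.regs[sy])


state = PicoBlazeState()
state.regs[3] = 0xAB
state.store(3, 0x10)
state.fetch(5, 0x10)
print(hex(state.regs[5]))
```

The indirect form shows why the scratchpad is described as directly or indirectly addressable from the register file.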
61
PicoBlaze Microcontroller Functional Blocks
• Input/Output
– The Input/Output ports extend the PicoBlaze microcontroller’s
capabilities and allow the microcontroller to connect to a custom
peripheral set or to other FPGA logic.
– The PicoBlaze microcontroller supports up to 256 input ports
and 256 output ports or a combination of input/output ports.
– The PORT_ID output provides the port address.
• During an INPUT operation: PicoBlaze microcontroller reads
data from the IN_PORT port to a specified register, sX.
• During an OUTPUT operation: PicoBlaze microcontroller writes
the contents of a specified register, sX, to the OUT_PORT
port.
62
PicoBlaze Microcontroller Functional Blocks
• Program Counter (PC)
– The Program Counter (PC) points to the next instruction to be
executed. By default, it advances to the next instruction location.
– Only the JUMP, CALL, and RETURN instructions and the interrupt
and reset events modify this default behavior.
– If the PC reaches the top of the memory at 3FF hex, it rolls over to
location 000.
• Program Flow Control
– The default execution sequence of the program can be modified
using conditional and non-conditional program flow control
instructions.
– CALL and RETURN instructions provide subroutine facilities for
commonly used sections of code.
63
PicoBlaze Microcontroller Functional Blocks
• CALL/RETURN Stack
– The CALL/RETURN hardware stack stores up to 31 instruction
addresses.
– When the stack is full, it overwrites the oldest value.
– No program memory is required for the stack.
• Interrupts
– The optional INTERRUPT input allows the PicoBlaze microcontroller to
handle asynchronous external events.
– The PicoBlaze microcontroller responds to interrupts quickly, in at most
five clock cycles.
• Reset
– The PicoBlaze microcontroller is automatically reset immediately after
the FPGA configuration process completes.
– The PC is reset to address 0, the flags are cleared, interrupts are
disabled, and the CALL/RETURN stack is reset.
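The program counter rollover and the 31-entry CALL/RETURN stack can be sketched as follows. The class is hypothetical and simplified: it stores the return address directly on the stack, which may not match how the real core records it.

```python
from collections import deque


class FlowControl:
    """Toy model of the PicoBlaze PC and CALL/RETURN stack."""

    PC_MASK = 0x3FF  # 1,024 instruction locations

    def __init__(self):
        self.pc = 0
        # A bounded deque drops the oldest entry when a 32nd value is pushed.
        self.stack = deque(maxlen=31)

    def step(self):
        """Default behaviour: advance, rolling over from 3FF to 000."""
        self.pc = (self.pc + 1) & self.PC_MASK

    def call(self, target):
        self.stack.append((self.pc + 1) & self.PC_MASK)
        self.pc = target & self.PC_MASK

    def ret(self):
        self.pc = self.stack.pop()

    def reset(self):
        self.pc = 0
        self.stack.clear()


fc = FlowControl()
fc.pc = 0x3FF
fc.step()
print(hex(fc.pc))
```

Using `deque(maxlen=31)` mirrors the statement that a full stack overwrites its oldest value, so deeply nested calls lose their outermost return addresses.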
64
PicoBlaze Microcontroller Functional Blocks
• The MicroBlaze embedded processor soft core is a RISC processor optimized
for implementation in Xilinx FPGAs.
• With few exceptions, the MicroBlaze can issue a new instruction
every cycle, maintaining single-cycle execution under most
circumstances.
• MicroBlaze's primary I/O bus, the CoreConnect PLB bus, is used
for system-memory data transactions.
• To access local memory, MicroBlaze uses a dedicated LMB
bus, which reduces loading on the other buses.
• User-defined coprocessors are supported through a dedicated
FIFO-style connection called FSL (Fast Simplex Link).
66
MicroBlaze Processor
• Many aspects of the MicroBlaze can be user configured:
– Cache size,
– Pipeline depth (3-stage or 5-stage),
– Embedded peripherals,
– Memory management unit, and
– Bus interfaces.
• The area-optimized version of MicroBlaze uses a 3-stage pipeline.
• The performance-optimized version expands the execution pipeline to
5 stages.
67
MicroBlaze Processor
• Features
• The MicroBlaze soft core processor is highly configurable, allowing
you to select a specific set of features.
• The fixed feature set of the processor includes:
– Thirty-two 32-bit general purpose registers
– 32-bit instruction word with three operands and two addressing
modes
– 32-bit address bus
– Single issue pipeline
68
MicroBlaze
• Data Types and Endianness
– MicroBlaze uses Big-Endian bit-reversed format to represent data.
– The hardware supported data types for MicroBlaze are word, half
word, and byte.
Word Data Type
Half Word Data Type
Byte Data Type
70
MicroBlaze
• Instructions
• All MicroBlaze instructions are 32 bits and are defined as either Type
A or Type B.
• Type A instructions have up to two source register operands and
one destination register operand.
• Type B instructions have one source register and a 16-bit immediate
operand.
• Instructions are provided in the following functional categories:
– arithmetic,
– logical,
– branch,
– load/store, and
– special.
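The two instruction types can be sketched as bit-field packers. With 32 registers each register number needs 5 bits, and a Type B word with two register fields and a 16-bit immediate leaves 6 bits for the opcode. The field order (opcode, destination, source A, then source B or immediate) and the opcode values used below are assumptions for illustration.

```python
def _check(value, bits, name):
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def encode_type_a(opcode, rd, ra, rb):
    """Type A: destination register and two source registers."""
    _check(opcode, 6, "opcode")
    for name, reg in (("rd", rd), ("ra", ra), ("rb", rb)):
        _check(reg, 5, name)
    # The low 11 bits are left as zero in this sketch.
    return (opcode << 26) | (rd << 21) | (ra << 16) | (rb << 11)


def encode_type_b(opcode, rd, ra, imm16):
    """Type B: destination register, one source register, 16-bit immediate."""
    _check(opcode, 6, "opcode")
    _check(rd, 5, "rd")
    _check(ra, 5, "ra")
    return (opcode << 26) | (rd << 21) | (ra << 16) | (imm16 & 0xFFFF)


print(hex(encode_type_a(0x00, 3, 4, 5)))        # hypothetical opcode 0x00
print(hex(encode_type_b(0x08, 3, 4, 0x1234)))   # hypothetical opcode 0x08
```

Both encodings produce a single 32-bit word, which keeps instruction fetch and decode regular.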
71
MicroBlaze
• Registers
– It has thirty-two 32-bit general purpose registers and up to
eighteen 32-bit special purpose registers.
1. General Purpose Registers
The thirty-two 32-bit General Purpose Registers are numbered
R0 through R31.
72
MicroBlaze
2. Special Purpose Registers
Program Counter (PC)
The Program Counter (PC) is the 32-bit address of the instruction being
executed.
73
MicroBlaze
2. Special Purpose Registers
Machine Status Register (MSR)
The Machine Status Register contains control and status bits for
the processor.
When reading: bit 29 is replicated in bit 0 as the carry copy.
When writing: Carry bit takes effect immediately and the remaining
bits take effect one clock cycle later.
The MSR is specified by setting Sa = 0x0001.
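The carry-copy behaviour on reads can be sketched as below. The sketch assumes the bit numbering in which bit 0 is the most significant bit of the 32-bit register, so bit 29 corresponds to mask 0x4 and bit 0 to mask 0x80000000; the function names are illustrative.

```python
def msr_bit_mask(bit, width=32):
    """Mask for a bit number where bit 0 is the most significant bit (assumed)."""
    return 1 << (width - 1 - bit)


MSR_C = msr_bit_mask(29)    # carry
MSR_CC = msr_bit_mask(0)    # carry copy


def read_msr(stored):
    """Return the value seen on a read: bit 29 is replicated into bit 0."""
    value = stored & ~MSR_CC & 0xFFFFFFFF
    if stored & MSR_C:
        value |= MSR_CC
    return value


print(hex(MSR_C), hex(MSR_CC))
print(hex(read_msr(0x4)))
```

Software can therefore test the carry with a simple sign check on the value read back.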
74
MicroBlaze
2. Special Purpose Registers
Exception Status Register (ESR)
The Exception Status Register contains status bits for the processor.
The ESR is specified by setting Sa = 0x0005.
Branch Target Register (BTR)
The Branch Target Register only exists if the MicroBlaze processor is
configured to use exceptions.
The BTR is specified by setting Sa = 0x000B.
75
MicroBlaze
2. Special Purpose Registers
Floating Point Status Register (FSR)
The Floating Point Status Register contains status bits for the floating
point unit.
The register is specified by setting Sa = 0x0007.
Exception Data Register (EDR)
The contents of this register are undefined for all other exceptions.
The EDR is specified by setting Sa = 0x000D.
76
MicroBlaze
2. Special Purpose Registers
Zone Protection Register (ZPR)
The Zone Protection Register is used to override MMU memory
protection defined in TLB entries.
77
MicroBlaze
• Pipeline Architecture
• MicroBlaze instruction execution is pipelined. For most instructions,
each stage takes one clock cycle to complete.
• Consequently, the number of clock cycles necessary for a specific
instruction to complete is equal to the number of pipeline stages.
• A few instructions require multiple clock cycles in the execute stage to
complete.
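A small sketch of the cycle counts implied by these bullets, for an ideal pipeline with no stalls or branches. The formula (one instruction completing per cycle once the pipeline is full) is a standard idealization and not a MicroBlaze timing guarantee; the function name is illustrative.

```python
def total_cycles(n_instructions, stages, extra_execute_cycles=0):
    """Cycles to run n instructions on an ideal pipeline.

    The first instruction needs `stages` cycles; each later one completes
    one cycle after its predecessor. `extra_execute_cycles` adds the cycles
    spent by multi-cycle instructions in the execute stage.
    """
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1) + extra_execute_cycles


print(total_cycles(1, 3), total_cycles(1, 5))
print(total_cycles(10, 3), total_cycles(10, 5))
```

A single instruction takes as many cycles as there are stages, while ten instructions take 12 cycles on the three-stage pipeline and 14 on the five-stage one.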
78
MicroBlaze
• Pipeline Architecture
Three-stage pipeline: Fetch, Decode, and Execute.
Five-stage pipeline: Fetch (IF), Decode (OF), Execute (EX), Access Memory (MEM), and Writeback (WB).
79
MicroBlaze