An advanced processor is a type of microprocessor designed to handle complex tasks and perform calculations at high speed. These processors are typically used in high-performance computing applications, such as scientific research, artificial intelligence, and data analysis. They often have multiple cores and advanced instruction sets that allow them to process large amounts of data quickly and efficiently. Some examples of advanced processors include Intel's Core i9 and AMD's Ryzen Threadripper.
Advanced Processor Power Point Presentation
4. INDEX
1. INTRODUCTION
2. DESIGN SPACE
3. INSTRUCTION SET ARCHITECTURE
4. CISC SCALAR PROCESSOR
5. RISC SCALAR PROCESSOR
6. VLIW ARCHITECTURE
7. VECTOR PROCESSOR
8. SYMBOLIC PROCESSOR
5. INTRODUCTION
A Processor is an integrated electronic circuit that performs the calculations that run
a computer. A processor performs arithmetical, logical, input/output (I/O) and other
basic instructions that are passed from an operating system (OS). Most other processes
are dependent on the operations of a processor.
Similarly, advanced processors support more sophisticated functions. Advanced processing
technology encompasses CISC, RISC, VLIW, vector, and symbolic processors.
Scalar and vector processors are aimed at numerical computation, while symbolic processors
have been developed for AI applications.
6. Design Space
Processor families can be mapped onto a coordinate space of clock rate versus cycles per
instruction (CPI).
Clock rate: clock speed, also known as clock rate or clock frequency, is a measure of the speed of a
computer's central processing unit (CPU) in executing instructions. It is typically measured in
gigahertz (GHz). Higher clock speeds generally mean that a CPU can process more instructions per
second, and thus can perform better on tasks that require fast processing.
Cycles per instruction (CPI), also known as clock cycles per instruction, is one aspect of a
processor's performance: the average number of clock cycles per instruction for a program.
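The interplay of clock rate and CPI can be sketched numerically. The figures below are purely illustrative (no real processor is being measured): execution time equals instruction count times CPI divided by clock rate, so lowering CPI can matter as much as raising the clock.

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instructions x cycles-per-instruction / clock rate (Hz)."""
    return instruction_count * cpi / clock_rate_hz

# A hypothetical program of one billion instructions on two hypothetical designs:
t_a = execution_time(1_000_000_000, cpi=2.0, clock_rate_hz=3_000_000_000)  # ~0.667 s
t_b = execution_time(1_000_000_000, cpi=1.2, clock_rate_hz=2_500_000_000)  # 0.48 s
# Despite its lower clock rate, the lower-CPI design finishes first.
```

Here the slower-clocked design wins because its CPI is lower, which is exactly the trade-off the design space describes.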
7. As implementation technology evolves rapidly, the clock rates of various processors have moved
from low to higher speeds toward the right of the design space (i.e., an increase in clock rate).
Similarly, processor manufacturers have been trying to lower the CPI (cycles taken to execute an
instruction) using innovative hardware approaches.
8. Instruction Set
Architecture(ISA)
• An instruction is a set of codes that the computer processor can understand. The
code is usually in 1s and 0s, or machine language.
Examples of some instructions:
ADD − Add two numbers together.
JUMP − Jump to a designated RAM address.
LOAD − Load information from RAM into the CPU.
• Instruction set Architecture (ISA) is defined as the design of a computer from the
Programmer’s Perspective. This basically means that an ISA describes the design of a
Computer in terms of the basic operations it must support. The ISA is not concerned
with the implementation-specific details of a computer. It is only concerned with the
set or collection of basic operations the computer must support.
9. • The ISA acts as an interface between the hardware and the software.
• The ISA describes
(1) memory model,
(2) instruction format, types and modes, and
(3) operand registers, types, and data addressing.
Instruction types include arithmetic, logical, data transfer, and flow control.
Instruction modes include kernel and user instructions.
10. The ISA is implemented in hardware as the fetch-decode-execute cycle.
In the fetch step, the instruction and its operands are retrieved from memory.
The decode step puts the operands into a format that the ALU can manipulate.
The execute step performs the selected operation within the ALU.
Control facilitates orderly routing of data, including I/O between the ALU and its external
environment (e.g., peripheral devices such as a disk or keyboard).
Fetch-Decode-Execute Cycle
11. Objective of an ISA
Let us try to understand the objectives of an ISA by taking the example of the MIPS ISA.
(Here MIPS refers to the Microprocessor without Interlocked Pipelined Stages architecture,
not to the unrelated performance metric "million instructions per second".)
MIPS is one of the most widely used ISAs due to its simplicity.
The ISA defines the types of instructions to be supported by the processor.
• Based on the type of operations they perform MIPS Instructions are classified into 3 types
1. Arithmetic/Logic Instructions: These Instructions perform various Arithmetic & Logical
operations on one or more operands.
2. Data Transfer Instructions: These instructions are responsible for the transfer of
data from memory to the processor registers and vice versa.
3. Branch and Jump Instructions: These instructions are responsible for breaking the
sequential flow of instructions and jumping to instructions at various other locations.
12. • The ISA defines the maximum length of each type of
instruction. Since MIPS is a 32-bit ISA, each
instruction must be accommodated within 32 bits.
• The ISA defines the Instruction Format of each type of
instruction.
The Instruction Format determines how the entire
instruction is encoded within 32 bits
There are 3 types of Instruction Formats in the MIPS
ISA:
1. R-Instruction Format
2. I-Instruction Format
3. J-Instruction Format
13. R-Instruction Format
The R instruction format has fields for three registers (typically two sources and a destination), as
well as a shift amount (5 bits) and a function (6 bits). It is used for arithmetic/bitwise instructions
which do not have an immediate operand.
Opcode = 000000 | RS | RT | RD | Shift Amount | Function
6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits
The Function field specifies the actual arithmetic function to be applied to the operands given by
the RS, RT (sources) and RD (destination) fields. For example, function 32 (100000b) is addition.
The left/right shift instructions use the shift amount field to specify the amount to shift.
I-Instruction Format
The I instruction format contains fields for two registers (typically source and destination) and
for a 16-bit immediate value. The I format is used for arithmetic operations with an immediate
operand.
14. Opcode | RS | RD | Immediate/Address
6 bits | 5 bits | 5 bits | 16 bits
The RS and RD fields encode the source and destination registers (MIPS has 32 registers
= 2^5), while the Immediate field encodes any immediate value.
J-Instruction Format
The J format is used for the Jump instruction, which jumps to an absolute address. Because
instructions must be aligned to 32 bits (4 bytes), the low 2 bits of every valid address are always 0.
Thus the 26-bit address field, shifted left by 2, can reach any one of 2^28 byte addresses; the
upper bits of the currently-executing address supply the missing high bits.
Opcode | Address
6 bits | 26 bits
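The field layouts above can be sketched as a small decoder. This is an illustrative sketch of the bit-slicing only (it recognizes the J-format by the standard j/jal opcodes and treats opcode 0 as R-format; a full MIPS decoder handles more cases):

```python
def decode_mips(word):
    """Split a 32-bit MIPS instruction word into its format's fields."""
    opcode = (word >> 26) & 0x3F              # top 6 bits select the format
    if opcode == 0:                           # R-format: register arithmetic/logic
        return {"fmt": "R",
                "rs": (word >> 21) & 0x1F,
                "rt": (word >> 16) & 0x1F,
                "rd": (word >> 11) & 0x1F,
                "shamt": (word >> 6) & 0x1F,
                "funct": word & 0x3F}
    if opcode in (0x02, 0x03):                # J-format: j / jal
        return {"fmt": "J", "opcode": opcode, "target": word & 0x03FF_FFFF}
    return {"fmt": "I", "opcode": opcode,     # everything else: I-format
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "imm": word & 0xFFFF}

# add $8, $9, $10  ->  opcode 0, rs=9, rt=10, rd=8, funct 32 (100000b)
fields = decode_mips((9 << 21) | (10 << 16) | (8 << 11) | 32)
```

Running the decoder on the hand-encoded `add` word recovers exactly the RS/RT/RD/Function fields described above.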
15. Two main categories of processors are:
1. CISC
2. RISC
Under both the CISC and RISC categories, products designed for multi-core chips, embedded
applications, or for low cost and/or low power consumption tend to have lower clock speeds.
High-performance processors must necessarily be designed to operate at high clock speeds.
In the design space, the category of vector processors is marked VP; vector processing
features may be associated with CISC or RISC main processors.
16. CISC Scalar Processor
Scalar processors are a class of Computer Processors that process only one data item at a time.
Typical data items include Integers and floating point numbers .
A scalar processor is classified as a single instruction, single data (SISD) processor in Flynn's
taxonomy. The Intel i486 is an example of a scalar processor .
(Single instruction stream, single data stream (SISD)) ( Intel i486 )
17. • CISC stands for Complex Instruction Set Computer. It comprises a complex instruction set
and incorporates a variable-length instruction format.
• The CISC approach attempts to minimize the number of instructions per program, but at the
cost of an increase in the number of cycles per instruction.
• It emphasizes building complex instructions directly into the hardware, because hardware
is always faster than software. However, CISC chips are relatively slower than RISC chips,
although they use fewer instructions than RISC. Examples of CISC processors are the VAX,
AMD and Intel x86 processors, and the System/360.
• It has a large collection of instructions that range from simple to very complex and
specialized at the assembly-language level; the complex ones take a long time to execute.
• CISC architectures range from complex mainframe computers to simple microcontrollers
where memory load and store operations are not separated from arithmetic instructions.
18. (CISC Architecture)
• The CISC architecture helps reduce program code by embedding multiple operations in each
program instruction, which makes the CISC processor more complex.
• CISC-based computers were designed to decrease memory costs: large programs require large
memory space to store instructions and data, increasing the memory requirement, and a large
amount of memory raises the memory cost, making such systems more expensive.
19. Characteristics of CISC
1. The code is short, so it requires very little RAM.
2. CISC (complex) instructions may take longer than a single clock cycle to execute.
3. Fewer instructions are needed to write an application.
4. It provides easier programming in assembly language.
5. Support for complex data structures and easy compilation of high-level languages.
6. It is composed of fewer registers and more addressing modes, typically 5 to 20.
7. Instructions can be larger than a single word.
8. It emphasizes building instructions into hardware, because hardware is faster than software.
20. RISC Scalar Processor
• RISC stands for Reduced Instruction Set Computer, a microprocessor architecture with a
small, highly optimized set of instructions. It is built to minimize instruction execution
time by optimizing and limiting the number of instructions.
• Ideally each instruction requires only one clock cycle, and each instruction passes through
three stages: fetch, decode and execute.
• A RISC processor performs complex operations by combining simpler instructions. RISC chips
require fewer transistors, making them cheaper to design and reducing instruction
execution time.
• Examples of RISC processors are SUN's SPARC, PowerPC (601), Microchip PIC
processors, and RISC-V.
21. (RISC Architecture)
Characteristics:
One-cycle execution time: RISC processors aim for a CPI of one, i.e. one clock cycle per
instruction, with each instruction going through the fetch, decode and execute stages.
Pipelining technique: the pipelining technique is used in RISC processors to execute
multiple parts or stages of instructions concurrently, for greater efficiency.
A large number of registers: RISC processors provide many registers that can be used to
store operands, respond quickly, and minimize interaction with memory.
They use LOAD and STORE instructions to access memory locations.
22. RISC Instruction Set Addressing
RISC instructions operate on processor registers only. Arithmetic and logic instructions must have
their operands either in processor registers or given directly in the instruction.
In both of the instructions below, the operands are in registers:
Add R2, R3
Add R2, R3, R4
An operand can also be given directly in the instruction:
Add R2, 100
But initially, at the start of execution of the program, all operands are in memory. To access
memory operands, the RISC instruction set has Load and Store instructions.
The Load instruction loads an operand from memory into a processor register. It has the form:
Load destination, source
Example: Load R2, A — loads the content of memory location A into register R2.
Conversely, the Store instruction (e.g. Store R2, A) stores the content of register R2 into
memory location A.
23. RISC Instruction Addressing Types
1. Immediate addressing mode: This addressing mode explicitly specifies the operand in the
instruction. Like
Add R4, R2, #200 add 200 to the content of R2 and store the result in R4
2. Register addressing mode: This addressing mode describes the registers holding the operands.
Add R3, R3, R4 add the content of register R4 to the content of register R3 and store in R3.
3. Absolute addressing mode: This addressing mode uses a name for a memory location in the
instruction. It is used, for example, to access global variables declared in the program.
Integer A, B, SUM; This declaration allocates memory for the variables A, B and SUM.
4. Register Indirect addressing mode: This addressing mode describes the register which has the address
of the actual operand in the instruction.
Load R2, (R3); load the register R2 with the content, whose address is mentioned in register R3.
5. Index addressing mode: This addressing mode provides a register in the instruction, to which when
we add a constant, obtain the address of the actual operand.
Load R2, 4(R3) load the reg. R2 with the content present at the location obtained by adding 4 to the
content of reg. R3.
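The addressing modes above can be sketched with a toy register file and memory. Everything here — the register names, the memory contents, the dictionary model — is hypothetical and purely for illustration:

```python
regs = {"R2": 0, "R3": 0, "R4": 0}   # toy register file
mem = {100: 7, 104: 9}               # toy memory, keyed by address

# 1. Immediate addressing: Add R4, R2, #200 -- operand is in the instruction.
regs["R2"] = 5
regs["R4"] = regs["R2"] + 200        # R4 = 205

# 2. Register addressing: Add R3, R3, R4 -- operands held in registers.
regs["R3"] = regs["R3"] + regs["R4"]

# 4. Register indirect: Load R2, (R3) -- R3 holds the operand's address.
regs["R3"] = 100
regs["R2"] = mem[regs["R3"]]         # fetches mem[100] = 7

# 5. Index addressing: Load R2, 4(R3) -- address = constant + register.
regs["R2"] = mem[4 + regs["R3"]]     # fetches mem[104] = 9
```

Absolute addressing (mode 3) is omitted since in this toy model it would simply be a fixed dictionary key standing in for a named location.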
25. Superscalar Processor
A superscalar processor is a CPU that implements a form of parallelism called instruction-level
parallelism within a single processor. The concept of the superscalar issue was first
developed as early as 1970 (Tjaden and Flynn, 1970). It was later reformulated more
precisely in the 1980s (Torng, 1982, Acosta et al, 1986).
In contrast to a scalar processor, which can execute at most one single instruction per clock
cycle, a superscalar processor can execute more than one instruction during a clock cycle by
simultaneously dispatching multiple instructions to different execution units on the processor.
It therefore allows more throughput (the number of instructions that can be executed in a unit
of time) than would otherwise be possible at a given clock rate.
Superscalar design techniques involve parallel instruction decoding, parallel register renaming,
speculative execution, and out-of-order execution. Each execution unit is not a separate processor
(or a core if the processor is a multi-core processor), but an execution resource within a single CPU
such as an arithmetic logic unit.
26. In Flynn's taxonomy,
Single-core superscalar processor is classified as an SISD processor (single instruction
stream, single data stream),
Multi-core superscalar processor is classified as an MIMD processor (multiple instruction
streams, multiple data streams).
Note that superscalar execution and pipelining are considered different performance-enhancement
techniques: the former executes multiple instructions in parallel by using multiple execution
units, whereas the latter overlaps multiple instructions within the same execution unit by
dividing execution into phases.
Processor board of a CRAY T3e supercomputer
with four superscalar Alpha 21164 processors
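The throughput gain from issuing multiple instructions per cycle can be sketched with simple cycle counting. This is a deliberately idealized model (it ignores hazards and dependencies, which in practice reduce the achieved issue width):

```python
import math

def cycles_to_issue(n_instructions, issue_width):
    """Cycles needed just to issue n instructions at a given issue width.
    Idealized: assumes every slot can always be filled (no hazards)."""
    return math.ceil(n_instructions / issue_width)

scalar = cycles_to_issue(12, 1)       # 12 cycles: one instruction per cycle
superscalar = cycles_to_issue(12, 4)  # 3 cycles: four units issue together
```

The model makes the earlier point concrete: the hazards discussed below are precisely what keeps a real 4-wide machine from reaching this 4x ideal.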
28. Superscalar Advantages
• The compiler can avoid many hazards through judicious selection and ordering of
instructions.
• In general, high performance is achieved if the compiler is able to arrange program
instructions to take maximum advantage of the available hardware units.
• The compiler should strive to interleave floating-point and integer instructions. This
enables the dispatch unit to keep both the integer and floating-point units busy most
of the time.
Superscalar Disadvantages
• In a superscalar processor, the detrimental effect of various hazards on performance
becomes even more pronounced.
• Due to this type of architecture, scheduling problems can occur.
29. VLIW Architecture
The limitations of the superscalar processor become prominent as the difficulty of scheduling
instructions grows. The issues of intrinsic parallelism in the instruction stream, complexity,
cost, and branch instructions are addressed by a different instruction set architecture called the
Very Long Instruction Word (VLIW) architecture, or VLIW machines.
VLIW uses instruction-level parallelism: the program controls the parallel execution of the
instructions.
In other architectures, processor performance is improved by pipelining (breaking an
instruction into subparts) or by superscalar execution (independently executing instructions
in different parts of the processor).
VLIW architecture instead depends on the compiler. The compiler decides the parallel flow
of the instructions and resolves conflicts. This increases compiler complexity but greatly
decreases hardware complexity.
30. (Block Diagram of VLIW Architecture)
• Processors in this architecture have multiple functional units and fetch very long
instruction words from the instruction cache.
• Multiple independent operations are grouped together in a single VLIW instruction.
They are initiated in the same clock cycle.
• Each operation is assigned an independent functional unit.
• All the functional units share a common register file.
• Instruction scheduling and parallel dispatch of the word are done statically by the
compiler.
• The compiler checks for dependencies before scheduling parallel execution of the
instructions.
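The static-scheduling idea can be sketched as a toy compiler pass: pack operations into one long word only while they are independent. This greedy scheduler is a simplified illustration, not how any real VLIW compiler works (real compilers also reorder operations, model latencies, etc.):

```python
def pack_vliw(ops, width):
    """Greedy toy VLIW scheduler. Each op is (dest_reg, set_of_source_regs).
    An op may join the current long word only if it does not read a register
    written earlier in that same word, and the word has a free slot."""
    words, current, written = [], [], set()
    for dest, srcs in ops:
        if len(current) == width or srcs & written:
            words.append(current)        # close the word, start a new one
            current, written = [], set()
        current.append(dest)
        written.add(dest)
    if current:
        words.append(current)
    return words

# R3 reads R1, so it cannot share a long word with the op that writes R1.
ops = [("R1", {"R0"}), ("R2", {"R0"}), ("R3", {"R1"}), ("R4", {"R0"})]
schedule = pack_vliw(ops, width=3)      # [['R1', 'R2'], ['R3', 'R4']]
```

This mirrors the bullets above: independent operations land in the same word and start in the same cycle, while a dependency forces the compiler to start a new word.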
31. VLIW Advantages
• Reduces hardware complexity.
• Reduces power consumption because of the reduction in hardware complexity.
• Since the compiler takes care of data-dependency checking, decoding and instruction
issue, the hardware becomes a lot simpler.
• Increases the potential clock rate.
VLIW Disadvantages
• Complex compilers are required, which are hard to design.
• Increased program code size.
• Larger memory bandwidth and register-file bandwidth are needed.
• Unscheduled events, for example a cache miss, can stall the entire processor.
32. Vector Processors
• A vector processor is a central processing unit that can execute an operation on a complete
vector operand in a single instruction. In other words, it is a complete unit of hardware
resources that operates on a sequence of data items stored at successive memory addresses.
33. Architecture
• The IPU (Instruction Processing Unit) fetches the instruction from memory.
• If the instruction is scalar in nature, it is transferred to the scalar register and
scalar processing is performed. Similarly, if it is vector in nature, it is fed to the
vector instruction register.
• The vector instruction controller first decodes the vector instruction and then
determines the address of the vector operand in memory.
• It then signals the vector access controller with the demand for the respective operand.
The vector access controller fetches the desired operand from memory. Once the operand
is fetched, it is provided to the instruction register so that it can be processed by
the vector processor.
(Block Diagram of Vector Processor Computing)
35. 1. Register-to-Register Architecture
• In register-to-register architecture, operands and results are retrieved indirectly from
main memory through a large number of vector registers or scalar registers.
• Examples: processors like the Cray-1 and the Fujitsu VP-200.
The main points about register-to-register architecture are:
1. The vector registers have limited size.
2. Speed is very high compared to the memory-to-memory architecture.
3. The hardware cost is high in this architecture.
2. Memory-to-Memory Architecture
• Here the operands and results are fetched directly from memory instead of through
registers.
• This architecture enables fetching data of size 512 bits from memory to the pipeline.
However, due to high memory access time, the pipelines of the vector computer require
a higher startup time, as more time is needed to initiate a vector instruction.
Example: the CDC Cyber 205.
36. Advantages
• A vector processor uses vector instructions, which improve the code density of the
instructions.
• The sequential arrangement of data helps the hardware handle the data in a better way.
• It offers a reduction in instruction bandwidth.
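The code-density advantage can be sketched in plain Python: one "vector instruction" replaces a whole scalar loop of per-element adds. This is only a conceptual sketch — a real vector unit performs the element operations in hardware pipelines, not in an interpreted loop:

```python
def vector_add(a, b):
    """Conceptually one vector instruction: add whole operand vectors."""
    return [x + y for x, y in zip(a, b)]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

# Scalar processor: one add instruction per element, driven by loop overhead
# (index update, compare, branch) on every iteration.
scalar_result = []
for i in range(len(a)):
    scalar_result.append(a[i] + b[i])

# Vector processor: a single instruction covers all elements.
vector_result = vector_add(a, b)
```

Both paths produce the same result, but the vector form needs one instruction where the scalar form needs an instruction (plus loop bookkeeping) per element — the instruction-bandwidth reduction noted above.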
37. Symbolic Processors
Symbolic processors are designed for expert systems, machine intelligence, knowledge-based
systems, pattern recognition, text retrieval, etc. Symbolic processors are also called LISP
processors or PROLOG processors.
Attribute — Characteristics
Common operations — Search, sort, pattern matching, unification
Memory requirement — Large memory with intensive access pattern
Properties of algorithms — Parallel and distributed, irregular in pattern
Input/output requirements — Graphical/audio/keyboard; user-guided programs, machine interface
Architecture features — Parallel update, dynamic load balancing and memory allocation
Knowledge representation — Lists, relational databases, semantic nets, frames, production systems
38. • For example, a Lisp program can be viewed as a set of functions in which data are passed from
function to function. The concurrent execution of these functions forms the basis for
parallelism.
• The applicative and recursive nature of Lisp requires an environment that efficiently supports
stack computations and function calling. The use of linked lists as the basic data structure
makes it possible to implement an automatic garbage collection mechanism.
• Instead of dealing with numerical data, symbolic processing deals with logic programs,
symbolic lists, objects, scripts, blackboards, production systems, semantic networks, frames,
artificial neural networks. Primitive operations for artificial intelligence include search,
logic inference, pattern matching, unification.
• Example: The Symbolics 3600 Lisp processor
39. Architecture of the Symbolics 3600 Lisp Processor
• This was a stack-oriented machine. The division of the overall architecture into layers
allowed the use of a simplified instruction-set design, while implementation was carried
out with a stack-oriented machine.
• Since most operands were fetched from the stack, the stack buffer and scratch-pad
memories were implemented as fast caches to main memory.
• The Symbolics 3600 executed most Lisp instructions in one machine cycle. Integer
instructions fetched operands from the stack buffer and from the duplicate top of the
stack in the scratch-pad memory.