SlideShare ist ein Scribd-Unternehmen logo
1 von 43
William Stallings
Computer Organization
and Architecture
7th
Edition
Chapter 14
Instruction Level Parallelism
and Superscalar Processors
What is Superscalar?
• Common instructions (arithmetic,
load/store, conditional branch) can be
initiated and executed independently
• Equally applicable to RISC & CISC
• In practice usually RISC
Why Superscalar?
• Most operations are on scalar quantities
(see RISC notes)
• Improve these operations to get an
overall improvement
General Superscalar Organization
Superpipelined
• Many pipeline stages need less than half a
clock cycle
• Double internal clock speed gets two tasks
per external clock cycle
• Superscalar allows parallel fetch execute
Superscalar v
Superpipeline
Limitations
• Instruction level parallelism
• Compiler based optimisation
• Hardware techniques
• Limited by
—True data dependency
—Procedural dependency
—Resource conflicts
—Output dependency
—Antidependency
True Data Dependency
• ADD r1, r2 (r1 := r1+r2;)
• MOVE r3,r1 (r3 := r1;)
• Can fetch and decode second instruction
in parallel with first
• Can NOT execute second instruction until
first is finished
Procedural Dependency
• Can not execute instructions after a
branch in parallel with instructions before
a branch
• Also, if instruction length is not fixed,
instructions have to be decoded to find
out how many fetches are needed
• This prevents simultaneous fetches
Resource Conflict
• Two or more instructions requiring access
to the same resource at the same time
—e.g. two arithmetic instructions
• Can duplicate resources
—e.g. have two arithmetic units
Effect of
Dependencies
Design Issues
• Instruction level parallelism
—Instructions in a sequence are independent
—Execution can be overlapped
—Governed by data and procedural dependency
• Machine Parallelism
—Ability to take advantage of instruction level
parallelism
—Governed by number of parallel pipelines
Instruction Issue Policy
• Order in which instructions are fetched
• Order in which instructions are executed
• Order in which instructions change
registers and memory
In-Order Issue
In-Order Completion
• Issue instructions in the order they occur
• Not very efficient
• May fetch >1 instruction
• Instructions must stall if necessary
In-Order Issue In-Order Completion
(Diagram)
In-Order Issue
Out-of-Order Completion
• Output dependency
—R3:= R3 + R5; (I1)
—R4:= R3 + 1; (I2)
—R3:= R5 + 1; (I3)
—I2 depends on result of I1 - data dependency
—If I3 completes before I1, the result from I1
will be wrong - output (read-write)
dependency
In-Order Issue Out-of-Order Completion
(Diagram)
Out-of-Order Issue
Out-of-Order Completion
• Decouple decode pipeline from execution
pipeline
• Can continue to fetch and decode until
this pipeline is full
• When a functional unit becomes available
an instruction can be executed
• Since instructions have been decoded,
processor can look ahead
Out-of-Order Issue Out-of-Order
Completion (Diagram)
Antidependency
• Write-write dependency
—R3:=R3 + R5; (I1)
—R4:=R3 + 1; (I2)
—R3:=R5 + 1; (I3)
—R7:=R3 + R4; (I4)
—I3 can not complete before I2 starts as I2
needs a value in R3 and I3 changes R3
Register Renaming
• Output and antidependencies occur
because register contents may not reflect
the correct ordering from the program
• May result in a pipeline stall
• Registers allocated dynamically
—i.e. registers are not specifically named
Register Renaming example
• R3b:=R3a + R5a (I1)
• R4b:=R3b + 1 (I2)
• R3c:=R5a + 1 (I3)
• R7b:=R3c + R4b (I4)
• Without subscript refers to logical register
in instruction
• With subscript is hardware register
allocated
• Note R3a R3b R3c
Machine Parallelism
• Duplication of Resources
• Out of order issue
• Renaming
• Not worth duplication functions without
register renaming
• Need instruction window large enough
(more than 8)
Speedups of Machine Organizations
Without Procedural Dependencies
Branch Prediction
• 80486 fetches both next sequential
instruction after branch and branch target
instruction
• Gives two cycle delay if branch taken
RISC - Delayed Branch
• Calculate result of branch before unusable
instructions pre-fetched
• Always execute single instruction
immediately following branch
• Keeps pipeline full while fetching new
instruction stream
• Not as good for superscalar
—Multiple instructions need to execute in delay
slot
—Instruction dependence problems
• Revert to branch prediction
Superscalar Execution
Superscalar Implementation
• Simultaneously fetch multiple instructions
• Logic to determine true dependencies
involving register values
• Mechanisms to communicate these values
• Mechanisms to initiate multiple
instructions in parallel
• Resources for parallel execution of
multiple instructions
• Mechanisms for committing process state
in correct order
Pentium 4
• 80486 - CISC
• Pentium – some superscalar components
—Two separate integer execution units
• Pentium Pro – Full blown superscalar
• Subsequent models refine & enhance
superscalar design
Pentium 4 Block Diagram
Pentium 4 Operation
• Fetch instructions form memory in order of static
program
• Translate instruction into one or more fixed
length RISC instructions (micro-operations)
• Execute micro-ops on superscalar pipeline
—micro-ops may be executed out of order
• Commit results of micro-ops to register set in
original program flow order
• Outer CISC shell with inner RISC core
• Inner RISC core pipeline at least 20 stages
—Some micro-ops require multiple execution stages
– Longer pipeline
—c.f. five stage pipeline on x86 up to Pentium
Pentium 4 Pipeline
Pentium 4 Pipeline Operation (1)
Pentium 4 Pipeline Operation (2)
Pentium 4 Pipeline Operation (3)
Pentium 4 Pipeline Operation (4)
Pentium 4 Pipeline Operation (5)
Pentium 4 Pipeline Operation (6)
PowerPC
• Direct descendent of IBM 801, RT PC and
RS/6000
• All are RISC
• RS/6000 first superscalar
• PowerPC 601 superscalar design similar to
RS/6000
• Later versions extend superscalar concept
PowerPC 601 General View
PowerPC 601
Pipeline
Structure
PowerPC 601 Pipeline
Required Reading
• Stallings chapter 14
• Manufacturers web sites
• IMPACT web site
—research on predicated execution

Weitere ähnliche Inhalte

Was ist angesagt?

Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
Hirochika Asai
 
Modes of 80386
Modes of 80386Modes of 80386
Modes of 80386
aviban
 
Instruction cycle
Instruction cycleInstruction cycle
Instruction cycle
Kumar
 

Was ist angesagt? (20)

Instruction Level Parallelism and Superscalar Processors
Instruction Level Parallelism and Superscalar ProcessorsInstruction Level Parallelism and Superscalar Processors
Instruction Level Parallelism and Superscalar Processors
 
11 instruction sets addressing modes
11  instruction sets addressing modes 11  instruction sets addressing modes
11 instruction sets addressing modes
 
Cache memoy designed by Mohd Tariq
Cache memoy designed by Mohd TariqCache memoy designed by Mohd Tariq
Cache memoy designed by Mohd Tariq
 
Chapter 2 - Computer Evolution and Performance
Chapter 2 - Computer Evolution and PerformanceChapter 2 - Computer Evolution and Performance
Chapter 2 - Computer Evolution and Performance
 
Superscalar Processor
Superscalar ProcessorSuperscalar Processor
Superscalar Processor
 
Virtual Memory
Virtual MemoryVirtual Memory
Virtual Memory
 
15 ia64
15 ia6415 ia64
15 ia64
 
ADDRESSING MODE
ADDRESSING MODEADDRESSING MODE
ADDRESSING MODE
 
Computer architecture page replacement algorithms
Computer architecture page replacement algorithmsComputer architecture page replacement algorithms
Computer architecture page replacement algorithms
 
Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
 
Arm cm3 architecture_and_programmer_model
Arm cm3 architecture_and_programmer_modelArm cm3 architecture_and_programmer_model
Arm cm3 architecture_and_programmer_model
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernel
 
Register organization, stack
Register organization, stackRegister organization, stack
Register organization, stack
 
Instruction cycle with interrupts
Instruction cycle with interruptsInstruction cycle with interrupts
Instruction cycle with interrupts
 
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
 
Instruction Set Architecture
Instruction  Set ArchitectureInstruction  Set Architecture
Instruction Set Architecture
 
Virtual Memory
Virtual MemoryVirtual Memory
Virtual Memory
 
Modes of 80386
Modes of 80386Modes of 80386
Modes of 80386
 
Pipeline hazards in computer Architecture ppt
Pipeline hazards in computer Architecture pptPipeline hazards in computer Architecture ppt
Pipeline hazards in computer Architecture ppt
 
Instruction cycle
Instruction cycleInstruction cycle
Instruction cycle
 

Ähnlich wie 14 superscalar

12 processor structure and function
12 processor structure and function12 processor structure and function
12 processor structure and function
Sher Shah Merkhel
 
Superscalar & superpipeline processor
Superscalar & superpipeline processorSuperscalar & superpipeline processor
Superscalar & superpipeline processor
Muhammad Ishaq
 
IT209 Cpu Structure Report
IT209 Cpu Structure ReportIT209 Cpu Structure Report
IT209 Cpu Structure Report
Bis Aquino
 

Ähnlich wie 14 superscalar (20)

14 superscalar
14 superscalar14 superscalar
14 superscalar
 
13_Superscalar.ppt
13_Superscalar.ppt13_Superscalar.ppt
13_Superscalar.ppt
 
13 superscalar
13 superscalar13 superscalar
13 superscalar
 
12 processor structure and function
12 processor structure and function12 processor structure and function
12 processor structure and function
 
2. ILP Processors.ppt
2. ILP Processors.ppt2. ILP Processors.ppt
2. ILP Processors.ppt
 
The Challenges facing Libraries and Imperative Languages from Massively Paral...
The Challenges facing Libraries and Imperative Languages from Massively Paral...The Challenges facing Libraries and Imperative Languages from Massively Paral...
The Challenges facing Libraries and Imperative Languages from Massively Paral...
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with Pipelining
 
Pipelining
PipeliningPipelining
Pipelining
 
Superscalar processor
Superscalar processorSuperscalar processor
Superscalar processor
 
3 Pipelining
3 Pipelining3 Pipelining
3 Pipelining
 
3 ilp
3 ilp3 ilp
3 ilp
 
SOC System Design Approach
SOC System Design ApproachSOC System Design Approach
SOC System Design Approach
 
13 risc
13 risc13 risc
13 risc
 
Superscalar & superpipeline processor
Superscalar & superpipeline processorSuperscalar & superpipeline processor
Superscalar & superpipeline processor
 
IT209 Cpu Structure Report
IT209 Cpu Structure ReportIT209 Cpu Structure Report
IT209 Cpu Structure Report
 
Unit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptxUnit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptx
 
Reduced instruction set computers
Reduced instruction set computersReduced instruction set computers
Reduced instruction set computers
 
12 processor structure and function
12 processor structure and function12 processor structure and function
12 processor structure and function
 
Pipelining
PipeliningPipelining
Pipelining
 
12 processor structure and function
12 processor structure and function12 processor structure and function
12 processor structure and function
 

Mehr von dilip kumar (16)

FLAT Notes
FLAT NotesFLAT Notes
FLAT Notes
 
18 parallel processing
18 parallel processing18 parallel processing
18 parallel processing
 
17 micro programmed control
17 micro programmed control17 micro programmed control
17 micro programmed control
 
16 control unit
16 control unit16 control unit
16 control unit
 
13 risc
13 risc13 risc
13 risc
 
11 instruction sets addressing modes
11  instruction sets addressing modes 11  instruction sets addressing modes
11 instruction sets addressing modes
 
10 instruction sets characteristics
10 instruction sets characteristics10 instruction sets characteristics
10 instruction sets characteristics
 
09 arithmetic
09 arithmetic09 arithmetic
09 arithmetic
 
08 operating system support
08 operating system support08 operating system support
08 operating system support
 
07 input output
07 input output07 input output
07 input output
 
06 external memory
06 external memory06 external memory
06 external memory
 
05 internal memory
05 internal memory05 internal memory
05 internal memory
 
04 cache memory
04 cache memory04 cache memory
04 cache memory
 
03 buses
03 buses03 buses
03 buses
 
02 computer evolution and performance
02 computer evolution and performance02 computer evolution and performance
02 computer evolution and performance
 
01 introduction
01 introduction01 introduction
01 introduction
 

Kürzlich hochgeladen

Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 

Kürzlich hochgeladen (20)

Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 

14 superscalar

  • 1. William Stallings Computer Organization and Architecture 7th Edition Chapter 14 Instruction Level Parallelism and Superscalar Processors
  • 2. What is Superscalar? • Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently • Equally applicable to RISC & CISC • In practice usually RISC
  • 3. Why Superscalar? • Most operations are on scalar quantities (see RISC notes) • Improve these operations to get an overall improvement
  • 5. Superpipelined • Many pipeline stages need less than half a clock cycle • Double internal clock speed gets two tasks per external clock cycle • Superscalar allows parallel fetch execute
  • 7. Limitations • Instruction level parallelism • Compiler based optimisation • Hardware techniques • Limited by —True data dependency —Procedural dependency —Resource conflicts —Output dependency —Antidependency
  • 8. True Data Dependency • ADD r1, r2 (r1 := r1+r2;) • MOVE r3,r1 (r3 := r1;) • Can fetch and decode second instruction in parallel with first • Can NOT execute second instruction until first is finished
  • 9. Procedural Dependency • Can not execute instructions after a branch in parallel with instructions before a branch • Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed • This prevents simultaneous fetches
  • 10. Resource Conflict • Two or more instructions requiring access to the same resource at the same time —e.g. two arithmetic instructions • Can duplicate resources —e.g. have two arithmetic units
  • 12. Design Issues • Instruction level parallelism —Instructions in a sequence are independent —Execution can be overlapped —Governed by data and procedural dependency • Machine Parallelism —Ability to take advantage of instruction level parallelism —Governed by number of parallel pipelines
  • 13. Instruction Issue Policy • Order in which instructions are fetched • Order in which instructions are executed • Order in which instructions change registers and memory
  • 14. In-Order Issue In-Order Completion • Issue instructions in the order they occur • Not very efficient • May fetch >1 instruction • Instructions must stall if necessary
  • 15. In-Order Issue In-Order Completion (Diagram)
  • 16. In-Order Issue Out-of-Order Completion • Output dependency —R3:= R3 + R5; (I1) —R4:= R3 + 1; (I2) —R3:= R5 + 1; (I3) —I2 depends on result of I1 - data dependency —If I3 completes before I1, the result from I1 will be wrong - output (read-write) dependency
  • 17. In-Order Issue Out-of-Order Completion (Diagram)
  • 18. Out-of-Order Issue Out-of-Order Completion • Decouple decode pipeline from execution pipeline • Can continue to fetch and decode until this pipeline is full • When a functional unit becomes available an instruction can be executed • Since instructions have been decoded, processor can look ahead
  • 20. Antidependency • Write-write dependency —R3:=R3 + R5; (I1) —R4:=R3 + 1; (I2) —R3:=R5 + 1; (I3) —R7:=R3 + R4; (I4) —I3 can not complete before I2 starts as I2 needs a value in R3 and I3 changes R3
  • 21. Register Renaming • Output and antidependencies occur because register contents may not reflect the correct ordering from the program • May result in a pipeline stall • Registers allocated dynamically —i.e. registers are not specifically named
  • 22. Register Renaming example • R3b:=R3a + R5a (I1) • R4b:=R3b + 1 (I2) • R3c:=R5a + 1 (I3) • R7b:=R3c + R4b (I4) • Without subscript refers to logical register in instruction • With subscript is hardware register allocated • Note R3a R3b R3c
  • 23. Machine Parallelism • Duplication of Resources • Out of order issue • Renaming • Not worth duplication functions without register renaming • Need instruction window large enough (more than 8)
  • 24. Speedups of Machine Organizations Without Procedural Dependencies
  • 25. Branch Prediction • 80486 fetches both next sequential instruction after branch and branch target instruction • Gives two cycle delay if branch taken
  • 26. RISC - Delayed Branch • Calculate result of branch before unusable instructions pre-fetched • Always execute single instruction immediately following branch • Keeps pipeline full while fetching new instruction stream • Not as good for superscalar —Multiple instructions need to execute in delay slot —Instruction dependence problems • Revert to branch prediction
  • 28. Superscalar Implementation • Simultaneously fetch multiple instructions • Logic to determine true dependencies involving register values • Mechanisms to communicate these values • Mechanisms to initiate multiple instructions in parallel • Resources for parallel execution of multiple instructions • Mechanisms for committing process state in correct order
  • 29. Pentium 4 • 80486 - CISC • Pentium – some superscalar components —Two separate integer execution units • Pentium Pro – Full blown superscalar • Subsequent models refine & enhance superscalar design
  • 30. Pentium 4 Block Diagram
  • 31. Pentium 4 Operation • Fetch instructions form memory in order of static program • Translate instruction into one or more fixed length RISC instructions (micro-operations) • Execute micro-ops on superscalar pipeline —micro-ops may be executed out of order • Commit results of micro-ops to register set in original program flow order • Outer CISC shell with inner RISC core • Inner RISC core pipeline at least 20 stages —Some micro-ops require multiple execution stages – Longer pipeline —c.f. five stage pipeline on x86 up to Pentium
  • 33. Pentium 4 Pipeline Operation (1)
  • 34. Pentium 4 Pipeline Operation (2)
  • 35. Pentium 4 Pipeline Operation (3)
  • 36. Pentium 4 Pipeline Operation (4)
  • 37. Pentium 4 Pipeline Operation (5)
  • 38. Pentium 4 Pipeline Operation (6)
  • 39. PowerPC • Direct descendent of IBM 801, RT PC and RS/6000 • All are RISC • RS/6000 first superscalar • PowerPC 601 superscalar design similar to RS/6000 • Later versions extend superscalar concept
  • 43. Required Reading • Stallings chapter 14 • Manufacturers web sites • IMPACT web site —research on predicated execution