SlideShare a Scribd company logo
1 of 20
Spring 2003 CSE P548 1
Reorder Buffer Implementation (Pentium Pro)
Hardware data structures
• retirement register file (RRF)
(~ IBM 360/91 physical registers)
• physical register file that is the same size as the architectural
registers
• one-to-one relationship
• reorder buffer (ROB)
(~ R10K active list)
• provides in-order instruction commit
• circular queue with head & tail pointers
• holds 40 “executing” instructions in program order
(dispatched but not yet committed)
• field for either integer or FP result after it has been computed
• a result value is put in its register after its producing instruction
reaches the head of the buffer & is removed (i.e., commits)
Spring 2003 CSE P548 2
Reorder Buffer Implementation (Pentium Pro)
• register alias table (RAT)
(~ R10K map table)
• provides register renaming
• important because very few GPRs in the x86 architecture
• indicates whether a source operand of a new instruction points
to the reorder buffer or the physical register file
• do an associative search of ROB destination registers for the
new source operands
• if found, consumer instruction points to the producer
instruction in the ROB
• the data hazard check before instruction dispatch
• reservation station
(~ IBM 360/91 reservation stations, R10000 instruction buffer)
• holds instructions waiting to execute
• provides forwarding to reduce RAW hazards
• result values go back to the reservation station (as well as
ROB) so dependent instructions have source operand
values
• provides out-of-order execution
Spring 2003 CSE P548 3
Pentium Pro Execution
In-order issue
• decode instructions
• enter uops into reorder buffer for in-order completion
• rename registers via register alias table
• detect structural hazards for reservation station
Out-of-order execution
• one reservation station, multiple entries
• check source operands for RAW hazards
• check structural hazards for separate integer, FP, memory units
• result goes to reservation station & reorder buffer
In-order completion
• this & previous uops have completed
• write “G”PR registers
• rollback on interrupts
Spring 2003 CSE P548 4
Pentium Pro
fetch & decode pipeline
BTB access (1 stage)
instruction fetch & align for decoding (2.5 stages)
decode & uop generation (2.5 stages)
register renaming & instruction issue to reservation
stations (3 stages minimum)
integer pipeline
execute, resolve branch
write registers & commit
load pipeline
address calculation & to memory reorder buffer
integrated L1 & L2 data cache access
pipelined FP add & multiply
Spring 2003 CSE P548 5
Pentium Pro
Spring 2003 CSE P548 6
Pentium Pro
Spring 2003 CSE P548 7
Spring 2003 CSE P548 8
Pentium Pro
Some bandwidth constraints: maximum for one cycle
• 16 bytes fetched
• 3 instructions decoded
• 6 µops to the reorder buffer
• 5 µops dispatched to reservation station & functional units
• 1 load & 1 store access to the L1 data cache
• 1 cache result returned
• 3 µops committed
if
• good instruction mix
• right instruction order
• operands available
• functional units available
• load & store to different cache banks
• all previous instructions already committed
Spring 2003 CSE P548 9
Making Tomasulo Precise
Reorder buffer
• contains:
• instruction type (register operation, store, branch)
• destination field (register, memory, none)
• result values
• ready bit to indicate instruction has executed
• replaces load & store buffers
• replaces renaming function of the reservation stations
• operand fields in reservation stations point to reorder buffer
location
• results tagged with reorder buffer entry number
• provides hardware speculation (speculative execution, not just fetch
& issue) plus precise interrupts
Spring 2003 CSE P548 10
Precise Implementation for Tomasulo
Spring 2003 CSE P548 11
Tomasulo’s Algorithm: Execution Steps
issue & read (assume the instruction has been fetched)
• structural hazard detection for entry in reservation station &
reorder buffer
• issue if no hazard
• stall if hazard
• read registers for source operands
• can come from either registers or reorder buffer
• RAT indicates which
• allocate entry in reorder buffer
execute
• RAW hazard detection for source operands
• snoop on common data bus for missing operands
• reservation station tag is entry number in reorder
buffer
• dispatch to functional unit when obtain both operand values
Spring 2003 CSE P548 12
Tomasulo’s Algorithm: Execution Steps
write result
• broadcast result & reorder buffer entry (tag) on the common
data bus to reservation stations & reorder buffer
commit
• retire the instruction at the head of the reorder buffer
• update register with result in reorder buffer or do a store
• remove instruction from reorder buffer
• if a mispredicted branch, restart with right-path
instructions
Spring 2003 CSE P548 13
Example in the Book 1
Cycle after first load has committed
ROB 6
Spring 2003 CSE P548 14
Example in the Book 2
Cycle after second load
committed
ROB 6
Spring 2003 CSE P548 15
Example in the Book 3
Cycle after subd wrote its result in the reorder buffer but still can’t
commit
yes
ROB 4ROB 6
Spring 2003 CSE P548 16
Example in the Book 4
Cycle after addd wrote its result in the reorder buffer but can’t
commit
ROB 6 ROB 4
Spring 2003 CSE P548 17
Example in the Book 5
Cycle after multd
committed
ROB 4ROB 6 ROB 5
Spring 2003 CSE P548 18
Example in the Book 6
Cycle after subd committed; Next: complete divd, commit divd, commit
addd
ROB 6 ROB 5
Spring 2003 CSE P548 19
Pool of Physical Regisers vs. Reorder Buffer
Think about the advantages and disadvantages of these implementations
• clean-up all done at instruction commit (physical register pool)
• record that value no longer speculative in register busy table
• unmap previous mapping for the architectural register
• instruction issue simpler (physical register pool)
• only look in one place for the source operands (the physical
register file)
• book claims that deallocating register is more complicated with a
physical register pool
• have to search for outstanding uses in the active list
• not done in practice: wait until the instruction that redefines the
architectural register commits
Spring 2003 CSE P548 20
Limits
Limits on out-of-order execution
• amount of ILP in the code
• scheduling window size
• need to do associative searches & its effect on cycle time
• number & types of functional units

More Related Content

What's hot

JS introduction
JS introductionJS introduction
JS introductionYi Tseng
 
(8) cpp stack automatic_memory_and_static_memory
(8) cpp stack automatic_memory_and_static_memory(8) cpp stack automatic_memory_and_static_memory
(8) cpp stack automatic_memory_and_static_memoryNico Ludwig
 
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISALec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISAHsien-Hsin Sean Lee, Ph.D.
 
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWLec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWHsien-Hsin Sean Lee, Ph.D.
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Hsien-Hsin Sean Lee, Ph.D.
 
Assembly Language Lecture 3
Assembly Language Lecture 3Assembly Language Lecture 3
Assembly Language Lecture 3Motaz Saad
 
Assembly Language Lecture 1
Assembly Language Lecture 1Assembly Language Lecture 1
Assembly Language Lecture 1Motaz Saad
 
An Introduction to Distributed Data Streaming
An Introduction to Distributed Data StreamingAn Introduction to Distributed Data Streaming
An Introduction to Distributed Data StreamingParis Carbone
 
Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Universität Rostock
 
Code GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principleCode GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principleMarina Kolpakova
 
Pipelining and co processor.
Pipelining and co processor.Pipelining and co processor.
Pipelining and co processor.Piyush Rochwani
 
Unit 2 assembly language programming
Unit 2   assembly language programmingUnit 2   assembly language programming
Unit 2 assembly language programmingKartik Sharma
 
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream AnalysisLWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream AnalysisJonas Traub
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerMarina Kolpakova
 

What's hot (20)

Run time
Run timeRun time
Run time
 
JS introduction
JS introductionJS introduction
JS introduction
 
(8) cpp stack automatic_memory_and_static_memory
(8) cpp stack automatic_memory_and_static_memory(8) cpp stack automatic_memory_and_static_memory
(8) cpp stack automatic_memory_and_static_memory
 
Run time administration
Run time administrationRun time administration
Run time administration
 
Protocol Independence
Protocol IndependenceProtocol Independence
Protocol Independence
 
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISALec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
 
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWLec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
 
ARM inst set part 2
ARM inst set part 2ARM inst set part 2
ARM inst set part 2
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
 
Assembly Language Lecture 3
Assembly Language Lecture 3Assembly Language Lecture 3
Assembly Language Lecture 3
 
Assembly Language Lecture 1
Assembly Language Lecture 1Assembly Language Lecture 1
Assembly Language Lecture 1
 
An Introduction to Distributed Data Streaming
An Introduction to Distributed Data StreamingAn Introduction to Distributed Data Streaming
An Introduction to Distributed Data Streaming
 
Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...
 
Embedded system -Introduction to hardware designing
Embedded system  -Introduction to hardware designingEmbedded system  -Introduction to hardware designing
Embedded system -Introduction to hardware designing
 
Code GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principleCode GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principle
 
Pipelining and co processor.
Pipelining and co processor.Pipelining and co processor.
Pipelining and co processor.
 
Unit 2 assembly language programming
Unit 2   assembly language programmingUnit 2   assembly language programming
Unit 2 assembly language programming
 
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream AnalysisLWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
 
FIFOPt
FIFOPtFIFOPt
FIFOPt
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
 

Viewers also liked

Register transfer language
Register  transfer languageRegister  transfer language
Register transfer languagehamza munir
 
Extreme programming (xp) | David Tzemach
Extreme programming (xp) | David TzemachExtreme programming (xp) | David Tzemach
Extreme programming (xp) | David TzemachDavid Tzemach
 
Register transfer language
Register transfer languageRegister transfer language
Register transfer languageSanjeev Patel
 
extreme Programming
extreme Programmingextreme Programming
extreme ProgrammingBilal Shah
 
Agile Engineering for Managers Workshop
Agile Engineering for Managers WorkshopAgile Engineering for Managers Workshop
Agile Engineering for Managers WorkshopPaul Boos
 

Viewers also liked (8)

XP (Extreme programming) SE1
XP (Extreme programming)  SE1XP (Extreme programming)  SE1
XP (Extreme programming) SE1
 
Register transfer language
Register  transfer languageRegister  transfer language
Register transfer language
 
Ch12 system administration
Ch12 system administration Ch12 system administration
Ch12 system administration
 
Extreme programming (xp) | David Tzemach
Extreme programming (xp) | David TzemachExtreme programming (xp) | David Tzemach
Extreme programming (xp) | David Tzemach
 
Extreme programming (xp)
Extreme programming (xp)Extreme programming (xp)
Extreme programming (xp)
 
Register transfer language
Register transfer languageRegister transfer language
Register transfer language
 
extreme Programming
extreme Programmingextreme Programming
extreme Programming
 
Agile Engineering for Managers Workshop
Agile Engineering for Managers WorkshopAgile Engineering for Managers Workshop
Agile Engineering for Managers Workshop
 

Similar to Reorder buf

12 processor structure and function
12 processor structure and function12 processor structure and function
12 processor structure and functionSher Shah Merkhel
 
Runtimeenvironment
RuntimeenvironmentRuntimeenvironment
RuntimeenvironmentAnusuya123
 
Buffer overflow attacks
Buffer overflow attacksBuffer overflow attacks
Buffer overflow attacksJapneet Singh
 
Embedded computing platform design
Embedded computing platform designEmbedded computing platform design
Embedded computing platform designRAMPRAKASHT1
 
Replication
ReplicationReplication
ReplicationPerforce
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
Performance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorPerformance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorDileep Bhandarkar
 
127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on Linux127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on LinuxSam Bowne
 
80386_ Bus Cycles & System Architecture.pdf
80386_ Bus Cycles & System Architecture.pdf80386_ Bus Cycles & System Architecture.pdf
80386_ Bus Cycles & System Architecture.pdfrupalidalvi10
 
Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technologyAmirali Sharifian
 

Similar to Reorder buf (20)

test
testtest
test
 
12 processor structure and function
12 processor structure and function12 processor structure and function
12 processor structure and function
 
Runtimeenvironment
RuntimeenvironmentRuntimeenvironment
Runtimeenvironment
 
15 ia64
15 ia6415 ia64
15 ia64
 
Buffer overflow attacks
Buffer overflow attacksBuffer overflow attacks
Buffer overflow attacks
 
Embedded computing platform design
Embedded computing platform designEmbedded computing platform design
Embedded computing platform design
 
ch2.pptx
ch2.pptxch2.pptx
ch2.pptx
 
Replication
ReplicationReplication
Replication
 
class-Stacks.pptx
class-Stacks.pptxclass-Stacks.pptx
class-Stacks.pptx
 
Intel IA 64
Intel IA 64Intel IA 64
Intel IA 64
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
MICROPROCESSOR
MICROPROCESSORMICROPROCESSOR
MICROPROCESSOR
 
Lec05
Lec05Lec05
Lec05
 
Performance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorPerformance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro Processor
 
127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on Linux127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on Linux
 
Lecture 03 basics of pic
Lecture 03 basics of picLecture 03 basics of pic
Lecture 03 basics of pic
 
13 superscalar
13 superscalar13 superscalar
13 superscalar
 
80386_ Bus Cycles & System Architecture.pdf
80386_ Bus Cycles & System Architecture.pdf80386_ Bus Cycles & System Architecture.pdf
80386_ Bus Cycles & System Architecture.pdf
 
13_Superscalar.ppt
13_Superscalar.ppt13_Superscalar.ppt
13_Superscalar.ppt
 
Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technology
 

Recently uploaded

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Recently uploaded (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Reorder buf

  • 1. Spring 2003 CSE P548 1 Reorder Buffer Implementation (Pentium Pro) Hardware data structures • retirement register file (RRF) (~ IBM 360/91 physical registers) • physical register file that is the same size as the architectural registers • one-to-one relationship • reorder buffer (ROB) (~ R10K active list) • provides in-order instruction commit • circular queue with head & tail pointers • holds 40 “executing” instructions in program order (dispatched but not yet committed) • field for either integer or FP result after it has been computed • a result value is put in its register after its producing instruction reaches the head of the buffer & is removed (i.e., commits)
  • 2. Spring 2003 CSE P548 2 Reorder Buffer Implementation (Pentium Pro) • register alias table (RAT) (~ R10K map table) • provides register renaming • important because very few GPRs in the x86 architecture • indicates whether a source operand of a new instruction points to the reorder buffer or the physical register file • do an associative search of ROB destination registers for the new source operands • if found, consumer instruction points to the producer instruction in the ROB • the data hazard check before instruction dispatch • reservation station (~ IBM 360/91 reservation stations, R10000 instruction buffer) • holds instructions waiting to execute • provides forwarding to reduce RAW hazards • result values go back to the reservation station (as well as ROB) so dependent instructions have source operand values • provides out-of-order execution
  • 3. Spring 2003 CSE P548 3 Pentium Pro Execution In-order issue • decode instructions • enter uops into reorder buffer for in-order completion • rename registers via register alias table • detect structural hazards for reservation station Out-of-order execution • one reservation station, multiple entries • check source operands for RAW hazards • check structural hazards for separate integer, FP, memory units • result goes to reservation station & reorder buffer In-order completion • this & previous uops have completed • write “G”PR registers • rollback on interrupts
  • 4. Spring 2003 CSE P548 4 Pentium Pro fetch & decode pipeline BTB access (1 stage) instruction fetch & align for decoding (2.5 stages) decode & uop generation (2.5 stages) register renaming & instruction issue to reservation stations (3 stages minimum) integer pipeline execute, resolve branch write registers & commit load pipeline address calculation & to memory reorder buffer integrated L1 & L2 data cache access pipelined FP add & multiply
  • 5. Spring 2003 CSE P548 5 Pentium Pro
  • 6. Spring 2003 CSE P548 6 Pentium Pro
  • 8. Spring 2003 CSE P548 8 Pentium Pro Some bandwidth constraints: maximum for one cycle • 16 bytes fetched • 3 instructions decoded • 6 µops to the reorder buffer • 5 µops dispatched to reservation station & functional units • 1 load & 1 store access to the L1 data cache • 1 cache result returned • 3 µops committed if • good instruction mix • right instruction order • operands available • functional units available • load & store to different cache banks • all previous instructions already committed
  • 9. Spring 2003 CSE P548 9 Making Tomasulo Precise Reorder buffer • contains: • instruction type (register operation, store, branch) • destination field (register, memory, none) • result values • ready bit to indicate instruction has executed • replaces load & store buffers • replaces renaming function of the reservation stations • operand fields in reservation stations point to reorder buffer location • results tagged with reorder buffer entry number • provides hardware speculation (speculative execution, not just fetch & issue) plus precise interrupts
  • 10. Spring 2003 CSE P548 10 Precise Implementation for Tomasulo
  • 11. Spring 2003 CSE P548 11 Tomasulo’s Algorithm: Execution Steps issue & read (assume the instruction has been fetched) • structural hazard detection for entry in reservation station & reorder buffer • issue if no hazard • stall if hazard • read registers for source operands • can come from either registers or reorder buffer • RAT indicates which • allocate entry in reorder buffer execute • RAW hazard detection for source operands • snoop on common data bus for missing operands • reservation station tag is entry number in reorder buffer • dispatch to functional unit when obtain both operand values
  • 12. Spring 2003 CSE P548 12 Tomasulo’s Algorithm: Execution Steps write result • broadcast result & reorder buffer entry (tag) on the common data bus to reservation stations & reorder buffer commit • retire the instruction at the head of the reorder buffer • update register with result in reorder buffer or do a store • remove instruction from reorder buffer • if a mispredicted branch, restart with right-path instructions
  • 13. Spring 2003 CSE P548 13 Example in the Book 1 Cycle after first load has committed ROB 6
  • 14. Spring 2003 CSE P548 14 Example in the Book 2 Cycle after second load committed ROB 6
  • 15. Spring 2003 CSE P548 15 Example in the Book 3 Cycle after subd wrote its result in the reorder buffer but still can’t commit yes ROB 4ROB 6
  • 16. Spring 2003 CSE P548 16 Example in the Book 4 Cycle after addd wrote its result in the reorder buffer but can’t commit ROB 6 ROB 4
  • 17. Spring 2003 CSE P548 17 Example in the Book 5 Cycle after multd committed ROB 4ROB 6 ROB 5
  • 18. Spring 2003 CSE P548 18 Example in the Book 6 Cycle after subd committed; Next: complete divd, commit divd, commit addd ROB 6 ROB 5
  • 19. Spring 2003 CSE P548 19 Pool of Physical Regisers vs. Reorder Buffer Think about the advantages and disadvantages of these implementations • clean-up all done at instruction commit (physical register pool) • record that value no longer speculative in register busy table • unmap previous mapping for the architectural register • instruction issue simpler (physical register pool) • only look in one place for the source operands (the physical register file) • book claims that deallocating register is more complicated with a physical register pool • have to search for outstanding uses in the active list • not done in practice: wait until the instruction that redefines the architectural register commits
  • 20. Spring 2003 CSE P548 20 Limits Limits on out-of-order execution • amount of ILP in the code • scheduling window size • need to do associative searches & its effect on cycle time • number & types of functional units