Understanding the Tomasulo Algorithm
Yichao Cheng
Jul 23, 2013
Background
• IBM System/360 Model 91
• The FPU's add/mul/div take 2/3/13 cycles
• Can performance be improved by using multiple execution units?
[Figure: three execution units: Adder, Mul, Div]
Major Contributions
Proposed three innovative mechanisms:
• Common data bus (CDB)
• Register tagging scheme
• Reservation stations
which permit:
• Out-of-order execution of independent instructions
• while preserving the essential precedences in the instruction stream
Doubt
• When people talk about the Tomasulo algorithm, they talk about register renaming
• However, this term cannot be found in the original paper
How could anyone invent a thing without noticing it?
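Architecture Overview
[Figure: block diagram. Outside the FPU: Instruction Unit and Storage. Inside the FPU: FLOS (floating-point operation stack), Decoder, FLB (floating-point buffers), SDB (store data buffers), FLR (floating-point registers), and the Adder, Mul, and Div execution units.]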
From an FPU's perspective
All instructions are 'register-to-register':
• Register-to-register arithmetic
• Storage-to-register arithmetic
• Load
• Store
The Instruction Unit (outside the FPU) is in charge of address generation and memory access.
'Sink' and 'source'
• Equivalent to destination and source
• For example, AD R1, R2
• R1 is both a sink and a source
[Figure: a value flows from a source to a sink]
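The sink/source vocabulary can be made concrete with a small helper. This is a minimal sketch, assuming the two-operand text format used on these slides (`OP first, second`); the function name `sink_and_sources` and the way `LD` and `STD` are treated are illustrative assumptions, not taken from the original paper.

```python
def sink_and_sources(instruction):
    """Split a two-operand instruction into (sink, sources).

    Assumed convention: for arithmetic the first operand is both the
    sink and a source; LD only writes its first operand; STD only
    reads its first operand and writes the second.
    """
    opcode, operands = instruction.split(maxsplit=1)
    first, second = (name.strip() for name in operands.split(","))
    if opcode == "LD":
        return first, [second]
    if opcode == "STD":
        return second, [first]
    return first, [first, second]


# AD R1, R2: R1 is both the sink and one of the sources.
print(sink_and_sources("AD R1, R2"))
```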
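1. Reg-to-reg arithmetic: AD R1, R2
[Figure: FPU block diagram (FLOS, Decoder, FLR, FLB, SDB, Adder, Mul, Div, Storage) showing the path of this instruction]
2. Storage-to-reg arithmetic: AD R1, FLB
[Figure: FPU block diagram showing the path of this instruction]
3. Load: LD R1, FLB1
[Figure: FPU block diagram showing the path of this instruction; a constant 0 is shown]
4. Store: STD R1, SDB1
[Figure: FPU block diagram showing the path of this instruction; a constant 0 is shown]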
Timing Sequence
1. Reg-to-reg arithmetic
   IU: Decode
   EU: Decode | 2 operands to ALU | Execute | Write back to FLR
2. Storage-to-reg arithmetic
   IU: Decode | Addr Gen | Mem Read
   EU: Decode | FLR to ALU, FLB to ALU | Execute | Write back to FLR
3. Load
   IU: Decode | Addr Gen | Mem Read
   EU: Decode | FLR to ALU, FLB to ALU | Execute | Write back to FLR
4. Store
   IU: Decode | Addr Gen | Mem Write
   EU: Decode | FLR to ALU | Execute | Write to SDB
(IU = Instruction Unit, EU = Execution Unit)
A Day in the Life of 'LD R1, addr'
[Figure sequence, seven frames on the FPU block diagram: the Instruction Unit decodes the instruction and generates the address; the data at addr is fetched from Storage into FLB1; the instruction enters the FLOS as LD R1, FLB1; the Decoder passes the operation (OP) to the Adder; the result is written to R1 in the FLR.]
An Example of Dependence
LD F0, FLB1
MD F0, FLB2
What if we send them to different execution units at the same time, to exploit parallelism?
The result (F0) cannot reflect the effect of LD, because MD uses the old value of F0.
This is called a true dependence, a.k.a. RAW (read after write).
[Figure: Adder, Mul, and Div execution units]
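The hazard can be reproduced with plain arithmetic. The sketch below assumes that LD copies the buffer into F0 and that MD multiplies F0 by the buffer; the function names and the sample values are made up for illustration.

```python
def run_in_order(f0, flb1, flb2):
    """LD completes before MD reads F0."""
    f0 = flb1            # LD F0, FLB1
    f0 = f0 * flb2       # MD F0, FLB2 reads the loaded value
    return f0


def run_at_the_same_time(f0, flb1, flb2):
    """Both instructions read their operands before either one writes."""
    loaded = flb1        # LD F0, FLB1 (result not yet in F0)
    product = f0 * flb2  # MD F0, FLB2 reads the old F0
    f0 = loaded
    f0 = product         # MD writes last and overwrites the load
    return f0


print(run_in_order(1.0, 2.0, 3.0))          # 6.0
print(run_at_the_same_time(1.0, 2.0, 3.0))  # 3.0
```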
A Simple Solution
• 'Busy' bit scheme
[Figure: registers R0 to R3; R1 carries a busy bit (B)]
LD R1, B
MD R1, A
R1: "I'm already the sink of some instruction"
MD: "I need your content"
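A busy-bit interlock can be sketched in a few lines. The class `BusyBits` and its method names are hypothetical, and the rule that an instruction stalls when any register it names is busy is a simplifying assumption for illustration.

```python
class BusyBits:
    """One busy bit per register; a set bit means a result is pending."""

    def __init__(self, registers):
        self.busy = {name: False for name in registers}

    def can_issue(self, sink, sources):
        """An instruction may issue only if none of its registers is busy."""
        # Operands that are not registers (storage names) are never busy.
        return not any(self.busy.get(r, False) for r in (sink, *sources))

    def issue(self, sink):
        self.busy[sink] = True

    def write_back(self, sink):
        self.busy[sink] = False


flr = BusyBits(["R0", "R1", "R2", "R3"])
flr.issue("R1")                             # LD R1, B is in flight
stalled = not flr.can_issue("R1", ["R1"])   # MD R1, A must wait
flr.write_back("R1")                        # LD completes
released = flr.can_issue("R1", ["R1"])      # MD R1, A may now issue
```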
Performance Degrades...
• When the code keeps using one register
• E.g.
  MD F0, E
  AD F2, F0
  AD F4, A
  AD F2, F4
Overlap fails because the first AD depends on MD, though the others don't.
The second AD is qualified to issue.
Cause of the Problem
• If one instruction gets stuck (due to a dependence), the following instructions can't be decoded (even if they are qualified to issue)
Solution:
• Decouple dependence maintenance from decoding
• Look ahead at more instructions to find concurrency
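The cost of in-order decoding can be seen by scanning the three AD instructions while F0 is still busy (MD F0, E is executing). This is a toy sketch, not a correct scheduler: `issue_scan` is a hypothetical helper, only register operands are tracked, and a blocked instruction simply reserves its sink.

```python
# (text, sink, register sources) for the instructions behind MD F0, E
PROGRAM = [
    ("AD F2, F0", "F2", ["F2", "F0"]),
    ("AD F4, A", "F4", ["F4"]),
    ("AD F2, F4", "F2", ["F2", "F4"]),
]


def issue_scan(program, busy, look_ahead):
    """Return the instructions that can issue while `busy` registers are pending."""
    busy = set(busy)
    issued = []
    for text, sink, sources in program:
        if busy & {sink, *sources}:
            if not look_ahead:
                break           # in-order decode: everything behind is stuck
            busy.add(sink)      # conservatively reserve the blocked sink
            continue
        issued.append(text)
        busy.add(sink)
    return issued


in_order = issue_scan(PROGRAM, {"F0"}, look_ahead=False)
with_look_ahead = issue_scan(PROGRAM, {"F0"}, look_ahead=True)
print(in_order)          # []
print(with_look_ahead)   # ['AD F4, A']
```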
Dispatch and Issue Decoupling
MD F0, E
AD F2, F0
AD F4, A
AD F2, F4
Before: Decode asks "Is that reg busy?" to decide "Can issue?"
After: Decode dispatches anyway; the dispatched instruction (e.g. MD F0, E) asks "Are my operands ready?" to decide "Can issue?"
[Figure: instruction stream, Decode stage, and the Adder]
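The "dispatch anyway, then ask whether my operands are ready" idea can be sketched as a buffer entry that holds either values or the names of the units that will produce them. The class `ReservationStation`, its fields, and the sample values are assumptions for illustration, not the Model 91 hardware layout.

```python
class ReservationStation:
    """Holds one dispatched instruction until its operands are ready."""

    def __init__(self, name, text, operands):
        self.name = name
        self.text = text
        # Each operand is ("value", number) or ("tag", producer_name).
        self.operands = list(operands)

    def can_issue(self):
        return all(kind == "value" for kind, _ in self.operands)

    def capture(self, tag, value):
        """Replace every operand waiting on `tag` with the produced value."""
        self.operands = [
            ("value", value) if (kind, payload) == ("tag", tag) else (kind, payload)
            for kind, payload in self.operands
        ]


# AD F2, F0 is dispatched even though F0 is still being produced by A1.
a2 = ReservationStation("A2", "AD F2, F0", [("value", 10.0), ("tag", "A1")])
before = a2.can_issue()    # False: one operand is still a tag
a2.capture("A1", 5.0)      # A1 delivers the value of F0
after = a2.can_issue()     # True: both operands are values
```

Decoding never blocks in this model; only the issue decision waits.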
An Example of True Dependence
LD F0, FLB1    (F0 as sink)
AD F2, F0      (F0 as source)
Assume the CDB has not been introduced yet.
[Figure sequence: FLB (FLB1), FLR (F0), and the Adder, Mul, and Div units]
1. LD F0, FLB1 dispatches to A1. F0 is marked busy (B) and tagged A1: F0 is reserved for some instruction, and its content is calculated by A1.
2. AD F2, F0: "I need the value of F0, but he seems to be busy."
3. AD F2, F0 dispatches to A2: "Since A1 is the producer, just ask him for it." The waiting instruction becomes AD F2, A1.
4. LD F0, FLB1 is executing: "Operands are ready. Execute!"
5. LD F0, FLB1 broadcasts its result: "I'm A1. Who needs my result? Over."
6. F0 (tagged A1): "I depend on A1!" AD F2, A1: "Me too!"
The Role of CDB
• The Common Data Bus is in charge of value forwarding
• In the reg-to-reg model, a value is passed through a register (write & read): F0 is written as a sink (producer), then read as a source (consumer)
• Load/Store doesn't need to go through the ALU
• Dependence management is decoupled from execution, as expected
[Figure: the CDB connecting the Add and Mul reservation stations, FLB, SDB, and FLR]
Producers: all units which can alter a register (Add reservation stations: 3, Mul reservation stations: 2, FLB: 6)
Consumers: all units which may take a register as an operand (FLR: 4, SDB: 3, Mul reservation stations: 2*2, Add reservation stations: 3*2)
The Implementation of CDB
• A consumer recognizes its producer by a tag
• Producers put <tag, value> on the bus in turn (making a request first)
• If the tag matches, the consumer ingates the value
[Figure sequence: six producers (P) and six consumers (C); the consumers hold tags, among them X, Y, Y. One producer makes a request (2 cycles) and then broadcasts <Y, value>; another producer makes a request and then broadcasts <X, value>.]
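The request, broadcast, and ingate steps can be modelled as follows. This is a sketch under stated assumptions: the class `CommonDataBus` is hypothetical, requests are granted in first-come order (the real arbitration scheme is not described on the slides), and the 2-cycle request latency is ignored.

```python
from collections import deque


class CommonDataBus:
    def __init__(self):
        self.requests = deque()   # producers waiting for the bus
        self.consumers = []       # each consumer has a "tag" and a "value" slot

    def attach(self, tag):
        """Register a consumer that waits for the producer named `tag`."""
        consumer = {"tag": tag, "value": None}
        self.consumers.append(consumer)
        return consumer

    def request(self, tag, value):
        """A producer asks for the bus before it may broadcast."""
        self.requests.append((tag, value))

    def cycle(self):
        """Grant the bus to one producer and broadcast its <tag, value>."""
        if not self.requests:
            return None
        tag, value = self.requests.popleft()
        for consumer in self.consumers:
            if consumer["tag"] == tag:
                consumer["value"] = value   # ingate the value
                consumer["tag"] = None      # no longer waiting
        return tag


cdb = CommonDataBus()
c1, c2, c3 = cdb.attach("X"), cdb.attach("Y"), cdb.attach("Y")
cdb.request("Y", 2.5)
cdb.request("X", 7.0)
cdb.cycle()   # Y is broadcast: c2 and c3 ingate it, c1 keeps waiting for X
```

One broadcast serves every consumer holding the matching tag, which is why a single bus suffices for value forwarding in this model.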
The Principle behind the Scenes
• A tag is a pointer to the producer of a value required by the current instruction
• The pointers make up the dependence information that is hidden by the reg-reg model (discussed later)
• With this information, the order of execution can be resolved
• The CDB enables 'producer-consumer' style data flow
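If tags are pointers to producers, then the set of tags is a dependence graph, and any topological order of that graph is a legal execution order. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the four-instruction program is a made-up example, not one from the slides.

```python
from graphlib import TopologicalSorter

# Each instruction maps to the producers its operand tags point to.
tags = {
    "LD F0, FLB1": [],
    "AD F2, F0": ["LD F0, FLB1"],   # waits for the load
    "MD F4, F2": ["AD F2, F0"],     # waits for the add
    "AD F6, A": [],                 # independent of the others
}

# static_order() yields every instruction after all of its producers.
order = list(TopologicalSorter(tags).static_order())
print(order)
```

The independent `AD F6, A` may appear anywhere in the order, which is the freedom an out-of-order machine exploits.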
An Example of False Dependence
LD F0, FLB1
AD F2, F0
LD F0, FLB2
AD F3, F0
The two LDs both write F0 (WAW); AD F2, F0 reads F0 before LD F0, FLB2 overwrites it (WAR).
[Figure sequence: FLB (FLB1, FLB2), FLR (F0), and the Adder, Mul, and Div units]
1. LD F0, FLB1 dispatches. F0 is marked busy (B) with tag FLB1.
2. AD F2, F0 dispatches to A1.
3. LD F0, FLB2 dispatches. F0's tag becomes FLB2.
4. AD F3, F0 dispatches to A2.
Keep tracing the source of the value instead of the register holding it.
There's no need to rename a register (naming is just a way of referring to values).
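The four-instruction example can be traced with a tiny tag-tracking model. This is a sketch under assumptions: the data values are invented, AD is taken to compute sink = sink + source, a load's tag is the name of its buffer, and all function names are hypothetical. The second load is deliberately completed first to show that the result is unaffected.

```python
flb = {"FLB1": 1.0, "FLB2": 100.0}
regs = {"F0": 0.0, "F2": 10.0, "F3": 20.0}
tag_of = {name: None for name in regs}   # None: the value is in the register
stations = {}                            # station name -> (sink, operands)


def read(reg):
    """Return the register's value, or the tag of its pending producer."""
    return ("tag", tag_of[reg]) if tag_of[reg] else ("value", regs[reg])


def dispatch_load(reg, buffer):
    tag_of[reg] = buffer                 # the register now waits on the buffer


def dispatch_add(station, sink, source):
    stations[station] = (sink, [read(sink), read(source)])
    tag_of[sink] = station


def broadcast(tag, value):
    """Deliver <tag, value> to every register and station waiting on tag."""
    for reg in regs:
        if tag_of[reg] == tag:
            regs[reg], tag_of[reg] = value, None
    for name, (sink, operands) in stations.items():
        stations[name] = (sink, [("value", value) if op == ("tag", tag) else op
                                 for op in operands])


def execute_add(station):
    sink, operands = stations.pop(station)
    assert all(kind == "value" for kind, _ in operands)
    broadcast(station, sum(value for _, value in operands))


dispatch_load("F0", "FLB1")       # LD F0, FLB1
dispatch_add("A1", "F2", "F0")    # AD F2, F0: waits on FLB1, not on F0
dispatch_load("F0", "FLB2")       # LD F0, FLB2: F0 is retagged
dispatch_add("A2", "F3", "F0")    # AD F3, F0: waits on FLB2

broadcast("FLB2", flb["FLB2"])    # the second load happens to finish first
broadcast("FLB1", flb["FLB1"])    # no register waits on FLB1 any more
execute_add("A1")
execute_add("A2")
print(regs)   # {'F0': 100.0, 'F2': 11.0, 'F3': 120.0}
```

The final state matches sequential execution even though the loads completed out of order, because each consumer follows the producer of its value rather than the name F0.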
Timing Sequence with Busy Bit
[Figure: pipeline timing diagram for LD F0, FLB1; AD F2, F0; LD F0, FLB2; AD F3, F0, with stages D, AG, FLB, T, EX, WB]
Timing Sequence with Reservation Station
[Figure: pipeline timing diagram for the same four instructions, with stages D, AG, FLB, T, EX, WB]
The Side Effect of the Register Machine
• What are the differences between a circuit and a register machine?
Register machine:
• General purpose
• Control-driven
• Implicit dependence via registers
Circuit:
• Special purpose
• Data-driven
• Exposed dependence
...but registers are scarce
Conclusion
• The Tomasulo algorithm has nothing to do with register renaming
• It resolves WAR and WAW by eliminating the side effect of using registers to pass values
• With the Tomasulo algorithm, the execution of a program is driven by data flow, thus exploiting maximum concurrency

Understanding Tomasulo Algorithm

  • 1. Understanding the Tomasulo Algorithm Yichao Cheng Jul 23, 2013
  • 2. Background  IBM System/360 Model 91  FPU’s add/mul/div takes 2/3/13 cycles  Can performance be improved through utilizing multiple execution units? Adder Mul div
  • 3. Major Contributions Proposed three innovative mechanisms:  Common data busing(CDB)  Register tagging scheme  Reservation station which permits:  Out-of-order execution of independent instructions  while preserving the essential precedences in the instruction stream
  • 4. Doubt  When people talk about Tomasolu algorithm, they talk about register renaming  However this word can’t be found in the original paper How could anyone invent a thing without noticing it?
  • 6. From a FPU’s perspective All instructions are ‘register-to-register’  Register-to-register arithmetic  Storage-to-register arithmetic  Load  Store Instruction Unit(outside FPU) is in charge of the address generation and memory access.
  • 7.  Be equivalent to destination and source  For example, AD R1, R2  R1 is both a sink and a source ‘sink’ and ‘source’ source sink value
  • 8. 1. Reg-to-reg arithmetic: AD R1, R2. (Diagram: FLOS, Decoder, FLR, FLB, SDB, Adder, Mul, div, Storage.)
  • 9. 2. Storage-to-reg arithmetic: AD R1, FLB. (Diagram: FLOS, Decoder, FLR, FLB, SDB, Adder, Mul, div, Storage.)
  • 10. 3. Load: LD R1, FLB1. (Diagram: FLOS, Decoder, FLR, FLB, SDB, Adder, Mul, div, Storage; label 0.)
  • 11. 4. Store: STD R1, SDB1. (Diagram: FLOS, Decoder, FLR, FLB, SDB, Adder, Mul, div, Storage; label 0.)
  • 12. Timing Sequence: 1. Reg-to-reg arithmetic. IU: Decode. EU: Decode, 2 operands to ALU, Execute, Write back to FLR.
  • 13. Timing Sequence: 2. Storage-to-reg arithmetic. IU: Decode, Addr Gen, Mem Read. EU: Decode, FLR to ALU, FLB to ALU, Execute, Write back to FLR.
  • 14. Timing Sequence: 3. Load. IU: Decode, Addr Gen, Mem Read. EU: Decode, FLR to ALU, FLB to ALU, Execute, Write back to FLR.
  • 16. A Day in the Life of ‘LD R1, addr’. (Diagram: Instruction Unit, FLOS, Decoder, FLR, FLB, SDB, Adder, Mul, div, Storage.)
  • 17. A Day in the Life of ‘LD R1, addr’: Decode and address generation in the Instruction Unit. (Diagram labels: addr, FLB1.)
  • 18. A Day in the Life of ‘LD R1, addr’: LD R1, FLB1. (Diagram labels: addr, FLB1, Instruction Unit.)
  • 19. A Day in the Life of ‘LD R1, addr’: LD R1, FLB1. (Diagram labels: addr, FLB1.)
  • 20. A Day in the Life of ‘LD R1, addr’: LD R1, FLB1. (Diagram labels: addr, FLB1, OP, Adder.)
  • 21. A Day in the Life of ‘LD R1, addr’: LD R1, FLB1. (Diagram labels: addr, FLB1, OP, Adder.)
  • 22. A Day in the Life of ‘LD R1, addr’: LD R1, FLB1. (Diagram labels: addr, FLB1, R1.)
  • 23. An Example of Dependence: LD F0, FLB1; MD F0, FLB2. What if we send them to different execution units at the same time to exploit parallelism? (Diagram: Adder, Mul, div.)
  • 24. An Example of Dependence: LD F0, FLB1; MD F0, FLB2. The result (F0) cannot reflect the effect of LD, because MD uses the old value of F0.
  • 25. An Example of Dependence: LD F0, FLB1; MD F0, FLB2. This is called a true dependence, a.k.a. RAW (read after write).
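
The hazard on slides 23 to 25 can be reproduced with a few lines of Python. This is only an illustrative sketch: the register and buffer names follow the slides, while the numeric values and the two function names are made up for the example.

```python
def run_in_order(regs, flb):
    """Execute LD F0, FLB1 and then MD F0, FLB2 strictly in program order."""
    regs = dict(regs)
    regs["F0"] = flb["FLB1"]                # LD F0, FLB1
    regs["F0"] = regs["F0"] * flb["FLB2"]   # MD F0, FLB2 reads the loaded value
    return regs["F0"]


def run_simultaneously(regs, flb):
    """Both instructions fetch their operands at the same time (no interlock)."""
    regs = dict(regs)
    md_source = regs["F0"]                  # MD samples the OLD value of F0
    regs["F0"] = flb["FLB1"]                # LD writes F0
    regs["F0"] = md_source * flb["FLB2"]    # MD writes back a stale product
    return regs["F0"]


regs = {"F0": 1.0}                          # hypothetical initial register content
flb = {"FLB1": 5.0, "FLB2": 3.0}            # hypothetical buffer contents
print(run_in_order(regs, flb))              # 15.0
print(run_simultaneously(regs, flb))        # 3.0
```

The second result ignores the load entirely, which is exactly the read-after-write violation the slides describe.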
  • 26. A Simple Solution: the ‘busy’ bit scheme. Each register (R0 to R3) carries a busy bit (B). While LD R1, B is in flight, R1 says: “I’m already the sink of some instruction.” MD R1, A says: “I need your content.”
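
A minimal sketch of the busy-bit idea, assuming one bit per register and a decoder that refuses any instruction touching a busy register. The class and method names are hypothetical and do not model the actual Model 91 logic.

```python
class BusyBitRegisterFile:
    """Registers with one busy bit each (illustrative model only)."""

    def __init__(self, names):
        self.busy = {name: False for name in names}

    def can_decode(self, sink, sources):
        # An instruction stalls at decode if any register it touches is busy.
        return not any(self.busy[r] for r in (sink, *sources))

    def start(self, sink):
        self.busy[sink] = True     # "I'm already the sink of some instruction"

    def finish(self, sink):
        self.busy[sink] = False    # result written back, register usable again


flr = BusyBitRegisterFile(["R0", "R1", "R2", "R3"])
flr.start("R1")                          # LD R1, B is in flight
print(flr.can_decode("R1", ["R1"]))      # MD R1, A must wait: False
flr.finish("R1")
print(flr.can_decode("R1", ["R1"]))      # True
```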
  • 27. Performance Degrades... when the code keeps using one register. E.g. MD F0, E; AD F2, F0; AD F4, A; AD F2, F4. Overlap fails because the first AD depends on MD, though the others don’t. The second AD is qualified to issue.
  • 28. Cause of the Problem: If one instruction gets stuck (due to a dependence), the following instructions can’t be decoded, even if they are qualified to issue. Solution: decouple dependence maintenance from decoding, and look ahead at more instructions to find concurrency.
  • 29. Dispatch and Issue Decoupling: MD F0, E; AD F2, F0; AD F4, A; AD F2, F4. Decode asks: “Is that reg busy? Can issue?” (Diagram: Decode, Adder.)
  • 30. Dispatch and Issue Decoupling: MD F0, E; AD F2, F0; AD F4, A; AD F2, F4. Decode: “Dispatch anyway.” At the adder, MD F0, E asks: “Are my operands ready? Can issue?” (Diagram: Decode, Adder.)
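
The difference between stalling at decode and dispatching anyway can be sketched on the four-instruction example from the slides. The sketch assumes a single snapshot in which nothing has completed yet, treats the storage operands E and A as always available, and uses hypothetical function names.

```python
# (text, sink register, register sources). The sink of AD/MD is also a source.
PROGRAM = [
    ("MD F0, E", "F0", ["F0"]),
    ("AD F2, F0", "F2", ["F2", "F0"]),
    ("AD F4, A", "F4", ["F4"]),
    ("AD F2, F4", "F2", ["F2", "F4"]),
]


def stall_at_decode(program):
    """Busy-bit style: decoding stops at the first instruction needing a busy register."""
    busy, started = set(), []
    for text, sink, sources in program:
        if any(r in busy for r in sources):
            break                      # everything behind this instruction waits too
        started.append(text)
        busy.add(sink)
    return started


def dispatch_anyway(program):
    """Decoupled style: every instruction is dispatched; report those ready to issue."""
    pending, ready = set(), []
    for text, sink, sources in program:
        if not any(r in pending for r in sources):
            ready.append(text)
        pending.add(sink)              # later readers of this register must wait
    return ready


print(stall_at_decode(PROGRAM))   # ['MD F0, E']
print(dispatch_anyway(PROGRAM))   # ['MD F0, E', 'AD F4, A']
```

With decoupling, the second AD can issue while the first AD is still waiting for MD, which is the case slide 27 points out.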
  • 31. An Example of True Dependence: LD F0, FLB1 (F0 as sink); AD F2, F0 (F0 as source). Assume the CDB has not been introduced yet. (Diagram: FLB, FLR, Adder, Mul, div.)
  • 32. An Example of True Dependence: LD F0, FLB1 dispatches to A1. F0 is marked busy (B) with tag A1: F0 is reserved for some instruction.
  • 33. An Example of True Dependence: LD F0, FLB1 dispatches to A1. F0 is marked busy (B) with tag A1: its content is calculated by A1.
  • 34. An Example of True Dependence: AD F2, F0: “I need the value of F0, but he seems to be busy.”
  • 35. An Example of True Dependence: AD F2, F0 dispatches to A2: “Since A1 is the producer, just let him tell me.”
  • 36. An Example of True Dependence: AD F2, F0 dispatches to A2 and is held there as AD F2, A1: “Since A1 is the producer, just ask him for it.”
  • 37. An Example of True Dependence: LD F0, FLB1 is executing: “Operands are ready. Execute!” AD F2, A1 waits.
  • 38. An Example of True Dependence: LD F0, FLB1 broadcasts its result: “I’m A1. Who needs my result? Over.”
  • 39. An Example of True Dependence: LD F0, FLB1 broadcasts its result. AD F2, A1: “I depend on A1!” “Me too!”
  • 40. The Role of CDB: The Common Data Bus is in charge of value forwarding. In the reg-to-reg model, a value is passed through a register (write and read). (Diagram: F0 is written as sink by the producer.)
  • 41. The Role of CDB: The Common Data Bus is in charge of value forwarding. In the reg-to-reg model, a value is passed through a register (write and read). (Diagram: F0 is read as source by the consumer.)
  • 42. The Role of CDB: Load/store doesn’t need to go through the ALU. Dependence management is decoupled from execution, as expected. (Diagram: Resv. S for Add, Resv. S for Mul, FLB, SDB, FLR.)
  • 43. The Role of CDB: Producers are all units which can alter a register; consumers are all units which may take a register as an operand. (Diagram: CDB, Resv. S for Add, Resv. S for Mul, FLB, SDB, FLR; producer counts P:3, P:2, P:6.)
  • 44. The Role of CDB: Producers are all units which can alter a register; consumers are all units which may take a register as an operand. (Diagram: CDB, Resv. S for Add, Resv. S for Mul, FLB, SDB, FLR; consumer counts C:4, C:3, C:2*2, C:3*2.)
  • 45. The Implementation of CDB: A consumer recognizes its producer by a tag. Producers put <tag, value> on the bus in turn (making a request first). If the tag matches, the consumer ingates the value. (Diagram: six producers P, six consumers C; three consumers hold tags X, Y, Y; Request (2 cycles).)
  • 46. The Implementation of CDB: same points as slide 45. (Diagram: <Y, value> is on the bus; consumers hold tags X, Y, Y.)
  • 47. The Implementation of CDB: same points as slide 45. (Diagram: a producer makes a request; consumers hold tags X, Y, Y.)
  • 48. The Implementation of CDB: same points as slide 45. (Diagram: <X, value> is on the bus; consumers hold tags X, Y, Y.)
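
The request, broadcast, and tag-match steps on slides 45 to 48 can be sketched as follows. This is a simplified model under stated assumptions: bus grants are first-come-first-served (the real arbitration policy is not given on the slides), the two-cycle request latency is not modeled, and the class names `Consumer` and `CommonDataBus` are hypothetical.

```python
from collections import deque


class Consumer:
    """A slot (operand field, register, or store buffer) waiting on a producer tag."""

    def __init__(self, name, tag):
        self.name, self.tag, self.value = name, tag, None

    def snoop(self, tag, value):
        if self.tag == tag:                  # tag match: ingate the value
            self.value, self.tag = value, None


class CommonDataBus:
    def __init__(self, consumers):
        self.consumers = consumers
        self.requests = deque()              # producers queue up for the bus

    def request(self, tag, value):
        self.requests.append((tag, value))   # a producer asks for the bus first

    def cycle(self):
        """Grant the bus to one producer and broadcast its <tag, value> pair."""
        if not self.requests:
            return None
        tag, value = self.requests.popleft()
        for consumer in self.consumers:
            consumer.snoop(tag, value)
        return tag


consumers = [Consumer("c1", "X"), Consumer("c2", "Y"), Consumer("c3", "Y")]
cdb = CommonDataBus(consumers)
cdb.request("Y", 2.5)                        # hypothetical result values
cdb.request("X", 7.0)
cdb.cycle()
print([c.value for c in consumers])          # [None, 2.5, 2.5]
cdb.cycle()
print([c.value for c in consumers])          # [7.0, 2.5, 2.5]
```

One broadcast serves every consumer holding the matching tag, so a producer never needs to know who or how many its consumers are.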
  • 49. The Principle behind the Scene: A tag is a pointer to the producer of a value required by the current instruction. The pointers form the dependency information that is hidden by the reg-reg model (discussed later). With this information, the order of execution can be resolved. The CDB enables ‘producer-consumer’ style data flow.
  • 50. An Example of False Dependence: LD F0, FLB1; AD F2, F0; LD F0, FLB2; AD F3, F0. The sequence contains WAW and WAR dependences on F0. (Diagram: FLB with FLB1 and FLB2, FLR with F0, Adder, Mul, div.)
  • 51. An Example of False Dependence: LD F0, FLB1 dispatches. F0 is marked busy (B) with tag FLB1.
  • 52. An Example of False Dependence: AD F2, F0 dispatches to A1. F0 is busy (B) with tag FLB1.
  • 53. An Example of False Dependence: AD F2, F0 waits in the adder’s reservation station. F0 is busy (B) with tag FLB1.
  • 54. An Example of False Dependence: LD F0, FLB2 dispatches. F0 is busy (B) and its tag is now FLB2; AD F2, F0 is still waiting.
  • 55. An Example of False Dependence: AD F3, F0 dispatches to A2. F0 is busy (B) with tag FLB2; AD F2, F0 and AD F3, F0 both wait in the adder’s reservation stations.
  • 56. An Example of False Dependence: Keep tracing the source of the value instead of the register holding it.
  • 57. An Example of False Dependence: There’s no need to rename a register (naming is just a way of referring to values).
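
To see why tracking producers removes the WAW and WAR hazards on F0, the four-instruction example can be run through a small tag-based model. This is a sketch, not the Model 91 design: the `Machine` class, its method names, and the numeric values are assumptions for illustration, and only the AD instruction is modeled in the adder stations.

```python
class Machine:
    def __init__(self, regs):
        self.value = dict(regs)               # FLR contents
        self.tag = {r: None for r in regs}    # producer each register is waiting on
        self.stations = {}                    # station name -> [sink operand, source operand]

    def _read(self, reg):
        # Capture the value if present, otherwise the tag of its producer.
        return ("tag", self.tag[reg]) if self.tag[reg] else ("val", self.value[reg])

    def load(self, reg, flb):
        self.tag[reg] = flb                   # LD reg, flb: reg now waits on that buffer

    def add(self, station, sink, source):
        self.stations[station] = [self._read(sink), self._read(source)]
        self.tag[sink] = station              # AD sink, source: sink waits on the station

    def broadcast(self, tag, value):
        for reg, t in self.tag.items():
            if t == tag:                      # only the latest producer may write
                self.value[reg], self.tag[reg] = value, None
        for ops in self.stations.values():
            for i, (kind, x) in enumerate(ops):
                if kind == "tag" and x == tag:
                    ops[i] = ("val", value)

    def run_ready(self):
        """Execute every station whose operands are values; return the firing order."""
        fired = []
        while True:
            ready = [n for n, ops in self.stations.items()
                     if all(kind == "val" for kind, _ in ops)]
            if not ready:
                return fired
            for name in ready:
                (_, a), (_, b) = self.stations.pop(name)
                self.broadcast(name, a + b)
                fired.append(name)


m = Machine({"F0": 0.0, "F2": 10.0, "F3": 20.0})   # hypothetical register contents
m.load("F0", "FLB1")        # LD F0, FLB1
m.add("A1", "F2", "F0")     # AD F2, F0  (captures tag FLB1)
m.load("F0", "FLB2")        # LD F0, FLB2 (F0's tag becomes FLB2)
m.add("A2", "F3", "F0")     # AD F3, F0  (captures tag FLB2)

m.broadcast("FLB2", 2.0)    # the SECOND load happens to return first
print(m.run_ready())        # ['A2']
m.broadcast("FLB1", 1.0)    # the first load returns late
print(m.run_ready())        # ['A1']
print(m.value)              # {'F0': 2.0, 'F2': 11.0, 'F3': 22.0}
```

Even with the loads completing in the opposite order, each AD receives the value from its own producer and F0 ends up holding the later load, without any register being renamed.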
  • 58. Timing Sequence with Busy Bit. (Pipeline diagram for LD F0, FLB1; AD F2, F0; LD F0, FLB2; AD F3, F0; stages shown: D, AG, FLB, T, EX, WB.)
  • 59. Timing Sequence with Reservation Station. (Pipeline diagram for LD F0, FLB1; AD F2, F0; LD F0, FLB2; AD F3, F0; stages shown: D, AG, FLB, T, EX, WB.)
  • 60. The Side Effect of Register Machine: What are the differences between a circuit and a register machine?
  • 61. The Side Effect of Register Machine: What are the differences between a circuit and a register machine? Register machine: general purpose, control-driven, implicit dependence via registers. Circuit: special purpose, data-driven, exposed dependence. ...But registers are rare.
  • 62. Conclusion: The Tomasulo algorithm has nothing to do with register renaming. It resolves WAR and WAW by eliminating the side effect of using registers to pass values. With the Tomasulo algorithm, the execution of a program is driven by data flow, thus exploiting maximum concurrency.