CArcMOOC 06.04 - Dynamic optimizations

Alessandro Bogliolo

Computer Architecture. Degree Program in Applied Computer Science, University of Urbino. http://informatica.uniurb.it/

Carc 06.04
alessandro.bogliolo@uniurb.it
06. Performance optimization
06.04. Dynamic optimization
• Out-of-order execution
• Speculation
Computer Architecture

Dynamic optimization
• Out-of-order (OOO) execution
  • Dynamic scheduling (Tomasulo’s algorithm)
• Speculation
  • Tentative decisions on conditional branches with unknown conditions
• Sub-clock-cycle operations
  • More than one instruction processed per clock cycle

Data dependencies (1)
• (RAW) Read after write
  • The second instruction reads a register written by the first one
  • Example (r1):
      add r1 r2 r3
      sub r4 r4 r1
  • Example (r1):
      load r1 0(r2)
      bnz r1 label
• (WAR) Write after read
  • The first instruction reads the content of a register before it is overwritten by the second one
  • Example (r1):
      add r3 r2 r1
      load r1 0(r4)
• (WAW) Write after write
  • Both instructions write to the same register
  • Example (r1):
      add r1 r2 r3
      load r1 0(r4)
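
The three definitions above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Instr` type and the `dependencies` function are hypothetical names, and it assumes that the first operand of `add`, `sub` and `load` is the destination register, as the slide examples suggest.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    """An instruction reduced to the registers it writes and reads."""
    text: str
    writes: frozenset
    reads: frozenset

def dependencies(first: Instr, second: Instr) -> set:
    """Kinds of data dependency of `second` on the earlier `first`."""
    kinds = set()
    if first.writes & second.reads:
        kinds.add("RAW")   # second reads what first writes
    if first.reads & second.writes:
        kinds.add("WAR")   # second overwrites what first reads
    if first.writes & second.writes:
        kinds.add("WAW")   # both write the same register
    return kinds

# Assumption: the first operand of add/sub/load is the destination register.
add_r1 = Instr("add r1 r2 r3", frozenset({"r1"}), frozenset({"r2", "r3"}))
sub_r4 = Instr("sub r4 r4 r1", frozenset({"r4"}), frozenset({"r4", "r1"}))
add_r3 = Instr("add r3 r2 r1", frozenset({"r3"}), frozenset({"r2", "r1"}))
load_r1 = Instr("load r1 0(r4)", frozenset({"r1"}), frozenset({"r4"}))

for first, second in [(add_r1, sub_r4), (add_r3, load_r1), (add_r1, load_r1)]:
    print(f"{first.text} ; {second.text} -> {sorted(dependencies(first, second))}")
```

The three pairs are the slide's own examples, so the loop reports one RAW, one WAR and one WAW dependency, all on r1.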

Data dependencies (2)
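add r1 r2 r3
sub r4 r4 r1      RAW on r1 (with the first add)
add r1 r5 r6      WAW on r1 (with the first add), WAR on r1 (with sub)
write 0(r7) r1    RAW on r1 (with the second add)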

Data dependencies (3)
add r1 r2 r3
sub r4 r4 r1      RAW on r1 (with the first add)
add r10 r5 r6
write 0(r7) r10   RAW on r10 (with the second add)

Data dependencies (4)
add r1 r2 r3
add r10 r5 r6
sub r4 r4 r1      RAW on r1 (with the first add)
write 0(r7) r10   RAW on r10 (with the second add)

OOO execution opportunities
• Give priority to ready-to-execute instructions
  • Enqueue instructions on resource-specific lanes
• Avoid false data dependencies
  • Use temporary registers to avoid WAR and WAW
• Forward partial results to all execution units
  • Use a shared internal bus for forwarding
• Preserve sequential coherence when writing to the register file
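
The use of temporary registers to remove WAR and WAW can be illustrated in software. This is a rough sketch under stated assumptions, not the hardware mechanism itself: `rename` and `hazard_kinds` are hypothetical helpers, the temporary names `t0`, `t1`, ... are invented for illustration, and each instruction is modeled as `(op, dest, sources)` with `dest` set to `None` for a store.

```python
def rename(program):
    """Give every written register a fresh temporary name (t0, t1, ...)."""
    mapping = {}      # architectural register -> latest temporary holding it
    renamed = []
    for n, (op, dest, srcs) in enumerate(program):
        # Sources read the latest temporary, or the original register if none.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = None
        if dest is not None:
            new_dest = f"t{n}"
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed

def hazard_kinds(program):
    """Coarse pairwise check: kinds of register dependency in the program."""
    kinds = set()
    for i, (_, d1, s1) in enumerate(program):
        for _, d2, s2 in program[i + 1:]:
            if d1 is not None and d1 in s2:
                kinds.add("RAW")
            if d2 is not None and d2 in s1:
                kinds.add("WAR")
            if d1 is not None and d1 == d2:
                kinds.add("WAW")
    return kinds

# The four-instruction sequence from the "Data dependencies (2)" slide.
program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r4", "r1")),
    ("add", "r1", ("r5", "r6")),
    ("write", None, ("r7", "r1")),
]

print(sorted(hazard_kinds(program)))          # before renaming
print(sorted(hazard_kinds(rename(program))))  # after renaming
```

After renaming, only the true (RAW) dependencies remain, which is what allows the second `add` to be scheduled ahead of the `sub`.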

OOO execution support (1)
• Instruction decode and register fetch in separate phases
• Instruction queue
• Reservation stations
• Common data bus (CDB)
• Reorder buffer (ROB)
• Together, these components provide a hardware implementation of Tomasulo’s algorithm for dynamic scheduling

OOO execution support (2)

OOO execution phases
1. Issue
  • Get an instruction from the instruction queue and put it in the reservation station
2. Execute
  • When the resource is available and the operands are available at the reservation station, execute the instruction
3. Write results
  • Send the result (through the CDB) to the ROB and to all reservation stations waiting for it
4. Commit
  • Move the result at the head of the ROB to either the register file or the memory
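
The four phases can be sketched as a toy cycle-by-cycle model. This is only an illustration under strong simplifying assumptions that do not come from the slides: one instruction issued and one committed per cycle, unlimited execution units, any number of results written per cycle, and invented latencies in `LATENCY`. The `simulate` function and its data layout are hypothetical.

```python
from collections import deque

LATENCY = {"load": 4, "add": 1, "sub": 1}   # assumed cycles per operation

def simulate(program):
    """Toy model of issue / execute / write results / commit.

    Returns (write_order, commit_order) as lists of instruction indices.
    """
    queue = deque(enumerate(program))   # instruction queue
    rob = deque()                       # reorder buffer, in program order
    producer = {}                       # register -> ROB entry that will write it
    write_order, commit_order = [], []
    cycle = 0
    while queue or rob:
        cycle += 1
        # 4. Commit: retire the head of the ROB once its result is available.
        if rob and rob[0]["done"]:
            entry = rob.popleft()
            commit_order.append(entry["idx"])
            if producer.get(entry["dest"]) is entry:
                del producer[entry["dest"]]
        # 3. Write results: finished entries publish their result (CDB).
        for e in rob:
            if not e["done"] and e["finish"] is not None and e["finish"] <= cycle:
                e["done"] = True
                write_order.append(e["idx"])
        # 2. Execute: start entries whose operands are all available.
        for e in rob:
            if e["finish"] is None and all(p["done"] for p in e["waits"]):
                e["finish"] = cycle + LATENCY[e["op"]]
        # 1. Issue: move one instruction from the queue into the ROB.
        if queue:
            idx, (op, dest, srcs) = queue.popleft()
            waits = [producer[s] for s in srcs if s in producer]
            entry = {"idx": idx, "op": op, "dest": dest, "waits": waits,
                     "finish": None, "done": False}
            rob.append(entry)
            if dest is not None:
                producer[dest] = entry
    return write_order, commit_order

program = [
    ("load", "r1", ("r2",)),        # slow
    ("add", "r3", ("r1", "r4")),    # RAW on r1: waits for the load
    ("sub", "r5", ("r6", "r7")),    # independent: may finish early
]
write_order, commit_order = simulate(program)
print("write order: ", write_order)
print("commit order:", commit_order)
```

In this run the independent `sub` writes its result before the `load` and the `add`, while the commit order still follows the program order, which is the point of keeping results in the ROB until they reach its head.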

OOO execution example

Speculation
• Branch prediction
  • Use a predictor to predict the value of not-yet-available branch conditions
  • Delay commitment of speculative instructions
  • Empty the ROB in case of a wrong prediction
• Strategies:
  • Branch prediction
  • Target prediction

Branch prediction
• 1-bit predictor
  • 2 mispredictions per isolated change of the branch outcome (e.g., a loop exit)
• 2-bit predictor
  • 1 misprediction per isolated change of the branch outcome
• n-bit predictors
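
The difference between the 1-bit and 2-bit predictors can be reproduced with a small experiment. This is a minimal sketch assuming that an n-bit predictor is a saturating counter whose upper half predicts taken and that it starts in the strongest taken state; the `mispredictions` function and the outcome sequence are made up for illustration.

```python
def mispredictions(outcomes, bits):
    """Count mispredictions of an n-bit saturating-counter predictor."""
    top = (1 << bits) - 1
    counter = top                    # assumed initial state: strongly taken
    threshold = (top + 1) // 2       # counter values >= threshold predict taken
    misses = 0
    for taken in outcomes:
        if (counter >= threshold) != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and top.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return misses

# A loop branch: taken four times, not taken once (loop exit), then taken again.
outcomes = [True] * 4 + [False] + [True] * 4

print("1-bit:", mispredictions(outcomes, bits=1))
print("2-bit:", mispredictions(outcomes, bits=2))
```

With `bits=1` the predictor simply repeats the last outcome, so the single not-taken outcome costs two mispredictions; with `bits=2` it costs one.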

Branch prediction buffer
• 1 predictor for all branches
• 1 predictor for each branch
• Branch prediction buffer (BPB)
  • N-entry table
  • Indexed by part of the instruction address (PC)
  • Tag not used
[Figure: the index field of the branch instruction's PC selects a BPB entry; the TAG field is not used. An entry holding 1 1 means predict taken.]
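
A branch prediction buffer of this kind can be sketched as a small table of counters. The class below is a hypothetical illustration, not the slide's design: it assumes 2-bit saturating counters, a power-of-two table size, indexing by the low-order bits of the PC, and counters that start at 11 (predict taken).

```python
class BranchPredictionBuffer:
    """N-entry table of 2-bit counters indexed by low PC bits, without tags."""

    def __init__(self, entries=16):
        assert entries > 0 and entries & (entries - 1) == 0, "power of two expected"
        self.mask = entries - 1
        self.table = [3] * entries       # assumed initial state: 11, predict taken

    def predict(self, pc):
        """Return True if the entry selected by pc predicts taken."""
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        """Move the selected counter toward the actual branch outcome."""
        i = pc & self.mask
        c = self.table[i]
        self.table[i] = min(3, c + 1) if taken else max(0, c - 1)

bpb = BranchPredictionBuffer(entries=16)
bpb.update(0x24, taken=False)
bpb.update(0x24, taken=False)
print(bpb.predict(0x24))   # the branch at 0x24 after two not-taken outcomes
print(bpb.predict(0x34))   # a different branch that maps to the same entry
print(bpb.predict(0x25))   # a branch that maps to another entry
```

Because no tag is stored, the branches at 0x24 and 0x34 share an entry and therefore share a prediction. That aliasing is the price of the simple, tag-free lookup.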

Branch target buffer
• Directly provide the target PC in case of predicted-taken branches
• Works like a fully-associative cache
  • A cache hit provides the (predicted) target PC
  • A cache miss means PC=PC+1 (either the instruction is not a branch, or the branch is predicted untaken)
• Predictions must be instruction-specific
• Predict-untaken branches are removed from the buffer
• Any branch-prediction strategy can be associated with the target (e.g., 1-bit, 2-bit, n-bit, …)
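
The hit/miss behavior of the branch target buffer can be sketched with a dictionary keyed by the full PC, which plays the role of a fully-associative cache. This is an illustrative model with assumptions of its own: the class and method names are hypothetical, the buffer has no capacity limit, and the prediction strategy is the simplest one (a branch is predicted taken exactly when it is in the buffer).

```python
class BranchTargetBuffer:
    """Maps the PC of predicted-taken branches to their predicted target PC."""

    def __init__(self):
        self.targets = {}    # branch PC -> predicted target PC

    def next_pc(self, pc):
        """Hit: predicted target. Miss: fall through to PC+1."""
        return self.targets.get(pc, pc + 1)

    def resolve(self, pc, taken, target):
        """Update the buffer once the real outcome of the branch is known."""
        if taken:
            self.targets[pc] = target      # insert or refresh the entry
        else:
            self.targets.pop(pc, None)     # predict-untaken: remove from buffer

btb = BranchTargetBuffer()
print(btb.next_pc(100))                    # miss: falls through to PC+1
btb.resolve(100, taken=True, target=40)
print(btb.next_pc(100))                    # hit: predicted target
btb.resolve(100, taken=False, target=40)
print(btb.next_pc(100))                    # entry removed: falls through again
```

Keying the dictionary by the full PC keeps the prediction instruction-specific, unlike an untagged table indexed by part of the address.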

Branch target buffer
