
Introduction to Computers
Pipeline Processing
The slides are based on the books:
Mano, Chapter 9
Hennessy & Patterson

Dr. Ron Shmueli
rshmueli@bgu.ac.il

Computing the PIPELINE speedup factor
• n: the number of tasks to execute.
• On a conventional (non-pipelined) computer:
  • tn: the time to complete one task.
  • n·tn: the time required to complete n tasks.
• On a pipelined machine:
  • tp: the clock cycle time (the time for each segment to finish).
  • k·tp: the time to complete the first task.
  • (n-1)·tp: the time to complete the remaining n-1 tasks.
  • (k+n-1)·tp: the time to complete all n tasks.
• Speedup (k = number of segments):
$$S_k = \frac{n\,t_n}{(k+n-1)\,t_p}$$
• Best case: the PIPELINE is always full (n → ∞). Assuming tn = k·tp, the theoretical speedup is k:
$$\lim_{n\to\infty} S_k = \frac{t_n}{t_p} = k \qquad (t_n = k\,t_p)$$



4-STAGE FLOATING-POINT ADDER
Partial implementation of an architecture for adding two floating-point numbers (compute the timings):
A = a × 2^p, B = b × 2^q
S1: Exponent subtractor, fraction selector and right shifter. r = max(p, q) and t = |p - q|; the fraction with min(p, q) is shifted right by t places, while the other fraction passes through.
S2: Fraction adder: adds the aligned fractions, giving c.
S3: Leading-zero counter and left shifter: normalize c to d.
S4: Exponent adder: correct the exponent r to s.
Result: C = A + B = c × 2^r = d × 2^s (r = max(p, q), 0.5 ≤ d < 1)
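A behavioral sketch of what each of the four stages computes, assuming fractions are ordinary Python floats with 0.5 ≤ |a|, |b| < 1 and integer exponents; `fp_add_pipeline` is a hypothetical name. It models the arithmetic of each stage, not the hardware timing. The fraction-overflow case (|c| ≥ 1, handled here by a one-place right shift) does not appear in the figure and is an assumption.

```python
def fp_add_pipeline(a: float, p: int, b: float, q: int) -> tuple[float, int]:
    """Add A = a*2**p and B = b*2**q stage by stage; return (d, s) with C = d*2**s."""
    # S1: exponent subtractor, fraction selector, right shifter
    r = max(p, q)
    t = abs(p - q)
    if p < q:
        a = a / 2**t              # align the fraction that has the smaller exponent
    else:
        b = b / 2**t
    # S2: fraction adder
    c = a + b
    if c == 0:
        return 0.0, 0
    # S3: leading-zero counter + left shifter (normalize to 0.5 <= |d| < 1)
    d, left_shifts = c, 0
    while abs(d) >= 1:            # assumed overflow handling: shift right one place
        d /= 2
        left_shifts -= 1
    while abs(d) < 0.5:           # each leading zero costs one left shift
        d *= 2
        left_shifts += 1
    # S4: exponent adder
    s = r - left_shifts
    return d, s


print(fp_add_pipeline(0.75, 2, 0.5, 0))    # 3 + 0.5
print(fp_add_pipeline(0.5, 1, -0.75, 0))   # 1 - 0.75, needs two left shifts
```

Powers of two are exact in binary floating point, so the shifts here introduce no rounding; a hardware adder would additionally truncate or round the bits shifted out in S1.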

INSTRUCTION PIPELINE
Six phases* in an instruction cycle:
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place

* Some instructions skip some phases.
* Effective address calculation can be done as part of the decoding phase.
* Storing the operation result into a register is done automatically in the execution phase.

4-Stage Pipeline
[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
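The ideal behavior of the 4-stage pipeline can be drawn as a space-time diagram. This sketch assumes no hazards and one clock cycle per stage; `space_time` is an illustrative helper. Each instruction enters FI one cycle after its predecessor, so n instructions finish in k + n - 1 = n + 3 cycles.

```python
STAGES = ("FI", "DA", "FO", "EX")


def space_time(n: int, stages=STAGES) -> list:
    """Ideal (hazard-free) space-time diagram: one row per instruction, one column per cycle."""
    k = len(stages)
    total_cycles = k + n - 1           # k cycles for the first instruction, one more for each other
    rows = []
    for i in range(n):
        row = ["--"] * total_cycles
        for j, stage in enumerate(stages):
            row[i + j] = stage         # instruction i is in stage j during clock cycle i + j + 1
        rows.append(row)
    return rows


diagram = space_time(3)
print("cycle " + " ".join(f"{c:>2}" for c in range(1, len(diagram[0]) + 1)))
for i, row in enumerate(diagram, start=1):
    print(f"I{i:<4} " + " ".join(row))
```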


Pipeline hazards
• Structural hazards: two segments access memory at the same time. Remedy: separate instruction and data memories.
• Data hazards: dependencies between the data of instructions in execution.
  • Hardware interlock: a circuit that stalls the dependent instruction for as many clock cycles as needed.
  • Operand forwarding: hardware that detects dependencies between instructions and routes the result over an alternate path directly to the next segment.
  • Delayed load: the compiler reorders the instructions or inserts NOPs.
• Control hazards: program-control (branch) instructions.
  • Prefetch: FETCH both the next sequential address and the branch target address.
  • Branch Target Buffer (BTB): an associatively managed memory.
  • Loop Buffer (high-speed register file): the entire loop, including the branch, is kept in registers, so the whole loop executes without memory accesses.
  • Branch prediction: try to guess the outcome of the branch and fetch the instructions according to the guess.
  • Delayed branch: the compiler reorders the instructions, inserting useful instructions so that the pipeline stays full when a branch occurs.
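Hardware interlocks and operand forwarding both start by detecting read-after-write dependencies. A minimal sketch, assuming each instruction is described by the register it writes and the registers it reads; `Instr`, `raw_hazards` and the `window` parameter (how many following instructions could still read a stale value, which depends on the pipeline design) are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]          # register written (None for stores and branches)
    srcs: Tuple[str, ...]        # registers read


def raw_hazards(program, window):
    """Return (producer, consumer, register) for every read-after-write dependency
    whose producer is fewer than `window` instructions before the consumer."""
    hazards = []
    for j, consumer in enumerate(program):
        for reg in dict.fromkeys(consumer.srcs):   # each source register once, in order
            for i in range(j - 1, -1, -1):         # nearest earlier writer of reg
                if program[i].dest == reg:
                    if j - i < window:
                        hazards.append((i, j, reg))
                    break
    return hazards


program = [
    Instr("ADD R1, R2, R3", "R1", ("R2", "R3")),
    Instr("SUB R4, R1, R5", "R4", ("R1", "R5")),
    Instr("OR  R6, R7, R8", "R6", ("R7", "R8")),
]
print(raw_hazards(program, window=2))
```

Once a hazard is found, an interlock delays the consumer, while forwarding hardware instead selects the bypass path.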


Operand FORWARDING example

Example:
ADD R1, R2, R3
SUB R4, R1, R5

3-stage pipeline:
I: Instruction fetch
A: Decode, read registers, ALU operations
E: Write the result to the destination register

[Figure: the register file feeds the ALU through two MUXes; the ALU result goes to the destination register (R4) over the result write bus and is also kept in an ALU result buffer, from which a bypass path leads back to the MUX inputs.]

Timing:
• ADD: I A E
• SUB without bypassing: I A E, where A must wait until ADD has written R1 in its E phase.
• SUB with bypassing: I A E, where A receives R1 from the ALU result buffer over the bypass path, without waiting.
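The effect of the bypass path on the 3-stage I/A/E pipeline can be modeled in a few lines. This sketch assumes that without bypassing the destination register is written in E and can be read in A only in the following cycle, and that with bypassing the ALU result buffer supplies the value right after the producer's A phase; `schedule` is an illustrative name, and the cycle numbers follow from these assumptions rather than from the original figure.

```python
def schedule(program, bypass):
    """3-stage pipeline (I, A, E). program: list of (name, dest, srcs).
    Returns {name: (I cycle, A cycle, E cycle)}."""
    rows = []  # (name, dest, i_cycle, a_cycle, e_cycle)
    for name, dest, srcs in program:
        # In-order flow: an instruction enters I when its predecessor moves on to A.
        i_cycle = rows[-1][3] if rows else 1
        a_cycle = i_cycle + 1
        for _, p_dest, _, p_a, p_e in rows:
            if p_dest in srcs:
                # bypass: value is in the ALU result buffer after the producer's A;
                # no bypass: wait until the producer's E has written the register.
                a_cycle = max(a_cycle, p_a + 1 if bypass else p_e + 1)
        rows.append((name, dest, i_cycle, a_cycle, a_cycle + 1))
    return {name: (i, a, e) for name, _, i, a, e in rows}


program = [("ADD", "R1", ("R2", "R3")), ("SUB", "R4", ("R1", "R5"))]
print("without bypassing:", schedule(program, bypass=False))
print("with bypassing:   ", schedule(program, bypass=True))
```

Under these assumptions the bypass saves one cycle for SUB, and that saving repeats for every back-to-back dependent pair.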



Delayed Load - Example
a = b + c;
d = e - f;
Instruction reordering by the compiler:

Unscheduled code:      Scheduled code:
LW  Rb, b              LW  Rb, b
LW  Rc, c              LW  Rc, c
ADD Ra, Rb, Rc         LW  Re, e
SW  a, Ra              ADD Ra, Rb, Rc
LW  Re, e              LW  Rf, f
LW  Rf, f              SW  a, Ra
SUB Rd, Re, Rf         SUB Rd, Re, Rf
SW  d, Rd              SW  d, Rd

Delayed load: a load requiring that the following instruction not use its result.
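The benefit of the scheduled code can be counted mechanically. This sketch assumes a pipeline in which a load followed immediately by an instruction that reads the loaded register costs one stall cycle, and nothing else stalls; `decode` and `load_use_stalls` are hypothetical helpers for the slide's LW/SW/ADD/SUB syntax. Under that assumption the unscheduled code has two load-use pairs (LW Rc followed by ADD, LW Rf followed by SUB) and the scheduled code has none.

```python
def decode(line):
    """Return (op, dest, srcs) for the small LW/SW/ALU syntax used on the slide."""
    op, operands = line.split(None, 1)
    args = [a.strip() for a in operands.split(",")]
    if op == "LW":                 # LW Rx, mem  -> writes Rx, reads no register
        return op, args[0], []
    if op == "SW":                 # SW mem, Rx  -> writes memory, reads Rx
        return op, None, [args[1]]
    return op, args[0], args[1:]   # ALU: Rd, Rs, Rt


def load_use_stalls(program):
    """Count loads whose result is read by the very next instruction."""
    decoded = [decode(line) for line in program]
    stalls = 0
    for (op, dest, _), (_, _, next_srcs) in zip(decoded, decoded[1:]):
        if op == "LW" and dest in next_srcs:
            stalls += 1            # assumed: one bubble per load-use pair
    return stalls


unscheduled = ["LW Rb, b", "LW Rc, c", "ADD Ra, Rb, Rc", "SW a, Ra",
               "LW Re, e", "LW Rf, f", "SUB Rd, Re, Rf", "SW d, Rd"]
scheduled = ["LW Rb, b", "LW Rc, c", "LW Re, e", "ADD Ra, Rb, Rc",
             "LW Rf, f", "SW a, Ra", "SUB Rd, Re, Rf", "SW d, Rd"]
print(load_use_stalls(unscheduled), load_use_stalls(scheduled))
```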

RISC PIPELINE
• RISC = efficient pipelined execution: single-cycle instruction execution
• Simple instruction set
• Fixed-length instructions
• Register-to-register operations only
– Short instructions and the use of registers allow each instruction to execute in 3 segments.
Data Manipulation Instructions
I: Instruction Fetch
A: Decode, Read Registers, ALU Operations
E: Write a Register

Load and Store Instructions
A: Decode, Evaluate Effective Address
E: Register-to-Memory or Memory-to-Register

Program Control Instructions
A: Decode, Evaluate Branch Address
E: Write Register (PC)


Compiler support: DELAYED LOAD in RISC
Program:
LOAD:  R1 ← M[address 1]
LOAD:  R2 ← M[address 2]
ADD:   R3 ← R1 + R2
STORE: M[address 3] ← R3

Three-segment pipeline timing

Pipeline timing with data conflict
Clock cycle   1  2  3  4  5  6
Load R1       I  A  E
Load R2          I  A  E
Add R1+R2           I  A  E
Store R3               I  A  E

Pipeline timing with delayed load
Clock cycle   1  2  3  4  5  6  7
Load R1       I  A  E
Load R2          I  A  E
NOP                 I  A  E
Add R1+R2              I  A  E
Store R3                  I  A  E

The data dependency is taken care of by the compiler rather than the hardware.
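The compiler's part can be sketched as a pass that inserts a NOP after a load whose result the next instruction needs. Illustrative only: the tuple format and the names `insert_load_delay_nops` and `pipeline_cycles` are assumptions, and a real compiler would first try to move a useful instruction into the slot, as in the delayed-load example earlier. For the program above, one NOP is inserted and the run grows from 6 to 7 clock cycles, matching the two timing charts.

```python
def insert_load_delay_nops(program):
    """program: list of (label, op, dest, srcs). After every LOAD whose destination
    is read by the next instruction, insert a NOP (compiler-managed delayed load)."""
    out = []
    for idx, instr in enumerate(program):
        out.append(instr)
        _, op, dest, _ = instr
        nxt = program[idx + 1] if idx + 1 < len(program) else None
        if op == "LOAD" and nxt is not None and dest in nxt[3]:
            out.append(("NOP", "NOP", None, ()))
    return out


def pipeline_cycles(n_instructions, k=3):
    """Clock cycles for n instructions in an ideal k-segment pipeline: k + n - 1."""
    return k + n_instructions - 1


program = [
    ("Load R1", "LOAD", "R1", ()),
    ("Load R2", "LOAD", "R2", ()),
    ("Add R1+R2", "ADD", "R3", ("R1", "R2")),
    ("Store R3", "STORE", None, ("R3",)),
]
scheduled = insert_load_delay_nops(program)
print([label for label, *_ in scheduled])
print(pipeline_cycles(len(program)), "->", pipeline_cycles(len(scheduled)))
```

Only the load immediately before ADD needs a slot: Load R1 is two instructions ahead of ADD, so its E phase already finishes before ADD's A phase.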

RISC Pipeline

Compiler support: DELAYED BRANCH in RISC

The compiler analyzes the instructions before and after the branch and rearranges the program sequence by inserting useful instructions in the delay steps.

Using no-operation instructions
Clock cycles:     1  2  3  4  5  6  7  8  9  10
1. Load           I  A  E
2. Increment         I  A  E
3. Add                  I  A  E
4. Subtract                I  A  E
5. Branch to X                I  A  E
6. NOP                           I  A  E
7. NOP                              I  A  E
8. Instr. in X                         I  A  E

Rearranging the instructions
Clock cycles:     1  2  3  4  5  6  7  8
1. Load           I  A  E
2. Increment         I  A  E
3. Branch to X          I  A  E
4. Add                     I  A  E
5. Subtract                   I  A  E
6. Instr. in X                   I  A  E
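A simplified version of the rearrangement: walk back from the branch and move instructions that the branch condition does not depend on into the delay slots, padding with NOPs otherwise. The sketch assumes two delay slots and hypothetical register assignments (Load writes R1, Increment writes R2, Add writes R3, Subtract writes R4, and the branch tests R2); `schedule_branch` is an illustrative name. It reproduces the 10-cycle and 8-cycle sequences above.

```python
def schedule_branch(before, branch_srcs, slots=2, reorder=True):
    """Order the instructions around a branch that has `slots` delay slots.
    before: list of (name, dest) preceding the branch, in program order.
    branch_srcs: registers the branch condition reads."""
    moved = []
    if reorder:
        # Walk backwards from the branch; move instructions the branch does not
        # depend on into the delay slots, keeping their relative order.
        for name, dest in reversed(before):
            if len(moved) == slots or dest in branch_srcs:
                break
            moved.insert(0, name)
    kept = [name for name, _ in before[:len(before) - len(moved)]]
    nops = ["NOP"] * (slots - len(moved))
    return kept + ["Branch to X"] + moved + nops + ["Instr. in X"]


# Hypothetical register usage for the slide's instruction names.
before = [("Load", "R1"), ("Increment", "R2"), ("Add", "R3"), ("Subtract", "R4")]
with_nops = schedule_branch(before, {"R2"}, reorder=False)
rearranged = schedule_branch(before, {"R2"})
for seq in (with_nops, rearranged):
    print(len(seq) + 3 - 1, "cycles:", seq)   # 3-segment pipeline: k + n - 1
```

Increment cannot be moved, because the branch needs its result; that is the kind of analysis the compiler must perform before filling the slots.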


Hennessy & Patterson - Computer Architecture

Data hazards


Data hazards – HW interlock

With register-file splitting (write in the first half of the clock cycle, read in the second half):

R2 ← R1 - R3
R12 ← R2 and R5
R13 ← R6 or R2
R14 ← R2 + R2
M[R2 + 100] ← R15


Data hazards - Forwarding

With FORWARDING + register-file splitting:

R2 ← R1 - R3
R12 ← R2 and R5
R13 ← R6 or R2
R14 ← R2 + R2
M[R2 + 100] ← R15


Data hazards that cannot be solved by FWD

This would require going back in time!!!

A stall is necessary even with a sophisticated FWD unit.
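The reason a load is different: its data appears only at the end of MEM, one cycle too late for an immediately following EX. A sketch under the same 5-stage assumptions with full forwarding; `stalls_with_forwarding` and both instruction sequences are hypothetical, not taken from the slide. One independent instruction between the load and its consumer removes the stall, which is exactly what delayed-load scheduling exploits.

```python
def stalls_with_forwarding(program):
    """Stall cycles in a 5-stage pipeline with full forwarding.
    program: list of (op, dest, srcs). ALU results reach the next instruction's EX
    in time, but load data exists only at the end of MEM, so the consumer's EX
    must come at least one cycle after the load's MEM."""
    id_cycle = []
    stalls = 0
    for idx, (_, _, srcs) in enumerate(program):
        cycle = 2 if idx == 0 else id_cycle[-1] + 1
        for p in range(idx):
            p_op, p_dest, _ = program[p]
            if p_op == "lw" and p_dest in srcs:
                # consumer EX (= its ID + 1) must follow load MEM (= load ID + 2)
                cycle = max(cycle, id_cycle[p] + 2)
        if idx > 0:
            stalls += cycle - (id_cycle[-1] + 1)
        id_cycle.append(cycle)
    return stalls


back_to_back = [("lw", "R1", ("R2",)), ("sub", "R4", ("R1", "R5")), ("and", "R6", ("R1", "R7"))]
separated = [("lw", "R1", ("R2",)), ("add", "R8", ("R9", "R10")), ("sub", "R4", ("R1", "R5"))]
print(stalls_with_forwarding(back_to_back), stalls_with_forwarding(separated))
```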


Control Hazard on Branches (1)
• Static Option 1: Stall
– Stall the pipe when a branch is encountered, until it is resolved
[Figure: branch address dependency. The branch (JMP) produces the new PC in ID, so the next instruction's IF ID EX WB is delayed by a bubble.]
• Stall impact: assumptions
– CPI = 1
– 20% of instructions are branches
– Stall 3 cycles on every branch
• CPI new = 1 + 0.2 × 3 = 1.6
– (CPI new = CPI ideal + avg. stall cycles / instr.)
• CPI grows by 60%: the pipeline delivers only 1/1.6 ≈ 62.5% of its ideal performance
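The stall arithmetic, written out. A minimal sketch assuming the same clock rate and instruction count, so that performance is proportional to 1/CPI; `cpi_with_stalls` is an illustrative name.

```python
def cpi_with_stalls(cpi_ideal, branch_fraction, stall_cycles):
    """CPI_new = CPI_ideal + average stall cycles per instruction."""
    return cpi_ideal + branch_fraction * stall_cycles


cpi = cpi_with_stalls(1.0, 0.2, 3)
relative_performance = 1.0 / cpi        # same clock and instruction count assumed
print(f"CPI = {cpi:.2f}, performance = {relative_performance:.1%} of ideal")
```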


Control Hazard on Branches (2)
• Static Option 2: Predict Not Taken
– Execute instructions from the fall-through (not-taken) path
• As if there is no branch
• If the branch is not taken (~50%), no penalty is paid
– If the branch is actually taken
• Flush the fall-through path instructions before they change the machine state (memory / registers)
• Fetch the instructions from the correct (taken) path
– Assuming ~50% of branches are not taken on average
• CPI new = 1 + (0.2 × 0.5) × 3 = 1.3
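Predict-not-taken charges only the taken fraction of branches. A sketch using the slide's assumptions (CPI = 1, 20% branches, 3-cycle flush) with a hypothetical `cpi_predict_not_taken` helper; if every branch were taken, it would degenerate to the stall case (CPI 1.6).

```python
def cpi_predict_not_taken(cpi_ideal, branch_fraction, taken_fraction, flush_penalty):
    """Only taken branches pay: their fall-through instructions are flushed."""
    return cpi_ideal + branch_fraction * taken_fraction * flush_penalty


for taken in (0.3, 0.5, 0.7):
    print(f"taken = {taken:.0%}: CPI = {cpi_predict_not_taken(1.0, 0.2, taken, 3):.2f}")
```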


BTB
• Allocation
– Allocate instructions identified as branches (after decode)
• Both conditional and unconditional branches are allocated
– Not-taken branches need not be allocated
• A BTB miss implicitly predicts not-taken
• Prediction
– The BTB lookup is done in parallel with the instruction cache (IC) lookup
– The BTB provides
• An indication that the instruction is a branch (a BTB hit)
• The branch's predicted target
• The branch's predicted direction
• The branch's predicted type (e.g., conditional, unconditional)
• Update (when the branch outcome is known)
– Branch target
– Branch history (taken / not-taken)
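A behavioral sketch of the allocate/predict/update cycle described above. Assumptions not on the slide: a fully associative table (a Python dict keyed by branch PC), a 1-bit history per entry, and 4-byte instructions for the fall-through address; the `BTB` class and its methods are hypothetical. For a loop branch taken three times and then not taken, it mispredicts twice: on the first encounter (BTB miss, implicitly not-taken) and on the loop exit.

```python
class BTB:
    """Branch target buffer sketch: branch PC -> (predicted target, last outcome)."""

    def __init__(self, instr_size=4):
        self.entries = {}            # associative lookup keyed by branch PC
        self.instr_size = instr_size

    def predict(self, pc):
        """Next-fetch address for the instruction at pc."""
        entry = self.entries.get(pc)
        if entry is None:            # BTB miss: implicitly predict not-taken
            return pc + self.instr_size
        target, taken = entry
        return target if taken else pc + self.instr_size

    def update(self, pc, target, taken):
        """Record the resolved outcome; not-taken branches are not allocated."""
        if taken or pc in self.entries:
            self.entries[pc] = (target, taken)


btb = BTB()
branch_pc, loop_start = 0x40, 0x10          # hypothetical addresses
mispredictions = 0
for taken in (True, True, True, False):     # loop branch: three iterations, then exit
    actual_next = loop_start if taken else branch_pc + 4
    if btb.predict(branch_pc) != actual_next:
        mispredictions += 1                 # would trigger a pipeline flush
    btb.update(branch_pc, loop_start, taken)
print("mispredictions:", mispredictions)
```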


BTB (cont)
• Wrong prediction
– Predict not-taken, actually taken
– Predict taken, actually not-taken
• In case of a wrong prediction, flush the pipeline
– Reset the latches (same as turning all wrong-path instructions into NOPs)
– Select the PC source to be from the correct path
• The fall-through address must be kept along with the branch
– Start fetching instructions from the correct path
• Assuming a correct-prediction rate of P
– 20% of instructions are branches
• CPI new = 1 + (0.2 × (1 - P)) × 3
– For example, if P = 0.7
• CPI new = 1 + (0.2 × 0.3) × 3 = 1.18
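The same formula as a function of the prediction accuracy P. A sketch using the slide's assumptions (CPI = 1, 20% branches, 3-cycle penalty); `cpi_with_prediction` is a hypothetical name. P = 0.5 gives the 1.3 of predict-not-taken with ~50% taken branches, and P = 0.7 gives the 1.18 above.

```python
def cpi_with_prediction(accuracy, branch_fraction=0.2, penalty=3, cpi_ideal=1.0):
    """CPI_new = CPI_ideal + branch_fraction * (1 - P) * penalty."""
    return cpi_ideal + branch_fraction * (1.0 - accuracy) * penalty


for p in (0.5, 0.7, 0.9, 0.99):
    print(f"P = {p:.2f}: CPI = {cpi_with_prediction(p):.3f}")
```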


Given a processor that uses a BTB mechanism to predict the behavior of branch instructions.

Assume an initial state of NT (not taken)
in the above example.
