Introduction to Computers
                      Pipeline Processing
                The slides are based on the books:
                      Mano – Chapter 9
                      Hennessy & Patterson

                             Dr. Ron Shmueli
                          rshmueli@bgu.ac.il
  1                                Ron Shmueli




           Computing the speedup factor of a PIPELINE
   • n – the number of tasks to execute.
   • In a conventional (non-pipelined) computer:
       • tn – the time to complete one task.
       • n·tn – the time required to complete n tasks.
   • In a pipelined machine:
       • tp – the clock cycle time (the time for each segment to finish).
       • k·tp – the time to complete the first task.
       • (n-1)·tp – the time to complete the remaining n-1 tasks.
       • (k+n-1)·tp – the time to complete all n tasks.
   • Speedup (k = number of segments):
       $S_k = \frac{n\,t_n}{(k+n-1)\,t_p}$
       $\lim_{n \to \infty} S_k = \frac{t_n}{t_p} = k$, assuming $t_n = k\,t_p$
   • Best case: the PIPELINE is always full, so the theoretical speedup is k.





Figure: 4-STAGE FLOATING POINT ADDER (a partial implementation of an architecture for adding two floating-point numbers; exercise: compute the timings)
   Inputs: A = a×2^p, B = b×2^q
   S1: exponent subtractor (t = |p - q|, r = max(p, q)); fraction selector; right shifter aligns the fraction with min(p, q) by t positions
   S2: fraction adder, producing c
   S3: leading-zero counter; left shifter normalizes c, producing d
   S4: exponent adder adjusts r by the normalization shift, producing s
   Result: C = A + B = c×2^r = d×2^s   (r = max(p, q), 0.5 ≤ d < 1)
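The four segments of the adder can be traced in software. This is a sketch only: fractions are modelled as Python floats (an assumption; the hardware works on fixed-width bit fields), and each comment marks which segment of the figure the step belongs to.

```python
from typing import Tuple


def fp_add(a: float, p: int, b: float, q: int) -> Tuple[float, int]:
    """Add A = a*2**p and B = b*2**q following the four segments.

    Returns (d, s) with A + B = d * 2**s and 0.5 <= |d| < 1,
    or (0.0, 0) when the sum is zero.
    """
    # S1: exponent subtractor, fraction selector and right shifter.
    t = abs(p - q)                  # t = |p - q|
    r = max(p, q)                   # r = max(p, q)
    big, small = (a, b) if p >= q else (b, a)
    small = small / 2 ** t          # align: shift the min(p, q) fraction right by t

    # S2: fraction adder.
    c = big + small                 # A + B = c * 2**r
    if c == 0:
        return 0.0, 0

    # S3: leading-zero counter and left shifter (normalization).
    shift = 0
    while abs(c) >= 1:              # fraction overflow: shift right once
        c /= 2
        shift += 1
    while abs(c) < 0.5:             # leading zeros: shift left
        c *= 2
        shift -= 1
    d = c

    # S4: exponent adder corrects r by the normalization shift.
    s = r + shift
    return d, s


if __name__ == "__main__":
    print(fp_add(0.5, 3, 0.5, 1))    # 4 + 1
    print(fp_add(0.5, 1, -0.375, 1)) # 1 - 0.75, needs a left shift
```

In the pipeline each segment is separated by latches, so while one pair of operands is in S3 the next pair can already be in S2.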




                                           INSTRUCTION PIPELINE
   Six phases* in an instruction cycle:
     [1] Fetch an instruction from memory
     [2] Decode the instruction
     [3] Calculate the effective address of the operand
     [4] Fetch the operands from memory
     [5] Execute the operation
     [6] Store the result in the proper place

     * Some instructions skip some phases.
     * Effective address calculation can be done as part of the decoding phase.
     * Storing the operation result into a register is done automatically in the execution phase.

   4-stage pipeline:
     [1] FI: Fetch an instruction from memory
     [2] DA: Decode the instruction and calculate the effective address of the operand
     [3] FO: Fetch the operand
     [4] EX: Execute the operation
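A space-time table for the 4-stage FI/DA/FO/EX pipeline can be generated with a short sketch. It assumes an ideal pipeline (no hazards, one new instruction per cycle), so n instructions need k + n - 1 cycles.

```python
from typing import List, Sequence

STAGES = ("FI", "DA", "FO", "EX")


def space_time(n: int, stages: Sequence[str] = STAGES) -> List[List[str]]:
    """Row i holds the stage that instruction i occupies in each clock cycle
    ('' when it is not in the pipeline). Ideal pipeline assumed."""
    k = len(stages)
    cycles = k + n - 1                      # matches (k + n - 1) * tp
    table = []
    for i in range(n):
        row = []
        for c in range(cycles):
            step = c - i                    # instruction i enters FI in cycle i (0-based)
            row.append(stages[step] if 0 <= step < k else "")
        table.append(row)
    return table


if __name__ == "__main__":
    table = space_time(3)
    print("cycle " + " ".join(f"{c + 1:>3}" for c in range(len(table[0]))))
    for i, row in enumerate(table, start=1):
        print(f"I{i:<4} " + " ".join(f"{s:>3}" for s in row))
```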






Hazards in pipelined execution
   • Structural hazards – two segments access memory at the same time. (Remedy: separate instructions and data.)
   • Data hazards – dependencies between the data of instructions in execution.
       • Hardware interlock – a circuit that stalls the dependent instruction for the required number of clock cycles.
       • Operand forwarding – hardware that detects dependencies between instructions and routes
         the result along an alternative path directly to the next segment.
       • Delayed load – the compiler reorders the instructions or inserts NOPs.
   • Control hazards – program-control (branch) instructions.
       • Prefetch – FETCH both the next sequential address and the branch target address.
       • Branch target buffer (BTB) – an associatively managed memory.
       • Loop buffer (high-speed register file) – the whole loop, including the branch, is kept
         in registers, so the entire loop executes without memory access.
       • Branch prediction – try to guess the outcome of the branch and fetch the instructions
         according to the guess.
       • Delayed branch – the compiler reorders the instructions, inserting useful instructions,
         so that the pipeline stays full when a branch occurs.







                  Example of applying Operand FORWARDING

Example:
ADD     R1, R2, R3
SUB     R4, R1, R5

3-stage pipeline:
I: Instruction Fetch
A: Decode, Read Registers, ALU Operations
E: Write the result to the destination register

Figure: register file, two MUXes at the ALU inputs, ALU, ALU result buffer, result write bus,
and a bypass path from the ALU result buffer back to the MUXes.

Clock cycle               1  2  3  4  5
ADD                       I  A  E
SUB (without bypassing)      I  -  A  E
SUB (with bypassing)         I  A  E
(- marks a stall cycle)
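The two SUB rows can be reproduced with a small scheduling sketch of the 3-stage pipeline. Its timing rules are assumptions chosen to match the diagram: without bypassing, a register written in E is readable by A only in a later cycle; with bypassing, the ALU result buffer feeds the next A directly; a stalled instruction holds back the one behind it.

```python
from typing import Dict, List, Sequence, Tuple

# An instruction is (destination register, source registers).
Instr = Tuple[str, Sequence[str]]


def schedule(program: Sequence[Instr], bypass: bool) -> List[Tuple[int, int, int]]:
    """Return the (I, A, E) clock cycle of each instruction (simplified model)."""
    ready: Dict[str, int] = {}   # register -> first cycle in which A may use it
    out = []
    prev_i = prev_a = 0
    for dest, srcs in program:
        # I is blocked while the previous instruction is still waiting to enter A.
        i = max(prev_i + 1, prev_a)
        a = max([i + 1, prev_a + 1] + [ready.get(r, 0) for r in srcs])
        e = a + 1
        # Bypass: result usable right after A; otherwise only after the E write.
        ready[dest] = a + 1 if bypass else e + 1
        out.append((i, a, e))
        prev_i, prev_a = i, a
    return out


if __name__ == "__main__":
    prog = [("R1", ["R2", "R3"]),   # ADD R1, R2, R3
            ("R4", ["R1", "R5"])]   # SUB R4, R1, R5
    print("without bypassing:", schedule(prog, bypass=False))
    print("with bypassing:   ", schedule(prog, bypass=True))
```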





Delayed Load – Example
                   a = b + c;            Instruction reordering by the compiler
                   d = e - f;

Unscheduled code:                           Scheduled Code:
     LW     Rb, b                             LW     Rb, b
     LW     Rc, c                             LW     Rc, c
     ADD    Ra, Rb, Rc                        LW     Re, e
     SW     a, Ra                             ADD    Ra, Rb, Rc
     LW     Re, e                             LW     Rf, f
     LW     Rf, f                             SW     a, Ra
     SUB    Rd, Re, Rf                        SUB    Rd, Re, Rf
     SW     d, Rd                             SW     d, Rd



 Delayed Load
     A load requiring that the following instruction not use its result
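The delayed-load rule can be checked mechanically on both versions of the code. In this sketch memory operands such as `b` or `a` are left out of the register lists (an assumption of the encoding); the check flags every instruction that uses the register loaded by the instruction immediately before it.

```python
from typing import List, Optional, Sequence, Tuple

# (opcode, destination register or None, source registers)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def load_use_violations(code: Sequence[Instr]) -> List[int]:
    """Indices of instructions that use the result of the load right before them."""
    bad = []
    for i in range(1, len(code)):
        prev_op, prev_dest, _ = code[i - 1]
        _, _, srcs = code[i]
        if prev_op == "LW" and prev_dest in srcs:
            bad.append(i)
    return bad


UNSCHEDULED: List[Instr] = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
SCHEDULED: List[Instr] = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("LW", "Rf", ()), ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

if __name__ == "__main__":
    print("unscheduled:", load_use_violations(UNSCHEDULED))
    print("scheduled:  ", load_use_violations(SCHEDULED))
```

The unscheduled code breaks the rule at ADD and at SUB; the compiler's reordering removes both without adding any NOP.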




                            RISC PIPELINE
        • RISC = efficient pipelined execution: instructions execute in a single clock cycle
        • Simple instruction set
        • Fixed-length instructions
        • Register-to-register operations only
        – Short instructions and the use of registers allow an instruction to execute in 3 segments
 Data Manipulation Instructions
     I:  Instruction Fetch
     A:  Decode, Read Registers, ALU Operations
     E:  Write a Register

 Load and Store Instructions
     I:  Instruction Fetch
     A:  Decode, Evaluate Effective Address
     E:  Register-to-Memory or Memory-to-Register

 Program Control Instructions
     I:  Instruction Fetch
     A:  Decode, Evaluate Branch Address
     E:  Write Register (PC)




DELAYED LOAD in RISC – compiler support

Three-segment pipeline timing:
        LOAD:   R1 ← M[address 1]
        LOAD:   R2 ← M[address 2]
        ADD:    R3 ← R1 + R2
        STORE:  M[address 3] ← R3

 Pipeline timing with data conflict

           clock cycle         1 2 3 4 5 6
             Load R1           I A E
             Load R2             I A E
             Add R1+R2             I A E
             Store R3                I A E

 Pipeline timing with delayed load

           clock cycle        1 2 3 4 5 6 7
             Load R1          I A E
             Load R2            I A E
             NOP                  I A E
             Add R1+R2              I A E
             Store R3                 I A E

 The data dependency is taken care of by the compiler rather than the hardware.
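The compiler's fallback of inserting a NOP can be sketched as a small pass over the program. The encoding of instructions as (opcode, destination, sources) tuples is an assumption; the cycle count uses the ideal k + n - 1 formula for the 3-segment pipeline.

```python
from typing import List, Optional, Sequence, Tuple

Instr = Tuple[str, Optional[str], Tuple[str, ...]]   # (opcode, dest, sources)
NOP: Instr = ("NOP", None, ())


def insert_delay_nops(code: Sequence[Instr]) -> List[Instr]:
    """Insert a NOP after every LOAD whose result the next instruction uses,
    as a compiler would when it finds no useful instruction to move there."""
    out: List[Instr] = []
    for i, ins in enumerate(code):
        out.append(ins)
        nxt = code[i + 1] if i + 1 < len(code) else None
        if ins[0] == "LOAD" and nxt is not None and ins[1] in nxt[2]:
            out.append(NOP)
    return out


def cycles(num_instructions: int, k: int = 3) -> int:
    """Clock cycles for an ideal k-segment pipeline: k + n - 1."""
    return k + num_instructions - 1


PROGRAM: List[Instr] = [
    ("LOAD", "R1", ()),            # R1 <- M[address 1]
    ("LOAD", "R2", ()),            # R2 <- M[address 2]
    ("ADD", "R3", ("R1", "R2")),   # R3 <- R1 + R2
    ("STORE", None, ("R3",)),      # M[address 3] <- R3
]

if __name__ == "__main__":
    fixed = insert_delay_nops(PROGRAM)
    print([ins[0] for ins in fixed], cycles(len(fixed)), "cycles")
```

The result is the five-instruction sequence of the second diagram, which needs 7 cycles instead of the (incorrect) 6.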






                                                                                    RISC Pipeline

         DELAYED BRANCH in RISC – compiler support

       The compiler analyzes the instructions before and after
       the branch and rearranges the program sequence by
       inserting useful instructions in the delay steps.

      Using no-operation instructions

  Clock cycles:    1 2 3 4 5 6 7 8 9 10
  1. Load          I A E
  2. Increment       I A E
  3. Add               I A E
  4. Subtract            I A E
  5. Branch to X           I A E
  6. NOP                     I A E
  7. NOP                       I A E
  8. Instr. in X                 I A E

      Rearranging the instructions

  Clock cycles:    1 2 3 4 5 6 7 8
  1. Load          I A E
  2. Increment       I A E
  3. Branch to X       I A E
  4. Add                 I A E
  5. Subtract              I A E
  6. Instr. in X             I A E
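Both tables follow from one rule: fill the delay slots with instructions the compiler has proven independent of the branch, and pad the rest with NOPs. In this sketch that dependence analysis is not performed; the hypothetical `movable` parameter stands in for its result.

```python
from typing import List, Sequence


def delayed_branch(before: Sequence[str], branch: str, target: str,
                   slots: int = 2, movable: int = 0) -> List[str]:
    """Instruction stream for a branch with `slots` delay slots.

    `movable` = how many of the last instructions in `before` were found
    independent of the branch (assumed, not computed here). They fill the
    delay slots; any slots left over get NOPs.
    """
    movable = min(movable, slots, len(before))
    kept = list(before[:len(before) - movable])
    moved = list(before[len(before) - movable:])
    nops = ["NOP"] * (slots - movable)
    return kept + [branch] + moved + nops + [target]


def cycles(stream: Sequence[str], k: int = 3) -> int:
    """Clock cycles for the stream in an ideal k-segment pipeline: k + n - 1."""
    return k + len(stream) - 1


if __name__ == "__main__":
    body = ["Load", "Increment", "Add", "Subtract"]
    for m in (0, 2):
        stream = delayed_branch(body, "Branch to X", "Instr. in X", movable=m)
        print(cycles(stream), stream)
```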








Hennessy & Patterson - Computer Architecture
































Data hazards








          Data hazards – HW interlock

                 With a split-cycle register file (written in the first half of the cycle, read in the second half)

     R2 ← R1 - R3
     R12 ← R2 and R5
     R13 ← R6 or R2
     R14 ← R2 + R2
     M[R2+100] ← R15
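Which of these dependences the interlock must handle can be found with a short check. The sketch assumes the classic 5-stage IF/ID/EX/MEM/WB pipeline without forwarding: with the split-cycle register file, a value written in WB is readable by ID in the same cycle, so a consumer at least three instructions after its producer is safe (`safe_distance = 3`).

```python
from typing import List, Optional, Sequence, Tuple

Instr = Tuple[str, Optional[str], Tuple[str, ...]]   # (text, dest, sources)

PROGRAM: List[Instr] = [
    ("R2 <- R1 - R3",    "R2",  ("R1", "R3")),
    ("R12 <- R2 and R5", "R12", ("R2", "R5")),
    ("R13 <- R6 or R2",  "R13", ("R6", "R2")),
    ("R14 <- R2 + R2",   "R14", ("R2", "R2")),
    ("M[R2+100] <- R15", None,  ("R2", "R15")),
]


def raw_hazards(code: Sequence[Instr], safe_distance: int = 3) -> List[Tuple[int, int, str]]:
    """(producer, consumer, register) for each read-after-write dependence on the
    nearest earlier writer that lies fewer than `safe_distance` instructions back."""
    hazards = []
    for j, (_, _, srcs) in enumerate(code):
        for reg in set(srcs):
            # Scan backwards from the consumer; the first writer found is the producer.
            for i in range(j - 1, max(-1, j - safe_distance), -1):
                if code[i][1] == reg:
                    hazards.append((i, j, reg))
                    break
    return sorted(hazards)


if __name__ == "__main__":
    for i, j, reg in raw_hazards(PROGRAM):
        print(f"{PROGRAM[j][0]!r} needs {reg} from {PROGRAM[i][0]!r}")
```

Only the `and` and `or` instructions must be held back; the `add` reads R2 in the same cycle it is written, thanks to the split register-file cycle.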






Data hazards - Forwarding

                        With FORWARDING + a split-cycle register file

      R2 ← R1 - R3
      R12 ← R2 and R5
      R13 ← R6 or R2
      R14 ← R2 + R2
      M[R2+100] ← R15
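The effect of forwarding on stall cycles can be estimated with a simplified 5-stage model. The timing rules are assumptions: an instruction in ID at cycle c is in EX at c+1, MEM at c+2 and WB at c+3; with forwarding an ALU result reaches the very next EX, but a load result is available only after MEM. The `LOAD_USE` pair is a hypothetical example, not taken from the slides; it shows the case of the next slide, where a stall remains even with forwarding.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (text, destination register or None, source registers, is_load)
Instr = Tuple[str, Optional[str], Tuple[str, ...], bool]

PROGRAM: List[Instr] = [
    ("R2 <- R1 - R3",    "R2",  ("R1", "R3"),  False),
    ("R12 <- R2 and R5", "R12", ("R2", "R5"),  False),
    ("R13 <- R6 or R2",  "R13", ("R6", "R2"),  False),
    ("R14 <- R2 + R2",   "R14", ("R2", "R2"),  False),
    ("M[R2+100] <- R15", None,  ("R2", "R15"), False),
]

# Hypothetical load-use pair (not from the slides).
LOAD_USE: List[Instr] = [
    ("R2 <- M[R1+20]",  "R2", ("R1",),      True),
    ("R4 <- R2 and R5", "R4", ("R2", "R5"), False),
]


def id_cycles(code: Sequence[Instr], forwarding: bool) -> List[int]:
    """Cycle in which each instruction finally proceeds from ID (after interlock stalls)."""
    ready: Dict[str, int] = {}   # register -> earliest ID cycle of a consumer
    result: List[int] = []
    prev = 1                     # the first instruction is in IF at cycle 1, ID at cycle 2
    for _, dest, srcs, is_load in code:
        c = max([prev + 1] + [ready.get(r, 0) for r in srcs])
        if dest is not None:
            if not forwarding:
                ready[dest] = c + 3   # consumer's ID no earlier than producer's WB
            elif is_load:
                ready[dest] = c + 2   # consumer's EX after producer's MEM
            else:
                ready[dest] = c + 1   # consumer's EX right after producer's EX
        result.append(c)
        prev = c
    return result


def stalls(code: Sequence[Instr], forwarding: bool) -> int:
    """Total bubbles inserted by the interlock (cycles beyond one per instruction)."""
    ids = id_cycles(code, forwarding)
    if not ids:
        return 0
    return ids[-1] - (ids[0] + len(ids) - 1)


if __name__ == "__main__":
    for name, code in (("slide program", PROGRAM), ("load-use pair", LOAD_USE)):
        print(name, "stalls without FWD:", stalls(code, False), "with FWD:", stalls(code, True))
```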






Data Hazards that cannot be solved by FWD


                                              It would require going back in time!!!




                                      A stall is necessary even with a sophisticated FWD unit








Control Hazard on Branches (1)
• Static Option 1: Stall
  – Stall the pipe when a branch is encountered, until it is resolved
  Figure: the branch (JMP) produces the new PC in ID; the next instruction's fetch depends
  on that branch address, so a bubble precedes its IF ID EX WB.
• Stall impact: assumptions
  – CPI = 1
  – 20% of instructions are branches
  – Stall 3 cycles on every branch
• CPI new = 1 + 0.2 × 3 = 1.6
  – (CPI new = CPI ideal + avg. stall cycles / instr.)
• Execution takes 60% longer: performance drops to 1/1.6 ≈ 62.5% of the ideal
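The CPI arithmetic generalizes to one formula: the stall cycles per instruction are the branch fraction times the fraction of branches that actually pay the penalty times the penalty. A minimal sketch (the parameter names are illustrative) reproduces the 1.6 above as well as the 1.3 of predict-not-taken and the 1.18 of a predictor with P = 0.7 on the following slides.

```python
def cpi_with_branch_stalls(branch_fraction: float, penalty: float,
                           penalized_fraction: float = 1.0,
                           cpi_ideal: float = 1.0) -> float:
    """CPI_new = CPI_ideal + branch_fraction * penalized_fraction * penalty.

    penalized_fraction is the share of branches that pay the penalty:
    1.0 when stalling on every branch, the taken fraction for predict-not-taken,
    and (1 - P) for a predictor with correct-prediction rate P.
    """
    return cpi_ideal + branch_fraction * penalized_fraction * penalty


if __name__ == "__main__":
    print(cpi_with_branch_stalls(0.2, 3))            # stall on every branch
    print(cpi_with_branch_stalls(0.2, 3, 0.5))       # predict not taken, ~50% taken
    print(cpi_with_branch_stalls(0.2, 3, 1 - 0.7))   # prediction with P = 0.7
```

Relative performance is the ratio of CPIs, so stalling on every branch gives 1 / 1.6 of the ideal throughput.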





       Control Hazard on Branches (2)
 • Static Option 2: Predict Not Taken
      – Execute instructions from the fall-through (not-taken) path
         • As if there is no branch
         • If the branch is not taken (~50%), no penalty is paid
      – If the branch is actually taken
         • Flush the fall-through path instructions before they
           change the machine state (memory / registers)
         • Fetch the instructions from the correct (taken) path
      – Assuming ~50% of branches are not taken on average
         • CPI new = 1 + (0.2 × 0.5) × 3 = 1.3


















                                  BTB
• Allocation
     – Allocate instructions identified as branches (after decode)
        • Both conditional and unconditional branches are allocated
     – Not-taken branches need not be allocated
        • A BTB miss implicitly predicts not-taken
• Prediction
     – The BTB lookup is done in parallel with the instruction-cache (IC) lookup
     – The BTB provides
        •   An indication that the instruction is a branch (BTB hit)
        •   The branch's predicted target
        •   The branch's predicted direction
        •   The branch's predicted type (e.g., conditional, unconditional)
• Update (when the branch outcome is known)
     – Branch target
     – Branch history (taken / not-taken)
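The allocate / predict / update cycle above can be modelled with a dictionary keyed by the branch address. This is a sketch under assumptions: each entry keeps a single history bit (the slide only says "taken / not-taken"), and the PCs and targets in the usage part are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


class BranchTargetBuffer:
    """Minimal BTB sketch with a 1-bit history per entry."""

    def __init__(self) -> None:
        self.entries: Dict[int, Tuple[int, bool]] = {}   # branch PC -> (target, last outcome)

    def predict(self, pc: int) -> Tuple[bool, Optional[int]]:
        """Return (predicted taken?, stored target)."""
        entry = self.entries.get(pc)
        if entry is None:
            return False, None                 # BTB miss: implicitly not-taken
        target, taken = entry
        return taken, target

    def update(self, pc: int, taken: bool, target: int) -> None:
        # Allocate only on a taken branch; refresh existing entries either way.
        if pc in self.entries or taken:
            self.entries[pc] = (target, taken)


def mispredictions(btb: BranchTargetBuffer, trace: Sequence[Tuple[int, bool, int]]) -> int:
    """Count wrong direction predictions over a trace of (pc, taken, target)."""
    wrong = 0
    for pc, taken, target in trace:
        predicted_taken, _ = btb.predict(pc)
        if predicted_taken != taken:
            wrong += 1                         # the pipeline would be flushed here
        btb.update(pc, taken, target)
    return wrong


if __name__ == "__main__":
    # Hypothetical loop branch at 0x40 jumping back to 0x10: taken 3 times, then exits.
    trace = [(0x40, True, 0x10)] * 3 + [(0x40, False, 0x10)]
    print("mispredictions:", mispredictions(BranchTargetBuffer(), trace))
```

With one history bit the loop branch is mispredicted twice: on the first iteration (BTB miss) and on the loop exit.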





BTB (cont)
• Wrong prediction
     – Predicted not-taken, actually taken
     – Predicted taken, actually not-taken
• On a wrong prediction, flush the pipeline
     – Reset the latches (equivalent to turning all in-flight instructions into NOPs)
     – Select the PC source to be the correct path
         • The fall-through address must be kept together with the branch
     – Start fetching instructions from the correct path
• Assuming a correct-prediction rate P
     – 20% of instructions are branches
• CPI new = 1 + (0.2 × (1 - P)) × 3
     – For example, if P = 0.7:
• CPI new = 1 + (0.2 × 0.3) × 3 = 1.18









A processor uses a BTB mechanism to predict the behavior of branch instructions.




                                                 Assume an initial state of NT
                                                         in the example above.








