Advanced Pipelining in ARM Processors
Prof. J.K.Das
School of Electronics Engineering
KIIT Deemed to be University
Pipelining: Review of the Basics
• ARM is a RISC design, but it departs from pure RISC in several features (variable execution cycles for some instructions, an inline barrel shifter, the Thumb instruction set, conditional execution, DSP instructions).
• Regular ARM architecture:
  - Load/store architecture
  - Uniform register array
  - Fixed-length 32-bit instructions
  - 3-address instructions
• System speed: latency and throughput
  - Latency: time required for a single instruction to pass through the system from start to end.
  - Throughput: number of instructions completed per unit time (ideally one per clock cycle).
  - Speed ∝ throughput and speed ∝ 1/latency: higher throughput and lower latency both make the system faster.
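To make the latency/throughput distinction concrete, here is a minimal sketch that compares an unpipelined and an ideally pipelined machine. The instruction count, stage count, and stage time are made-up illustration values, and the model assumes no stalls or flushes.

```python
def unpipelined_time(n_instr, n_stages, stage_time_ns):
    """Each instruction runs through every stage before the next one starts."""
    return n_instr * n_stages * stage_time_ns


def pipelined_time(n_instr, n_stages, stage_time_ns):
    """Ideal pipeline: the first instruction needs n_stages cycles,
    then one instruction completes in every following cycle."""
    return (n_stages + n_instr - 1) * stage_time_ns


# Hypothetical values chosen only for illustration.
n_instr, n_stages, stage_time_ns = 100, 5, 2.0

latency_ns = n_stages * stage_time_ns                      # per-instruction latency
t_plain = unpipelined_time(n_instr, n_stages, stage_time_ns)
t_pipe = pipelined_time(n_instr, n_stages, stage_time_ns)
throughput = n_instr / t_pipe                              # instructions per ns
speedup = t_plain / t_pipe

print(f"latency per instruction: {latency_ns} ns")
print(f"unpipelined: {t_plain} ns, pipelined: {t_pipe} ns")
print(f"throughput: {throughput:.3f} instr/ns, speedup: {speedup:.2f}x")
```

Note that the per-instruction latency is the same in both cases; only the throughput changes, and the speedup approaches the number of stages as the instruction count grows.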
Pipelining: Overview
• Mechanism to speed up execution by fetching the next instruction while the present instruction is being decoded and executed.
• Introduces parallelism: several instructions are in flight at a time, each in a different stage.
• Pipelining (temporal parallelism) should ideally increase throughput without any penalty in latency.
• Pipelining divides the instruction cycle into multiple stages:
  Fetch → Decode → Execute → Memory operations
• Disadvantage: as the pipeline gets deeper and each instruction spends more cycles in it, data dependencies start to surface.
  *(Dependency: the execution of the present instruction depends on an intermediate result from a previous instruction.)
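The overlap described above can be visualised with a small space-time diagram. The sketch below is only an illustration of an ideal pipeline with the four stages named on this slide (abbreviated F, D, E, M); the function name and the text layout are assumptions made for this example.

```python
STAGES = ["F", "D", "E", "M"]  # Fetch, Decode, Execute, Memory operations


def pipeline_diagram(n_instr, stages=STAGES):
    """Return one text row per instruction; columns are clock cycles.

    Instruction i enters the first stage in cycle i (0-based) and moves
    one stage forward per cycle, with no stalls or flushes.
    """
    total_cycles = len(stages) + n_instr - 1
    rows = []
    for i in range(n_instr):
        cells = ["."] * total_cycles
        for s, name in enumerate(stages):
            cells[i + s] = name
        rows.append(f"I{i + 1}: " + " ".join(cells))
    return rows


for row in pipeline_diagram(3):
    print(row)
# I1: F D E M . .
# I2: . F D E M .
# I3: . . F D E M
```

In the third column I1 is still executing while I2 is already being decoded. If I2 needs the value I1 is computing, that value does not exist yet, which is exactly the dependency problem noted above.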
Pipelining in ARM
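• ARM uses pipelines of different depths in different processor families:
  - ARM7: 3-stage pipeline
  - ARM9: 5-stage pipeline
  - ARM10: 6-stage pipeline
  - ARM11: 7-stage pipeline in the simplified model used here (ARM's documentation for the ARM1136 core describes eight stages)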
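Problems in the 5-Stage Pipeline
• Ideally IPC = 1 (one instruction completed per cycle) when pipelining is implemented.
• Hazards push IPC below 1, especially around branch instructions:
  - Control hazards: a taken branch modifies the PC, so the instructions already fetched behind it are wrong and the pipeline is flushed.
  - Data hazards: an instruction needs a result that an earlier instruction has not yet produced, so the pipeline stalls (NOPs are inserted). A flush likewise leaves NOPs in the pipeline while the instructions at the branch target are loaded.
  - Interrupts: the instructions already present in the pipeline are replaced with those reached through the IVT (Interrupt Vector Table).
Solution:
1) Data forwarding: route a result directly from the stage that produces it to the next instruction that needs it, instead of waiting for the register write-back.
2) Branch prediction: predict the outcome of branch instructions so that fetching can continue along the likely path.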
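The effect of flushes on IPC, and the benefit of branch prediction, can be estimated with the usual back-of-the-envelope CPI formula. This is a rough sketch, not a measurement: the branch fraction, misprediction rates, and the 2-cycle flush penalty are assumed values chosen only to illustrate the calculation.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty, base_cpi=1.0):
    """Average cycles per instruction when some branches flush the pipeline.

    branch_fraction : fraction of instructions that are branches
    mispredict_rate : fraction of branches that cause a flush
    flush_penalty   : cycles lost per flush
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Assumed workload: 20% branches, 2 cycles lost per flush.
no_prediction = effective_cpi(0.2, 1.0, 2)    # every branch flushes
with_prediction = effective_cpi(0.2, 0.1, 2)  # 90% of branches predicted correctly

print(f"no prediction:   CPI = {no_prediction:.2f}, IPC = {1 / no_prediction:.2f}")
print(f"with prediction: CPI = {with_prediction:.2f}, IPC = {1 / with_prediction:.2f}")
```

Under these assumptions the IPC rises from about 0.71 to about 0.96, which is why deeper pipelines, where the flush penalty is larger, depend so heavily on good prediction.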
6-Stage Pipelining
• Used in the ARM10 family.
• An additional ISSUE stage is added, so one instruction takes 6 cycles in total to complete.
• The Issue stage checks whether the instruction is ready to be decoded in the current cycle.
• If the instruction is not ready, the next instruction in the pipeline is allowed to start processing in the available time gap (a limited form of out-of-order execution).
• A branch prediction mechanism is introduced to improve throughput.
• Processor stalls are reduced by resolving hazards.
• Throughput is almost double that of ARM7, but latency is compromised (trade-off).
Details of 6-Stage Pipelining: Branch Target Buffer
• Conditional branch instructions delay the pipeline because it takes time to evaluate the condition and determine the branch address.
• Solution: predict the branch statically.
• Prediction still requires the branch target to be calculated, which might induce delays.
• BTBs (Branch Target Buffers) are used to reduce this delay and keep the pipeline efficient.
• A BTB is essentially a simple cache memory, and it should be large enough to maintain the throughput of the pipeline.
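Since the BTB is described as a simple cache, it can be sketched as a small direct-mapped table indexed by the branch address. This is a hypothetical software model for illustration only; the class name, entry count, and index/tag split are assumptions and do not describe the actual ARM10 hardware.

```python
class BranchTargetBuffer:
    """Toy direct-mapped BTB: maps a branch's PC to its last known target."""

    def __init__(self, n_entries=16):
        self.n_entries = n_entries
        self.entries = [None] * n_entries  # each slot holds (tag, target) or None

    def _index_and_tag(self, pc):
        word = pc >> 2  # 32-bit instructions are word aligned
        return word % self.n_entries, word // self.n_entries

    def lookup(self, pc):
        """Return the stored target on a hit, or None on a miss."""
        index, tag = self._index_and_tag(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc, target):
        """Record the target once the branch has been resolved."""
        index, tag = self._index_and_tag(pc)
        self.entries[index] = (tag, target)


btb = BranchTargetBuffer(n_entries=4)
print(btb.lookup(0x1000))        # miss: None, target must be computed
btb.update(0x1000, 0x2000)
print(hex(btb.lookup(0x1000)))   # hit: 0x2000, no target calculation needed
```

With only four entries, a second branch at 0x1010 maps to the same slot and evicts the first one. That conflict is the reason the buffer needs sufficient size to keep the pipeline fed.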
7-Stage Pipelining
• Deeper pipelining mechanism used in ARM11 and above.
• Implements data forwarding along with branch prediction.
• The stage descriptions below use a simplified model with MIPS-style mnemonics (lw/sw, beq/bne, j) and register fields (rs, rt, rd). TLB misses and cache misses can occur occasionally, but they are ignored here.
• Stage 1: IT. The Instruction Translate (IT) stage uses the TLB to translate the (virtual) PC address into a physical instruction address.
• Stage 2: IF. The Instruction Fetch (IF) stage uses the physical instruction address computed in the IT stage to access the cache and fetches the 32-bit instruction stored at that address.
• Stage 3: ID. The Instruction Decode (ID) stage first decodes the 32-bit instruction (e.g., identifying the opcode field and the rs, rt, and rd fields). In the second half of the clock cycle, it reads the register file. It also computes the (virtual) target address if the instruction is a jump (j) or a branch (beq/bne).
• Stage 4: EX. The Execute (EX) stage performs the ALU operations the instruction needs. For branches, this includes resolving the branch decision (taken or not taken). For lw/sw instructions, the ALU computes the (virtual) data address from or to which data is to be read or written.
• Stage 5: MT. The Memory Translate (MT) stage uses the TLB to translate the virtual data address into a physical data address if the instruction is a lw or sw.
• Stage 6: MM. The Memory (MM) stage uses the physical data address computed in the MT stage to access the cache if the instruction is a lw (data is read from the cache into the rt register) or a sw (data in the rt register is written to the cache).
• Stage 7: WB. The Write Back (WB) stage updates the register file (if necessary) in the first half of the clock cycle.
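From the stage list above, the cost of the two main hazards can be worked out by counting stages. The sketch below does that arithmetic under stated assumptions: a mispredicted branch is detected at the end of EX and everything fetched behind it is flushed, and full forwarding paths exist so a result can be used in the cycle after it is produced. The function names are made up for this example.

```python
STAGES = ["IT", "IF", "ID", "EX", "MT", "MM", "WB"]


def stage_number(name):
    """1-based position of a stage in the seven-stage model."""
    return STAGES.index(name) + 1


def branch_flush_penalty(resolve_stage="EX"):
    """Cycles lost when a branch resolved in resolve_stage was mispredicted.

    One wrong-path instruction sits in every stage before the resolving
    stage, and all of them have to be discarded.
    """
    return stage_number(resolve_stage) - 1


def forwarding_stalls(produce_stage, consume_stage="EX"):
    """Stall cycles for an instruction that directly follows its producer.

    The value is ready at the end of produce_stage and is needed at the
    start of consume_stage; forwarding removes the wait for WB.
    """
    return max(0, stage_number(produce_stage) - stage_number(consume_stage))


print(branch_flush_penalty())    # 3: IT, IF and ID hold wrong-path instructions
print(forwarding_stalls("EX"))   # 0: ALU result forwarded straight to the next EX
print(forwarding_stalls("MM"))   # 2: a lw result is only ready after MT and MM
```

The two extra translate stages (IT and MT) are what raise both numbers compared with a classic five-stage pipeline, which is why this design relies on forwarding and branch prediction together.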
