Control Dependency
  • Problem: dependence tracking in ARVI ignores control dependencies.
  • It therefore cannot obtain the practical available register set.
  • It generates the same pattern for different branch directions.
  • Such a branch cannot be predicted correctly.
Example of control dependency
  • Practical Available Register Set (ARS) of branch 2: r1, r3
  • ARS of branch 2 in ARVI: r3
  • When r3 == 1: r1 == 0 -> not taken; r1 != 0 -> taken
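The effect can be reproduced with a small table that maps each register-value pattern to the directions observed with it. This is only a sketch: the helper names `train` and `ambiguous` and the concrete value r1 = 5 are assumptions made for illustration, not part of the slides.

```python
from collections import defaultdict

def train(samples, tracked):
    """Map each observed pattern to the set of directions seen with it."""
    table = defaultdict(set)
    for regs, taken in samples:
        pattern = tuple(regs[name] for name in tracked)
        table[pattern].add(taken)
    return table

def ambiguous(table):
    """Patterns that were observed with both directions."""
    return [p for p, dirs in table.items() if len(dirs) > 1]

# Two executions of branch 2 with r3 == 1 (r1 == 5 is an arbitrary non-zero value).
samples = [
    ({"r1": 0, "r3": 1}, False),  # r1 == 0 -> not taken
    ({"r1": 5, "r3": 1}, True),   # r1 != 0 -> taken
]

arvi = train(samples, tracked=("r3",))            # ARS as tracked by ARVI
practical = train(samples, tracked=("r1", "r3"))  # practical ARS
```

With only r3 tracked, the single pattern (1,) is associated with both directions, so no table lookup can predict it; with r1 included, the two executions produce distinct patterns.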
Dependence Tracking
  [Figure panels: Branch 1 not taken; Branch 1 taken]
Improved Data Dependence Tracking
  • Resolve the control dependency by adding control flow information to the tracking.
  • Add a logical register to the architecture, called the TA (target address) register.
  • TA maintains the target address of the last branch.
  • TA is a hidden source register of instructions.
Behavior of branch instruction
  • Example: beq in the MIPS instruction set architecture
Improved Dependence Tracking
  [Figure panels: Branch 1 not taken; Branch 1 taken]
ARS of Branch 13 with improved tracking
  • Improved tracking follows the control dependency.
  • When execution has completed up to INST1, the tracked ARS differs from the practical ARS.
  • The branch can still be predicted well, because TA carries the control flow information.
  • When r3 == 1: TA == 2 -> not taken; TA == 4 -> taken
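A minimal sketch of how TA could separate the two cases. It assumes that addresses are instruction indices as in the slides, that an earlier branch at index 1 targets index 4, and that a not-taken branch writes its fall-through address to TA; the function names `next_ta` and `pattern` are hypothetical.

```python
def next_ta(branch_pc, taken, target):
    """Value a branch writes to TA: the address control flow continues at.

    Assumption: a not-taken branch records its fall-through address.
    """
    return target if taken else branch_pc + 1

def pattern(regs, ta, tracked):
    """Register-value pattern with TA appended as a hidden source register."""
    return tuple(regs[name] for name in tracked) + (ta,)

# Earlier branch at index 1 with target 4 (hypothetical layout).
ta_fallthrough = next_ta(branch_pc=1, taken=False, target=4)
ta_target = next_ta(branch_pc=1, taken=True, target=4)

p_not_taken = pattern({"r3": 1}, ta_fallthrough, tracked=("r3",))
p_taken = pattern({"r3": 1}, ta_target, tracked=("r3",))
```

Although r3 is 1 in both cases, the patterns differ in their TA component, so a table indexed by the pattern can learn a separate direction for each.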
Common code problem
  • Performance is lost in code that is not control dependent (common code).
  • ARS of branch 15 in common code, when execution has completed up to INST2:
      • Practical ARS: r2 (INST1)
      • Previously proposed tracking: r2 (INST1)
      • Improved tracking: r2 (INST1), TA
Distinguishing control flow in improved tracking
  • In common code, TA is wasted information.
  • This does not mean that the prediction is incorrect; it means that the predictor needs more training.
  • Information to train:
      • Previous tracking: r2 == 0 -> taken
      • Improved tracking: r2 == 0, TA == 5 -> taken; r2 == 0, TA == 6 -> taken
“SetTA” instruction
  • Add a “SetTA” instruction that saves the address of the next instruction to TA.
  • The ARS of branch 15 is still r2 and TA, but TA is always 6.
  • Disadvantages:
      • An instruction is wasted (INST6).
      • Programs must be recompiled.
      • The start of the common code has to be found at compile time in order to insert “SetTA”.
      • This is hard because assembly language is not a structured programming language (it has “goto”).
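The training cost with and without “SetTA” can be illustrated by counting the distinct patterns that branch 15 presents to the predictor. This is a sketch under assumptions: `set_ta` models the instruction as "TA := address of the next instruction", and placing it at index 5 (so that TA becomes 6) is a hypothetical choice made to match the slide.

```python
def set_ta(pc):
    """Model of SetTA: store the address of the next instruction in TA."""
    return pc + 1

def patterns_to_train(arrivals, use_set_ta, set_ta_pc=5):
    """Distinct (r2, TA) patterns seen by the branch in the common code.

    arrivals: list of (r2, TA) pairs, one per path into the common code.
    set_ta_pc: hypothetical position of the SetTA instruction.
    """
    seen = set()
    for r2, ta in arrivals:
        if use_set_ta:
            ta = set_ta(set_ta_pc)  # overwrite the path-dependent TA
        seen.add((r2, ta))
    return seen

# The common code is reached from two paths that leave TA == 5 or TA == 6.
arrivals = [(0, 5), (0, 6)]
without_set_ta = patterns_to_train(arrivals, use_set_ta=False)
with_set_ta = patterns_to_train(arrivals, use_set_ta=True)
```

Without “SetTA” the predictor has to learn two patterns for the same outcome; with it, both paths collapse to a single pattern.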
Encoding
  • The amount of information changes with the number of registers in the ARS.
  • Assuming each value is 10 bits long:
      • 1 register in the ARS => 10 bits
      • 2 registers in the ARS => 20 bits
      • 3 registers in the ARS => 30 bits
  • A fixed-length pattern must be generated from variable-length information -> hash.
  • Various encodings are possible.
Encoding of ARVI
  • XOR of the physical register values.
  • Simple XOR hash, built with an XOR tree.
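A simple XOR hash of this kind can be sketched as follows. The 10-bit width follows the assumption on the encoding slide; the function name `simple_xor_hash` is hypothetical, and the loop stands in for what hardware would do with an XOR tree.

```python
from functools import reduce

def simple_xor_hash(values, width=10):
    """Fold any number of register values into one fixed-width pattern."""
    mask = (1 << width) - 1
    return reduce(lambda acc, v: acc ^ (v & mask), values, 0)

one_reg = simple_xor_hash([1])
two_regs = simple_xor_hash([1, 2])
three_regs = simple_xor_hash([1, 2, 4])
swapped = simple_xor_hash([2, 1])
```

The pattern stays within 10 bits regardless of how many registers are in the ARS. The price is hash conflicts: XOR ignores which register a value came from, so `[1, 2]` and `[2, 1]` produce the same pattern.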
Reducing hash conflicts
  • Programs use the lower bits of registers more than the higher bits.
  • Most of the information is concentrated in the lower bits.
  • Hash conflicts occur because of the lower bits.
  • To spread the information out, each value is circularly shifted by a different amount per logical register number.
  • The logical register number is used because the physical register number changes at run time.
Percentage of use of each bit
  • Hash conflicts occur in those bits.
Proposed encoding
  • XOR of the logical register values.
  • Each value is circularly shifted by a different amount, determined by its logical register number.
  • The physical-to-logical mapping is serialized.
  • Disadvantage: the value information is shorter than before.
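The idea can be sketched as a rotate-then-XOR hash. The slides do not give the exact shift amounts, so rotating by the logical register number itself is an assumption, as are the 10-bit width and the names `rotl` and `proposed_hash`.

```python
def rotl(value, shift, width=10):
    """Circular left shift of a width-bit value."""
    mask = (1 << width) - 1
    value &= mask
    shift %= width
    return ((value << shift) | (value >> (width - shift))) & mask

def proposed_hash(ars, width=10):
    """XOR of logical register values, each rotated by its logical number.

    ars: dict mapping logical register number -> current value.
    """
    pattern = 0
    for logical, value in ars.items():
        pattern ^= rotl(value, logical, width)
    return pattern

# Same two values, assigned to r1 and r2 in opposite order.
h_a = proposed_hash({1: 1, 2: 2})
h_b = proposed_hash({1: 2, 2: 1})
```

A plain XOR of the values gives 3 for both assignments, whereas the rotated version separates them, because the low-order bits of each register land in different bit positions of the pattern.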
Select Logical Register X
  • Selects the value of the physical register that is mapped to logical register X.
Delay
  • nPR = number of physical registers
  • nLR = number of logical registers
  • L = log2(nLR)
  • Simple XOR hash: log2(nPR) * XOR2 + AND2
  • Proposed hash: log2(nLR) * XOR2 + Select + AND2
      • Select = XOR2 + ANDL + Gate + OR(nPR)
  • nPR > nLR * 2, so log2(nPR) > log2(nLR) + 1
  • The proposed hash is approximately as fast, or a little slower.
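The two delay expressions can be compared numerically. The sketch below assumes, purely for illustration, that every gate named in the formulas costs one delay unit regardless of fan-in, and uses nPR = 128 and nLR = 32 as example sizes; none of these numbers come from the slides.

```python
import math

def simple_delay(n_pr, xor2=1, and2=1):
    """log2(nPR) * XOR2 + AND2"""
    return math.log2(n_pr) * xor2 + and2

def proposed_delay(n_pr, n_lr, xor2=1, and2=1, and_l=1, gate=1, or_npr=1):
    """log2(nLR) * XOR2 + Select + AND2, with
    Select = XOR2 + ANDL + Gate + OR(nPR).

    n_pr only affects the OR(nPR) term, which is a unit delay here.
    """
    select = xor2 + and_l + gate + or_npr
    return math.log2(n_lr) * xor2 + select + and2

d_simple = simple_delay(128)
d_proposed = proposed_delay(128, 32)
```

Under these unit-delay assumptions the proposed hash comes out two units slower for this configuration. Real wide AND and OR gates would cost more than one unit, so the numbers only indicate the trend.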
HW resource
  • Simple XOR hash: nPR * N * AND2 + (nPR - 1) * N * XOR2
      • plus (nPR - 1) * 3-bit ADD for the logical number tag
  • Proposed hash: nPR * N * AND2 + nLR * Select + (nLR - 1) * N * XOR2
      • Select = nPR * (L * (XOR2 + 2 Gate) + ANDL) + N * OR(nPR)
      • No logical number tag is needed; the pattern already contains that information.
Suitable predictor for register-value patterns
  • Characteristics of register-value patterns:
      • A long pattern length is needed for reliable prediction: a PHT is not suitable, and tags must be saved to compare states.
      • Patterns are not linearly separable [17][18], because each bit of a value has an AND relation with the others: a perceptron is not suitable [17][18].
      • Branches have many different patterns: a perceptron is not suitable. If a loop changes r1 from 0 to 999, there are 999 not-taken patterns and 1 taken pattern.
      • Pattern generation has a long delay: a perceptron is not suitable [17][18], and the predictor must be combined with a fast predictor in a hybrid [19][20].
Proposed predictor: modified YAGS [21]
  1. Bimodal table: saves the bias of each branch.
  2. Caches: save only the patterns whose direction differs from the bias.
      • Taken cache: saves not-taken patterns for taken-biased branches.
      • Not-taken cache: saves taken patterns for not-taken-biased branches.
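The structure can be sketched in software as a bias counter per branch plus two exception sets. This is a simplified illustration, not the slides' hardware design: the class name `ModifiedYagsSketch`, the 2-bit counter, the unbounded Python sets and the update policy are all assumptions.

```python
class ModifiedYagsSketch:
    """Bias per branch, plus caches holding only patterns that contradict it."""

    def __init__(self):
        self.bias_counter = {}        # pc -> 2-bit saturating counter (0..3)
        self.taken_cache = set()      # not-taken patterns of taken-biased branches
        self.not_taken_cache = set()  # taken patterns of not-taken-biased branches

    def _bias(self, pc):
        return self.bias_counter.get(pc, 1) >= 2  # True means biased taken

    def predict(self, pc, pattern):
        bias = self._bias(pc)
        cache = self.taken_cache if bias else self.not_taken_cache
        return (not bias) if (pc, pattern) in cache else bias

    def update(self, pc, pattern, taken):
        bias = self._bias(pc)
        cache = self.taken_cache if bias else self.not_taken_cache
        if taken != bias:
            cache.add((pc, pattern))      # remember the exception
        else:
            cache.discard((pc, pattern))  # pattern now agrees with the bias
        counter = self.bias_counter.get(pc, 1)
        self.bias_counter[pc] = min(3, counter + 1) if taken else max(0, counter - 1)


def run_loop(predictor, pc=0x40):
    """Loop-exit branch: taken only when r1 == 999. Returns mispredictions."""
    misses = 0
    for r1 in range(1000):
        taken = (r1 == 999)
        if predictor.predict(pc, r1) != taken:
            misses += 1
        predictor.update(pc, r1, taken)
    return misses


predictor = ModifiedYagsSketch()
first_pass = run_loop(predictor)
second_pass = run_loop(predictor)
```

For the 0-to-999 loop, the 999 not-taken patterns are covered by the bias alone, and only the single taken pattern occupies a cache entry. After one pass of training the second pass is predicted without misses in this sketch.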
Block diagram
  • The fast predictor predicts the direction in an early cycle.
  • When the modified YAGS hits and the depth tag matches the current state, the fetch direction is updated in a late cycle.
  • When the modified YAGS misses, its predicted direction is the bias, and we cannot tell whether the pattern is untrained or trained but not saved.
  • In that case, a selector chooses between the biased direction and the fast predictor's direction.
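The selection logic described above can be written as a small function. This is a sketch: how the selector makes its choice is not stated in the slides, so it is passed in as the hypothetical flag `prefer_bias`, and a hit whose depth tag does not match is treated like a miss, which is also an assumption.

```python
def fetch_direction(fast_dir, yags_hit, depth_match, yags_dir, bias_dir, prefer_bias):
    """Final direction used for fetch; True means taken.

    fast_dir:    early-cycle prediction of the fast predictor
    yags_dir:    direction stored for the pattern in the modified YAGS
    bias_dir:    bias of the branch from the bimodal table
    prefer_bias: hypothetical selector output
    """
    if yags_hit and depth_match:
        # Late-cycle override by the register-value-pattern predictor.
        return yags_dir
    # Miss (or stale depth tag): choose between bias and fast predictor.
    return bias_dir if prefer_bias else fast_dir
```

For example, `fetch_direction(True, True, True, False, True, False)` overrides the fast predictor's taken prediction with the cached not-taken direction.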
Outline
  • Why do we need branch prediction?
  • Related work
  • Improved register-value-pattern generation
  • Experiment and evaluation
  • Contribution
  • Reference
Experimental environment
  • SimpleScalar 3.0
      • PISA instruction set architecture
      • Little endian
  • sim-outorder
      • Performance-based
      • Execution driven
      • Cycle timer
  • Benchmarks: 10 programs of SPEC 2k
  • Instruction coverage: 150M ~ 250M instructions
Processor Architecture Configuration
Memory Architecture Configuration
Predictor Configuration
Register-value-pattern predictor
  • A register-value-pattern predictor predicts the way a human would.
  • If we know that “this branch was taken before when a=3 and b=4”, we predict the branch without calculation when a=3 and b=4 arrive again.
  • It is a common-sense design.
  • Why is 100% accuracy not possible?
Factors of performance loss
  1. Limitation of dependence tracking
      1.1 Load branch
      1.2 Control dependency
  2. Hash conflict in encoding
  3. Prediction delay
  4. Various patterns for the same direction
      4.1 Pattern capacity of predictor
      4.2 Lack of training
Contribution
  • We improve some of the factors of performance loss:
      1.2 Control dependency
      2 Hash conflict in encoding
      4.1 Pattern capacity of predictor
  • However, open problems remain.
Applications of register-value patterns
  • Register-value patterns and branch-history patterns reach their limits on different kinds of branches.
      • A hybrid predictor that combines them with branch-history patterns gives higher performance.
  • Register-value patterns combined with branch-register-value-based prediction
      • When the depth of the dependence chain is 0, the branch register is already updated.
      • In that case, branch-register-value-based prediction can be used.
  • Register-value patterns for value prediction
      • The register-value pattern can be used as information for value prediction.
Reference
[1] T. Yeh and Y. Patt. “Two-level Adaptive Branch Prediction”. In Proc. 24th ACM/IEEE Int. Symp. on Microarchitecture, 1991.
[2] T. Yeh and Y. Patt. “A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History”. In Proc. 20th Ann. Int. Symp. on Computer Architecture, 1993.
[3] S. Pan, K. So and J. Rahmeh. “Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation”. In Proc. 5th Annual Intl. Conf. on Architectural Support for Prog. Lang. and Operating Systems, 1992.
[4] R. Nair. “Dynamic Path-Based Branch Correlation”. In Proc. 28th Ann. Int. Symp. on Microarchitecture, 1995.
[5] D. Jiménez. “Fast Path-Based Neural Branch Prediction”. In Proc. 36th Ann. IEEE/ACM Int. Symp. on Microarchitecture, 2003.
[6] F. Gabbay and A. Mendelson. “Speculative Execution Based on Value Prediction”. Technical Report, Technion, 1997.
[7] J. Gonzalez and A. Gonzalez. “Control-Flow Speculation through Value Prediction for Superscalar Processors”. In Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, 1999.
[8] T. Heil, Z. Smith and J. E. Smith. “Improving Branch Predictors by Correlating on Data Values”. In Proc. 32nd Int. Symp. on Microarchitecture, 1999.
[9] K. Wang. “Highly Accurate Data Value Prediction using Hybrid Predictors”. In Proc. 30th Int. Symp. on Microarchitecture, 1997.
[10] M. Lipasti and J. Shen. “Exceeding the Dataflow Limit via Value Prediction”. In Proc. 29th Int. Symp. on Microarchitecture, 1996.
[11] W. Mohan and M. Franklin. “Improving Data Value Prediction Accuracy using Path Correlation”. In Proc. 6th Int. Conf. on High Performance Computing, 1999.
[12] Y. Sazeides and J. Smith. “Implementations of Context Based Value Predictors”. Technical Report #ECE-TR-97-8, University of Wisconsin-Madison, 1997.
[13] L. N. Vintan, M. Sbera, I. Z. Mihu and A. Florea. “An alternative to branch prediction: pre-computed branches”. ACM SIGARCH Computer Architecture News, vol. 31, 2003.
[14] L. He and Z. Liu. “A New Value Based Branch Predictor for SMT Processors”. In Proc. 16th IASTED Int. Conf. on Parallel and Distributed Computing and Systems, 2004.
[15] Y. Pan, X. Fan, L. He and D. Wang. “A Bypass Mechanism to Enhance Branch Predictor for SMT”. In Proc. 12th Asia-Pacific Conf. on Computer Systems Architecture (ACSAC 2007), vol. 4697, 2007.
[16] L. Chen, S. Dropsho and D. H. Albonesi. “Dynamic Data Dependence Tracking and its Application to Branch Prediction”. In Proc. 9th Int. Symp. on High-Performance Computer Architecture, 2003.
[17] D. A. Jiménez and C. Lin. “Dynamic Branch Prediction with Perceptrons”. In Proc. 7th Int. Symp. on High Performance Computer Architecture, 2001.
[18] D. A. Jiménez and C. Lin. “Neural Methods for Dynamic Branch Prediction”. ACM Transactions on Computer Systems, 2002.
[19] P. Chang, E. Hao and Y. Patt. “Alternative Implementations of Hybrid Branch Predictors”. In Proc. 28th Ann. Int. Symp. on Microarchitecture, 1995.
[20] M. Evers, P. Chang and Y. Patt. “Using Hybrid Branch Predictors to Improve Branch Prediction Accuracy in the Presence of Context Switches”. In Proc. 23rd Ann. Int. Symp. on Computer Architecture, 1996.
[21] A. Eden and T. Mudge. “The YAGS Branch Prediction Scheme”. In Proc. 31st Ann. ACM/IEEE Int. Symp. on Microarchitecture, 1998.
[22] P. N. Glaskowsky. “Pentium 4 (Partially) Previewed”. Microprocessor Report, 2000.
improved register value pattern generation for branch prediction
