Improved Register-Value-Pattern Generation for Branch Prediction
1. Control Dependency
Problem: dependency tracking in ARVI ignores control dependence, so it cannot obtain the practical set of available registers. It therefore generates the same pattern for different branch directions and cannot predict such a branch correctly.
2. Example of Control Dependency
Practical available register set (ARS) of branch 2: r1, r3. ARS of branch 2 as computed by ARVI: r3 only. When r3 == 1, branch 2 is not taken if r1 == 0 and taken if r1 != 0, so a pattern built from r3 alone cannot separate the two outcomes.
4. Improved Data Dependence Tracking
To resolve control dependence, control-flow information is added to the tracking. A new logical register, called the TA (target address) register, is added to the architecture to hold the target address of the most recent branch. TA is treated as a hidden source register of instructions.
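The TA mechanism can be sketched in a few lines of Python. This is a simplified software model, not the hardware tracker of ARVI [16]: a register never written in the trace is treated as an available root, a written register inherits the roots of its sources, and, when `track_ta` is set, TA is read as a hidden source by every instruction and reset by each branch to a fresh root holding that branch's target. The names `Inst` and `branch_ars` and the three-instruction trace are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Inst:
    dest: Optional[str]       # destination logical register (None for branches)
    srcs: Tuple[str, ...]     # explicit source registers
    is_branch: bool = False


def branch_ars(trace: List[Inst], track_ta: bool) -> List[FrozenSet[str]]:
    """Return, for each branch in the trace, the root registers it depends on."""
    deps: Dict[str, FrozenSet[str]] = {}

    def roots(reg: str) -> FrozenSet[str]:
        # A register not written inside the trace is its own (available) root.
        return deps.get(reg, frozenset({reg}))

    result: List[FrozenSet[str]] = []
    for inst in trace:
        src_roots = frozenset().union(*(roots(s) for s in inst.srcs))
        if track_ta:
            src_roots |= roots("TA")  # TA is a hidden source of every instruction
        if inst.is_branch:
            result.append(src_roots)
            if track_ta:
                deps["TA"] = frozenset({"TA"})  # the branch writes its target into TA
        elif inst.dest is not None:
            deps[inst.dest] = src_roots
    return result


# Hypothetical trace: branch 1 tests r3, the path taken computes r2 from r1,
# and branch 2 tests r2.
trace = [
    Inst(None, ("r3",), is_branch=True),
    Inst("r2", ("r1",)),
    Inst(None, ("r2",), is_branch=True),
]
print(branch_ars(trace, track_ta=False))
print(branch_ars(trace, track_ta=True))
```

Without TA, branch 2's ARS is only r1, so the same r1 value yields the same pattern whichever way branch 1 went; with TA, the ARS also contains TA, whose value records which path led to branch 2.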
5. Behavior of a Branch Instruction
Example: beq in the MIPS instruction set architecture.
7. ARS of Branch 13
Improved tracking captures control dependency. When branch 13 is completed by INST1, its ARS differs from the practical ARS, but the branch can still be predicted well because TA carries control-flow information. When r3 == 1: TA == 2 -> not taken; TA == 4 -> taken.
8. Common Code Problem
Improved tracking loses performance in code that is not control-dependent. ARS of branch 15 in common code, when completed by INST2: practical ARS: r2 (INST1); previously proposed tracking: r2 (INST1); improved tracking: r2 (INST1) and TA.
9. Distinguishing Control Flow in Improved Tracking
Here TA is wasted information. This does not mean that the prediction is incorrect, only that the predictor needs more training. Information to train: previous tracking: r2 = 0 -> taken. Improved tracking: r2 = 0, TA = 5 -> taken; r2 = 0, TA = 6 -> taken.
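The trade-off between slides 7 and 9 can be illustrated with a toy pattern table. Each observation is a (register value, TA, outcome) triple taken from the slide examples; a key recorded with more than one outcome is a pattern that cannot be predicted, and the number of keys approximates how much must be trained. `build_table` and `conflicting_keys` are illustrative names, not part of the proposed hardware.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Observation = Tuple[int, int, bool]  # (register value, TA value, taken?)


def build_table(observations: List[Observation], use_ta: bool) -> Dict[tuple, Set[bool]]:
    """Map each pattern key to the set of outcomes observed for it."""
    table: Dict[tuple, Set[bool]] = defaultdict(set)
    for value, ta, taken in observations:
        key = (value, ta) if use_ta else (value,)
        table[key].add(taken)
    return dict(table)


def conflicting_keys(table: Dict[tuple, Set[bool]]) -> List[tuple]:
    """Keys seen with both outcomes: the predictor cannot get them all right."""
    return [key for key, outcomes in table.items() if len(outcomes) > 1]


# Slide 7: r3 == 1 with TA == 2 (not taken) and TA == 4 (taken).
slide7 = [(1, 2, False), (1, 4, True)]
# Slide 9: r2 == 0 reached with TA == 5 and TA == 6, taken both times.
slide9 = [(0, 5, True), (0, 6, True)]

print(conflicting_keys(build_table(slide7, use_ta=False)))  # TA absent: one conflict
print(conflicting_keys(build_table(slide7, use_ta=True)))   # TA present: none
print(len(build_table(slide9, use_ta=False)), len(build_table(slide9, use_ta=True)))
```

Adding TA removes the conflict of slide 7, at the cost of slide 9: a branch that is not control-dependent needs one trained entry per incoming path instead of one entry overall.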
10. “SetTA” Instruction
A “SetTA” instruction is added that saves the address of the next instruction to TA. The ARS of branch 15 is still r2 and TA, but TA is now always 6. Disadvantages: wasted instructions (INST6); programs must be recompiled; and the start of common code must be found at compile time to insert “SetTA”, which is hard because assembly language is not a structured programming language (it has “goto”).
11. Encoding
The amount of information varies with the number of registers in the ARS. Assuming each value is 10 bits long: 1 register in the ARS => 10 bits; 2 registers => 20 bits; 3 registers => 30 bits. A fixed-length pattern must be generated from variable-length information, which calls for a hash. Various encodings are possible.
12. Encoding of ARVI
ARVI XORs together the values of the physical registers in the ARS: a simple XOR hash implemented with an XOR tree.
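A minimal sketch of an ARVI-style simple XOR hash, assuming the 10-bit values of slide 11 and modeling the AND2 gating and the XOR tree directly. It omits the logical-number tag that slide 19 lists for this design, and `simple_xor_hash` is a hypothetical name. Because XOR is commutative and associative, different register assignments can fold to the same pattern.

```python
from functools import reduce
from operator import xor
from typing import Sequence

VALUE_BITS = 10  # value width assumed in slide 11


def simple_xor_hash(phys_values: Sequence[int], in_ars: Sequence[bool]) -> int:
    """XOR the values of the physical registers whose ARS bit is set."""
    mask = (1 << VALUE_BITS) - 1
    # AND-gate each physical register value with its ARS-membership bit.
    gated = [v & mask if sel else 0 for v, sel in zip(phys_values, in_ars)]
    # Fold with XOR; reduce is the sequential equivalent of an XOR tree.
    return reduce(xor, gated, 0)


ars = [True, True, False, False]
print(simple_xor_hash([1, 2, 7, 0], ars))  # two registers holding 1 and 2
print(simple_xor_hash([2, 1, 7, 0], ars))  # values swapped: same pattern
```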
13. Reducing Hash Conflicts
Programs use the lower bits of registers more than the higher bits, so most information is concentrated in the lower bits, and hash conflicts arise there. To spread the information, each value is circularly shifted by a different amount determined by its logical register number. The logical number is used because the physical number changes at runtime.
16. Proposed Encoding
The values of the logical registers are XORed together, each circularly shifted by a different amount determined by its logical register number. This serializes the physical-logical mapping. Disadvantage: the value information is shorter than before.
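The proposed encoding can be sketched the same way. This assumes that logical register x is rotated left by x mod 10 bits; the slides only say that each logical register gets a different circular shift, so the exact shift amounts, like the names `rotl` and `proposed_hash`, are hypothetical. `rename_map` plays the role of "Select Logical Register X", picking the physical register currently mapped to each logical register.

```python
from typing import Dict, Iterable, Sequence

VALUE_BITS = 10  # value width assumed in slide 11


def rotl(value: int, amount: int, width: int = VALUE_BITS) -> int:
    """Circular left shift of a width-bit value."""
    mask = (1 << width) - 1
    amount %= width
    value &= mask
    return ((value << amount) | (value >> (width - amount))) & mask


def proposed_hash(rename_map: Dict[int, int], phys_values: Sequence[int],
                  ars_logical: Iterable[int]) -> int:
    """XOR the ARS values, each rotated by its logical register number (assumed)."""
    pattern = 0
    for x in ars_logical:
        value = phys_values[rename_map[x]]  # select the value mapped to logical x
        pattern ^= rotl(value, x)
    return pattern


identity = {1: 1, 2: 2}
print(proposed_hash(identity, [0, 1, 2], [1, 2]))  # r1 = 1, r2 = 2
print(proposed_hash(identity, [0, 2, 1], [1, 2]))  # swapped: different pattern
```

Compared with the plain XOR, swapped values no longer collide, two equal small values no longer cancel, and the pattern does not depend on which physical registers the renamer happened to choose.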
17. Select Logical Register X
Selects the value of the physical register that is mapped to logical register X.
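18. Delay
nPR = number of physical registers; nLR = number of logical registers; L = log2(nLR).
Simple XOR hash: log2(nPR) * XOR2 + AND2
Proposed hash: log2(nLR) * XOR2 + Select + AND2, where Select = XOR2 + AND_L + Gate + OR(nPR)
Since nPR > 2 * nLR, log2(nPR) > log2(nLR) + 1: the XOR tree of the proposed hash is more than one XOR2 level shallower, which at least offsets the XOR2 inside Select. Overall the proposed hash is approximately as fast as, or slightly slower than, the simple XOR hash.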
19. Hardware Resources
(N: width of a register value in bits.)
Simple XOR hash: nPR * N * AND2 + (nPR - 1) * N * XOR2, plus (nPR - 1) * 3bitADD for the logical-number tag.
Proposed hash: nPR * N * AND2 + nLR * Select + (nLR - 1) * N * XOR2, where Select = nPR * (L * (XOR2 + 2 * Gate) + AND_L) + N * OR(nPR).
No logical-number tag is needed, because the pattern already contains that information.
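The resource formulas above can be evaluated mechanically. This sketch only restates the slide's formulas, counting each gate type as one unit with no area weighting; the sizes in the usage (nPR = 128, nLR = 32, N = 10) are hypothetical, and the function names are illustrative.

```python
from collections import Counter


def log2_exact(n: int) -> int:
    """log2 of a power of two (register counts are assumed to be powers of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("expected a power of two")
    return n.bit_length() - 1


def simple_xor_resources(n_pr: int, n: int) -> Counter:
    return Counter({
        "AND2": n_pr * n,
        "XOR2": (n_pr - 1) * n,
        "3bitADD": n_pr - 1,  # logical-number tag
    })


def proposed_resources(n_pr: int, n_lr: int, n: int) -> Counter:
    l = log2_exact(n_lr)
    # One Select unit: nPR * (L * (XOR2 + 2 * Gate) + AND_L) + N * OR(nPR)
    select = Counter({
        "XOR2": n_pr * l,
        "Gate": n_pr * 2 * l,
        "AND_L": n_pr,
        "OR_nPR": n,
    })
    total = Counter({"AND2": n_pr * n, "XOR2": (n_lr - 1) * n})
    for gate, count in select.items():
        total[gate] += n_lr * count  # nLR Select units
    return total


print(simple_xor_resources(128, 10))
print(proposed_resources(128, 32, 10))
```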
20. Suitable Predictor for Register-Value Patterns
Characteristics of register-value patterns and their consequences:
Reliable prediction needs long patterns. A PHT is not suitable, because it must store tags to compare states; a perceptron is not suitable either [17][18].
The patterns are not linearly separable [17][18]: each bit of a value is related to the others by AND, so a perceptron is not suitable.
A branch can have many different patterns. If r1 runs from 0 to 999 in a loop, there are 999 not-taken patterns and 1 taken pattern.
Pattern generation has a long delay. A perceptron is not suitable [17][18], and the predictor must be hybridized with a fast predictor [19][20].
21. Proposed Predictor
Modified YAGS [21]: (1) a bimodal table that stores the bias of each branch; (2) caches that store only the patterns whose outcome differs from the bias. The taken cache stores not-taken patterns of taken-biased branches, and the not-taken cache stores taken patterns of not-taken-biased branches.
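A behavioral sketch of the modified YAGS, under simplifying assumptions: the caches are unbounded dictionaries keyed by (PC, pattern) instead of finite tagged tables, the bimodal table uses 2-bit counters initialised to weakly not-taken, and the pattern is simply r1's value. The class and method names are hypothetical. The usage replays the loop from slide 20, where r1 runs from 0 to 999 and the branch is taken only at 999.

```python
from typing import Dict, Hashable, Tuple

Key = Tuple[int, Hashable]  # (branch PC, register-value pattern)


class ModifiedYAGS:
    def __init__(self) -> None:
        self.bimodal: Dict[int, int] = {}          # PC -> 2-bit bias counter
        self.taken_cache: Dict[Key, bool] = {}     # exceptions of taken-biased branches
        self.not_taken_cache: Dict[Key, bool] = {} # exceptions of not-taken-biased branches

    def bias(self, pc: int) -> bool:
        # Counter values 2 and 3 mean "taken"; new branches start weakly not-taken.
        return self.bimodal.get(pc, 1) >= 2

    def _cache_for(self, bias: bool) -> Dict[Key, bool]:
        return self.taken_cache if bias else self.not_taken_cache

    def predict(self, pc: int, pattern: Hashable) -> Tuple[bool, bool]:
        """Return (direction, hit). On a miss the direction is just the bias."""
        bias = self.bias(pc)
        cache = self._cache_for(bias)
        key = (pc, pattern)
        if key in cache:
            return cache[key], True
        return bias, False

    def update(self, pc: int, pattern: Hashable, taken: bool) -> None:
        bias = self.bias(pc)
        cache = self._cache_for(bias)
        key = (pc, pattern)
        if taken != bias:
            cache[key] = taken       # store only patterns that disagree with the bias
        else:
            cache.pop(key, None)     # pattern agrees with the bias: no entry needed
        counter = self.bimodal.get(pc, 1)
        self.bimodal[pc] = min(3, counter + 1) if taken else max(0, counter - 1)


predictor = ModifiedYAGS()
for _ in range(2):                   # two passes over the loop
    for r1 in range(1000):
        predictor.update(0x40, r1, r1 == 999)
print(len(predictor.not_taken_cache), len(predictor.taken_cache))
```

After training, only the single taken pattern is stored, in the not-taken cache; the 999 not-taken patterns are covered by the bias.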
22. Block Diagram
A fast predictor predicts the direction in an early cycle. When the modified YAGS hits and its depth tag matches the current state, the fetch direction is updated in a late cycle. When the modified YAGS misses, its predicted direction is only the bias, and we cannot tell whether the pattern was never trained or was trained but not saved; a selector therefore chooses between the biased direction and the fast predictor's direction.
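The selection logic of the block diagram might look like this sketch. It assumes that a hit whose depth tag does not match is treated like a miss and that the selector is a per-branch 2-bit chooser in the style of hybrid predictors [19][20]; neither detail is specified on the slide, and `Selector` and `final_direction` are hypothetical names.

```python
from typing import Dict


class Selector:
    """Per-branch 2-bit chooser: values 2 and 3 prefer the fast predictor."""

    def __init__(self) -> None:
        self.counters: Dict[int, int] = {}

    def prefers_fast(self, pc: int) -> bool:
        return self.counters.get(pc, 2) >= 2

    def update(self, pc: int, fast_pred: bool, bias_pred: bool, taken: bool) -> None:
        if fast_pred == bias_pred:
            return  # nothing to learn when both sides agree
        counter = self.counters.get(pc, 2)
        if fast_pred == taken:
            self.counters[pc] = min(3, counter + 1)
        else:
            self.counters[pc] = max(0, counter - 1)


def final_direction(pc: int, fast_pred: bool, yags_pred: bool, yags_hit: bool,
                    depth_match: bool, selector: Selector) -> bool:
    if yags_hit and depth_match:
        return yags_pred  # late-cycle override of the early fast prediction
    # Miss (or stale depth tag): yags_pred is only the bias, so ask the selector.
    return fast_pred if selector.prefers_fast(pc) else yags_pred


sel = Selector()
print(final_direction(1, True, False, True, True, sel))    # hit: YAGS wins
print(final_direction(1, True, False, False, False, sel))  # miss: fast predictor
```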
23. Outline
Why do we need branch prediction?; Related works; Improved register-value-pattern generation; Experiment and evaluation; Contribution; References
24. Experimental Environment
Simulator: SimpleScalar 3.0 with the PISA instruction set architecture (little-endian), using sim-outorder, an execution-driven, cycle-level performance simulator. Benchmarks: 10 programs from SPEC CPU2000. Instruction coverage: 150M to 250M instructions.
29. Register-Value-Pattern Predictor
A register-value-pattern predictor predicts the way a human would: if we know that “this branch was taken before when a = 3 and b = 4”, we can predict the branch without any calculation when a = 3 and b = 4 occur again. The design is common sense, so why can it not reach 100% accuracy?
30. Factors of Performance Loss
1. Limitations of dependence tracking
1.1 Load branches
1.2 Control dependency
2. Hash conflicts in encoding
3. Prediction delay
4. Many different patterns for the same direction
4.1 Pattern capacity of the predictor
4.2 Lack of training
31. Contribution
We address some of the factors of performance loss: 1.2 control dependency; 2 hash conflicts in encoding; 4.1 pattern capacity of the predictor. The remaining factors are still open.
32. Applications of Register-Value Patterns
Register-value patterns are limited on different kinds of branches than branch-history patterns, so a hybrid predictor that combines the two achieves higher performance. Register-value patterns can also be combined with prediction based on the branch's own register values: a dependence-chain depth of 0 means the branch register is already updated, so branch-register-value-based prediction can be used in that case. Finally, register-value patterns can serve as information for value prediction.
34. References
[1] T. Yeh and Y. Patt, “Two-Level Adaptive Branch Prediction,” in Proc. 24th ACM/IEEE Int. Symp. on Microarchitecture, 1991.
[2] T. Yeh and Y. Patt, “A Comparison of Dynamic Branch Predictors that Use Two Levels of Branch History,” in Proc. 20th Annu. Int. Symp. on Computer Architecture, 1993.
[3] S. Pan, K. So and J. Rahmeh, “Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation,” in Proc. 5th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, 1992.
[4] R. Nair, “Dynamic Path-Based Branch Correlation,” in Proc. 28th Annu. Int. Symp. on Microarchitecture, 1995.
[5] D. Jiménez, “Fast Path-Based Neural Branch Prediction,” in Proc. 36th Annu. IEEE/ACM Int. Symp. on Microarchitecture, 2003.
[6] F. Gabbay and A. Mendelson, “Speculative Execution Based on Value Prediction,” Technical Report, Technion, 1997.
[7] J. Gonzalez and A. Gonzalez, “Control-Flow Speculation through Value Prediction for Superscalar Processors,” in Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, 1999.
[8] T. Heil, Z. Smith and J. E. Smith, “Improving Branch Predictors by Correlating on Data Values,” in Proc. 32nd Int. Symp. on Microarchitecture, 1999.
[9] K. Wang, “Highly Accurate Data Value Prediction Using Hybrid Predictors,” in Proc. 30th Int. Symp. on Microarchitecture, 1997.
[10] M. Lipasti and J. Shen, “Exceeding the Dataflow Limit via Value Prediction,” in Proc. 29th Int. Symp. on Microarchitecture, 1996.
[11] W. Mohan and M. Franklin, “Improving Data Value Prediction Accuracy Using Path Correlation,” in Proc. 6th Int. Conf. on High Performance Computing, 1999.
[12] Y. Sazeides and J. Smith, “Implementations of Context Based Value Predictors,” Technical Report ECE-TR-97-8, University of Wisconsin-Madison, 1997.
[13] L. N. Vintan, M. Sbera, I. Z. Mihu and A. Florea, “An Alternative to Branch Prediction: Pre-Computed Branches,” ACM SIGARCH Computer Architecture News, vol. 31, 2003.
[14] L. He and Z. Liu, “A New Value Based Branch Predictor for SMT Processors,” in Proc. 16th IASTED Int. Conf. on Parallel and Distributed Computing and Systems, 2004.
[15] Y. Pan, X. Fan, L. He and D. Wang, “A Bypass Mechanism to Enhance Branch Predictor for SMT,” in Proc. 12th Asia-Pacific Conf. on Computer Systems Architecture (ACSAC 2007), vol. 4697, 2007.
[16] L. Chen, S. Dropsho and D. H. Albonesi, “Dynamic Data Dependence Tracking and its Application to Branch Prediction,” in Proc. 9th Int. Symp. on High-Performance Computer Architecture, 2003.
[17] D. A. Jiménez and C. Lin, “Dynamic Branch Prediction with Perceptrons,” in Proc. 7th Int. Symp. on High-Performance Computer Architecture, 2001.
[18] D. A. Jiménez and C. Lin, “Neural Methods for Dynamic Branch Prediction,” ACM Transactions on Computer Systems, 2002.
[19] P. Chang, E. Hao and Y. Patt, “Alternative Implementations of Hybrid Branch Predictors,” in Proc. 28th Annu. Int. Symp. on Microarchitecture, 1995.
[20] M. Evers, P. Chang and Y. Patt, “Using Hybrid Branch Predictors to Improve Branch Prediction Accuracy in the Presence of Context Switches,” in Proc. 23rd Annu. Int. Symp. on Computer Architecture, 1996.
[21] A. Eden and T. Mudge, “The YAGS Branch Prediction Scheme,” in Proc. 31st Annu. ACM/IEEE Int. Symp. on Microarchitecture, 1998.
[22] P. N. Glaskowsky, “Pentium 4 (Partially) Previewed,” Microprocessor Report, 2000.