The 2012 International Symposium on Parallel
 Architectures, Algorithms and Programming.
              December 18, 2012

 Global Load Instruction Aggregation
       Based on Code Motion
Outline

 Background
   Previous works
   Motivations
 Partial Redundancy Elimination (PRE)
   Lazy Code Motion (LCM)
 Global Load Instruction Aggregation (GLIA)
 Experiment results
 Conclusion
Background

[Figure: the processor is fast, but main memory is slow;
 the speed gap between them limits performance.]

Background

[Figure: a cache memory sits between the processor and main
 memory; using the cache effectively is therefore important.]
Previous works

  1. Prefetch instructions
  2. Transform loop structures.
       before                     after
for(j=0;j<10;j++)        for(i=0;i<10;i++)
  for(i=0;i<10;i++)        for(j=0;j<10;j++)
       ... = a[i][j]            ... = a[i][j]
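The interchange above can be checked with a small C sketch (the function names are illustrative, not from the paper); both orders compute the same result, but the "after" order walks the row-major array with stride 1:

```c
/* Before interchange: the inner loop varies i, so consecutive
   accesses to a[i][j] stride across rows of the row-major
   array (poor locality). */
int sum_before(int a[10][10]) {
    int s = 0;
    for (int j = 0; j < 10; j++)
        for (int i = 0; i < 10; i++)
            s += a[i][j];
    return s;
}

/* After interchange: the inner loop varies j, so consecutive
   accesses are contiguous in memory (good locality). */
int sum_after(int a[10][10]) {
    int s = 0;
    for (int i = 0; i < 10; i++)
        for (int j = 0; j < 10; j++)
            s += a[i][j];
    return s;
}
```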
Previous works

for(j=0;j<10;j++)
  for(i=0;i<10;i++)
       ... = a[i][j]

[Figure: animation of the iteration order of the loop nest,
 stepping element by element through the accesses to a[i][j].]
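The other technique listed under previous works, prefetch instructions, can be sketched with GCC/Clang's `__builtin_prefetch` builtin (an illustrative sketch, not the paper's implementation; the distance `DIST` is an arbitrary tuning parameter):

```c
#include <stddef.h>

/* Software prefetching hides memory latency by requesting
   a[i + DIST] while a[i] is still being processed. */
#define DIST 16

long sum_with_prefetch(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + DIST < n)
            /* rw = 0: prefetch for read; locality = 3: keep in cache */
            __builtin_prefetch(&a[i + DIST], 0, 3);
        s += a[i];
    }
    return s;
}
```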
Problems

1. They are local techniques.
 ex. they target only the initial load instruction, or loops only.


2. They require changing the loop structure.
How can we apply cache optimization to any program
globally?

  main(){
      ... = a[i]
      ... = b[i]
      ... = a[i+1]
  }

[Figure: loading a[i] fetches the cache line holding a[i] and
 a[i+1] from main memory; loading b[i] then fetches b's line,
 evicting a's line; the later load of a[i+1] therefore misses
 in the cache.]

   We can remove this cache miss by
   changing the order of accesses.
Code motion

               x = a[i]

               y = x+1     [a's cache line may be expelled
                            from cache memory in between]
               z = b[i]
               w = a[i+j]
Code motion


              x = a[i]
              w = a[i+j]


              y = x+1


              z = b[i]
Code motion

               x = a[i]
               w = a[i+j]
                           [live range of w
               y = x+1      spans these statements]

               z = b[i]
Code motion

               x = a[i]
               w = a[i+j]
                           [x and w now occupy
               y = x+1      registers at the same time]

               z = b[i]
Code motion


              x = a[i]
              w = a[i+j]


              y = x+1      Spill



              z = b[i]
Code motion

               x = a[i]
               t = Load(j)
               w = a[i+t]   [this changes the
                             memory access order]

               y = x+1

               z = b[i]
Code motion


              x = a[i]
              w = a[i+j]


              y = x+1


              z = b[i]
Code motion


              x = a[i]
                           Delayed


              y = x+1


              w = a[i+j]
              z = b[i]
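The delayed placement above can be sketched in C (function and variable names are illustrative, not from the paper); both versions compute the same values, and only the position of the b-load differs:

```c
/* Before: the b-load sits between the two a-loads, so b's cache
   line can evict a's line before a[i+j] is read. */
void kernel_before(const int *a, const int *b, int i, int j,
                   int *x, int *y, int *z, int *w) {
    *x = a[i];
    *y = *x + 1;
    *z = b[i];
    *w = a[i + j];
}

/* After: the b-load is delayed past the second a-load, grouping
   the two a-accesses without lengthening w's live range. */
void kernel_after(const int *a, const int *b, int i, int j,
                  int *x, int *y, int *z, int *w) {
    *x = a[i];
    *y = *x + 1;
    *w = a[i + j];
    *z = b[i];
}
```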
Implementation
We use Partial Redundancy Elimination (PRE)
 One of the classical code optimizations
 Eliminates redundant expressions
PRE

[Figure: x = a[i] on one branch makes the later y = a[i] at the
 join partially redundant; PRE inserts t = a[i] on both branches
 and rewrites the loads as x = t and y = t.]
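As a concrete sketch (hypothetical C functions, not the paper's code): on the path where the condition holds, a[i] would be loaded twice; PRE inserts the load on both paths so every later use reads the temporary:

```c
/* Before PRE: a[i] is loaded on one branch and again after the
   join, so the second load is partially redundant. */
int pre_before(const int *a, int i, int cond) {
    int x = 0;
    if (cond)
        x = a[i];
    int y = a[i];   /* redundant when cond was taken */
    return x + y;
}

/* After PRE: t = a[i] is inserted on both branches, making the
   load after the join a cheap register reuse. */
int pre_after(const int *a, int i, int cond) {
    int t, x = 0;
    if (cond) {
        t = a[i];
        x = t;
    } else {
        t = a[i];   /* insertion point chosen by PRE */
    }
    int y = t;
    return x + y;
}
```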
LCM
 LCM determines two insertion nodes
  -- Earliest and Latest

• Earliest(n) denotes that node n is the
  closest to the start node among the
  nodes where an insertion is possible

• Latest(n) denotes that node n is the
  closest to the nodes that contain the
  same load instruction.

[Figure: a CFG with x = a[i] on one path and y = a[i] below it.]

 Knoop, J., Rüthing, O. and Steffen, B.: Lazy Code Motion, Proc.
 ACM SIGPLAN Conf. on Programming Language Design and
 Implementation (PLDI), ACM, pp. 224-234, 1992.
LCM




      x = a[i]
      y = a[i]
LCM

      t = a[i]




                 x = a[i]
                 y = a[i]
LCM




      t = a[i]


      x = a[i]
      y = a[i]
LCM


                 Delayed
      t = a[i]


      x = a[i]
      y = a[i]
LCM




                 Delayed

      t = a[i]
      x = a[i]
      y = a[i]
LCM




      t = a[i]
      x = t
      y = t
Global Load Instruction
Aggregation (GLIA)
 Purpose
 1. Decrease cache misses.
 2. Suppress register spills.


 Extensions over LCM
 1. Moves load instructions that are not redundant.
 2. Delays loads considering the order of
    memory accesses.
GLIA




       x = a[i]

       y = b[i]

       w = a[i+1]
GLIA



       t = a[i+1]
       x = a[i]

       y = b[i]

       w = a[i+1]
GLIA



       x = a[i]

       t = a[i+1]
       y = b[i]

       w = a[i+1]
GLIA



       x = a[i]

       t = a[i+1]
       y = b[i]

       w = t
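The transformation above, written as straight-line C (illustrative names): GLIA hoists the non-redundant load of a[i+1] into t next to x = a[i], before the b-load can touch the cache, and the original load becomes a register reuse:

```c
/* Before GLIA: the b-load separates the two a-accesses, so b's
   cache line can evict a's line in between. */
void glia_before(const int *a, const int *b, int i,
                 int *x, int *y, int *w) {
    *x = a[i];
    *y = b[i];
    *w = a[i + 1];
}

/* After GLIA: t = a[i+1] is aggregated with the a[i] load, and
   the later use reads the register value w = t. */
void glia_after(const int *a, const int *b, int i,
                int *x, int *y, int *w) {
    *x = a[i];
    int t = a[i + 1];
    *y = b[i];
    *w = t;
}
```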
Application to the entire program

             = a[i]

                      = b[i]
      = a[i+1]
                      = a[i+1]
Application to the entire program

             = a[i]
                      = a[i+1]
                      = b[i]
      = a[i+1]
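The CFG above can be loosely approximated in C (hypothetical code, a simplification of the slides' control-flow graph): on the path where b[i] is loaded, the a[i+1] load is moved above it so that all a-accesses are grouped:

```c
/* Before: on one path, b[i] is loaded between the a[i] load
   and the a[i+1] load. */
int whole_before(const int *a, const int *b, int i, int cond) {
    int r = a[i];
    if (cond) {
        r += a[i + 1];
    } else {
        r += b[i];        /* b's line fetched here...      */
        r += a[i + 1];    /* ...so this a-access can miss  */
    }
    return r;
}

/* After: the a[i+1] load on that path is moved above the
   b-load, next to the other a-accesses. */
int whole_after(const int *a, const int *b, int i, int cond) {
    int r = a[i];
    if (cond) {
        r += a[i + 1];
    } else {
        int t = a[i + 1]; /* aggregated with the a-loads */
        r += b[i];
        r += t;
    }
    return r;
}
```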
Experiment
 Implementation
  We implemented our technique in the COINS
   compiler as an LIR converter.

 Benchmark
  SPEC CPU2000


 Measurement
 1. Execution efficiency
 2. The number of cache misses
Experiment (1/2) | Execution efficiency
 Environment
  SPARC64-V 2GHz, Solaris 10



 Optimization
   BASE: applies Dead Code Elimination (DCE)
   GLIADCE: applies GLIA and DCE.
Experiment (1/2) | Execution efficiency
[Figure: execution-efficiency results chart.]
The improvement of art is about 10.5%.
Reason for the decrease (1): speculative code
motion


              = a[i]
              = b[i]




                       = a[j]
Reason for the decrease (1): speculative code
motion

              = a[i]
              = a[j]
              = b[i]
Reason for the decrease (2): register spills

 [Figure: the number of spills per benchmark.]
Experiment (2/2) | Cache misses

 System parameters of the x86 machine
  Intel Core i5-2320, 3.00GHz
  Floating-point registers: 8
  Integer registers: 8
  L1D cache memory: 32KB
  L2 cache memory: 256KB
  L3 cache memory: 6144KB
Experiment (2/2) | Level 2 cache misses
 The improvement of twolf is about 10.6%.
Experiment (2/2) | Level 3 cache misses
 The improvement of art is about 93.7%.
Conclusion

We proposed a new cache optimization.
 1. GLIA can be applied to any program.
 2. GLIA improves cache efficiency.
 3. GLIA considers register spills.




 Thank you for your attention.
