SlideShare a Scribd company logo
1 of 43
Download to read offline
RapidMRC:
Approximating L2 Miss Rate Curves
     on Commodity Systems
     for Online Optimizations

David Tam, Reza Azimi, Livio Soares, Michael Stumm
               University of Toronto

            {tamda, azimi, livio, stumm}@eecg.toronto.edu


                        March 10, 2009
                            ASPLOS
Motivation


                                    App 1                      App 2
                                                               App 2
                                                                   Core
                                       Core




                                            Shared Cache




               In shared cache (L2/L3):
           ●


                   Applications compete for space
               ●


                   LRU replacement policy used
               ●


                   Interference: applications can evict each other's content
               ●


                                                                                 1
RapidMRC                           Image © Ian Junor, Creative Commons license
Cache Partitioning


                            App 1        App 2
                                         App 2
                                          Core
                              Core




                                Shared Cache




               Eliminates cache space interference
           ●



               More flexible than private caches
           ●



               Up to 50% improvement in IPC
           ●




                                                     2
RapidMRC
OS-Based Partitioning
           Apply page-coloring technique to implement set-based partitioning
      ●



           Guide physical page allocation to control cache line usage
      ●



           Works on existing processors    [WIOSCA'07]
      ●




                           Virtual Pages          Physical Pages
                                                   Color A


                                                                         L2 Cache
            Application
                                                                                } Color A
                                                                     {
                                                                                    (N sets)
                                                   Color A




                                                   Color A


                                     OS Managed              Static Mapping
                                                               (Hardware)




                                                                                               3
RapidMRC
OS-Based Partitioning
           Apply page-coloring technique to implement set-based partitioning
      ●



           Guide physical page allocation to control cache line usage
      ●



           Works on existing processors      [WIOSCA'07]
      ●




                           Virtual Pages         Physical Pages
                                                    Color A
                                                    Color B
                                                                            L2 Cache
           Application A
                                                                                   } Color A
                                                                        {
                                                                                       (N sets)
                                                    Color A
                                                    Color B

                           Virtual Pages
                                                                                   } Color B
                                                                        {
                                                                                       (N sets)

                                                    Color A
           Application B                            Color B




                                       OS Managed             Static Mapping
                                                                (Hardware)
                                                                                                  3
RapidMRC
Determining L2 Size
           Use Miss Rate Curve (MRC)
     ●


                                                                Application X
                                            100
                                             90
                                             80

                            Miss Rate (%)    70
                                             60
                                             50
                                             40
                                             30
                                             20
                                             10
                                              0
                                                  0   10   20   30   40   50   60   70   80   90 100
                                                       Allocated Cache Size (%)
           Shows trade-off spectrum
     ●



               Impact of allocating more/less cache space
           ●



           Also used for online sizing of main memory
     ●



               e.g. [Patterson95], [Zhou04], [Soundararajan08], etc...
           ●

                                                                                                       4
RapidMRC
Our Approach: RapidMRC
               Online approximation of L2 MRC
           ●



               Software-based + hardware performance counters
           ●


                   Requires no change to application binary or source
               ●



                   Runs on commodity hardware
               ●



               Rapid: 230 ms latency
           ●




                                                                        5
RapidMRC
RapidMRC Steps
           (1) Track memory accesses

           (2) Feed accesses into Mattson's stack algorithm (1970)
                     Emulates LRU aging of cache lines
                 ●


                     Maintains histogram of stack distances
                 ●


                     Generates MRC using histogram
                 ●




                                                                     6
RapidMRC
Tracking Accesses Online
               Hardware technique
           ●


                   e.g. [Berg04], [Suh04], [Qureshi06], etc...
               ●



                   Target future processors
               ●




               Software technique
           ●


                   Track accesses by instrumenting program code
               ●



                   High tracking cost: large volume of data
               ●




               Hybrid technique
           ●


                   Track accesses with hardware performance counters
               ●



                   Lower tracking cost: smaller volume of data
               ●




                                                                       7
RapidMRC
Tracking Accesses in IBM POWER5
       Hardware Performance Counter Configuration
               Upon every L2 access:
           ●



               (1) Update sampling register with data address
               (2) Trigger interrupt to copy register to trace log in main memory

                                          Register File


                                          L1 Cache
                                                    L2 Accesses

                                          L2 Cache

                                         Main Memory


               May miss some L2 cache accesses
           ●



                   Caused by multiple in-flight L2 accesses
               ●



                   Results show negligible impact
               ●
                                                                                    8
RapidMRC
Mattson's Stack Algorithm
           For each L2 access in trace log:
      ●



                 Find element, record stack distance, move element to top
             ●



                 Update histogram with stack distance
             ●




     stack
                                                                             stack
      top
                                                                            bottom
                                                              ...
                           Stack distance


           Stack size
      ●



                 One element per L2 cache line
             ●




           Optimizations
      ●



                 Hashing: eliminates linear traversal
             ●



                 Coarse-grained stack distance: reduces update operations
             ●

                                                                                     9
RapidMRC
Experimental Setup
               1.5 GHz dual-core POWER5
           ●



                   1.875 MB shared L2 cache
               ●


                        128-byte line size
                    ●


                        10-way set-associative
                    ●


                        16 partitions (colors) possible
                    ●




               Linux 2.6.x
           ●



                   Added RapidMRC mechanism
               ●



                   Added cache partitioning mechanism
               ●




               Trace log length
           ●



                   10 x number of cache lines
               ●




               30 applications
           ●



                   SPECjbb2000, SPECcpu2000, SPECcpu2006
               ●
                                                           10
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                         mcf




                                                        11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
Obtaining Real L2 MRCs
           Offline method: run application 16 times
     ●


                   Once for each cache partition size
               ●


                   Measure L2 cache miss rate
               ●




                           mcf
                   Slice


                                                                      Real L2 MRC




                                                L2 Miss Rate (MPKI)




                                                                       L2 Cache Size

                                                                                       11
RapidMRC
RapidMRC vs Real MRC
                          Accuracy: execution slice at 10 billion instrs
                      ●


                                                                           mgrid
                                    jbb                gzip
   Miss Rate (MPKI)




                             Cache Size (# colors)
                                                                           ammp
                                    mcf 2k           xalancbmk




                                                                                   12
RapidMRC
Latency of RapidMRC
       Invoke                                     Obtained
    RapidMRC                                      RapidMRC
                                  App is paused
                4X app slowdown
                   147 ms            83 ms
   Time             Trace         Mattson's Alg

                          230 ms RapidMRC




                                                             13
RapidMRC
Latency of RapidMRC                    Phase Change
       Invoke                                                   Invoke
                                                  Obtained
    RapidMRC                                                    RapidMRC
                                                  RapidMRC
                                  App is paused
                4X app slowdown
                   147 ms            83 ms
   Time             Trace         Mattson's Alg

                          230 ms RapidMRC




                                                                           13
RapidMRC
Latency of RapidMRC                                     Phase Change
       Invoke                                                                    Invoke
                                                  Obtained
    RapidMRC                                                                     RapidMRC
                                                  RapidMRC
                                  App is paused
                4X app slowdown
                   147 ms            83 ms
   Time             Trace         Mattson's Alg

                          230 ms RapidMRC                    Amortize costs




                                                                                            13
RapidMRC
Latency of RapidMRC                                        Phase Change
       Invoke                                                                       Invoke
                                                   Obtained
    RapidMRC                                                                        RapidMRC
                                                   RapidMRC
                                   App is paused
                4X app slowdown
                   147 ms            83 ms
   Time             Trace         Mattson's Alg

                          230 ms RapidMRC                       Amortize costs

                                  Phase length: 5 mins median




                                                                                               13
RapidMRC
Latency of RapidMRC                                                                            Phase Change
         Invoke                                                                                                                 Invoke
                                                     Obtained
      RapidMRC                                                                                                                  RapidMRC
                                                     RapidMRC
                                     App is paused
                  4X app slowdown
                     147 ms            83 ms
   Time               Trace         Mattson's Alg

                            230 ms RapidMRC                                                 Amortize costs

                                    Phase length: 5 mins median
                                                                                30
                                                                                                           astar                   size = 2
                                                                                                                                   size = 4
                                                                                                                                   size = 6
                                                                                                                                   size = 8
                                                                                25                                                 size = 10
                                                                                                                                   size = 12




                                                          L2 Miss Rate (MPKI)
                                                                                                                                   size = 14
                                                                                                                                   size = 16

 Phase change detection                                                         20

       Abrupt change in IPC, miss rate
  ●
                                                                                15
       Detectable online with low cost using
  ●

       hardware performance counters                                            10


                                                                                5

                                                                                 0
                                                                                  0   200     400       600      800     1000     1200         1400
                                                                                            Instructions Completed (Billions)
                                                                                                                                                 13
RapidMRC
Latency of RapidMRC                                                                            Phase Change
         Invoke                                                                                                                 Invoke
                                                     Obtained
      RapidMRC                                                                                                                  RapidMRC
                                                     RapidMRC
                                     App is paused
                  4X app slowdown
                     147 ms            83 ms
   Time               Trace         Mattson's Alg

                            230 ms RapidMRC                                                 Amortize costs

                                    Phase length: 5 mins median
                                                                                30
                                                                                                           astar                   size = 2
                                                                                                                                   size = 4
                                                                                                                                   size = 6
                                                                                                                                   size = 8
                                                                                25                                                 size = 10
                                                                                                                                   size = 12




                                                          L2 Miss Rate (MPKI)
                                                                                                                                   size = 14
                                                                                                                                   size = 16

 Phase change detection                                                         20

       Abrupt change in IPC, miss rate
  ●
                                                                                15
       Detectable online with low cost using
  ●

       hardware performance counters                                            10


                                                                                5

                                                                                 0
                                                                                  0   200     400       600      800     1000     1200         1400
                                                                                            Instructions Completed (Billions)
                                                                                                                                                 13
RapidMRC
RapidMRC for Sizing Partitions
                              For equake + twolf
           ●


                                  Has one long stable phase
                              ●



                              Feed MRCs into utility function
           ●


                                  e.g. Minimize total miss rate
                              ●



                                          equake                                    twolf




                                                            Miss Rate (MPKI)
           Miss Rate (MPKI)




                                    Cache Size (# colors)                      Cache Size (# colors)




                                                                                                       14
RapidMRC
RapidMRC for Sizing Partitions
                              For equake + twolf
           ●


                                  Has one long stable phase
                              ●



                              Feed MRCs into utility function
           ●


                                  e.g. Minimize total miss rate
                              ●



                                          equake                                    twolf




                                                            Miss Rate (MPKI)
           Miss Rate (MPKI)




                                    Cache Size (# colors)                      Cache Size (# colors)




                                                                                                       14
RapidMRC
RapidMRC for Sizing Partitions
                              For equake + twolf
           ●


                                  Has one long stable phase
                              ●



                              Feed MRCs into utility function
           ●


                                  e.g. Minimize total miss rate
                              ●



                                          equake                                         twolf




                                                                 Miss Rate (MPKI)
           Miss Rate (MPKI)




                                    Cache Size (# colors)                           Cache Size (# colors)

                                                            RapidMRC

                                                                                                            14
RapidMRC
RapidMRC for Sizing Partitions
                              For equake + twolf
           ●


                                  Has one long stable phase
                              ●



                              Feed MRCs into utility function
           ●


                                  e.g. Minimize total miss rate
                              ●



                                          equake                                         twolf




                                                                 Miss Rate (MPKI)
           Miss Rate (MPKI)




                                    Cache Size (# colors)                           Cache Size (# colors)

                                                            RapidMRC
                                                            Real MRC
                                                                                                            14
RapidMRC
Performance Impact of Sizes


                                                           Performance of
                                                           uncontrolled
                                                  Real MRC sharing
                                         RapidMRC


                                                           16 twolf
                       0   24        6    8   10   12 14
                                                            0 equake
                       16 14 12     10    8    6    42
                           L2 Cache Sizes (# of colors)



               twolf: 27% IPC improvement
           ●



               equake: unaffected
           ●




                                                                            15
RapidMRC
Conclusion
               RapidMRC
           ●


                   A tracing mechanism can be built with
               ●

                       hardware performance counters
                   Accurately approximates L2 MRCs online in software
               ●


                   230 ms latency, invocable upon phase change
               ●




               Application of RapidMRC
           ●


                   Enables online sizing of L2 cache partitions
               ●


                        Up to 27% performance improvement
                    ●




                                                                        16
RapidMRC
Future Work

               Explore online optimizations made possible by:
           ●


                   RapidMRC
               ●


                        Reducing energy
                    ●


                        Guiding co-scheduling
                    ●



                   Tracing mechanism
               ●




               Extend model
           ●


                   Account for non-uniform miss penalties
               ●




                                                                17
RapidMRC

More Related Content

Similar to RapidMRC

Varkon Semiconductor
Varkon Semiconductor Varkon Semiconductor
Varkon Semiconductor
Rajiv Parmar
 
NIAR_VRC_2010
NIAR_VRC_2010NIAR_VRC_2010
NIAR_VRC_2010
fftoledo
 
RCDU Data Sheet (Interface Displays)
RCDU Data Sheet (Interface Displays)RCDU Data Sheet (Interface Displays)
RCDU Data Sheet (Interface Displays)
kkhutton
 
A Graphical Language for Real-Time Critical Robot Commands
A Graphical Language for Real-Time Critical Robot CommandsA Graphical Language for Real-Time Critical Robot Commands
A Graphical Language for Real-Time Critical Robot Commands
Serge Stinckwich
 
The SPOSAD Architectural Style for Multi-tenant Software Applications
The SPOSAD Architectural Style for Multi-tenant Software ApplicationsThe SPOSAD Architectural Style for Multi-tenant Software Applications
The SPOSAD Architectural Style for Multi-tenant Software Applications
Heiko Koziolek
 
V Labs Product Presentation
V Labs  Product PresentationV Labs  Product Presentation
V Labs Product Presentation
Wil Huijben
 
Game cloud reference architecture
Game cloud reference architectureGame cloud reference architecture
Game cloud reference architecture
Jonathan Spindel
 
Verification of Graphics ASICs (Part II)
Verification of Graphics ASICs (Part II)Verification of Graphics ASICs (Part II)
Verification of Graphics ASICs (Part II)
DVClub
 

Similar to RapidMRC (20)

Varkon Semiconductor
Varkon Semiconductor Varkon Semiconductor
Varkon Semiconductor
 
NIAR_VRC_2010
NIAR_VRC_2010NIAR_VRC_2010
NIAR_VRC_2010
 
Hardware Accelerated 2D Rendering for Android
Hardware Accelerated 2D Rendering for AndroidHardware Accelerated 2D Rendering for Android
Hardware Accelerated 2D Rendering for Android
 
Remote Graphical Rendering
Remote Graphical RenderingRemote Graphical Rendering
Remote Graphical Rendering
 
Anatomy Of An Agile .Net Project
Anatomy Of An Agile .Net ProjectAnatomy Of An Agile .Net Project
Anatomy Of An Agile .Net Project
 
Anatomy Of An Agile .Net Project
Anatomy Of An Agile .Net ProjectAnatomy Of An Agile .Net Project
Anatomy Of An Agile .Net Project
 
Windows offloaded data_transfer_steve_olsson
Windows offloaded data_transfer_steve_olssonWindows offloaded data_transfer_steve_olsson
Windows offloaded data_transfer_steve_olsson
 
RCDU Data Sheet (Interface Displays)
RCDU Data Sheet (Interface Displays)RCDU Data Sheet (Interface Displays)
RCDU Data Sheet (Interface Displays)
 
Embedded Systems Engineering section
Embedded Systems Engineering sectionEmbedded Systems Engineering section
Embedded Systems Engineering section
 
A Graphical Language for Real-Time Critical Robot Commands
A Graphical Language for Real-Time Critical Robot CommandsA Graphical Language for Real-Time Critical Robot Commands
A Graphical Language for Real-Time Critical Robot Commands
 
The SPOSAD Architectural Style for Multi-tenant Software Applications
The SPOSAD Architectural Style for Multi-tenant Software ApplicationsThe SPOSAD Architectural Style for Multi-tenant Software Applications
The SPOSAD Architectural Style for Multi-tenant Software Applications
 
Towards an Architectural Style for Multi-tenant Software Applications
Towards an Architectural Style for Multi-tenant Software ApplicationsTowards an Architectural Style for Multi-tenant Software Applications
Towards an Architectural Style for Multi-tenant Software Applications
 
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingSIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
 
V Labs Product Presentation
V Labs  Product PresentationV Labs  Product Presentation
V Labs Product Presentation
 
First Operational Technology (OT) High Performance Messaging Patterns for Ent...
First Operational Technology (OT) High Performance Messaging Patterns for Ent...First Operational Technology (OT) High Performance Messaging Patterns for Ent...
First Operational Technology (OT) High Performance Messaging Patterns for Ent...
 
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
 
Game cloud reference architecture
Game cloud reference architectureGame cloud reference architecture
Game cloud reference architecture
 
Yang greenstein part_2
Yang greenstein part_2Yang greenstein part_2
Yang greenstein part_2
 
Verification of Graphics ASICs (Part II)
Verification of Graphics ASICs (Part II)Verification of Graphics ASICs (Part II)
Verification of Graphics ASICs (Part II)
 
OpenStack Quantum Network Service
OpenStack Quantum Network ServiceOpenStack Quantum Network Service
OpenStack Quantum Network Service
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

RapidMRC

  • 1. RapidMRC: Approximating L2 Miss Rate Curves on Commodity Systems for Online Optimizations David Tam, Reza Azimi, Livio Soares, Michael Stumm University of Toronto {tamda, azimi, livio, stumm}@eecg.toronto.edu March 10, 2009 ASPLOS
  • 2. Motivation App 1 App 2 App 2 Core Core Shared Cache In shared cache (L2/L3): ● Applications compete for space ● LRU replacement policy used ● Interference: applications can evict each other's content ● 1 RapidMRC Image © Ian Junor, Creative Commons license
  • 3. Cache Partitioning App 1 App 2 App 2 Core Core Shared Cache Eliminates cache space interference ● More flexible than private caches ● Up to 50% improvement in IPC ● 2 RapidMRC
  • 4. OS-Based Partitioning Apply page-coloring technique to implement set-based partitioning ● Guide physical page allocation to control cache line usage ● Works on existing processors [WIOSCA'07] ● Virtual Pages Physical Pages Color A L2 Cache Application } Color A { (N sets) Color A Color A OS Managed Static Mapping (Hardware) 3 RapidMRC
  • 5. OS-Based Partitioning Apply page-coloring technique to implement set-based partitioning ● Guide physical page allocation to control cache line usage ● Works on existing processors [WIOSCA'07] ● Virtual Pages Physical Pages Color A Color B L2 Cache Application A } Color A { (N sets) Color A Color B Virtual Pages } Color B { (N sets) Color A Application B Color B OS Managed Static Mapping (Hardware) 3 RapidMRC
  • 6. Determining L2 Size Use Miss Rate Curve (MRC) ● Application X 100 90 80 Miss Rate (%) 70 60 50 40 30 20 10 0 0 10 20 30 40 50 60 70 80 90 100 Allocated Cache Size (%) Shows trade-off spectrum ● Impact of allocating more/less cache space ● Also used for online sizing of main memory ● e.g. [Patterson95], [Zhou04], [Soundararajan08], etc... ● 4 RapidMRC
  • 7. Our Approach: RapidMRC Online approximation of L2 MRC ● Software-based + hardware performance counters ● Requires no change to application binary or source ● Runs on commodity hardware ● Rapid: 230 ms latency ● 5 RapidMRC
  • 8. RapidMRC Steps (1) Track memory accesses (2) Feed accesses into Mattson's stack algorithm (1970) Emulates LRU aging of cache lines ● Maintains histogram of stack distances ● Generates MRC using histogram ● 6 RapidMRC
  • 9. Tracking Accesses Online Hardware technique ● e.g. [Berg04], [Suh04], [Qureshi06], etc... ● Target future processors ● Software technique ● Track accesses by instrumenting program code ● High tracking cost: large volume of data ● Hybrid technique ● Track accesses with hardware performance counters ● Lower tracking cost: smaller volume of data ● 7 RapidMRC
  • 10. Tracking Accesses in IBM POWER5 Hardware Performance Counter Configuration Upon every L2 access: ● (1) Update sampling register with data address (2) Trigger interrupt to copy register to trace log in main memory Register File L1 Cache L2 Accesses L2 Cache Main Memory May miss some L2 cache accesses ● Caused by multiple in-flight L2 accesses ● Results show negligible impact ● 8 RapidMRC
  • 11. Mattson's Stack Algorithm For each L2 access in trace log: ● Find element, record stack distance, move element to top ● Update histogram with stack distance ● stack stack top bottom ... Stack distance Stack size ● One element per L2 cache line ● Optimizations ● Hashing: eliminates linear traversal ● Coarse-grained stack distance: reduces update operations ● 9 RapidMRC
  • 12. Experimental Setup 1.5 GHz dual-core POWER5 ● 1.875 MB shared L2 cache ● 128-byte line size ● 10-way set-associative ● 16 partitions (colors) possible ● Linux 2.6.x ● Added RapidMRC mechanism ● Added cache partitioning mechanism ● Trace log length ● 10 x number of cache lines ● 30 applications ● SPECjbb2000, SPECcpu2000, SPECcpu2006 ● 10 RapidMRC
  • 13. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf 11 RapidMRC
  • 14. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 15. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 16. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 17. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 18. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 19. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 20. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 21. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 22. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 23. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 24. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 25. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 26. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 27. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 28. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 29. Obtaining Real L2 MRCs Offline method: run application 16 times ● Once for each cache partition size ● Measure L2 cache miss rate ● mcf Slice Real L2 MRC L2 Miss Rate (MPKI) L2 Cache Size 11 RapidMRC
  • 30. RapidMRC vs Real MRC Accuracy: execution slice at 10 billion instrs ● mgrid jbb gzip Miss Rate (MPKI) Cache Size (# colors) ammp mcf 2k xalancbmk 12 RapidMRC
  • 31. Latency of RapidMRC Invoke Obtained RapidMRC RapidMRC App is paused 4X app slowdown 147 ms 83 ms Time Trace Mattson's Alg 230 ms RapidMRC 13 RapidMRC
  • 32. Latency of RapidMRC Phase Change Invoke Invoke Obtained RapidMRC RapidMRC RapidMRC App is paused 4X app slowdown 147 ms 83 ms Time Trace Mattson's Alg 230 ms RapidMRC 13 RapidMRC
  • 33. Latency of RapidMRC Phase Change Invoke Invoke Obtained RapidMRC RapidMRC RapidMRC App is paused 4X app slowdown 147 ms 83 ms Time Trace Mattson's Alg 230 ms RapidMRC Amortize costs 13 RapidMRC
  • 34. Latency of RapidMRC Phase Change Invoke Invoke Obtained RapidMRC RapidMRC RapidMRC App is paused 4X app slowdown 147 ms 83 ms Time Trace Mattson's Alg 230 ms RapidMRC Amortize costs Phase length: 5 mins median 13 RapidMRC
  • 35. Latency of RapidMRC Phase Change Invoke Invoke Obtained RapidMRC RapidMRC RapidMRC App is paused 4X app slowdown 147 ms 83 ms Time Trace Mattson's Alg 230 ms RapidMRC Amortize costs Phase length: 5 mins median 30 astar size = 2 size = 4 size = 6 size = 8 25 size = 10 size = 12 L2 Miss Rate (MPKI) size = 14 size = 16 Phase change detection 20 Abrupt change in IPC, miss rate ● 15 Detectable online with low cost using ● hardware performance counters 10 5 0 0 200 400 600 800 1000 1200 1400 Instructions Completed (Billions) 13 RapidMRC
  • 36. Latency of RapidMRC Phase Change Invoke Invoke Obtained RapidMRC RapidMRC RapidMRC App is paused 4X app slowdown 147 ms 83 ms Time Trace Mattson's Alg 230 ms RapidMRC Amortize costs Phase length: 5 mins median 30 astar size = 2 size = 4 size = 6 size = 8 25 size = 10 size = 12 L2 Miss Rate (MPKI) size = 14 size = 16 Phase change detection 20 Abrupt change in IPC, miss rate ● 15 Detectable online with low cost using ● hardware performance counters 10 5 0 0 200 400 600 800 1000 1200 1400 Instructions Completed (Billions) 13 RapidMRC
  • 37. RapidMRC for Sizing Partitions For equake + twolf ● Has one long stable phase ● Feed MRCs into utility function ● e.g. Minimize total miss rate ● equake twolf Miss Rate (MPKI) Miss Rate (MPKI) Cache Size (# colors) Cache Size (# colors) 14 RapidMRC
  • 38. RapidMRC for Sizing Partitions For equake + twolf ● Has one long stable phase ● Feed MRCs into utility function ● e.g. Minimize total miss rate ● equake twolf Miss Rate (MPKI) Miss Rate (MPKI) Cache Size (# colors) Cache Size (# colors) 14 RapidMRC
  • 39. RapidMRC for Sizing Partitions For equake + twolf ● Has one long stable phase ● Feed MRCs into utility function ● e.g. Minimize total miss rate ● equake twolf Miss Rate (MPKI) Miss Rate (MPKI) Cache Size (# colors) Cache Size (# colors) RapidMRC 14 RapidMRC
  • 40. RapidMRC for Sizing Partitions For equake + twolf ● Has one long stable phase ● Feed MRCs into utility function ● e.g. Minimize total miss rate ● equake twolf Miss Rate (MPKI) Miss Rate (MPKI) Cache Size (# colors) Cache Size (# colors) RapidMRC Real MRC 14 RapidMRC
  • 41. Performance Impact of Sizes Performance of uncontrolled Real MRC sharing RapidMRC 16 twolf 0 24 6 8 10 12 14 0 equake 16 14 12 10 8 6 42 L2 Cache Sizes (# of colors) twolf: 27% IPC improvement ● equake: unaffected ● 15 RapidMRC
  • 42. Conclusion RapidMRC ● A tracing mechanism can be built with ● hardware performance counters Accurately approximates L2 MRCs online in software ● 230 ms latency, invocable upon phase change ● Application of RapidMRC ● Enables online sizing of L2 cache partitions ● Up to 27% performance improvement ● 16 RapidMRC
  • 43. Future Work Explore online optimizations made possible by: ● RapidMRC ● Reducing energy ● Guiding co-scheduling ● Tracing mechanism ● Extend model ● Account for non-uniform miss penalties ● 17 RapidMRC