Compiler optimizations
based on call-graph flattening
Carlo Alberto Ferraris
professor Silvano Rivoira

Master of Science in Telecommunication Engineering
Third School of Engineering: Information Technology
Politecnico di Torino
July 6th, 2011
Increasing complexities
Everyday objects are becoming
  multi-purpose
  networked
  interoperable
  customizable
  reusable
  upgradeable
Increasing complexities
Everyday objects are becoming
  more and more complex
Increasing complexities
Software that runs smart objects is becoming
  more and more complex
Diminishing resources
Systems have to be resource-efficient

Resources come in many different, non-orthogonal flavours
Power
  Especially valuable in battery-powered scenarios
  such as mobile, sensor, and developing-world applications
Density
  Critical factor in data-center and product design
Computational
  CPU, RAM, storage, etc. often grow more slowly
  than the potential applications
Development
  Development time and costs should be as low as
  possible for low time-to-market (TTM) and profitability
Do more with less
Abstractions
We need to modularize and hide the complexity
Operating systems, frameworks, libraries,
 managed languages, virtual machines, ...

All of this comes with a cost: generic solutions are
   generally less efficient than ad-hoc ones
Abstractions
We need to modularize and hide the complexity

Palm webOS
User interface running on
HTML+CSS+JavaScript

JavaScript PC emulator
Running Linux inside a browser
Optimizations
We need to modularize and hide the complexity
 without sacrificing performance

Compiler optimizations trade longer compilation time
  for shorter development and execution time
Vestigial abstractions
The natural subdivision of code in functions is
  maintained in the compiler and all the way down
  to the processor

Each function is self-contained with strict
  conventions regulating how it relates to other
  functions
Vestigial abstractions
Processors don’t care about functions; respecting
  the conventions is just additional work

Push the contents of the registers and return
  address on the stack, jump to the callee;
  execute the callee, jump to the return address;
  restore the registers from the stack
Vestigial abstractions
Many optimizations are simply not feasible when
 functions are present
   int replace(int* ptr, int value) {
     int tmp = *ptr;
     *ptr = value;
     return tmp;
   }

   int A(int* ptr, int value) {
     return replace(ptr, value);
   }

   int B(int* ptr, int value) {
     replace(ptr, value);
     return value;
   }

   void *malloc(size_t size) {
     void *ret;
     // [various checks]
     ret = imalloc(size);
     if (ret == NULL)
       errno = ENOMEM;
     return ret;
   }

   // ...
   type *ptr = malloc(size);
   if (ptr == NULL)
     return NOT_ENOUGH_MEMORY;
   // ...
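To make the first example concrete: as long as replace is an opaque call, B has to pay for a load whose result it never uses. The sketch below shows what an optimizer could produce once the function boundary is gone. B_flat is a hypothetical name used only for this illustration, not something from the slides.

```c
/* Original helper, as on the slide. */
int replace(int *ptr, int value) {
    int tmp = *ptr;   /* load the old value */
    *ptr = value;     /* store the new one */
    return tmp;
}

/* B as written: the call hides the fact that the result is unused. */
int B(int *ptr, int value) {
    replace(ptr, value);
    return value;
}

/* Hypothetical result once the body of replace is visible inside B:
   the load of *ptr feeds nothing, so dead-code elimination drops it. */
int B_flat(int *ptr, int value) {
    *ptr = value;
    return value;
}
```

Both versions return the same value and leave the same value in *ptr; only the unused load disappears.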
Vestigial abstractions
Many optimizations are simply not feasible when
 functions are present
        interpreter_setup();
        while (opcode = get_next_instruction())
          interpreter_step(opcode);
        interpreter_shutdown();

        function interpreter_step(opcode) {
          switch (opcode) {
            case opcode_instruction_A: execute_instruction_A(); break;
            case opcode_instruction_B: execute_instruction_B(); break;
            // ...
            default:                   abort("illegal opcode!");
          }
        }
Vestigial abstractions
Many optimization efforts are directed at working
 around the overhead caused by functions

Inlining clones the body of the callee in the caller;
   optimal solution w.r.t. calling overhead but
   causes code size increase and cache pollution;
   useful only on small, hot functions
Call-graph flattening
Call-graph flattening
What if we dismiss
 functions during early
 compilation and track
 the control flow
 explicitly instead?
Call-graph flattening
We get most benefits of inlining, including the
 ability to perform contextual code optimizations,
 without code duplication and the code size
 issues it causes

Where’s the catch?
Call-graph flattening
The load on the compiler increases greatly both
  directly due to CGF itself and also indirectly due
  to subsequent optimizations

Worst-case complexity (number of edges) is
 quadratic w.r.t. the number of callsites being
 transformed (heuristics may help)
Call-graph flattening
During CGF we need to statically keep track of all
 live values across all callsites in all functions

A value is live if it may be needed by subsequent
  instructions
                     A = 5, B = 9, C = 0;
                     // live: A, B
                     C = sqrt(B);
                     // live: A, C
                     return A + C;
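The live sets in the comments above can be computed with a single backward pass over the instructions. The following is a minimal sketch for straight-line code only. The Instr representation, the bitmask encoding and the function name liveness are assumptions made for this illustration; a real compiler works on a control-flow graph and iterates to a fixed point.

```c
#include <assert.h>
#include <stddef.h>

/* One straight-line instruction: the variables it writes and reads,
   encoded as bitmasks (hypothetical representation). */
typedef struct {
    unsigned def;  /* variables written */
    unsigned use;  /* variables read */
} Instr;

enum { VAR_A = 1u << 0, VAR_B = 1u << 1, VAR_C = 1u << 2 };

/* Backward pass: live_in[i] receives the set of variables that are live
   just before instruction i. live_out is the set live after the last one. */
void liveness(const Instr *code, size_t n, unsigned live_out,
              unsigned *live_in) {
    assert(code != NULL && live_in != NULL);
    unsigned live = live_out;
    for (size_t i = n; i-- > 0; ) {
        /* A definition kills the old value, a use makes the variable live. */
        live = (live & ~code[i].def) | code[i].use;
        live_in[i] = live;
    }
}
```

Encoding the three statements above as { def A,B,C }, { def C, use B } and { use A,C } gives the sets {A, B} before the sqrt and {A, C} before the return, matching the comments on the slide. It also shows that the initial C = 0 is a dead store.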
Call-graph flattening
Basically the compiler has to statically emulate
  ahead-of-time all the possible stack usages of
  the program

This has already been done on microcontrollers
  and resulted in a 23% decrease of stack usage
  (and 5% performance increase)
Call-graph flattening
The indirect cause of increased compiler load
  comes from standard optimizations that are run
  after CGF

CGF does not create new branches (each call and
  return instruction is turned into a jump) but
  other optimizations can

Most optimizations are designed to operate on
 small functions with limited amounts of
 branches
Call-graph flattening
Many possible application scenarios besides
 inlining

Code motion
Move instructions across function boundaries;
  avoid unneeded computations, alleviate
  register pressure, improve cache locality

Macro compression
Find similar code sequences in different parts of
  the code and merge them; reduce code size and
  cache pollution

Nonlinear control flow (CF)
CGF natively supports nonlinear control flows;
  almost-zero-cost exception handling (EH) and coroutines

Stackless execution
No runtime stack needed in fully-flattened
  programs

Stack protection
Effective stack poisoning attacks are much harder
  or even impossible
Implementation
To test if CGF is applicable also to complex
  architectures and to validate some of the ideas
  presented in the thesis, a pilot implementation
  was written against the open-source LLVM
  compiler framework
Implementation
Operates on LLVM-IR; host and target
 architecture agnostic; roughly 800 lines of C++
 code in 4 classes

The pilot implementation cannot flatten
  recursive, indirect or variadic callsites; such
  callsites are left untouched and still work as
  ordinary calls
Implementation
Enumerate suitable functions
Enumerate suitable callsites (and their live values)
Create dispatch function, populate with code
Transform callsites
Propagate live values
Remove original functions or create wrappers
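The effect of the steps above can be imitated by hand in portable C. The sketch below flattens two small functions, a(n) = n + 1 and b(n) calling a 10000 times, into one dispatch function in which every call and return is a jump to a block. It is an illustration under stated assumptions, not the pilot implementation: the pilot works on LLVM-IR and uses indirect branches, whereas this sketch uses a switch over a block identifier, and all names (Block, dispatch, a_wrapper, b_wrapper) are hypothetical.

```c
/* Hand-written illustration of the flattened form of:
     int a(int n) { return n + 1; }
     int b(int n) { int i; for (i = 0; i < 10000; i++) n = a(n); return n; } */

typedef enum {
    ENTRY_A,        /* outer entry: the caller wants a(n) */
    ENTRY_B,        /* outer entry: the caller wants b(n) */
    BODY_A,         /* former body of a */
    B_LOOP_TEST,    /* loop condition of b */
    B_AFTER_CALL,   /* continuation of b after the callsite */
    EXIT            /* hand the result back to the wrapper */
} Block;

static int dispatch(Block entry, int n) {
    Block pc = entry;   /* block to execute next */
    Block ret = EXIT;   /* where BODY_A continues when it is done */
    int i = 0;          /* live value of b, kept across the callsite */

    for (;;) {
        switch (pc) {
        case ENTRY_A:      ret = EXIT; pc = BODY_A; break;
        case ENTRY_B:      i = 0; pc = B_LOOP_TEST; break;
        case B_LOOP_TEST:
            if (i < 10000) { ret = B_AFTER_CALL; pc = BODY_A; } /* "call" */
            else           { pc = EXIT; }
            break;
        case BODY_A:       n = n + 1; pc = ret; break;          /* "return" */
        case B_AFTER_CALL: i++; pc = B_LOOP_TEST; break;
        case EXIT:         return n;
        }
    }
}

/* Wrappers keep the original entry points available to outside callers. */
int a_wrapper(int n) { return dispatch(ENTRY_A, n); }
int b_wrapper(int n) { return dispatch(ENTRY_B, n); }
```

Note how the loop counter i has to survive the jump into the former body of a: this is the live-value propagation step, done here simply by keeping i as a local of the dispatch function.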
Examples
int a(int n) {
    return n+1;
}

int b(int n) {
    int i;
    for (i=0; i<10000; i++)
        n = a(n);
    return n;
}
Examples
int a(int n) {
    return n+1;
}

int b(int n) {
    n = a(n);
    n = a(n);
    n = a(n);
    n = a(n);
    return n;
}
.type    .Ldispatch,@function
.Ldispatch:
    movl     $.Ltmp4, %eax   # store the return dispatcher of a in rax
    jmpq     *%rdi           # jump to the requested outer dispatcher
.Ltmp2:                      # outer dispatcher of b
    movl     $.LBB2_4, %eax  # store the address of %10
.Ltmp0:                      # outer dispatcher of a
    movl     (%rsi), %ecx    # load the argument n in ecx
    jmp      .LBB2_4
.Ltmp8:                      # block %17
    movl     $.Ltmp6, %eax
    jmp      .LBB2_4
.Ltmp6:                      # block %18
    movl     $.Ltmp7, %eax
.LBB2_4:                     # block %10
    movq     %rax, %rsi
    incl     %ecx            # n = n + 1
    movl     $.Ltmp8, %eax
    jmpq     *%rsi           # indirectbr
.Ltmp4:                      # return dispatcher of a
    movl     %ecx, (%rdx)    # store the return value (in ecx) through
    ret                      # the pointer in rdx and return to the wrapper
.Ltmp7:                      # return dispatcher of b
    movl     %ecx, (%rdx)
    ret
Fuzzing
To stress test the pilot implementation and to
  perform benchmarks a tunable fuzzer has been
  written
int f_1_2(int a) {
  a += 1;
  switch (a%3) {
    case 0: a += f_0_2(a); break;
    case 1: a += f_0_4(a); break;
    case 2: a += f_0_6(a); break;
  }
  return a;
}
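The slides do not show the fuzzer itself, so the following is only a guess at how a generator for functions of this shape could look. Everything in it is an assumption made for illustration: the names appendf and emit_function, the layer and fanout parameters, the body of layer-0 leaves, and the callee numbering index*(k+1), which was chosen only because it reproduces the example above for layer 1, index 2, fanout 3.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Appends formatted text to buf (capacity cap, always NUL-terminated).
   Returns 0 on success, -1 if the text did not fit. */
static int appendf(char *buf, size_t cap, const char *fmt, ...) {
    size_t used = strlen(buf);
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + used, cap - used, fmt, ap);
    va_end(ap);
    return (n < 0 || (size_t)n >= cap - used) ? -1 : 0;
}

/* Writes the source of one generated function f_<layer>_<index> into buf.
   Layer 0 produces a leaf; higher layers call `fanout` functions of the
   layer below, selected at run time by a switch on the argument.
   Returns 0 on success, -1 if buf is too small. */
int emit_function(char *buf, size_t cap, int layer, int index, int fanout) {
    if (cap == 0)
        return -1;
    buf[0] = '\0';
    int ok = appendf(buf, cap, "int f_%d_%d(int a) {\n  a += 1;\n",
                     layer, index);
    if (layer > 0 && fanout > 0) {
        ok |= appendf(buf, cap, "  switch (a%%%d) {\n", fanout);
        for (int k = 0; k < fanout; k++)
            ok |= appendf(buf, cap,
                          "    case %d: a += f_%d_%d(a); break;\n",
                          k, layer - 1, index * (k + 1));
        ok |= appendf(buf, cap, "  }\n");
    }
    ok |= appendf(buf, cap, "  return a;\n}\n");
    return ok;
}
```

Calling emit_function(buf, sizeof buf, 1, 2, 3) yields the text of f_1_2 as shown above. Varying the number of layers and the fanout is one plausible way to make such a generator tunable with respect to call-graph depth and the number of callsites.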
Benchmarks
Due to the shortcomings in the currently available
 optimizations in LLVM, the only meaningful
 benchmarks that can be done are those
 concerning code size and stack usage

In literature, average code size increases of 13%
   were reported due to CGF
Benchmarks
Using our tunable fuzzer different programs were
  generated and key statistics of the compiled
  code were gathered
Benchmarks
In short, when optimizations work the resulting
   code size is better than the one found in
   literature

When they don’t, the register spiller and allocator
 perform so badly that most instructions simply
 shuffle data around on the stack
Next steps
Reduce live value verbosity
Alternative indirection schemes
Tune available optimizations for CGF constructs
Better register spiller and allocator
Ad-hoc optimizations (code threader, adaptive fl.)
Support recursion, indirect calls; better wrappers
Conclusions
“Do more with less”; optimizations are required
CGF removes unneeded overhead due to low-level
  abstractions and enables powerful global
  optimizations
Benchmark results of the pilot implementation
  are better than those in literature when
  available LLVM optimizations can cope
.type wrapper,@function
subq $24, %rsp       # allocate space on the stack
movl %edi, 16(%rsp) # store the argument n on the stack
movl $.Ltmp0, %edi   # address of the outer dispatcher
leaq 16(%rsp), %rsi # address of the incoming argument(s)
leaq 12(%rsp), %rdx # address of the return value(s)
callq .Ldispatch     # call to the dispatch function
movl 12(%rsp), %eax # load the ret value from the stack
addq $24, %rsp       # deallocate space on the stack
ret                  # return

Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 

Compiler optimizations based on call-graph flattening

  • 1. Compiler optimizations based on call-graph flattening Carlo Alberto Ferraris professor Silvano Rivoira Master of Science in Telecommunication Engineering Third School of Engineering: Information Technology Politecnico di Torino July 6th, 2011
  • 2. Increasing complexities Everyday objects are becoming multi-purpose networked interoperable customizable reusable upgradeable
  • 3. Increasing complexities Everyday objects are becoming more and more complex
  • 4. Increasing complexities Software that runs smart objects is becoming more and more complex
  • 5. Diminishing resources Systems have to be resource-efficient
  • 6. Diminishing resources Systems have to be resource-efficient Resources come in many different flavours
  • 7. Diminishing resources Systems have to be resource-efficient Resources come in many different flavours Power Especially valuable in battery-powered scenarios such as mobile, sensor, 3rd world applications
  • 8. Diminishing resources Systems have to be resource-efficient Resources come in many different flavours Power, density Critical factor in data-center and product design
  • 9. Diminishing resources Systems have to be resource-efficient Resources come in many different flavours Power, density, computational CPU, RAM, storage, etc. are often growing slower than the potential applications
  • 10. Diminishing resources Systems have to be resource-efficient Resources come in many different flavours Power, density, computational, development Development time and costs should be as low as possible for low TTM and profitability
  • 11. Diminishing resources Systems have to be resource-efficient Resources come in many non-orthogonal flavours Power, density, computational, development
  • 12. Do more with less
  • 13. Abstractions We need to modularize and hide the complexity Operating systems, frameworks, libraries, managed languages, virtual machines, 

  • 14. Abstractions We need to modularize and hide the complexity Operating systems, frameworks, libraries, managed languages, virtual machines, 
 All of this comes with a cost: generic solutions are generally less efficient than ad-hoc ones
  • 15. Abstractions We need to modularize and hide the complexity Palm webOS User interface running on HTML+CSS+Javascript
  • 16. Abstractions We need to modularize and hide the complexity Javascript PC emulator Running Linux inside a browser
  • 17. Optimizations We need to modularize and hide the complexity without sacrificing performance
  • 18. Optimizations We need to modularize and hide the complexity without sacrificing performance Compiler optimizations trade longer compilation time for shorter development and execution time
  • 19. Vestigial abstractions The natural subdivision of code in functions is maintained in the compiler and all the way down to the processor Each function is self-contained with strict conventions regulating how it relates to other functions
  • 20. Vestigial abstractions Processors don’t care about functions; respecting the conventions is just additional work Push the contents of the registers and return address on the stack, jump to the callee; execute the callee, jump to the return address; restore the registers from the stack
  • 21. Vestigial abstractions Many optimizations are simply not feasible when functions are present
        int replace(int* ptr, int value) {
          int tmp = *ptr;
          *ptr = value;
          return tmp;
        }
        int A(int* ptr, int value) {
          return replace(ptr, value);
        }
        int B(int* ptr, int value) {
          replace(ptr, value);
          return value;
        }
        void *malloc(size_t size) {
          void *ret;
          // [various checks]
          ret = imalloc(size);
          if (ret == NULL)
            errno = ENOMEM;
          return ret;
        }
        // ...
        type *ptr = malloc(size);
        if (ptr == NULL)
          return NOT_ENOUGH_MEMORY;
        // ...
  • 22. Vestigial abstractions Many optimizations are simply not feasible when functions are present
        interpreter_setup();
        while (opcode = get_next_instruction())
          interpreter_step(opcode);
        interpreter_shutdown();
        function interpreter_step(opcode) {
          switch (opcode) {
            case opcode_instruction_A: execute_instruction_A(); break;
            case opcode_instruction_B: execute_instruction_B(); break;
            // ...
            default: abort("illegal opcode!");
          }
        }
  • 23. Vestigial abstractions Many optimization efforts are directed at working around the overhead caused by functions Inlining clones the body of the callee in the caller; optimal solution w.r.t. calling overhead but causes code size increase and cache pollution; useful only on small, hot functions
  • 25. Call-graph flattening What if we dismiss functions during early compilation

  • 26. Call-graph flattening What if we dismiss functions during early compilation and track the control flow explicitly instead?
  • 27. Call-graph flattening What if we dismiss functions during early compilation and track the control flow explicitly instead?
  • 28. Call-graph flattening What if we dismiss functions during early compilation and track the control flow explicitly instead?
  • 29. Call-graph flattening We get most of the benefits of inlining, including the ability to perform contextual code optimizations, without code duplication and the code size issues it causes
  • 30. Call-graph flattening We get most of the benefits of inlining, including the ability to perform contextual code optimizations, without code duplication and the code size issues it causes Where’s the catch?
  • 31. Call-graph flattening The load on the compiler increases greatly, both directly due to CGF itself and indirectly due to subsequent optimizations Worst-case complexity (number of edges) is quadratic w.r.t. the number of callsites being transformed (heuristics may help)
  • 32. Call-graph flattening During CGF we need to statically keep track of all live values across all callsites in all functions A value is live if it will be needed in subsequent instructions
        A = 5, B = 9, C = 0;  // live: A, B
        C = sqrt(B);          // live: A, C
        return A + C;
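The liveness annotations on the slide above can be reproduced with a small backward pass. This is only a minimal sketch for straight-line code: the bitmask encoding, the `struct stmt` layout and the function names are assumptions made for illustration, not part of the pilot implementation, which has to track live values across callsites in a whole program.

```c
#include <assert.h>

/* Hypothetical encoding: each variable is one bit in a mask. */
enum { VAR_A = 1u << 0, VAR_B = 1u << 1, VAR_C = 1u << 2 };

struct stmt {
    unsigned defs; /* variables written by the statement */
    unsigned uses; /* variables read by the statement */
};

/* Backward liveness over straight-line code:
 *   live_before(s) = uses(s) | (live_after(s) & ~defs(s))
 * Nothing is live after the last statement.
 * live_after[i] receives the set live immediately after statement i. */
void compute_live_after(const struct stmt *code, int n, unsigned *live_after)
{
    unsigned live = 0;
    for (int i = n - 1; i >= 0; i--) {
        live_after[i] = live;
        live = code[i].uses | (live & ~code[i].defs);
    }
}

/* The three statements from the slide. */
const struct stmt slide_example[3] = {
    { VAR_A | VAR_B | VAR_C, 0 },     /* A = 5, B = 9, C = 0; */
    { VAR_C, VAR_B },                 /* C = sqrt(B);         */
    { 0, VAR_A | VAR_C },             /* return A + C;        */
};

/* Checks the computed sets against the annotations on the slide. */
void check_slide_example(void)
{
    unsigned live_after[3];
    compute_live_after(slide_example, 3, live_after);
    assert(live_after[0] == (VAR_A | VAR_B)); /* live: A, B */
    assert(live_after[1] == (VAR_A | VAR_C)); /* live: A, C */
    assert(live_after[2] == 0);               /* nothing after return */
}
```

Calling `check_slide_example()` from a test harness confirms that B stops being live once `sqrt(B)` has been computed, which is the kind of information CGF needs at every callsite.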
  • 33. Call-graph flattening Basically the compiler has to statically emulate ahead-of-time all the possible stack usages of the program This has already been done on microcontrollers and resulted in a 23% decrease of stack usage (and 5% performance increase)
  • 34. Call-graph flattening The indirect cause of increased compiler load comes from standard optimizations that are run after CGF CGF does not create new branches (each call and return instruction is turned into a jump) but other optimizations can
  • 35. Call-graph flattening The indirect cause of increased compiler load comes from standard optimizations that are run after CGF Most optimizations are designed to operate on small functions with limited amounts of branches
  • 36. Call-graph flattening Many possible application scenarios besides inlining
  • 37. Call-graph flattening Many possible application scenarios besides inlining Code motion Move instructions across function boundaries; avoid unneeded computations, alleviate register pressure, improve cache locality
  • 38. Call-graph flattening Many possible application scenarios besides inlining Code motion, macro compression Find similar code sequences in different parts of the code and merge them; reduce code size and cache pollution
  • 39. Call-graph flattening Many possible application scenarios besides inlining Code motion, macro compression, nonlinear CF CGF natively supports nonlinear control flows; almost-zero-cost EH and coroutines
  • 40. Call-graph flattening Many possible application scenarios besides inlining Code motion, macro compression, nonlinear CF, stackless execution No runtime stack needed in fully-flattened programs
  • 41. Call-graph flattening Many possible application scenarios besides inlining Code motion, macro compression, nonlinear CF, stackless execution, stack protection Effective stack poisoning attacks are much harder or even impossible
  • 42. Implementation To test whether CGF is also applicable to complex architectures and to validate some of the ideas presented in the thesis, a pilot implementation was written against the open-source LLVM compiler framework
  • 43. Implementation Operates on LLVM-IR; host and target architecture agnostic; roughly 800 lines of C++ code in 4 classes The pilot implementation cannot flatten recursive, indirect or variadic callsites; they can be used anyway
  • 44. Implementation
        Enumerate suitable functions
        Enumerate suitable callsites (and their live values)
        Create dispatch function, populate with code
        Transform callsites
        Propagate live values
        Remove original functions or create wrappers
  • 45. Examples int a(int n) { return n+1; } int b(int n) { int i; for (i=0; i<10000; i++) n = a(n); return n; }
  • 46. int a(int n) { return n+1; } int b(int n) { int i; for (i=0; i<10000; i++) n = a(n); return n; }
  • 47. int a(int n) { return n+1; } int b(int n) { int i; for (i=0; i<10000; i++) n = a(n); return n; }
  • 48. Examples int a(int n) { return n+1; } int b(int n) { n = a(n); n = a(n); n = a(n); n = a(n); return n; }
  • 49. int a(int n) { return n+1; } int b(int n) { n = a(n); n = a(n); n = a(n); n = a(n); return n; }
  • 50.
            .type .Ldispatch,@function
        .Ldispatch:
            movl $.Ltmp4, %eax    # store the return dispatcher of a in rax
            jmpq *%rdi            # jump to the requested outer disp.
        .Ltmp2:                   # outer dispatcher of b
            movl $.LBB2_4, %eax   # store the address of %10
        .Ltmp0:                   # outer dispatcher of a
            movl (%rsi), %ecx     # load the argument n in ecx
            jmp .LBB2_4
        .Ltmp8:                   # block %17
            movl $.Ltmp6, %eax
            jmp .LBB2_4
        .Ltmp6:                   # block %18
            movl $.Ltmp7, %eax
        .LBB2_4:                  # block %10
            movq %rax, %rsi
            incl %ecx             # n = n + 1
            movl $.Ltmp8, %eax
            jmpq *%rsi            # indirectbr
        .Ltmp4:                   # return dispatcher of a
            movl %ecx, (%rdx)     # store in pointer rdx the return value
            ret                   # in ecx and return to the wrapper
        .Ltmp7:                   # return dispatcher of b
            movl %ecx, (%rdx)
            ret
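The idea behind the four-call example can be approximated in portable C. This is a rough sketch, not the output of the pilot implementation: the `enum block` labels, `dispatch`, `a_flat` and `b_flat` are hypothetical names, and the `switch` inside a loop only emulates the block addresses and `indirectbr` that the LLVM-IR version relies on. Each call becomes "remember where to continue, jump to the body of `a`", and each return becomes a jump to the remembered block.

```c
#include <assert.h>

/* Original, function-based version of the four-call example. */
int a(int n) { return n + 1; }

int b_calls(int n)
{
    n = a(n); n = a(n); n = a(n); n = a(n);
    return n;
}

/* Hypothetical block labels standing in for block addresses. */
enum block {
    ENTER_A, ENTER_B,                 /* entry points used by the wrappers */
    BODY_A,                           /* the single copy of a's body       */
    B_AFTER_1, B_AFTER_2, B_AFTER_3,  /* continuations inside b            */
    RET_A, RET_B                      /* hand the result back to a wrapper */
};

/* One dispatch function holding the code of both a and b. */
void dispatch(enum block entry, const int *arg, int *result)
{
    int n = *arg;            /* the only live value in this example      */
    enum block pc = entry;   /* block to execute next                    */
    enum block ret = RET_A;  /* where BODY_A continues when it is done   */

    for (;;) {
        switch (pc) {
        case ENTER_A:   ret = RET_A;     pc = BODY_A; break;
        case ENTER_B:   ret = B_AFTER_1; pc = BODY_A; break; /* 1st call */
        case BODY_A:    n = n + 1;       pc = ret;    break; /* "return" */
        case B_AFTER_1: ret = B_AFTER_2; pc = BODY_A; break; /* 2nd call */
        case B_AFTER_2: ret = B_AFTER_3; pc = BODY_A; break; /* 3rd call */
        case B_AFTER_3: ret = RET_B;     pc = BODY_A; break; /* 4th call */
        case RET_A:
        case RET_B:     *result = n;     return;
        }
    }
}

/* Wrappers keep the original signatures for outside callers. */
int a_flat(int n) { int r; dispatch(ENTER_A, &n, &r); return r; }
int b_flat(int n) { int r; dispatch(ENTER_B, &n, &r); return r; }

/* The flattened version must behave like the original one. */
void check_flattening(void)
{
    assert(a_flat(41) == 42);
    assert(b_flat(0) == 4);
    assert(b_flat(10) == b_calls(10));
}
```

The body of `a` exists only once even though `b` uses it four times, and the wrappers pass the entry block, a pointer to the argument and a pointer to the result, in the same spirit as the three values handed to the dispatch function by the wrapper at the end of the deck.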
  • 51. Fuzzing To stress test the pilot implementation and to perform benchmarks a tunable fuzzer has been written
        int f_1_2(int a) {
          a += 1;
          switch (a%3) {
            case 0: a += f_0_2(a); break;
            case 1: a += f_0_4(a); break;
            case 2: a += f_0_6(a); break;
          }
          return a;
        }
  • 52.
  • 53. Benchmarks Due to the shortcomings in the currently available optimizations in LLVM, the only meaningful benchmarks that can be done are those concerning code size and stack usage In the literature, average code size increases of 13% were reported due to CGF
  • 54. Benchmarks Using our tunable fuzzer different programs were generated and key statistics of the compiled code were gathered
  • 55. Benchmarks Using our tunable fuzzer different programs were generated and key statistics of the compiled code were gathered
  • 56. Benchmarks In short, when optimizations work the resulting code size is better than the one found in the literature
  • 57. Benchmarks In short, when optimizations work the resulting code size is better than the one found in the literature When they don’t, the register spiller and allocator perform so badly that most instructions simply shuffle data around on the stack
  • 59. Next steps
        Reduce live value verbosity
        Alternative indirection schemes
        Tune available optimizations for CGF constructs
        Better register spiller and allocator
        Ad-hoc optimizations (code threader, adaptive fl.)
        Support recursion, indirect calls; better wrappers
  • 60. Conclusions “Do more with less”; optimizations are required CGF removes unneeded overhead due to low-level abstractions and enables powerful global optimizations Benchmark results of the pilot implementation are better than those in the literature when available LLVM optimizations can cope
  • 61. Compiler optimizations based on call-graph flattening Carlo Alberto Ferraris professor Silvano Rivoira
  • 62.
  • 63.
  • 64.
            .type wrapper,@function
            subq $24, %rsp        # allocate space on the stack
            movl %edi, 16(%rsp)   # store the argument n on the stack
            movl $.Ltmp0, %edi    # address of the outer dispatcher
            leaq 16(%rsp), %rsi   # address of the incoming argument(s)
            leaq 12(%rsp), %rdx   # address of the return value(s)
            callq .Ldispatch      # call to the dispatch function
            movl 12(%rsp), %eax   # load the ret value from the stack
            addq $24, %rsp        # deallocate space on the stack
            ret                   # return