SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Downloaden Sie, um offline zu lesen
Type-preserving heap profiler for C++

József Mihalicza             Zoltán Porkoláb             Ábel Gábor


            Eötvös Loránd University                    NNG


                      Budapest, Hungary, 2011.


Supported by the European Union
Co-financed by the European Social Fund
grant agreement no. TÁMOP 4.2.1./B-09/1/KMR-2010-0003
Outline

• Motivation

• Implementation

• Case study

• Summary
Outline

• Motivation

• Implementation

• Case study

• Summary
Motivation
• NNG: navigation software on handheld devices
   – How much RAM is needed?
     What if I do not ask feature A, B, C, will it fit into 32MB?
• Need for detailed memory usage profile per feature
   – We already have call stacks in heap profilers
      • Difficult to map to feature
   – Source location
      • Better, but not granulated enough
   – Type
      • Class types are in direct connection with features
Motivation (2)
• Natural clusterisation
• Quick overview of the big players
• Optimisation clues
    – Store my bool members in a bitfield?
    – How many instances of your class exist in a typical run?
• Languages with more introspection already have this
    – Haskell, Objective C, Java, C#




http://book.realworldhaskell.org/read/profiling-and-optimization.html
Outline

• Motivation

• Implementation

• Case study

• Summary
Implementation (1)
• post build techniques
   – sampling profiler: captures call stacks
   – instrumentation: small code fragments injected
   – types have already been lost by this point
      • we only see calls to memory allocation primitives (malloc,
        HeapAlloc etc.) along with the size parameter in bytes
• during build
   – types are available at compile time
   – we should choose this one
   – compiler support for code instrumentation
      • custom enter and exit functions called at each function call
      • build a calling context tree: each node represents a call
        stack (route from root)
      • maintain actual call stack per thread
Implementation (2)
• so far we have call stacks
• monitor (de)allocations
   – hook allocation routines
      • operator new
      • malloc (typeless entirely)
   – maintain a list of currently allocated blocks, with metainfo
      • hashmap: pointer -> metainfo
      • metainfo: size, alloc/free time, alloc/free call stack (captured
        in situ), pointer to call tree node
      • Add type to metainfo: not available in the hooks
          – void* operator new (size_t);
   – collect metainfo of deallocated blocks in a fixed size buffer,
     dump to file when full
      • have full history of heap behavior
Implementation (3)
• where is type lost?
• std::pair<int,int>* a = new std::pair<int,int>(3,4)
                              static type: std::pair<int,int>*
   – void* operator new (size_t);
           static type: void*
   – ctor/dtor: does not necessarily exist
   – the only proper hook point: new keyword at call site    Macro
• initial approaches:
   –   New((std::pair<int,int>),(3,4))
   –   New((std::pair<int,int>)(3,4))
   –   New(new std::pair<int,int>(3,4))
   –   new std::pair<int,int>(3,4))
• final solution:
   – std::pair<int,int>* a = new std::pair<int,int>(3,4)
Implementation (4)
• how does it work, what happens underneath?
#define new NewTrick() * new

template<class T>
struct TYPE_IDENTIFIER { static char mDummy; };
struct NewTrick {
  template<class T>
  T* operator* (T* Ptr) {
    RegisterAllocation(Ptr, &TYPE_IDENTIFIER<T>::mDummy);
    return Ptr;
  }
};
• address of mDummy can be looked up in map file to get real type name T
• A* a = new A(1,2) -> A* a = new NewTrick() * new A(1,2)
• A* NewTrick::operator*<A>(A*) gets called
Implementation (5)
Pitfalls (1)
• A* a = new A[5];
  // calls ::operator new(sizeof(size_t) + 5 * sizeof(A))
                        a

               5    A   A   A   A   A

               result of ::operator new
• Only if A has non-trivial destructor
  Cannot be detected with current language elements
  platform independently
• Actual allocations are traced in ::operator new
  Types are registered afterwards in NewTrick
  We assumed the pointers are the same.
•  A* a = New_<A>(5);
Implementation (6)
Pitfalls (2)
• New form of new[] calls: A* a = New_<A>(5);
   Since new[] is still valid, we have to force the new form
• Extra parameter to new  we are no longer compatible
   with placement new
•  we have to disable our macros for that time:
                         #include "undef_new.h"
                         new (Mem)
                or       #include "define_new.h"
                            std::auto_ptr<int>(new int(5));
#pragma push_macro("new")
#undef new
new (Mem)
#pragma pop_macro("new")
   std::auto_ptr<int>(new int(5));
Outline

• Motivation

• Implementation

• Case study

• Summary
Case study (1)
• Notepad++ text editor
   – Free, open source
   – Based on Scintilla text editor widget
   – 154 kLOC
   – Unknown code, from scratch
   – Goal1: Test integration overhead
   – Goal2: comprehend memory behavior
      • Use case: loading a big file
• Integration issues
   – Uses STL
      • Custom allocator
   – instrument both Scintilla and Notepad++
• ~1 net work day
Case study (2)
• Timeline view of loading   • Reallocation structure
  a big file                   becomes recognizable
• The big players            • The green part vanishes
   –   SplitVector<char>        – Why?
   –   char
   –   SplitVector<int>
   –   other
Case study (3)
• Green: char
   – Call stack -> it is the undo history
   – Lines are appended one by one
      • A corresponding undo entry gets created for each
• Why vanishes
   – Call stack -> undo history is cleared after load
   – Gotcha: undo history is being updated unnecessarily
   – Quick fix: not to do that during load
Case study (4)

• Before fix




• After fix
Outline

• Motivation

• Implementation

• Case study

• Summary
Summary
• Advantages
   –   Type is available in the profiler
   –   Low footprint
   –   Easy integration
   –   Multiplatform
   –   Scalable
• Disadvantages
   – Intrusive
   – Does not mix well with class-specific operator new
     overriding
• Pinpointed a performance bug in Notepad++
Q&A

         Thank you for your attention!



                Any questions?



Feel free to contact me at jmihalicza@gmail.com

Weitere ähnliche Inhalte

Was ist angesagt?

Numba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyNumba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyTravis Oliphant
 
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis MethodsIntroducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis MethodsKamiya Toshihiro
 
Consuming and Creating Libraries in C++
Consuming and Creating Libraries in C++Consuming and Creating Libraries in C++
Consuming and Creating Libraries in C++Richard Thomson
 
Spotting Working Code Examples (ICSE 2014)
Spotting Working Code Examples (ICSE 2014)Spotting Working Code Examples (ICSE 2014)
Spotting Working Code Examples (ICSE 2014)imanmahsa
 
(Kpi summer school 2015) theano tutorial part1
(Kpi summer school 2015) theano tutorial part1(Kpi summer school 2015) theano tutorial part1
(Kpi summer school 2015) theano tutorial part1Serhii Havrylov
 
stacks and queues class 12 in c++
stacks and  queues class 12 in c++stacks and  queues class 12 in c++
stacks and queues class 12 in c++Khushal Mehta
 
Intro to Java for C++ Developers
Intro to Java for C++ DevelopersIntro to Java for C++ Developers
Intro to Java for C++ DevelopersZachary Blair
 
Lisp for Python Programmers
Lisp for Python ProgrammersLisp for Python Programmers
Lisp for Python ProgrammersVsevolod Dyomkin
 
LeFlowを調べてみました
LeFlowを調べてみましたLeFlowを調べてみました
LeFlowを調べてみましたMr. Vengineer
 
Garbage Collection
Garbage CollectionGarbage Collection
Garbage CollectionEelco Visser
 
TI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific LanguagesTI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific LanguagesEelco Visser
 
XPath - A practical guide
XPath - A practical guideXPath - A practical guide
XPath - A practical guideTobias Schlitt
 
Programming with Python - Basic
Programming with Python - BasicProgramming with Python - Basic
Programming with Python - BasicMosky Liu
 
Java collections the force awakens
Java collections  the force awakensJava collections  the force awakens
Java collections the force awakensRichardWarburton
 
Compilers Are Databases
Compilers Are DatabasesCompilers Are Databases
Compilers Are DatabasesMartin Odersky
 
How to write a TableGen backend
How to write a TableGen backendHow to write a TableGen backend
How to write a TableGen backendMin-Yih Hsu
 
C++ Generators and Property-based Testing
C++ Generators and Property-based TestingC++ Generators and Property-based Testing
C++ Generators and Property-based TestingSumant Tambe
 

Was ist angesagt? (20)

Numba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyNumba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPy
 
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis MethodsIntroducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
 
Consuming and Creating Libraries in C++
Consuming and Creating Libraries in C++Consuming and Creating Libraries in C++
Consuming and Creating Libraries in C++
 
Spotting Working Code Examples (ICSE 2014)
Spotting Working Code Examples (ICSE 2014)Spotting Working Code Examples (ICSE 2014)
Spotting Working Code Examples (ICSE 2014)
 
(Kpi summer school 2015) theano tutorial part1
(Kpi summer school 2015) theano tutorial part1(Kpi summer school 2015) theano tutorial part1
(Kpi summer school 2015) theano tutorial part1
 
Gcrc talk
Gcrc talkGcrc talk
Gcrc talk
 
stacks and queues class 12 in c++
stacks and  queues class 12 in c++stacks and  queues class 12 in c++
stacks and queues class 12 in c++
 
Intro to Java for C++ Developers
Intro to Java for C++ DevelopersIntro to Java for C++ Developers
Intro to Java for C++ Developers
 
Lisp for Python Programmers
Lisp for Python ProgrammersLisp for Python Programmers
Lisp for Python Programmers
 
LeFlowを調べてみました
LeFlowを調べてみましたLeFlowを調べてみました
LeFlowを調べてみました
 
Garbage Collection
Garbage CollectionGarbage Collection
Garbage Collection
 
TI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific LanguagesTI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific Languages
 
XPath - A practical guide
XPath - A practical guideXPath - A practical guide
XPath - A practical guide
 
Programming with Python - Basic
Programming with Python - BasicProgramming with Python - Basic
Programming with Python - Basic
 
Java collections the force awakens
Java collections  the force awakensJava collections  the force awakens
Java collections the force awakens
 
Compilers Are Databases
Compilers Are DatabasesCompilers Are Databases
Compilers Are Databases
 
How to write a TableGen backend
How to write a TableGen backendHow to write a TableGen backend
How to write a TableGen backend
 
Monte Carlo C++
Monte Carlo C++Monte Carlo C++
Monte Carlo C++
 
Sync considered unethical
Sync considered unethicalSync considered unethical
Sync considered unethical
 
C++ Generators and Property-based Testing
C++ Generators and Property-based TestingC++ Generators and Property-based Testing
C++ Generators and Property-based Testing
 

Andere mochten auch

ERA - Tracking Technical Debt
ERA - Tracking Technical DebtERA - Tracking Technical Debt
ERA - Tracking Technical DebtICSM 2011
 
Industry - Testing & Quality Assurance in Data Migration Projects
Industry - Testing & Quality Assurance in Data Migration Projects Industry - Testing & Quality Assurance in Data Migration Projects
Industry - Testing & Quality Assurance in Data Migration Projects ICSM 2011
 
Natural Language Analysis - Mining Java Class Naming Conventions
Natural Language Analysis - Mining Java Class Naming ConventionsNatural Language Analysis - Mining Java Class Naming Conventions
Natural Language Analysis - Mining Java Class Naming ConventionsICSM 2011
 
ERA - Measuring Maintainability of Spreadsheets in the Wild
ERA - Measuring Maintainability of Spreadsheets in the Wild ERA - Measuring Maintainability of Spreadsheets in the Wild
ERA - Measuring Maintainability of Spreadsheets in the Wild ICSM 2011
 
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...ICSM 2011
 
Industry - Precise Detection of Un-Initialized Variables in Large, Real-life ...
Industry - Precise Detection of Un-Initialized Variables in Large, Real-life ...Industry - Precise Detection of Un-Initialized Variables in Large, Real-life ...
Industry - Precise Detection of Un-Initialized Variables in Large, Real-life ...ICSM 2011
 
Components - Graph Based Detection of Library API Limitations
Components - Graph Based Detection of Library API LimitationsComponents - Graph Based Detection of Library API Limitations
Components - Graph Based Detection of Library API LimitationsICSM 2011
 
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...ICSM 2011
 
ERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to TaskERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to TaskICSM 2011
 
Impact analysis - A Seismology-inspired Approach to Study Change Propagation
Impact analysis - A Seismology-inspired Approach to Study Change PropagationImpact analysis - A Seismology-inspired Approach to Study Change Propagation
Impact analysis - A Seismology-inspired Approach to Study Change PropagationICSM 2011
 
Richard Kemmerer Keynote icsm11
Richard Kemmerer Keynote icsm11Richard Kemmerer Keynote icsm11
Richard Kemmerer Keynote icsm11ICSM 2011
 
Lionel Briand ICSM 2011 Keynote
Lionel Briand ICSM 2011 KeynoteLionel Briand ICSM 2011 Keynote
Lionel Briand ICSM 2011 KeynoteICSM 2011
 
Metrics - You can't control the unfamiliar
Metrics - You can't control the unfamiliarMetrics - You can't control the unfamiliar
Metrics - You can't control the unfamiliarICSM 2011
 
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...ICSM 2011
 
Industry - The Evolution of Information Systems. A Case Study on Document Man...
Industry - The Evolution of Information Systems. A Case Study on Document Man...Industry - The Evolution of Information Systems. A Case Study on Document Man...
Industry - The Evolution of Information Systems. A Case Study on Document Man...ICSM 2011
 
Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in...
Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in...Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in...
Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in...ICSM 2011
 
Industry - Estimating software maintenance effort from use cases an indu...
Industry - Estimating software maintenance effort from use cases an      indu...Industry - Estimating software maintenance effort from use cases an      indu...
Industry - Estimating software maintenance effort from use cases an indu...ICSM 2011
 
Postdoc Symposium - Abram Hindle
Postdoc Symposium - Abram HindlePostdoc Symposium - Abram Hindle
Postdoc Symposium - Abram HindleICSM 2011
 
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...ICSM 2011
 
Faults and Regression Testing - Fault interaction and its repercussions
Faults and Regression Testing - Fault interaction and its repercussionsFaults and Regression Testing - Fault interaction and its repercussions
Faults and Regression Testing - Fault interaction and its repercussionsICSM 2011
 

Andere mochten auch (20)

ERA - Tracking Technical Debt
ERA - Tracking Technical DebtERA - Tracking Technical Debt
ERA - Tracking Technical Debt
 
Industry - Testing & Quality Assurance in Data Migration Projects
Industry - Testing & Quality Assurance in Data Migration Projects Industry - Testing & Quality Assurance in Data Migration Projects
Industry - Testing & Quality Assurance in Data Migration Projects
 
Natural Language Analysis - Mining Java Class Naming Conventions
Natural Language Analysis - Mining Java Class Naming ConventionsNatural Language Analysis - Mining Java Class Naming Conventions
Natural Language Analysis - Mining Java Class Naming Conventions
 
ERA - Measuring Maintainability of Spreadsheets in the Wild
ERA - Measuring Maintainability of Spreadsheets in the Wild ERA - Measuring Maintainability of Spreadsheets in the Wild
ERA - Measuring Maintainability of Spreadsheets in the Wild
 
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...
 
Industry - Precise Detection of Un-Initialized Variables in Large, Real-life ...
Industry - Precise Detection of Un-Initialized Variables in Large, Real-life ...Industry - Precise Detection of Un-Initialized Variables in Large, Real-life ...
Industry - Precise Detection of Un-Initialized Variables in Large, Real-life ...
 
Components - Graph Based Detection of Library API Limitations
Components - Graph Based Detection of Library API LimitationsComponents - Graph Based Detection of Library API Limitations
Components - Graph Based Detection of Library API Limitations
 
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...
 
ERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to TaskERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to Task
 
Impact analysis - A Seismology-inspired Approach to Study Change Propagation
Impact analysis - A Seismology-inspired Approach to Study Change PropagationImpact analysis - A Seismology-inspired Approach to Study Change Propagation
Impact analysis - A Seismology-inspired Approach to Study Change Propagation
 
Richard Kemmerer Keynote icsm11
Richard Kemmerer Keynote icsm11Richard Kemmerer Keynote icsm11
Richard Kemmerer Keynote icsm11
 
Lionel Briand ICSM 2011 Keynote
Lionel Briand ICSM 2011 KeynoteLionel Briand ICSM 2011 Keynote
Lionel Briand ICSM 2011 Keynote
 
Metrics - You can't control the unfamiliar
Metrics - You can't control the unfamiliarMetrics - You can't control the unfamiliar
Metrics - You can't control the unfamiliar
 
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...
 
Industry - The Evolution of Information Systems. A Case Study on Document Man...
Industry - The Evolution of Information Systems. A Case Study on Document Man...Industry - The Evolution of Information Systems. A Case Study on Document Man...
Industry - The Evolution of Information Systems. A Case Study on Document Man...
 
Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in...
Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in...Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in...
Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in...
 
Industry - Estimating software maintenance effort from use cases an indu...
Industry - Estimating software maintenance effort from use cases an      indu...Industry - Estimating software maintenance effort from use cases an      indu...
Industry - Estimating software maintenance effort from use cases an indu...
 
Postdoc Symposium - Abram Hindle
Postdoc Symposium - Abram HindlePostdoc Symposium - Abram Hindle
Postdoc Symposium - Abram Hindle
 
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...
 
Faults and Regression Testing - Fault interaction and its repercussions
Faults and Regression Testing - Fault interaction and its repercussionsFaults and Regression Testing - Fault interaction and its repercussions
Faults and Regression Testing - Fault interaction and its repercussions
 

Ähnlich wie Industry - Program analysis and verification - Type-preserving Heap Profiler for C++

Apache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbApache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbZhangZhengming
 
lecture02-cpp.ppt
lecture02-cpp.pptlecture02-cpp.ppt
lecture02-cpp.pptDevliNeeraj
 
UsingCPP_for_Artist.ppt
UsingCPP_for_Artist.pptUsingCPP_for_Artist.ppt
UsingCPP_for_Artist.pptvinu28455
 
Vulnerability, exploit to metasploit
Vulnerability, exploit to metasploitVulnerability, exploit to metasploit
Vulnerability, exploit to metasploitTiago Henriques
 
A New Tracer for Reverse Engineering - PacSec 2010
A New Tracer for Reverse Engineering - PacSec 2010A New Tracer for Reverse Engineering - PacSec 2010
A New Tracer for Reverse Engineering - PacSec 2010Tsukasa Oi
 
これからのPerlプロダクトのかたち(YAPC::Asia 2013)
これからのPerlプロダクトのかたち(YAPC::Asia 2013)これからのPerlプロダクトのかたち(YAPC::Asia 2013)
これからのPerlプロダクトのかたち(YAPC::Asia 2013)goccy
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceAntonio García-Domínguez
 
Pune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPrashant Rane
 
Skillwise - Enhancing dotnet app
Skillwise - Enhancing dotnet appSkillwise - Enhancing dotnet app
Skillwise - Enhancing dotnet appSkillwise Group
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
Reproducibility and automation of machine learning process
Reproducibility and automation of machine learning processReproducibility and automation of machine learning process
Reproducibility and automation of machine learning processDenis Dus
 
Python introduction
Python introductionPython introduction
Python introductionRoger Xia
 

Ähnlich wie Industry - Program analysis and verification - Type-preserving Heap Profiler for C++ (20)

Apache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbApache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdb
 
lecture02-cpp.ppt
lecture02-cpp.pptlecture02-cpp.ppt
lecture02-cpp.ppt
 
lecture02-cpp.ppt
lecture02-cpp.pptlecture02-cpp.ppt
lecture02-cpp.ppt
 
lecture02-cpp.ppt
lecture02-cpp.pptlecture02-cpp.ppt
lecture02-cpp.ppt
 
lecture02-cpp.ppt
lecture02-cpp.pptlecture02-cpp.ppt
lecture02-cpp.ppt
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
UsingCPP_for_Artist.ppt
UsingCPP_for_Artist.pptUsingCPP_for_Artist.ppt
UsingCPP_for_Artist.ppt
 
Vulnerability, exploit to metasploit
Vulnerability, exploit to metasploitVulnerability, exploit to metasploit
Vulnerability, exploit to metasploit
 
A New Tracer for Reverse Engineering - PacSec 2010
A New Tracer for Reverse Engineering - PacSec 2010A New Tracer for Reverse Engineering - PacSec 2010
A New Tracer for Reverse Engineering - PacSec 2010
 
これからのPerlプロダクトのかたち(YAPC::Asia 2013)
これからのPerlプロダクトのかたち(YAPC::Asia 2013)これからのPerlプロダクトのかたち(YAPC::Asia 2013)
これからのPerlプロダクトのかたち(YAPC::Asia 2013)
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
 
Pune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCD
 
A Peek into TFRT
A Peek into TFRTA Peek into TFRT
A Peek into TFRT
 
Skillwise - Enhancing dotnet app
Skillwise - Enhancing dotnet appSkillwise - Enhancing dotnet app
Skillwise - Enhancing dotnet app
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Reproducibility and automation of machine learning process
Reproducibility and automation of machine learning processReproducibility and automation of machine learning process
Reproducibility and automation of machine learning process
 
Introduction to C ++.pptx
Introduction to C ++.pptxIntroduction to C ++.pptx
Introduction to C ++.pptx
 
lecture02-cpp.ppt
lecture02-cpp.pptlecture02-cpp.ppt
lecture02-cpp.ppt
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
Python introduction
Python introductionPython introduction
Python introduction
 

Kürzlich hochgeladen

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Kürzlich hochgeladen (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Industry - Program analysis and verification - Type-preserving Heap Profiler for C++

  • 1. Type-preserving heap profiler for C++ József Mihalicza Zoltán Porkoláb Ábel Gábor Eötvös Loránd University NNG Budapest, Hungary, 2011. Supported by the European Union Co-financed by the European Social Fund grant agreement no. TÁMOP 4.2.1./B-09/1/KMR-2010-0003
  • 4. Motivation • NNG: navigation software on handheld devices – How much RAM is needed? What if I do not ask feature A, B, C, will it fit into 32MB? • Need for detailed memory usage profile per feature – We already have call stacks in heap profilers • Difficult to map to feature – Source location • Better, but not granulated enough – Type • Class types are in direct connection with features
  • 5. Motivation (2) • Natural clusterisation • Quick overview of the big players • Optimisation clues – Store my bool members in a bitfield? – How many instances of your class exist in a typical run? • Languages with more introspection already have this – Haskell, Objective C, Java, C# http://book.realworldhaskell.org/read/profiling-and-optimization.html
  • 7. Implementation (1) • post build techniques – sampling profiler: captures call stacks – instrumentation: small code fragments injected – types have already been lost by this point • we only see calls to memory allocation primitives (malloc, HeapAlloc etc.) along with the size parameter in bytes • during build – types are available at compile time – we should choose this one – compiler support for code instrumentation • custom enter and exit functions called at each function call • build a calling context tree: each node represents a call stack (route from root) • maintain actual call stack per thread
  • 8. Implementation (2) • so far we have call stacks • monitor (de)allocations – hook allocation routines • operator new • malloc (typeless entirely) – maintain a list of currently allocated blocks, with metainfo • hashmap: pointer -> metainfo • metainfo: size, alloc/free time, alloc/free call stack (captured in situ), pointer to call tree node • Add type to metainfo: not available in the hooks – void* operator new (size_t); – collect metainfo of deallocated blocks in a fixed size buffer, dump to file when full • have full history of heap behavior
  • 9. Implementation (3) • where is type lost? • std::pair<int,int>* a = new std::pair<int,int>(3,4) static type: std::pair<int,int>* – void* operator new (size_t); static type: void* – ctor/dtor: does not necessarily exist – the only proper hook point: new keyword at call site  Macro • initial approaches: – New((std::pair<int,int>),(3,4)) – New((std::pair<int,int>)(3,4)) – New(new std::pair<int,int>(3,4)) – new std::pair<int,int>(3,4)) • final solution: – std::pair<int,int>* a = new std::pair<int,int>(3,4)
  • 10. Implementation (4) • how does it work, what happens underneath? #define new NewTrick() * new template<class T> struct TYPE_IDENTIFIER { static char mDummy; }; struct NewTrick { template<class T> T* operator* (T* Ptr) { RegisterAllocation(Ptr, &TYPE_IDENTIFIER<T>::mDummy); return Ptr; } }; • address of mDummy can be looked up in map file to get real type name T • A* a = new A(1,2) -> A* a = new NewTrick() * new A(1,2) • A* NewTrick::operator*<A>(A*) gets called
  • 11. Implementation (5) Pitfalls (1) • A* a = new A[5]; // calls ::operator new(sizeof(size_t) + 5 * sizeof(A)) a 5 A A A A A result of ::operator new • Only if A has non-trivial destructor Cannot be detected with current language elements platform independently • Actual allocations are traced in ::operator new Types are registered afterwards in NewTrick We assumed the pointers are the same. •  A* a = New_<A>(5);
  • 12. Implementation (6) Pitfalls (2) • New form of new[] calls: A* a = New_<A>(5); Since new[] is still valid, we have to force the new form • Extra parameter to new  we are no longer compatible with placement new •  we have to disable our macros for that time: #include "undef_new.h" new (Mem) or #include "define_new.h" std::auto_ptr<int>(new int(5)); #pragma push_macro("new") #undef new new (Mem) #pragma pop_macro("new") std::auto_ptr<int>(new int(5));
  • 14. Case study (1) • Notepad++ text editor – Free, open source – Based on Scintilla text editor widget – 154 kLOC – Unknown code, from scratch – Goal1: Test integration overhead – Goal2: comprehend memory behavior • Use case: loading a big file • Integration issues – Uses STL • Custom allocator – instrument both Scintilla and Notepad++ • ~1 net work day
  • 15. Case study (2) • Timeline view of loading • Reallocation structure a big file becomes recognizable • The big players • The green part vanishes – SplitVector<char> – Why? – char – SplitVector<int> – other
  • 16. Case study (3) • Green: char – Call stack -> it is the undo history – Lines are appended one by one • A corresponding undo entry gets created for each • Why vanishes – Call stack -> undo history is cleared after load – Gotcha: undo history is being updated unnecessarily – Quick fix: not to do that during load
  • 17. Case study (4) • Before fix • After fix
  • 19. Summary • Advantages – Type is available in the profiler – Low footprint – Easy integration – Multiplatform – Scalable • Disadvantages – Intrusive – Does not mix well with class-specific operator new overriding • Pinpointed a performance bug in Notepad++
  • 20. Q&A Thank you for your attention! Any questions? Feel free to contact me at jmihalicza@gmail.com