This work was presented at SOAP 2019, co-located with PLDI 2019 in Phoenix, Arizona. It is joint work by Philipp Schubert, Richard Leer, Ben Hermann, and Eric Bodden.
Know Your Analysis: How Instrumentation Aids Understanding Static Analysis
1. Know Your Analysis: How Instrumentation Aids Understanding Static Analysis
Philipp Schubert, Richard Leer, Ben Hermann, and Eric Bodden
2. Static analysis development
- Frameworks (and many more …)
- Type hierarchy: struct Other : Super { void f() override {} };
- Pointer (points-to) information: Super *S = new Other;
- Call graph: S->f();
- What we are interested in here: data flow
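The call-graph edge for S->f() depends on both the type hierarchy and pointer information. A minimal sketch, assuming a virtual Super::f that the slide leaves implicit, and changing f to return a string (an addition) so the dispatch target can be observed:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Mirrors the slide's hierarchy; f() returns a string only to make dispatch observable.
struct Super {
  virtual ~Super() = default;
  virtual std::string f() { return "Super::f"; }
};

struct Other : Super {
  std::string f() override { return "Other::f"; }
};

// The static type of S is Super*, so a purely type-based call graph (class
// hierarchy analysis) must consider Super::f and every override in its
// subclasses. Knowing that S points to an Other object narrows S->f() to Other::f.
std::string callThroughSuper() {
  std::unique_ptr<Super> S = std::make_unique<Other>();  // slide: Super *S = new Other;
  return S->f();  // resolved at run time to Other::f
}
```

A data-flow analysis that follows call edges inherits whatever imprecision the call graph has, which is why these three components come first.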
KNOW YOUR ANALYSIS
UNDERSTANDING STATIC ANALYSIS
3. Encoding a Data-Flow Analysis
A. Rules
B. Flow functions
- Complex and demanding task
- Specify parameters
  - Precision vs. scalability
- Solving the data-flow problem, e.g. in an imperative framework
  - Call-string approach
  - Summary approach, e.g. IFDS / IDE / WPDS
- Multiple steps involved
  - Test cases / analysis code are developed incrementally
[Figure: development cycle Implement → Evaluate → Optimize; labels: concrete analysis, framework]
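To make "flow functions" concrete, here is a minimal gen/kill flow function for a toy taint analysis over a hypothetical three-statement language (copy, source call, constant). The statement kinds and names are assumptions for illustration, not part of any framework's API:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical toy statement: lhs = rhs | lhs = source() | lhs = constant.
struct Stmt {
  enum Kind { Copy, Source, Const } kind;
  std::string lhs;
  std::string rhs;  // only read for Copy
};

using Facts = std::set<std::string>;  // set of tainted variables

// Gen/kill flow function: out = (in \ kill(s)) ∪ gen(s, in)
Facts flow(const Stmt &s, const Facts &in) {
  Facts out = in;
  out.erase(s.lhs);  // kill: lhs is overwritten by every statement kind
  switch (s.kind) {
    case Stmt::Source:
      out.insert(s.lhs);  // gen: lhs receives a tainted value
      break;
    case Stmt::Copy:
      if (in.count(s.rhs) != 0) out.insert(s.lhs);  // taint flows rhs -> lhs
      break;
    case Stmt::Const:
      break;  // constants are never tainted
  }
  return out;
}
```

Applying flow statement by statement along a path gives the intraprocedural result; how calls are handled (call strings vs. summaries) is exactly the solver choice listed on the slide.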
4. Focus
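- Static program analysis of programs written in C/C++
- Analysis encoding in IFDS/IDE style
[Figure: a flow function for statement s drawn as exploded-supergraph edges from the facts holding before s ($\Lambda, d_{1}, \dots, d_{n}$) to the facts holding after s ($\Lambda, d'_{1}, \dots, d'_{m}$); in IDE, edges additionally carry edge functions such as the identity $\lambda x.\,x$.]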
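In IFDS style, a flow function maps one incoming fact to a set of outgoing facts, and the special zero fact Λ (always holding) is used to generate new facts; each (d, d') pair becomes an exploded-supergraph edge. A minimal sketch for a hypothetical assignment statement; the spelling "<zero>" for Λ and all names are assumptions, and IDE edge functions (such as λx.x) are omitted:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Fact = std::string;
const Fact ZERO = "<zero>";  // hypothetical spelling of the special fact Λ

// Hypothetical statement: lhs = rhs, or lhs = source() when isSource is true.
struct Assign {
  std::string lhs;
  std::string rhs;
  bool isSource;
};

// Maps ONE incoming fact d to the set of facts holding after s.
std::set<Fact> normalFlow(const Assign &s, const Fact &d) {
  if (d == ZERO) {
    if (s.isSource) return {ZERO, s.lhs};  // generate taint from Λ
    return {ZERO};                         // Λ always maps to itself
  }
  std::set<Fact> out;
  if (!s.isSource && d == s.rhs) out.insert(s.lhs);  // taint flows rhs -> lhs
  if (d != s.lhs) out.insert(d);                      // d survives unless overwritten
  return out;
}

// Exploded-supergraph edges (d -> d') contributed by s for the facts before it.
std::vector<std::pair<Fact, Fact>> esgEdges(const Assign &s, const std::set<Fact> &before) {
  std::vector<std::pair<Fact, Fact>> edges;
  for (const Fact &d : before)
    for (const Fact &d2 : normalFlow(s, d)) edges.emplace_back(d, d2);
  return edges;
}
```

Because each fact is treated independently, the number of ESG edges per statement is a natural quantity to count when instrumenting a solver.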
5. What makes debugging these analyses hard?
- Complex algorithms and concepts
- Data-flow solvers / engines
- Framework parametrization
- Real-world target programs
- Intermediate representation(s)
- OS default limits
- Interactions of all of the above
- Standard debugging techniques are not sufficient
  - Debugger → large portions of framework code
  - Logger → huge log files
  - Dynamic analysis → expensive
  - Testing → outlandish corner cases
6. PerformAnce Measurement Mechanism (PAMM)
- Ready-to-use mechanism
- Timers, counters, histograms
- Implemented as a singleton
- Registration of arbitrary measures
- Instrumentation wrapped in macros
- Multiple measures can be grouped
- Variable level of detail
- Export reports in JSON
- Visualize using Python pandas
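void foo() {
  PAMM_FACTORY;                          // access the PAMM singleton
  REG_HISTOGRAM("MyHist");               // register a histogram under the ID "MyHist"
  std::set<int> res = compute();
  ADD_TO_HIST("MyHist", res.size(), 1);  // record res.size() as a data point
}

void bar() {
  PAMM_FACTORY;
  while (...) {
    START_TIMER("MyTimer");
    // Code to be timed
    PAUSE_TIMER("MyTimer");              // stop timing this iteration
  }
}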
[Figure: a fully instrumented framework and an instrumentable analysis]
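The mechanism behind the macros can be sketched without PhASAR: a singleton that stores counters, histograms, and timers by string ID and exports them as JSON. MiniPamm and all of its member names are hypothetical; they illustrate the idea and do not mirror PAMM's actual API:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical, minimal PAMM-like registry (not PhASAR's API).
class MiniPamm {
public:
  // Meyers singleton: one registry shared by framework and analysis code.
  static MiniPamm &instance() {
    static MiniPamm inst;
    return inst;
  }

  void incCounter(const std::string &id, long by = 1) { counters_[id] += by; }
  long counter(const std::string &id) const {
    auto it = counters_.find(id);
    return it == counters_.end() ? 0 : it->second;
  }

  // Records `occurrences` data points with value `value` in histogram `id`.
  void addToHist(const std::string &id, std::size_t value, std::size_t occurrences = 1) {
    hists_[id][value] += occurrences;
  }
  std::map<std::size_t, std::size_t> hist(const std::string &id) const {
    auto it = hists_.find(id);
    if (it == hists_.end()) return {};
    return it->second;
  }

  void startTimer(const std::string &id) { running_[id] = Clock::now(); }
  // Adds the time since the matching startTimer to the timer's total.
  void pauseTimer(const std::string &id) {
    auto it = running_.find(id);
    if (it == running_.end()) return;  // timer was not running
    elapsed_[id] += Clock::now() - it->second;
    running_.erase(it);
  }
  double seconds(const std::string &id) const {
    auto it = elapsed_.find(id);
    return it == elapsed_.end() ? 0.0 : std::chrono::duration<double>(it->second).count();
  }

  // Serializes counters and histograms as JSON (IDs assumed to need no escaping).
  std::string toJSON() const {
    std::ostringstream os;
    os << "{\"counters\":{";
    bool first = true;
    for (const auto &[id, n] : counters_) {
      os << (first ? "" : ",") << '"' << id << "\":" << n;
      first = false;
    }
    os << "},\"histograms\":{";
    first = true;
    for (const auto &[id, buckets] : hists_) {
      os << (first ? "" : ",") << '"' << id << "\":{";
      first = false;
      bool firstBucket = true;
      for (const auto &[value, count] : buckets) {
        os << (firstBucket ? "" : ",") << '"' << value << "\":" << count;
        firstBucket = false;
      }
      os << '}';
    }
    os << "}}";
    return os.str();
  }

private:
  MiniPamm() = default;
  using Clock = std::chrono::steady_clock;
  std::map<std::string, long> counters_;
  std::map<std::string, std::map<std::size_t, std::size_t>> hists_;
  std::map<std::string, Clock::time_point> running_;
  std::map<std::string, Clock::duration> elapsed_;
};
```

The JSON string is what a pandas script would load for plotting; wrapping the calls in macros, as PAMM does, lets a build switch the instrumentation off entirely.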
7. Narrowing the bug(s)
- Run PhASAR on Coreutils
  - Runs on programs with > 20k call sites are likely to fail
  - Runs on programs with > 240k instructions fail
- Recursive IFDS/IDE solver implementation
  - The OS stack limit is problematic
  - Increasing the stack size fixed most failures
  - Remaining runs could be debugged using a standard debugger
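For the main thread, the stack limit is usually raised from the shell (ulimit -s) before starting the analysis. Another option, sketched here, is to run the recursive work on a POSIX thread whose stack size is set explicitly with pthread_attr_setstacksize. depthSum is a hypothetical stand-in for a recursive solver, and the stack size used in the test is an arbitrary assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <pthread.h>

// Hypothetical deeply recursive routine standing in for a recursive solver.
static std::uint64_t depthSum(std::uint64_t n) {
  if (n == 0) return 0;
  return n + depthSum(n - 1);  // one stack frame per level (unless optimized)
}

struct Job {
  std::uint64_t input;
  std::uint64_t output;
};

static void *runJob(void *arg) {
  auto *job = static_cast<Job *>(arg);
  job->output = depthSum(job->input);
  return nullptr;
}

// Runs depthSum(n) on a new thread with a stack of stackBytes bytes.
// Returns false if the thread could not be configured or created.
bool runWithStack(std::uint64_t n, std::size_t stackBytes, std::uint64_t &result) {
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) return false;
  if (pthread_attr_setstacksize(&attr, stackBytes) != 0) {
    pthread_attr_destroy(&attr);
    return false;
  }
  Job job{n, 0};
  pthread_t tid;
  int rc = pthread_create(&tid, &attr, runJob, &job);
  pthread_attr_destroy(&attr);
  if (rc != 0) return false;
  pthread_join(tid, nullptr);  // wait so that job outlives the thread
  result = job.output;
  return true;
}
```

Converting the recursion into an explicit worklist would remove the dependency on the stack limit altogether; that is a different fix from the one reported on the slide.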
8. Finding anomalies
- Number of data-flow facts propagated through the program (exploded supergraph (ESG) edges)
- Outlier sets
  - Generating several tens or hundreds of facts is unreasonable
- "Over-tainting" in one flow function
  - It generated all context-insensitive aliases at a store instruction
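The outlier search can be sketched by recording, for each flow-function application, how many facts it produced, binning these counts into a histogram, and flagging applications above a threshold. The threshold and all names are assumptions; the slide only states that several tens or hundreds of generated facts are unreasonable:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct FlowStats {
  std::map<std::size_t, std::size_t> histogram;  // #facts produced -> #applications
  std::vector<std::size_t> outliers;             // indices of suspicious applications
};

// sizes[i] = number of facts produced by the i-th flow-function application.
FlowStats analyzeFactSets(const std::vector<std::size_t> &sizes, std::size_t threshold) {
  FlowStats st;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    ++st.histogram[sizes[i]];
    if (sizes[i] > threshold) st.outliers.push_back(i);
  }
  return st;
}
```

Mapping an outlier index back to its statement and flow function is what points to the offending rule, such as the store instruction above.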
9. Optimizing for container types
- Most sets are small
- Use a more compact set implementation such as boost::container::flat_set
  - Not relevant
  - More copying or accessing would be required
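boost::container::flat_set keeps its elements in a sorted, contiguous vector. A dependency-free sketch of the same idea; SortedVecSet is a hypothetical name and not Boost's API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal sorted-vector set: contiguous storage, binary-search lookup.
template <typename T>
class SortedVecSet {
public:
  // Returns true if value was inserted, false if it was already present.
  bool insert(const T &value) {
    auto it = std::lower_bound(data_.begin(), data_.end(), value);
    if (it != data_.end() && !(value < *it)) return false;  // duplicate
    data_.insert(it, value);  // O(n) shift; cheap for small sets
    return true;
  }
  bool contains(const T &value) const {
    return std::binary_search(data_.begin(), data_.end(), value);
  }
  std::size_t size() const { return data_.size(); }
  typename std::vector<T>::const_iterator begin() const { return data_.begin(); }
  typename std::vector<T>::const_iterator end() const { return data_.end(); }

private:
  std::vector<T> data_;  // sorted and duplicate-free
};
```

The trade-off is linear-time insertion in exchange for compact, cache-friendly storage, so such a switch plausibly only pays off when sets are copied or accessed far more often than in the measured runs.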
10. Using shared pointers for memory allocation
- std::shared_ptr
  - How bad are they?
  - The difference is noticeable
- Introduction of a manager class (owner)
  - The manager hands out raw pointers
  - It cleans up at the end of its lifetime
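Copying a std::shared_ptr updates a reference count (atomically in common implementations), which adds up when facts are copied millions of times. A manager that owns every object and hands out plain pointers avoids this cost. A minimal sketch; Manager and make are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Sole owner of all objects of type T; hands out non-owning raw pointers.
template <typename T>
class Manager {
public:
  template <typename... Args>
  T *make(Args &&...args) {
    objects_.push_back(std::make_unique<T>(std::forward<Args>(args)...));
    return objects_.back().get();  // stays valid while the manager lives
  }
  std::size_t size() const { return objects_.size(); }

private:
  // Growing the vector moves the unique_ptrs, not the objects, so handed-out
  // pointers remain stable; everything is freed when the manager is destroyed.
  std::vector<std::unique_ptr<T>> objects_;
};
```

The price is that no object can be released before the manager itself, which suits analysis data whose lifetime matches a single analysis run.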
11. Revealing the runtime distribution
- Precise runtime distribution
- What to sacrifice / optimize?
[Figure: runtime distribution plot; axis note: 1000 s = 16.67 min]
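Once per-phase times are recorded (for instance with timers), the runtime distribution is each phase's share of the total. A sketch with hypothetical phase names; toMinutes reproduces the plot's conversion 1000 s / 60 ≈ 16.67 min:

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Converts accumulated per-phase runtimes (seconds) into percentages of the total.
std::map<std::string, double> runtimeShares(const std::map<std::string, double> &secs) {
  double total = 0.0;
  for (const auto &entry : secs) total += entry.second;
  std::map<std::string, double> shares;
  if (total <= 0.0) return shares;  // nothing measured yet
  for (const auto &entry : secs) shares[entry.first] = 100.0 * entry.second / total;
  return shares;
}

// Helper for reading plots: seconds to minutes.
double toMinutes(double seconds) { return seconds / 60.0; }
```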
12. Evaluating the data-flow domain
- Summary reuse
  - Indicator of the quality of the data-flow domain
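Summary reuse can be quantified as the fraction of summary lookups that hit an existing (function, fact) summary instead of triggering a new computation. A hypothetical cache that counts hits and misses; it is not PhASAR's solver-internal bookkeeping:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical summary cache keyed by (function, entry fact).
class SummaryCache {
public:
  using Facts = std::set<std::string>;

  // Returns the summary, calling compute() only on a cache miss.
  template <typename F>
  const Facts &get(const std::string &fn, const std::string &fact, F compute) {
    auto key = std::make_pair(fn, fact);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      ++hits_;
      return it->second;
    }
    ++misses_;
    return cache_.emplace(key, compute()).first->second;
  }

  // Fraction of lookups answered by an existing summary.
  double reuseRatio() const {
    std::size_t total = hits_ + misses_;
    return total == 0 ? 0.0 : static_cast<double>(hits_) / total;
  }

private:
  std::map<std::pair<std::string, std::string>, Facts> cache_;
  std::size_t hits_ = 0;
  std::size_t misses_ = 0;
};
```

A domain whose facts are very fine-grained tends to produce many distinct entry facts and thus few hits, which is why a low ratio can hint at a poorly chosen domain.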
13. How can we help analysis / framework developers?
- Combine debugging techniques
  - Debugger, logger, runtime analysis, instrumentation, data visualization
- Do not burden developers with yet more work
  - We need ready-to-use mechanisms / techniques
- Instrument and visualize what happens in an analysis run
  - Spot anomalies and implausible figures
  - Evaluate analysis performance
- VisuFlow static analysis debugger
- Separation of concerns
- What is your experience in debugging static analyses?
http://lisanqd.com/wp-content/uploads/2018/02/icse18demo.pdf
http://www.thewhitespace.de/publications/lh15-design.pdf
14. Using PhASAR and PAMM
- PhASAR is open source (MIT license)
- Find us on https://phasar.org
- Get in touch to discuss ideas and problems
- Use it, report bugs / issues, help us improve PhASAR
philipp.schubert@upb.de
ben.hermann@upb.de
@phasarframework