UniFi
1. Automatic Dimension Inference and
Checking for Object-Oriented Programs
Sudheendra Hangal
Monica S. Lam
http://suif.stanford.edu/unifi
International Conference on Software Engineering
Vancouver, Canada
May 20th, 2009
2. Overview
• A fully automatic dimension inference system
for Java programs
• Diff-based method to detect dimension errors
• Case-study on a 19KLoC program
• UniFi: Usable open-source tool
3. Dimensionality Checking
Used by physicists
(and high-schoolers)
E.g.
E = m * c;
[M × L^2 × T^-2] vs. [M × L × T^-1]
Doesn’t “type check”!
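The check on this slide can be sketched in a few lines of Java. This is an illustration only, not UniFi code: the class name DimensionCheck, its helper methods, and the encoding of a dimension as a vector of exponents over [M, L, T] are all assumptions made for the example.

```java
import java.util.Arrays;

// Illustrative sketch: a physical dimension is a vector of exponents
// over the base dimensions [M, L, T]. All names here are hypothetical.
final class DimensionCheck {
    static final int[] MASS = {1, 0, 0};      // M
    static final int[] VELOCITY = {0, 1, -1}; // L x T^-1
    static final int[] ENERGY = {1, 2, -2};   // M x L^2 x T^-2

    // Multiplying two quantities adds their exponents.
    static int[] multiply(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] + b[i];
        }
        return r;
    }

    // Two quantities "type check" only if all exponents agree.
    static boolean sameDimension(int[] a, int[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        int[] mc = multiply(MASS, VELOCITY);   // dimension of m * c
        int[] mcc = multiply(mc, VELOCITY);    // dimension of m * c * c
        System.out.println("E = m * c consistent?     " + sameDimension(ENERGY, mc));
        System.out.println("E = m * c * c consistent? " + sameDimension(ENERGY, mcc));
    }
}
```

Only the second assignment has matching exponents on both sides, which is the sense in which the first one fails to "type check".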
4. Dimensions are Everywhere
• Program values have dimensions like
id, $, date, port, color, flag, state, mask, count,
message, filename, property, …
and of course, mass, length, time, etc.
• We focus on primitive types and strings
– Hard to define custom types for everything
– No benefit of type-checking
7. Observation
• Programmers use suffixes to capture
dimensions
int proxyPort, backgroundColor;
long startTimeMillis, eventMask;
String inputFilename, serverURL;
8. Putting Dimensions to Work
How do we get the benefit of dimension
checking in mainstream languages?
Two ideas:
1) Detect (likely) errors automatically by
diff’ing dimension usage between programs
2) Bootstrap from standard libraries
9. UniFi’s Core Idea
• Infer dimensions of variables automatically
– Static analysis, type inference techniques
– Standard Java programs, zero annotation burden
• Optional: Examine results
• Compare inferred dimensions across two
programs that have something in common
10. [Diagram: Program 1 and Program 2 are each run through UniFi Inference, producing Results 1 and Results 2; UniFi Diff compares the two sets of results, and the diffs are viewed in the UniFi GUI.]
11. Use Cases
• Report changes as the same code evolves
– Nightly builds
– During program maintenance
• Compare against a different configuration
– Different programs using the same library
– 2 different implementations of an interface
– Implementation of a library vs. a program using it
– Different programmers’ code
12. Inference Algorithm
• Input: Java program
• Assigns dimensions to variables
– Initially independent
• Set up constraints between dimension variables
• Solve constraints
• Output: a set of relations between dimension
variables
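A minimal sketch of the "set up constraints, then solve" steps for equality-style constraints, assuming a union-find structure keyed by variable name. The class DimUnifier and the variable names serverPort and now are hypothetical, and the slides do not say that UniFi is implemented this way.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: every dimension variable starts in its own class;
// each unification constraint merges two classes. Names are hypothetical.
final class DimUnifier {
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of the equivalence class containing dvar.
    String find(String dvar) {
        parent.putIfAbsent(dvar, dvar); // initially independent
        String p = parent.get(dvar);
        if (p.equals(dvar)) {
            return dvar;
        }
        String root = find(p);
        parent.put(dvar, root); // path compression
        return root;
    }

    // Records the constraint "a and b have the same dimension".
    void unify(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    boolean sameDimension(String a, String b) {
        return find(a).equals(find(b));
    }

    public static void main(String[] args) {
        DimUnifier u = new DimUnifier();
        u.unify("proxyPort", "serverPort"); // from an assignment such as proxyPort = serverPort
        u.unify("startTimeMillis", "now");  // from a comparison such as startTimeMillis < now
        System.out.println(u.sameDimension("proxyPort", "serverPort"));
        System.out.println(u.sameDimension("proxyPort", "now"));
    }
}
```

The output of such a pass is a set of relations (here, equivalence classes) rather than human-readable dimension names.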
14. Inference Example (2)
int f(int x) { return x * x; }
a1 = f(a); // a1 = a * a
b1 = f(b); // b1 = b * b
• Context sensitive analysis
– Uses method summaries
15. OO Constraints
• Subtypes retain supertype interface
– Liskov Substitution Principle
• Constrains dimensions of parameters and
return value of subtype methods
class A { int m( int x ) { … } }
class B extends A { int m( int x ) { … } }
16. Multiply/Divide Constraints
• Linear equation style expressions for multiply
and divide
– Special handling of java.math libraries
• Solved using Gaussian elimination style
algorithm
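One way to picture the linear-equation view: taking exponents, a constraint such as area = len * len becomes the row area - 2*len = 0, and a set of such rows can be reduced by Gaussian elimination. The sketch below is an assumption-laden illustration (class MulDivSolver, the variables vol, area and len, and the use of floating-point arithmetic are all choices made for this example), not the algorithm from the paper.

```java
import java.util.Arrays;

// Illustrative sketch: multiply/divide constraints as linear equations over
// dimension exponents, reduced with Gauss-Jordan elimination.
// Column order is [vol, area, len]; every name here is hypothetical.
final class MulDivSolver {
    private static final double EPS = 1e-9;

    // Reduces the matrix to reduced row echelon form in place.
    static void rref(double[][] m) {
        int rows = m.length;
        int cols = m[0].length;
        int pivotRow = 0;
        for (int col = 0; col < cols && pivotRow < rows; col++) {
            int sel = -1;
            for (int r = pivotRow; r < rows; r++) {
                if (Math.abs(m[r][col]) > EPS) { sel = r; break; }
            }
            if (sel < 0) {
                continue; // no pivot in this column
            }
            double[] tmp = m[sel];
            m[sel] = m[pivotRow];
            m[pivotRow] = tmp;
            double p = m[pivotRow][col];
            for (int c = 0; c < cols; c++) {
                m[pivotRow][c] /= p; // normalise the pivot to 1
            }
            for (int r = 0; r < rows; r++) {
                double f = m[r][col];
                if (r == pivotRow || Math.abs(f) < EPS) {
                    continue;
                }
                for (int c = 0; c < cols; c++) {
                    m[r][c] -= f * m[pivotRow][c]; // clear the column elsewhere
                }
            }
            pivotRow++;
        }
    }

    static double[][] example() {
        double[][] m = {
            {0, 1, -2},  // area = len * len   ->  area - 2*len = 0
            {1, -1, -1}, // vol = area * len   ->  vol - area - len = 0
        };
        rref(m);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(example()));
    }
}
```

After reduction the two rows read vol - 3*len = 0 and area - 2*len = 0, so both derived variables are expressed in terms of the single free variable len.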
17. Comparing Inferred Dimensions
• Identify common variables
– Same name of field, position of method param,
etc.
• Compare equivalence classes formed by
unification constraints
• Compare Multiply-divide constraints
– Need canonical formulas for dvars
– Make common variables more “stable” than
others
– See paper for details
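The comparison of equivalence classes can be approximated as follows: for every pair of common variables, check whether "same dimension" holds in one version but not in the other. This pairwise formulation, the class DimDiff, and the integer class ids are assumptions for illustration; the slides defer the actual method to the paper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: each map sends a variable name to the id of its
// equivalence class in one program version. Names are hypothetical.
final class DimDiff {
    static List<String> diff(Map<String, Integer> oldClasses,
                             Map<String, Integer> newClasses) {
        // Only variables present in both versions can be compared.
        List<String> common = new ArrayList<>(oldClasses.keySet());
        common.retainAll(newClasses.keySet());
        Collections.sort(common);

        List<String> reports = new ArrayList<>();
        for (int i = 0; i < common.size(); i++) {
            for (int j = i + 1; j < common.size(); j++) {
                String a = common.get(i);
                String b = common.get(j);
                boolean before = oldClasses.get(a).equals(oldClasses.get(b));
                boolean after = newClasses.get(a).equals(newClasses.get(b));
                if (!before && after) {
                    reports.add("merged: " + a + ", " + b);
                } else if (before && !after) {
                    reports.add("split: " + a + ", " + b);
                }
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        Map<String, Integer> oldBuild = new HashMap<>();
        oldBuild.put("NO_CLASS", 1);
        oldBuild.put("NO_CLASS_SCORE", 2);
        Map<String, Integer> newBuild = new HashMap<>();
        newBuild.put("NO_CLASS", 1);
        newBuild.put("NO_CLASS_SCORE", 1); // the two dimensions merged
        System.out.println(diff(oldBuild, newBuild));
    }
}
```

A "merged" report flags two previously independent dimensions that now flow into each other; a "split" report corresponds to the removal of a unification constraint.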
19. Case Study: bddbddb
http://sourceforge.net/projects/bddbddb
• Retroactively run over 10 months of active
development
– Oct. 2004 to July 2005, 292 builds
– Approx. 19,000 lines of Java code
• Compared successive nightly builds
20. Results
• 26 reports, across 19 pairs of builds
• 5 real errors (+ fixes)
• False Positives
– Trivial reasons like field not used
– Probably easy to reduce number
21. Bug Example
double NO_CLASS = …; // default class id
double NO_CLASS_SCORE = …; // default score
…
double vScore=NO_CLASS, aScore=NO_CLASS;
double vClass=NO_CLASS, aClass=NO_CLASS;
• UniFi detected that independent dimensions
NO_CLASS and NO_CLASS_SCORE merged
22. Inference Example
double[] distribution = new double[numClasses];
... // compute sum
... // initialize distribution array
for (int i=0; i < NUM_TREES; ++i)
distribution[i] /= sum;
Dimension variables involved: numClasses, distribution.length, i, NUM_TREES
However: not caught, since this was in new code!
23. Experiences
• Sometimes bugs indicated by removal of
unification constraint (“error of omission”)
• Dimensionally inconsistent code
– Ignore hashCode(), compareTo()
– Cannot interpret semantically
24. Experiences
• Types of Errors: Sometimes can be difficult to
root-cause
• Dimensions vs. Units
– May not catch wrong scaling factor…
…but might catch the absence of one (?)
25. Future Work
• Explore use-cases for UniFi in the wild
• “S.I. Units” for platform libraries
– Using JSR-308 for Java with understandable names
• An intriguing possibility: Dimension inference
for hardware languages like Verilog
26. Related Work
• Osprey (Jiang and Su, ICSE '06)
• XeLda (Antoniu et al., ICSE '04)
• Type qualifiers (Foster et al., PLDI '99)
• Lackwit (O’Callahan and Jackson, ICSE '97)
• Fortress (Allen et al., OOPSLA '04)
27. Conclusions
• UniFi is the first dimension inference system
for standard Java programs
– for automatically detecting bugs
– for bootstrapping use of dimensions via libraries
– Many uses waiting to be explored
Open sourced and available from:
http://suif.stanford.edu/unifi
Users and collaborators welcome
30. Bug Example
double[] distribution = new double[numClasses];
... // compute sum and initialize
... // distribution array
for (int i=0; i < NUM_TREES; ++i)
distribution[i] /= sum;
Dimension variables involved: numClasses, distribution.length, i, NUM_TREES
However: not caught, since this was in new code!
31. Dimension Variables
Assign dimension variables (dvars) to
• Fields
• Interfaces: Method Parameters, Return values
• Array elements and lengths
• Local Variables
• Constants
• Result of Multiply/Divide Operations
• Primitive types only
32. Mechanics
• Bytecode based static analysis
• Scripts to monitor a CVS/SVN repository and
generate diffs
• GUI to view inference results, correlated with
unification points in source code.
Editor's Notes
Dimensionality checking is a simple way of checking physics equations for consistency. Even if Prof. Einstein came up and told you that energy = mass times the velocity of light, you could tell him he was wrong, because the independent physical dimensions on the two sides don’t match up. In software parlance, you could say it doesn’t “type check”.
Now programs operate on values which have dimensions not just in the scientific or physical sense. Regular applications manipulate values with dimensions like employee ID, network port, calendar year, a filename, a hostname, a street address, and so on. In our work, we focus on primitive types and strings, and I argue that many of the actual values a program computes with are of these types; a lot of the rest is scaffolding to hold these values together. For example, your database is composed of values of these types.
Most of these variables have their own space of values, and that’s what the colours are intended to represent.
I’ll be using color as a proxy to delineate different dimensions throughout this presentation.
How do we get the benefit of dimension checking in mainstream languages, without special languages or programmer annotations, and for, say, IT applications rather than scientific code?
Instantly adaptable to Java.
It could be that program 1 is wrong, or that program 2 is wrong. Or it could be that they’re both fine. What notion of dimensions, and what inference algorithm, will capture as many real bugs as possible?
An example of a case where this might work: E = m * c^2 yesterday, E = m * c today. We should note that we’ve not explored the second area much.
Notice that we don’t necessarily have human-understandable names. Unification constraints are based on assignment, comparison, add/subtract, array indexing (implicit comparison with length), and method invocation.
Our early examples lost precision due to not having context sensitivity.
Add example.
There is a GUI to view inference results. We also have scripts to monitor a CVS/SVN repository, run UniFi at regular intervals, and diff the results of successive runs.
When we started, we weren’t sure what to expect: would these equivalence classes be merging all the time due to legal program changes? Would interesting errors show up as changes in dimensional relationships at all?
As with mass, length, and time in physics, it would be wonderful if a platform came with a set of base units.
UniFi is the first such system to handle Java and its OO features, and it is completely automatic.
Please use it, play around with it, give us feedback, extend it, build upon it, do whatever you want.