Automatic Dimension Inference and
Checking for Object-Oriented Programs

             Sudheendra Hangal
               Monica S. Lam

           http://suif.stanford.edu/unifi

  International Conference on Software Engineering
                  Vancouver, Canada
                    May 20th, 2009
Overview
• A fully automatic dimension inference system
  for Java programs
• Diff-based method to detect dimension errors
• Case-study on a 19KLoC program

• UniFi: Usable open-source tool
Dimensionality Checking
          Used by physicists
          (and high-schoolers)

          E.g.
                     E = m * c;

          $[M \cdot L^{2} \cdot T^{-2}]$ vs. $[M \cdot L \cdot T^{-1}]$
          Doesn’t “type check”!
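The check on this slide can be sketched by writing each dimension as a vector of integer exponents over mass (M), length (L) and time (T), where multiplying two quantities adds their exponents. This is a minimal illustration, not UniFi code. The class names `Dimension` and `DimensionCheck` are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: a physical dimension as integer exponents over the
// base dimensions mass (M), length (L) and time (T).
final class Dimension {
    private final int[] exps; // {M, L, T}

    Dimension(int mass, int length, int time) {
        this.exps = new int[] {mass, length, time};
    }

    // Multiplying two quantities adds their exponents.
    Dimension times(Dimension other) {
        return new Dimension(exps[0] + other.exps[0],
                             exps[1] + other.exps[1],
                             exps[2] + other.exps[2]);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Dimension && Arrays.equals(exps, ((Dimension) o).exps);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(exps);
    }

    @Override
    public String toString() {
        return "[M^" + exps[0] + " L^" + exps[1] + " T^" + exps[2] + "]";
    }
}

final class DimensionCheck {
    static final Dimension ENERGY = new Dimension(1, 2, -2);
    static final Dimension MASS = new Dimension(1, 0, 0);
    static final Dimension VELOCITY = new Dimension(0, 1, -1);

    public static void main(String[] args) {
        // E = m * c: the right-hand side is [M^1 L^1 T^-1], so it does not check.
        System.out.println(ENERGY.equals(MASS.times(VELOCITY)));                   // false
        // E = m * c * c: the right-hand side is [M^1 L^2 T^-2], matching energy.
        System.out.println(ENERGY.equals(MASS.times(VELOCITY).times(VELOCITY))); // true
    }
}
```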
Dimensions are Everywhere
• Program values have dimensions like
  id, $, date, port, color, flag, state, mask, count,
  message, filename, property, …
  and of course, mass, length, time, etc.

• We focus on primitive types and strings
   – Hard to define custom types for everything
   – So such values get no benefit from type checking
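The conventional way to get type checking for such values is one wrapper class per dimension. The sketch below (the classes `Port`, `Millis` and `Connector` are hypothetical) shows that javac then rejects swapped arguments. It also shows the cost: every dimension needs its own class, and existing library APIs still take raw `int`s and `long`s.

```java
// Hypothetical wrapper types, one class per dimension.
final class Port {
    final int value;

    Port(int value) { this.value = value; }
}

final class Millis {
    final long value;

    Millis(long value) { this.value = value; }
}

final class Connector {
    // With raw ints, a signature like connect(int port, int timeoutMillis)
    // would silently accept swapped arguments. With wrappers, the call
    // connect(new Millis(500), new Port(8080)) is rejected by javac.
    static String connect(Port port, Millis timeout) {
        return "port " + port.value + ", timeout " + timeout.value + " ms";
    }

    public static void main(String[] args) {
        System.out.println(connect(new Port(8080), new Millis(500)));
    }
}
```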
Programmer View
java.awt.event.MouseWheelEvent

public MouseWheelEvent(Component source,
  int id, long when, int modifiers,
  int x, int y, int clickCount,
  boolean popupTrigger,
  int scrollType, int scrollAmount,
  int wheelRotation);
Type-checker View
java.awt.event.MouseWheelEvent

public MouseWheelEvent(Component source,
  int id, long when, int modifiers,
  int x, int y, int clickCount,
  boolean popupTrigger,
  int scrollType, int scrollAmount,
  int wheelRotation);
Observation
• Programmers use suffixes to capture
  dimensions

  int proxyPort, backgroundColor;
  long startTimeMillis, eventMask;
  String inputFilename, serverURL;
Putting Dimensions to Work
 How do we get the benefit of dimension
 checking in mainstream languages?

2 ideas:
1) Detect (likely) errors automatically by
diff’ing dimension usage between programs
2) Bootstrap from standard libraries
UniFi’s Core Idea
• Infer dimensions of variables automatically
  – Static analysis, type inference techniques
  – Standard Java programs, zero annotation burden


• Optional: Examine results

• Compare inferred dimensions across two
  programs that have something in common
[Figure: Program 1 → UniFi Inference → Results 1; Program 2 → UniFi Inference → Results 2; Results 1 and Results 2 → UniFi Diff → Diffs → UniFi GUI]
Use Cases
• Report changes as the same code evolves
  – Nightly builds
  – During program maintenance
• Compare against a different configuration
  – Different programs using the same library
  – 2 different implementations of an interface
  – Implementation of a library vs. a program using it
  – Different programmers’ code
Inference Algorithm
• Input: Java program
• Assigns dimensions to variables
  – Initially independent
• Set up constraints between dim. vars
• Solve constraints
• Output: a set of relations between dimension
  variables
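The solving step for unification constraints can be sketched with a union-find (disjoint-set) structure, a standard way to implement unification. This is an assumption about how such constraints might be solved, not UniFi's source. `DvarUnionFind` and its string keys are hypothetical, and multiply/divide constraints, which are handled separately, are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: unification constraints over dimension variables
// (dvars) solved with union-find. Every dvar starts in its own class
// ("initially independent"); each constraint merges two classes.
final class DvarUnionFind {
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of v's class, registering v on first use.
    String find(String v) {
        parent.putIfAbsent(v, v);
        String p = parent.get(v);
        if (!p.equals(v)) {
            p = find(p);
            parent.put(v, p); // path compression
        }
        return p;
    }

    // Records the constraint dim(a) == dim(b).
    void unify(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    boolean same(String a, String b) {
        return find(a).equals(find(b));
    }

    // Output: the equivalence classes, sorted for stable printing.
    List<TreeSet<String>> classes() {
        Map<String, TreeSet<String>> byRoot = new TreeMap<>();
        for (String v : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(v), k -> new TreeSet<>()).add(v);
        }
        return new ArrayList<>(byRoot.values());
    }

    public static void main(String[] args) {
        DvarUnionFind uf = new DvarUnionFind();
        uf.unify("x", "y"); // from x = y + z
        uf.unify("y", "z");
        uf.unify("a", "b"); // from a < b
        uf.find("w");       // a dvar with no constraints stays alone
        System.out.println(uf.classes()); // [[a, b], [w], [x, y, z]]
    }
}
```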
Inference Example (1)
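x = y + z        x, y and z share one dimension

a < b            a and b share one dimension

d[i]             i and d.length share one dimension

u = v * w        u and the product v * w share one dimension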
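The constraint-generation step for these four statement shapes can be sketched as follows. The string-based statements and the `ConstraintGen` class are hypothetical simplifications for illustration. UniFi itself analyzes bytecode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical constraint generator for the four statement shapes above.
// "~" means "same dimension"; "=" records a multiply constraint that is
// solved separately rather than by unification.
final class ConstraintGen {
    final List<String> constraints = new ArrayList<>();

    // x = y + z: the target and both operands of + share one dimension.
    void assignAdd(String x, String y, String z) {
        constraints.add(x + " ~ " + y);
        constraints.add(y + " ~ " + z);
    }

    // a < b: compared values share one dimension.
    void compare(String a, String b) {
        constraints.add(a + " ~ " + b);
    }

    // d[i]: indexing implicitly compares i with d.length.
    void index(String array, String i) {
        constraints.add(i + " ~ " + array + ".length");
    }

    // u = v * w: dim(u) = dim(v) * dim(w), not a unification of u, v, w.
    void assignMul(String u, String v, String w) {
        constraints.add(u + " = " + v + " * " + w);
    }

    public static void main(String[] args) {
        ConstraintGen g = new ConstraintGen();
        g.assignAdd("x", "y", "z");
        g.compare("a", "b");
        g.index("d", "i");
        g.assignMul("u", "v", "w");
        g.constraints.forEach(System.out::println);
    }
}
```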
Inference Example (2)
  int f(int x) { return x * x; }

  a1 = f(a);         a1 and a * a share one dimension

  b1 = f(b);         b1 and b * b share one dimension

• Context-sensitive analysis
  – Uses method summaries, so a and b are not merged
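The method-summary idea can be sketched by summarizing `f` once as "return = parameter squared" and then instantiating that summary separately at each call site. The `MethodSummary` class and its exponent-map representation are assumptions for illustration, not UniFi's data structures.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical method summary: the return dimension as integer exponents over
// the method's formal parameters. For int f(int x) { return x * x; } the
// summary is {x=2}, i.e. dim(return) = dim(x)^2.
final class MethodSummary {
    final String[] formals;
    final Map<String, Integer> returnExps;

    MethodSummary(String[] formals, Map<String, Integer> returnExps) {
        this.formals = formals;
        this.returnExps = returnExps;
    }

    // Instantiates the summary at one call site by substituting the actual
    // argument dvars for the formals. Each call site gets its own copy, so
    // f(a) and f(b) never force dim(a) == dim(b).
    Map<String, Integer> instantiate(String... actuals) {
        Map<String, Integer> result = new TreeMap<>();
        for (int i = 0; i < formals.length; i++) {
            Integer e = returnExps.get(formals[i]);
            if (e != null) {
                result.merge(actuals[i], e, Integer::sum); // same actual twice: exponents add
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> fReturn = new TreeMap<>();
        fReturn.put("x", 2);
        MethodSummary f = new MethodSummary(new String[] {"x"}, fReturn);
        System.out.println("a1 = " + f.instantiate("a")); // a1 = {a=2}
        System.out.println("b1 = " + f.instantiate("b")); // b1 = {b=2}
    }
}
```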
OO Constraints
• Subtypes retain supertype interface
  – Liskov Substitution Principle
• Constrains dimensions of parameters and
  return value of subtype methods

  class A           { int m( int x ) { … } }
  class B extends A { int m( int x ) { … } }
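One way to picture these constraints: for each method a subclass overrides, tie each parameter's dvar and the return value's dvar to those of the overridden method. This sketch uses Java reflection (`Class.getDeclaredMethods`, `Class.getDeclaredMethod`) as a stand-in for UniFi's bytecode analysis. `OverrideConstraints` and the `#param`/`#return` naming are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class A { int m(int x) { return x; } }
class B extends A { @Override int m(int x) { return x + 1; } }

// Sketch: emit "same dimension" constraints between an overriding method and
// the superclass method it overrides. Simplification: checks only the direct
// superclass, ignores interfaces, and does not filter out private or static
// methods (which are not actually overridden).
final class OverrideConstraints {
    static List<String> constraintsFor(Class<?> sub) {
        List<String> out = new ArrayList<>();
        Class<?> sup = sub.getSuperclass();
        if (sup == null) return out;
        for (Method m : sub.getDeclaredMethods()) {
            Method overridden;
            try {
                overridden = sup.getDeclaredMethod(m.getName(), m.getParameterTypes());
            } catch (NoSuchMethodException e) {
                continue; // no method with this signature declared in sup
            }
            String subName = sub.getSimpleName() + "." + m.getName();
            String supName = sup.getSimpleName() + "." + overridden.getName();
            for (int i = 0; i < m.getParameterCount(); i++) {
                out.add(subName + "#param" + i + " ~ " + supName + "#param" + i);
            }
            out.add(subName + "#return ~ " + supName + "#return");
        }
        return out;
    }

    public static void main(String[] args) {
        constraintsFor(B.class).forEach(System.out::println);
    }
}
```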
Multiply/Divide Constraints
• Linear equation style expressions for multiply
  and divide
  – Special handling of java.math libraries
• Solved using Gaussian elimination style
  algorithm
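One plausible reading of a "Gaussian elimination style" solver: treat dimensions as exponent vectors, so every multiply or divide becomes a linear equation. For example, u = v * w becomes u - v - w = 0. Gauss-Jordan elimination then writes each pivot dvar as a product of powers of the free dvars. This sketch is not UniFi's algorithm. `MulDivSolver` is hypothetical, it uses doubles with a tolerance instead of exact rationals for brevity, and the special handling of library math calls is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: multiply/divide constraints as linear equations over
// dimension exponents. u = v * w becomes u - v - w = 0 and u = v / w becomes
// u - v + w = 0. Gauss-Jordan elimination expresses each pivot dvar in terms
// of the remaining (free) dvars. Mutates `rows`.
final class MulDivSolver {
    private static final double EPS = 1e-9;

    static List<String> solve(String[] vars, double[][] rows) {
        int n = vars.length;
        int pivotRow = 0;
        int[] pivotCol = new int[rows.length];
        for (int col = 0; col < n && pivotRow < rows.length; col++) {
            int found = -1;
            for (int r = pivotRow; r < rows.length; r++) {
                if (Math.abs(rows[r][col]) > EPS) { found = r; break; }
            }
            if (found < 0) continue; // no pivot here: vars[col] stays free
            double[] tmp = rows[pivotRow];
            rows[pivotRow] = rows[found];
            rows[found] = tmp;
            double p = rows[pivotRow][col];
            for (int c = 0; c < n; c++) rows[pivotRow][c] /= p; // pivot becomes 1
            for (int r = 0; r < rows.length; r++) {             // clear the column elsewhere
                double f = rows[r][col];
                if (r != pivotRow && Math.abs(f) > EPS) {
                    for (int c = 0; c < n; c++) rows[r][c] -= f * rows[pivotRow][c];
                }
            }
            pivotCol[pivotRow++] = col;
        }
        List<String> out = new ArrayList<>();
        for (int r = 0; r < pivotRow; r++) {
            StringBuilder sb = new StringBuilder(vars[pivotCol[r]]).append(" =");
            boolean anyTerm = false;
            for (int c = pivotCol[r] + 1; c < n; c++) {
                double e = -rows[r][c]; // move the term to the right-hand side
                if (Math.abs(e) > EPS) {
                    sb.append(' ').append(vars[c]).append('^').append(format(e));
                    anyTerm = true;
                }
            }
            out.add(anyTerm ? sb.toString() : sb.append(" 1 (dimensionless)").toString());
        }
        return out;
    }

    private static String format(double e) {
        long rounded = Math.round(e);
        return Math.abs(e - rounded) < EPS ? Long.toString(rounded) : Double.toString(e);
    }

    public static void main(String[] args) {
        String[] vars = {"t", "u", "v", "w"};
        double[][] rows = {
            {0, 1, -1, -1}, // u = v * w
            {1, -1, 0, 1},  // t = u / w
        };
        solve(vars, rows).forEach(System.out::println); // t = v^1, then u = v^1 w^1
    }
}
```

Here the solver finds that `t` has the same dimension as `v`, because the `w` factors cancel.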
Comparing Inferred Dimensions
• Identify common variables
  – Same name of field, position of method param,
    etc.
• Compare equivalence classes formed by
  unification constraints
• Compare multiply/divide constraints
  – Need canonical formulas for dvars
  – Make common variables more “stable” than
    others
  – See paper for details
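The comparison of equivalence classes can be sketched as follows. Each run is given as a map from common variable name to class id. The sketch reports a new class that merges several old classes (two independent dimensions became one), and an old class that splits (a lost unification, possibly an "error of omission"). The representation and the `ClassDiff` class are hypothetical. The multiply/divide comparison and the "stability" heuristics from the paper are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: each run is a map from common-variable name to the id
// of its equivalence class. Variables missing from either run are skipped.
final class ClassDiff {
    static List<String> diff(Map<String, Integer> oldRun, Map<String, Integer> newRun) {
        List<String> reports = new ArrayList<>();
        reports.addAll(changes("merged", newRun, oldRun)); // new class spans several old ones
        reports.addAll(changes("split", oldRun, newRun));  // old class spread over several new ones
        return reports;
    }

    // Groups each class of `outer` by the classes its members had in `inner`.
    private static List<String> changes(String kind, Map<String, Integer> outer,
                                        Map<String, Integer> inner) {
        Map<Integer, Map<Integer, TreeSet<String>>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> e : outer.entrySet()) {
            Integer innerId = inner.get(e.getKey());
            if (innerId == null) continue; // not a common variable
            groups.computeIfAbsent(e.getValue(), k -> new TreeMap<>())
                  .computeIfAbsent(innerId, k -> new TreeSet<>())
                  .add(e.getKey());
        }
        List<String> out = new ArrayList<>();
        for (Map<Integer, TreeSet<String>> parts : groups.values()) {
            if (parts.size() > 1) out.add(kind + ": " + parts.values());
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative data modeled on the NO_CLASS bug; the old run is assumed.
        Map<String, Integer> oldRun = new TreeMap<>();
        Map<String, Integer> newRun = new TreeMap<>();
        for (String v : new String[] {"NO_CLASS", "vClass", "aClass"}) {
            oldRun.put(v, 1);
            newRun.put(v, 1);
        }
        for (String v : new String[] {"NO_CLASS_SCORE", "vScore", "aScore"}) {
            oldRun.put(v, 2);
            newRun.put(v, 1);
        }
        diff(oldRun, newRun).forEach(System.out::println);
    }
}
```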
Case Study: bddbddb
  http://sourceforge.net/projects/bddbddb

• Retroactively run over 10 months of active
  development
  – Oct. 2004 to July 2005, 292 builds
  – Approx. 19,000 lines of Java code


• Compared successive nightly builds
Results
• 26 reports, across 19 pairs of builds
• 5 real errors (+ fixes)

• False Positives
  – Trivial reasons like field not used
  – Probably easy to reduce number
Bug Example


  double   NO_CLASS = …; // default class id
  double   NO_CLASS_SCORE = …; // default score
  …
  double   vScore=NO_CLASS, aScore=NO_CLASS;
  double   vClass=NO_CLASS, aClass=NO_CLASS;




• UniFi detected that the previously independent dimensions of
  NO_CLASS and NO_CLASS_SCORE had merged
Inference Example
double[] distribution = new double[numClasses];
... // compute sum
... // initialize distribution array

for (int i=0; i < NUM_TREES; ++i)
    distribution[i] /= sum;




UniFi unifies numClasses, distribution.length, i and NUM_TREES
into one dimension. However: not caught, since this was in new
code (no earlier build to diff against)!
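A plausible fix would bound the loop by the number of classes instead of `NUM_TREES`. This assumes that was the intent. The slide places `NUM_TREES` in the same dimension as `distribution.length`, which it presumably should not be. The `normalize` helper below is a hypothetical rewrite, not the project's actual fix.

```java
// Sketch of a corrected loop: bound it by the array it indexes, so i stays
// in the dimension of distribution.length (numClasses) and NUM_TREES is no
// longer unified with them.
final class Normalize {
    static void normalize(double[] distribution, double sum) {
        for (int i = 0; i < distribution.length; ++i) {
            distribution[i] /= sum;
        }
    }

    public static void main(String[] args) {
        double[] d = {1.0, 3.0};
        normalize(d, 4.0);
        System.out.println(java.util.Arrays.toString(d)); // [0.25, 0.75]
    }
}
```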
Experiences
• Sometimes bugs indicated by removal of
  unification constraint (“error of omission”)

• Dimensionally inconsistent code
  – Ignore hashCode(), compareTo()
  – Cannot interpret semantically
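A typical hand-written `hashCode()` shows why such methods are ignored. Under the addition rule from Inference Example (1), combining fields with `+` would tie together dimensions that have no real relationship. The `Endpoint` class is hypothetical.

```java
// Hypothetical class: hashCode() adds a scaled id to a port number. Read as
// ordinary arithmetic, the "+" would tie dim(port) to dim(31 * id), a
// relation with no meaning, so dimension inference should skip it.
final class Endpoint {
    final int id;
    final int port;

    Endpoint(int id, int port) {
        this.id = id;
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Endpoint)) return false;
        Endpoint other = (Endpoint) o;
        return id == other.id && port == other.port;
    }

    @Override
    public int hashCode() {
        return 31 * id + port; // id and port meet in one "+"
    }
}
```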
Experiences
• Types of Errors: Sometimes can be difficult to
  root-cause

• Dimensions vs. Units
  – May not catch wrong scaling factor…
  …but might catch the absence of one (?)
Future Work
• Explore use-cases for UniFi in the wild

• “S.I. Units” for platform libraries
   – Using JSR-308 for Java, with understandable names


• An intriguing possibility: Dimension inference
  for hardware languages like Verilog
Related Work
•   Osprey (Jiang and Su, ICSE '06)
•   XeLda (Antoniu et al., ICSE '04)
•   Type qualifiers (Foster et al., PLDI '99)
•   Lackwit (O’Callahan and Jackson, ICSE '97)
•   Fortress (Allen et al., OOPSLA '04)
Conclusions
• UniFi is the first dimension inference system
  for standard Java programs
  – for automatically detecting bugs
  – for bootstrapping use of dimensions via libraries
  – Many uses waiting to be explored
Open sourced and available from:
http://suif.stanford.edu/unifi
Users and collaborators welcome
Backup slides
Bug Example
double[] distribution = new double[numClasses];
... // compute sum and initialize
... // distribution array

for (int i=0; i < NUM_TREES; ++i)
  distribution[i] /= sum;




UniFi unifies numClasses, distribution.length, i and NUM_TREES
into one dimension. However: not caught, since this was in new
code (no earlier build to diff against)!
Dimension Variables
Assign dimension variables (dvars) to
•   Fields
•   Interfaces: Method Parameters, Return values
•   Array elements and lengths
•   Local Variables
•   Constants
•   Result of Multiply/Divide Operations
•   Primitive types only
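The list above can be modeled as a small data type, with one dvar per (kind, location) pair. `DvarKind`, `Dvar` and the location format are hypothetical. `tracked` follows this slide's "primitive types only" and uses the real `Class.isPrimitive()` method, excluding `void`.

```java
// Hypothetical representation of dvars, one kind per location listed above.
enum DvarKind { FIELD, PARAM, RETURN, ARRAY_ELEM, ARRAY_LENGTH, LOCAL, CONSTANT, MULDIV_RESULT }

final class Dvar {
    final DvarKind kind;
    final String location; // e.g. "B.m#0" for B.m's first parameter (illustrative format)

    Dvar(DvarKind kind, String location) {
        this.kind = kind;
        this.location = location;
    }

    // Per "primitive types only": int, long, double, etc., but not void.
    static boolean tracked(Class<?> type) {
        return type.isPrimitive() && type != void.class;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Dvar)) return false;
        Dvar other = (Dvar) o;
        return kind == other.kind && location.equals(other.location);
    }

    @Override
    public int hashCode() {
        return 31 * kind.hashCode() + location.hashCode();
    }

    @Override
    public String toString() {
        return kind + ":" + location;
    }
}
```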
Mechanics
• Bytecode-based static analysis

• Scripts to monitor a CVS/SVN repository and
  generate diffs

• GUI to view inference results, correlated with
  unification points in source code.

Editor's Notes

  1. Dimensionality checking is a simple way of checking physics equations for consistency. Even if Prof. Einstein came up and told you that energy equals mass times the velocity of light, you could tell him he was wrong, because the independent physical dimensions on the two sides don’t match up. In software parlance, you could say it doesn’t “type check”.
  2. Programs operate on values that have dimensions not just in the scientific or physical sense. Regular applications manipulate values with dimensions like employee ID, network port, calendar year, filename, hostname, street address and so on. In our work, we focus on primitive types and strings: I argue that many of the actual values a program computes with are of these types, and a lot of the rest is scaffolding that holds these values together. For example, your database is made up of values of these types.
  3. Most of these variables have their own space of values, and that’s what the colours are intended to represent.
  4. I’ll be using color as a proxy to delineate different dimensions throughout this presentation.
  5. How do we get the benefit of dimension checking in mainstream languages, without special languages or programmer annotations, and for, say, IT applications rather than scientific code?
  6. Instantly adaptable to Java.
  7. It could be that program 1 is wrong, or that program 2 is wrong, or that both are fine. What notion of dimensions, and what inference algorithm, will capture as many real bugs as possible?
  8. An example of a case where this might work: E = m * c^2 yesterday, E = m * c today. Should note that we’ve not explored the second area much.
  9. Notice that we don’t necessarily have human-understandable names. Unification constraints are based on assignment, comparison, add/subtract, array indexing (an implicit comparison with the length) and method invocation.
  10. Our early examples lost precision because the analysis was not context sensitive.
  11. Add example.
  12. GUI to view inference results. We also have scripts to monitor a CVS/SVN repository, run UniFi at regular intervals and diff the results of successive runs.
  13. When we started, we weren’t sure what to expect: would these equivalence classes be merging all the time due to legal program changes? Would interesting errors show up as changes in dimensional relationships at all?
  14. Like mass, length and time in physics, it would be wonderful if a platform came with a set of base units.
  15. UniFi is the first such system for Java and object-oriented features, and it is completely automatic.
  16. Please use it, play around with it, give us feedback, extend it, build upon it, do whatever you want.