The document describes an approach called CAR-Miner for mining sequence association rules from software repositories to detect defects in exception handling. CAR-Miner constructs exception flow graphs from code examples, generates static traces, and mines sequence association rules of the form (context) Λ (trigger) => (recovery). It detects violations by checking if context sequences match rule contexts before missing recoveries. An evaluation on open source projects found that over 50% of mined rules represented real patterns, CAR-Miner detected more defects than previous approaches, and sequence rules helped uncover new defects.
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Making Exceptions on Exception Handling (WEH 2012 Keynote Speech)
1. Making Exceptions on
Exception Handling
Tao Xie
Department of Computer Science
North Carolina State University
Raleigh, USA
Joint work with Suresh Thummalapenta (IBM Research India)
WEH 2012
2. Software Reuse
Programmers commonly reuse APIs of existing
frameworks or libraries
– Advantages: High productivity of development
– Challenges: Complexity and lack of documentation
– Consequences:
• Spend efforts in understanding APIs
• Introduce defects in API client code
2
2
3. Exception Handling
APIs throw exceptions during runtime errors
Example: Session API of Hibernate framework throws HibernateException
APIs expect client applications to implement recovery actions
after exceptions occur
Example: Session API of Hibernate expect client application to rollback open
uncommitted transactions after HibernateException occurs
3
[Weimer&Necula 05]
4. Exception Handling
Failing to handle exceptions results in
Fatal issues: Database lock won’t be released if the
transaction is not rolled back
Performance degradation due to resource leaks:
17% increase in the performance is found in a
34KLOC program after properly handling
exceptions [Weimer and Necula04]
Use specification that describes exception-
handling behavior
Write correct API client code
Detect defects in API client code 4
5. Problem
Problem: Often specifications are not documented
Solution: Mine specifications from existing code
bases using APIs
Mining Software Repositories (MSR) Bugzilla Mailings
International effort to
make software repositories actionable Code Execution
CVS 5
http://msrconf.org repository traces
6. Solution-Driven Problem-Driven
Where can I apply X miner? What patterns do
we really need?
Basic Advanced
mining mining New/adapted
algorithms algorithms mining
algorithms
E.g., association rule mining, E.g., frequent partial order
frequent itemset mining… mining [ESEC/FSE 07]
E.g., [ICSE 09], [ASE 09]
6
7. Solution-Driven Problem-Driven
Where can I apply X miner? What patterns do
we really need?
Basic Advanced
mining mining New/adapted
algorithms algorithms mining
algorithms
E.g., association rule E.g., frequent partial order
mining, frequent itemset mining [ESEC/FSE 07]
E.g., [ICSE 09], [ASE 09]
mining…
7
8. Example
Scenario 1
1.1: ... Defect: No rollback done
1.2: OracleDataSource ods = null; Session session = null;
Connection conn = null; Statement statement = null; when SQLException occurs
1.3: logger.debug("Starting update");
1.4: try { Requires specification such
1.5: ods = new OracleDataSource();
1.6: ods.setURL("jdbc:oracle:thin:scott/tiger@192.168.1.2:1521:catfish"); as “Connection should be
1.7: conn = ods.getConnection();
1.8: statement = conn.createStatement();
rolled back when a
1.9: statement.executeUpdate("DELETE FROM table1");
1.10: connection.commit(); }
connection is created and
1.11: catch (SQLException se) { SQLException occurs”
1.13: logger.error("Exception occurred"); }
1.14: finally {
1.15: if(statement != null) { statement.close(); }
Q: Should every connection
1.16: if(conn != null) { conn.close(); } instance has to be rolled
1.17: if(ods != null) { ods.close(); } }
1.18: } back when SQLException
occurs?
Missing “conn.rollback()”
8
10. Example (cont.)
Simple association rules of the form “FCa => FCe” are
not expressive [Weimer&Necula 05]
Requires more general association rules (sequence
association rules) such as
(FCc1 FCc2) Λ FCa => FCe1, where
FCc1 -> Connection conn = OracleDataSource.getConnection()
FCc2 -> Statement stmt = Connection.createStatement()
FCa -> stmt.executeUpdate()
FCe1 -> conn.rollback()
10
11. Example (cont.)
Simple association rules of the form “FCa => FCe” are
not expressive [Weimer&Necula 05]
Requires more general association rules (sequence
association rules) such as
(FCc1 FCc2) Λ FCa => FCe1, where
FCc1 -> Connection conn = OracleDataSource.getConnection()
FCc2 -> Statement stmt = Connection.createStatement()
FCa -> stmt.executeUpdate() //Triggering Action
FCe1 -> conn.rollback()
11
12. Example (cont.)
Simple association rules of the form “FCa => FCe” are
not expressive [Weimer&Necula 05]
Requires more general association rules (sequence
association rules) such as
(FCc1 FCc2) Λ FCa => FCe1, where
FCc1 -> Connection conn = OracleDataSource.getConnection()
FCc2 -> Statement stmt = Connection.createStatement()
FCa -> stmt.executeUpdate()
FCe1 -> conn.rollback() //Recovery Action
12
13. Example (cont.)
Simple association rules of the form “FCa => FCe” are
not expressive [Weimer&Necula 05]
Requires more general association rules (sequence
association rules) such as
(FCc1 FCc2) Λ FCa => FCe1, where
FCc1 -> Connection conn = OracleDataSource.getConnection()
FCc2 -> Statement stmt = conn.createStatement() //Context
FCa -> stmt.executeUpdate()
FCe1 -> conn.rollback()
13
14. CAR-Miner Approach
Issue queries and collect relevant code
examples. Eg: “lang:java Construct exception-
java.sql.Statement executeUpdate” flow graphs
Open Source Projects on web
1 2 … N
Exception-Flow
… Graphs
Static Traces
Classes and
Functions
Extract classes Collect static traces
and functions
Mine static traces
reused
Sequence
Input
Violations Association
Application
Rules
Check whether there are
any exception-related Detect violations
defects 14
15. CAR-Miner Approach
Open Source Projects on web
1 2 … N
Exception-Flow
… Graphs
Static Traces
Classes and
Functions
Sequence
Input
Violations Association
Application
Rules
15
16. Exception-Flow-Graph Construction
Based on a previous algorithm [Sinha&Harrold TSE 00] : normal execution path
----: exceptional execution path
18. CAR-Miner Approach
Open Source Projects on web
1 2 … N
Exception-Flow
… Graphs
Static Traces
Classes and
Methods
Sequence
Input
Violations Association
Application
Rules
18
19. Static Trace Generation
Collect static traces with the actions
taken when exceptions occur
A static trace for Node 7:
“4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17”
19
21. Trace Post-Processing
Identify and remove unrelated function
calls using data dependency
“4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17”
4: FileWriter fw = new FileWriter(“output.txt”)
5: BufferedWriter bw = new BufferedWriter(fw)
...
7: Statement stmt = conn.createStatement()
...
Filtered sequence “6 -> 7 -> 15 -> 16“
21
22. CAR-Miner Approach
Open Source Projects on web
1 2 … N
Exception-Flow
… Graphs
Static Traces
Classes and
Methods
Sequence
Input
Violations Association
Application
Rules
22
23. Static Trace Mining
Handle traces of each function call (triggering
function call) individually
Input: Two sequence databases with a one-to-one
mapping
• normal function-call sequences (context)
• exception function-call sequences (recovery)
Objective: Generate sequence association rules of the
form
(FCc1 ... FCcn) Λ FCa => FCe1 ... FCen
Context Trigger Recovery
23
24. Mining Problem Definition
Input: Two sequence databases with a one-to-one mapping
Context Recovery
Objective: To get association rules of the form
FC1 FC2 ... FCm -> FE1 FE2 ... FEn
where {FC1, FC2, ..., Fcm} Є SDB1 and {FE1, FE2, ..., Fen} Є SDB2
Existing association rule mining algorithms cannot be directly
applied on multiple sequence databases
24
25. Mining Problem Solution
Annotate the sequences to generate a single combined database
Apply frequent subsequence mining algorithm [Wang and Han, ICDE 04] to
get frequent sequences
Transform mined sequences into sequence association rules
(3 10) Λ FCa => (2 8)
Context Trigger Recovery
Rank rules based on the support assigned by frequent
subsequence mining algorithm
25
26. CAR-Miner Approach
Open Source Projects on web
1 2 … N
Exception-Flow
… Graphs
Static Traces
Classes and
Methods
Sequence
Input
Violations Association
Application
Rules
26
27. Violation Detection
Analyze each call site of triggering call FCa
Step 1: Extract context call sequence “CC1
CC2 ... CCm” from the beginning of the
function to the call site of FCa
Step 2: If CC1 CC2 ... CCm is super-sequence
of FCc1 ... FCcn
Report any missing function calls of {FCe1 ... FCen} in
any exception path
API client: (CC1 CC2 ... CCm) Λ FCa => Missing any?
isSuperSeqOf
API Rule: (FCc1 ... FCcn) Λ FCa => FCe1 ... FCen
Context Trigger Recovery 27
28. Evaluation
Research Questions:
1. Do the mined rules represent real rules?
2. Do the detected violations represent real
defects?
3. Does CAR-Miner perform better than WN-
miner [Weimer&Necula 05]?
4. Do the sequence association rules help
detect new defects?
28
29. Subjects
Internal Info: classes and methods belonging to the app
External Info: classes and methods used by the app
Code examples: #files collected through code search engine
29
30. RQ1: Real Rules
Do the mined rules represent real rules?
Real rules: 55% (Total: 294)
Usage patterns: 3% 30
False positives: 43%
31. RQ1: Distribution of Real Rules for Axion
Distribution of rules based on ranks assigned by CAR-Miner
#false positives is quite low between 1 to 60 rules
31
32. RQ2: Detected Violations
Do the detected violations represent real defects?
Total number of defects: 160
New defects not found by WN-Miner approach: 87
32
33. RQ2: Status of Detected Violations
HsqlDB developers responded on the first 10 reported
defects
Accepted 7 defects
Rejected 3 defects
Reason given by HsqlDB developers for rejected defects:
“Although it can throw exceptions in general, it should not throw with
HsqlDB, So it is fine” 33
34. RQ3: Comparison with WN-miner
Does CAR-Miner performs better than WN-miner?
Found 224 new rules and missed 32 rules
CAR-Miner detected most of the rules mined by WN-miner
Two major factors:
sequence association rules
34
Increase in the data scope
35. RQ4: New defects by sequence association rules
Do the sequence association rules detect new defects?
Detected 21 new real defects among all applications
35
37. Large Number of False Positives
Existing approaches produce a large number of false
positives
One major observation:
Programmers often write code in different ways for
achieving the same task
Some ways are more frequent than others
Frequent Infrequent
ways ways
mine patterns detect violations
Mined Patterns 37
37
38. Example: java.util.Iterator.next()
Java.util.Iterator.next() throws NoSuchElementException when invoked on a list
without any elements Code Example 2
Code Sample 1 Code Sample 2
PrintEntries1(ArrayList<string> PrintEntries2(ArrayList<string>
entries) entries)
{ {
… …
Iterator it = entries.iterator(); if(entries.size() > 0) {
if(it.hasNext()) { Iterator it = entries.iterator();
string last = (string) it.next(); string last = (string) it.next();
} }
… …
} }
38
39. Example: java.util.Iterator.next()
Code Sample 1 Code Sample 2
PrintEntries1(ArrayList<string> PrintEntries2(ArrayList<string>
entries) entries)
{ {
… …
Iterator it = entries.iterator(); if(entries.size() > 0) {
if(it.hasNext()) { Iterator it = entries.iterator();
string last = (string) it.next(); string last = (string) it.next();
} }
… …
} }
Sample 2 (6/1243)
Sample 1 (1218 / 1243)
1243 code examples
Mined Pattern from existing approaches:
“boolean check on return of Iterator.hasNext before Iterator.next” 39
40. Example: java.util.Iterator.next()
Code Sample 1 Code Sample 2
PrintEntries1(ArrayList<string> PrintEntries2(ArrayList<string>
entries) entries)
{ {
… …
Iterator it = entries.iterator(); if(entries.size() > 0) {
if(it.hasNext()) { Iterator it = entries.iterator();
string last = (string) it.next(); string last = (string) it.next();
} }
… …
} }
Require more general patterns (alternative patterns): P1 or P2
P1 : boolean check on return of Iterator.hasNext before Iterator.next
P2 : boolean check on return of ArrayList.size before Iterator.next
Cannot be mined by existing approaches, since alternative P2
40
is infrequent
41. Our Solution: ImMiner Algorithm
Mines alternative patterns of the form P1 or P2
Based on the observation that infrequent alternatives such as P2 are
frequent among code examples that do not support P1
1243 code examples Sample 2 (6/1243)
Sample 1 (1218 / 1243)
P2 is infrequent among entire P2 is frequent among code
1243 code examples examples not supporting P1
41
42. Alternative Patterns
ImMiner mines three kinds of alternative
patterns of the general form “P1 or P2”
Balanced: all alternatives (both P1 and P2) are frequent
Imbalanced: some alternatives (P1) are frequent and
others are infrequent (P2). Represented as “P1 or P^2”
Single: only one alternative
42
43. ImMiner Algorithm
Uses frequent-itemset mining [Burdick et al. ICDE 01]
iteratively
An input database with the following APIs
for Iterator.next()
Input database Mapping of IDs to APIs
43
44. ImMiner Algorithm: Frequent Alternatives
Input database
Frequent itemset
mining
(min_sup 0.5)
Frequent item: 1
P1: boolean-check on the return of
Iterator.hasNext() before Iterator.next()
44
45. ImMiner: Infrequent Alternatives of P1
Split input database into two databases: Positive and Negative
Positive database (PSD)
Negative database (NSD)
Mine patterns that are frequent in NSD and are infrequent in PSD
Reason: Only such patterns serve as alternatives for P1
Alternative Pattern : P “const check on the return of ArrayList.size()
2
before Iterator.next()”
Alattin applies ImMiner algorithm to detect neglected conditions
45
46. Neglected Conditions
Neglected conditions refer to
Missing conditions that check the arguments or
receiver of the API call before the API call
Missing conditions that check the return or
receiver of the API call after the API call
One of the primary reasons for many fatal
issues
security or buffer-overflow vulnerabilities [Chang et
al. ISSTA 07]
46
47. Evaluation
Research Questions:
1. Do alternative patterns exist in real
applications?
2. How high percentage of false positives are
reduced (with low or no increase of false
negatives) in detected violations?
47
48. Subjects
#Samples: #code examples collected from Google code search
Two categories of subjects:
3 Java default API libraries
3 popular open source libraries
48
49. RQ1: Balanced and Imbalanced Patterns
How high percentage of balanced and imbalanced patterns exist in real apps?
Balanced patterns: 0% to 30% (average: 9.69%)
Imbalanced patterns:
30% to 100% (average: 65%) for Java default API libraries
0% to 9.5% (average: 5%) for open source libraries
Explanation: Java default API libraries provide more different ways of 49
writing code compared to open source libraries
50. RQ2: False Positives and False Negatives
How high % of false positives are reduced (with low or no increase of
false negatives)?
Applied mined patterns (“P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j ”)
in three modes:
Existing mode:
“P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j ”
P1 ,P2, ... , Pi
Balanced mode:
“P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j ”
“P1 or P2 or ... or Pi”
Imbalanced mode:
“P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j ”
“P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j ”50
53. Conclusion
Problem-driven methodology by identifying
new problems, patterns
mining algorithms, defects
CAR-Miner [ICSE 09]: mining sequence association
rules of the form
(FCc1 ... FCcn) Λ FCa => (FCe1 ... Fcen)
Context Trigger Recovery
reduce false negatives
Alattin [ASE 09]: mining alternative patterns classified
into three categories: balanced, imbalanced, and single
P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j
reduce false positives
Combining both 53
NLP on API documents [Pandita et al. ICSE 12]
54. Solution-Driven Problem-Driven
Where can I apply X miner? What patterns do
we really need?
Basic Advanced
mining mining New/adapted
algorithms algorithms mining
algorithms
E.g., association rule mining, E.g., frequent partial order
frequent itemset mining… mining [ESEC/FSE 07]
E.g., [ICSE 09], [ASE 09]
54
55. Road Ahead: Software Analytics
Software analytics is to enable
software practitioners to perform
data exploration and analysis in order
to obtain insightful and actionable
information for data-driven tasks
around software and services.
•Get research problems from real practice
•Get feedback from real practice
•Collaborate across disciplines
•Collaborate with industry
Dongmei Zhang and Tao Xie, Software Analytics in Practice, mini-tutorial ICSE 2012 SEIP. 55
http://research.microsoft.com/en-us/groups/sa/softwareanalyticsinpractice_minitutorial_icse2012.pdf
56. Thank You
Questions?
https://sites.google.com/site/asergrp/
This work is supported in part by NSF grants CCF-0845272, CCF-0915400,
56
CNS-0958235, and ARO grant W911NF-08-1-0443.
57. Alattin Approach
Phase 1: Issue queries and collect
relevant code samples. Eg: “lang:java Phase 2: Generate
java.util.Iterator next” pattern candidates
Open Source Projects on web
1 2 … N
Pattern
… Candidates
Classes and
methods
Extract classes
and methods Phase 3: Mine
reused alternative patterns
Application Alternative
Violations
Under Analysis Patterns
Detect neglected Phase 4: Detect neglected
conditions conditions statically
57
58. Keys to Making Real Impact
Engagement of practitioners
¤ Solving their problem ¤ Champions in product teams
¤ Timing ¤ Culture
Walking the last mile
¤ Targeting at real scenarios ¤ Trying out tool has cost
¤ “It works” is not enough ¤ Getting engineering support
Combination of expertise
¤ Research capabilities ¤ Engineering skills to build systems
¤ Visualization & design ¤ Communication
MSR 2012 58