Software debugging is tedious and time-consuming. To reduce the manual effort needed for debugging, researchers have proposed a considerable number of techniques to automate the process of fault localization; in particular, techniques based on information retrieval (IR) have drawn increased attention in recent years. Although reportedly effective, these techniques have some potential limitations that may affect their performance. First, their effectiveness is likely to depend heavily on the quality of the bug reports; unfortunately, high-quality bug reports that contain rich information are not always available. Second, these techniques have not been evaluated through studies that involve actual developers, which is less than ideal, as purely analytical evaluations can hardly show the actual usefulness of debugging techniques. The goal of this work is to evaluate the usefulness of IR-based techniques in real-world scenarios. Our investigation shows that bug reports do not always contain rich information, and that low-quality bug reports can considerably affect the effectiveness of these techniques. Our research also shows, through a user study, that high-quality bug reports benefit developers just as much as they benefit IR-based techniques. In fact, the information provided by IR-based techniques when operating on high-quality reports is only helpful to developers in a limited number of cases. And even in these cases, such information only helps developers get to the faulty file quickly, but does not help them in their most time-consuming task: understanding and fixing the bug within that file.
Evaluating the Usefulness of IR-Based Fault Localization Techniques
1. Evaluating the Usefulness of IR-Based Fault Localization Techniques
Qianqian Wang* Chris Parnin** Alessandro Orso*
* Georgia Institute of Technology, USA
** North Carolina State University, USA
6. IR-Based FL Techniques
• How do they work?
  • Rank source files based on their lexical similarity to bug reports
• How well do they work?

              Top 1   Top 5   Top 10
Percentage    35%     58%     69%
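As a concrete illustration of "rank source files by lexical similarity to the bug report", here is a minimal sketch of an IR-based localizer using TF-IDF weights and cosine similarity. The function names and the toy corpus are hypothetical; real tools also split camelCase identifiers, remove stop words, and stem terms.

```python
# Minimal sketch of IR-based fault localization: score each source file
# by the TF-IDF cosine similarity between its text and the bug report.
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase alphabetic tokens; real tools also split camelCase.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def tfidf_vector(tokens, doc_freq, n_docs):
    # Smoothed idf keeps weights positive even for tiny corpora.
    tf = Counter(tokens)
    return {t: tf[t] * (math.log((1 + n_docs) / (1 + doc_freq.get(t, 0))) + 1)
            for t in tf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_files(bug_report, files):
    # files: mapping from file name to its source text.
    tokenized = {name: tokenize(src) for name, src in files.items()}
    n = len(files)
    doc_freq = Counter(t for toks in tokenized.values() for t in set(toks))
    vectors = {name: tfidf_vector(toks, doc_freq, n)
               for name, toks in tokenized.items()}
    query = tfidf_vector(tokenize(bug_report), doc_freq, n)
    return sorted(files, key=lambda name: cosine(query, vectors[name]),
                  reverse=True)

# Toy corpus (made up for illustration).
files = {
    "CTabFolder.java": "void onMouseHover showToolTip tooltip CTabFolder dispose",
    "Button.java": "void onClick notifyListeners button selection",
}
report = "Native tooltips left around on CTabFolder when it is disposed"
print(rank_files(report, files))  # CTabFolder.java ranks first
```

Note that the shared identifier "CTabFolder" is what pulls the right file to the top; this is exactly why, as later slides show, reports without program entity names hurt these techniques.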
7. Source code file: CTabFolder.java
public class CTabFolder extends Composite {
    // tooltip
    int[] toolTipEvents = new int[] {SWT.MouseExit,
            SWT.MouseHover, SWT.MouseMove,
            SWT.MouseDown, SWT.DragDetect};
    Listener toolTipListener;
    …
    /* Returns <code>true</code> if the CTabFolder only displays
     * the selected tab and <code>false</code> if the CTabFolder
     * displays multiple tabs.
     */
    …
    void onMouseHover(Event event) {
        showToolTip(event.x, event.y);
    }
    void onDispose() {
        inDispose = true;
        hideToolTip();
        …
    }
}
Understanding IR-based FL Techniques
Bug ID: 90018
Summary: Native tooltips left around on CTabFolder.
Description: Hover over the PartStack CTabFolder inside Eclipse until some native tooltip is displayed; for example, the maximize button. When the tooltip appears, change perspectives using the keybinding. The CTabFolder gets hidden, but its tooltip is permanently displayed and never goes away, even if that CTabFolder is disposed (I'm assuming) when the perspective is closed.
--------------------------------------------------------------------------
9. Assessing IR-based FL Techniques
• Does the presence of technical information affect the fault localization results?
• How often do bug reports contain such information?
• Is such information enough for developers to find the faulty files easily?
14. Q1: Results
• Does bug report information affect fault localization results?

            Program entity   Stack trace   Test case
Results     √                X             X

√ Statistically significant difference: p < 0.05
X No statistically significant difference: p >= 0.05
Bug report characteristics affect IR-based
fault localization results
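The √/X results above come from significance tests at the p < 0.05 level. The exact statistical test is not named on these slides, so as an illustration only, here is a simple two-sided permutation test on the difference of group means, with made-up data:

```python
# Illustrative sketch: a two-sided permutation test on the difference of
# group means, one simple way to obtain a p-value for "group A vs. B".
import random

def permutation_p_value(a, b, trials=10_000, seed=0):
    """Fraction of random relabelings whose mean difference is at least
    as extreme as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            extreme += 1
    return extreme / trials

# Hypothetical localization ranks for two groups of bug reports
# (not data from the study).
with_entities    = [1, 1, 2, 3, 5, 1, 2, 4]
without_entities = [9, 14, 22, 8, 30, 11, 16, 25]
print(permutation_p_value(with_entities, without_entities) < 0.05)
```

A permutation test makes no distributional assumptions, which suits small, skewed samples such as rank positions; rank-based tests like Mann-Whitney U are another common choice for this kind of comparison.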
17. Q2: Method
• How often do bug reports contain technical information?
• Select 10,000 bug reports from the SWT Bugzilla
• Check presence of technical information:
  • Stack traces
  • Test cases
  • Program entity names
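A presence check like the one above can be automated with simple pattern matching. The regular expressions below are illustrative heuristics I am assuming for this sketch, not the exact patterns used in the study:

```python
# Heuristic scan of a bug report for the three kinds of technical
# information counted in Q2. Patterns are illustrative, not the study's.
import re

# Java stack-trace frame, e.g. "at pkg.Cls.m(Cls.java:123)"
STACK_TRACE = re.compile(r"\bat\s+[\w$.]+\([\w$.]+\.java:\d+\)")
# JUnit-style test code pasted into the report
TEST_CASE = re.compile(r"public\s+void\s+test\w*|@Test\b")
# CamelCase identifiers or *.java file names
ENTITY = re.compile(r"\b[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)+\b|\b\w+\.java\b")

def classify(report_text):
    return {
        "stack_trace": bool(STACK_TRACE.search(report_text)),
        "test_case": bool(TEST_CASE.search(report_text)),
        "program_entity": bool(ENTITY.search(report_text)),
    }

report = ("Native tooltips left around on CTabFolder. "
          "Hover over the PartStack CTabFolder until a tooltip appears.")
print(classify(report))
# {'stack_trace': False, 'test_case': False, 'program_entity': True}
```

Identifiers like "CTabFolder" match the CamelCase heuristic, so this report counts toward the 32% with program entity names but not toward the 10% with stack traces.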
18. Q2: Results
• How often do bug reports contain technical information?

              Stack traces   Test cases   Program entity
Percentage    10%            3%           32%
The majority of bug reports do not contain enough information
21. Additional finding: “Optimistic” Evaluation Approach
• Assumption: changed files = faulty files
• Reality:
  • 40% of bugs involve multiple changed files
  • Not all changed files contain bugs
  • Best-ranked files may not be faulty
Results of existing studies might be worse than reported
23. User Study
• Q3: Does bug report information affect developers’ performance?
• Q4: Do IR-based techniques help developers’ performance?
24. Experiment Protocol: Setup
Participants: 70 developers (graduate students)
Software subject:
• Eclipse SWT
• 2 bugs for each developer
Task: find and fix the bug
Tools:
• Eclipse plug-in
  • Integrating ranked lists
  • Logging
28. Q3: Results
Compared the performance of 2 groups:
1. without tool, good bug reports
2. without tool, bad bug reports

            Time to find      Time to locate   Debugging
            the faulty file   the bug          score
Results     √                 X                √

√ Statistically significant difference: p < 0.05
X No statistically significant difference: p >= 0.05
Good bug reports (i.e., with entity names) allow developers to shorten the time to find the faulty file and help them find better fixes
31. Q4: Results
Compared the performance of 2 groups under 4 conditions:
1. without tool, {good|bad} bug reports, {good|bad} ranked list
2. with tool, {good|bad} bug reports, {good|bad} ranked list

Condition                             Debugging   Time to find   Time to locate
                                      score       the file       the bug
Good bug report, good ranked list     X           X              X
Bad bug report, good ranked list      X           √              X
Good bug report, bad ranked list      X           X              X
Bad bug report, bad ranked list       X           X              X

√ Statistically significant difference
X No statistically significant difference
Only perfect ranked lists help when users cannot get enough hints from bug reports
The tool only helps developers find the faulty file, but they spend much more time locating the bug within that file than finding the file itself
36. Additional Observations
• Developers used program entity names in the bug report as search keywords
• Ranked lists generated by IR-based techniques affected users’ debugging behavior:
  • Gave them a starting point
  • Gave them confidence
37. Summary
• Studied the practical usefulness of IR-based FL techniques
• Performed both an analytical study and a user study
• Main findings
  • Bug report characteristics affect IR-based fault localization results
  • Results of existing studies might be worse than reported
  • The majority of bug reports do not contain enough information
  • “Good” bug reports allow developers to shorten the time to find the faulty file and help them find better fixes
  • Only perfect ranked lists help when users cannot get enough hints from bug reports
  • The tool only helps developers find the faulty file, but they spend much more time locating the bug within that file than finding the file itself
38. Implications
• Better bug reports are needed
• Automated debugging techniques should focus on improving results for bug reports with little information
• Automated debugging techniques should provide finer-grained information and context
• More user studies and realistic evaluations are needed