
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)

Yida's FSE presentation.


  1. Automatically Generated Patches as Debugging Aids: A Human Study. Yida Tao, Jindae Kim, Sunghun Kim (Dept. of CSE, The Hong Kong University of Science and Technology); Chang Xu (State Key Lab for Novel Software Technology, Nanjing University)
  2. Automatic Program Repair • Promising research progress • ClearView [1]: Prevent all 10 Firefox exploits • GenProg [2]: Fix 55/105 real bugs. [1] Automatically Patching Errors in Deployed Software. Perkins et al. SOSP’09. [2] A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. Le Goues et al. ICSE’12
  3. Automatic Program Repair
  4. Automatic Program Repair. “It won't get your bug patched any quicker. You’ll just have shifted the coders' attention away from their own app's bugs, and onto the repair tool’s bugs.” Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code
  5. #what-could-possibly-go-wrong • Blackbox repair • Increasing maintenance cost • Vulnerable to attack. Sources: Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code; A human study of patch maintainability. ISSTA’12; Automatic patch generation learned from human-written patches. ICSE’13
  6. #what-could-possibly-go-wrong #program-out-of-control • Blackbox repair • Increasing maintenance cost • Vulnerable to attack. Sources: Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code; A human study of patch maintainability. ISSTA’12; Automatic patch generation learned from human-written patches. ICSE’13
  7. Use automatically generated patches as debugging aids
  8. Use automatically generated patches as debugging aids. Our Human Study • Investigate the usefulness of generated patches as debugging aids • Discuss the impact of patch quality on debugging performance • Explore practitioners’ feedback on adopting automatic program repair
  9. Methodology
  10. A debugging aid is given to participants, who debug bugs
  11. Debugging aid • Participants • Bugs
  12. Debugging aid: low-quality generated patch
  13. Debugging aid: low-quality generated patch, high-quality generated patch
  14. Debugging aid: low-quality generated patch, high-quality generated patch, buggy method location
  15. 95 Participants • Grad: 44 (CS graduate students) • MTurk: 23 (Amazon Mechanical Turk workers) • Engr: 28 (industrial software engineers)
  16. Debugging aid • Participants • Bugs
  17. 44 Graduate students • Between-group design • Groups of 14, 15, and 15 students
  18. 44 Graduate students • Between-group design • Groups of 14, 15, and 15 students, one per debugging aid (low-quality generated patch, high-quality generated patch, buggy method location)
  19. 44 Graduate students • Between-group design • Onsite setting • Eclipse IDE • Supervised session • Groups of 14, 15, and 15 students, one per debugging aid (low-quality generated patch, high-quality generated patch, buggy method location)
  20. Remote participants (28 Engr + 23 MTurk) • Within-group design • Debugging aids: low-quality generated patch, high-quality generated patch, buggy method location
  21. Remote participants (28 Engr + 23 MTurk) • Within-group design • Online debugging system • Debugging aids: low-quality generated patch, high-quality generated patch, buggy method location
  22. Debugging aid • Participants • Bugs
  23. Bug Selection Criteria • Real bugs • The bug has accepted patches written by developers • Proper number of bugs • The bug has generated patches with different quality
  24. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13
  25. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13
      Auto-generated patch A:
          for (int i = 0; i < parenCount; i++) {
              SubString sub = (SubString) parens.get(i);
              if (sub != null) {
                  args[i+1] = sub.toString();
              }
          }
      Auto-generated patch B:
          for (int i = 0; i < parenCount; i++) {
              SubString sub = (SubString) parens.get(i);
              args[parenCount+1] = new Integer(reImpl.leftContext.length);
          }
  26. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13
      Auto-generated patch A (avg. ranking from 85 devs and students: 1.6):
          for (int i = 0; i < parenCount; i++) {
              SubString sub = (SubString) parens.get(i);
              if (sub != null) {
                  args[i+1] = sub.toString();
              }
          }
      Auto-generated patch B (avg. ranking from 85 devs and students: 2.8):
          for (int i = 0; i < parenCount; i++) {
              SubString sub = (SubString) parens.get(i);
              args[parenCount+1] = new Integer(reImpl.leftContext.length);
          }
  27. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13
      Auto-generated patch A (high-quality patch; avg. ranking from 85 devs and students: 1.6):
          for (int i = 0; i < parenCount; i++) {
              SubString sub = (SubString) parens.get(i);
              if (sub != null) {
                  args[i+1] = sub.toString();
              }
          }
      Auto-generated patch B (low-quality patch; avg. ranking from 85 devs and students: 2.8):
          for (int i = 0; i < parenCount; i++) {
              SubString sub = (SubString) parens.get(i);
              args[parenCount+1] = new Integer(reImpl.leftContext.length);
          }
  28. Debugging aid • Participants • Bugs
  29. Participants submit 337 patches as their debugging outcome
  30. Participants submit 337 patches as their debugging outcome • # submitted patches w.r.t. debugging aid: Location 109, LowQ 112, HighQ 116
  31. Participants submit 337 patches as their debugging outcome • # submitted patches w.r.t. debugging aid: Location 109, LowQ 112, HighQ 116 • # submitted patches w.r.t. bugs: Bug1 66, Bug2 74, Bug3 59, Bug4 76, Bug5 62
  32. Evaluation of debugging performance
  33. Patch Correctness
  34. Patch Correctness • Passing test cases
  35. Patch Correctness • Passing test cases • Matching the semantics of original accepted patches
  36. Patch Correctness • Passing test cases • Matching the semantics of original accepted patches • 3 evaluators
  37. Debugging Time • Eclipse Plug-in • Website Timer
  38. Correctness • Debugging time • Independent variables: debugging aids, bugs, participant types, programming experience
  39. Multiple Regression Analysis • Correctness • Debugging time • Independent variables: debugging aids, bugs, participant types, programming experience • correctness = α0 + α1 ∙ x1 + α2 ∙ x2 + α3 ∙ x3 + α4 ∙ x4 • debugging time = β0 + β1 ∙ x1 + β2 ∙ x2 + β3 ∙ x3 + β4 ∙ x4
  40. Post-study Survey • Helpfulness of debugging aids • Difficulty of bugs • Opinions on using generated patches as debugging aids
  41. Results
  42. Finding 1: High-quality patches significantly improve debugging correctness • 48%, 33%, 71%
  43. Finding 1: High-quality patches significantly improve debugging correctness • % of correct patches: Location 48%, LowQ 33%, HighQ 71%
  44. Finding 1: High-quality patches significantly improve debugging correctness • % of correct patches: Location 48%, HighQ 71% • Positive coefficient = 1.25, p-value = 0.00 < 0.05
  45. Finding 2: Low-quality patches slightly undermine debugging correctness • % of correct patches: Location 48%, LowQ 33%, HighQ 71%
  46. Finding 2: Low-quality patches slightly undermine debugging correctness • % of correct patches: Location 48%, LowQ 33%, HighQ 71% • Negative coefficient = -0.55, p-value = 0.09
  47. Finding 2: Low-quality patches can undermine debugging correctness • % of correct patches: Location 48%, LowQ 33%, HighQ 71% • Negative coefficient = -0.55, p-value = 0.09
  48. Finding 3: High-quality patches are more useful for difficult bugs
  49. Finding 3: High-quality patches are more useful for difficult bugs • [Chart: bug difficulty for Bug1 (Math-280), Bug2 (Rhino-114493), Bug3 (Rhino-192226), Bug4 (Rhino-217379), Bug5 (Rhino-76683)]
  50. Finding 3: High-quality patches are more useful for difficult bugs • [Chart: % of correct patches for Bug1 to Bug5 with Location, LowQ, and HighQ aids, alongside bug difficulty for Bug1 (Math-280), Bug2 (Rhino-114493), Bug3 (Rhino-192226), Bug4 (Rhino-217379), Bug5 (Rhino-76683)]
  51. Finding 4: The type of debugging aid does not affect debugging time
  52. Finding 4: The type of debugging aid does not affect debugging time • [Chart: debugging time (min) for Location, LowQ, HighQ]
  53. Finding 5: Other factors’ impact on debugging performance • Difficult bugs significantly slow down debugging • Engr and MTurk are more likely to debug correctly • Novices tend to benefit more from HighQ patches
  54. Finding 6: Helpfulness of debugging aids • Participants consider high-quality generated patches much more helpful than low-quality patches • Mann-Whitney U test p-value = 0.001 • [Chart: ratings (Not Helpful, Slightly Helpful, Medium, Helpful, Very helpful) for low-quality and high-quality generated patches]
  55. Feedback
  57. Quick starting point • Point to the buggy area • Brainstorm. “They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”
  58. Quick starting point • Point to the buggy area • Brainstorm. Confusing, incomplete, misleading • Wrong lead, especially for novices • Require further human perfection. “They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”
  59. “Generated patches would be good at recognizing obvious problems” “…but may not recognize more involved defects.”
  60. “Generated patches would be good at recognizing obvious problems” “…but may not recognize more involved defects.” “Generated patches simplify the problem” “…but they may over-simplify it by not addressing the root cause.”
  61. “I would use generated patches as debugging aids, as they provide extra diagnostic information”
  62. “I would use generated patches as debugging aids, as they provide extra diagnostic information” “…along with access to standard debugging tools.”
  63. Threats to Validity
  64. Threats to Validity • Bugs and generated patches may not be representative • Quality measure of generated patches may not generalize • May not generalize to domain experts • Possibility of blindly reusing generated patches • Patches submitted in less than 1 minute were removed
  65. Takeaway • Auto-generated patches can be useful as debugging aids: participants fix bugs more correctly with auto-generated patches • Quality control is required: participants’ debugging correctness is compromised with low-quality generated patches • Maximize the benefits: difficult bugs, novice developers
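
The difference between the two auto-generated patches on slides 25 to 27 can be illustrated with a small, self-contained Java sketch. It is not the Rhino code: `PatchSketch`, `unpatched`, and `patchA` are hypothetical names, `String` stands in for Rhino's `SubString`, and the unpatched loop is an assumption inferred from what patch A adds (a null check around `args[i+1] = sub.toString()`).

```java
import java.util.Arrays;
import java.util.List;

class PatchSketch {
    // Assumed shape of the buggy loop: every capture group is dereferenced,
    // so a null entry in parens triggers a NullPointerException.
    static Object[] unpatched(List<String> parens) {
        int parenCount = parens.size();
        Object[] args = new Object[parenCount + 2];
        for (int i = 0; i < parenCount; i++) {
            String sub = parens.get(i);
            args[i + 1] = sub.toString();
        }
        return args;
    }

    // Patch A from the slides: the assignment is guarded by a null check,
    // so null capture groups are skipped instead of crashing.
    static Object[] patchA(List<String> parens) {
        int parenCount = parens.size();
        Object[] args = new Object[parenCount + 2];
        for (int i = 0; i < parenCount; i++) {
            String sub = parens.get(i);
            if (sub != null) {
                args[i + 1] = sub.toString();
            }
        }
        return args;
    }

    public static void main(String[] argv) {
        // One capture group did not match and is therefore null.
        List<String> parens = Arrays.asList("a", null, "c");
        System.out.println(Arrays.toString(patchA(parens)));
        try {
            unpatched(parens);
        } catch (NullPointerException e) {
            System.out.println("unpatched loop threw NullPointerException");
        }
    }
}
```

Patch B is left out of the sketch because it depends on `reImpl.leftContext`, which the slides do not define. As shown on the slides, it replaces the per-group assignment with a write to `args[parenCount+1]` and adds no null check.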
