Automatically Generated Patches as Debugging Aids: A Human Study
Yida Tao, Jindae Kim, Sunghun Kim 
Dept. of CSE, The Hong Kong University of Science and Technology 
Chang Xu 
State Key Lab for Novel Software Technology, Nanjing University
Automatic Program Repair
• Promising research progress
  • ClearView [1]: Prevent all 10 Firefox exploits
  • GenProg [2]: Fix 55/105 real bugs
[1] Automatically Patching Errors in Deployed Software. Perkins et al. SOSP’09
[2] A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. Le Goues et al. ICSE’12
“It won't get your bug patched any quicker. You’ll just have shifted the coders' attention away from their own app's bugs, and onto the repair tool’s bugs.”
- Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code
#what-could-possibly-go-wrong
#program-out-of-control
• Blackbox repair
• Increasing maintenance cost
• Vulnerable to attack
- Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code
- A human study of patch maintainability. ISSTA’12
- Automatic patch generation learned from human-written patches. ICSE’13
Our Human Study
Use automatically generated patches as debugging aids
• Investigate the usefulness of generated patches as debugging aids
• Discuss the impact of patch quality on debugging performance
• Explore practitioners’ feedback on adopting automatic program repair
Methodology
A debugging aid is given to participants, who debug the bugs.
Debugging aid
• Low-quality generated patch
• High-quality generated patch
• Buggy method location
Participants
95 Participants
• Grad: 44 CS graduate students
• MTurk: 23 Amazon Mechanical Turk workers
• Engr: 28 industrial software engineers
44 Graduate students
• Between-group design
  • Three groups of 14, 15, and 15 students, one group per debugging aid (low-quality generated patch, high-quality generated patch, buggy method location)
• Onsite setting
• Eclipse IDE
• Supervised session
Remote participants (28 Engr + 23 MTurk)
• Within-group design
  • Debugging aids: low-quality generated patch, high-quality generated patch, buggy method location
• Online debugging system
Bug Selection Criteria
• Real bugs
• The bug has accepted patches written by developers
• Proper number of bugs
• The bug has generated patches of differing quality
Generated patches are taken from: Automatic patch generation learned from human-written patches. Kim et al. ICSE’13

Auto-generated patch A (High-Quality Patch, avg. ranking 1.6)
    for (int i=0; i<parenCount; i++) {
        SubString sub = (SubString)parens.get(i);
        if (sub != null) {
            args[i+1] = sub.toString();
        }
    }

Auto-generated patch B (Low-Quality Patch, avg. ranking 2.8)
    for (int i=0; i<parenCount; i++) {
        SubString sub = (SubString)parens.get(i);
        args[parenCount+1] =
            new Integer(reImpl.leftContext.length);
    }

Avg. ranking from 85 devs and students
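To make the role of the null guard in patch A concrete, here is a minimal, self-contained Java sketch. It assumes the original failure is a NullPointerException raised when an entry of parens is null; the slides do not state the failure mode. Rhino's SubString and parens are replaced by a plain List<String>, and the class and method names (PatchSketch, unpatched, patchA) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class PatchSketch {
    // Assumed shape of the unpatched loop: calls toString() on every entry.
    static Object[] unpatched(List<String> parens) {
        Object[] args = new Object[parens.size() + 1];
        for (int i = 0; i < parens.size(); i++) {
            String sub = parens.get(i);
            args[i + 1] = sub.toString(); // throws NullPointerException if sub is null
        }
        return args;
    }

    // Same loop with the null guard shown in auto-generated patch A.
    static Object[] patchA(List<String> parens) {
        Object[] args = new Object[parens.size() + 1];
        for (int i = 0; i < parens.size(); i++) {
            String sub = parens.get(i);
            if (sub != null) {
                args[i + 1] = sub.toString();
            }
        }
        return args;
    }

    public static void main(String[] argv) {
        // Arrays.asList permits null elements, unlike List.of.
        List<String> parens = Arrays.asList("a", null, "c");
        System.out.println(Arrays.toString(patchA(parens)));
        try {
            unpatched(parens);
        } catch (NullPointerException e) {
            System.out.println("unpatched loop failed on the null entry");
        }
    }
}
```

Patch B, by contrast, never writes to args[i+1] at all, which is consistent with its lower ranking by the evaluators.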
Participants submit 337 patches as their debugging outcome
# submitted patches w.r.t. debugging aid
  Location: 109   LowQ: 112   HighQ: 116
# submitted patches w.r.t. bugs
  Bug1: 66   Bug2: 74   Bug3: 59   Bug4: 76   Bug5: 62
Evaluation of debugging performance
Patch Correctness
• Passing test cases
• Matching the semantics of original accepted patches
• 3 evaluators
Debugging Time
• Eclipse Plug-in
• Website Timer
Multiple Regression Analysis
• Dependent variables
  • Correctness
  • Debugging time
• Independent variables
  • Debugging aids
  • Bugs
  • Participant types
  • Programming experience
correctness = α0 + α1 ∙ x1 + α2 ∙ x2 + α3 ∙ x3 + α4 ∙ x4
debugging time = β0 + β1 ∙ x1 + β2 ∙ x2 + β3 ∙ x3 + β4 ∙ x4
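As a rough illustration of how the right-hand side of these regression equations is evaluated for one observation, here is a minimal Java sketch. The coefficient and predictor values in main are made up for illustration; the slides do not say how the categorical variables (debugging aid, bug, participant type) are encoded or how the model is fitted, so this only computes the intercept plus the weighted sum. The names RegressionSketch and predict are hypothetical.

```java
class RegressionSketch {
    // Evaluates intercept + sum(coef[i] * x[i]) for a single observation.
    static double predict(double intercept, double[] coef, double[] x) {
        if (coef.length != x.length) {
            throw new IllegalArgumentException("coef and x must have the same length");
        }
        double y = intercept;
        for (int i = 0; i < coef.length; i++) {
            y += coef[i] * x[i];
        }
        return y;
    }

    public static void main(String[] argv) {
        // Hypothetical coefficients (alpha1..alpha4) and predictors (x1..x4).
        double alpha0 = 1.0;
        double[] alpha = {2.0, -1.0, 0.5, 0.25};
        double[] x = {1.0, 0.0, 2.0, 4.0};
        System.out.println(predict(alpha0, alpha, x));
    }
}
```

The sign of each fitted coefficient is what the results slides report: a positive coefficient for a debugging aid means the aid is associated with higher correctness, a negative one with lower correctness.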
Post-study Survey
• Helpfulness of debugging aids
• Difficulty of bugs
• Opinions on using generated patches as debugging aids
Results
1. High-quality patches significantly improve debugging correctness
% of correct patches: Location 48%, LowQ 33%, HighQ 71%
HighQ: Positive Coefficient = 1.25, p-value = 0.00 < 0.05
2. Low-quality patches can undermine debugging correctness
% of correct patches: Location 48%, LowQ 33%, HighQ 71%
LowQ: Negative Coefficient = -0.55, p-value = 0.09
3. High-quality patches are more useful for difficult bugs
Bugs: Bug1 (Math-280), Bug2 (Rhino-114493), Bug3 (Rhino-192226), Bug4 (Rhino-217379), Bug5 (Rhino-76683)
[Chart: bug difficulty for Bug1 to Bug5]
[Chart: % of correct patches for Bug1 to Bug5, by debugging aid (Location, LowQ, HighQ)]
4. The type of debugging aid does not affect debugging time
[Chart: debugging time (min) by debugging aid (Location, LowQ, HighQ)]
5. Other factors’ impact on debugging performance
• Difficult bugs significantly slow down debugging
• Engr and MTurk are more likely to debug correctly
• Novices tend to benefit more from HighQ patches
6. Participants consider high-quality generated patches much more helpful than low-quality patches
Helpfulness of debugging aids
[Chart: helpfulness ratings (Very helpful, Helpful, Medium, Slightly Helpful, Not Helpful) for low-quality vs. high-quality generated patches]
Mann-Whitney U test: p-value = 0.001
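For readers unfamiliar with the Mann-Whitney U test, the following Java sketch computes the U statistic that the test is based on. It assumes the helpfulness ratings are coded as integers from 1 (Not Helpful) to 5 (Very helpful); the ratings in main are invented for illustration and are not the study's data. The sketch stops at U and does not derive a p-value. The names MannWhitneySketch and uStatistic are hypothetical.

```java
class MannWhitneySketch {
    // U for sample a: the number of pairs (a_i, b_j) with a_i > b_j,
    // where each tie counts as one half.
    static double uStatistic(int[] a, int[] b) {
        double u = 0.0;
        for (int x : a) {
            for (int y : b) {
                if (x > y) {
                    u += 1.0;
                } else if (x == y) {
                    u += 0.5;
                }
            }
        }
        return u;
    }

    public static void main(String[] argv) {
        // Hypothetical ratings, 1 = Not Helpful ... 5 = Very helpful.
        int[] highQ = {5, 4, 4};
        int[] lowQ = {3, 4, 2};
        System.out.println("U(HighQ) = " + uStatistic(highQ, lowQ));
        System.out.println("U(LowQ)  = " + uStatistic(lowQ, highQ));
    }
}
```

The two U values always sum to the product of the sample sizes, so a U close to that product for one group indicates that its ratings tend to exceed those of the other group.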
Feedback
Quick starting point
• Point to the buggy area
• Brainstorm
“They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”
Confusing, incomplete, misleading
• Wrong lead, especially for novices
• Require further human perfection
“Generated patches would be good at recognizing obvious problems”
“…but may not recognize more involved defects.”
“Generated patches simplify the problem”
“…but they may over-simplify it by not addressing the root cause.”
“I would use generated patches as debugging aids, as they provide extra diagnostic information”
“…along with access to standard debugging tools.”
Threats to Validity
• Bugs and generated patches may not be representative
• Quality measure of generated patches may not generalize
• Results may not generalize to domain experts
• Possibility of blindly reusing generated patches
  • Mitigation: remove patches submitted in less than 1 minute
Takeaway
• Auto-generated patches can be useful as debugging aids
  • Participants fix bugs more correctly with auto-generated patches
• Quality control is required
  • Participants’ debugging correctness is compromised with low-quality generated patches
• Maximize the benefits
  • Difficult bugs
  • Novice developers

Automatically generated patches as debugging aids

  • 1. Automatically Generated Patches as Debugging Aids: A Human Study Yida Tao, Jindae Kim, Sunghun Kim Dept. of CSE, The Hong Kong University of Science and Technology Chang Xu State Key Lab for Novel Software Technology, Nanjing University
  • 2. • Promising research progress • ClearView1: Prevent all 10 Firefox exploits • GenProg2: Fix 55/105 real bugs [1] Automatically Patching Errors in Deployed Software. Perkins et al. SOSP’09 [2] A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. Le Goues et al. ICSE’12 2 Automatic Program Repair
  • 4. “It won't get your bug patched any quicker. You’ll just have shifted the coders' attention away from their own app's bugs, and onto the repair tool’s bugs.” - Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But- Bypassing-the-Source-Code 4 Automatic Program Repair
  • 5. #what-could-possibly-go-wrong • Blackbox repair • Increasing maintenance cost • Vulnerable to attack - Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But- Bypassing-the-Source-Code - A human study of patch maintainability. ISSTA’12 5 - Automatic patch generation learned from human-written patches. ICSE’13
  • 6. #what-could-possibly-go-wrong #program-out-of-control - Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But- Bypassing-the-Source-Code - A human study of patch maintainability. ISSTA’12 6 - Automatic patch generation learned from human-written patches. ICSE’13 • Blackbox repair • Increasing maintenance cost • Vulnerable to attack
  • 7. Use automatically generated patches as debugging aids 7
  • 8. Use automatically generated patches as debugging aids Our Human Study • Investigate the usefulness of generated patches as debugging aids • Discuss the impact of patch quality on debugging performance • Explore practitioners’ feedback on adopting automatic program repair 8
  • 10. Debugging aid Participants Bugs 10 is given to Debug
  • 12. Low-quality generated patch Debugging aid Participants Bugs 12
  • 13. Low-quality generated patch High-quality generated patch Debugging aid Participants Bugs 13
  • 14. Low-quality generated patch High-quality generated patch Buggy method location Debugging aid Participants Bugs 14
  • 15. Grad: 44 MTurk: 23 Engr: 28 95 Participants CS graduate students Amazon Mechanical Turk workers Industrial software engineers Debugging aid Participants Bugs 15
  • 17. 44 Graduate students • Between-group design 14 students 15 students 15 students Debugging aid Participants Bugs 17
  • 18. 44 Graduate students • Between-group design Low-quality generated patch High-quality generated patch Buggy method location 14 students 15 students 15 students Debugging aid Participants Bugs 18
  • 19. 44 Graduate students • Between-group design • Onsite setting • Eclipse IDE • Supervised session Low-quality generated patch High-quality generated patch Buggy method location 14 students 15 students 15 students Debugging aid Participants Bugs 19
  • 20. Low-quality generated patch High-quality generated patch Buggy method location Remote participants (28 Engr + 23 MTurk) • Within-group design Debugging aid Participants Bugs 20
  • 21. Remote participants (28 Engr + 23 MTurk) • Within-group design • Online debugging system Low-quality generated patch High-quality generated patch Buggy method location Debugging aid Participants Bugs 21
  • 23. Bug Selection Criteria • Real bugs • The bug has accepted patches written by developers • Proper number of bugs • The bug has generated patches with different quality Debugging aid Participants Bugs 23
  • 24. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13 Debugging aid Participants Bugs 24
  • 25. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13 for (int i=0; i<parenCount; i++) SubString sub = (SubString)parens.get(i) if(sub!=null){ args[i+1] = sub.toString(); Auto-generated patch A Auto-generated patch B Debugging aid Participants Bugs 25 } } for (int i=0; i<parenCount; i++) SubString sub = (SubString)parens.get(i) args[parenCount+1] = new Integer(reImpl.leftContext.length); }
  • 26. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13 for (int i=0; i<parenCount; i++) SubString sub = (SubString)parens.get(i) if(sub!=null){ args[i+1] = sub.toString(); Auto-generated patch A Auto-generated patch B avg. ranking from 85 devs and students Debugging aid Participants Bugs 26 } } for (int i=0; i<parenCount; i++) SubString sub = (SubString)parens.get(i) args[parenCount+1] = new Integer(reImpl.leftContext.length); } 1.6 2.8
  • 27. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13 for (int i=0; i<parenCount; i++) SubString sub = (SubString)parens.get(i) if(sub!=null){ args[i+1] = sub.toString(); Auto-generated patch A Auto-generated patch B High-Quality Patch Low-Quality patch avg. ranking from 85 devs and students Debugging aid Participants Bugs 27 } } for (int i=0; i<parenCount; i++) SubString sub = (SubString)parens.get(i) args[parenCount+1] = new Integer(reImpl.leftContext.length); } 1.6 2.8
  • 29. Participants submit 337 patches as their debugging outcome Debugging aid Participants Bugs 29
  • 30. Location 109 LowQ 112 HighQ # submitted patches 116 w.r.t debugging aid Participants submit 337 patches as their debugging outcome Debugging aid Participants Bugs 30
  • 31. Location 109 LowQ 112 HighQ # submitted patches 116 w.r.t debugging aid Bug1 66 Bug2 74 Bug5 62 Bug3 59 Bug4 76 # submitted patches w.r.t bugs Participants submit 337 patches as their debugging outcome Debugging aid Participants Bugs 31
  • 32. Evaluation of debugging performance
  • 34. Patch Correctness • Passing test cases
  • 35. Patch Correctness • Passing test cases • Matching the semantics of original accepted patches
  • 36. Patch Correctness • Passing test cases • Matching the semantics of original accepted patches • 3 evaluators
  • 37. Debugging Time • Eclipse plug-in • Website timer
  • 38. Independent variables • Debugging aids • Bugs • Participant types • Programming experience
  • 39. Multiple Regression Analysis • Independent variables • Debugging aids • Bugs • Participant types • Programming experience. correctness = α0 + α1 ∙ x1 + α2 ∙ x2 + α3 ∙ x3 + α4 ∙ x4; debugging time = β0 + β1 ∙ x1 + β2 ∙ x2 + β3 ∙ x3 + β4 ∙ x4
  • 40. Post-study Survey • Helpfulness of debugging aids • Difficulty of bugs • Opinions on using generated patches as debugging aids
  • 42. (1) High-quality patches significantly improve debugging correctness. 48%, 33%, 71%
  • 43. (1) High-quality patches significantly improve debugging correctness. % of correct patches: Location 48%, LowQ 33%, HighQ 71%
  • 44. (1) High-quality patches significantly improve debugging correctness. % of correct patches: Location 48%, HighQ 71%. Positive coefficient = 1.25, p-value = 0.00 < 0.05
  • 45. (2) Low-quality patches slightly undermine debugging correctness. % of correct patches: Location 48%, LowQ 33%, HighQ 71%
  • 46. (2) Low-quality patches slightly undermine debugging correctness. % of correct patches: Location 48%, LowQ 33%, HighQ 71%. Negative coefficient = -0.55, p-value = 0.09
  • 47. (2) Low-quality patches can undermine debugging correctness. % of correct patches: Location 48%, LowQ 33%, HighQ 71%. Negative coefficient = -0.55, p-value = 0.09
  • 48. (3) High-quality patches are more useful for difficult bugs
  • 49. (3) High-quality patches are more useful for difficult bugs. [Chart: bug difficulty for Bug1 Math-280, Bug2 Rhino-114493, Bug3 Rhino-192226, Bug4 Rhino-217379, Bug5 Rhino-76683]
  • 50. (3) High-quality patches are more useful for difficult bugs. [Chart: % of correct patches for Bug1 to Bug5, by Location, LowQ, HighQ] [Chart: bug difficulty for Bug1 Math-280, Bug2 Rhino-114493, Bug3 Rhino-192226, Bug4 Rhino-217379, Bug5 Rhino-76683]
  • 51. (4) The type of debugging aid does not affect debugging time
  • 52. (4) The type of debugging aid does not affect debugging time. [Chart: debugging time (min) for Location, LowQ, HighQ]
  • 53. (5) Other factors’ impact on debugging performance • Difficult bugs significantly slow down debugging • Engr and MTurk are more likely to debug correctly • Novices tend to benefit more from HighQ patches
  • 54. (6) Participants consider high-quality generated patches much more helpful than low-quality patches. [Chart: helpfulness of debugging aids (Very helpful, Helpful, Medium, Slightly Helpful, Not Helpful) for low-quality and high-quality generated patches] Mann-Whitney U test p-value = 0.001
  • 57. Quick starting point • Point to the buggy area • Brainstorm “They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”
  • 58. Quick starting point • Point to the buggy area • Brainstorm “They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.” Confusing, incomplete, misleading • Wrong lead, especially for novices • Require further human perfection
  • 59. “Generated patches would be good at recognizing obvious problems” “…but may not recognize more involved defects.”
  • 60. “Generated patches would be good at recognizing obvious problems” “…but may not recognize more involved defects.” “Generated patches simplify the problem” “…but they may over-simplify it by not addressing the root cause.”
  • 61. “I would use generated patches as debugging aids, as they provide extra diagnostic information”
  • 62. “I would use generated patches as debugging aids, as they provide extra diagnostic information” “…along with access to standard debugging tools.”
  • 64. Threats to Validity • Bugs and generated patches may not be representative • Quality measure of generated patches may not generalize • May not generalize to domain experts • Possibility of blindly reusing generated patches • Remove patches that are submitted in less than 1 minute
  • 65. Takeaway • Auto-generated patches can be useful as debugging aids • Participants fix bugs more correctly with auto-generated patches • Quality control is required • Participants’ debugging correctness is compromised with low-quality generated patches • Maximize the benefits • Difficult bugs • Novice developers
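
The high-quality and low-quality labels shown on slides 26 and 27 can be sketched in Java, the language of the patches on those slides. This is a minimal sketch, not the study's tooling: the class and method names are hypothetical, and it assumes that a smaller average ranking (1.6 versus 2.8) means the patch was judged more acceptable.

```java
/** Hypothetical sketch: labels the two generated patches of one bug by average ranking. */
final class PatchQualityLabeler {
    enum Quality { HIGH, LOW }

    /**
     * Returns the label for the first patch; its peer gets the opposite label.
     * Assumption: a smaller average ranking means "more acceptable" (rank 1 is best).
     * Ties are not described in the slides, so they are rejected here.
     */
    static Quality labelFirst(double avgRankingFirst, double avgRankingSecond) {
        if (avgRankingFirst == avgRankingSecond) {
            throw new IllegalArgumentException("tied rankings are not handled");
        }
        return avgRankingFirst < avgRankingSecond ? Quality.HIGH : Quality.LOW;
    }

    public static void main(String[] args) {
        // Average rankings taken from slide 26.
        System.out.println("First patch: " + labelFirst(1.6, 2.8));
    }
}
```

With the averages from slide 26, the patch ranked 1.6 is labeled high-quality and its peer for the same bug is treated as low-quality.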

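One safeguard listed under Threats to Validity is removing patches that were submitted in less than 1 minute. A minimal Java sketch of such a filter is shown here. The class name, the method name, and the use of `java.time.Duration` are assumptions made for illustration, and treating exactly 1 minute as acceptable is a guess, since the slide does not say.

```java
import java.time.Duration;

/** Hypothetical sketch: flags submissions made too quickly to reflect real debugging. */
final class QuickSubmissionFilter {
    /** Threshold named on the Threats to Validity slide. */
    static final Duration MIN_DEBUGGING_TIME = Duration.ofMinutes(1);

    /** Returns true if the recorded debugging time is at least the threshold. */
    static boolean isPlausible(Duration debuggingTime) {
        return debuggingTime.compareTo(MIN_DEBUGGING_TIME) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isPlausible(Duration.ofSeconds(45)));  // below the threshold
        System.out.println(isPlausible(Duration.ofMinutes(12)));  // above the threshold
    }
}
```
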
Editor's notes

  1. This is a work with …
  2. Automatic program repair has been a very hot topic in recent years. We’ve seen quite promising research progress in this area. For example, Perkins et al. proposed ClearView, a self-defending software system, which successfully prevented all 10 Firefox exploits created by a red team and generated patches for 7 of them. As another successful example, Le Goues et al. proposed GenProg and used it to fix 55 out of 105 real bugs.
  3. However, there is also skepticism and worry about automatic program repair. Here is a quote from an online discussion.
  4. Here is a quote from an online discussion.
  5. Following this general concern, we’ve observed, in online communities and in the literature, worries about things that could possibly go wrong with program repair techniques. For example, whether it creates a sort of blackbox repair that hardly makes sense, whether it increases maintenance cost, and whether machine-generated patches are vulnerable to attack.
  6. In general, people are worried about whether a program, after being repaired automatically, still works as intended, or will become unpredictable and out of control. Because of these concerns, direct deployment of automatic program repair seems problematic at this point. But can we still benefit from this technique?
  7. How about using ..? In this case, developers can refer to generated patches when they debug, but they don’t necessarily have to use them. In other words, they still take full control over the content of the patch. This sounds like a more comfortable usage scenario.
  8. Which is also the focus of our human study. First … And because some of the controversy of program repair comes from the quality of automatically generated patches, we also want to disc… Finally, we explore…
  9. Here is our methodology
  10. Which is actually quite intuitive. Basically, we conducted controlled experiments, where we give certain types of debugging aids to participants, who use them to debug. Next, I’ll introduce these 3 parts in detail.
  11. First, we have 3 different types of debugging aids.
  12. And for the last type of debugging aid, we need some kind of baseline, because the first two debugging aids already suggest candidate fixes.
  13. For fair comparison, for the baseline, or the control group, we provide only the buggy method location as the debugging aid. This is common in practice, where developers typically know the general buggy area from bug reports or stack traces before they start to debug. Those are the 3 types of debugging aids we’re gonna give to participants.
  14. We recruited 95 participants from a wide population, which includes 44 CS graduate students, 28 software engineers from industry, and 23 workers from Amazon Mechanical Turk, which is a crowdsourcing marketplace. Average years: Grad: 4.1, Engr: 2.4 (1-10), MTurk: 5.7 (1-14)
  15. Now the question is, how do we assign debugging aids to participants?
  16. For the 44 graduate students, we adopt a between-group design by evenly dividing students into 3 groups of similar programming experience.
  17. Each group is given only one type of debugging aid.
  18. These students use Eclipse to debug in a supervised session.
  19. For remote participants, namely the 28 engineers and 23 MTurk workers, it’s hard for us to determine their numbers and expertise beforehand, so a between-group design is not appropriate here if we want to ensure the fairness of group division. Instead, we adopt a within-group design, such that participants can be exposed to different debugging aids. To balance the experimental conditions, whenever participants select a bug, we assign the type of debugging aid for this particular bug in a round-robin fashion, so that each aid is equally likely to be given for each bug.
  20. We developed an online … for them to complete debugging tasks.
  21. Next, how do we select bugs?
  22. Accordingly, we selected all 5 bugs reported in this work…..
  23. For each of the 5 bugs, this work reported two patches generated by different program repair techniques,
  24. And they presented these different patches of the same bug to 85 … , and asked them to rank the patches based on the question, “which one is more acceptable?” In the end, this work reported this ranking of different patches for the same bug.
  25. And, for the purpose of our human study, we label the higher-ranked patch as the “high-quality” patch, and its peer patch for the same bug, the lower-ranked one, as the “low-quality” patch.
  26. That’s basically how we design this debugging human study.
  27. In total, participants submit 337 patches ……
  28. Here is the # of submitted patches that are created with each of the debugging aids.
  29. And here is the # of submitted patches that are created for each bug. Our design basically ensures that these two distributions are well balanced.
  30. Next, I’ll describe how we evaluate participants’ debugging performance.
  31. First, we evaluate the correctness of participants’ submitted patches.
  32. A patch is labeled correct only if it passes our test cases
  33. … and matches the …
  34. For this part, we have 3 evaluators to check and discuss the semantic matching.
  35. We also measure participants’ debugging time by developing an Eclipse plug-in and a website timer to record the time they spent on each bug.
  36. Up to this point, several factors can affect debugging correctness and time. For example, the type of debugging aids, of course, and also bugs, participant types, and their expertise.
  37. So, we use multiple regression analysis to quantify the relation between these independent variables and the outcome. That is, we use multiple regression to compute the coefficient values and statistical significance, so that we can understand whether the corresponding factors really have positive or negative impact on debugging performance, and if so, how much the impact is.
  38. Our evaluation also includes a post-study survey, in which we asked participants to rate the …, the …, and offer opinions.
  39. Results
  40. First, high-q patches DO improve debugging correctness, SIGNIFICANTLY
  41. Here is the % of correct patches made by these two groups. It’s pretty straightforward that the group with high-quality patches has made a MUCH higher % of correct patches.
  42. The regression analysis also shows that the high-quality patch has a statistically significant positive coefficient on debugging correctness.
  43. Surprisingly, the group with low-quality patches has made fewer correct patches, EVEN when compared to the control group.
  44. Regression also shows negative coefficient for low-quality patches, although it’s not statistically significant.
  45. But we do observe that low… can indeed …
  46. Next, we find…
  47. Here’s participants’ survey feedback on bug difficulty. We can see that they consider the third bug, Rhino … to be the most difficult one to debug.
  48. And when we check, for each bug, the percentage of correct patches made by each group, we observe an obvious trend. For the 3rd bug, no one except the participants using high-quality patches can fix the bug correctly.
  49. On the other hand, we also found that …
  50. We can see from this figure, that the debugging time of these three groups is not that different. And regression analysis also suggests the same.
  51. We also found other … . For example, the last bullet: we found that novices, whose programming experience is below the average among all participants, tend to benefit more from high-quality patches.
  52. Next, when we analyze the survey results, where we ask participants to rate how helpful each debugging aid is, we found that they consider high-quality generated patches much more helpful than low-quality generated patches.
  53. Now let’s listen to what participants said about their experience of using generated patches in debugging during the human study.
  54. As usual, things always have a positive and a negative side.
  55. Quote…
  56. But, on the other hand, such a quick starting point may be confusing… And, they might require further perfection from human developers
  57. Since we distinguish high-quality and low-quality patches based on their acceptability ranking reported in another work, this may not generalize to other quality measures, such as metric-based ones. Another threat is that participants may blindly… Actually, we took several measures to prevent such behaviors. … When participants submit their patches, we ask them to justify their patches in an input box.
  58. Finally, the take-away of this work. BUT, strict quality … If we gave …, it could be misleading and indeed compromise their debugging performance. Finally, the benefits of using auto-generated patches as debugging aids could be much more obvious for difficult debugging tasks, or for novice developers
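
Note 19 describes assigning a debugging aid in a round-robin fashion each time a remote participant selects a bug. The Java sketch below shows one way such an assignment could work. It is an illustration under stated assumptions, not the study's implementation: the class, enum, and method names are hypothetical, and both the per-bug counter and the choice to start each bug's rotation with the location-only aid are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: hands out debugging aids per bug in round-robin order. */
final class RoundRobinAidAssigner {
    /** The three aid types used in the study. */
    enum Aid { LOCATION, LOW_QUALITY_PATCH, HIGH_QUALITY_PATCH }

    // Number of times each bug has been selected so far (assumed bookkeeping).
    private final Map<String, Integer> selectionsPerBug = new HashMap<>();

    /** Returns the aid for the next participant who selects the given bug. */
    Aid assign(String bugId) {
        // merge() increments the counter and returns the new count; subtract 1 for a 0-based index.
        int previousSelections = selectionsPerBug.merge(bugId, 1, Integer::sum) - 1;
        Aid[] aids = Aid.values();
        return aids[previousSelections % aids.length];
    }

    public static void main(String[] args) {
        RoundRobinAidAssigner assigner = new RoundRobinAidAssigner();
        for (int i = 0; i < 4; i++) {
            System.out.println("Bug2 -> " + assigner.assign("Bug2"));
        }
    }
}
```

Keeping a separate counter per bug means every bug cycles through all three aids on its own, which keeps the number of submissions per aid close to even for each bug regardless of the order in which participants pick bugs.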