Source Code Comprehension on Evolving Software:
A Literature Survey
Yida Tao
Supervisor: Sunghun Kim
Motivation
Code Change Comprehension
Tao et al., FSE’12
Code change comprehension is
• Frequently required
• In major development activities, in
particular the code-review process
• How do software engineers understand code changes? An exploratory study in industry. Tao et al., FSE’12
• Expectations, outcomes, and challenges of modern code review. Bacchelli and Bird, ICSE’13
Bacchelli & Bird, ICSE’13
• “…review and understand code they
have not seen before may be more
common than a developer working on
new code”
• “From interviews, no other code
review challenge emerged as clearly as
understanding the submitted change”
Outline
Program Differencing
Describing code changes
Code Change Summarization
Explaining code changes
Querying and Filtering
Customization
Code Change Comprehension
Program Differencing
 Text Differencing
 Syntactic Differencing
 Semantic Differencing
Text Differencing
 Flat representation of a program
 Sequence of strings
 Unix diff
 Outputs only added and deleted lines; cannot detect modified lines
 Hard to determine when a code fragment is moved upward or downward
 Ldiff (Canfora et al., ICSE’09)
 An enhanced line differencing tool
 Limitations
 Changes to *characters*
 No syntactic-structure information
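The line-based limitation above can be sketched with Python's standard `difflib` module. This is only an illustration, not Unix diff or Ldiff themselves; the helper name `line_diff` and the sample snippets are assumptions made for the example. A one-token edit shows up as a deleted line plus an added line, and a moved line shows up as an unrelated insertion and deletion.

```python
import difflib


def line_diff(old: str, new: str) -> list[str]:
    """Return a flat line diff: '- ' for deleted lines, '+ ' for added lines."""
    a, b = old.splitlines(), new.splitlines()
    result = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" is reported as a deletion plus an addition,
        # because a flat diff has no notion of a modified line.
        if tag in ("delete", "replace"):
            result += ["- " + line for line in a[i1:i2]]
        if tag in ("insert", "replace"):
            result += ["+ " + line for line in b[j1:j2]]
    return result


# Hypothetical one-token change: only the declared type differs.
old_version = "int total = 0;\nreturn total;"
new_version = "long total = 0;\nreturn total;"

for entry in line_diff(old_version, new_version):
    print(entry)
```

Nothing in the output says that the two reported lines are the same statement with a different type; recovering that link is what the enhanced and syntactic tools aim for.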
Syntactic Differencing
 Structured representation of a program
 Abstract syntax tree; XML
 ChangeDistiller (Fluri et al., TSE’07)
 Tree differencing
 Node: bigram string similarity
 Control structure: subtree similarity
 Output: tree edit script (insert, delete, move, update)
 XML differencing
 srcML (Maletic & Collard, ICSM’04): embeds abstract syntax and structure within the source code
 diffX (Al-Ekram et al., CASCON '05)
 Limitation
 Cannot describe how the behavior of a program is changed
 Still reports differences for behavior-preserving changes
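The "bigram string similarity" used to match tree nodes can be sketched as follows. This is a minimal illustration that assumes the Dice coefficient over character bigrams, one common definition; the function names and the 0.6 threshold are hypothetical and not taken from ChangeDistiller.

```python
from collections import Counter


def bigrams(text: str) -> list[str]:
    """Return all overlapping two-character substrings of text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over character bigrams, in the range [0, 1]."""
    count_a, count_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(count_a.values()) + sum(count_b.values())
    if total == 0:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    shared = sum((count_a & count_b).values())  # multiset intersection
    return 2.0 * shared / total


MATCH_THRESHOLD = 0.6  # assumed value, for illustration only


def nodes_match(old_statement: str, new_statement: str) -> bool:
    """Treat two leaf statements as the same node if they are similar enough."""
    return bigram_similarity(old_statement, new_statement) >= MATCH_THRESHOLD


print(nodes_match("int count = 0;", "int counter = 0;"))
print(nodes_match("int count = 0;", "return result;"))
```

Matched nodes become "update" operations in the edit script; unmatched ones become inserts or deletes.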
Semantic Differencing
 Semantic diff (Jackson and Ladd, ICSM’94)
 Method-level
 Variable dependencies comparison
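A rough sketch of comparing variable dependencies between two versions of a method body. It is written for straight-line Python assignments only, so it is far simpler than the technique cited above; the function names and sample snippets are assumptions for the example.

```python
import ast


def dependencies(source: str) -> dict[str, frozenset[str]]:
    """Map each assigned variable to the input variables it depends on.

    Only a sequence of simple assignments (name = expression) is supported.
    """
    deps: dict[str, frozenset[str]] = {}
    for stmt in ast.parse(source).body:
        is_simple = (isinstance(stmt, ast.Assign)
                     and len(stmt.targets) == 1
                     and isinstance(stmt.targets[0], ast.Name))
        if not is_simple:
            raise ValueError("only simple assignments are supported")
        used = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
        resolved: set[str] = set()
        for name in used:
            # Replace an intermediate variable by what it depends on itself.
            resolved |= deps.get(name, frozenset({name}))
        deps[stmt.targets[0].id] = frozenset(resolved)
    return deps


def same_dependencies(old: str, new: str, outputs: list[str]) -> bool:
    """True if every output variable depends on the same inputs in both versions."""
    old_deps, new_deps = dependencies(old), dependencies(new)
    return all(old_deps.get(v) == new_deps.get(v) for v in outputs)


old_body = "t = a + b\nr = t * c"
inlined_body = "r = (a + b) * c"   # temporary variable inlined
changed_body = "r = a * c"         # b no longer influences r

print(same_dependencies(old_body, inlined_body, ["r"]))
print(same_dependencies(old_body, changed_body, ["r"]))
```

Inlining the temporary leaves the dependencies of `r` unchanged, so a dependency-based diff stays silent where a text or tree diff would report an edit.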
Semantic Differencing (cont.)
 JDiff (Apiwattanapong et al. ASE’04, 06)
 Extended control-flow graph (ECFG)
 Dynamic binding, class hierarchy, exception handling, etc.
Semantic Differencing (cont.)
 Differential symbolic execution (Person et al., FSE’08)
 “Executing” a program using symbolic values
Outline
Program Differencing
Text Differencing
Syntactic differencing
Semantic differencing
Code Change Comprehension
Code Change Summarization
Explaining code changes
Querying and Filtering
Customization
Code Change Summarization
 LSdiff (Kim and Notkin, ICSE’09)
 Group related changes
 Detect potential inconsistencies in a code change
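Grouping related changes into a rule and flagging the exceptions can be sketched as below. This is a toy stand-in for the idea, not LSdiff's inference; the method names, the call names in the comments, and the 0.75 accuracy threshold are all hypothetical.

```python
from typing import Optional


def rule_with_exceptions(antecedent: set[str], consequent: set[str],
                         min_accuracy: float = 0.75
                         ) -> Optional[tuple[list[str], list[str]]]:
    """Check the rule 'every element of antecedent is also in consequent'.

    Returns (matches, exceptions) when the rule holds for at least
    min_accuracy of the antecedent, otherwise None.
    """
    if not antecedent:
        return None
    matches = antecedent & consequent
    exceptions = antecedent - consequent
    if len(matches) / len(antecedent) < min_accuracy:
        return None
    return sorted(matches), sorted(exceptions)


# Hypothetical change facts extracted from one code change.
deleted_old_call = {"Order.save", "Invoice.save", "Cart.save", "User.save"}  # removed a call to an old API
added_new_call = {"Order.save", "Invoice.save", "Cart.save"}                # added a call to its replacement

result = rule_with_exceptions(deleted_old_call, added_new_call)
if result is not None:
    matches, exceptions = result
    print("Rule holds for:", matches)
    print("Possible inconsistency in:", exceptions)
```

The three matching methods are summarized by one rule, and the single exception is surfaced as a potential inconsistency for the reviewer.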
Code Change Summarization (cont.)
 DeltaDoc (Buse and Weimer, ASE’10)
 Symbolic execution: obtain path predicates for each statement in both versions
 Identify statements that are added, deleted, or have a changed predicate
 Summarization
Code Change Summarization (cont.)
 Multi-document summarization (Rastkar and Murphy, ICSE’13)
 Linking evolutionary documents (commit log, issue tracking entries)
 Finding the most informative sentences to extract to form a summary
 Similarity between a sentence and the title of the enclosing document
 Overlap between a sentence and the adjacent document
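The two sentence features listed above can be sketched as a simple ranking. This is an assumption-heavy illustration: Jaccard word overlap and a plain weighted sum stand in for whatever similarity measure and model the cited work uses, and the sample texts are invented.

```python
import re


def words(text: str) -> set[str]:
    """Lower-cased word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sentences(sentences: list[str], title: str, adjacent_document: str,
                   title_weight: float = 0.5) -> list[str]:
    """Order sentences by similarity to the title and overlap with a linked document."""
    title_words, adjacent_words = words(title), words(adjacent_document)

    def score(sentence: str) -> float:
        sentence_words = words(sentence)
        return (title_weight * jaccard(sentence_words, title_words)
                + (1.0 - title_weight) * jaccard(sentence_words, adjacent_words))

    return sorted(sentences, key=score, reverse=True)


# Hypothetical issue-tracker entry (title + comments) and linked commit log.
issue_title = "Fix null pointer in chart renderer"
commit_log = "Renderer crashes with null pointer when dataset is empty"
comments = [
    "Thanks for the quick review.",
    "The renderer now checks for a null dataset before drawing.",
]

print(rank_sentences(comments, issue_title, commit_log)[0])
```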
Code Change Summarization (cont.)
 Challenges
 Evolutionary documents
 Linkage might not be found (Bachmann et al., FSE’10, Wu et al., FSE’11)
 Human-written documents may be unavailable or uninformative (Buse and Weimer, ASE’10, Tao et al., FSE’12)
 Automatically generated documents
 Verbosity
 Uninteresting changes are identified, e.g., “all types that declared toString() added
constructors” (Kim and Notkin, ICSE’09)
LSdiff DeltaDoc
Outline
Program Differencing
Text Differencing
Syntactic differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Code Change Comprehension
Querying and Filtering
Customization
Querying and Filtering
 Specifying and detecting meaningful changes (Yu et al., ASE’11)
 Normalize the program (user-specified) before differencing
 Non-trivial to construct the query
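Normalizing before differencing can be sketched with Python's `ast` module (Python 3.9+ for `ast.unparse`). This illustrates the general idea only, not the cited tool; `DropDocstrings` plays the role of one hypothetical user-specified normalization, and all names are assumptions.

```python
import ast


class DropDocstrings(ast.NodeTransformer):
    """Example user-specified normalization: ignore function docstrings."""

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        self.generic_visit(node)
        first = node.body[0] if node.body else None
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            # Keep the body non-empty so the function stays valid.
            node.body = node.body[1:] or [ast.Pass()]
        return node


def normalize(source: str, transformers=()) -> str:
    """Parse, apply the requested normalizations, and print canonically.

    Parsing already discards comments and layout.
    """
    tree = ast.parse(source)
    for transformer in transformers:
        tree = transformer.visit(tree)
    return ast.unparse(ast.fix_missing_locations(tree))


def is_meaningful_change(old: str, new: str, transformers=()) -> bool:
    """True if the versions still differ after normalization."""
    return normalize(old, transformers) != normalize(new, transformers)


old_source = "def f(x):\n    return x + 1\n"
new_source = 'def f(x):\n    """Add one."""\n    return x + 1  # increment\n'

print(is_meaningful_change(old_source, new_source))
print(is_meaningful_change(old_source, new_source, [DropDocstrings()]))
```

Whether the added docstring counts as a change depends entirely on the normalization the user supplies, which is where the cost of constructing the query comes from.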
Querying and Filtering (cont.)
 Filtering non-essential changes (Kawrykow and
Robillard, ICSE’11)
 Non-essential changes: rename-induced modifications, local
variable extraction, trivial keyword modification, whitespace
and documentation updates
 ChangeDistiller (Fluri et al., TSE’07) + Partial program analysis (Dagenais and Hendren, OOPSLA’08)
 Goal: improving mining and recommendation accuracy
instead of developers’ comprehension
Outline
Program Differencing
Text Differencing
Syntactic differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Code Change Comprehension
Research Directions
Program Differencing
Text Differencing
Syntactic differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Source Code Changes
Work-item-based changes?
Work-item-based Changes
 Multiple work-items in a single code change (e.g., a bug fix +
code cleanup + a new feature)
 Very difficult to understand (Tao et al., FSE’12)
JFreeChart revision 1083
Trivial keyword removal
Bug fix
Formatting
Work-item-based Change Detection
 Multiple work-items in a single code change (e.g., a bug fix +
code cleanup + a new feature)
 Very difficult to understand (Tao et al., FSE’12)
 Change decomposition
 Program slicing (entity dependencies)
 Pattern matching (similarities)
 A single work-item spreads across multiple code changes
(e.g., 5 changes to finally fix a bug completely)
 Change aggregation
 Linkage to the same issue
 Heuristics like time duration, commit authors, program dependencies, etc.
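Change aggregation by "linkage to the same issue" can be sketched as below. The `Commit` record, the `#123` issue-reference convention, and the sample history are assumptions for the example; real projects may link issues differently or not at all, which is where the other heuristics come in.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    revision: str
    author: str
    message: str


# Assumed convention: commit messages reference issues as "#<number>".
ISSUE_REFERENCE = re.compile(r"#(\d+)")


def aggregate_by_issue(commits: list[Commit]) -> dict[str, list[str]]:
    """Group revisions by the issue identifiers mentioned in their messages."""
    groups: dict[str, list[str]] = defaultdict(list)
    for commit in commits:
        for issue in ISSUE_REFERENCE.findall(commit.message):
            groups[issue].append(commit.revision)
    return dict(groups)


# Hypothetical history: one bug takes three commits to fix.
history = [
    Commit("r101", "alice", "Fix crash on empty dataset (#42)"),
    Commit("r102", "bob", "Add export feature (#57)"),
    Commit("r103", "alice", "Follow-up fix for #42: handle null series"),
    Commit("r104", "alice", "Really fix #42 this time"),
]

print(aggregate_by_issue(history))
```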
Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Code Change Comprehension
Work-item change detection
Change decomposition
Change aggregation
Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Work-item-specific
changes
Code Change Comprehension
Work-item change detection
Change decomposition
Change aggregation
Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Work-item-specific
changes
Code Change Comprehension
Concrete Execution
Work-item change detection
Change decomposition
Change aggregation
Explaining code changes with executions of co-changed test cases
 Test cases
 Best documentation for source code
 Test cases co-changed with source code
 Documentation for code changes?
 Mostly synchronous co-evolution of production and test
code (Zaidman et al., Empirical Software Engineering’11)
 Differential test executions
 Co-changed test cases T
 Executing T on the old version P and new version P’
 Comparing the executions to explain the behavioral changes
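Differential test execution can be sketched as follows: run each co-changed test in T against the old version P and the new version P’, and report only the tests whose observable outcome differs. The two `mean` implementations and the test names are hypothetical, and only return values and exception types are compared here.

```python
def run(test, program):
    """Run one test against one program version and record what happened."""
    try:
        return ("returned", test(program))
    except Exception as exc:
        return ("raised", type(exc).__name__)


def differential_execution(tests, old_program, new_program):
    """Return {test name: (old outcome, new outcome)} for tests that behave differently."""
    report = {}
    for name, test in tests.items():
        before, after = run(test, old_program), run(test, new_program)
        if before != after:
            report[name] = (before, after)
    return report


# Hypothetical old version P: fails on an empty list.
def old_mean(values):
    return sum(values) / len(values)


# Hypothetical new version P': returns 0.0 for an empty list.
def new_mean(values):
    return sum(values) / len(values) if values else 0.0


# Co-changed test cases T, each taking the program under test as a parameter.
co_changed_tests = {
    "mean_of_pair": lambda mean: mean([2, 4]),
    "mean_of_empty": lambda mean: mean([]),
}

for name, (before, after) in differential_execution(
        co_changed_tests, old_mean, new_mean).items():
    print(f"{name}: {before} -> {after}")
```

Only `mean_of_empty` is reported, which points the reader at the behavior the change was meant to alter.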
From StackExchange
http://programmers.stackexchange.com/questions/154439/quality-of-code-in-unit-tests?newsletter=1&nlcode=67628%7c1a35
• “Unit tests are one of the best sources of documentation for your system,
and arguably the most reliable form”
• “Unit tests are often the first thing you look at when trying to grasp what
some piece of code does”
• “They can also serve as a starting point for people new to the code base”
Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Work-item-specific
changes
Code Change Comprehension
Concrete Execution
• Co-changed test cases
• Differential test execution
Work-item change detection
Change decomposition
Change aggregation
26

Weitere ähnliche Inhalte

Was ist angesagt?

ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review
ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code ReviewICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review
ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code ReviewAli Ouni
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSung Kim
 
Using HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review AnalyticsUsing HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review AnalyticsThe University of Adelaide
 
Recommending Software Refactoring Using Search-based Software Enginnering
Recommending Software Refactoring Using Search-based Software EnginneringRecommending Software Refactoring Using Search-based Software Enginnering
Recommending Software Refactoring Using Search-based Software EnginneringAli Ouni
 
Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...The University of Adelaide
 
Investigating Code Review Practices in Defective Files
Investigating Code Review Practices in Defective FilesInvestigating Code Review Practices in Defective Files
Investigating Code Review Practices in Defective FilesThe University of Adelaide
 
The Road Not Taken: Estimating Path Execution Frequency Statically
The Road Not Taken: Estimating Path Execution Frequency StaticallyThe Road Not Taken: Estimating Path Execution Frequency Statically
The Road Not Taken: Estimating Path Execution Frequency StaticallyRay Buse
 
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...The University of Adelaide
 
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Feng Zhang
 
Ph.D. Thesis Defense: Studying Reviewer Selection and Involvement in Modern ...
Ph.D. Thesis Defense:  Studying Reviewer Selection and Involvement in Modern ...Ph.D. Thesis Defense:  Studying Reviewer Selection and Involvement in Modern ...
Ph.D. Thesis Defense: Studying Reviewer Selection and Involvement in Modern ...The University of Adelaide
 
Improving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsImproving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsThe University of Adelaide
 
Leveraging HPC Resources to Improve the Experimental Design of Software Analy...
Leveraging HPC Resources to Improve the Experimental Design of Software Analy...Leveraging HPC Resources to Improve the Experimental Design of Software Analy...
Leveraging HPC Resources to Improve the Experimental Design of Software Analy...Chakkrit (Kla) Tantithamthavorn
 
Personalized Defect Prediction
Personalized Defect PredictionPersonalized Defect Prediction
Personalized Defect PredictionSung Kim
 
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...Ali Ouni
 
Synthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsSynthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsJeongwhan Choi
 
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...Ali Ouni
 

Was ist angesagt? (20)

ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review
ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code ReviewICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review
ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review
 
Cser13.ppt
Cser13.pptCser13.ppt
Cser13.ppt
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled Datasets
 
Using HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review AnalyticsUsing HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review Analytics
 
Recommending Software Refactoring Using Search-based Software Enginnering
Recommending Software Refactoring Using Search-based Software EnginneringRecommending Software Refactoring Using Search-based Software Enginnering
Recommending Software Refactoring Using Search-based Software Enginnering
 
Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...
 
Icsm19.ppt
Icsm19.pptIcsm19.ppt
Icsm19.ppt
 
Investigating Code Review Practices in Defective Files
Investigating Code Review Practices in Defective FilesInvestigating Code Review Practices in Defective Files
Investigating Code Review Practices in Defective Files
 
The Road Not Taken: Estimating Path Execution Frequency Statically
The Road Not Taken: Estimating Path Execution Frequency StaticallyThe Road Not Taken: Estimating Path Execution Frequency Statically
The Road Not Taken: Estimating Path Execution Frequency Statically
 
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
 
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
 
Ph.D. Thesis Defense: Studying Reviewer Selection and Involvement in Modern ...
Ph.D. Thesis Defense:  Studying Reviewer Selection and Involvement in Modern ...Ph.D. Thesis Defense:  Studying Reviewer Selection and Involvement in Modern ...
Ph.D. Thesis Defense: Studying Reviewer Selection and Involvement in Modern ...
 
Improving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsImproving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer Recommendations
 
Leveraging HPC Resources to Improve the Experimental Design of Software Analy...
Leveraging HPC Resources to Improve the Experimental Design of Software Analy...Leveraging HPC Resources to Improve the Experimental Design of Software Analy...
Leveraging HPC Resources to Improve the Experimental Design of Software Analy...
 
Personalized Defect Prediction
Personalized Defect PredictionPersonalized Defect Prediction
Personalized Defect Prediction
 
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
 
Icsm20.ppt
Icsm20.pptIcsm20.ppt
Icsm20.ppt
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Synthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsSynthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development Artifacts
 
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
 

Andere mochten auch

Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Sung Kim
 
The Anatomy of Developer Social Networks
The Anatomy of Developer Social NetworksThe Anatomy of Developer Social Networks
The Anatomy of Developer Social NetworksSung Kim
 
How Do Software Engineers Understand Code Changes? FSE 2012
How Do Software Engineers Understand Code Changes? FSE 2012How Do Software Engineers Understand Code Changes? FSE 2012
How Do Software Engineers Understand Code Changes? FSE 2012Sung Kim
 
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)Sung Kim
 
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...Sung Kim
 
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...Sung Kim
 
A Survey on Dynamic Symbolic Execution for Automatic Test Generation
A Survey on  Dynamic Symbolic Execution  for Automatic Test GenerationA Survey on  Dynamic Symbolic Execution  for Automatic Test Generation
A Survey on Dynamic Symbolic Execution for Automatic Test GenerationSung Kim
 
Automatic patch generation learned from human written patches
Automatic patch generation learned from human written patchesAutomatic patch generation learned from human written patches
Automatic patch generation learned from human written patchesSung Kim
 
A Survey on Automatic Test Generation and Crash Reproduction
A Survey on Automatic Test Generation and Crash ReproductionA Survey on Automatic Test Generation and Crash Reproduction
A Survey on Automatic Test Generation and Crash ReproductionSung Kim
 
STAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash ReproductionSTAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash ReproductionSung Kim
 
Transfer defect learning
Transfer defect learningTransfer defect learning
Transfer defect learningSung Kim
 
Tensor board
Tensor boardTensor board
Tensor boardSung Kim
 
Survey on Software Defect Prediction
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect PredictionSung Kim
 
Time series classification
Time series classificationTime series classification
Time series classificationSung Kim
 

Andere mochten auch (14)

Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
 
The Anatomy of Developer Social Networks
The Anatomy of Developer Social NetworksThe Anatomy of Developer Social Networks
The Anatomy of Developer Social Networks
 
How Do Software Engineers Understand Code Changes? FSE 2012
How Do Software Engineers Understand Code Changes? FSE 2012How Do Software Engineers Understand Code Changes? FSE 2012
How Do Software Engineers Understand Code Changes? FSE 2012
 
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
 
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
 
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
 
A Survey on Dynamic Symbolic Execution for Automatic Test Generation
A Survey on  Dynamic Symbolic Execution  for Automatic Test GenerationA Survey on  Dynamic Symbolic Execution  for Automatic Test Generation
A Survey on Dynamic Symbolic Execution for Automatic Test Generation
 
Automatic patch generation learned from human written patches
Automatic patch generation learned from human written patchesAutomatic patch generation learned from human written patches
Automatic patch generation learned from human written patches
 
A Survey on Automatic Test Generation and Crash Reproduction
A Survey on Automatic Test Generation and Crash ReproductionA Survey on Automatic Test Generation and Crash Reproduction
A Survey on Automatic Test Generation and Crash Reproduction
 
STAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash ReproductionSTAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash Reproduction
 
Transfer defect learning
Transfer defect learningTransfer defect learning
Transfer defect learning
 
Tensor board
Tensor boardTensor board
Tensor board
 
Survey on Software Defect Prediction
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect Prediction
 
Time series classification
Time series classificationTime series classification
Time series classification
 

Ähnlich wie Source code comprehension on evolving software

A Source Code Similarity System For Plagiarism Detection
A Source Code Similarity System For Plagiarism DetectionA Source Code Similarity System For Plagiarism Detection
A Source Code Similarity System For Plagiarism DetectionJames Heller
 
PhD Proposal talk
PhD Proposal talkPhD Proposal talk
PhD Proposal talkRay Buse
 
Code Craftsmanship Checklist
Code Craftsmanship ChecklistCode Craftsmanship Checklist
Code Craftsmanship ChecklistRyan Polk
 
Implementing Refactorings in IntelliJ IDEA
Implementing Refactorings in IntelliJ IDEAImplementing Refactorings in IntelliJ IDEA
Implementing Refactorings in IntelliJ IDEAintelliyole
 
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...Axel Reichwein
 
A Comparative Study of Forward and Reverse Engineering
A Comparative Study of Forward and Reverse EngineeringA Comparative Study of Forward and Reverse Engineering
A Comparative Study of Forward and Reverse Engineeringijsrd.com
 
Questions Every software engineer should answer
Questions Every software engineer should answerQuestions Every software engineer should answer
Questions Every software engineer should answerErtan Deniz
 
Analyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiffAnalyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiffMartin Pinzger
 
Aspect Oriented Programming
Aspect Oriented ProgrammingAspect Oriented Programming
Aspect Oriented ProgrammingRodger Oates
 
Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Martin Pinzger
 
Requirements Analysis and Management using Innoslate
Requirements Analysis and Management using InnoslateRequirements Analysis and Management using Innoslate
Requirements Analysis and Management using InnoslateElizabeth Steiner
 
Requirement Management.ppt
Requirement Management.pptRequirement Management.ppt
Requirement Management.pptSoham De
 
Software engineering lecture notes
Software engineering   lecture notesSoftware engineering   lecture notes
Software engineering lecture notesGarima Singh
 
A study of code change patterns for adaptive maintenance with AST analysis
A study of code change patterns for  adaptive maintenance with AST analysis A study of code change patterns for  adaptive maintenance with AST analysis
A study of code change patterns for adaptive maintenance with AST analysis IJECEIAES
 
15 implementing architectures
15 implementing architectures15 implementing architectures
15 implementing architecturesMajong DevJfu
 
Process Aspects and Social Dynamics of Contemporary Code Review: Insights fro...
Process Aspects and Social Dynamics of Contemporary Code Review: Insights fro...Process Aspects and Social Dynamics of Contemporary Code Review: Insights fro...
Process Aspects and Social Dynamics of Contemporary Code Review: Insights fro...JeffCarver32
 

Ähnlich wie Source code comprehension on evolving software (20)

A Source Code Similarity System For Plagiarism Detection
A Source Code Similarity System For Plagiarism DetectionA Source Code Similarity System For Plagiarism Detection
A Source Code Similarity System For Plagiarism Detection
 
PhD Proposal talk
PhD Proposal talkPhD Proposal talk
PhD Proposal talk
 
Code Craftsmanship Checklist
Code Craftsmanship ChecklistCode Craftsmanship Checklist
Code Craftsmanship Checklist
 
Implementing Refactorings in IntelliJ IDEA
Implementing Refactorings in IntelliJ IDEAImplementing Refactorings in IntelliJ IDEA
Implementing Refactorings in IntelliJ IDEA
 
Unit iv
Unit ivUnit iv
Unit iv
 
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
 
A Comparative Study of Forward and Reverse Engineering
A Comparative Study of Forward and Reverse EngineeringA Comparative Study of Forward and Reverse Engineering
A Comparative Study of Forward and Reverse Engineering
 
Questions Every software engineer should answer
Questions Every software engineer should answerQuestions Every software engineer should answer
Questions Every software engineer should answer
 
Analyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiffAnalyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiff
 
Se lec-uosl-8
Se lec-uosl-8Se lec-uosl-8
Se lec-uosl-8
 
Aspect Oriented Programming
Aspect Oriented ProgrammingAspect Oriented Programming
Aspect Oriented Programming
 
SE2018_Lec 17_ Coding
SE2018_Lec 17_ CodingSE2018_Lec 17_ Coding
SE2018_Lec 17_ Coding
 
Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)
 
Requirements Analysis and Management using Innoslate
Requirements Analysis and Management using InnoslateRequirements Analysis and Management using Innoslate
Requirements Analysis and Management using Innoslate
 
Requirement Management.ppt
Requirement Management.pptRequirement Management.ppt
Requirement Management.ppt
 
Software engineering lecture notes
Software engineering   lecture notesSoftware engineering   lecture notes
Software engineering lecture notes
 
SE2_Lec 18_ Coding
SE2_Lec 18_ CodingSE2_Lec 18_ Coding
SE2_Lec 18_ Coding
 
A study of code change patterns for adaptive maintenance with AST analysis
A study of code change patterns for  adaptive maintenance with AST analysis A study of code change patterns for  adaptive maintenance with AST analysis
A study of code change patterns for adaptive maintenance with AST analysis
 
15 implementing architectures
15 implementing architectures15 implementing architectures
15 implementing architectures
 
Process Aspects and Social Dynamics of Contemporary Code Review: Insights fro...
Process Aspects and Social Dynamics of Contemporary Code Review: Insights fro...Process Aspects and Social Dynamics of Contemporary Code Review: Insights fro...
Process Aspects and Social Dynamics of Contemporary Code Review: Insights fro...
 

Source code comprehension on evolving software

  • 1. Source Code Comprehension on Evolving Software: A Literature Survey (Yida Tao; Supervisor: Sunghun Kim)
  • 2. Motivation: Code Change Comprehension
      • Tao et al., FSE’12: code change comprehension is
          • Frequently required
          • In major development activities, in particular the code-review process
      • Bacchelli & Bird, ICSE’13
          • “…review and understand code they have not seen before may be more common that a developer working on new code”
          • “From interviews, no other code review challenge emerged as clearly as understanding the submitted change”
      • References
          • How do software engineers understand code changes? An exploratory study in industry. Tao et al., FSE’12
          • Expectations, outcomes, and challenges of modern code review. Bacchelli and Bird, ICSE’13
  • 3. Outline: Code Change Comprehension
      • Program Differencing: describing code changes
      • Code Change Summarization: explaining code changes
      • Querying and Filtering: customization
  • 4. Program Differencing
      • Text Differencing
      • Syntactic Differencing
      • Semantic Differencing
  • 5. Text Differencing
      • Flat representation of a program
          • Sequence of strings
      • Unix diff
          • Only outputs added/deleted lines; cannot detect modified lines
          • Hard to determine when a code fragment is moved upward or downward
      • Ldiff (Canfora et al., ICSE’09)
          • An enhanced line differencing tool
      • Limitations
          • Changes to *characters*
          • No syntactic-structure information
  • 6. Syntactic Differencing
      • Structured representation of a program
          • Abstract syntax tree; XML
      • ChangeDistiller (Fluri et al., TSE’07)
          • Tree differencing
          • Node: bigram string similarity
          • Control structure: subtree similarity
          • Output: tree edit script (insert, delete, move, update)
      • XML differencing
          • srcXML (Maletic & Collard, ICSM’04): embeds abstract syntax and structure within the source code
          • diffX (Al-Ekram et al., CASCON '05)
      • Limitation
          • Cannot describe how the behavior of a program is changed
          • Still reports differences for behavior-preserving changes
  • 7. Semantic Differencing
      • Semantic diff (Jackson and Ladd, ICSM’94)
          • Method-level
          • Variable dependencies comparison
  • 8. Semantic Differencing (cont.)
      • JDiff (Apiwattanapong et al. ASE’04, 06)
          • Extended control-flow graph (ECFG)
          • Dynamic binding, class hierarchy, exception handling, etc.
  • 9. Semantic Differencing (cont.)
      • Differential symbolic execution (Person et al., FSE’08)
          • “Executing” a program using symbolic values
  • 10. Outline: Code Change Comprehension
      • Program Differencing
          • Text differencing
          • Syntactic differencing
          • Semantic differencing
      • Code Change Summarization: explaining code changes
      • Querying and Filtering: customization
  • 11. Code Change Summarization
      • LSdiff (Kim and Notkin, ICSE’09)
          • Group related changes
          • Detect potential inconsistencies in a code change
  • 12. Code Change Summarization (cont.)
      • DeltaDoc (Buse and Weimer, ASE’10)
          • Symbolic execution: obtain path predicates for each statement in both versions
          • Identify statements that are added, deleted, or have changed predicates
          • Summarization
  • 13. Code Change Summarization (cont.)
      • Multi-document summarization (Rastkar and Murphy, ICSE’13)
          • Linking evolutionary documents (commit log, issue tracking entries)
          • Finding the most informative sentences to extract to form a summary
              • Similarity between a sentence and the title of the enclosing document
              • Overlap between a sentence and the adjacent document
  • 14. Code Change Summarization (cont.)
      • Challenges
          • Evolutionary documents
              • Linkage might not be found (Bachmann et al., FSE’10, Wu et al., FSE’11)
              • Human-written documents may be unavailable or uninformative (Buse and Weimer, ASE’10, Tao et al., FSE’12)
          • Automatically generated documents
              • Verbosity
              • Uninteresting changes are identified, e.g., “all types that declared toString() added constructors” (Kim and Notkin, ICSE’09)
      • Figures: LSdiff output; DeltaDoc output
  • 15. Outline: Code Change Comprehension
      • Program Differencing
          • Text differencing
          • Syntactic differencing
          • Semantic differencing
      • Code Change Summarization
          • Rules and exceptions
          • Control-flow changes
          • Evolutionary documentation
      • Querying and Filtering: customization
  • 16. Querying and Filtering
      • Specifying and detecting meaningful changes (Yu et al., ASE’11)
          • Normalize the program (user-specified) before differencing
          • Non-trivial to construct the query
  • 17. Querying and Filtering (cont.)
      • Filtering non-essential changes (Kawrykow and Robillard, ICSE’11)
          • Non-essential changes: rename-induced modifications, local variable extraction, trivial keyword modification, whitespace and documentation updates
          • ChangeDistiller (Fluri et al., TSE’07) + partial program analysis (Dagenais and Robillard, ICSE’08)
          • Goal: improving mining and recommendation accuracy instead of developers’ comprehension
  • 18. Outline: Code Change Comprehension
      • Program Differencing
          • Text differencing
          • Syntactic differencing
          • Semantic differencing
      • Code Change Summarization
          • Rules and exceptions
          • Control-flow changes
          • Evolutionary documentation
      • Querying and Filtering
          • Meaningful changes
          • Non-essential changes
  • 19. Research Directions: Source Code Changes
      • Program Differencing
          • Text differencing
          • Syntactic differencing
          • Semantic differencing
      • Code Change Summarization
          • Rules and exceptions
          • Control-flow changes
          • Evolutionary documentation
      • Querying and Filtering
          • Meaningful changes
          • Non-essential changes
      • Work-item-based changes?
  • 20. Work-item-based Changes
      • Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
          • Very difficult to understand (Tao et al., FSE’12)
      • Example: JFreeChart revision 1083 (trivial keyword removal, bug fix, formatting)
  • 21. Work-item-based Change Detection
      • Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
          • Very difficult to understand (Tao et al., FSE’12)
          • Change decomposition
              • Program slicing (entity dependencies)
              • Pattern matching (similarities)
      • A single work-item spreads across multiple code changes (e.g., 5 changes to finally fix a bug completely)
          • Change aggregation
              • Linkage to the same issue
              • Heuristics like time duration, commit authors, program dependencies, etc.
  • 22. Research Directions: Code Change Comprehension
      • Program Differencing
          • Text differencing
          • Syntax differencing
          • Semantic differencing
      • Code Change Summarization
          • Rules and exceptions
          • Control-flow changes
          • Evolutionary documentation
      • Querying and Filtering
          • Meaningful changes
          • Non-essential changes
      • Work-item change detection
          • Change decomposition
          • Change aggregation
  • 23. Research Directions: Code Change Comprehension
      • Program Differencing
          • Text differencing
          • Syntax differencing
          • Semantic differencing
      • Code Change Summarization
          • Rules and exceptions
          • Control-flow changes
          • Evolutionary documentation
      • Querying and Filtering
          • Meaningful changes
          • Non-essential changes
          • Work-item-specific changes
      • Work-item change detection
          • Change decomposition
          • Change aggregation
  • 24. Research Directions: Code Change Comprehension
      • Program Differencing
          • Text differencing
          • Syntax differencing
          • Semantic differencing
      • Code Change Summarization
          • Rules and exceptions
          • Control-flow changes
          • Evolutionary documentation
      • Querying and Filtering
          • Meaningful changes
          • Non-essential changes
          • Work-item-specific changes
      • Concrete Execution
      • Work-item change detection
          • Change decomposition
          • Change aggregation
  • 25. Explaining code changes with executions of co-changed test cases
      • Test cases
          • Best documentation for source code
      • Test cases co-changed with source code
          • Documentation for code changes?
          • Mostly synchronous co-evolution of production and test code (Zaidman et al., Empirical Software Engineering’11)
      • Differential test executions
          • Co-changed test cases T
          • Executing T on the old version P and new version P’
          • Comparing executions to explain changed behaviors
      • From StackExchange (http://programmers.stackexchange.com/questions/154439/quality-of-code-in-unit-tests?newsletter=1&nlcode=67628%7c1a35)
          • “Unit tests are one of the best sources of documentation for your system, and arguably the most reliable form”
          • “Unit tests are often the first thing you look at when trying to grasp what some piece of code does”
          • “They can also serve as a starting point for people new to the code base”
  • 26. Research Directions: Code Change Comprehension
      • Program Differencing
          • Text differencing
          • Syntax differencing
          • Semantic differencing
      • Code Change Summarization
          • Rules and exceptions
          • Control-flow changes
          • Evolutionary documentation
      • Querying and Filtering
          • Meaningful changes
          • Non-essential changes
          • Work-item-specific changes
      • Concrete Execution
          • Co-changed test cases
          • Differential test execution
      • Work-item change detection
          • Change decomposition
          • Change aggregation
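
Slide 5 introduces Ldiff as an enhanced line differencing tool, and the speaker notes mention that it matches lines by Levenshtein edit distance against a threshold. The following is a minimal sketch of that one step only, not Ldiff's implementation: the normalization of the distance into a similarity, the greedy pairing, and the threshold value 0.6 are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def line_similarity(a: str, b: str) -> float:
    """Normalize edit distance into [0, 1]; 1.0 means identical lines."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_changed_lines(old_hunk, new_hunk, threshold=0.6):
    """Greedily pair each old line with its most similar unmatched new line.

    Pairs whose similarity reaches the (assumed) threshold are reported as
    modified lines rather than as an unrelated delete plus add.
    """
    pairs = []
    unmatched_new = list(range(len(new_hunk)))
    for old_line in old_hunk:
        best = max(
            unmatched_new,
            key=lambda j: line_similarity(old_line, new_hunk[j]),
            default=None,
        )
        if best is not None and line_similarity(old_line, new_hunk[best]) >= threshold:
            pairs.append((old_line, new_hunk[best]))
            unmatched_new.remove(best)
    return pairs
```

For instance, `match_changed_lines(["int total = 0;"], ["int total = 1;", "return;"])` pairs the two `int total` lines as one modification, which plain Unix diff would report as a deletion and an addition.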
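
Slide 6 says that ChangeDistiller matches tree nodes by bigram string similarity. A small sketch of one common way to compute such a score is shown below. The Dice coefficient over character bigrams is an assumption here; the slides do not give the exact formula or the matching threshold.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Multiset of adjacent character pairs in the string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over character bigrams, in [0, 1]."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Strings too short to have bigrams: fall back to exact equality.
        return 1.0 if a == b else 0.0
    shared = sum((ba & bb).values())  # multiset intersection
    return 2 * shared / total
```

Two statements whose text differs only by a renamed variable would score close to 1 under this measure, so their nodes could be matched and reported as an update instead of a delete plus insert.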
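
Slide 11 describes LSdiff as grouping related changes and detecting inconsistencies; the speaker notes add that it does so by computing changed facts and inferring rules with exceptions. The sketch below illustrates the idea on the notes' Car/Key.chk example. It is heavily simplified: the rule template is supplied by hand rather than searched for, the fact encoding as 3-tuples and the `min_support` cutoff are assumptions, and the subtype names `BMW` and `GM` are hypothetical.

```python
def changed_facts(old_facts, new_facts):
    """Return (added, deleted) facts between two program versions."""
    return new_facts - old_facts, old_facts - new_facts


def infer_rule(all_facts, added, supertype, method, callee, min_support=0.6):
    """Check the template 'every subtype's <method> added a call to <callee>'.

    Returns the rule and its exceptions if enough subtypes follow it,
    otherwise None.
    """
    subtypes = sorted(
        sub for kind, sup, sub in all_facts
        if kind == "subtype" and sup == supertype
    )
    followers = [t for t in subtypes if ("calls", f"{t}.{method}", callee) in added]
    exceptions = [t for t in subtypes if t not in followers]
    if not subtypes or len(followers) / len(subtypes) < min_support:
        return None
    return {
        "rule": f"all subtypes of {supertype}: {method} added a call to {callee}",
        "exceptions": exceptions,
    }


# Facts are (kind, x, y) tuples; BMW and GM are made-up subtype names.
old = {("subtype", "Car", "BMW"), ("subtype", "Car", "GM"), ("subtype", "Car", "Kia")}
new = old | {("calls", "BMW.start", "Key.chk"), ("calls", "GM.start", "Key.chk")}
added, deleted = changed_facts(old, new)
result = infer_rule(new, added, "Car", "start", "Key.chk")
```

The exception list is what points a reviewer at a potential inconsistency: here `Kia` is the one subtype whose `start` method did not receive the new call.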
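
Slide 13 lists two sentence features used to pick informative sentences from evolutionary documents: similarity to the title of the enclosing document and overlap with the adjacent document. A rough sketch of ranking by those two features follows. Jaccard word overlap and the unweighted sum of the two scores are assumptions chosen for brevity, not the features or weights used by Rastkar and Murphy.

```python
import re


def words(text: str) -> set:
    """Lower-cased word set of a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sentences(title, sentences, adjacent_doc):
    """Order sentences by title similarity plus overlap with the adjacent document."""
    title_words, adjacent_words = words(title), words(adjacent_doc)
    scored = [
        (jaccard(words(s), title_words) + jaccard(words(s), adjacent_words), s)
        for s in sentences
    ]
    # sorted() is stable, so ties keep their original document order.
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

A summary would then take the top few ranked sentences, for example from an issue report whose adjacent document is the linked commit log.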
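
Slide 21 proposes change aggregation by linking changes to the same issue and by heuristics such as time duration and commit authors. One possible reading of those heuristics is sketched below. The commit record layout (`id`, `author`, `time`, `issue`) and the one-hour gap are assumptions for illustration; the program-dependency heuristic from the slide is not modeled.

```python
def aggregate_commits(commits, max_gap_seconds=3600):
    """Group commit ids that plausibly belong to the same work-item.

    Commits linked to the same issue share a group. Commits without an issue
    link join the author's previous group if it is recent enough.
    """
    by_issue = {}
    last_by_author = {}  # author -> (timestamp, group of that author's last commit)
    groups = []
    for commit in sorted(commits, key=lambda c: c["time"]):
        issue = commit.get("issue")
        if issue is not None:
            group = by_issue.setdefault(issue, [])
            if not group:  # first commit seen for this issue
                groups.append(group)
        else:
            previous = last_by_author.get(commit["author"])
            if previous and commit["time"] - previous[0] <= max_gap_seconds:
                group = previous[1]
            else:
                group = []
                groups.append(group)
        group.append(commit["id"])
        last_by_author[commit["author"]] = (commit["time"], group)
    return groups
```

Issue linkage takes priority here because it is the more direct evidence; the time-and-author fallback only covers commits where that linkage is missing.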
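
Slide 25 proposes differential test execution: run the co-changed tests T on the old version P and the new version P’, then compare the executions. A toy sketch of that comparison is given below. It assumes both versions are available as Python callables and reduces each test to a tuple of inputs, comparing only return values and raised exception types; the `old_divide` and `new_divide` functions are hypothetical.

```python
def observe(program, args):
    """Run one version on one input and record what happened."""
    try:
        return ("returned", program(*args))
    except Exception as exc:
        return ("raised", type(exc).__name__)


def differential_execution(old_version, new_version, test_inputs):
    """Return the inputs on which the two versions behave differently."""
    differences = []
    for args in test_inputs:
        before = observe(old_version, args)
        after = observe(new_version, args)
        if before != after:
            differences.append((args, before, after))
    return differences


def old_divide(a, b):
    """Hypothetical old version P."""
    return a / b


def new_divide(a, b):
    """Hypothetical new version P': guards against division by zero."""
    if b == 0:
        return 0.0
    return a / b


report = differential_execution(old_divide, new_divide, [(6, 3), (1, 0)])
```

Only the input `(1, 0)` shows up in `report`, going from a raised `ZeroDivisionError` to a returned `0.0`, which is a concrete, behavior-level explanation of what the change did.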

Editor's notes

  1. We know that software is continuously evolving, since developers change source code practically all the time. One consequence is that developers also have to understand these code changes, which I refer to as CCC (code change comprehension) throughout this talk. Last year, we conducted an exploratory study at Microsoft, where we surveyed and interviewed developers about their CCC practices. This work was published at FSE. We found, first, that CCC is frequently required: the majority of developers need to understand code changes several times each day. At this year’s ICSE, Bacchelli and Bird reported similar findings in their empirical study on modern code review: CCC is more common than understanding the entire program, but it is also the most challenging part. This motivates our work, since CCC is a challenging activity that is also fundamental to developers’ daily practice.
  2. So in the literature survey, I identify three major categories related to CCC. The first is program differencing; this line of work tries to help developers by describing code changes. The second is …; studies in this category go one step further and try to reason about and explain code changes. The third is querying and filtering, which is a sort of “customized” CCC.
  3. Unix diff is the most well-known example in this category, but it is also well known for two major limitations. Ldiff addresses them in four steps: (1) run diff, which is based on the longest common subsequence; (2) compute the similarity of all possible hunk pairs (vector-space cosine similarity) and pick the topmost pairs; (3) match lines within the paired hunks using Levenshtein edit distance, marking line pairs above a threshold as changed; (4) treat unmatched lines as new hunks and iterate from step 2. Since these techniques treat a program as normal text, they report program differences as changes to characters. From a developer’s point of view, the syntactic, or structural, information about the source code is lost. This motivates another line of work, which we call “syntax differencing”.
  4. This line of work uses a structured representation of a program. One example is ChangeDistiller, which represents a program as an abstract syntax tree and applies a tree differencing algorithm. In addition to ASTs, studies also represent code in XML, which can also embed … Then we can apply XML differencing algorithms, such as diffX (Al-Ekram et al., CASCON '05), to compute program differences. The limitation is that when developers perform behavior-preserving modifications, such as switching the order of an if-else, these techniques still report the differences, although from the developer’s perspective the change may not be important.
  5. Therefore, the next line of work focuses on semantic differencing of two program versions. Semantic diff operates at the method level and compares variable dependencies to derive behavioral changes. In the old version of the method add, if x is not equal to HI, x is added to TOT; otherwise, DEF is added to TOT. From this code, we can derive a list of dependencies, for example, … In the new version, the developer simply wanted to switch the order of the if-else branches but mistakenly used assignment instead of equality. Therefore, when the technique computes the variable dependencies and compares them to the previous ones, it will report that … These behavioral differences are certainly not expected, because when x is assigned HI, the initial value of x is always lost. In such cases, semantic diff is clearly better than syntactic diff, since it can draw developers’ attention to the program’s unexpected behavioral change.
  6. Another work, JDiff (Apiwattanapong et al., ASE’04), addresses semantic differencing for object-oriented programs. By simply applying syntactic differencing, we would only know that m1 is added, and … But developers may be more interested in how the behavior of the program has changed. If the dynamic type of a is B, the call a.m1 in the new version actually invokes m1 in B. The exception thrown will also be caught by a different catch block after the change. JDiff extends the CFG to combine…ECFG considers dynamic binding and exception handling for the previous example, and a graph differencing algorithm can be applied to reveal the difference.
  7. Some studies also use symbolic execution to characterize programs’ behavior. This technique…instead of actual values. For example, a symbolic execution of this code fragment reads: if this condition is satisfied, return; otherwise, if…, return… Person et al. proposed differential symbolic execution, which compares the symbolic executions of two program versions. The output looks like this: the conditions under which the two versions produce different results.
  8. Now I’ve covered three categories of program differencing. This work basically tries to help CCC by describing what the code change is. The next line of work, which I call code change summarization (CCS), goes a step further and tries to explain code changes.
  9. A program is represented as a set of predicates that describe code elements, containment relationships, and structural dependencies; these predicates are called “facts”. LSdiff then computes the changed facts between two program versions, infers rules from the list of changed facts, and also infers exceptions to those rules. Example: all of Car’s subtypes’ start methods added calls to the Key.chk method, except for the subtype Kia.
  10. Finally, DeltaDoc uses a set of transformation heuristics to summarize the differences between these statements into human-readable documentation.
  11. The studies we’ve seen so far all extract information from the source code itself. However, other software artifacts, such as commit logs, can also be helpful for understanding code changes, since in these artifacts we might find useful natural-language sentences related to the code changes. Motivated by this observation, …proposed… Each sentence has some features, for example. To locate the most informative or relevant sentences, the sentences are ranked by their feature values. Here is an example of their output: for this change, the summary contains a list of relevant sentences extracted from its evolutionary documents.
  12. The first major challenge of using evolutionary documents is that the linkage between these documents might not exist, so we may not even be able to find the documents relevant to a code change. This problem is known as the “missing link” problem and has been studied recently. In addition, documents may not… In such cases, we cannot rely on them to extract informative change summaries. As for the automatically generated documentation I introduced before, the biggest problem is verbosity. These are the rules and exceptions generated by LSdiff to describe a code change. This is the number of lines in the change documentation: compared to the human-written commit log, which is the black bar, the documentation generated by DeltaDoc is still very long. Another challenge is that some uninteresting changes are identified automatically. For example, a rule reported by LSdiff says…, and in the user study participants complained that such a rule is not useful.
  13. Therefore, there are studies that customize CCC so that developers can query the changes they are interested in and filter out irrelevant ones.
  14. Non-essential changes include …, which are less likely to be of interest to developers. The authors use ChangeDistiller to detect changes and apply partial program analysis (PPA) to resolve type bindings for partial programs (i.e., code changes). However, the goal of this work is to…
  15. In general, studies in this category focus on querying meaningful changes and filtering out non-essential changes.