Very often, source code lacks comments that adequately describe its behavior. In such situations, developers need to infer knowledge from the source code itself or search for source code descriptions in external artifacts. We argue that messages exchanged among contributors/developers, in the form of bug reports and emails, are a useful source of information to help understand source code. However, such communications are unstructured and usually not explicitly meant to describe specific parts of the source code. Developers searching for code descriptions within communications face the challenge of filtering large amounts of data to extract the pieces of information that are important to them. We propose an approach to automatically extract method descriptions from communications in bug tracking systems and mailing lists. We have evaluated the approach on bug reports and mailing lists from two open source systems (Lucene and Eclipse). The results indicate that mailing lists and bug reports contain relevant descriptions of about 36% of the methods from Lucene and 7% from Eclipse, and that the proposed approach is able to extract such descriptions with a precision of up to 79% for Eclipse and 87% for Lucene. The extracted method descriptions can help developers understand the code and could also be used as a starting point for source code re-documentation.
ICPC 2012 - Mining Source Code Descriptions
1. Mining Source Code Descriptions from Developer Communications
Sebastiano Panichella, Jairo Aponte, Massimiliano Di Penta, Andrian Marcus, Gerardo Canfora
2. Context: Software Project
[Diagram. Labels: Documentation (sequence diagram, class diagram) describes Source Code; Developer; Program Comprehension; Maintenance Tasks; understanding; difficult understanding.]
3. Context: Software Project
Coming back to the reality...
[Diagram, with the same labels as on slide 2: Documentation (sequence diagram, class diagram) describes Source Code; Developer; Program Comprehension; Maintenance Tasks; understanding; difficult understanding.]
4. Idea
In such situations developers need to infer knowledge from the source code itself, or from source code descriptions in external artifacts.
We argue that messages exchanged among contributors/developers are a useful source of information to help understanding source code.
Developer discussion:
..................................................
When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
..................................................
CLASS: IndexSplitter METHOD: split
6. Step 1: Downloading emails/bug reports and tracing them onto classes
Two heuristics:
- The discussion contains a fully-qualified class name (e.g., org.apache.lucene.analysis.MappingCharFilter); or
- the email contains a file name (e.g., MappingCharFilter.java)
For bug reports, we complement the above heuristics by matching the bug ID of each closed bug to the commit notes, thereby tracing the bug report to the files changed in that commit.
Developer discussion:
When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }
If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
CLASS: IndexSplitter
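The two tracing heuristics of Step 1 could be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name ClassTracer, the method trace, and the rule that a match must not be embedded in a longer identifier are all assumptions made for the example. The bug-ID-to-commit matching is not covered.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper: traces a discussion onto the classes it mentions. */
final class ClassTracer {

    /** Returns the simple names of the known classes that the text is traced to. */
    static Set<String> trace(String text, List<String> qualifiedNames) {
        Set<String> hits = new LinkedHashSet<>();
        for (String qualified : qualifiedNames) {
            String simple = qualified.substring(qualified.lastIndexOf('.') + 1);
            // Heuristic 1: the fully-qualified class name appears in the text.
            boolean byQualifiedName = containsToken(text, qualified);
            // Heuristic 2: the source file name appears in the text.
            boolean byFileName = containsToken(text, simple + ".java");
            if (byQualifiedName || byFileName) {
                hits.add(simple);
            }
        }
        return hits;
    }

    /** True if the token occurs in the text without being part of a longer identifier (assumed rule). */
    private static boolean containsToken(String text, String token) {
        Pattern p = Pattern.compile("(?<![\\w.])" + Pattern.quote(token) + "(?!\\w)");
        return p.matcher(text).find();
    }
}
```

A mention of the bare class name alone (for example "IndexSplitter" without package or ".java") is deliberately not traced here, since the slide lists only the two heuristics above.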
7. Step 2: Extracting paragraphs
Two heuristics:
- We consider as paragraphs the text sections separated by one or more blank lines
- We prune source code fragments and/or stack traces from the paragraph descriptions by using an approach inspired by the work of Bacchelli et al.
Developer discussion:
PAR 1:
When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
PAR 2:
public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }
PAR 3:
If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
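A rough sketch of Step 2 follows. The blank-line split mirrors the slide. The code detector is only a crude stand-in (an assumption) for the approach inspired by Bacchelli et al., which the slide does not detail: a line counts as code-like if it ends with ";", "{" or "}" or looks like a stack frame, and a paragraph is dropped when most of its lines are code-like. ParagraphExtractor and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a message into paragraphs and drops code-like ones. */
final class ParagraphExtractor {

    // Assumed stand-in detector: statement/brace endings or "at pkg.Class.method(...)" frames.
    private static final Pattern CODE_LIKE =
            Pattern.compile(".*[;{}]\\s*$|^\\s*at\\s+[\\w.$]+\\(.*\\)\\s*$");

    /** Splits the text wherever one or more blank lines occur. */
    static List<String> split(String text) {
        List<String> paragraphs = new ArrayList<>();
        for (String block : text.split("\\R(?:[ \\t]*\\R)+")) {
            if (!block.isBlank()) {
                paragraphs.add(block.strip());
            }
        }
        return paragraphs;
    }

    /** True if more than half of the paragraph's lines look like code or stack frames. */
    static boolean looksLikeCode(String paragraph) {
        String[] lines = paragraph.split("\\R");
        int codeLines = 0;
        for (String line : lines) {
            if (CODE_LIKE.matcher(line).matches()) {
                codeLines++;
            }
        }
        return codeLines * 2 > lines.length;
    }

    /** Returns only the paragraphs that do not look like code. */
    static List<String> naturalLanguageParagraphs(String text) {
        List<String> kept = new ArrayList<>();
        for (String paragraph : split(text)) {
            if (!looksLikeCode(paragraph)) {
                kept.add(paragraph);
            }
        }
        return kept;
    }
}
```

Applied to a message shaped like the discussion above, split would yield three paragraphs and naturalLanguageParagraphs would keep the first and third.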
8. Step 3: Tracing paragraphs onto methods
These paragraphs must respect the following two conditions:
A) A valid paragraph must contain the keyword “method”
B) and the method name must be followed by an open parenthesis, i.e., we match “foo(”
Developer discussion (A marks the keyword “method”, B marks “split(”):
PAR 1:
When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
CLASS: IndexSplitter
METHOD: split(
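Conditions A and B can be checked with a few lines of code. The sketch below assumes the candidate method names come from the class the paragraph was already traced to. MethodTracer is a hypothetical name, and two details are assumptions: the keyword check is case-insensitive, and the name must not be the tail of a longer identifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper: traces a paragraph onto the methods of an already-traced class. */
final class MethodTracer {

    /** Returns the method names (from methodNames) that the paragraph is traced to. */
    static List<String> trace(String paragraph, List<String> methodNames) {
        List<String> traced = new ArrayList<>();
        // Condition A: the paragraph must contain the keyword "method".
        if (!paragraph.toLowerCase(Locale.ROOT).contains("method")) {
            return traced;
        }
        for (String name : methodNames) {
            // Condition B: the method name is immediately followed by "(", i.e. "foo(".
            Pattern call = Pattern.compile("(?<!\\w)" + Pattern.quote(name) + "\\(");
            if (call.matcher(paragraph).find()) {
                traced.add(name);
            }
        }
        return traced;
    }
}
```

For the paragraph above and the candidate names split and mkdirs, only split satisfies both conditions.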
9. Step 4: Heuristic based Filtering
We defined a set of heuristics that assign each paragraph a score, to further filter the paragraphs associated with methods:
a) Method parameters:
s1 = % of parameters mentioned in the paragraph. Value between 0 and 1; 1 if the method does not have parameters.
Developer discussion:
..........................
Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher
The Method searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types sub-types as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution
..........................
CLASS: MainMethodSearchEngine
METHOD: searchMainMethods
% parameter = 100% -> s1 = 1
10. Step 4: Heuristic based Filtering
We defined a set of heuristics that assign each paragraph a score, to further filter the paragraphs associated with methods:
a) Method parameters:
s1 = % of parameters mentioned in the paragraph. Value between 0 and 1; 1 if the method does not have parameters.
b) Syntactic descriptions (mentioning return values):
s2 = check whether the paragraph contains the keyword “return”: value equal to 1 if yes, 0 otherwise. Equal to 1 if the method is void.
Developer discussion: the same paragraph as on slide 9.
CLASS: MainMethodSearchEngine
METHOD: searchMainMethods
% parameter = 100% -> s1 = 1
SCORE = 1 + ...
11. Step 4: Heuristic based Filtering
We defined a set of heuristics that assign each paragraph a score, to further filter the paragraphs associated with methods:
a) Method parameters:
s1 = % of parameters mentioned in the paragraph. Value between 0 and 1; 1 if the method does not have parameters.
b) Syntactic descriptions (mentioning return values):
s2 = check whether the paragraph contains the keyword “return”: value equal to 1 if yes, 0 otherwise. Equal to 1 if the method is void.
c) Overriding/Overloading:
s3 = 1 if any of the “overload” or “override” keywords appears in the paragraph, 0 otherwise
d) Method invocations:
s4 = 1 if any of the “call” or “execute” keywords appears in the paragraph, 0 otherwise
Developer discussion: the same paragraph as on slide 9.
CLASS: MainMethodSearchEngine
METHOD: searchMainMethods
% parameter = 100% -> s1 = 1
SCORE = 1 + 0 + 1 = 2
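The four heuristic scores can be written down compactly. The sketch below is an illustration under assumptions rather than the authors' code: HeuristicScores is a hypothetical name, a parameter counts as mentioned if its string (here its type name, as in the Eclipse example) occurs anywhere in the paragraph, and keywords are matched as case-insensitive substrings.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper computing the Step 4 scores s1..s4 for one paragraph. */
final class HeuristicScores {

    /** s1: fraction of the method's parameters mentioned in the paragraph; 1 if it has none. */
    static double s1(String paragraph, List<String> parameters) {
        if (parameters.isEmpty()) {
            return 1.0;
        }
        long mentioned = parameters.stream().filter(paragraph::contains).count();
        return (double) mentioned / parameters.size();
    }

    /** s2: 1 if the paragraph contains "return" or the method is void, 0 otherwise. */
    static int s2(String paragraph, boolean methodIsVoid) {
        return (methodIsVoid || has(paragraph, "return")) ? 1 : 0;
    }

    /** s3: 1 if "overload" or "override" appears in the paragraph, 0 otherwise. */
    static int s3(String paragraph) {
        return (has(paragraph, "overload") || has(paragraph, "override")) ? 1 : 0;
    }

    /** s4: 1 if "call" or "execute" appears in the paragraph, 0 otherwise. */
    static int s4(String paragraph) {
        return (has(paragraph, "call") || has(paragraph, "execute")) ? 1 : 0;
    }

    // Assumed matching rule: case-insensitive substring search.
    private static boolean has(String paragraph, String keyword) {
        return paragraph.toLowerCase(Locale.ROOT).contains(keyword);
    }
}
```

On the Eclipse paragraph from the slide, with the parameters IProgressMonitor, IJavaSearchScope and boolean, this sketch gives s1 = 1, s2 = 1, s3 = 0 and s4 = 1, which agrees with the slide's SCORE = 1 + 0 + 1 = 2.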
12. Step 4: Heuristic based Filtering
We defined a set of heuristics that assign each paragraph a score, to further filter the paragraphs associated with methods:
a) Method parameters:
s1 = % of parameters mentioned in the paragraph. Value between 0 and 1; 1 if the method does not have parameters.
b) Syntactic descriptions (mentioning return values):
s2 = check whether the paragraph contains the keyword “return”: value equal to 1 if yes, 0 otherwise. Equal to 1 if the method is void.
c) Overriding/Overloading:
s3 = 1 if any of the “overload” or “override” keywords appears in the paragraph, 0 otherwise
d) Method invocations:
s4 = 1 if any of the “call” or “execute” keywords appears in the paragraph, 0 otherwise
We selected paragraphs that have:
1. s1 ≥ thP = 0.5
2. s2 + s3 + s4 ≥ thH = 1
Example:
% parameter = 100% -> s1 = 1 ≥ 0.5
SCORE = 1 + 0 + 1 = 2 ≥ 1
OK
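The selection rule itself is a conjunction of two threshold checks. A minimal sketch, using the thresholds thP = 0.5 and thH = 1 from the slide; ParagraphFilter and select are hypothetical names:

```java
/** Hypothetical helper applying the Step 4 selection thresholds. */
final class ParagraphFilter {

    static final double TH_P = 0.5; // thP: minimum s1 (parameter coverage)
    static final int TH_H = 1;      // thH: minimum s2 + s3 + s4

    /** True if the paragraph passes both threshold conditions. */
    static boolean select(double s1, int s2, int s3, int s4) {
        return s1 >= TH_P && (s2 + s3 + s4) >= TH_H;
    }
}
```

The slide's example (s1 = 1, s2 = 1, s3 = 0, s4 = 1) passes both checks. A paragraph that mentions all parameters but triggers none of s2, s3, s4 is rejected, as is one with s1 below 0.5.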
13. Step 5: Similarity based Filtering
We rank filtered paragraphs through their textual similarity with the method they are likely describing.
Removing:
- English stop words;
- Programming language keywords
Using:
- Camel Case splitting on the remaining words
- Vector Space Model
Threshold: th >= 0.80
METHOD    PARAGRAPH    SCORE  Similarity
Method_3  Paragraph_4  2.5    96.1%
Method_1  Paragraph_1  2.5    95.6%
Method_2  Paragraph_2  1.5    97.4%
Method_3  Paragraph_3  1.5    86.2%
Method_1  Paragraph_3  1.5    79.0%
Method_3  Paragraph_2  1.5    77.5%
Method_2  Paragraph_4  1.5    64.3%
Method_2  Paragraph_3  1.3    83.2%
Method_3  Paragraph_1  1.3    73.9%
Method_2  Paragraph_1  1.3    68.7%
Method_1  Paragraph_4  1.3    53.6%
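The preprocessing and the Vector Space Model comparison can be illustrated as below. Several choices are assumptions, since the slide does not specify them: raw term frequencies as vector weights (a tf-idf weighting would be a common alternative), cosine as the similarity measure, and deliberately tiny stop-word and keyword lists. TextSimilarity and its members are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: cosine similarity between a paragraph and a method's text. */
final class TextSimilarity {

    static final double THRESHOLD = 0.80; // th from the slide

    // Tiny illustrative lists (assumption); real lists would be far longer.
    private static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "of", "to", "in", "is", "and", "it");
    private static final Set<String> JAVA_KEYWORDS =
            Set.of("public", "private", "static", "void", "new", "return", "throws", "int", "if", "for");

    /** Builds a term-frequency vector after stop-word/keyword removal and camel case splitting. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.split("[^A-Za-z]+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (token.isEmpty() || STOP_WORDS.contains(lower) || JAVA_KEYWORDS.contains(lower)) {
                continue;
            }
            // Camel case splitting on the remaining words: "destDir" -> "dest", "Dir".
            for (String part : token.split("(?<=[a-z])(?=[A-Z])")) {
                tf.merge(part.toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine of the angle between two term-frequency vectors; 0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static double similarity(String paragraph, String methodText) {
        return cosine(termFrequencies(paragraph), termFrequencies(methodText));
    }
}
```

Under this sketch, a paragraph would be kept when similarity(paragraph, methodText) is at least THRESHOLD, matching the th >= 0.80 cut-off shown with the ranking.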
14. Empirical Study
• Goal: analyze source code descriptions in developer discussions
• Purpose: investigate how developer discussions describe methods of Java source code
• Quality focus: find good method descriptions in messages exchanged among contributors/developers
• Context: bug reports and mailing lists of two Java projects, Apache Lucene and Eclipse
16. Research Questions
RQ1 (method coverage): How many methods from
the analyzed software systems are described by the
paragraphs identified by the proposed approach?
RQ2 (precision): How precise is the proposed approach
in identifying method descriptions?
RQ3 (missing descriptions): How many potentially
good method descriptions are missed by the approach?
17. RQ1: How many methods from the analyzed software
systems are described by the paragraphs identified
by the proposed approach?
18. RQ2: How precise is the proposed approach in identifying
method descriptions?
We sampled 250 descriptions from each project
19. RQ3: How many potentially good method descriptions are
missed by the approach?
We sampled 100 descriptions from each project
TABLE III
The analysis of a sample of 100 paragraphs traced to methods, but not satisfying the Step 4 heuristic
System         True Negatives  False Negatives
Eclipse        78              22
Apache Lucene  67              33