SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Mining Source Code Descriptions
  from Developer Communications



Sebastiano    Jairo   Massimiliano   Andrian    Gerardo
Panichella   Aponte    Di Penta      Marcus    Canfora
Context: Software Project
                   describes


                                           Sequence
                                           diagram       Program
                               Documentation          Comprehension


Source Code                      Class
                                 diagram
                                                       Maintenance
                                                          Tasks
 Difficult
 understanding                  understanding




                 Developer
Context: Software Project
Coming back
to the reality...
                      describes


                                              Sequence
                                              diagram       Program
                                  Documentation          Comprehension


Source Code                         Class
                                    diagram
                                                          Maintenance
                                                             Tasks
 Difficult
 understanding                     understanding




                    Developer
..................................................
           When call the method IndexSplitter.split(File
 Idea      destDir, String[] segs) from the Lucene cotrib
           directory(contrib/misc/src/java/org/apache/luc
           ene/index) it creates an index with segments
We argue that messages exchanged wrong data. Namely wrong
           descriptor file with among contributors/developers
are a useful source of information to help understanding of segment
           is the number representing the name source code.
           that would be created next in this index.
            ..................................................
In such situations developers need to infer knowledge from,
              CLASS: IndexSplitter           METHOD: split
                                 source code descriptions in external
    the source code itself
                                 artifacts.




               Developer
A Five Step-Approach for Mining
      Method Descriptions




Developer
Step 1: Downloading emails/bugs reports and tracing them
  onto classes
                              The discussion contains a fully-qualified class name (e.g.,
                              org.apache.lucene.analysis.MappingCharFilter); or the email contains a
      Two heuristics          file name (e.g., MappingCharFilter.java)

                             For bug reports, we complement the above heuristic by matching the
                             bug ID of each closed bug to the commit notes, therefore tracing the bug
                             report to the files changed in that commit
 Developer
 Discussion
When call the method IndexSplitter .split(File destDir, String[] segs) from the
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.

 CLASS: IndexSplitter
public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }



If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.
Step 2: Extracting paragraphs
                              We consider as paragraphs, text section separated by one or more
                              white lines
    Two heuristics

                             We prune out paragraph description from source code fragments
                             and/or stack Traces "by using an approach inspired by the work of
                             Bacchelli et al.
Developer
Discussion
When call the method IndexSplitter.split(File destDir, String[] segs) from the     PAR
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
an index with segments descriptor file with wrong data. Namely wrong is the number
                                                                                   1
representing the name of segment that would be created next in this index.



public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
                                                                                       PAR
FSDirectory destFSDir = FSDirectory.open(destDir);                                     2
SegmentInfos destInfos = new SegmentInfos }



If some of the segments of the index already has this name this results either to
                                                                                       PAR
impossibility to create new segment or in crating of an corrupted segment.             3
Step 3: Tracing paragraphs onto methods
                               A) A valid paragraph must contain the keyword “method”
  These paragraphs must
  respect the following
                               B) and the method name must be followed by a open parenthesis—
  two conditions:
                                 i.e., we match “foo(”



 Developer
 Discussion            A)                       B)
When call the method IndexSplitter.split(File destDir, String[] segs)
                                                                   PAR
from the Lucene cotrib directory it creates an index with segments1
descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this
index.


CLASS: IndexSplitter
METHOD: split(
......................................................................................
......................................................................................
......................................................................................
......................................................................................
Step 4: Heuristic based Filtering

    We defined a set of heuristics to further filter the paragraphs associated with
    methods that assign each paragraph a score:



..........................             a) Method parameters:
Problem seems to come from
                                                                               1 if the
                                                   % of parameters            method
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher           s1= mentioned in the           does not
The Method                                         paragraphs. Value            have
searchMainMethods(IProgressMonitor,                between 0 and 1           parameters
 IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types sub-
types as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................
CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE
Step 4: Heuristic based Filtering

    We defined a set of heuristics to further filter the paragraphs associated with
    methods that assign each paragraph a score:



..........................             a) Method parameters:
Problem seems to come from
                                                                                       1 if the
                                                   % of parameters                    method
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher           s1= mentioned in the                   does not
The Method                                         paragraphs. Value                    have
searchMainMethods(IProgressMonitor,                between 0 and 1                   parameters
 IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,            b) Syntactic descriptions (mentioning return values):
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is                  check whether the             Equal to one if
ON. This method add all types sub-                  paragraph contains the        the method is
types as soon as the given scope               s2= keyword “return”. If YES
encloses them without testing if
                                                                                      void.
sub-types have a main! After return                 Value equal 1, 0 otherwise
IType[] before the excecution
..........................
CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE =    1+
Step 4: Heuristic based Filtering

    We defined a set of heuristics to further filter the paragraphs associated with
    methods that assign each paragraph a score:



..........................             a) Method parameters:
Problem seems to come from
                                                                                       1 if the
                                                   % of parameters                    method
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher           s1= mentioned in the                   does not
The Method                                         paragraphs. Value                    have
searchMainMethods(IProgressMonitor,                between 0 and 1                   parameters
 IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,            b) Syntactic descriptions (mentioning return values):
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is                  check whether the             Equal to one if
ON. This method add all types sub-                  paragraph contains the        the method is
types as soon as the given scope               s2= keyword “return”. If YES
encloses them without testing if
                                                                                      void.
sub-types have a main! After return                 Value equal 1, 0 otherwise
IType[] before the excecution          c) Overriding/Overloading:
..........................                         1 if any of the “overload” or
CLASS: MainMethodSearchEngine                  s3=“override” keywords appears
METHOD: serachMainMethods                           in the paragraph, 0 otherwise
% parameter = 100% -> s1= 1            d) Method invocations:
SCORE =    1+ 0+ 1 = 2                              1 if any of the “call” or
                                                s4=“excecute” keywords appears
                                                   in the paragraph, 0 otherwise
Step 4: Heuristic based Filtering

    We defined a set of heuristics to further filter the paragraphs associated with
    methods that assign each paragraph a score:



                                    a) Method parameters:
We selected paragraphs that have:               % of parameters
                                                                                    1 if the
                                                                                   method
                                            s1= mentioned in the                   does not
1. s1 ≥ thP = 0.5                               paragraphs. Value                    have
                                                between 0 and 1                   parameters
2. s2 + s3 + s4 ≥ thH = 1           b) Syntactic descriptions (mentioning return values):
                                                 check whether the             Equal to one if
                                                 paragraph contains the        the method is
                                            s2= keyword “return”. If YES           void.
                                                 Value equal 1, 0 otherwise
                    OK              c) Overriding/Overloading:
                                                1 if any of the “overload” or
                                            s3=“override” keywords appears
                                                 in the paragraph, 0 otherwise
% parameter = 100% -> s1= 1 ≥ 0.5   d) Method invocations:
SCORE =    1+ 0+ 1 = 2 ≥ 1                       1 if any of the “call” or
                                             s4=“excecute” keywords appears
                                                in the paragraph, 0 otherwise
Step 5: Similarity based Filtering

 We rank filtered paragraphs through their textual similarity with the method they
    are likely describing.

       METHOD      PARAGRAPH      SCORE     Similarity         Removing:
                                                               - English stop words;
       Method_3     Paragraph_4     2.5      96.1%
                                                               - Programming
       Method_1     Paragraph_1     2.5      95.6%               language keywords
       Method_2     Paragraph_2     1.5      97.4%             Using:
       Method_3     Paragraph_3     1.5      86.2%             - Camel Case
                                                                 splitting the on
       Method_1     Paragraph_3     1.5      79.0%
                                                                 remaining words
       Method_3     Paragraph_2     1.5      77.5%
                                                               - Vector Space Model
       Method_2     Paragraph_4     1.5      64.3%
       Method_2     Paragraph_3     1.3      83.2%
       Method_3     Paragraph_1     1.3      73.9%              th>=0.80
       Method_2     Paragraph_1     1.3      68.7%
       Method_1     Paragraph_4     1.3      53.6%
Empirical Study
• Goal: analyze source code descriptions in developer
  discussions

• Purpose: investigating how developer discussions
  describe methods of Java Source Code

• Quality focus: find good method description in
  messages exchanged among contributors/developers

• Context: Bug-report and mailing lists of two Java
  Project
    Apache Lucene and Eclipse
Context
Research Questions

 RQ1 (method coverage): How many methods from
 the analyzed software systems are described by the
 paragraphs identified by the proposed approach?

 RQ2 (precision): How precise is the proposed approach
  in identifying method descriptions?


 RQ3 (missing descriptions): How many potentially
  good method descriptions are missed by the approach?
RQ1: How many methods from the analyzed software
   systems are described by the paragraphs identified
   by the proposed approach?
RQ2: How precise is the proposed approach in identifying
     method descriptions?


      We sampled 250 descriptions from each project
RQ3: How many potentially good method descriptions are
missed by the approach?
     We sampled 100 descriptions from each project
                                 TABLE III
      The analysis of a sample of 100 paragraphs traced to methods,
                   but not satisfying the Step 4 heuristic



                System         True Negatives     False Negatives
                Eclipse              78                 22
            Apache Lucene            67                 33
Conclusion

Weitere ähnliche Inhalte

Was ist angesagt?

Java lab1 manual
Java lab1 manualJava lab1 manual
Java lab1 manualnahalomar
 
Applicative Logic Meta-Programming as the foundation for Template-based Progr...
Applicative Logic Meta-Programming as the foundation for Template-based Progr...Applicative Logic Meta-Programming as the foundation for Template-based Progr...
Applicative Logic Meta-Programming as the foundation for Template-based Progr...Coen De Roover
 
Multi-dimensional exploration of API usage - ICPC13 - 21-05-13
Multi-dimensional exploration of API usage - ICPC13 - 21-05-13Multi-dimensional exploration of API usage - ICPC13 - 21-05-13
Multi-dimensional exploration of API usage - ICPC13 - 21-05-13Coen De Roover
 
Cs2312 OOPS LAB MANUAL
Cs2312 OOPS LAB MANUALCs2312 OOPS LAB MANUAL
Cs2312 OOPS LAB MANUALPrabhu D
 
Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org) (usef...
Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org)  (usef...Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org)  (usef...
Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org) (usef...Make Mannan
 
Ap Power Point Chpt7
Ap Power Point Chpt7Ap Power Point Chpt7
Ap Power Point Chpt7dplunkett
 
A recommender system for generalizing and refining code templates
A recommender system for generalizing and refining code templatesA recommender system for generalizing and refining code templates
A recommender system for generalizing and refining code templatesCoen De Roover
 
Interfaces in JAVA !! why??
Interfaces in JAVA !!  why??Interfaces in JAVA !!  why??
Interfaces in JAVA !! why??vedprakashrai
 
Chapter 06 constructors and destructors
Chapter 06 constructors and destructorsChapter 06 constructors and destructors
Chapter 06 constructors and destructorsPraveen M Jigajinni
 

Was ist angesagt? (19)

Java lab1 manual
Java lab1 manualJava lab1 manual
Java lab1 manual
 
Applicative Logic Meta-Programming as the foundation for Template-based Progr...
Applicative Logic Meta-Programming as the foundation for Template-based Progr...Applicative Logic Meta-Programming as the foundation for Template-based Progr...
Applicative Logic Meta-Programming as the foundation for Template-based Progr...
 
My c++
My c++My c++
My c++
 
Multi-dimensional exploration of API usage - ICPC13 - 21-05-13
Multi-dimensional exploration of API usage - ICPC13 - 21-05-13Multi-dimensional exploration of API usage - ICPC13 - 21-05-13
Multi-dimensional exploration of API usage - ICPC13 - 21-05-13
 
J3d hibernate
J3d hibernateJ3d hibernate
J3d hibernate
 
Java lab-manual
Java lab-manualJava lab-manual
Java lab-manual
 
Icom4015 lecture10-f16
Icom4015 lecture10-f16Icom4015 lecture10-f16
Icom4015 lecture10-f16
 
CiIC4010-chapter-2-f17
CiIC4010-chapter-2-f17CiIC4010-chapter-2-f17
CiIC4010-chapter-2-f17
 
CS2309 JAVA LAB MANUAL
CS2309 JAVA LAB MANUALCS2309 JAVA LAB MANUAL
CS2309 JAVA LAB MANUAL
 
Unit 2 Java
Unit 2 JavaUnit 2 Java
Unit 2 Java
 
Cs2312 OOPS LAB MANUAL
Cs2312 OOPS LAB MANUALCs2312 OOPS LAB MANUAL
Cs2312 OOPS LAB MANUAL
 
Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org) (usef...
Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org)  (usef...Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org)  (usef...
Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org) (usef...
 
Java Notes
Java Notes Java Notes
Java Notes
 
Ap Power Point Chpt7
Ap Power Point Chpt7Ap Power Point Chpt7
Ap Power Point Chpt7
 
A recommender system for generalizing and refining code templates
A recommender system for generalizing and refining code templatesA recommender system for generalizing and refining code templates
A recommender system for generalizing and refining code templates
 
Interfaces in JAVA !! why??
Interfaces in JAVA !!  why??Interfaces in JAVA !!  why??
Interfaces in JAVA !! why??
 
C# interview
C# interviewC# interview
C# interview
 
Chapter 06 constructors and destructors
Chapter 06 constructors and destructorsChapter 06 constructors and destructors
Chapter 06 constructors and destructors
 
Unit 8 Java
Unit 8 JavaUnit 8 Java
Unit 8 Java
 

Andere mochten auch

Vietnam promotion tours
Vietnam promotion toursVietnam promotion tours
Vietnam promotion toursDent Tran
 
Development Emails Content Analyzer: Intention Mining in Developer Discussions
Development Emails Content Analyzer: Intention Mining in Developer DiscussionsDevelopment Emails Content Analyzer: Intention Mining in Developer Discussions
Development Emails Content Analyzer: Intention Mining in Developer DiscussionsSebastiano Panichella
 
Archaeobacteria dan eubacteria
Archaeobacteria dan eubacteriaArchaeobacteria dan eubacteria
Archaeobacteria dan eubacteriaMariia Theresia
 
En contra del raton clothing
En contra del raton clothingEn contra del raton clothing
En contra del raton clothingHA MFL Department
 
Fse2012 - Who is going to Mentor Newcomers in Open Source Projects?
Fse2012 - Who is going to Mentor Newcomers in Open Source Projects?Fse2012 - Who is going to Mentor Newcomers in Open Source Projects?
Fse2012 - Who is going to Mentor Newcomers in Open Source Projects?Sebastiano Panichella
 
State of Financial Affairs
State of Financial Affairs State of Financial Affairs
State of Financial Affairs aiesechyderabad
 
Easy Bites: Lean Start-Up in Developing Countries
Easy Bites: Lean Start-Up in Developing CountriesEasy Bites: Lean Start-Up in Developing Countries
Easy Bites: Lean Start-Up in Developing CountriesOliver Koch
 
Technical glossary
Technical glossaryTechnical glossary
Technical glossaryhalo4robo
 
Com.epost.psf.smi
Com.epost.psf.smiCom.epost.psf.smi
Com.epost.psf.smismartepost
 
Peralatan teknologi informasi dan komunikasi
Peralatan teknologi informasi dan komunikasiPeralatan teknologi informasi dan komunikasi
Peralatan teknologi informasi dan komunikasilabiebm
 
Social Network: guida ad un utilizzo consapevole
Social Network: guida ad un utilizzo consapevoleSocial Network: guida ad un utilizzo consapevole
Social Network: guida ad un utilizzo consapevoleAlessandro Grechi
 
The aviators '13 april lc day review
The aviators '13 april lc day reviewThe aviators '13 april lc day review
The aviators '13 april lc day reviewaiesechyderabad
 

Andere mochten auch (20)

Vietnam promotion tours
Vietnam promotion toursVietnam promotion tours
Vietnam promotion tours
 
Planning
PlanningPlanning
Planning
 
A mapa historia
A mapa historiaA mapa historia
A mapa historia
 
Xd review main
Xd review mainXd review main
Xd review main
 
Development Emails Content Analyzer: Intention Mining in Developer Discussions
Development Emails Content Analyzer: Intention Mining in Developer DiscussionsDevelopment Emails Content Analyzer: Intention Mining in Developer Discussions
Development Emails Content Analyzer: Intention Mining in Developer Discussions
 
Tutorial 5
Tutorial 5Tutorial 5
Tutorial 5
 
Archaeobacteria dan eubacteria
Archaeobacteria dan eubacteriaArchaeobacteria dan eubacteria
Archaeobacteria dan eubacteria
 
Repaso - Revision
Repaso - RevisionRepaso - Revision
Repaso - Revision
 
En contra del raton clothing
En contra del raton clothingEn contra del raton clothing
En contra del raton clothing
 
Fse2012 - Who is going to Mentor Newcomers in Open Source Projects?
Fse2012 - Who is going to Mentor Newcomers in Open Source Projects?Fse2012 - Who is going to Mentor Newcomers in Open Source Projects?
Fse2012 - Who is going to Mentor Newcomers in Open Source Projects?
 
State of Financial Affairs
State of Financial Affairs State of Financial Affairs
State of Financial Affairs
 
Think Laterally | LEAD
Think Laterally | LEADThink Laterally | LEAD
Think Laterally | LEAD
 
Easy Bites: Lean Start-Up in Developing Countries
Easy Bites: Lean Start-Up in Developing CountriesEasy Bites: Lean Start-Up in Developing Countries
Easy Bites: Lean Start-Up in Developing Countries
 
Testing guide
Testing guideTesting guide
Testing guide
 
Technical glossary
Technical glossaryTechnical glossary
Technical glossary
 
Com.epost.psf.smi
Com.epost.psf.smiCom.epost.psf.smi
Com.epost.psf.smi
 
Gcdp lc day
Gcdp lc dayGcdp lc day
Gcdp lc day
 
Peralatan teknologi informasi dan komunikasi
Peralatan teknologi informasi dan komunikasiPeralatan teknologi informasi dan komunikasi
Peralatan teknologi informasi dan komunikasi
 
Social Network: guida ad un utilizzo consapevole
Social Network: guida ad un utilizzo consapevoleSocial Network: guida ad un utilizzo consapevole
Social Network: guida ad un utilizzo consapevole
 
The aviators '13 april lc day review
The aviators '13 april lc day reviewThe aviators '13 april lc day review
The aviators '13 april lc day review
 

Ähnlich wie ICPC 2012 - Mining Source Code Descriptions

Nt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language AnalysisNt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language AnalysisNicole Gomez
 
Object oriented basics
Object oriented basicsObject oriented basics
Object oriented basicsvamshimahi
 
Abap object-oriented-programming-tutorials
Abap object-oriented-programming-tutorialsAbap object-oriented-programming-tutorials
Abap object-oriented-programming-tutorialscesarmendez78
 
SE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUSSE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUSnikshaikh786
 
V_Sound(Visualize Softwar and Understand the Document)
V_Sound(Visualize Softwar  and Understand the Document)V_Sound(Visualize Softwar  and Understand the Document)
V_Sound(Visualize Softwar and Understand the Document)Imdad Ul Haq
 
Design patterns in Magento
Design patterns in MagentoDesign patterns in Magento
Design patterns in MagentoDivante
 
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...IJCI JOURNAL
 
Annotation Processing in Android
Annotation Processing in AndroidAnnotation Processing in Android
Annotation Processing in Androidemanuelez
 
Object Oriented Programming with C#
Object Oriented Programming with C#Object Oriented Programming with C#
Object Oriented Programming with C#SyedUmairAli9
 
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...acijjournal
 
What's New In Python 2.4
What's New In Python 2.4What's New In Python 2.4
What's New In Python 2.4Richard Jones
 
Machine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionMachine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionSiddharth Shrivastava
 
Java căn bản - Chapter7
Java căn bản - Chapter7Java căn bản - Chapter7
Java căn bản - Chapter7Vince Vo
 
A brief overview of java frameworks
A brief overview of java frameworksA brief overview of java frameworks
A brief overview of java frameworksMD Sayem Ahmed
 
Configuring Mahout Clustering Jobs - Frank Scholten
Configuring Mahout Clustering Jobs - Frank ScholtenConfiguring Mahout Clustering Jobs - Frank Scholten
Configuring Mahout Clustering Jobs - Frank Scholtenlucenerevolution
 
Application package
Application packageApplication package
Application packageJAYAARC
 
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSISCORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSISijseajournal
 
Summarizing Software API Usage Examples Using Clustering Techniques
Summarizing Software API Usage Examples Using Clustering TechniquesSummarizing Software API Usage Examples Using Clustering Techniques
Summarizing Software API Usage Examples Using Clustering TechniquesNikos Katirtzis
 
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...acijjournal
 

Ähnlich wie ICPC 2012 - Mining Source Code Descriptions (20)

Nt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language AnalysisNt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language Analysis
 
Object oriented basics
Object oriented basicsObject oriented basics
Object oriented basics
 
Abap object-oriented-programming-tutorials
Abap object-oriented-programming-tutorialsAbap object-oriented-programming-tutorials
Abap object-oriented-programming-tutorials
 
SE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUSSE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUS
 
V_Sound(Visualize Softwar and Understand the Document)
V_Sound(Visualize Softwar  and Understand the Document)V_Sound(Visualize Softwar  and Understand the Document)
V_Sound(Visualize Softwar and Understand the Document)
 
Design patterns in Magento
Design patterns in MagentoDesign patterns in Magento
Design patterns in Magento
 
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
 
Annotation Processing in Android
Annotation Processing in AndroidAnnotation Processing in Android
Annotation Processing in Android
 
Object Oriented Programming with C#
Object Oriented Programming with C#Object Oriented Programming with C#
Object Oriented Programming with C#
 
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
 
What's New In Python 2.4
What's New In Python 2.4What's New In Python 2.4
What's New In Python 2.4
 
Machine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionMachine Learning - Simple Linear Regression
Machine Learning - Simple Linear Regression
 
Java căn bản - Chapter7
Java căn bản - Chapter7Java căn bản - Chapter7
Java căn bản - Chapter7
 
A brief overview of java frameworks
A brief overview of java frameworksA brief overview of java frameworks
A brief overview of java frameworks
 
Configuring Mahout Clustering Jobs - Frank Scholten
Configuring Mahout Clustering Jobs - Frank ScholtenConfiguring Mahout Clustering Jobs - Frank Scholten
Configuring Mahout Clustering Jobs - Frank Scholten
 
Malware analysis
Malware analysisMalware analysis
Malware analysis
 
Application package
Application packageApplication package
Application package
 
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSISCORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
 
Summarizing Software API Usage Examples Using Clustering Techniques
Summarizing Software API Usage Examples Using Clustering TechniquesSummarizing Software API Usage Examples Using Clustering Techniques
Summarizing Software API Usage Examples Using Clustering Techniques
 
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
 

Mehr von Sebastiano Panichella

The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringSebastiano Panichella
 
Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...
Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...
Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...Sebastiano Panichella
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSebastiano Panichella
 
SBFT Tool Competition 2024 - CPS-UAV Test Case Generation Track
SBFT Tool Competition 2024 - CPS-UAV Test Case Generation TrackSBFT Tool Competition 2024 - CPS-UAV Test Case Generation Track
SBFT Tool Competition 2024 - CPS-UAV Test Case Generation TrackSebastiano Panichella
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSebastiano Panichella
 
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...
Testing with Fewer Resources:  Toward Adaptive Approaches for Cost-effective ...Testing with Fewer Resources:  Toward Adaptive Approaches for Cost-effective ...
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...Sebastiano Panichella
 
COSMOS: DevOps for Complex Cyber-physical Systems
COSMOS: DevOps for Complex Cyber-physical SystemsCOSMOS: DevOps for Complex Cyber-physical Systems
COSMOS: DevOps for Complex Cyber-physical SystemsSebastiano Panichella
 
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...Sebastiano Panichella
 
An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical ...
An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical ...An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical ...
An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical ...Sebastiano Panichella
 
Automated Identification and Qualitative Characterization of Safety Concerns ...
Automated Identification and Qualitative Characterization of Safety Concerns ...Automated Identification and Qualitative Characterization of Safety Concerns ...
Automated Identification and Qualitative Characterization of Safety Concerns ...Sebastiano Panichella
 
The 2nd Intl. Workshop on NL-based Software Engineering
The 2nd Intl. Workshop on NL-based Software EngineeringThe 2nd Intl. Workshop on NL-based Software Engineering
The 2nd Intl. Workshop on NL-based Software EngineeringSebastiano Panichella
 
The 16th Intl. Workshop on Search-Based and Fuzz Testing
The 16th Intl. Workshop on Search-Based and Fuzz TestingThe 16th Intl. Workshop on Search-Based and Fuzz Testing
The 16th Intl. Workshop on Search-Based and Fuzz TestingSebastiano Panichella
 
Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...
Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...
Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...Sebastiano Panichella
 
Exposed! A case study on the vulnerability-proneness of Google Play Apps
Exposed! A case study on the vulnerability-proneness of Google Play AppsExposed! A case study on the vulnerability-proneness of Google Play Apps
Exposed! A case study on the vulnerability-proneness of Google Play AppsSebastiano Panichella
 
Search-based Software Testing (SBST) '22
Search-based Software Testing (SBST) '22Search-based Software Testing (SBST) '22
Search-based Software Testing (SBST) '22Sebastiano Panichella
 
NL-based Software Engineering (NLBSE) '22
NL-based Software Engineering (NLBSE) '22NL-based Software Engineering (NLBSE) '22
NL-based Software Engineering (NLBSE) '22Sebastiano Panichella
 
"An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.
 "An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.  "An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.
"An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021. Sebastiano Panichella
 
An Empirical Investigation of Relevant Changes and Automation Needs in Modern...
An Empirical Investigation of Relevant Changes and Automation Needs in Modern...An Empirical Investigation of Relevant Changes and Automation Needs in Modern...
An Empirical Investigation of Relevant Changes and Automation Needs in Modern...Sebastiano Panichella
 
Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...
Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...
Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...Sebastiano Panichella
 

Mehr von Sebastiano Panichella (20)

The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software Engineering
 
Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...
Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...
Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation Track
 
SBFT Tool Competition 2024 - CPS-UAV Test Case Generation Track
SBFT Tool Competition 2024 - CPS-UAV Test Case Generation TrackSBFT Tool Competition 2024 - CPS-UAV Test Case Generation Track
SBFT Tool Competition 2024 - CPS-UAV Test Case Generation Track
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
 
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...
Testing with Fewer Resources:  Toward Adaptive Approaches for Cost-effective ...Testing with Fewer Resources:  Toward Adaptive Approaches for Cost-effective ...
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...
 
COSMOS: DevOps for Complex Cyber-physical Systems
COSMOS: DevOps for Complex Cyber-physical SystemsCOSMOS: DevOps for Complex Cyber-physical Systems
COSMOS: DevOps for Complex Cyber-physical Systems
 
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
 
An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical ...
An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical ...An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical ...
An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical ...
 
Automated Identification and Qualitative Characterization of Safety Concerns ...
Automated Identification and Qualitative Characterization of Safety Concerns ...Automated Identification and Qualitative Characterization of Safety Concerns ...
Automated Identification and Qualitative Characterization of Safety Concerns ...
 
The 2nd Intl. Workshop on NL-based Software Engineering
The 2nd Intl. Workshop on NL-based Software EngineeringThe 2nd Intl. Workshop on NL-based Software Engineering
The 2nd Intl. Workshop on NL-based Software Engineering
 
The 16th Intl. Workshop on Search-Based and Fuzz Testing
The 16th Intl. Workshop on Search-Based and Fuzz TestingThe 16th Intl. Workshop on Search-Based and Fuzz Testing
The 16th Intl. Workshop on Search-Based and Fuzz Testing
 
Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...
Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...
Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...
 
Exposed! A case study on the vulnerability-proneness of Google Play Apps
Exposed! A case study on the vulnerability-proneness of Google Play AppsExposed! A case study on the vulnerability-proneness of Google Play Apps
Exposed! A case study on the vulnerability-proneness of Google Play Apps
 
Search-based Software Testing (SBST) '22
Search-based Software Testing (SBST) '22Search-based Software Testing (SBST) '22
Search-based Software Testing (SBST) '22
 
NL-based Software Engineering (NLBSE) '22
NL-based Software Engineering (NLBSE) '22NL-based Software Engineering (NLBSE) '22
NL-based Software Engineering (NLBSE) '22
 
NLBSE’22: Tool Competition
NLBSE’22: Tool CompetitionNLBSE’22: Tool Competition
NLBSE’22: Tool Competition
 
"An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.
 "An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.  "An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.
"An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.
 
An Empirical Investigation of Relevant Changes and Automation Needs in Modern...
An Empirical Investigation of Relevant Changes and Automation Needs in Modern...An Empirical Investigation of Relevant Changes and Automation Needs in Modern...
An Empirical Investigation of Relevant Changes and Automation Needs in Modern...
 
Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...
Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...
Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...
 

ICPC 2012 - Mining Source Code Descriptions

  • 1. Mining Source Code Descriptions from Developer Communications Sebastiano Jairo Massimiliano Andrian Gerardo Panichella Aponte Di Penta Marcus Canfora
  • 2. Context: Software Project describes Sequence diagram Program Documentation Comprehension Source Code Class diagram Maintenance Tasks Difficult understanding understanding Developer
  • 3. Context: Software Project Coming back to the reality... describes Sequence diagram Program Documentation Comprehension Source Code Class diagram Maintenance Tasks Difficult understanding understanding Developer
  • 4. .................................................. When call the method IndexSplitter.split(File Idea destDir, String[] segs) from the Lucene cotrib directory(contrib/misc/src/java/org/apache/luc ene/index) it creates an index with segments We argue that messages exchanged wrong data. Namely wrong descriptor file with among contributors/developers are a useful source of information to help understanding of segment is the number representing the name source code. that would be created next in this index. .................................................. In such situations developers need to infer knowledge from, CLASS: IndexSplitter METHOD: split source code descriptions in external the source code itself artifacts. Developer
  • 5. A Five Step-Approach for Mining Method Descriptions Developer
  • 6. Step 1: Downloading emails/bugs reports and tracing them onto classes The discussion contains a fully-qualified class name (e.g., org.apache.lucene.analysis.MappingCharFilter); or the email contains a Two heuristics file name (e.g., MappingCharFilter.java) For bug reports, we complement the above heuristic by matching the bug ID of each closed bug to the commit notes, therefore tracing the bug report to the files changed in that commit Developer Discussion When call the method IndexSplitter .split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. CLASS: IndexSplitter public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos } If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
  • 7. Step 2: Extracting paragraphs We consider as paragraphs, text section separated by one or more white lines Two heuristics We prune out paragraph description from source code fragments and/or stack Traces "by using an approach inspired by the work of Bacchelli et al. Developer Discussion When call the method IndexSplitter.split(File destDir, String[] segs) from the PAR Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number 1 representing the name of segment that would be created next in this index. public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); PAR FSDirectory destFSDir = FSDirectory.open(destDir); 2 SegmentInfos destInfos = new SegmentInfos } If some of the segments of the index already has this name this results either to PAR impossibility to create new segment or in crating of an corrupted segment. 3
  • 8. Step 3: Tracing paragraphs onto methods A) A valid paragraph must contain the keyword “method” These paragraphs must respect the following B) and the method name must be followed by a open parenthesis— two conditions: i.e., we match “foo(” Developer Discussion A) B) When call the method IndexSplitter.split(File destDir, String[] segs) PAR from the Lucene cotrib directory it creates an index with segments1 descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. CLASS: IndexSplitter METHOD: split( ...................................................................................... ...................................................................................... ...................................................................................... ......................................................................................
  • 9. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... a) Method parameters: Problem seems to come from 1 if the % of parameters method MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher s1= mentioned in the does not The Method paragraphs. Value have searchMainMethods(IProgressMonitor, between 0 and 1 parameters IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types sub- types as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution .......................... CLASS: MainMethodSearchEngine METHOD: serachMainMethods % parameter = 100% -> s1= 1 SCORE
  • 10. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... a) Method parameters: Problem seems to come from 1 if the % of parameters method MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher s1= mentioned in the does not The Method paragraphs. Value have searchMainMethods(IProgressMonitor, between 0 and 1 parameters IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, b) Syntactic descriptions (mentioning return values): IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is check whether the Equal to one if ON. This method add all types sub- paragraph contains the the method is types as soon as the given scope s2= keyword “return”. If YES encloses them without testing if void. sub-types have a main! After return Value equal 1, 0 otherwise IType[] before the excecution .......................... CLASS: MainMethodSearchEngine METHOD: serachMainMethods % parameter = 100% -> s1= 1 SCORE = 1+
  • 11. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... a) Method parameters: Problem seems to come from 1 if the % of parameters method MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher s1= mentioned in the does not The Method paragraphs. Value have searchMainMethods(IProgressMonitor, between 0 and 1 parameters IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, b) Syntactic descriptions (mentioning return values): IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is check whether the Equal to one if ON. This method add all types sub- paragraph contains the the method is types as soon as the given scope s2= keyword “return”. If YES encloses them without testing if void. sub-types have a main! After return Value equal 1, 0 otherwise IType[] before the excecution c) Overriding/Overloading: .......................... 1 if any of the “overload” or CLASS: MainMethodSearchEngine s3=“override” keywords appears METHOD: serachMainMethods in the paragraph, 0 otherwise % parameter = 100% -> s1= 1 d) Method invocations: SCORE = 1+ 0+ 1 = 2 1 if any of the “call” or s4=“excecute” keywords appears in the paragraph, 0 otherwise
  • 12. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: a) Method parameters: We selected paragraphs that have: % of parameters 1 if the method s1= mentioned in the does not 1. s1 ≥ thP = 0.5 paragraphs. Value have between 0 and 1 parameters 2. s2 + s3 + s4 ≥ thH = 1 b) Syntactic descriptions (mentioning return values): check whether the Equal to one if paragraph contains the the method is s2= keyword “return”. If YES void. Value equal 1, 0 otherwise OK c) Overriding/Overloading: 1 if any of the “overload” or s3=“override” keywords appears in the paragraph, 0 otherwise % parameter = 100% -> s1= 1 ≥ 0.5 d) Method invocations: SCORE = 1+ 0+ 1 = 2 ≥ 1 1 if any of the “call” or s4=“excecute” keywords appears in the paragraph, 0 otherwise
  • 13. Step 5: Similarity based Filtering We rank filtered paragraphs through their textual similarity with the method they are likely describing. METHOD PARAGRAPH SCORE Similarity Removing: - English stop words; Method_3 Paragraph_4 2.5 96.1% - Programming Method_1 Paragraph_1 2.5 95.6% language keywords Method_2 Paragraph_2 1.5 97.4% Using: Method_3 Paragraph_3 1.5 86.2% - Camel Case splitting the on Method_1 Paragraph_3 1.5 79.0% remaining words Method_3 Paragraph_2 1.5 77.5% - Vector Space Model Method_2 Paragraph_4 1.5 64.3% Method_2 Paragraph_3 1.3 83.2% Method_3 Paragraph_1 1.3 73.9% th>=0.80 Method_2 Paragraph_1 1.3 68.7% Method_1 Paragraph_4 1.3 53.6%
  • 14. Empirical Study • Goal: analyze source code descriptions in developer discussions • Purpose: investigating how developer discussions describe methods of Java Source Code • Quality focus: find good method description in messages exchanged among contributors/developers • Context: Bug-report and mailing lists of two Java Project  Apache Lucene and Eclipse
  • 16. Research Questions  RQ1 (method coverage): How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?  RQ2 (precision): How precise is the proposed approach in identifying method descriptions?  RQ3 (missing descriptions): How many potentially good method descriptions are missed by the approach?
  • 17. RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
  • 18. RQ2: How precise is the proposed approach in identifying method descriptions? We sampled 250 descriptions from each project
  • 19. RQ3: How many potentially good method descriptions are missed by the approach? We sampled 100 descriptions from each project TABLE III The analysis of a sample of 100 paragraphs traced to methods, but not satisfying the Step 4 heuristic System True Negatives False Negatives Eclipse 78 22 Apache Lucene 67 33