Mining Java Class Naming Conventions

Simon Butler, Michel Wermelinger, Yijun Yu & Helen Sharp

Centre for Research in Computing
The Open University

27 September 2011

m.a.wermelinger@open.ac.uk


Butler et al. (The Open University)     Mining Java Class Naming Conventions   27 September 2011   1/7
Class Identifier Names

       Despite the importance of class identifier names, knowledge of their
       structure is limited
       The adjective* noun+ approximation has been found to be useful, but
       not universal
       What other part-of-speech patterns are commonly used?
       How are component words repeated? How often?
       Are there project-specific naming conventions?

       [Figure: example class hierarchy with AbstractCollection and Set at the top,
       AbstractSet in the middle, and EnumSet, HashSet and TreeSet at the bottom]
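
The questions above presuppose that a class identifier name can be broken into its component words. The following is a minimal sketch of one way to do that, assuming names follow the usual Java camel-case convention. The class `IdentifierSplitter` and its method are hypothetical names chosen for illustration; the slides do not say which tokeniser the authors used.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the authors' tooling: splits a camel-case
// class identifier name into its component words.
final class IdentifierSplitter {

    // Split points:
    //  1. between a lower-case letter or digit and an upper-case letter
    //     (e.g. Hash|Set)
    //  2. between the end of an acronym and a following capitalised word
    //     (e.g. XML|Parser)
    private static final String BOUNDARY =
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    static List<String> split(String className) {
        return Arrays.asList(className.split(BOUNDARY));
    }

    public static void main(String[] args) {
        // Names taken from the example hierarchy on this slide.
        for (String name : new String[] {"AbstractCollection", "AbstractSet", "HashSet"}) {
            System.out.println(name + " -> " + split(name));
        }
    }
}
```

Assigning a part of speech to each component word would need a tagger or lexicon on top of this, which is outside the scope of the sketch.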


Distribution of Java Classes in Inheritance Categories



       [Figure: proportion of inheritance categories per project (y-axis, 0.0 to 0.7)
       for the categories E0I0, E0I1, E0In, E1I0, E1I1 and E1In (x-axis)]

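
The slide does not spell out what the category labels mean. The sketch below assumes the natural reading: E1 if a class extends a superclass other than `java.lang.Object` (E0 otherwise), and I0, I1 or In for zero, one or more than one directly implemented interface. Under that assumption, a category can be computed with standard reflection. `InheritanceCategory` is a hypothetical name, and the authors may have derived the categories from source code rather than loaded classes.

```java
// Hypothetical helper: labels a class with an inheritance category such as
// "E1In", under the ASSUMED meaning of the labels described above.
final class InheritanceCategory {

    static String of(Class<?> type) {
        Class<?> parent = type.getSuperclass();
        // Every class implicitly extends Object, so that does not count as E1.
        boolean extendsClass = parent != null && parent != Object.class;

        // getInterfaces() returns only the interfaces declared directly
        // on this class, not those inherited from its superclass.
        int interfaces = type.getInterfaces().length;

        String e = extendsClass ? "E1" : "E0";
        String i = interfaces == 0 ? "I0" : (interfaces == 1 ? "I1" : "In");
        return e + i;
    }

    public static void main(String[] args) {
        Class<?>[] samples = {
            java.util.AbstractCollection.class,
            java.util.AbstractSet.class,
            java.util.HashSet.class
        };
        for (Class<?> c : samples) {
            System.out.println(c.getSimpleName() + " -> " + of(c));
        }
    }
}
```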
Part-of-Speech Patterns
       Relative frequency of most common PoS patterns

       Category   noun+   adjective+ noun+   noun+ adjective+ noun+   verb+ noun+
       E0I0       0.85    0.08               0.01                     0.01
       E0I1       0.73    0.15               0.02                     0.02
       E0In       0.75    0.15               0.03                     0.01
       E1I0       0.68    0.12               0.04                     0.03
       E1I1       0.70    0.15               0.04                     0.02
       E1In       0.75    0.14               0.04                     0.02

       4 basic patterns account for 90% of class identifier names
       85% of E0I0 class identifier names are composed of nouns
       The adjective* noun+ approximation includes 85% of class
       identifier names
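
Once each component word carries a part-of-speech tag, checking a name against the patterns in the table reduces to matching a regular expression over the tag sequence. The sketch below assumes a hypothetical one-letter encoding (N noun, A adjective, V verb, D determiner) and takes the tag string as input; it does not perform the tagging itself. `PosPatternClassifier` is an illustrative name, not part of the authors' tooling.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical classifier over tag strings such as "AN" (adjective, noun).
// The one-letter tag encoding is an assumption made for this sketch.
final class PosPatternClassifier {

    // The four patterns from the table, in the order they are listed.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("noun+", Pattern.compile("N+"));
        PATTERNS.put("adjective+ noun+", Pattern.compile("A+N+"));
        PATTERNS.put("noun+ adjective+ noun+", Pattern.compile("N+A+N+"));
        PATTERNS.put("verb+ noun+", Pattern.compile("V+N+"));
    }

    // Returns the name of the first pattern that matches the whole tag
    // string, or "uncommon" if none of the four does.
    static String classify(String tags) {
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(tags).matches()) {
                return entry.getKey();
            }
        }
        return "uncommon";
    }

    // The adjective* noun+ approximation mentioned on the slides.
    static boolean matchesApproximation(String tags) {
        return tags.matches("A*N+");
    }
}
```

With hand-assigned tags, a name tagged adjective noun ("AN") falls under adjective+ noun+, while a name tagged verb determiner noun ("VDN") matches none of the four patterns and is reported as uncommon.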
Component Word Inheritance
       Relative frequency distribution of name inheritance

                    Super class name      Interface name
       Category     All     Fragment      All     Fragment     Both
       E0I1         -       -             0.39    0.37         -
       E0In         -       -             0.38    0.40         -
       E1I0         0.23    0.58          -       -            -
       E1I1         0.14    0.53          0.24    0.21         0.27
       E1In         0.11    0.50          0.15    0.25         0.18

       Fragments of the super class name are most commonly repeated
       Most common patterns:
               E0I1 & E0In: noun+ <interface name>, noun+ <interface fragment>
               E1I0: noun+ <super class fragment>, noun+ <super class name>
               E1I1 & E1In: noun+ <super class fragment>,
                <interface name> <super class fragment>, noun+ <super class name>
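
The distinction between repeating all of a super class or interface name and repeating only a fragment can be illustrated with a small check on component words. The sketch below is an assumption about how the two cases might be told apart, not the authors' definition: "all" means the ancestor's words appear as a contiguous run in the class name, and "fragment" means at least one word is shared. `NameInheritance` and `Repetition` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical check for component word repetition between a class
// identifier name and the name of its super class or interface.
final class NameInheritance {

    enum Repetition { ALL, FRAGMENT, NONE }

    // Camel-case split, as in Hash|Set or XML|Parser.
    static List<String> words(String name) {
        return Arrays.asList(
                name.split("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"));
    }

    static Repetition repeats(String className, String ancestorName) {
        List<String> child = words(className);
        List<String> ancestor = words(ancestorName);

        // ALL: the whole ancestor name occurs as a contiguous word sequence.
        if (Collections.indexOfSubList(child, ancestor) >= 0) {
            return Repetition.ALL;
        }
        // FRAGMENT: at least one component word is shared.
        for (String word : ancestor) {
            if (child.contains(word)) {
                return Repetition.FRAGMENT;
            }
        }
        return Repetition.NONE;
    }
}
```

Under these assumed definitions, AbstractSet repeats all of the interface name Set, while HashSet repeats only the fragment Set of its super class name AbstractSet.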
Case Study - FreeMind

       652 class identifier names
       53 (8%) with uncommon PoS patterns
       Each class inspected with questions:
          1. Is the class identifier name a clear description of the class?
          2. Can the class identifier name be refactored to a more common PoS
             pattern?
          3. Can the class be refactored into classes that could be more
             conventionally named?
       We found:
               Class identifier names describing GUI actions initiated by the user, e.g.
               SelectAllAction ( verb determiner noun )
               Class identifier names that conform to local naming conventions
               7 class identifier names were candidates for name refactoring
               1 class was a candidate for refactoring



Conclusions


       Contributions
               Identification of common PoS structures found in practice
               Identification of common patterns of component word repetition
               Unconventional class names:
                       may conform to local naming conventions
                       may be candidates for refactoring
                       may indicate smells

       Practical Applications
               Recovery of class naming conventions
               Identification of unconventionally named classes
               Class identifier name recommendation systems




