Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
User-Perceived Source Code Quality Estimation based on
Static Analysis Metrics
1
Michail Papamichail, Themistoklis Diamant...
2 Outline
 The concept of user-perceived quality.
 Research objectives.
 Key implementation points.
 The designed syst...
3 Why to evaluate code quality?
 Various open source software projects.
 Numerous online software repositories.
Source C...
4 User-Perceived source code quality
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
...
5 The idea
User-Perceived Quality Estimation
IEEE International Conference on Software Quality, Reliability & Security – Q...
6 Key implementation points
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
 Qualita...
7 Training dataset
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
Top 100
Repositori...
8 Selected repositories qualitative evaluation
IEEE International Conference on Software Quality, Reliability & Security –...
9
Target set formation
 Use of GitHub stars as ground truth.
But:
 GitHub stars per repository (NOT per file)
 Every so...
10
Target set formation
For the i-th file of the j-th repository, the target if formulated as follows:
𝐹𝑠𝑐𝑜𝑟𝑒 𝑖, 𝑗 = log
𝑅...
11 Quality Evaluation Models
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
ANNs Mod...
12 ANNs Model
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
 Two-layer feedforward...
13 SVMs One Class Classifier
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
 Used t...
14 System Evaluation
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
Results validati...
15 System Evaluation
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
Repositories sel...
16 Evaluation – One Class Classifier
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
...
17 Evaluation – ANNs Model
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
User-Perce...
18 Evaluation – Popularity Prediction
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016...
19 Conclusions and future work
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
Conclu...
20
Thank you!
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
User-Perceived Source C...
Nächste SlideShare
Wird geladen in …5
×

User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

223 Aufrufe

Veröffentlicht am

User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

Veröffentlicht in: Software
  • Als Erste(r) kommentieren

  • Gehören Sie zu den Ersten, denen das gefällt!

User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

  1. 1. User-Perceived Source Code Quality Estimation based on Static Analysis Metrics 1 Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki Intelligent Systems & Software Engineering Labgroup, Information Processing Laboratory Thessaloniki, Greece Email: {mpapamic, thdiaman}@issel.ee.auth.gr, asymeon@eng.auth.gr User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis IEEE International Conference on Software Quality, Reliability & Security – QRS 2016
  2. 2. 2 Outline  The concept of user-perceived quality.  Research objectives.  Key implementation points.  The designed system.  Evaluation.  Conclusion and Future work. IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  3. 3. 3 Why to evaluate code quality?  Various open source software projects.  Numerous online software repositories. Source Code Quality Evaluation IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 Code Reuse Is a software component suitable for reuse? User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  4. 4. 4 User-Perceived source code quality IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 Idea:  Use of software components popularity as a quality indicator. But:  Popularity cannot be used as a sole quality criterion. - Is based on current trends. - Depends on the programming language. Popularity Static Analysis Metrics Recommended Coding Practices + + Measure of quality User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  5. 5. 5 The idea User-Perceived Quality Estimation IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 Idea Tools Used Proposed System  Use of software components popularity as ground truth – GitHub number of stars  Use of static analysis metrics and violations of “good” coding practices  Apply machine learning techniques for estimating user-perceived source code quality Static Analysis Quality Evaluation Models Quality Score User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  6. 6. 6 Key implementation points IEEE International Conference on Software Quality, Reliability & Security – QRS 2016  Qualitative evaluation of the selected repositories.  Training set formation.  Target set formation.  Quality estimation models. User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  7. 7. 7 Training dataset IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 Top 100 Repositories GitHub 24930 files Training Dataset Qualitative Evaluation User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  8. 8. 8 Selected repositories qualitative evaluation IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis PMD Ruleset Percentage (%) of files containing severe violations PMD Ruleset Percentage (%) of files containing severe violations Priority 1 Priority2 Priority 1 Priority2 Unused Code 0.0% 0.0% Coupling 0.0% 0.0% Basic 0.015% 0.337% Design 3.37% 3.9% Braces 0.0% 0.0% Empty 0.0% 0.0% Comments 0.0% 0.0% Finalizers 0.0% 0.0% Naming 14.11% 0.45% Optimizations 0.0% 0.0% Clone 0.0% 0.0% Strict Exception 4.99% 0.0% CodeSize 0.0% 0.0% Strings 0.0% 0.06% Controversial 1.75% 1.58% Unnecessary 0.0% 0.0% Very small percentage of files contain severe violations
  9. 9. 9 Target set formation  Use of GitHub stars as ground truth. But:  GitHub stars per repository (NOT per file)  Every source code file is of different importance  Big differences in the number of files between repositories 10000 x stars y stars z stars Dependency Analysis IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  10. 10. 10 Target set formation For the i-th file of the j-th repository, the target if formulated as follows: 𝐹𝑠𝑐𝑜𝑟𝑒 𝑖, 𝑗 = log 𝑅 𝑠𝑡𝑎𝑟𝑠 𝑗 𝑛 𝑓𝑖𝑙𝑒𝑠 𝑗 + 𝑑𝑒𝑝 𝑖 𝑛 𝑓𝑖𝑙𝑒𝑠 𝑗 ∗ 𝑅 𝑠𝑡𝑎𝑟𝑠 𝑗 Smoothing factor A base score to all files in the same repository Added value according to the significance of the source code file IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  11. 11. 11 Quality Evaluation Models IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 ANNs Model  Input: The values of 73 static analysis metrics.  Output: User-Perceived source code quality estimation  Applicable only for source code files that exceed minimum quality threshold SVMs - One Class Classifier  Used to rule out low quality code. One Class Classifier ANNs Model Accepted Static Analysis Quality Estimation User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  12. 12. 12 ANNs Model IEEE International Conference on Software Quality, Reliability & Security – QRS 2016  Two-layer feedforward network.  Levenberg-Marquardt algorithm (LMA) for adjusting the weights and the biases.  (Training, Validation, Test) = (70%, 15%, 15%).  Applicable only for source code files that exceed minimum quality threshold. User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  13. 13. 13 SVMs One Class Classifier IEEE International Conference on Software Quality, Reliability & Security – QRS 2016  Used to rule out low quality code.  Gaussian radial basis kernel function.  Training involved the use of 7 metrics:  Average Block Depth,  Average Cyclomatic Complexity  Average Depth of Inheritance Hierarchy  Average Line of Codes Per Method  Comments Ratio  Distance  Lines Of Code  (nu, gamma, tolerance) = (0.1, 0.01, 0.01) User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis 1124 false- positives
  14. 14. 14 System Evaluation IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 Results validation:  Quantitative: Using PMD  Qualitative: Examination of a representative sample of files and their metrics Evaluation on three main axes: 1. The system's ability to distinguish high quality source code files. 2. The effectiveness of the model for estimating the quality of files exceeding a quality threshold. 3. The accuracy of predicting the popularity of Java repositories given their source code files. User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  15. 15. 15 System Evaluation IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 Repositories selected:  8 random typical GitHub projects chosen independently.  lines-of-code-per-file ratio around 100, including also several extreme cases.  Both human and auto-generated code.  The auto-generated projects are expected to be of high quality.  Follow all coding conventions.  Are architecturally and functionally complete. User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  16. 16. 16 Evaluation – One Class Classifier IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis The percentage of the rejected files that contained severe violations is very high
  17. 17. 17 Evaluation – ANNs Model IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis The quality score reflects the characteristics of the repositories
  18. 18. 18 Evaluation – Popularity Prediction IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 𝑅 𝑠𝑡𝑎𝑟𝑠 𝑖, 𝑗 = 𝑒 𝐹𝑠𝑐𝑜𝑟𝑒(𝑖,𝑗) ∙ 𝑛 𝑓𝑖𝑙𝑒𝑠(𝑗) 1 + 𝑑𝑒𝑝(𝑖) 𝑅 𝑠𝑡𝑎𝑟𝑠 𝑗 = 𝑖=1 𝑛 𝑓𝑖𝑙𝑒𝑠(𝑗) 𝑑𝑒𝑝(𝑖) 𝑑𝑒𝑝𝑡𝑜𝑡𝑎𝑙(𝑗) ∙ 𝑅 𝑠𝑡𝑎𝑟𝑠 𝑖, 𝑗 User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  19. 19. 19 Conclusions and future work IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 Conclusions:  Reliable determination of the area of high quality source code based on static analysis metrics.  Effective user-perceived source code quality estimation. Future Work:  Further investigation of the response of our model in different scenarios.  Expansion of the ground truth coverage by using more metrics.  Application of feature selection techniques in order to drop overlapping metrics. User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
  20. 20. 20 Thank you! IEEE International Conference on Software Quality, Reliability & Security – QRS 2016 User-Perceived Source Code Quality Estimation based on Static Analysis Metrics Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis

×