2nd International Conference on Trends in Computational and Cognitive Engineering (TCCE)
Performance Analysis of Machine Learning Approaches in Software Complexity Prediction
Sayed Reza1, Mahfujur Rahman2, Hasnat Parvez3, Omar Badreddin1, and Shamim Al Mamun3
1 University of Texas, 2 Daffodil International University, and 3 Jahangirnagar University
Paper ID - 410
Introduction
• Software complexity is an undesirable characteristic of software
• Increasing complexity reduces maintainability and sustainability
  • Class-level complexity
  • Method-level complexity
• Complexity is affected by many factors related to code structure, object-oriented properties, and source code metrics
• Machine learning techniques can automate the detection of class complexity, replacing manual processes and hand-written code rules
Research Objectives
• Use machine learning techniques to build complexity classifiers
  • Machine learning removes the need for manual processes or code rules to detect class complexity
• Compare the performance of the ML classifiers
• Report the best technique based on performance metrics
Motivation
• Early detection of software complexity enables better software maintenance
• Effective software maintenance leads to better quality over time
• High-quality software, in turn, helps to:
  • Enhance future software maintainability
  • Ensure sustainable software over time
  • Minimize software development effort over time
  • Reduce software development costs
Research Questions & Study Design
• RQ1: How are source code metrics correlated with the quality attribute class complexity?
  • This question reveals the relationships between complexity and source code metrics
• RQ2: How accurately can machine learning approaches predict class complexity from source code metrics?
  • This question aims to determine the accuracy of machine learning approaches in class-level complexity detection
Figure: Study Design (Dataset Collection -> Dataset Preparation -> Correlation Analysis (RQ1) -> Training -> Performance Evaluation (RQ2) -> Report Best Technique)
Dataset Collection
• A dataset for complexity prediction needs a diverse set of repositories
• We searched code repositories using the ModelMine tool [1] with the following criteria:
  • Java as the primary language
  • a minimum of 5000 commits (proxy for maintenance)
  • at least 100 active contributors
  • a minimum of 3000 stars and 500 forks (proxy for popularity)
• In total, 10 repositories containing 38,778 classes were selected
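The selection criteria above can be read as a simple filter over repository metadata. The sketch below is illustrative only: the `Repo` record, its field names, and the sample repositories are hypothetical and are not the ModelMine API or the actual repositories used in the study.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    """Hypothetical repository metadata record (not a ModelMine type)."""
    name: str
    language: str
    commits: int
    contributors: int
    stars: int
    forks: int


def meets_criteria(repo: Repo) -> bool:
    """Apply the thresholds listed on the slide."""
    return (
        repo.language == "Java"
        and repo.commits >= 5000       # proxy for maintenance
        and repo.contributors >= 100
        and repo.stars >= 3000         # proxy for popularity
        and repo.forks >= 500
    )


# Made-up candidates for illustration.
candidates = [
    Repo("example/alpha", "Java", 12000, 250, 8000, 1500),
    Repo("example/beta", "Java", 4000, 300, 9000, 2000),      # too few commits
    Repo("example/gamma", "Python", 9000, 400, 20000, 3000),  # not Java
]
selected = [repo.name for repo in candidates if meets_criteria(repo)]
print(selected)
```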
[1] Sayed Mohsin Reza, Omar Badreddin, and Khandoker Rahad. ModelMine: A tool to facilitate mining models from open-source repositories. In 2020 ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS). ACM, 2020.
Figure: Class distribution among repositories
Dataset Collection (Continued)
• Input variables: 18 unique source code metrics extracted from each class in the code repositories using a static analyzer tool
• Target variable: the current complexity of each class in the code repositories, extracted using the CODEMR tool [2]
• The input and target variables are then joined on the class name to create a dataset for the complexity classifier
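Joining the two extractions on the class name might look like the following sketch. The class names, metric values, and the "high"/"low" labels are made up for illustration; the real dataset has 18 metrics per class, and the exact label set produced by the complexity tool is not given on the slide.

```python
def join_on_class_name(metrics, complexity):
    """Inner-join metric rows and complexity labels by fully qualified class name."""
    rows = []
    # Keep only classes present in both extractions.
    for class_name in sorted(metrics.keys() & complexity.keys()):
        row = {"class": class_name}
        row.update(metrics[class_name])
        row["complexity"] = complexity[class_name]
        rows.append(row)
    return rows


# Hypothetical output of the static analyzer (input variables).
metrics = {
    "com.example.Foo": {"WMC": 12, "RFC": 30, "CBO": 5},
    "com.example.Bar": {"WMC": 3, "RFC": 8, "CBO": 1},
    "com.example.Baz": {"WMC": 7, "RFC": 15, "CBO": 2},
}
# Hypothetical output of the complexity tool (target variable).
complexity = {"com.example.Foo": "high", "com.example.Bar": "low"}

dataset = join_on_class_name(metrics, complexity)
for row in dataset:
    print(row)
```

An inner join is assumed here, so a class missing from either extraction (such as `com.example.Baz` above) is dropped from the dataset.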
[2] Asma Shaheen, Usman Qamar, Aiman Nazir, Raheela Bibi, Munazza Ansar, and Iqra Zafar. Oocqm: Object oriented code quality meter. In International Conference on Computational Science/Intelligence & Applied Informatics, pages 149–163. Springer, 2019.
Table: Source Code Metrics
… … …
Dataset Preparation
• Remove duplicate observations
• Identify outliers and remove these biased data points
• Perform exploratory data analysis, visualizing the input and target variables
• Split the data into training (80%) and testing (20%) datasets
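The preparation steps above could be sketched as follows using only the Python standard library. The slide does not say how outliers were identified; the 1.5 x IQR rule used here is an assumption, as are the function names and the toy data.

```python
import random
import statistics


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def remove_outliers_iqr(rows, column, k=1.5):
    """Drop rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    values = [row[column] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = k * (q3 - q1)
    return [row for row in rows if q1 - spread <= row[column] <= q3 + spread]


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle reproducibly, then hold out test_fraction of the rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: 20 ordinary classes, one exact duplicate, one extreme value.
rows = [{"class": f"C{i}", "WMC": i} for i in range(1, 21)]
rows.append({"class": "C1", "WMC": 1})
rows.append({"class": "Huge", "WMC": 500})

clean = remove_outliers_iqr(drop_duplicates(rows), "WMC")
train, test = split_train_test(clean)
print(len(clean), len(train), len(test))
```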
Figure: Relationship of some input variables with target variable
Correlation Results
• RQ1: How are source code metrics correlated with the quality attribute class complexity?
• The Pearson correlation results reveal how strongly each source code metric is associated with complexity.
• The source code metrics DIT, SRFC, RFC, WMC, CMLOC, and CBO *** have a moderately high correlation with complexity
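For reference, the Pearson coefficient between one metric and the complexity target can be computed as in the sketch below. Encoding the complexity label as an ordinal number (for example low = 0, high = 1) is an assumption made here so that a correlation can be computed; the values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical WMC values and ordinal-encoded complexity for six classes.
wmc = [3, 5, 8, 12, 20, 25]
complexity_level = [0, 0, 0, 1, 1, 1]
print(round(pearson(wmc, complexity_level), 3))
```

The coefficient ranges from -1 to 1. It measures linear association only, so a high value indicates that a metric moves together with complexity, not that it causes it.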
Figure: Correlation between source code metrics and complexity
*** DIT = Depth of Inheritance Tree, RFC = Response for a Class, CMLOC = Class-Method Lines of Code, CBO = Coupling Between Objects
Training & Testing
• For training, we chose five machine learning techniques to classify complexity:
1. Naive Bayes (NB)
2. Logistic Regression (LR)
3. Decision Tree (DT)
4. Random Forest (RF)
5. AdaBoost (AB)
• These are well-known machine learning classifiers that have been used in several similar studies [3, 4]
• We perform 10-fold cross-validation to reduce the variability of the performance results
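The 10-fold procedure can be sketched as below. To keep the example dependency-free, the model is a majority-class baseline, which is a stand-in and not one of the five classifiers above; the fold construction, function names, and toy labels are likewise assumptions.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def fit_majority_baseline(train_labels):
    """Stand-in model: always predicts the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common


def cross_validate(features, labels, k=10):
    """Mean accuracy over k folds; each sample is tested exactly once."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        model = fit_majority_baseline([labels[i] for i in train])
        correct = sum(model(features[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Toy data: 40 low-complexity and 10 high-complexity classes.
labels = ["low"] * 40 + ["high"] * 10
features = [[i] for i in range(len(labels))]
mean_accuracy = cross_validate(features, labels)
print(mean_accuracy)
```

Averaging over ten held-out folds, rather than relying on one split, is what reduces the variability of the reported scores.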
[3] Istehad Chowdhury and Mohammad Zulkernine. Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities. Journal of Systems Architecture, 57(3):294–313, 2011.
[4] Yun Zhang, David Lo, Xin Xia, Bowen Xu, Jianling Sun, and Shanping Li. Combining software metrics and text features for vulnerable file prediction. In 2015 20th International Conference on Engineering of Complex Computer Systems (ICECCS), pages 40–49. IEEE, 2015.
Performance Evaluation
• RQ2: How accurately can machine learning approaches predict class complexity from source code metrics?
• The Decision Tree and Random Forest classifiers have the highest accuracy and precision of the classifiers compared.
• Random Forest has the highest recall and F1 score
• Is that enough to declare the best technique?
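The four metrics named above follow from the confusion-matrix counts. The sketch below treats "high" complexity as the positive class, which is an assumption, and uses made-up predictions rather than the study's results.

```python
def classification_metrics(y_true, y_pred, positive="high"):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels: 2 true positives, 1 false negative, 1 false positive, 4 true negatives.
y_true = ["high", "high", "high", "low", "low", "low", "low", "low"]
y_pred = ["high", "high", "low", "low", "low", "low", "high", "low"]
scores = classification_metrics(y_true, y_pred)
print(scores)
```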
Figure: Relative performance of ML classifiers
Performance Evaluation (Continued)
• We also examine the false negative (FN) rate to reduce the risk of missing highly complex classes
  • Higher FN rate -> a large number of highly complex classes are predicted as Low [Very Risky Model]
  • Lower FN rate -> a small number of highly complex classes are predicted as Low [Less Risky Model]
• Random Forest (RF) again shows a lower FN rate than the other classifiers
• We attribute this to RF's use of bootstrap resampling and its focus on the most significant features, which improves prediction.
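The FN rate discussed above is the share of truly high-complexity classes that a model labels as low, FN / (FN + TP). A minimal sketch, with "high" assumed as the positive label and two hypothetical models for comparison:

```python
def false_negative_rate(y_true, y_pred, positive="high"):
    """FN / (FN + TP): fraction of truly positive classes the model misses."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return fn / (fn + tp) if fn + tp else 0.0


# Four truly high-complexity classes and two low ones (made-up data).
y_true = ["high", "high", "high", "high", "low", "low"]
model_a = ["high", "high", "high", "low", "low", "low"]  # misses 1 of 4
model_b = ["high", "low", "low", "low", "low", "low"]    # misses 3 of 4

fnr_a = false_negative_rate(y_true, model_a)
fnr_b = false_negative_rate(y_true, model_b)
print(fnr_a, fnr_b)
```

Here the second model would be the riskier one, since it lets three of the four highly complex classes pass as low.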
Figure: Relative FN rate of ML classifiers
Conclusion
• Problem in quality management: It is necessary to take proper action before classes become more complex
• Research Objective & Results
  • We compare the performance of machine learning techniques in predicting class complexity
  • Our results show that the Random Forest model performs better than the other models
  • We also identify the source code metrics that have the most impact on class complexity
• Industrial Usage: Automatic ML-based prediction of code quality will allow quality managers and practitioners to take preventive action against highly complex classes
• Long-term Outcome: Ensure sustainable software, minimize software development effort, and reduce software development costs over time
If you have any questions, email me at sreza3@miners.utep.edu

More Related Content

What's hot

Re2018 Semios for Requirements
Re2018 Semios for RequirementsRe2018 Semios for Requirements
Re2018 Semios for RequirementsClément Portet
 
Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models IJECEIAES
 
Using cyclomatic complexity to measure code complexity
Using cyclomatic complexity to measure code complexityUsing cyclomatic complexity to measure code complexity
Using cyclomatic complexity to measure code complexityJane Chung
 
Using Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements PrioritizationUsing Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements Prioritization Francis Palma
 
IRJET- Attribute Based Adaptive Evaluation System
IRJET-  	  Attribute Based Adaptive Evaluation SystemIRJET-  	  Attribute Based Adaptive Evaluation System
IRJET- Attribute Based Adaptive Evaluation SystemIRJET Journal
 
The comparison of the text classification methods to be used for the analysis...
The comparison of the text classification methods to be used for the analysis...The comparison of the text classification methods to be used for the analysis...
The comparison of the text classification methods to be used for the analysis...ijcsit
 
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...Ali Ouni
 
76201929
7620192976201929
76201929IJRAT
 
130817 latifa guerrouj - context-aware source code vocabulary normalization...
130817   latifa guerrouj - context-aware source code vocabulary normalization...130817   latifa guerrouj - context-aware source code vocabulary normalization...
130817 latifa guerrouj - context-aware source code vocabulary normalization...Ptidej Team
 
Functional Verification of Large-integers Circuits using a Cosimulation-base...
Functional Verification of Large-integers Circuits using a  Cosimulation-base...Functional Verification of Large-integers Circuits using a  Cosimulation-base...
Functional Verification of Large-integers Circuits using a Cosimulation-base...IJECEIAES
 
Reusability Metrics for Object-Oriented System: An Alternative Approach
Reusability Metrics for Object-Oriented System: An Alternative ApproachReusability Metrics for Object-Oriented System: An Alternative Approach
Reusability Metrics for Object-Oriented System: An Alternative ApproachWaqas Tariq
 
Similar Characteristics of Internal Software Quality Attributes for Object-Or...
Similar Characteristics of Internal Software Quality Attributes for Object-Or...Similar Characteristics of Internal Software Quality Attributes for Object-Or...
Similar Characteristics of Internal Software Quality Attributes for Object-Or...Mariana de Azevedo Santos
 
Action-based Recommendation in Pull-request Development
Action-based Recommendation in Pull-request DevelopmentAction-based Recommendation in Pull-request Development
Action-based Recommendation in Pull-request DevelopmentSebastiano Panichella
 
Model Manipulation for End-User Modelers
Model Manipulation for End-User ModelersModel Manipulation for End-User Modelers
Model Manipulation for End-User ModelersVlad Acretoaie
 
Recommending Software Refactoring Using Search-based Software Enginnering
Recommending Software Refactoring Using Search-based Software EnginneringRecommending Software Refactoring Using Search-based Software Enginnering
Recommending Software Refactoring Using Search-based Software EnginneringAli Ouni
 
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTSUSING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTSijseajournal
 
AN EMPIRICAL STUDY ON THE POTENTIAL USEFULNESS OF DOMAIN MODELS FOR COMPLETEN...
AN EMPIRICAL STUDY ON THE POTENTIAL USEFULNESS OF DOMAIN MODELS FOR COMPLETEN...AN EMPIRICAL STUDY ON THE POTENTIAL USEFULNESS OF DOMAIN MODELS FOR COMPLETEN...
AN EMPIRICAL STUDY ON THE POTENTIAL USEFULNESS OF DOMAIN MODELS FOR COMPLETEN...Lionel Briand
 

What's hot (20)

Re2018 Semios for Requirements
Re2018 Semios for RequirementsRe2018 Semios for Requirements
Re2018 Semios for Requirements
 
Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models
 
Using cyclomatic complexity to measure code complexity
Using cyclomatic complexity to measure code complexityUsing cyclomatic complexity to measure code complexity
Using cyclomatic complexity to measure code complexity
 
Using Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements PrioritizationUsing Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements Prioritization
 
IRJET- Attribute Based Adaptive Evaluation System
IRJET-  	  Attribute Based Adaptive Evaluation SystemIRJET-  	  Attribute Based Adaptive Evaluation System
IRJET- Attribute Based Adaptive Evaluation System
 
Software bug prediction
Software bug prediction Software bug prediction
Software bug prediction
 
The comparison of the text classification methods to be used for the analysis...
The comparison of the text classification methods to be used for the analysis...The comparison of the text classification methods to be used for the analysis...
The comparison of the text classification methods to be used for the analysis...
 
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
 
76201929
7620192976201929
76201929
 
130817 latifa guerrouj - context-aware source code vocabulary normalization...
130817   latifa guerrouj - context-aware source code vocabulary normalization...130817   latifa guerrouj - context-aware source code vocabulary normalization...
130817 latifa guerrouj - context-aware source code vocabulary normalization...
 
Functional Verification of Large-integers Circuits using a Cosimulation-base...
Functional Verification of Large-integers Circuits using a  Cosimulation-base...Functional Verification of Large-integers Circuits using a  Cosimulation-base...
Functional Verification of Large-integers Circuits using a Cosimulation-base...
 
Reusability Metrics for Object-Oriented System: An Alternative Approach
Reusability Metrics for Object-Oriented System: An Alternative ApproachReusability Metrics for Object-Oriented System: An Alternative Approach
Reusability Metrics for Object-Oriented System: An Alternative Approach
 
Similar Characteristics of Internal Software Quality Attributes for Object-Or...
Similar Characteristics of Internal Software Quality Attributes for Object-Or...Similar Characteristics of Internal Software Quality Attributes for Object-Or...
Similar Characteristics of Internal Software Quality Attributes for Object-Or...
 
Action-based Recommendation in Pull-request Development
Action-based Recommendation in Pull-request DevelopmentAction-based Recommendation in Pull-request Development
Action-based Recommendation in Pull-request Development
 
Model Manipulation for End-User Modelers
Model Manipulation for End-User ModelersModel Manipulation for End-User Modelers
Model Manipulation for End-User Modelers
 
Recommending Software Refactoring Using Search-based Software Enginnering
Recommending Software Refactoring Using Search-based Software EnginneringRecommending Software Refactoring Using Search-based Software Enginnering
Recommending Software Refactoring Using Search-based Software Enginnering
 
WCRE11b.ppt
WCRE11b.pptWCRE11b.ppt
WCRE11b.ppt
 
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTSUSING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
 
Thesis Giani UIC Slides EN
Thesis Giani UIC Slides ENThesis Giani UIC Slides EN
Thesis Giani UIC Slides EN
 
AN EMPIRICAL STUDY ON THE POTENTIAL USEFULNESS OF DOMAIN MODELS FOR COMPLETEN...
AN EMPIRICAL STUDY ON THE POTENTIAL USEFULNESS OF DOMAIN MODELS FOR COMPLETEN...AN EMPIRICAL STUDY ON THE POTENTIAL USEFULNESS OF DOMAIN MODELS FOR COMPLETEN...
AN EMPIRICAL STUDY ON THE POTENTIAL USEFULNESS OF DOMAIN MODELS FOR COMPLETEN...
 

Similar to Performance analysis of machine learning approaches in software complexity prediction by sayed mohsin reza at tcce 2020 conference

EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...
EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...
EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...ijseajournal
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
A survey of fault prediction using machine learning algorithms
A survey of fault prediction using machine learning algorithmsA survey of fault prediction using machine learning algorithms
A survey of fault prediction using machine learning algorithmsAhmed Magdy Ezzeldin, MSc.
 
software engineering module i & ii.pptx
software engineering module i & ii.pptxsoftware engineering module i & ii.pptx
software engineering module i & ii.pptxrani marri
 
Can ML help software developers? (TEQnation 2022)
Can ML help software developers? (TEQnation 2022)Can ML help software developers? (TEQnation 2022)
Can ML help software developers? (TEQnation 2022)Maurício Aniche
 
IRJET- Analysis of Software Cost Estimation Techniques
IRJET- Analysis of Software Cost Estimation TechniquesIRJET- Analysis of Software Cost Estimation Techniques
IRJET- Analysis of Software Cost Estimation TechniquesIRJET Journal
 
LIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolLIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolKellyton Brito
 
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET Journal
 
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET Journal
 
Mumbai University M.E computer engg syllabus
Mumbai University M.E computer engg syllabusMumbai University M.E computer engg syllabus
Mumbai University M.E computer engg syllabusShini Saji
 
A Comprehensive Overview Of Techniquess For Measuring System Readiness Final ...
A Comprehensive Overview Of Techniquess For Measuring System Readiness Final ...A Comprehensive Overview Of Techniquess For Measuring System Readiness Final ...
A Comprehensive Overview Of Techniquess For Measuring System Readiness Final ...jbci
 
GRID COMPUTING: STRATEGIC DECISION MAKING IN RESOURCE SELECTION
GRID COMPUTING: STRATEGIC DECISION MAKING IN RESOURCE SELECTIONGRID COMPUTING: STRATEGIC DECISION MAKING IN RESOURCE SELECTION
GRID COMPUTING: STRATEGIC DECISION MAKING IN RESOURCE SELECTIONIJCSEA Journal
 
Using Data Mining to Identify COSMIC Function Point Measurement Competence
Using Data Mining to Identify COSMIC Function Point Measurement Competence  Using Data Mining to Identify COSMIC Function Point Measurement Competence
Using Data Mining to Identify COSMIC Function Point Measurement Competence IJECEIAES
 
Computer Organisation and Architecture Teaching Trends
Computer Organisation and Architecture Teaching TrendsComputer Organisation and Architecture Teaching Trends
Computer Organisation and Architecture Teaching Trendsyogesh1617
 
Computer Oraganisation and Architecture
Computer Oraganisation and ArchitectureComputer Oraganisation and Architecture
Computer Oraganisation and Architectureyogesh1617
 
CSE320 SOFTWARE ENGINEERING Lecture01 (1).ppt
CSE320  SOFTWARE ENGINEERING Lecture01 (1).pptCSE320  SOFTWARE ENGINEERING Lecture01 (1).ppt
CSE320 SOFTWARE ENGINEERING Lecture01 (1).pptDHIRENDRAHUDDA
 
ICPE 2022 - Data Challenge
ICPE 2022 - Data ChallengeICPE 2022 - Data Challenge
ICPE 2022 - Data ChallengeLuc Lesoil
 
Software Product Measurement and Analysis in a Continuous Integration Environ...
Software Product Measurement and Analysis in a Continuous Integration Environ...Software Product Measurement and Analysis in a Continuous Integration Environ...
Software Product Measurement and Analysis in a Continuous Integration Environ...Gabriel Moreira
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSung Kim
 

Similar to Performance analysis of machine learning approaches in software complexity prediction by sayed mohsin reza at tcce 2020 conference (20)

EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...
EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...
EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
A survey of fault prediction using machine learning algorithms
A survey of fault prediction using machine learning algorithmsA survey of fault prediction using machine learning algorithms
A survey of fault prediction using machine learning algorithms
 
software engineering module i & ii.pptx
software engineering module i & ii.pptxsoftware engineering module i & ii.pptx
software engineering module i & ii.pptx
 
Can ML help software developers? (TEQnation 2022)
Can ML help software developers? (TEQnation 2022)Can ML help software developers? (TEQnation 2022)
Can ML help software developers? (TEQnation 2022)
 
IRJET- Analysis of Software Cost Estimation Techniques
IRJET- Analysis of Software Cost Estimation TechniquesIRJET- Analysis of Software Cost Estimation Techniques
IRJET- Analysis of Software Cost Estimation Techniques
 
LIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolLIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval Tool
 
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware Performance
 
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
 
Mumbai University M.E computer engg syllabus
Mumbai University M.E computer engg syllabusMumbai University M.E computer engg syllabus
Mumbai University M.E computer engg syllabus
 
A Comprehensive Overview Of Techniquess For Measuring System Readiness Final ...
A Comprehensive Overview Of Techniquess For Measuring System Readiness Final ...A Comprehensive Overview Of Techniquess For Measuring System Readiness Final ...
A Comprehensive Overview Of Techniquess For Measuring System Readiness Final ...
 
GRID COMPUTING: STRATEGIC DECISION MAKING IN RESOURCE SELECTION
GRID COMPUTING: STRATEGIC DECISION MAKING IN RESOURCE SELECTIONGRID COMPUTING: STRATEGIC DECISION MAKING IN RESOURCE SELECTION
GRID COMPUTING: STRATEGIC DECISION MAKING IN RESOURCE SELECTION
 
Using Data Mining to Identify COSMIC Function Point Measurement Competence
Using Data Mining to Identify COSMIC Function Point Measurement Competence  Using Data Mining to Identify COSMIC Function Point Measurement Competence
Using Data Mining to Identify COSMIC Function Point Measurement Competence
 
Computer Organisation and Architecture Teaching Trends
Computer Organisation and Architecture Teaching TrendsComputer Organisation and Architecture Teaching Trends
Computer Organisation and Architecture Teaching Trends
 
Computer Oraganisation and Architecture
Computer Oraganisation and ArchitectureComputer Oraganisation and Architecture
Computer Oraganisation and Architecture
 
CSE320 SOFTWARE ENGINEERING Lecture01 (1).ppt
CSE320  SOFTWARE ENGINEERING Lecture01 (1).pptCSE320  SOFTWARE ENGINEERING Lecture01 (1).ppt
CSE320 SOFTWARE ENGINEERING Lecture01 (1).ppt
 
ICPE 2022 - Data Challenge
ICPE 2022 - Data ChallengeICPE 2022 - Data Challenge
ICPE 2022 - Data Challenge
 
Software Product Measurement and Analysis in a Continuous Integration Environ...
Software Product Measurement and Analysis in a Continuous Integration Environ...Software Product Measurement and Analysis in a Continuous Integration Environ...
Software Product Measurement and Analysis in a Continuous Integration Environ...
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled Datasets
 
Slides chapter 15
Slides chapter 15Slides chapter 15
Slides chapter 15
 

Recently uploaded

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxNikitaBankoti2
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 

Performance Analysis of Machine Learning Approaches in Software Complexity Prediction, by Sayed Mohsin Reza at the TCCE 2020 conference

  • 1. 2nd International Conference on Trends in Computational and Cognitive Engineering (TCCE)
      Paper ID: 410
      Performance Analysis of Machine Learning Approaches in Software Complexity Prediction
      Sayed Reza¹, Mahfujur Rahman², Hasnat Parvez³, Omar Badreddin¹, and Shamim Al Mamun³
      ¹ University of Texas, ² Daffodil International University, ³ Jahangirnagar University
  • 2. Introduction
      • Software complexity is an undesirable characteristic of software
      • Increasing complexity reduces maintainability and sustainability
          • Class-level complexity
          • Method-level complexity
      • Complexity can be affected by many factors related to code structure, object-oriented properties, and source code metrics
      • Machine learning techniques can automate the detection of class complexity, replacing manual processes and hand-written code rules
  • 3. Research Objectives
      • Use machine learning techniques to build complexity classifiers
      • The reason for using machine learning is to replace the manual processes or code rules otherwise needed to detect class complexity
      • Compare the performance of the ML classifiers
      • Report the best technique based on performance metrics
  • 4. Motivation
      • Early detection of software complexity enables better software maintenance
      • Effective software maintenance leads to better quality over time
      • High-quality software in turn helps to:
          • Enhance future software maintainability
          • Ensure sustainable software over time
          • Minimize software development effort over time
          • Reduce software development costs
  • 5. Research Questions & Study Design
      • RQ1: How are source code metrics correlated with the quality attribute class complexity?
          • This question reveals the relationships between complexity and source code metrics
      • RQ2: How accurately can machine learning approaches predict class complexity from source code metrics?
          • This question aims to determine the accuracy of machine learning approaches in class-level complexity detection
      Figure: Study Design (Dataset Collection → Dataset Preparation → Correlation Analysis (RQ1) → Training → Performance Evaluation (RQ2) → Report Best Technique)
  • 6. Dataset Collection
      • A dataset for complexity prediction needs a diverse set of repositories
      • We search code repositories using the ModelMine tool [1] with the following criteria:
          • Java as the repository's primary language
          • a minimum of 5000 commits (proxy for maintenance)
          • at least 100 active contributors
          • a minimum of 3000 stars and 500 forks (proxy for popularity)
      • In total, 10 repositories containing 38,778 classes are selected
      Figure: Class distribution among repositories
      [1] Sayed Mohsin Reza, Omar Badreddin, and Khandoker Rahad. ModelMine: A tool to facilitate mining models from open-source repositories. In 2020 ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS). ACM, 2020.
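The selection criteria above can be read as a simple filter over repository metadata. The following is a minimal sketch of that filter, not the ModelMine query itself: the `Repo` record, its field names, and the three candidate repositories are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    """Hypothetical repository record; field names are assumptions."""
    name: str
    language: str
    commits: int
    contributors: int
    stars: int
    forks: int


def meets_criteria(repo: Repo) -> bool:
    """Apply the thresholds listed on the slide."""
    return (
        repo.language == "Java"
        and repo.commits >= 5000       # proxy for maintenance
        and repo.contributors >= 100   # active contributors
        and repo.stars >= 3000         # proxy for popularity
        and repo.forks >= 500
    )


# Made-up candidates, not repositories from the study.
candidates = [
    Repo("example/alpha", "Java", 12000, 250, 9000, 2100),
    Repo("example/beta", "Java", 4200, 300, 15000, 4000),    # too few commits
    Repo("example/gamma", "Kotlin", 8000, 150, 5000, 900),   # not Java
]

selected = [r.name for r in candidates if meets_criteria(r)]
print(selected)
```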
  • 7. Dataset Collection (Continued)
      • Input variables: extract 18 unique source code metrics from each class in the code repositories using a static analyzer tool
      • Target variable: extract the current complexity of each class in the code repositories using the CodeMR tool [2]
      • The variables are then combined by class name to create a dataset for the complexity classifier
      Table: Source Code Metrics …
      [2] Asma Shaheen, Usman Qamar, Aiman Nazir, Raheela Bibi, Munazza Ansar, and Iqra Zafar. Oocqm: Object oriented code quality meter. In International Conference on Computational Science/Intelligence & Applied Informatics, pages 149–163. Springer, 2019.
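Combining the two tool outputs by class name amounts to an inner join. Here is a rough sketch under assumptions: the class names, the three metric columns, and the "high"/"low" labels are invented for illustration, and the actual export formats of the static analyzer and of CodeMR are not described in the slides.

```python
def join_on_class_name(metrics, complexity):
    """Inner join: keep only classes present in both tool outputs."""
    dataset = []
    for class_name, row in metrics.items():
        if class_name in complexity:
            dataset.append(
                {"class": class_name, **row, "complexity": complexity[class_name]}
            )
    return dataset


# Hypothetical static-analyzer output (3 of the 18 metrics, made-up values).
metrics = {
    "org.example.Parser": {"WMC": 42, "CBO": 11, "RFC": 75},
    "org.example.Token": {"WMC": 5, "CBO": 2, "RFC": 9},
    "org.example.Orphan": {"WMC": 3, "CBO": 1, "RFC": 4},  # no label available
}

# Hypothetical complexity labels keyed by the same class names.
complexity = {"org.example.Parser": "high", "org.example.Token": "low"}

dataset = join_on_class_name(metrics, complexity)
for row in dataset:
    print(row)
```

Keying on fully qualified names, as in this sketch, avoids collisions between classes that share a simple name in different packages; whether the study did so is not stated.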
  • 8. Dataset Preparation
      • Remove duplicate observations
      • Identify outliers and remove biased data points
      • Perform exploratory data analysis with visualizations of the input and target variables
      • Create training (80%) and testing (20%) datasets
      Figure: Relationship of some input variables with the target variable
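Two of the preparation steps, duplicate removal and the 80/20 split, can be sketched with the standard library alone. This is an illustrative sketch, not the authors' pipeline: the toy rows, the fixed seed, and the helper names are assumptions, and outlier removal is left out because the slides do not say which rule was used.

```python
import random


def deduplicate(rows):
    """Drop exact duplicate observations, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed, then hold out `test_fraction` for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy observations: 10 distinct rows plus 2 exact duplicates.
rows = [(i, i % 3) for i in range(10)] + [(0, 0), (1, 1)]

unique = deduplicate(rows)
train, test = split_train_test(unique)
print(len(rows), len(unique), len(train), len(test))
```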
  • 9. Correlation Results
      • RQ1: How are source code metrics correlated with the quality attribute class complexity?
      • The Pearson correlation results reveal the impact of source code metrics on complexity
      • The source code metrics DIT, SRFC, RFC, WMC, CMLOC, and CBO *** have a moderately high impact on complexity
      Figure: Correlation between source code metrics and complexity
      *** DIT = Depth of Inheritance Tree, RFC = Response for a Class, CMLOC = Class-Method Lines of Code, CBO = Coupling Between Objects
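For reference, the Pearson coefficient between one metric and the complexity target can be computed as below. This is a minimal sketch: the WMC values are made up, and encoding complexity as an ordinal number (0 = low, 1 = medium, 2 = high) is an assumption, since the slides do not say how the target was encoded for the correlation analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up WMC values and an assumed ordinal complexity encoding.
wmc = [3, 10, 25, 40, 60]
complexity_level = [0, 0, 1, 2, 2]

r = pearson(wmc, complexity_level)
print(f"Pearson r between WMC and complexity: {r:.3f}")
```

The coefficient ranges from -1 to 1; values near either end indicate a strong linear association, which is what the slide summarizes as impact.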
  • 10. Training & Testing
      • In training, we choose 5 different machine learning techniques to classify complexity:
          1. Naive Bayes (NB)
          2. Logistic Regression (LR)
          3. Decision Tree (DT)
          4. Random Forest (RF)
          5. AdaBoost (AB)
      • These are well-known classifiers in machine learning and have been used in several similar studies [3, 4]
      • Perform 10-fold cross-validation to reduce the variability of the performance results
      [3] Istehad Chowdhury and Mohammad Zulkernine. Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities. Journal of Systems Architecture, 57(3):294–313, 2011.
      [4] Yun Zhang, David Lo, Xin Xia, Bowen Xu, Jianling Sun, and Shanping Li. Combining software metrics and text features for vulnerable file prediction. In 2015 20th International Conference on Engineering of Complex Computer Systems (ICECCS), pages 40–49. IEEE, 2015.
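The 10-fold cross-validation step can be sketched independently of any particular classifier. In the sketch below, the fold generator and `cross_validate` helper are illustrative names, the labels are synthetic, and a majority-class baseline stands in for the five classifiers, none of which is implemented here.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once.

    Assumes n_samples >= k so that no fold is empty.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(labels, fit_predict, k=10):
    """Mean accuracy over k folds for any fit_predict(train_idx, test_idx)."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        predictions = fit_predict(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic, unshuffled labels; real data would be shuffled first.
labels = ["low"] * 70 + ["high"] * 30


def majority_baseline(train_idx, test_idx):
    """Stand-in model: always predict the most common training label."""
    most_common = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return [most_common] * len(test_idx)


mean_accuracy = cross_validate(labels, majority_baseline, k=10)
print(f"mean accuracy over 10 folds: {mean_accuracy:.2f}")
```

Averaging over ten held-out folds is what reduces the dependence of the reported score on any single train/test split.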
  • 11. Performance Evaluation
      • RQ2: How accurately can machine learning approaches predict class complexity from source code metrics?
      • The Decision Tree and Random Forest classifiers have the highest accuracy and precision among the classifiers
      • Random Forest has the highest recall and F1 score
      • Is that enough to declare the best technique?
      Figure: Relative performance of ML classifiers
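The four metrics named above follow directly from confusion-matrix counts. The sketch below assumes a binary framing in which "high complexity" is the positive class; the slides do not state whether the target was binary or multi-class, and the counts are made up rather than taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration only.
scores = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```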
  • 12. Performance Evaluation (Continued)
      • We focus on the false negative (FN) rate to reduce the risk of missing highly complex classes
          • Higher FN rate → many highly complex classes are detected as Low [Very Risky Model]
          • Lower FN rate → few highly complex classes are detected as Low [Less Risky Model]
      • Random Forest (RF) again shows a lower FN rate than the other classifiers
      • We attribute this to RF's use of bootstrap resampling and its focus on significant features, which improves prediction
      Figure: Relative FN rate of ML classifiers
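The false negative rate described above is the share of truly high-complexity classes that a model labels as something else, FN / (FN + TP). A minimal sketch follows; the label strings and the toy predictions are assumptions for illustration.

```python
def false_negative_rate(y_true, y_pred, positive="high"):
    """FN / (FN + TP): fraction of truly positive items the model missed."""
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return fn / (fn + tp) if fn + tp else 0.0


# Toy labels: four truly high-complexity classes, one of them missed.
y_true = ["high", "high", "high", "high", "low", "low"]
y_pred = ["high", "low", "high", "high", "low", "high"]

fnr = false_negative_rate(y_true, y_pred)
print(f"FN rate: {fnr:.2f}")
```

Because the FN rate equals 1 minus recall for the positive class, the classifier with the highest recall on high-complexity classes also has the lowest FN rate.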
  • 13. Conclusion
      • Problem in quality management: it is necessary to take proper action before classes become more complex
      • Research objective & results
          • We compare the performance of machine learning techniques in predicting class complexity
          • Our results show that the Random Forest model performs better than the other models
          • We also identify the source code metrics that have the most impact on class complexity
      • Industrial usage: automatic ML-based prediction of code quality will allow quality managers and practitioners to take preventive action against highly complex classes
      • Long-term outcome: ensure sustainable software, minimize software development effort, and reduce software development costs over time
      If you have any questions, email me at sreza3@miners.utep.edu