This document presents a study that develops a software effort estimation model using an Adaptive Neuro Fuzzy Inference System (ANFIS). The study evaluates the proposed ANFIS model using COCOMO81 datasets and compares its performance to an Artificial Neural Network (ANN) model and the intermediate COCOMO model. The results show that the ANFIS model provides better estimates than the ANN and COCOMO models, with lower values for metrics like the Root Mean Square Error and Magnitude of Relative Error.
Hybrid Fuzzy-ANN Software Effort Estimation Model Outperforms ANN & COCOMO
1. International Journal in Foundations of Computer Science & Technology (IJFCST), Vol.4, No.5, September 2014
A HYBRID FUZZY-ANN APPROACH FOR SOFTWARE
EFFORT ESTIMATION
Sheenu Rizvi1, Dr. S.Q. Abbas2 and Dr. Rizwan Beg3
1Department of Computer Science, Amity University, Lucknow, India
2A.I.M.T., Lucknow, India
3 Integral University, Lucknow, India
ABSTRACT
Software development effort estimation is one of the major activities in software project management.
During the project proposal stage there is a high probability that estimates will be inaccurate, but this
inaccuracy decreases later on. In the field of software development there are certain metrics on which
effort estimation is based. To date, various methods have been proposed for software effort
estimation, of which the non-algorithmic methods, such as artificial intelligence techniques, have been very
successful. A hybrid Fuzzy-ANN model, known as the Adaptive Neuro Fuzzy Inference System (ANFIS), is more
suitable in such situations. The present paper is concerned with developing a software effort estimation
model based on ANFIS. The study evaluates the efficiency of the proposed ANFIS model using the
COCOMO81 dataset. The results obtained are compared with those of an Artificial Neural
Network (ANN) and the Intermediate COCOMO model developed by Boehm. The results were analyzed using
Magnitude of Relative Error (MRE) and Root Mean Square Error (RMSE). It is observed that ANFIS
provided better results than the ANN and COCOMO models.
KEYWORDS
Software Effort Estimation, RMSE, ANFIS, ANN, COCOMO, MRE.
1. INTRODUCTION
One of the key challenges in the software industry is the accurate estimation of development
effort, which is particularly important for risk evaluation, resource scheduling and progress
monitoring. Inaccurate estimates lead to problematic results: overestimation
wastes resources, whereas underestimation results in the approval of projects that will
exceed their planned budgets. For this reason many models have been developed to make estimation
cost effective. These models can be grouped by the methodology used: expert-based, analogy-based
and regression-based. Expert-based models depend on expert knowledge and past
experience of software projects. Based on a comprehensive review, expert-based estimation is
one of the most frequently applied estimation strategies. Alternatively, regression-based methods
use statistical techniques such as least squares regression, in which a set of independent
variables explains the dependent variable with minimum error. Mathematical models like
Barry Boehm’s COCOMO [1] and COCOMO II [2] are widely investigated regression-based
methods. The parameters of these models are calibrated according to the projects in a company. Thus,
they have the drawback of requiring local calibration. To address these problems, a hybrid
Fuzzy-ANN model known as the Adaptive Neuro Fuzzy Inference System (ANFIS) is developed in this
paper.
DOI:10.5121/ijfcst.2014.4505
2. DATA USED
The data used is the COCOMO 81 dataset. The input and output variables used for ANFIS model
development are given in Table 1. In total, sixteen input variables have been used:
fifteen effort multipliers and the size measured in thousands of delivered lines of code (KLOC).
Development Effort (DE), measured in man-months, has been used as the output of the model. The
data were collected from the analysis of sixty-three (63) software projects, as published by Barry
Boehm in 1981 [3] [16].
Table 1. Input and Output variables for ANFIS model.
Input Variables:
RELY - Required software reliability
DATA - Data base size
CPLX - Product complexity
TIME - Execution time
STOR - Main storage constraint
VIRT - Virtual machine volatility
TURN - Computer turnaround time
ACAP - Analyst capability
AEXP - Applications experience
PCAP - Programmer capability
VEXP - Virtual machine experience
LEXP - Language experience
MODP - Modern programming
TOOL - Use of software tools
SCED - Required development schedule
SIZE - Size in KLOC
Output Variable:
Development Effort (DE)
Source: - COCOMO81 Dataset (PROMISE Software Engineering Repository data [16])
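For orientation, the Intermediate COCOMO baseline used for comparison later in the paper combines these same sixteen inputs in closed form: effort = a * SIZE^b * EAF, where EAF is the product of the fifteen effort multipliers. The sketch below is illustrative only. The (a, b) pairs are Boehm's published Intermediate COCOMO coefficients; the function name, the dictionary layout and the sample project are assumptions made for this example and are not taken from the paper or the dataset.

```python
from math import prod

# Boehm's Intermediate COCOMO coefficients (a, b) per development mode.
COEFFICIENTS = {
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}

def intermediate_cocomo_effort(kloc, multipliers, mode="organic"):
    """Return the estimated effort in man-months.

    kloc        -- size in thousands of delivered lines of code
    multipliers -- the fifteen effort multipliers (RELY ... SCED)
    mode        -- development mode selecting the (a, b) pair
    """
    if len(multipliers) != 15:
        raise ValueError("expected fifteen effort multipliers")
    a, b = COEFFICIENTS[mode]
    eaf = prod(multipliers)  # effort adjustment factor
    return a * kloc ** b * eaf

# Hypothetical project (not from the dataset): 10 KLOC, all multipliers nominal.
nominal = [1.0] * 15
effort = intermediate_cocomo_effort(10.0, nominal)
print(round(effort, 2))
```

Because the multipliers enter as a plain product, a single non-nominal rating scales the whole estimate linearly, which is the fixed functional form that a data-driven model such as ANFIS does not have to assume.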
3. ANFIS MODEL DEVELOPMENT
3.1. Parameter Selection
ANFIS [9],[10] is a judicious integration of a Fuzzy Inference System (FIS) and an ANN, capable of
learning, high-level thinking and reasoning, and it combines the benefits of these two techniques
in a single framework [4]. The success of an FIS depends on finding the rule base. There are no specific
techniques for converting human knowledge into a rule base, and further fine tuning of
the membership functions is required to maximise the performance of the model and minimise the
output error. Thus, when generating an FIS using ANFIS, it is important
to select proper parameters, including the number of membership functions (MFs) for each
individual antecedent variable. It is also vital to select appropriate parameters for the learning and
refining process, including the initial step size (ss). In the present work, the rule
extraction method applied for FIS identification and refinement is the commonly used subtractive
clustering. The MATLAB Fuzzy Logic Toolbox [7] has been used for ANFIS model development.
Here the initial parameters of the ANFIS are identified using the subtractive clustering method
[5]. However, it is vital to properly define the subtractive clustering parameters, of which the
clustering radius is the most important. It is determined through a trial and error approach. By
varying the clustering radius ra with varying step size, the optimal parameters are obtained by
minimizing the root mean squared error on the validation datasets. The clustering radius rb is
selected as 1.5ra. Gaussian membership functions are used for each fuzzy set in the fuzzy system.
The number of membership functions and fuzzy rules required for a particular ANFIS is
determined by the subtractive clustering algorithm. The parameters of the Gaussian membership
functions are optimally determined using the hybrid learning algorithm. Each ANFIS is trained for
10 epochs.
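The role of the two radii can be seen in a small sketch of subtractive clustering. This is not the MATLAB implementation used in the study; it is a simplified pure-Python version written for illustration. It assumes the data are already scaled to the unit hypercube, and it replaces the usual accept/reject test with a single threshold. The names `subtractive_clustering`, `squash` and `stop_ratio`, and the toy points, are all assumptions of this example.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def subtractive_clustering(points, ra, squash=1.5, stop_ratio=0.15):
    """Return cluster centres chosen from `points`.

    ra         -- clustering radius (neighbourhood of influence)
    squash     -- rb = squash * ra is the radius used to reduce potentials
    stop_ratio -- simplified stopping rule (assumption): stop once the best
                  remaining potential falls below this fraction of the first
    """
    if not points:
        return []
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2
    # Potential of each point: density of neighbours within roughly ra.
    potentials = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]
    first_peak = max(potentials)
    centres = []
    while True:
        peak = max(potentials)
        if peak < stop_ratio * first_peak:
            break
        centre = points[potentials.index(peak)]
        centres.append(centre)
        # Reduce the potential of points near the new centre (radius rb).
        potentials = [
            pot - peak * math.exp(-beta * sq_dist(p, centre))
            for p, pot in zip(points, potentials)
        ]
    return centres

# Toy data (hypothetical): two tight groups in the unit square.
points = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.12),
          (0.90, 0.90), (0.88, 0.90), (0.90, 0.88)]
centres = subtractive_clustering(points, ra=0.5)
print(len(centres))  # 2
```

Each centre found this way becomes one fuzzy rule, so the choice of ra fixes the size of the rule base; the trial and error search described above amounts to repeating this step for several candidate radii and keeping the one with the lowest validation RMSE.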
A Gaussian membership function has been used as the input membership function and a linear
membership function as the output function. Separate sets of input and output data have been
used as input arguments. In MATLAB, genfis2 generates a Sugeno-type FIS structure using
subtractive clustering. genfis2 is generally used where there is only one output; hence it has
been used here to generate the initial FIS for training the ANFIS. genfis2 achieves this
by extracting a set of rules that models the behaviour of the data. To determine the number of
rules and antecedent membership functions, the rule extraction method uses the subclust function.
It then uses linear least squares estimation to determine each rule's consequent
equations.
The parameters used for training the ANFIS are given in Table 2 and the rule extraction
method used is given in Table 3. Table 4 summarizes the values of the model
parameters used for training the ANFIS.
Table 2. Parameters used in all the models for training ANFIS
Rule extraction method used Subtractive clustering
Input MF type Gaussian membership (‘gaussmf’)
Input partitioning variable
Output MF Type Linear
Number of output MFs one
Training algorithm Hybrid learning
Training epoch number 10
Initial step size 0.01
Table 3. Rule extraction method used for training ANFIS
Rule Extraction Method Type
And method ‘prod’
Or method ‘probor’
Defuzzification method ‘wtaver’
Implication method ‘prod’
Aggregation method ‘max’
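To make the settings above concrete, the following sketch evaluates a first-order Sugeno system with Gaussian input membership functions, the product as the AND operator, linear rule outputs and weighted-average defuzzification. It is a minimal illustration, not the FIS generated in the study: the rule format, the function names and the two example rules are assumptions of this example.

```python
import math

def gaussmf(x, sigma, c):
    """Gaussian membership grade of x for width sigma and centre c."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def sugeno_predict(x, rules):
    """Evaluate a first-order Sugeno FIS at input vector x.

    Each rule is a dict with:
      'mfs'    -- one (sigma, c) pair per input (antecedent Gaussian MFs)
      'coeffs' -- linear consequent coefficients, one per input
      'bias'   -- constant term of the linear consequent
    """
    weights, outputs = [], []
    for rule in rules:
        # Product AND: firing strength is the product of membership grades.
        w = math.prod(gaussmf(xi, s, c) for xi, (s, c) in zip(x, rule["mfs"]))
        # Linear output: y = coeffs . x + bias
        y = sum(a * xi for a, xi in zip(rule["coeffs"], x)) + rule["bias"]
        weights.append(w)
        outputs.append(y)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    # Weighted-average defuzzification over all rules.
    return sum(w * y for w, y in zip(weights, outputs)) / total

# Two hypothetical single-input rules with constant outputs 10 and 20.
rules = [
    {"mfs": [(1.0, 0.0)], "coeffs": [0.0], "bias": 10.0},
    {"mfs": [(1.0, 2.0)], "coeffs": [0.0], "bias": 20.0},
]
print(sugeno_predict([1.0], rules))  # midway between the centres: approx. 15.0
```

With one output and weighted-average defuzzification, the OR, implication and aggregation settings do not enter this calculation; the output is determined entirely by the firing strengths and the linear consequents.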
Table 4. Values of parameters used for training ANFIS
No. of nodes 1311
No. of linear parameters 646
No. of non-linear parameters 1216
Total no. of parameters 1862
No. of training data pairs 40
No. of testing data pairs 23
No. of fuzzy rules 38
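The parameter counts in Table 4 follow directly from the model structure: with sixteen inputs and 38 rules, each rule carries one Gaussian membership function per input (two parameters each) and one linear consequent (one coefficient per input plus a constant). The short check below reproduces the arithmetic; the function name is an assumption of this example.

```python
def anfis_parameter_counts(n_inputs, n_rules):
    """Count parameters of a Sugeno ANFIS with one Gaussian MF per input per rule."""
    # Antecedents: (sigma, centre) for every input in every rule.
    nonlinear = 2 * n_inputs * n_rules
    # Consequents: one coefficient per input plus a constant, for every rule.
    linear = (n_inputs + 1) * n_rules
    return linear, nonlinear, linear + nonlinear

linear, nonlinear, total = anfis_parameter_counts(n_inputs=16, n_rules=38)
print(linear, nonlinear, total)  # 646 1216 1862
```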
4. RESULT AND DISCUSSION
Here the ANFIS model has been trained and tested, and its performance is evaluated and compared
for the training and testing datasets separately.
The RMSE of the ANFIS model for the training and testing datasets is
plotted in Fig. 1 and Fig. 2 respectively, and the corresponding range of values (minimum and
maximum) is summarized in Table 5.
Figure 1. Graphical plot of RMSE value variation during training
Figure 2. Graphical plot of RMSE value variation during testing
Table 5. Range of RMSE during training and testing phase
Dataset Minimum RMSE Maximum RMSE
Training datasets 0.4824 2.8096
Testing datasets 186.41 188.41
Further Table 6 gives the RMSE values using COCOMO, ANN and ANFIS techniques.
Table 6. Performance evaluation using RMSE criteria
Model COCOMO ANN ANFIS
RMSE 532.2147 353.1977 112.638
From the analysis of Fig. 1 and Fig. 2 and the data given in Table 5, it is inferred that during the
training phase (Fig. 1) the RMSE varies in a zig-zag manner, with a minimum value of
0.4824 (at epoch 8) and a maximum value of 2.8096 (at epoch 3). Hence during the training phase
the RMSE initially rises, then falls to its minimum at epoch 8, after which it
increases slightly again. On the other hand, during the testing phase (Fig. 2)
the RMSE decreases up to epoch 4, reaching a minimum of 186.41, and then
rises steeply up to epoch 10, where the maximum value of
188.41 is reached. From Table 5 it can be inferred that ANFIS performed better during the training phase
than during the testing phase, but its overall RMSE value is 112.638. This is a marked improvement
over the values calculated for the ANN and COCOMO models, i.e. 353.1977 and 532.2147 respectively
(given above in Table 6).
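For reference, the RMSE used throughout this comparison is the square root of the mean squared difference between actual and estimated effort. The sketch below applies it to the first three projects listed in Table 8 only, so it illustrates the calculation and does not reproduce the full-dataset values of Table 6. The function name is an assumption of this example.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# First three rows of Table 8 (illustrative subset, not the 63 projects).
actual = [2040, 1600, 243]
cocomo = [1863.503, 2782.577, 246.3473]
anfis = [2040.002, 1599.507, 242.9952]

print(rmse(actual, cocomo))
print(rmse(actual, anfis))
```

Because the errors are squared before averaging, a single badly estimated large project dominates the RMSE, which is why it is reported alongside the per-project MRE.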
Further, consider the absolute values of the Magnitude of Relative Error (MRE) calculated for the
COCOMO and ANFIS models (given below in Table 7) and their comparative plots for the
training and testing datasets (Fig. 3 and Fig. 4). From both the data and the
plots, it is seen that during the training as well as the testing phase of ANFIS model
development, the absolute values of the MRE are much lower than those of the COCOMO model,
especially during the training phase. Since the absolute MRE is the absolute percentage error
between the actual and predicted effort for each project, it follows that
the absolute percentage error between the actual and predicted effort using the
ANFIS technique is far lower than that using the COCOMO model.
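The absolute MRE described here can be written as |actual - predicted| / actual x 100. As a quick check, applying this to the first two projects of Table 8 reproduces the corresponding COCOMO entries of Table 7. The function name is an assumption of this example.

```python
def mre_percent(actual, predicted):
    """Absolute magnitude of relative error, expressed as a percentage."""
    if actual == 0:
        raise ValueError("actual effort must be non-zero")
    return abs(actual - predicted) / actual * 100.0

# Actual effort and COCOMO estimate for projects 1 and 2 (Table 8).
print(mre_percent(2040, 1863.503))  # approx. 8.6518, as in Table 7, row 1
print(mre_percent(1600, 2782.577))  # approx. 73.9111, as in Table 7, row 2
```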
Thus, it is clear that proper selection of the influence radius, which directly affects the clustering
results in ANFIS with the subtractive clustering rule extraction method, has resulted in a reduction of RMSE
and MRE for both the training and testing datasets. Hence, it is seen that for a small training dataset,
ANFIS has outperformed the ANN and COCOMO models.
Table 7. Comparative chart of Absolute values of MRE for COCOMO and ANFIS Model
S.No. ABS MRE COCOMO ABS MRE ANFIS
1. 8.651813725 0.000103189
2. 73.9110625 0.030832219
3. 1.377489712 0.00195532
4. 2.00825 0.000158388
5. 16.93939394 0.000202853
6. 40.51162791 1.22696E-05
7. 22.125 0.000142747
8. 41.41395349 1.94362E-05
9. 21.04728132 1.11052E-05
10. 14.17757009 5.40767E-05
11. 42.22018349 0.000783969
12. 0.646766169 9.3241E-05
13. 43.78481013 0.000854332
14. 16.41666667 6.95013E-07
Figure 3. Absolute MRE plot for COCOMO and ANFIS Output for training datasets (x-axis: No. of Projects, 1 to 40; y-axis: Absolute MRE; series: COCOMO MRE, ANFIS MRE)
Figure 4. Absolute MRE plot for COCOMO and ANFIS Output for testing datasets (x-axis: No. of Projects, 1 to 23; y-axis: Absolute MRE; series: MRE COCOMO, MRE ANFIS)
In order to depict how well ANFIS has performed relative to the ANN and COCOMO models, a
comparative plot of actual effort versus the effort predicted by the COCOMO, ANN and ANFIS
techniques is shown in Fig. 5, using the data given in Table 8. From the graph it is seen that
the ANFIS model line follows the actual effort line more closely than that of COCOMO. This
again depicts the superiority of the ANFIS technique over the ANN and COCOMO models for effort
estimation.
Table 8. Comparative chart of Actual Effort Versus Estimated Effort using COCOMO, ANN and ANFIS
S. No, Actual Effort, Estimated Effort using COCOMO, Estimated Effort using ANN, Estimated Effort using ANFIS
1 2040 1863.503 2040.022 2040.002
2 1600 2782.577 3168.456 1599.507
3 243 246.3473 242.8827 242.9952
42 45 109.29 234.8325 195.2396
43 83 103.73 101.074 228.257
44 87 132.87 100.6351 130.0721
45 106 109.2 157.2179 3.31
46 126 213.91 122.6887 343.28
47 36 32.77 7.266029 57.82236
48 1272 2204.63 6.364794 738.6743
49 156 141.51 155.7227 335.0579
50 176 162.46 491.2995 188.5651
51 122 82.74 254.6255 93.75488
52 41 36.46 48.05263 51.03936
53 14 22.41 38.53126 104.7524
54 20 11.78 6.371402 34.6563
55 18 7.51 8.634863 16.71238
56 958 388.88 957.3443 385.3861
57 237 277.35 238.0535 177.1851
58 130 145.19 1540.691 282.375
59 70 82.78 6.243794 85.83885
60 57 50.11 132.3261 119.6359
61 50 47.26 6.030985 40.99599
62 38 41.18 38.24981 140.7745
63 15 17.13 6.164915 19.69363
Finally, Figures 6, 7 and 8 show scatter plots of Actual Effort versus Estimated Effort using the
ANFIS, ANN and COCOMO models respectively. The figures show that the model performance is generally
precise in the case of ANFIS, where all data points follow a linear trend line, and that the
ANFIS model is better than the ANN and COCOMO models.
Figure 5. Comparative plot of Actual Effort, COCOMO, ANN and ANFIS Output (series: Actual Effort, Estimated Effort using COCOMO, Estimated Effort using ANN, Estimated Effort using ANFIS)
Figure 6. Scatter Plot of Actual vs. Estimated Effort using ANFIS (axes: Actual Effort, Estimated Effort)
Figure 7. Scatter Plot of Actual vs. Estimated Effort using ANN (axes: Actual Effort, Estimated Effort)
Figure 8. Scatter Plot of Actual vs. Estimated Effort using COCOMO (axes: Actual Effort, Estimated Effort)
5. CONCLUSION
In the present paper, the applicability and capability of ANFIS techniques for software effort
estimation have been investigated. It is seen that ANFIS models are robust, characterized by
fast computation, and capable of handling the noisy and approximate data typical of the data used
in the present study. Because of the non-linearity present in the data, ANFIS is an efficient
quantitative tool for predicting effort. The studies have been carried out using the MATLAB
simulation environment. In all, sixteen input variables were used, consisting of fifteen Effort
Adjustment Factors and the size of the project, with Effort as the single output variable.
The initial parameters of the ANFIS are identified using the subtractive clustering method.
Gaussian membership functions (described in the earlier section) are used for each fuzzy set in the fuzzy
system. The subtractive clustering algorithm has been used to determine the number of membership
functions and fuzzy rules required for ANFIS development. The hybrid learning algorithm has
been used to determine the parameters of the Gaussian membership functions. Each ANFIS has
been trained for 10 epochs.
From the analysis of the results given under Result and Discussion, it is seen
that the effort estimation model developed using the ANFIS technique has
performed better than the ANN and COCOMO models. This can be concluded from the
results given in Tables 5, 6, 7 and 8. The RMSE value obtained from the ANFIS model (112.638) is
lower than those from the ANN (353.1977) and COCOMO (532.2147) models. Further, from Fig. 6, 7
and 8 and Table 8 it is seen that the ANFIS estimates follow the actual effort more closely than
those of ANN and COCOMO. This again depicts the superiority of the ANFIS technique over the ANN
and COCOMO models for effort estimation.
REFERENCES
[1]. Alpaydın,E. 2004. Introduction to machine learning. Cambridge: MIT Press.
[2]. Boehm,B., Abts, C., Chulani, S. 2000. Software development cost estimation approaches: A survey.
[3]. Annals of Software Engineering (10): 177–205.
[4]. Boehm, B.W. 1981. Software Engineering Economics. Upper Saddle River, NJ, USA: Prentice Hall PTR.
[5]. Chen, D.W. and Zhang, J.P., (2005), “Time series prediction based on ensemble ANFIS”,
Proceedings of the fourth International Conference on Machine Learning and Cybernetics, IEEE, pp
3552-3556.
[6]. Chiu, S., (1994), “Fuzzy Model Identification based on cluster estimation”, Journal of Intelligent and
Fuzzy Systems, 2 (3), pp 267–278.
[7]. Fuller, R., (1995), “Neural Fuzzy Systems”, ISBN 951-650-624-0, ISSN 0358-5654.
[8]. “Fuzzy Logic Toolbox”, MATLAB version R2013a.
[9]. Hammouda, K. A., “Comparative Study of Data Clustering Techniques”.
[10]. Jang, J-S.R., (1992), “Neuro-Fuzzy Modelling: Architecture, Analyses and Applications”, Ph.D. Thesis.
[11]. Jang,J-S.R.,(1993),“ANFIS-Adaptive-Network Based Fuzzy Inference System”, IEEE Transactions
on Systems, Man and Cybernetics, 23(3), pp 665-685.
[12]. Jang, J-S.R., Sun, C.-T., (1995), “Neuro-fuzzy modelling and control”, Proceedings of the IEEE, 83 (3),
pp 378–406.
[13]. Jantzen,J.,(1998), “Neurofuzzy Modelling. Technical Report no. 98-H-874(nfmod)”, Department of
Automation. Technical University of Denmark.1-28.
[14]. Pendharkar, Parag C., et. al., (2005), “A Probabilistic Model for Predicting Software Development
Effort”, IEEE Transactions On Software Engineering, Vol. 31, NO. 7.
[15]. Priyono, A. Ridwan, M., et. al. (2005), “Generation of fuzzy rules with subtractive clustering”,
Journal Teknologi., 43(D), pp 143-153.
[16]. Sayyad Shirabad, J. and Menzies, T.J. (2005) The PROMISE Repository of Software Engineering
Databases. School of Information Technology and Engineering, University of Ottawa, Canada.
Available: http://promise.site.uottawa.ca/SERepository
[17]. Takagi, T. and Sugeno, M., (1983), “Derivation of fuzzy control rules from human operators control
actions”, Proc. IFAC Symp. Fuzzy Inform, Knowledge Representation and Decision Analysis, pp 55-60.
[18]. Vaidehi, V., Monica, S., Mohammad Sheikh Safeer, S., Deepika, M. and Sangeetha, S., (2008), “A
Prediction System Based on Fuzzy Logic”, Proceedings of World Congress on Engineering and
Computer Science.
[19]. Zadeh, L.A., (1965), “Fuzzy sets”, Information and Control, 8, pp 338–353.
Authors
Sheenu Rizvi, Assistant Professor, Amity School of Engineering and Technology,
Lucknow, India. He received his M.Tech degree in Information Technology in 2005 and
is pursuing a Ph.D in Computer Application from Integral University.
Syed Qamar Abbas completed his Master of Science (MS) from BITS Pilani. His PhD
was on computer-oriented study on Queueing models. He has more than 20 years of
teaching and research experience in the field of Computer Science and Information
Technology. Currently, he is Director of Ambalika Institute of Management and
Technology, Lucknow.
Prof. Dr. M. Rizwan Beg holds an M.Tech and a Ph.D in Computer Sc. & Engg. Presently he is
working as Controller of Examination at Integral University, Lucknow, Uttar Pradesh,
India. He has more than 16 years of experience, which includes around 14 years of
teaching experience. His areas of expertise are Software Engg., Requirement Engineering,
Software Quality, and Software Project Management. He has published more than 40
research papers in international journals and conferences. Presently 8 research scholars
are pursuing their Ph.D under his supervision.