International Journal on Computational Sciences & Applications (IJCSA) Vol.5, No.1, February 2015
PROFICIENCY COMPARISON OF LADTREE
AND REPTREE CLASSIFIERS FOR CREDIT
RISK FORECAST
Lakshmi Devasena C
Dept. of Operations & Systems, IBS Hyderabad, IFHE University
Abstract
Predicting credit defaulters is a difficult, high-stakes task for financial institutions such as banks.
Identifying a likely non-payer before granting a loan is a significant and contentious task for the banker.
Classification techniques are well suited to this kind of predictive analysis, that is, deciding whether an
applicant is a genuine customer or a cheat. Choosing the best classifier, however, is a risky decision for a
practitioner such as a banker. This motivates computer science researchers to evaluate different classifiers
and to identify the best one for such prediction problems. This research work investigates the performance
of the LADTree Classifier and the REPTree Classifier for credit risk prediction and compares them using
various measures. The German credit dataset is used to predict credit risk with the help of an open source
machine learning tool.
Key words
Credit Risk Forecast, LAD Tree Classifier, Proficiency Comparison, REP Tree Classifier.
1. INTRODUCTION
The enormous volume of transactions has made the automation of information processing a key
factor in achieving high quality standards, cost reduction and fast results. The automation of data
analysis, and the successes produced by state-of-the-art computer algorithms, have changed the
opinions of many skeptics. In the past, people thought that financial market analysis required
intuition, knowledge and experience, and wondered how this job could be automated. However,
scientific and technological advances have made the automation of financial market analysis
possible. In recent years, credit defaulter prediction and credit risk evaluation have attracted a
great deal of interest from regulators, practitioners and theorists in the financial industry. Since
the credit score of an applicant can be calculated from a large database of past records and from
demographic data, the task lends itself to automation. Automation of credit risk forecasting can be
achieved using classification techniques. Selecting the classifier that predicts credit risk most
efficiently is an important and critical task. This research work evaluates the credit risk
prediction performance of two different classifiers, namely the REP Tree Classifier and the LAD
Tree Classifier, and compares their accuracy of credit risk prediction.
In this research work, two different tree-based classifiers, namely the LAD Tree Classifier and the
REP Tree Classifier, are compared to assess their proficiency in credit risk estimation.
4.1. LAD Tree Classifier
LAD Tree builds a classifier for a binary target variable by learning a logical expression that
can discriminate between positive and negative samples in a data set. The construction of a LAD
model for a given data set typically involves generating a large set of patterns and selecting a
subset of them that satisfies the assumption that a binary point covered by some positive
patterns, but not by any negative pattern, is positive, and similarly that a binary point covered
by some negative patterns, but not by any positive pattern, is negative. Each pattern in the model
must also satisfy certain requirements in terms of prevalence and homogeneity [23].
The LAD Tree Classifier generates a multi-class alternating decision tree using the LogitBoost
strategy, that is, it applies the logistic boosting algorithm to induce an alternating decision
tree. At each iteration, a single attribute test is chosen as a splitter node for the tree. For each
training instance, the working response and weight are calculated and stored on a per-class basis.
The algorithm then fits the working responses of the instances in a particular subset by their mean
value, which minimizes the least-squares error between them. Trees for the different classes are
grown in parallel and, once all of them have been constructed, they are merged into a final model.
An advantage of this classifier is that the size of the final tree cannot outgrow the combined size
of the individual trees [19].
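The working response and weight mentioned above can be illustrated with a small sketch. It assumes the standard two-class LogitBoost formulation; the multi-class variant used by LAD Tree [19] keeps such quantities per class. The function names are hypothetical, and practical safeguards such as clipping the response are left out.

```python
import math

def logitboost_working_response(y, f):
    """Return (z, w) for one instance in two-class LogitBoost.

    y is the 0/1 class indicator and f is the current additive model output.
    Assumption: standard two-class formulation, no clipping of z.
    """
    p = 1.0 / (1.0 + math.exp(-2.0 * f))  # current probability estimate
    w = p * (1.0 - p)                     # instance weight
    z = (y - p) / w                       # working response
    return z, w

def weighted_mean(zs, ws):
    """Constant prediction that minimises the weighted squared error to zs."""
    return sum(w * z for z, w in zip(zs, ws)) / sum(ws)

# Before any boosting step (f = 0) every instance has p = 0.5.
pairs = [logitboost_working_response(y, 0.0) for y in (1, 0, 1)]
zs = [z for z, _ in pairs]
ws = [w for _, w in pairs]
node_value = weighted_mean(zs, ws)
```

With f = 0 the responses are +2 for the positive class and -2 for the negative class, each with weight 0.25, so the fitted value for a subset is simply the weighted mean of its responses.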
4.2. REP Tree Classifier
The Reduced Error Pruning (REP) Tree Classifier is a fast decision tree learning algorithm based
on the principle of computing the information gain with entropy and minimizing the error arising
from variance [20]. The algorithm was first proposed in [21]. REP Tree applies regression tree
logic and generates multiple trees in different iterations, after which it picks the best one from
all the generated trees. It constructs the regression/decision tree using variance and information
gain, and prunes the tree using reduced-error pruning with backfitting. At the beginning of model
preparation, it sorts the values of numeric attributes once. As in the C4.5 algorithm, it deals
with missing values by splitting the corresponding instances into pieces [22].
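The information gain criterion mentioned above can be sketched as follows. This is a minimal illustration for nominal splits described by class counts, not the implementation of any particular tool; the function names are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child)
                   for child in child_counts_list)
    return entropy(parent_counts) - weighted

# Class distribution of the data set used here: 700 good and 300 bad cases.
root_entropy = entropy([700, 300])
```

A candidate split that separates the classes perfectly has a gain equal to the parent entropy, while a split whose children keep the parent's class proportions has a gain of zero.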
5. PERFORMANCE MEASURES USED
Various measures are used to gauge the performance of the classifiers.
Classification Accuracy
No classifier is free of error; any classifier may fail to categorize some instances correctly.
Classification accuracy is calculated as the number of correctly classified instances divided by
the total number of instances, multiplied by 100.
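As a small worked example of this formula, the sketch below uses the training-set figures reported for the LAD Tree Classifier in this paper (761 correct out of 1000). The function name is hypothetical.

```python
def classification_accuracy(correct, total):
    """Percentage of instances that were classified correctly."""
    return correct / total * 100.0

# LAD Tree on the training set: 761 of 1000 instances classified correctly.
lad_training_accuracy = classification_accuracy(761, 1000)
print(round(lad_training_accuracy, 1))
```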
Mean Absolute Error
Mean absolute error is the average of the absolute differences between the predicted and actual
values over all test cases. It is a good measure for gauging performance.
Root Mean Square Error
Root mean squared error measures the differences between the values actually observed and the
values predicted by the model. It is obtained by taking the square root of the mean squared error.
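The two error measures can be sketched in a few lines. The example assumes that the values being compared are 0/1 class indicators and predicted class probabilities, which is one common way such errors are computed for classifiers; the sample numbers are made up for illustration.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)

# Hypothetical data: 1 = good, 0 = bad, with predicted probabilities of "good".
actual = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.5]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

Because the differences are squared before averaging, RMSE penalizes large individual errors more heavily than MAE, so it is never smaller than MAE on the same data.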
Confusion Matrix
A confusion matrix contains information about the actual and predicted classifications made by a
classification system.
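A minimal sketch of how such a matrix is tallied is given below, assuming rows hold the actual class and columns the predicted class, with the two labels "good" and "bad". The helper names and the five sample labels are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels=("good", "bad")):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy_from_matrix(matrix):
    """Fraction of instances on the main diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels, only to show the tallying.
actual = ["good", "good", "bad", "bad", "good"]
predicted = ["good", "bad", "bad", "good", "good"]
matrix = confusion_matrix(actual, predicted)
```

The diagonal cells hold the correctly classified instances, so the accuracy can be read directly from the matrix.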
6. RESULTS AND DISCUSSION
An open source machine learning tool is used to evaluate the performance of the LAD Tree and REP
Tree Classifiers. The performance is tested using the training set as well as different cross
validation methods. The class is predicted by considering all 20 attributes of the dataset.
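The cross validation test modes can be pictured with the following sketch. It assumes plain contiguous folds without shuffling or stratification, which the tool actually used may well apply; `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_instances))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 1000 instances split into 10 folds: each instance is tested exactly once.
folds = list(k_fold_indices(1000, 10))
```

Each instance appears in exactly one test fold, so the per-fold predictions can be pooled into a single confusion matrix over all 1000 instances.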
6.1. Performance of LAD Tree Classifier
The overall assessment summary of the LAD Tree Classifier using the training set and different cross
validation methods is given in Table I. The performance of the LAD Tree Classifier in terms of
Correctly Classified Instances and Classification Accuracy is shown in Fig. 1 and Fig. 2. The
confusion matrices for the different test modes are given in Table II to Table VI. The LAD Tree
Classifier gives 76.1% accuracy on the training data set. Various cross validation methods are used
to check its actual performance. On average, it gives around 70.7% accuracy for credit risk
estimation.
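The average quoted above can be reproduced from the cross validation counts reported for the LAD Tree Classifier (702, 708, 715 and 704 correct out of 1000). The sketch below simply averages them; the function name is hypothetical.

```python
def mean_cv_accuracy(correct_counts, n_instances):
    """Average accuracy (in percent) over several cross validation runs."""
    return sum(correct_counts) / (len(correct_counts) * n_instances) * 100.0

# Correctly classified instances for 5, 10, 15 and 20 fold cross validation.
lad_cv_correct = [702, 708, 715, 704]
lad_mean_accuracy = mean_cv_accuracy(lad_cv_correct, 1000)
print(round(lad_mean_accuracy, 1))
```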
TABLE I
LAD TREE CLASSIFIER OVERALL EVALUATION SUMMARY
Test Mode      Correctly    Incorrectly   Accuracy   Mean       Root Mean   Time Taken
               Classified   Classified               Absolute   Squared     to Build
               Instances    Instances                Error      Error       Model (Sec)
Training Set   761          239           76.1%      0.3236     0.3953      2.64
5 Fold CV      702          298           70.2%      0.3547     0.437       2.07
10 Fold CV     708          292           70.8%      0.3494     0.4326      1.92
15 Fold CV     715          285           71.5%      0.3545     0.4351      2.67
20 Fold CV     704          296           70.4%      0.3559     0.437       1.43
TABLE II
CONFUSION MATRIX – LAD TREE CLASSIFIER (ON TRAINING DATASET)
                    Good   Bad   Actual (Total)
Good                655    45    700
Bad                 194    106   300
Predicted (Total)   849    151   1000
TABLE III
CONFUSION MATRIX – LAD TREE CLASSIFIER (5 FOLD CROSS VALIDATION)
                    Good   Bad   Actual (Total)
Good                585    115   700
Bad                 183    117   300
Predicted (Total)   768    232   1000
TABLE IV
CONFUSION MATRIX – LAD TREE CLASSIFIER (10 FOLD CROSS VALIDATION)
                    Good   Bad   Actual (Total)
Good                597    103   700
Bad                 189    111   300
Predicted (Total)   786    214   1000
TABLE V
CONFUSION MATRIX – LAD TREE CLASSIFIER (15 FOLD CROSS VALIDATION)
                    Good   Bad   Actual (Total)
Good                614    86    700
Bad                 199    101   300
Predicted (Total)   813    187   1000
TABLE VI
CONFUSION MATRIX – LAD TREE CLASSIFIER (20 FOLD CROSS VALIDATION)
                    Good   Bad   Actual (Total)
Good                594    106   700
Bad                 190    110   300
Predicted (Total)   784    216   1000
Fig. 1 Correctly Classified instances of LAD Tree Classifier
Fig. 2 Classification Accuracy of LAD Tree Classifier
6.2. Performance of REP Tree Classifier
The overall assessment summary of the REP Tree Classifier using the training set and different cross
validation methods is given in Table VII. The performance of the REP Tree Classifier in terms of
Correctly Classified Instances and Classification Accuracy is shown in Fig. 3 and Fig. 4. The
confusion matrices for the different test modes are given in Table VIII to Table XII. The REP Tree
Classifier gives 80% accuracy on the training data set. Various cross validation methods are used to
check its actual performance. On average, it gives around 71.9% accuracy for credit risk
estimation.
TABLE VII
REP TREE CLASSIFIER COMPLETE EVALUATION SUMMARY
Test Mode      Correctly    Incorrectly   Accuracy   Mean       Root Mean   Time Taken
               Classified   Classified               Absolute   Squared     to Build
               Instances    Instances                Error      Error       Model (Sec)
Training Set   800          200           80%        0.2905     0.3811      0.32
5 Fold CV      717          283           71.7%      0.3458     0.4437      0.78
10 Fold CV     718          282           71.8%      0.3417     0.4424      1.33
15 Fold CV     726          274           72.6%      0.3422     0.4382      0.16
20 Fold CV     719          281           71.9%      0.3368     0.4364      0.11
TABLE VIII
CONFUSION MATRIX – REP TREE CLASSIFIER (ON TRAINING DATASET)
                    Good   Bad   Actual (Total)
Good                649    51    700
Bad                 149    151   300
Predicted (Total)   798    202   1000
TABLE IX
CONFUSION MATRIX – REP TREE CLASSIFIER (5 FOLD CROSS VALIDATION)
                    Good   Bad   Actual (Total)
Good                616    84    700
Bad                 199    101   300
Predicted (Total)   815    185   1000
TABLE X
CONFUSION MATRIX – REP TREE CLASSIFIER (10 FOLD CROSS VALIDATION)
                    Good   Bad   Actual (Total)
Good                601    99    700
Bad                 183    117   300
Predicted (Total)   784    216   1000
TABLE XI
CONFUSION MATRIX – REP TREE CLASSIFIER (15 FOLD CROSS VALIDATION)
                    Good   Bad   Actual (Total)
Good                612    88    700
Bad                 186    114   300
Predicted (Total)   798    202   1000
TABLE XII
CONFUSION MATRIX – REP TREE CLASSIFIER (20 FOLD CROSS VALIDATION)
                    Good   Bad   Actual (Total)
Good                605    95    700
Bad                 151    149   300
Predicted (Total)   756    244   1000
Fig. 3 Correctly Classified instances of REP Tree Classifier
Fig. 4 Classification Accuracy of REP Tree Classifier
6.3. Comparison of LAD Tree Classifier and REP Tree Classifier
The performance of the LAD Tree Classifier and the REP Tree Classifier is compared in Fig. 5 and
Fig. 6 in terms of Correctly Classified Instances and Classification Accuracy. The overall ranking
is based on correctly classified instances, classification accuracy, MAE and RMSE values, and other
statistics obtained from the training set results and the cross validation techniques. On this
basis, it is observed that the REP Tree Classifier performs better than the LAD Tree Classifier.
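The comparison can also be tabulated per test mode. The sketch below copies the correctly classified counts, MAE and RMSE values from the two evaluation summary tables of this paper and reports which classifier is ahead for a chosen measure. The helper name `wins` and the data layout are assumptions made for illustration.

```python
modes = ["Training Set", "5 Fold CV", "10 Fold CV", "15 Fold CV", "20 Fold CV"]

# Values copied from the LAD Tree and REP Tree evaluation summary tables.
lad = {"correct": [761, 702, 708, 715, 704],
       "mae": [0.3236, 0.3547, 0.3494, 0.3545, 0.3559],
       "rmse": [0.3953, 0.437, 0.4326, 0.4351, 0.437]}
rep = {"correct": [800, 717, 718, 726, 719],
       "mae": [0.2905, 0.3458, 0.3417, 0.3422, 0.3368],
       "rmse": [0.3811, 0.4437, 0.4424, 0.4382, 0.4364]}

def wins(metric, higher_is_better):
    """Return (test mode, better classifier) pairs for one measure."""
    result = []
    for mode, l, r in zip(modes, lad[metric], rep[metric]):
        if l == r:
            winner = "tie"
        elif (r > l) == higher_is_better:
            winner = "REP Tree"
        else:
            winner = "LAD Tree"
        result.append((mode, winner))
    return result

correct_winners = wins("correct", higher_is_better=True)
```

Calling `wins("correct", True)` lists REP Tree for every test mode; the error measures can be checked in the same way with `higher_is_better=False`.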
Fig. 5 Correctly Classified Instances Comparison between LAD Tree Classifier and REP Tree Classifier
Fig. 6 Classification Accuracy Comparison between LAD Tree Classifier and REP Tree Classifier
7. CONCLUSION
This work investigated the efficiency of two different classifiers, namely the LAD Tree Classifier
and the REP Tree Classifier, for credit risk prediction. Testing was carried out using an open
source machine learning tool. The effectiveness of the two classifiers was also compared using
different performance measures. It is observed that the REP Tree Classifier performs better than
the LAD Tree Classifier for credit risk prediction across various measures, including
classification accuracy and the time taken to build the model.
ACKNOWLEDGMENT
The author expresses her deep gratitude to the Management of IBS Hyderabad, IFHE University
and Operations & IT Department of IBS Hyderabad for constant support and motivation.
REFERENCES
[1] Germano C. Vasconcelos, Paulo J. L. Adeodato and Domingos S. M. P. Monteiro, “A Neural
Network Based Solution for the Credit Risk Assessment Problem,” Proceedings of the IV Brazilian
Conference on Neural Networks - IV Congresso Brasileiro de Redes Neurais, pp. 269-274, July 20-22,
1999.
[2] Tian-Shyug Lee, Chih-Chou Chiu, Chi-Jie Lu and I-Fei Chen, “Credit scoring using the hybrid neural
discriminant technique,” Expert Systems with Applications (Elsevier) 23, pp. 245–254, 2002.
[3] Zan Huang, Hsinchun Chen, Chia-Jung Hsu, Wun-Hwa Chen and Soushan Wu, “Credit rating
analysis with support vector machines and neural networks: a market comparative study,” Decision
Support Systems (Elsevier) 37, pp. 543–558, 2004.
[4] Kin Keung Lai, Lean Yu, Shouyang Wang, and Ligang Zhou, “Credit Risk Analysis Using a
Reliability-Based Neural Network Ensemble Model,” S. Kollias et al. (Eds.): ICANN 2006, Part II,
Springer LNCS 4132, pp. 682 – 690, 2006.
[5] Eliana Angelini, Giacomo di Tollo, and Andrea Roli “A Neural Network Approach for Credit Risk
Evaluation,” Kluwer Academic Publishers, pp. 1 – 22, 2006.
[6] S. Kotsiantis, “Credit risk analysis using a hybrid data mining model,” Int. J. Intelligent Systems
Technologies and Applications, Vol. 2, No. 4, pp. 345 – 356, 2007.
[7] Hamadi Matoussi and Aida Krichene, “Credit risk assessment using Multilayer Neural Network
Models - Case of a Tunisian bank,” 2007.
[8] Lean Yu, Shouyang Wang, Kin Keung Lai, “Credit risk assessment with a multistage neural network
ensemble learning approach”, Expert Systems with Applications (Elsevier) 34, pp.1434–1444, 2008.
[9] Arnar Ingi Einarsson, “Credit Risk Modeling”, Ph.D. Thesis, Technical University of Denmark, 2008.
[10] Sanaz Pourdarab, Ahmad Nadali and Hamid Eslami Nosratabadi, “A Hybrid Method for Credit Risk
Assessment of Bank Customers,” International Journal of Trade, Economics and Finance, Vol. 2, No.
2, April 2011.
[11] Vincenzo Pacelli and Michele Azzollini, “An Artificial Neural Network Approach for Credit Risk
Management”, Journal of Intelligent Learning Systems and Applications, 3, pp. 103-112, 2011.
[12] A.R.Ghatge and P.P.Halkarnikar, “Ensemble Neural Network Strategy for Predicting Credit Default
Evaluation” International Journal of Engineering and Innovative Technology (IJEIT) Volume 2, Issue
7, January 2013 pp. 223 – 225.
[13] Lakshmi Devasena, C., “Adeptness Evaluation of Memory Based Classifiers for Credit Risk
Analysis,” Proc. of International Conference on Intelligent Computing Applications - ICICA 2014,
978-1-4799-3966-4/14 (IEEE Xplore), 6-7 March 2014, pp. 143-147, 2014.
[14] Lakshmi Devasena, C., “Adeptness Comparison between Instance Based and K Star Classifiers for
Credit Risk Scrutiny,” International Journal of Innovative Research in Computer and Communication
Engineering, Vol.2, Special Issue 1, March 2014.
[15] Lakshmi Devasena, C., “Effectiveness Assessment between Sequential Minimal Optimization and
Logistic Classifiers for Credit Risk Prediction,” International Journal of Application or Innovation in
Engineering & Management, Volume 3, Issue 4, April 2014.
[16] Lakshmi Devasena, C., “Efficiency Comparison of Multilayer Perceptron and SMO Classifier for
Credit Risk Prediction,” International Journal of Advanced Research in Computer and
Communication Engineering, Vol. 3, Issue 4, 2014.
[17] Lakshmi Devasena, C. “Competency Assessment between JRip and Partial Decision Tree Classifiers
for Credit Risk Estimation”, International Journal of Advanced Research in Computer Science and
Software Engineering, Vol. 4 (5), May – 2014, pp. 164-173.
[18] UCI Machine Learning Data Repository – http://archive.ics.uci.edu/ml/datasets.
[19] Geoffrey Holmes, Bernhard Pfahringer, Richard Kirkby, Eibe Frank and Mark Hall. “Multiclass
Alternating Decision Trees”, ECML, pp. 161-172, 2001.
[20] Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques – 2nd ed. the
United States of America, Morgan Kaufmann series in data management systems.
[21] Quinlan J (1987) Simplifying decision trees, International Journal of Man Machine Studies, 27(3), pp.
221–234.
[22] S.K. Jayanthi and S. Sasikala, “REPTree Classifier for Identifying Link Spam in Web Search
Engines”, IJSC, Volume 3, Issue 2, pp. 498 – 505, Jan 2013.
[23] Trilok Chand Sharma and Manoj Jain, “WEKA Approach for Comparative Study of Classification
Algorithm”, International Journal of Advanced Research in Computer and Communication
Engineering, Vol. 2, Issue 4, April 2013, pp. 1925 – 1931.
[24] Adesesan B. Adeyemo and Oluwafemi Oriola, “Personnel Audit Using a Forensic Mining Technique”,
IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 6, November 2010, pp. 222 –
231.
[25] G. V. Nadiammai and M. Hemalatha, “Performance Analysis of Tree Based Classification
Algorithms for Intrusion Detection System”, Mining Intelligence and Knowledge Exploration,
Lecture Notes in Computer Science Volume 8284, pp 82-89, 2013.
[26] Ashish Kokare, Pradeep Venkatesan, Swapnil Tandel, and Hemant Palivela, “Survey On Classification
Based Techniques on Non-Spatial Data”, International Journal of Innovative Research in Science,
Engineering and Technology, Volume 3, Special Issue 1, February 2014, pp. 409 – 413.
[27] K. Wisaeng, “A Comparison of Different Classification Techniques for Bank Direct Marketing”,
International Journal of Soft Computing and Engineering (IJSCE), Volume-3, Issue-4, September
2013, pp. 116 – 119.
[28] M. C. Hansen, R. S. Defries, J. R. G. Townshend and R. Sohlberg, “Global land cover classification
at 1 km spatial resolution using a classification tree approach”, Int. j. remote sensing, vol. 21, no. 6 &
7, pp. 1331–1364, 2000.
[29] P. K. Srimani and Manjula Sanjay Koti, “Medical Diagnosis Using Ensemble Classifiers - A Novel
Machine-Learning Approach”, Journal of Advanced Computing, volume 1, pp. 9-27,
doi:10.7726/jac.2013.1002, 2013.
[30] Olaiya Folorunsho, “Comparative Study of Different Data Mining Techniques Performance in
Knowledge Discovery from Medical Database”, International Journal of Advanced Research in
Computer Science and Software Engineering, Volume 3, Issue 3, March 2013, pp. 11 – 15.
[31] Yongheng Zhao and Yanxia Zhang, “Comparison of decision tree methods for finding active
objects”, Advances in Space Research, Aug 2007.
[32] Gregor Stiglic, Simon Kocbek and Peter Kokol, “Comprehensibility of Classifiers for Future
Microarray Analysis Datasets”, 2000.
[33] David John Gagne, Amy McGovern and Jerry Brotzge, “Classification of Convective Areas using
Decision Trees”, American Meteorological Society, July 2009, pp. 1341 – 1351.
[34] AbuBakr Awad, Mahasen Mabrouk, and Tahany Awad, “Performance Evaluation of Decision Tree
Classifiers for the Prediction of Response to treatment of Hepatitis C Patients”, Pervasive Health,
May 2014, pp. 186 – 190.
[35] S.K. Jayanthi and S. Sasikala, “REPTree Classifier for Identifying Link Spam in Web Search
Engines”, ICTACT JOURNAL ON SOFT COMPUTING, January 2013, Volume 3, Issue 2, pp. 498
– 505.
[36] Jayshri R. Patel, “Performance Evaluation of Decision Tree Classifiers for Ranked Features of
Intrusion Detection”, Journal of Information, Knowledge and Research in Information Technology ,
Nov 12 to Oct 13, Volume 2, Issue 2, pp. 152 – 155, 2013.