This summarizes a document describing an analogy-based defect prediction model using case-based reasoning. The model retrieves similar past cases from a case base of software modules to predict the number of defects in new modules based on their metrics. It pre-processes data from public NASA repositories, selects relevant metrics as features, and evaluates the model on a test set, achieving predictive performance comparable to other methods. The goal is to identify high-risk modules early to improve software quality.
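The retrieval step described above can be illustrated with a minimal sketch. It assumes min-max normalization of the metrics, Euclidean distance as the similarity measure, and the mean defect count of the k nearest cases as the prediction. None of these choices, nor the function names and toy metric values, are confirmed by the source; they are hypothetical stand-ins for illustration only.

```python
import math


def make_scaler(metric_rows):
    """Build a min-max scaler from the case base (assumed normalization)."""
    columns = list(zip(*metric_rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(row):
        # Map each metric to [0, 1]; constant columns collapse to 0.0.
        return [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]

    return scale


def predict_defects(case_base, new_metrics, k=3):
    """Predict defects as the mean over the k most similar past cases.

    case_base: list of (metrics_tuple, defect_count) pairs.
    k and the Euclidean distance are illustrative assumptions.
    """
    scale = make_scaler([metrics for metrics, _ in case_base])
    target = scale(new_metrics)
    ranked = sorted(
        case_base,
        key=lambda case: math.dist(scale(case[0]), target),
    )
    nearest = ranked[:k]
    return sum(defects for _, defects in nearest) / len(nearest)


# Hypothetical toy case base: (lines of code, cyclomatic complexity) -> defects
CASE_BASE = [
    ((120, 4), 0),
    ((150, 5), 1),
    ((900, 30), 7),
    ((1100, 35), 9),
    ((400, 12), 2),
]
```

With this toy data, `predict_defects(CASE_BASE, (1000, 32), k=2)` averages the two large, complex modules and returns 8.0. A real case base would hold many more metrics per module, which is where feature selection becomes relevant.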
13. Model Scalability: CBR systems are scalable to very large case libraries, and fast retrieval of cases continues to be of practical advantage.