COMBINING STATISTICAL AND MACHINE
LEARNING BASED CLASSIFIERS
IN THE PREDICTION OF CORPORATE FAILURE
S. Dizdarevic, P. Larrañaga, B. Sierra, J.A. Lozano, J.M. Peña
Department of Computer Science and Artificial Intelligence
University of the Basque Country
Abstract
This project presents the application of methods coming from Statistics as well as from the area of Artificial Intelligence known as Machine Learning to the problem of corporate failure prediction. The paradigms compared empirically on a sample of 120 Spanish companies, 60 of which had gone bankrupt and 60 of which had not, are Discriminant Analysis, Logistic Regression, Classification Trees, Rule Induction and Bayesian Networks. Two combination techniques from Artificial Intelligence - the Voting by Majority Principle and the Bayesian Formalism - are implemented in order to improve on the predictions of the single models being compared. The predictor variables, which summarize the accounting information available for every company over the three years prior to the date of the survey, are financial ratios.
1. Introduction
Corporate failure prediction, through classification of known cases and generalization to other cases, has been a subject of study for almost 30 years. Accurate prediction of corporate failure is important to investors, creditors and auditors. It can also help shareholders, creditors and governments avoid the heavy losses stemming from unexpected bankruptcies. Using analytic tools and data from corporate financial reports, one can therefore evaluate a company and predict its future financial status.
Although the idea of a company going bankrupt is associated with its disappearance, before this actually happens the company goes through a long period of crisis with several stages. Many authors group them into two main stages, corresponding to the two senses of the concept of failure: economic and financial. Economic failure starts when the profitability of the invested capital falls below its opportunity cost, so that the owner receives an investment yield lower than that of alternative opportunities with the same risk. As the economic failure advances and settles into the company, income starts to fall below expenses and the first negative results appear.
The deterioration produced during the economic failure process, if not corrected, will drive the company into technical insolvency. This is the first stage of what is called financial failure. In this situation the company no longer has enough liquidity to meet its payments as they grow.
The breaking point of this ruinous process is reached when the company is not only unable to pay its debts as they fall due but is also in a situation of negative net worth. This means that its liabilities are greater than the value of its assets, which may soon lead the company to its disappearance.
The study of corporate failure must always be approached through the search for its causes, which can be analyzed through their visible symptoms. As Argenti (1976) proposes, it is very useful to know the reasons why other companies have failed in order to avoid them in one's own. Nevertheless, these causes can only be captured through the discovery of their symptoms. Some of these causes are the following: management failure, deficiencies in the accounting information systems, inability to adapt to changes in the environment, overtrading, the launch of big projects, abuse of debt financing, and the risks inherent to the business world. As for the symptoms, Argenti accepts as such the deterioration suffered by the financial ratios as the company comes closer to failure, indicating also that accounting manipulation is one clear symptom of failure.
According to Platt (1985), the different sources that an external economic agent can use to detect the aforementioned symptoms can be grouped into three sets: common sense, the analysis of the statements of accounts published by the companies, and statistical tools.
The use of common sense, although a very simple strategy, has the following advantages: it does not need sophisticated computing and there is no need to assimilate financial concepts that are sometimes not easily understandable. All one has to do is pay attention to the daily reality of the company and its environment. Some signs of approaching a situation of crisis are: a change of auditor, developments in the relationship with the securities market, sudden resignations of board members, credit lines being reduced or canceled, sales of stock by the directors of the board, the market pricing its shares below their book value, excess inventory, and so on.
The analysis of the statements of accounts is part of an information process whose aim is to provide data for decision making. The idea of failure, and more precisely the idea of insolvency, has remained connected to the technique of accounting ratios. The assumption is that the ratios worsen as the company approaches the crisis, so that the deterioration suffered by the company can be measured through them.
Due to the complexity and volume of the information contained in financial statement data, the analysis of financial ratios, which summarize all this information, has been the most widely used technique. The great interest in comparisons between different companies (by industrial sector, size, ...) has also encouraged its use. There are two main difficulties related to financial ratios: their construction and their interpretation. A further difficulty is that the same ratio value for two companies from different sectors may represent different situations. The financial information gathered in ratios has to be homogenized so that it can be used for the description and prediction of corporate failure. The second task is directly related to the use of Statistics.
Although ignored by analysts for half a century, nowadays statistical techniques have become a helpful and commonly used tool because they give objectivity to the analysis. Beaver (1966) was among the pioneers who used them for the analysis of financial ratios in order to predict corporate failure. In his work, starting from 30 ratio variables taken from 79 pairs of companies, failed and non-failed, 6 ratio variables were selected. A profile analysis was based upon them, comparing the means of every ratio in each group, failed and non-failed, and observing important differences up to five years before the date of failure. Beaver then developed a dichotomous classification test for every ratio, using a trial-and-error process that allowed him to choose, for every ratio and every year, the cut-point that minimized the classification errors.
Nevertheless, Beaver's univariate model contrasts with the inherently multivariate character of the documents describing the financial situation. In order to make these documents valuable, they have to be interpreted from a perspective that allows the several financial aspects of a company to be considered as a whole. The search for this perspective is the reason why several researchers have used multivariate statistical techniques for corporate failure prediction.
Altman (1968) was the pioneer in the application of Discriminant Analysis to the aforementioned problem, obtaining surprising results. The linear combination of five ratios in a profile created a score capable of discriminating between "healthy" and "failed" companies with very high success percentages in the two years prior to failure. Altman's initial work was adapted, updated and improved by several researchers; worth noting are the works of Deakin (1972), Blum (1974), Edmister (1972), Libby (1975), Scott (1981), and Taffler (1982).
The need for a statistical alternative that avoids the problems related to Discriminant Analysis led to the use of conditional probability models, logit and probit, which are more flexible in their requirements. Ohlson (1980) is considered the first author to publish a model for the prediction of failure based on conditional probability models. Though his results were not brilliant, his methodology was followed by other authors: Mensah (1983), Zavgren (1985), Casey and Bartczak (1985), and Peel and Peel (1987).
This chapter is organized as follows. Section 2 presents the features of the case study used to carry out the empirical comparison among several paradigms coming from Statistics and Artificial Intelligence, together with the combining techniques. These paradigms are explained in Section 3. Section 4 shows the results obtained by every method in terms of the percentage of well-classified companies, as well as descriptions of the models and an analysis of the results. Section 5 finishes with the conclusions of the work and proposes further research.
2. Problem Description
Starting from the hypothesis that the accounting information patterns of non-failed and failed companies are different, the fundamental aim of this chapter is to show by means of an example how to create models able to predict the failure of companies in advance (1 year, 2 years and 3 years). These models could be considered normative systems, as they are founded on probability theory.
Following recent research in Artificial Intelligence, two techniques have been implemented and used to integrate the individual models into one, in order to improve on the predictive ability of each of them.
In this section the problem is presented, dealing with aspects of it such as the concept of failure, the collection and validation of the sample, the selection of the financial ratios with which the models are constructed, and the sample used for the combined models. A more detailed description of the failure problem can be found in Lizarraga (1996), which can be considered the work that inspired this project.
The data sample used here is the same that Lizarraga gathered from several Provincial Trade Register Offices and used for the empirical comparison in his doctoral dissertation. The following is the procedure by which the data sample of 120 companies was selected and formed.
The first methodological problem to solve was the need to determine the concept of failure to be used. Lizarraga finally chose the concept of suspension of payments, given that it is related not to a specific financial problem but to a situation of profound economic crisis. This concept presents three fundamental advantages: its objectivity, the fact that it gives a representative date for the moment of failure, and the large increase in the number of companies that had to resort to it during the period of study. Finally, the availability of the annual accounts deposited in the several Provincial Trade Register Offices was another aspect that helped Lizarraga to carry out the empirical work of information gathering.
The sample was made up of 120 companies, half of them belonging to the group of companies classified as "failed" and the other half classified as "healthy". The selection was carried out by a matching process: using a list of previously selected "failed" companies, each was matched with a "healthy" company of the same size and industrial sector. This matching process is justified by the convenience of avoiding any possible distortion effect related to size and industrial sector. As access to every Provincial Trade Register Office was not possible, Lizarraga decided to reduce the scope to the 10 provinces with the largest number of records of payment suspension requested during the period of the study. These selected provinces gathered 63% of the total number of records of payment suspension. The information was gathered through the Official Bulletin of the Trade Register Office. The interval of time was 18 months (from January 1993 to July 1994), and it can be regarded as representative of a period of severe crisis among companies in Spain. For every company in the study, the economic and financial data corresponding to the three years previous to the end of the study were obtained.
Though Lizarraga (1996) validated the model he created using a sample of 44 companies (22 "healthy" and 22 "failed") different from the ones used to construct the models, in this study the models are created using the whole sample of 120. A validation of the original model, based on 5-fold cross-validation (Stone 1976), has been carried out, as well as another validation methodology which will be explained later. Estimates of the goodness of fit of every model, that is, the probability of the model classifying correctly, are calculated following these steps: first, the sample is randomly ordered and then partitioned into 5 disjoint subsets. Secondly, choosing 4 of those subsets as the training set, a model is obtained and tested with the fifth subset. These steps are repeated five times, each time using a different test set, obtaining in this way 5 percentages p1, p2, ..., p5 of well-classified cases, as well as the predicted "destiny" (failed or non-failed) of each one of the 120 companies distributed between the five disjoint test sets. The estimate of p, the probability with which the model created with the original sample classifies correctly,
is given by the following formula:
p = (p1 + p2 + ... + p5) / 5.
Figure 1 shows graphically the process of estimation of the goodness of fit of the model with
5-fold cross-validation.
Figure 1. Process of estimation of the goodness of fit of the model with 5-fold cross-validation
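A minimal sketch of this estimation procedure in Python, assuming the 120 companies are available as a ratio matrix X and a class vector y (scikit-learn is our choice of tool here, not one used in the original study):

# Sketch of the 5-fold cross-validation estimate described above; X and y
# are assumed to hold the 120 companies' ratios and their classes (0/1).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

def five_fold_estimate(X, y, make_model=LogisticRegression):
    kf = KFold(n_splits=5, shuffle=True, random_state=0)   # random order, 5 disjoint subsets
    accuracies = []
    for train_idx, test_idx in kf.split(X):
        model = make_model().fit(X[train_idx], y[train_idx])       # train on 4 subsets
        accuracies.append(model.score(X[test_idx], y[test_idx]))   # p_i on the fifth subset
    return np.mean(accuracies)                              # p = (p_1 + ... + p_5) / 5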
The other approach to model validation presented in this chapter is the following. Using 4 of the 5 disjoint subsets (obtained with 5-fold cross-validation) as the training set, a model is obtained and later tested on the whole sample of 120 companies, instead of the 24 used in the previous approach. The same process of estimating p - the probability with which the model created with the original sample classifies correctly - explained before is used. Figure 2 shows graphically the process of estimation of the goodness of fit of the model with this validation.
Figure 2. Process of estimation of the goodness of fit of the model with the second validation
Lizarraga selected the explanatory variables from the certified Trial Balance, from the profit and loss account and from the financial chart. The choice of these financial ratios was based on two main criteria:
i) they were frequently mentioned in the literature on the analysis of financial statements,
ii) the possibility of calculating them in a direct way or with simple adjustments.
Starting from the 50 variables that Lizarraga had obtained in this previous selection, a Principal Components Factor Analysis was carried out, which allowed them to be reduced to the 9 ratios that were finally used.
These ratios were:
X1. CURRENT ASSETS / CURRENT LIABILITIES
X2. CURRENT ASSETS / TOTAL ASSETS
X3. NET RESULT / TOTAL ASSETS
X4. EARNINGS BEFORE INTEREST AND TAXES / FINANCIAL CHARGES
X5. OWN FUNDS / TOTAL DEBT
X6. SALES / OWN FUNDS
X7. STOCKS / SALES
X8. DEBTORS / SALES
X9. OPERATIVE CASH FLOW / TOTAL ASSETS
X1 - CURRENT ASSETS / CURRENT LIABILITIES measures the firm's ability to meet maturing short-term obligations. It is used to measure liquidity - whether the firm has sufficient cash to meet short-term claims.
X2 - CURRENT ASSETS / TOTAL ASSETS ratio represents the position of current
assets (money, bank, quick assets, stocks…) in total assets (current assets + fixed assets).
X3 - NET RESULT / TOTAL ASSETS is a profitability ratio which explains the extent to which a company earns the highest return possible for the resources used or the capital employed, consistent with its desire and ability to assume risk.
X4 - EARNINGS BEFORE INTEREST AND TAXES / FINANCIAL CHARGES is calculated by dividing the earnings before deducting interest and taxes by the financial charges of the firm. In essence, it measures the extent to which earnings cover the firm's financial charges. Earnings before interest and taxes is calculated as: earnings before taxes + financial charges.
X5 - OWN FUNDS / TOTAL DEBT is usually used to measure indebtedness. It is calculated by dividing own funds by total debt. Own funds are those financing funds that consist of the capital deposited by the owners at the firm's creation, reserves, amortization, subsidies, and so on.
X6 - SALES / OWN FUNDS is calculated by dividing sales by own funds. Sales represents the net revenue of the firm's business.
X7 - STOCKS / SALES is the ratio used to observe the rotation of stocks. It is calculated by dividing stocks (merchandise, raw materials, finished and semi-finished products, ...) by sales. It represents the weight of stocks relative to sales.
X8 - DEBTORS / SALES is a ratio used to observe the rotation of current assets. It is calculated by dividing debtors by sales.
X9 - OPERATIVE CASH FLOW / TOTAL ASSETS is calculated by dividing operative cash flow by total assets. Operative cash flow is calculated with the following formula: OCF = traditional cash flow (net result + amortization + long-term reserves) - increase in operating current assets + decrease in operating current liabilities.
The several classificatory models constructed - described in the following section - use the
aforementioned ratios.
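As a purely illustrative sketch, the nine ratios could be derived from raw statement items along the following lines (the function name and every field name below are hypothetical placeholders, not Lizarraga's variables):

# Illustrative computation of the nine ratios from raw statement items.
def financial_ratios(s):
    ebit = s["earnings_before_taxes"] + s["financial_charges"]            # EBIT as defined for X4
    ocf = (s["net_result"] + s["amortization"] + s["long_term_reserves"]  # traditional cash flow
           - s["increase_operating_current_assets"]
           + s["decrease_operating_current_liabilities"])                 # operative cash flow (X9)
    return {
        "X1": s["current_assets"] / s["current_liabilities"],
        "X2": s["current_assets"] / s["total_assets"],
        "X3": s["net_result"] / s["total_assets"],
        "X4": ebit / s["financial_charges"],
        "X5": s["own_funds"] / s["total_debt"],
        "X6": s["sales"] / s["own_funds"],
        "X7": s["stocks"] / s["sales"],
        "X8": s["debtors"] / s["sales"],
        "X9": ocf / s["total_assets"],
    }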
The sample used to obtain the combined classifiers is constructed as follows. Regardless of the internal structure of an individual classifier and of the methodology it is based on, a classifier may be regarded as a function that receives an input individual X and outputs a class j, denoted in short by e(X) = j. In the corporate failure problem there are only two classes, non-failed and failed, represented in our case by class 1 and class 2 respectively. As explained before, an individual X is represented by the set of nine ratio-variable values. Even though some classifiers may supply extra information, such as the probability with which class j is assigned to the individual X, only the class j was considered as output information. The decision to use this approach was based on its generality: the individual classifiers can be very different from each other in their methodologies, and this kind of output is common to all of them. A good summary of existing techniques for combining classification results can be found in Xu et al. (1992) and Ho et al. (1994).
Having the result (class j) of every classifier, these results are now considered knowledge from different sources and used as input data for the combined classifiers. As explained before, two validation methods are used. Figure 3 shows graphically how the output data of the individual classifiers are integrated when 5-fold cross-validation is used.
                              PARADIGMS
individuals                DA      LR      CN2     CART    Bayes
test set 1 (1 ... 24)
test set 2 (25 ... 48)
test set 3 (49 ... 72)
test set 4 (73 ... 96)
test set 5 (97 ... 120)
Figure 3. Process of construction of the sample for combined classifier with
5-fold cross-validation
Each element of the pair (test set i, paradigm j), where i = 1 ... 5 and j = 1 ... 5, represents the classes that model i, created with training set i using paradigm j, assigns to each one of the 24 individuals of test set i.
Figure 4 shows graphically the same process but when using the second validation
explained before.
                              PARADIGMS
                 DA          LR          CN2         CART        Bayes
individuals      M1 ... M5   M1 ... M5   M1 ... M5   M1 ... M5   M1 ... M5
1
2
:
120
Figure 4. Process of construction of the sample for combined classifier with
the second validation
The pair (individual i, (paradigm j, model k)), where i = 1 ... 120, j = 1 ... 5 and k = 1 ... 5, represents the class that model k (Mk), created with training set k using paradigm j, assigns to individual i.
The methodologies used to integrate the individual classifiers on the basis of the aforementioned matrices are described in the following section.
3. Methodologies
3.1 Methodologies - Individual Classifiers
Supervised classification is one of the tasks most frequently carried out by so-called Intelligent Systems. Thus, a large number of paradigms developed either by Statistics (Logistic Regression, Discriminant Analysis, K-NN) or by Artificial Intelligence (Neural Networks, Rule Induction, Classification Trees, Bayesian Networks) are capable of carrying out classification tasks.
The comparison among the previous paradigms cannot be carried out in an analytical way, so it is usual to compare them empirically - Michie et al. (1994).
Five of the previous paradigms have been selected to make this comparison -
Discriminant Analysis, Logistic Regression, Classification Trees, Rule Induction and Bayesian
Networks.
3.1.1 DISCRIMINANT ANALYSIS
Discriminant Analysis - introduced by Fisher (1936) - is a statistical technique used in Pattern Recognition. Its main use is to predict group membership from a set of predictors. Discriminant function analysis consists of finding a transformation which gives the maximum ratio of the difference between the pair of group multivariate means to the multivariate variance within the two groups; that is, an attempt is made to delineate the groups by maximizing between-group variance while minimizing within-group variance. The predictor characteristics are related to form groups based upon similarities of their distribution in the p-dimensional space, which are then compared to groups which are input by the user as ground truth. This enables the user to test the validity of groups based upon actual data, to test groups which have been created, or to assign objects to groups. Two things can then be done with Discriminant Analysis (DA): analysis and classification. Analysis is related to how the material is to be interpreted, i.e., are there differences between the groups and which variables make these differences? Classification is used to find a set of variables - which provide satisfactory discrimination - so that classification equations can be derived, their use checked through hit-rate tables, and, if good, they can be used to classify new subjects who were not in the original analysis. DA creates a linear discriminant function from the predictor variables x1, x2, ..., xp, and the results of this function are later used as a classification criterion. The discriminant function takes this form:
Z = a1x1 + a2x2 + … +apxp
where xj are the explanatory variables (j = 1, 2, ..., p), aj are real coefficients estimated by the model (j = 1, 2, ..., p), and Z is the discriminant score.
The discriminant function is used to classify new individuals by means of the following rule:
"If a1·x1(i) + a2·x2(i) + ... + ap·xp(i) > c then individual i is classified as 0, otherwise it is classified as 1."
The DISCRIMINANT procedure of the SPSS software has been used in this project. The method selected for model construction was MAHAL, and variables were selected using stepwise selection, with the selection rule: maximize the minimum Mahalanobis distance between groups. For each model some of the p variables were selected and their corresponding coefficients were calculated.
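A minimal sketch of this paradigm in Python, with scikit-learn's LinearDiscriminantAnalysis standing in for the SPSS stepwise Mahalanobis procedure actually used, and X_train, y_train, X_test assumed to be available:

# Sketch of Discriminant Analysis classification; not a reproduction of the
# SPSS MAHAL stepwise procedure described above.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
z = lda.decision_function(X_test)        # discriminant score Z for each new individual
predicted = (z > 0).astype(int)          # cut-point c = 0 separates the two groups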
3.1.2 LOGISTIC REGRESSION
Logistic Regression (Hosmer and Lemeshow (1989)) is a method coming from Statistics whose objective is to obtain a functional relationship between a transformation of a qualitative variable - called the logit - and p predictor variables which can be either quantitative or qualitative.
It is used to develop a model which fits the data as well as possible and is sufficiently reasonable to describe the relationship between the outcome (dependent or response variable) and the set of independent (or explanatory) variables. The fundamental characteristic of this regression is that the dependent variable is dichotomous. Mathematically, the function used in the logistic distribution is extremely flexible and easy to use.
Denoting the dichotomous variable to predict by Y and the p predictor variables by x1, ..., xp, the objective is to determine the coefficients β0, β1, ..., βp in the logit transformation:
g(X) = ln[ P(Y = 1) / P(Y = 0) ] = β0 + β1x1 + ... + βpxp
The coefficients (β0 is the intercept and β1, ..., βp are the p parameters) are determined by the method of maximum likelihood. These coefficients are interpreted in terms of "odds ratios", and the variable selection can be carried out by three methods: forward, backward or stepwise. The stepwise method is the most commonly used one; in it the variables are included or excluded according to a statistical criterion.
After obtaining g(X), a classification model can be constructed. The logistic regression
model is described by the following formula:
π(x) = e^g(X) / (1 + e^g(X)).
It is used to classify new individuals by means of the following rule:
"If π(x) > c then individual i is classified as 0, otherwise it is classified as 1".
For this work the PROC LOGISTIC procedure of the SAS software has been used for the
empirical comparison.
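A minimal sketch of the same classifier in Python (scikit-learn here stands in for PROC LOGISTIC, no stepwise variable selection is performed, and X_train, y_train, X_test are assumed to exist):

# Sketch of logistic regression classification with the rule pi(x) > 0.5.
import numpy as np
from sklearn.linear_model import LogisticRegression

logit = LogisticRegression().fit(X_train, y_train)
g = logit.intercept_ + X_test @ logit.coef_.T    # g(X) = beta0 + beta1*x1 + ... + betap*xp
pi = 1.0 / (1.0 + np.exp(-g))                    # pi(x) = e^g(X) / (1 + e^g(X))
predicted = (pi > 0.5).ravel().astype(int)       # cut-point c = 0.5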
3.1.3 RULE INDUCTION
In the task of constructing Expert Systems, systems that induce concept descriptions from examples have proved useful in easing the knowledge-acquisition bottleneck. One of these induction systems, CN2 (The Turing Institute (1988)), has been used as a representative of the approach called Machine Learning. CN2 was designed for the efficient induction of simple, comprehensible production rules in domains where problems of poor description language and/or noise may be present. CN2 produces an ordered list of if-then rules rather than an unordered set of if-then rules, although the latter can also be produced by changing the evaluation function. The rules induced by CN2 are of the form:
"if complex then predict class"
where complex is a conjunction of attribute tests.
In order to classify new examples, CN2 applies an interpretation in which each rule is tried in order until one is found whose conditions are satisfied by the example to be classified. The class predicted by this rule is then assigned to that example. If no induced rule is satisfied, a final default rule assigns the most commonly occurring class in the training data to the new example.
The learning algorithm of CN2 works in an iterative fashion, searching in each iteration for a complex that covers a large number of examples of a single class C and few examples of the other classes. Having found a good complex, the examples it covers are removed from the training set and the rule "if complex then predict C" is added to the end of the rule list. This process iterates until no more satisfactory complexes can be found.
In this project the CN2 Induction Algorithm developed by The Turing Institute has been used, with the option to produce an ordered list of if-then rules.
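The ordered-list interpretation described above can be mimicked in a few lines of Python; the two rules and the default class below are illustrative placeholders, not the induced models reported in Section 4:

# Mimics CN2's ordered rule-list interpretation: the first rule whose
# conditions an example satisfies gives the class; a final default rule
# predicts the most common training class.
rules = [
    (lambda x: x["x4"] < 0.70 and x["x8"] < 0.11, "NON-FAILED"),
    (lambda x: x["x5"] < 0.36 and x["x8"] > 0.03, "FAILED"),
]
default_class = "NON-FAILED"            # most frequent class in the training data

def classify(example):
    for condition, label in rules:
        if condition(example):
            return label
    return default_class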
3.1.4 CLASSIFICATION TREES
The Classification and Regression Trees (CART) software was developed by Breiman et al. (1984). CART is used for classification or regression analysis of large, complex data sets containing many variables, and constructs binary prediction and classification trees by recursive partitioning (or targeted stratification). CART can construct a binary prediction tree with a structure that is easy to understand, interpret and use. Tree construction starts with a computationally intensive algorithm that recursively searches over all the variables to produce a sequence of optimal binary splits, so that an extremely large tree is grown. A pruning algorithm is then applied which cuts off all branches of the tree that impair overall accuracy. The result is a parsimonious decision tree: the simplest tree that gives the maximum accuracy. The tree can be inspected and used to modify the staging algorithm, and the procedure is repeated until the best set is found. Cross-validation can be performed a number of times with each analysis.
The classification tree is a popular representation of classifiers. The interior nodes of a classification tree are tests applied to instances during classification, and the branches from an interior node correspond to the possible test outcomes. Classification begins with the application of the root node test, whose outcome determines the branch to a succeeding node. The process is applied recursively until a leaf node is reached; the instance is then labeled with the class of the leaf node and the process halts. The trees are constructed beginning with the root of the tree and proceeding down to its leaves.
The family name (TDIDT) emphasizes that its members carry out the Top-Down Induction of Decision Trees. The patriarch of this family is Hunt's Concept Learning System (CLS) framework (Hunt et al. (1966)). ID3 (Quinlan (1979) and Quinlan (1986)), one of the most famous TDIDT systems, was developed from CLS; it uses a selection measure based on the computation of an information gain for each variable, and the variable that maximizes this gain is selected. A notable disadvantage of this measure is that it is biased towards selecting variables with many different values. ACLS (Patterson et al. (1983)) and ASSISTANT (Cestnik et al. (1987)) also acknowledge ID3 (Quinlan (1979)) as their direct ancestor; ASSISTANT has been used in several medical domains with promising results. In our example the statistical software SPAD.S has been used, in which the approach proposed by Breiman et al. (1984) is implemented. The procedure used for the construction of the classification trees was DISAR. This procedure constructs complete binary decision trees for the discrimination of 2 groups, categorized by the modalities of one nominal variable.
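A minimal sketch in the same spirit, with scikit-learn's DecisionTreeClassifier standing in for the DISAR procedure (a large tree is grown and then cost-complexity pruning is applied); X_train, y_train and X_test are assumed to exist:

# Sketch of a CART-style binary classification tree; scikit-learn replaces
# the SPAD.S DISAR procedure actually used in the study.
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01)  # grow, then prune
tree.fit(X_train, y_train)
predicted = tree.predict(X_test)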
3.1.5 BAYESIAN NETWORKS
While the first attempts at building Expert Systems regarded probability as the underlying formalism, the large number of parameters to estimate - 2^n - 1 for the case of n dichotomous variables - forced researchers to reject this exhaustive approach and instead adopt probabilistic models based on the hypothesis of independence among variables. These models had the advantage of simplicity, but they were incapable of giving good solutions to problems with a high degree of interdependence among variables.
As a consequence of these extreme positions, probability was not taken into account in the building of Expert Systems until the late 80s, when Lauritzen and Spiegelhalter (1988) developed an algorithm for the propagation of evidence. This algorithm allowed probabilistic reasoning within graphical models representing the conditional independences among the variables of the system.
Excellent introductions to Bayesian Network paradigm can be found in Castillo et al.
(1997) and Jensen (1996).
One possible classification of the structural learning methods can be carried out by
grouping them into two main groups.
(i) Methods based on hypothesis tests which check the assumption of conditional
independence among variables and create a Bayesian Network structure which
represents the obtained independence.
(ii) Methods based on score and search, in which a measure of the goodness of fit (likelihood, entropy, percentage of well-classified cases) of a particular structure is defined, together with a search procedure over the space of all possible Bayesian Network structures. This search procedure usually works like a greedy algorithm.
In our case we used Genetic Algorithms as an intelligent metaheuristic in the search process. Genetic Algorithms (Goldberg (1989)) use a direct analogy with natural evolution. They work with a population of individuals, each representing a feasible solution to a given problem. To each individual we assign a value or score according to the goodness of the solution it represents.
The better the adaptation of the individual to the problem, the more probable it is that the individual will be selected for reproduction, crossing its genetic material with another individual selected in the same way. This crossover produces new individuals - offspring of the previous ones - which share some of the features of their parents. In this way a new population of feasible solutions is produced, replacing the previous one and having the interesting property of containing a greater proportion of good features than the previous population. Thus, through the generations, good features are propagated through the population. By favoring the crossover of the fittest individuals, the most promising areas of the search space are explored.
Figure 5 summarizes the pseudo-code of the so-called Abstract Genetic Algorithm. In it, parent selection does not need to be made by assigning to each individual a value proportional to its objective function, as is usual in the so-called Simple Genetic Algorithm; this selection can be carried out by any function that selects parents in a natural way. It is worth noticing that the descendants are not necessarily the next generation of individuals; rather, this generation is made up of the union of parents and descendants. That is why the operations of extension and reduction of the population are required in the cycle.
begin AGA
Make initial population at random
WHILE NOT stop DO
BEGIN
Select parents from the population
Produce children from the selected parents
Mutate the individuals
Extend the population by adding the children to it
Reduce the extended population
END
Output the best individual found
end AGA
Figure 5. The pseudo-code of the Abstract Genetic Algorithm
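A direct Python transcription of this pseudo-code is given below; the problem-specific operators are passed in as functions, and their names are placeholders rather than part of the original text:

# Skeleton of the Abstract Genetic Algorithm; every operator is a
# hypothetical parameter supplied by the caller.
def abstract_ga(init_population, fitness, select_parents, crossover, mutate,
                reduce_population, stop):
    population = init_population()                           # initial population at random
    while not stop(population):
        parents = select_parents(population)                 # any "natural" selection scheme
        children = [mutate(child) for child in crossover(parents)]
        population = reduce_population(population + children)   # extend, then reduce
    return max(population, key=fitness)                      # best individual found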
The individuals of the Genetic Algorithm will be Bayesian Network structures. A
Bayesian Network structure, for a fixed domain with n variables, can be represented by a n x n
connectivity matrix C, where its elements, cij , can be defined as:
cij = 1 if j is a parent of i, and 0 otherwise.
In this genetic approach, an individual of the population is represented by the string:
c11 c21 ... cn1 c12 c22 ... cn2 ... c1n c2n ... cnn
As shown in Larrañaga et al. (1996a), when a total order among the variables is assumed, the usual genetic operators are closed with respect to the DAG conditions. In the more general case in which no order among the variables is assumed, Larrañaga et al. (1996b), the usual genetic operators are not closed and, to ensure closedness, a repair operator is introduced. The objective of this repair operator is to transform offspring structures that do not verify the DAG conditions into DAGs, by randomly eliminating the edges that invalidate the DAG conditions.
This approach, as can be seen in the previous reference, has yielded good results in the reconstruction of the ALARM network. Although in the corporate failure problem the cardinality of the search space is not too large, the previous approach has been used. See Larrañaga et al. (1996c) for another approach to the problem of structural learning of Bayesian Networks, in which an individual is a cyclic permutation of the n considered variables and the Genetic Algorithm uses crossover and mutation operators developed for the Traveling Salesman Problem.
Genetic Algorithms have also been used as an optimization tool in other combinatorial problems that arise in the Bayesian Network context. Thus, for example, in Larrañaga et al. (1997) good decompositions of the moral graph associated with the propagation algorithm proposed by Lauritzen and Spiegelhalter are obtained. Finally, Larrañaga et al. (1996d) deal with the problem of the fusion of the Bayesian Networks proposed by different authors, seeking the consensus network.
Taking into account that in a Bayesian Network any variable is only influenced by its Markov Blanket - that is, its parents, its children and the parents of its children - it seems intuitive to search in the set of structures that constitute Markov Blankets of the special variable (Sierra and Larrañaga (1997)).
In this case, the individuals of the Genetic Algorithm are Markov Blankets of the variable to be classified. An operator has been introduced that guarantees that the offspring obtained remain Markov Blankets of the variable to be classified. On the other hand, the score used to search for the best Markov Blanket is the percentage of well-classified individuals obtained by applying the evidence propagation facilities of the HUGIN software, Andersen et al. (1989).
3.2 Methodologies - Combining Classifiers
Recently, in the area of Artificial Intelligence, the concept of combining classifiers has been proposed as a new direction for improving the performance of individual classifiers. These classifiers can be based on a variety of classification methodologies and can achieve different rates of correctly classified individuals. The goal of classification result integration algorithms is to generate more certain, precise and accurate system results. As explained in Xu et al. (1992), there are two problems to be solved in the process of combining classifiers. First of all there is the problem of how many classifiers to choose and of what kind they should be. The second problem is to determine an adequate technique for combining the results of the chosen classifiers so that a better result can be obtained.
In this work five individual classifiers were chosen, coming from two different areas, Statistics and Machine Learning. To tackle the second problem a large variety of combining techniques was reviewed, and finally two of the several approaches proposed in Xu et al. (1992) were chosen. These techniques are the Voting by Majority Principle and the Bayesian Formalism. The decision to use these combining approaches was influenced by their generality: they allow the integration of different classifiers based on different methodologies which, at the very least, can supply output information at the abstract level - the class j - which is the only requirement these techniques have.
The following are the two techniques used for the combination of the individual classifiers. The combination is based on output information at the abstract level. The notation used to explain their characteristics is the following:
"Given K individual classifiers, ek, k = 1, ..., K, each of which assigns to the input X a label jk, i.e. produces an event ek(X) = jk, the problem is to use these events to build an integrated classifier E which gives X one definitive label j, i.e. E(X) = j, with j ∈ Λ ∪ {M+1}, where Λ = {1, 2, ..., M} represents the set of specified patterns called classes and {M+1} denotes that E has no idea which class X comes from, or in other words, that X is rejected by E." (e.g. M = 2 for the corporate failure problem, and K = 5 in this work - DA, LR, CN2, CART and BAYES).
3.2.1 VOTING BY MAJORITY PRINCIPLE
As indicated before, the problem is to produce a new event E(X) = j from the given events ek(X) = jk, k = 1, ..., K, where the equality e1(X) = e2(X) = ... = eK(X) may not necessarily hold. That is, conflicts may exist among the decisions of the K classifiers. A simple and common rule for solving this kind of conflict in human social life is the Voting by Majority Principle: if the majority of the K classifiers vote for the same label, the combined classifier takes this label as the final result, where majority means that more than half of the classifiers ek vote for that label.
The most general majority voting rule is the following:
E(X) = j, if T_E(X = j) = max over i ∈ Λ of T_E(X = i) > K/2; otherwise E(X) = M+1,
where
T_E(X = i) = Σ (k = 1, ..., K) T_k(X = i),  i = 1, ..., M,
and
T_k(X = i) = 1 if e_k(X) = i and i ∈ Λ, and 0 otherwise;
that is, the combined classifier E decides that X comes from class j if the majority of the K classifiers decide that X comes from class j, and otherwise it rejects X. It is easy to see that classifiers that reject X have no impact on the combined classifier E, unless the majority of them reject X, or unless K is even and half of them vote for one class and half for another. In this project there are five individual classifiers and none of them has the option of deciding that the class of individual X is unknown, so there will never be a situation in which the integrated classifier E does not know which class to assign to X; in other words, no X will be rejected by E.
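A minimal sketch of this voting rule in Python for K abstract-level outputs (the reject label is a placeholder of our choosing):

# Majority voting over the K individual class labels: a label wins only if
# more than half of the classifiers vote for it; otherwise X is rejected.
from collections import Counter

def majority_vote(labels, reject_label="M+1"):
    (label, votes), = Counter(labels).most_common(1)
    return label if votes > len(labels) / 2 else reject_label

# e.g. majority_vote([1, 1, 2, 1, 2]) -> 1 ;  majority_vote([1, 1, 2, 2, 3]) -> "M+1"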
3.2.2 THE BAYESIAN FORMALISM
The previous voting method combines the results of the individual classifiers based only on the label output by each classifier (i.e., the event e_k(X) = j_k). Each e_k(X) = j_k is treated equally, as one vote, without considering the error of each e_k itself. The present method takes these errors into consideration, representing each classifier's error by a confusion matrix that is given by:
PT_k = | n_11^(k)  n_12^(k)  ...  n_1M^(k) |
       | n_21^(k)  n_22^(k)  ...  n_2M^(k) |
       |    :         :      ...     :     |
       | n_M1^(k)  n_M2^(k)  ...  n_MM^(k) |        for k = 1, 2, ..., K        (1)
where each row i corresponds to class i and each column j corresponds to the event e_k(X) = j. Thus, an element n_ij^(k) denotes that n_ij^(k) samples of class i have been assigned the label j by e_k. It follows from (1) that the size of the sample is:
N^(k) = Σ (i = 1, ..., M) Σ (j = 1, ..., M) n_ij^(k)        (2)
in which the number of cases in each class i is:
n_i•^(k) = Σ (j = 1, ..., M) n_ij^(k),   i = 1, ..., M        (3)
and the number of cases that are assigned class j by e_k is:
n_•j^(k) = Σ (i = 1, ..., M) n_ij^(k),   j = 1, ..., M.        (4)
For an event e_k(X) = j of an error-bearing classifier e_k, its truth (i.e., that X comes from class j) is uncertain. With the knowledge of its confusion matrix PT_k, this uncertainty can be described by the conditional probabilities that the propositions X ∈ i, i = 1, ..., M, are true given the occurrence of the event e_k(X) = j, that is:
P(X ∈ i | e_k(X) = j) = n_ij^(k) / n_•j^(k) = n_ij^(k) / Σ (i = 1, ..., M) n_ij^(k).        (5)
From another point of view, the confusion matrix PT_k can be regarded as the prior knowledge of an expert. The expert can express his belief in the event e_k(X) = j by a real number bel(.), called a belief value; the higher the bel(.) he gives to a proposition, the more likely it is to be true. With the knowledge of PT_k, bel(.) for the proposition X ∈ i is given in the form of the conditional probability (5), that is:
bel(X ∈ i | e_k(X) = j) = P(X ∈ i | e_k(X) = j) = n_ij^(k) / n_•j^(k).        (6)
With K classifiers e_1, e_2, ..., e_K there will be K matrices PT_1, ..., PT_K. When these classifiers are applied to the same input X, K events e_k(X) = j_k, k = 1, ..., K, take place. Each e_k(X) = j_k and its corresponding PT_k can supply a set of values bel(X ∈ i | e_k(X) = j_k), i = 1, ..., M.
Now, the question is how to integrate these individual supports to give the combined value bel(i). Xu et al. (1992) deduce that bel(i) can be calculated as follows:
bel(i) = η · Π (k = 1, ..., K) P(X ∈ i | e_k(X) = j_k)        (7)
with η a constant ensuring that Σ (i = 1, ..., M) bel(i) = 1 (since X ∈ i, i = 1, ..., M, are mutually exclusive and exhaustive). That is:
1/η = Σ (i = 1, ..., M) Π (k = 1, ..., K) P(X ∈ i | e_k(X) = j_k).        (8)
Finally, depending on these bel(i) values, the integrated classifier E classifies X into a class according to the decision rule:
E(X) = j, if bel(j) = max over i ∈ Λ of bel(i); otherwise E(X) = M+1,
that is, E decides that X comes from class j if the belief value bel(j) is greater than any other belief value bel(i), i = 1, ..., M; otherwise it rejects X, or in other words, E has no idea which class X comes from.
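A minimal sketch of the combination rule (7)-(8), assuming the K confusion matrices are stored as a NumPy array of counts and the class labels are coded 0, ..., M-1 (the reject value is a placeholder of our choosing):

# Bayesian combination of classifier outputs from their confusion matrices.
# conf has shape (K, M, M): conf[k, i, j] = samples of class i labelled j by e_k.
import numpy as np

def bayesian_combination(conf, votes, reject=-1):
    bel = np.ones(conf.shape[1])
    for PT_k, j_k in zip(conf, votes):
        bel *= PT_k[:, j_k] / PT_k[:, j_k].sum()     # P(X in i | e_k(X) = j_k), eq. (5)
    if bel.sum() == 0.0:
        return reject                                 # no class supported by all classifiers
    bel /= bel.sum()                                  # normalization constant eta, eq. (8)
    return int(np.argmax(bel))                        # class with the highest bel(i)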
4. Results
In this section the results of applying the methods described in the previous section to the corporate failure problem are explained in full detail. The models for each one of the five paradigms used in this project, and for each period of time before failure (1 year, 2 years and 3 years), are shown here. The results are reported as the percentage of well-classified companies for each paradigm and period.
Discriminant Analysis (DA)
Experiments with this paradigm were carried out using the SPSS software. The following are the classifiers, obtained with the sample of 120 companies, for each one of the 3 years prior to failure, using DA.
Three years before the failure
S(F) = -2.0081765 + X4*2.2476778 + X5*2.529652
Scores for the centers of groups are: - class 1 = -0.33467
- class 2 = 0.33467
X4 = Earnings before interest and taxes / Financial charges
X5 = Own funds / Total debt
S(F): Score of Failure
where:
"if S(F) > 0 then the company is classified as failed,
otherwise as non-failed"
Two years before the failure
S(F) = -1.4674983 + X2*2.3656861 + X3*-8.1494602 + X5*3.1317845
Scores for the centers of groups are: - class 1 = -0.63454
- class 2 = 0.63454
X2 = Current assets / Total assets
X3 = Net result / Total assets
X5 = Own funds / Total debt
X7 = Stocks / Sales
S(F): Score of Failure
where:
"if S(F) > 0 then the company is classified as failed,
otherwise as non-failed"
One year before the failure
S(F) = -0.5673571 + X3*9.364902 + X5*0.5510232
Scores for the centers of groups are: - class 1 = -0.75698
- class 2 = 0.75698
X3 = Net result / Total assets
X5 = Own funds / Total debt
S(F): Score of Failure
where:
"if S(F) > 0 then the company is classified as failed,
otherwise as non-failed"
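For instance, applying the one-year discriminant function above to a hypothetical company with X3 = 0.05 and X5 = 0.40 (values invented purely for illustration):

# Worked application of the one-year discriminant function; the ratio
# values are hypothetical, the coefficients and rule come from the text.
def s_one_year(x3, x5):
    return -0.5673571 + 9.364902 * x3 + 0.5510232 * x5

score = s_one_year(x3=0.05, x5=0.40)                # approximately 0.121
label = "failed" if score > 0 else "non-failed"     # rule: S(F) > 0 -> failed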
Logistic Regression (LR)
Experiments with this paradigm were carried out using the SAS software. The following are the classifiers, obtained with the sample of 120 companies, for each one of the 3 years prior to failure, using LR. The probability function
P(F) = e^g(X) / (1 + e^g(X))
is the same for each one of the three models, where P(F) is the failure probability and
"if P(F) > 0.5 then the company is classified as failed,
otherwise as non-failed"
Three years before the failure
g(X) = -1.3287 + X4*1.4877 + X5*1.6896
X4 = Earnings before interest and taxes / Financial charges
X5= Own funds / Total debt
Two years before the failure
g(X) = -0.4999 + X3*-13.5892 + X5*3.8058
X3= Net Result / Total Assets
X5= Own funds / Total debt
One year before the failure
g(X) = -1.2854 + X3*26.1304 + X5*1.3535
X3= Net Result / Total Assets
X5= Own funds / Total debt
Rule Induction (CN2)
Experiments with this paradigm were carried out using the CN2 algorithm. The following are the classifiers obtained with the sample of 120 companies as a training set, for each one of the 3 years prior to failure, using the CN2 algorithm. The models are represented as ordered lists of if-then rules, as can be seen in Figures 10, 11 and 12.
Three years before the failure
if ((x4 < 0.70) && (x8 < 0.11)) then X=NON-FAILED
else if (( x2 > 0.52 ) && (x3 > 0.01) && ( x5 > 0.68 ) && (x9 > 0.04)) then X=FAILED
else if (( x2 > 0.32 ) && ( x4 > 0.76 ) && ( x9 < 0.15 )) then X=FAILED
else if (( x1 > 0.06 ) && ( x2 < 0.51 ) && ( x8 > 0.17 )) then X=NON-FAILED
else if (( x1 > 0.16 ) && ( x5 < 0.56 ) && ( x7 < 0.09 ) && (x8 > 0.19)) then X=FAILED
else if (( x2 > 0.88 ) && ( x6 > 0.22 )) then X=NON-FAILED
else if (( x2 < 0.87 ) && ( x6 < 0.19 ) && ( x7 > 0.14 )) then X=NON-FAILED
else if (( x4 < 0.39 ) && ( x6 < 0.26 )) then X=FAILED
else if (( x1 < 0.98 ) && ( x6 < 0.41 )) then X=NON-FAILED
else if (( x2 > 0.74 ) && ( x3 > 0.02)) then X=FAILED
else if ((x4 < 0.61) && (x9 > 0.19)) then X=NON-FAILED
else if ((x3 < 0.25) && ( x9 > 0.08 )) then X=FAILED
else if (( x1 < 0.90 ) && ( x4 > 0.08) && ( x7 > 0.05)) then X=NON-FAILED
else X=FAILED
Figure 10. Rule Induction model for three years before the failure
Two years before the failure
if ((x4 > 0.17) && (x5 > 0.50) && (x8 > 0.19)) then X=FAILED
else if (( x3 > 0.09 ) && ( x8 > 0.20 )) then X=NON-FAILED
else if (( x1 > 0.06 ) && ( x7 > 0.30 ) && ( x7 < 0.44 )) then X=NON-FAILED
else if (( x1 > 0.02 ) && ( x6 > 0.75 ) && ( x9 < 0.19 )) then X=FAILED
else if (( x1 > 0.25 ) && ( x4 > 0.06 ) && ( x9 < 0.05 )) then X=NON-FAILED
else if (( x1 > 0.11 ) && ( x2 > 0.74 ) && ( x9 < 0.24 )) then X=FAILED
else if (( x4 > 0.08 ) && ( x7 > 0.22 ) && ( x7 < 0.34 )) then X=NON-FAILED
else if (( x1 < 0.72 ) && ( x2 > 0.48 ) && ( x2 < 0.71 ) && ( x3 < 0.10 )) then X=FAILED
else if (( x3 < 0.10 ) && ( x6 > 0.10 )) then X=NON-FAILED
else if ( x2 > 0.69 ) then X=FAILED
else X=NON-FAILED
Figure 11. Rule Induction model for two years before the failure
One year before the failure
if (x4 < 0.80) then X=NON-FAILED
else if (( x1 > 1.41 ) && ( x4 > 0.87 ) && ( x7 < 0.36)) then X=FAILED
else if (( x4 > 0.95 ) && ( x5 < 0.22 )) then X=NON-FAILED
else if (( x5 < 0.36 ) && ( x8 > 0.03 )) then X=FAILED
else if (( x4 < 1.90 ) && ( x6 > 4.69 )) then X=NON-FAILED
else if (( x1 > 0.81 ) && ( x1 < 1.38 ) && ( x7 < 0.23 )) then X=FAILED
else if (( x8 > 0.17 ) && ( x7 < 0.67 )) then X=NON-FAILED
else X=FAILED
Figure 12. Rule Induction model for one year before the failure
Since each rule assigns a class deterministically, the predicted class is given with probability 100%, whether the company is classified as failed or as non-failed.
For the Classification Trees and Bayesian Networks paradigms, the variables have been categorized into three categories. The following is an explanation of how this process was carried out for each one of the 3 years. For each group of 60 healthy companies and 60 failed companies, and for each one of the nine ratio variables, the means were calculated in order to replace the missing values that the original sample contained. Later, for each ratio variable, its distribution was analyzed using the FREQUENCIES and DESCRIPTIVES procedures of the SPSS software. Finally, the values of each ratio variable were separated into three categories - 1, 2 and 3 - according to their cumulative percentages. The next figure shows the cut-points used in this categorization.
*******************"3 years before the failure"************************
x1 (lowest through 0.30=1) (0.31 through 0.66=2) (0.67 through highest=3)
x2 (lowest through 0.63=1) (0.64 through 0.77=2) (0.78 through highest=3)
x3 (lowest through 0.02=1) (0.03 through 0.06=2) (0.07 through highest=3)
x4 (lowest through 0.21=1) (0.22 through 0.61=2) (0.62 through highest=3)
x5 (lowest through 0.24=1) (0.25 through 0.51=2) (0.52 through highest=3)
x6 (lowest through 0.32=1) (0.33 through 0.68=2) (0.69 through highest=3)
x7 (lowest through 0.11=1) (0.12 through 0.22=2) (0.23 through highest=3)
x8 (lowest through 0.21=1) (0.22 through 0.32=2) (0.33 through highest=3)
x9 (lowest through 0.07=1) (0.08 through 0.16=2) (0.17 through highest=3)
******************"2 years before the failure"************************
x1 (lowest through 0.26=1) (0.27 through 0.62=2) (0.63 through highest=3)
x2 (lowest through 0.61=1) (0.62 through 0.74=2) (0.75 through highest=3)
x3 (lowest through 0.02=1) (0.03 through 0.06=2) (0.07 through highest=3)
x4 (lowest through 0.24=1) (0.25 through 0.49=2) (0.50 through highest=3)
Page - 21 -
x5 (lowest through 0.20=1) (0.21 through 0.43=2) (0.44 through highest=3)
x6 (lowest through 0.31=1) (0.32 through 0.59=2) (0.60 through highest=3)
x7 (lowest through 0.13=1) (0.14 through 0.24=2) (0.25 through highest=3)
x8 (lowest through 0.24=1) (0.25 through 0.36=2) (0.37 through highest=3)
x9 (lowest through 0.05=1) (0.06 through 0.12=2) (0.13 through highest=3)
*******************"1 year before the failure"***********************
x1 (lowest through 1.05=1) (1.06 through 1.46=2) (1.47 through highest=3)
x2 (lowest through 0.63=1) (0.64 through 0.77=2) (0.78 through highest=3)
x3 (lowest through -0.01=1) (0.0 through 0.04=2) (0.05 through highest=3)
x4 (lowest through 0.89=1) (0.90 through 1.89=2) (1.90 through highest=3)
x5 (lowest through 0.32=1) (0.33 through 0.85=2) (0.86 through highest=3)
x6 (lowest through 2.89=1) (2.90 through 5.89=2) (5.90 through highest=3)
x7 (lowest through 0.11=1) (0.12 through 0.22=2) (0.23 through highest=3)
x8 (lowest through 0.21=1) (0.22 through 0.32=2) (0.33 through highest=3)
x9 (lowest through -0.05=1) (-0.04 through 0.05=2) (0.06 through highest=3)
Figure 6. The process of categorization of ratio variables
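A minimal sketch of this categorization, assuming the ratios for one year are held in a pandas DataFrame; pandas is our choice of tool here, and splitting at the tertiles only roughly reproduces the cut-points of Figure 6 (the study used SPSS procedures and per-group means for the missing values):

# Three-way categorization of the ratio variables by cumulative percentage.
import pandas as pd

def categorize(ratios: pd.DataFrame) -> pd.DataFrame:
    filled = ratios.fillna(ratios.mean())          # missing values replaced by the mean
    return filled.apply(lambda col: pd.qcut(col, q=3, labels=[1, 2, 3]))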
Classification Trees (CART)
To carry out the experiments with this paradigm the SPAD.N and SPAD.S software have been used. The following are the classifiers obtained with the sample of 120 companies as a training set, for each one of the 3 years prior to failure, using the DISAR procedure. The models representing these classifiers are the binary classification trees shown in Figures 7, 8 and 9.
Three years before the failure
[Binary classification tree: the root node, with 120 companies (60 H, 60 F), splits first on X7; subsequent splits on X5, X4, X1, X2, X6, X8, X9 and X3 lead down to the final (leaf) nodes, each labeled with its size and its counts of H and F companies.]
Figure 7. Classification Tree model for three years before the failure
Two years before the failure
[Binary classification tree: the root node, with 120 companies (60 H, 60 F), splits first on X5; subsequent splits on X7, X3, X4, X9, X6, X2 and X1 lead down to the final (leaf) nodes, each labeled with its size and its counts of H and F companies.]
Figure 8. Classification Tree model for two years before the failure
One year before the failure
[Binary classification tree: the root node, with 120 companies (60 H, 60 F), splits first on X4; subsequent splits on X7, X3, X2, X1, X9 and X8 lead down to the final (leaf) nodes, each labeled with its size and its counts of H and F companies.]
Figure 9. Classification Tree model for one year before the failure
Where: F represents the class failed (Failed), and H represents the class non-failed (Healthy). FINAL NODE is a leaf node where the process ends; an individual reaching such a node is assigned the class failed or non-failed according to the letter F or H. The probability associated with the classification is calculated as H/(H+F) when the corporate reaching the leaf node is classified as non-failed, or as F/(H+F) when it is classified as failed.
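As a minimal sketch (not the original software), the class and the probability assigned at a leaf can be computed from the counts of healthy and failed training companies that reached that leaf; names are illustrative only:

# Sketch: class and associated probability at a leaf of the classification tree,
# computed from the counts H (healthy) and F (failed) of companies at that leaf.
def leaf_decision(h_count, f_count):
    total = h_count + f_count
    if h_count >= f_count:
        return "non-failed", h_count / total   # H / (H + F)
    return "failed", f_count / total           # F / (H + F)

# Example: a leaf with H = 8 and F = 4 gives ("non-failed", 0.67).
print(leaf_decision(8, 4))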
Bayesian Networks
Experiments with Bayesian Networks were carried out using the HUGIN software, a Genetic Algorithm and the Markov Blanket approach. The models obtained using the sample of 120 companies as training set, for each of the 3 years prior to the failure, are shown in Figures 13, 14 and 15.
Initially P(X) = 0.50, which means that the corporate is equally likely to fail as not to fail. After giving real values to the nine variables, its destiny is predicted in the following way (a short illustrative sketch follows the list below):
- for three years before the failure:
“if P(X) > 0.30 then the corporate is classified as non-failed, otherwise as failed”
- for two years before the failure:
“if P(X) > 0.40 then the corporate is classified as non-failed, otherwise as failed”
- for one year before the failure:
“if P(X) > 0.45 then the corporate is classified as non-failed, otherwise as failed”
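A minimal sketch of how these year-dependent thresholds could be applied to the posterior probability produced by the network once the nine ratio variables have been instantiated; the function name and the way the posterior is supplied are illustrative assumptions, not the HUGIN interface:

# Sketch: year-dependent decision rule applied to the posterior P(X = non-failed).
THRESHOLDS = {3: 0.30, 2: 0.40, 1: 0.45}   # years before failure -> threshold

def classify_from_posterior(p_non_failed, years_before_failure):
    threshold = THRESHOLDS[years_before_failure]
    return "non-failed" if p_non_failed > threshold else "failed"

# Example: one year before the failure, a posterior of 0.47 gives "non-failed".
print(classify_from_posterior(0.47, 1))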
Three years before the failure
[Network diagram: the class node X (NON-FAILED / FAILED, prior P(X=1) = P(X=2) = 50 %) together with the nine ratio variables X1-X9, as defined in Section 2, and their marginal probability tables.]
Figure 13. Bayesian Networks model for three years before the failure
Two years before the failure
[Network diagram: the class node X and the nine ratio variables with their marginal probability tables.]
Figure 14. Bayesian Networks model for two years before the failure
One year before the failure
[Network diagram: the class node X and the nine ratio variables with their marginal probability tables.]
Figure 15. Bayesian Networks model for one year before the failure
Results obtained with the aforementioned models are summarized in the next two tables, which give the percentage of well-classified companies for each paradigm and period of time (1 year, 2 years and 3 years before the failure).
Table 1 summarizes these results regarding the goodness of fit of each one of the paradigms, calculated by the 5-fold cross-validation method explained in Section 2.
Years before the failure    DA       LR       CART     CN2      Bayesian Networks
1                           78.33    82.50    79.17    80.00    60.83
2                           69.17    69.16    60.00    66.66    62.00
3                           55.00    55.00    45.00    57.50    60.83
Table 1. Results regarding the 5-fold cross-validation
As shown in Table 1, the further back in time one goes, the worse the results are, although for Bayesian Networks they remain roughly constant over time. These results can be compared with those shown in Table 2, which presents the goodness of fit of each paradigm calculated with the second validation method proposed in Section 2.
Years before the failure    DA       LR       CART     CN2      Bayesian Networks
1                           81.00    82.50    89.00    95.17    91.17
2                           72.17    71.51    84.50    91.17    92.00
3                           59.84    60.33    76.67    88.00    92.17
Table 2. Results regarding the second validation
In this case it is worth mentioning the good behavior of paradigms such as CN2 and
Bayesian Networks as well as the Classification Trees.
The goal of any algorithm for the integration of classification results is to generate more certain, precise and accurate results. Two experiments were conducted in order to compare the performance of the aforementioned methods and combining techniques. The following two tables summarize the results obtained by applying the combining methods - Voting by Majority Principle and Bayesian Formalism - to the classification results when 5-fold cross-validation (explained in Section 2) is used.
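As a rough illustration of the Voting by Majority Principle (a sketch under the assumption, stated earlier, that each individual classifier only outputs a class label; with five classifiers and two classes no tie can occur):

# Sketch: majority vote over the class labels produced by the five paradigms
# (DA, LR, CART, CN2, Bayesian Network) for one company. Labels are examples.
from collections import Counter

def majority_vote(labels):
    return Counter(labels).most_common(1)[0][0]

votes = ["failed", "failed", "non-failed", "failed", "non-failed"]
print(majority_vote(votes))   # -> "failed"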
Years before the failure    DA       LR       CART     CN2      Bayesian Networks    Voting by Majority
1                           78.33    82.50    79.17    80.00    60.83                88.33
2                           69.17    69.16    60.00    66.66    62.00                79.17
3                           55.00    55.00    45.00    57.50    60.83                73.33
Table 3. Results regarding the 5-fold cross-validation and
Voting By Majority Principle
As expected, the Voting by Majority Principle has succeeded in improving the classification results, being better than the best individual classifier for each of the 3 years. These results can be compared with the ones obtained by applying the Bayesian Formalism combining method, shown in Table 4.
Years before the failure    DA       LR       CART     CN2      Bayesian Networks    Bayesian Formalism
1                           78.33    82.50    79.17    80.00    60.83                73.33
2                           69.17    69.16    60.00    66.66    62.00                75.83
3                           55.00    55.00    45.00    57.50    60.83                62.50
Table 4. Results regarding the 5-fold cross-validation and
Bayesian Formalism
As can be noticed, the Bayesian Formalism presents better results than the best individual classifier only for 3 and 2 years before the failure; in the case of 1 year it is, surprisingly, only better than the worst classifier.
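For reference, a sketch of a Bayesian combination scheme of the kind described in Xu et al. (1992): each classifier contributes an estimate of the probability of the true class given its own output (typically derived from its confusion matrix on the training data); assuming conditional independence, the per-classifier beliefs are multiplied and the class with the largest product is chosen. The data structures below are illustrative assumptions, not the exact implementation used in the project.

# Sketch: Bayesian Formalism combination over two classes.
def bayesian_combination(outputs, conditional):
    # outputs: class label emitted by each classifier for one company.
    # conditional[k][j][c]: estimate of P(true class = c | classifier k said j).
    classes = ("non-failed", "failed")
    beliefs = {}
    for c in classes:
        belief = 1.0
        for k, j in enumerate(outputs):
            belief *= conditional[k][j][c]
        beliefs[c] = belief
    return max(beliefs, key=beliefs.get)

# Tiny usage example with two hypothetical classifiers:
cond = [
    {"failed": {"failed": 0.8, "non-failed": 0.2},
     "non-failed": {"failed": 0.3, "non-failed": 0.7}},
    {"failed": {"failed": 0.6, "non-failed": 0.4},
     "non-failed": {"failed": 0.1, "non-failed": 0.9}},
]
print(bayesian_combination(["failed", "non-failed"], cond))   # -> "non-failed"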
It is interesting to mention that, when the second validation method is used to produce the classification results for the individual classifiers, these two combining methods reach 100% correct classification for each of the 3 years. This seems too good to be true, but it can be understood as follows: the more classifiers there are (in this case 25 models, 5 for each paradigm), the higher the possibility that the majority of them vote for the real class, and likewise that the Bayesian Formalism reaches 100%.
In the following Section, a comparison of the results is presented, trying to reach some conclusions about the performance of the individual classifiers and their integration.
5. Conclusions and Future Work
Techniques coming from Statistics and Artificial Intelligence have been applied to the problem of corporate failure prediction in different periods of time.
The discrepancy between the results obtained when the training set contains 96 of the 120 companies and the test set contains all 120 of them, and the ones obtained with 5-fold cross-validation, suggests the existence of an overfitting problem in the Bayesian Networks (Markov Blankets) and CN2 paradigms. These problems could possibly be relieved by penalizing complex structures in the learning process of these paradigms - see Akaike's criterion, Minimum Description Length, … - and by trying to guide the search by the parsimony principle.
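As an illustration of the kind of penalized score alluded to here, a standard BIC/MDL-type criterion for a candidate network structure B given data D could take the form below; this generic formulation is an assumption for illustration, not the score actually implemented in the project:

score(B \mid D) = \log P(D \mid B, \hat{\theta}_{B}) - \frac{\dim(B)}{2}\,\log N

where dim(B) is the number of free parameters of B and N is the sample size; Akaike's criterion replaces the penalty term by dim(B).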
It is interesting to see that the Discriminant Analysis and Logistic Regression models select the same ratio-variables as significant ones for all three years, and that the percentages of well-classified individuals are rather similar, the Logistic Regression classifier being a little better.
Observing the Classification Trees, it is easy to notice that the closer the date of failure, the smaller the depth of the tree. This can be understood as follows: the closer the date of failure, the fewer variables need to be examined, and the sooner the conclusions about the “destiny” of the individuals in the testing set are reached.
This phenomenon is also present in the ordered if-then rule lists: the closer the date of failure, the fewer rules need to be checked to determine the “destiny” of the corporate. For three years before the failure there are thirteen rules in the list, ten for two years and just seven for one year.
The Bayesian Networks paradigm provides another aspect of interest. Observing the three networks, it can be seen that the closer the date of failure, the more connected the nodes are - there are more dependencies between the variables - and the greater the thresholds.
Analyzing the classification results when the combining techniques are used, it can be seen that they really achieve their main goal of improving the performance of the individual models. The Voting by Majority Principle is considerably better than the Bayesian Formalism, which for one year before the failure only improves on the performance of the worst individual classifier. Another aspect which deserves further investigation is the development or implementation of procedures that would achieve even better results than the two used in this project. There are several approaches. One of them is to also use other kinds of information output by the individual classifiers, such as the probability that the class assigned to individual X is the right one. Another one is to guide the performance of the integrated classifier with the goodness of each of the individual classifiers.
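A minimal sketch of this last suggestion, weighting each classifier's vote by an estimate of its goodness (for instance its cross-validated percentage of well-classified companies); the weights below simply reuse the 1-year accuracies of Table 1 rescaled to [0, 1] and are purely illustrative:

# Sketch: weighted voting, each paradigm's vote weighted by its estimated accuracy.
def weighted_vote(labels, weights):
    scores = {}
    for label, weight in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)

labels = ["failed", "failed", "non-failed", "failed", "non-failed"]   # DA, LR, CART, CN2, BN
weights = [0.7833, 0.8250, 0.7917, 0.8000, 0.6083]
print(weighted_vote(labels, weights))   # -> "failed"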
ACKNOWLEDGEMENTS
The authors wish to thank Dr. Fermin Lizarraga for providing the original data sample and his doctoral dissertation, which served as guidance for the comprehension of the corporate failure problem and of the use of financial ratios for failure prediction. The authors acknowledge Miren Josune Gallego for helpful guidance in the use of the software packages, especially the SAS software. The authors would also like to thank the Diputación Foral de Guipuzcoa, since this study was supported under grant no. OF 131/1997.
REFERENCES
• Altman, E.I. (1968). Financial Ratios, Discriminant Analysis and the Prediction of Business
Failure. Journal of Finance, 589-609.
• Andersen, S.K., Olesen, K.G., Jensen, F.V. and Jensen, F. (1989). HUGIN - a shell for
building Bayesian belief universes for Expert Systems. Eleventh International Joint
Conference on Artificial Intelligence, vol. I, 1128-1133.
• Argenti, J. (1976). Corporate Collapse: the Causes and Symptoms. McGraw-Hill. London.
• Beaver, W. (1966). Financial Ratios as Predictors of Failure. Empirical Research in
Accounting: Selected Studies. Supplement of Journal of Accounting Research, 71-111.
• Blum, M. (1974). Failing Company Discriminant Analysis. Journal of Accounting Research,
1-23.
• Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and
Regression Trees. Monterey, CA: Wadsworth and Brooks.
• Casey, C.J. and Baztczak, N.J. (1985). Cash-Flow: it’s not the bottom line. Harvard Business
Review, 61-66.
• Castillo, E., Gutierrez, J.M. and Hadi, A.S. (1997). Expert Systems and Probabilistic
Network Models. Springer-Verlag.
• Cestnik, B., Kononenko, I. and Bratko, I. (1987). ASSISTANT 86: A knowledge-elicitation
tool for sophisticated users, in Bratko, I. and Lavrac, N. (Eds.) Progress in Machine
Learning, Sigma Press, Wilmslow.
• Clark, P. and Niblett, T. (1989). The CN2 Induction Algorithm, Machine Learning, 3(4),
261-283.
• Deakin, E.B. (1972). A Discriminant Analysis of Predictors of Business Failure. Journal of
Accounting Research, 167-179.
• Dizdarevic, S., Lizarraga F., Larrañaga P., Sierra B. and Gallego M.J. (1997). Statistical and
Machine Learning Methods in the prediction of Bankruptcy. International Meeting on
Artificial Intelligence in Accounting Finances and Taxes, Huelva, Spain, 85-100.
• Edmister, R.O. (1972). An Empirical Test of Financial Ratio Analysis for Small Business
Failure Prediction. Journal of Financial and Quantitative Analysis. Vol. 7, 1477-1493.
• Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of
Eugenics, 7, 179-188.
• Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley, Reading, MA.
• Ho T.K., Hull J.J. and Srihari S.N. (1994). Decision Combination in Multiple Classifier
Systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, no. 1,
January, 66-75.
• Hosmer, D. W. and Lemeshow, S. (1989). Applied Logistic Regression. Wiley Series in
Probability and Mathematical Statistics.
• Hunt, E.B., Marin, J. and Stone P.J. (1966). Experiments in Induction, Academic Press.
• Jensen, F.V. (1996). Introduction to Bayesian networks. University College of London.
• Larrañaga, P., Murga, R., Poza, M. and Kuijpers, C. (1996a). Structure Learning of
Bayesian Networks by Hybrid Genetic Algorithms. Learning from Data: AI and Statistics
V, Lecture Notes in Statistics 112. D. Fisher, H.-J. Lenz (eds.), New York, NY: Springer-
Verlag, 165-174.
• Larrañaga, P., Poza, M., Yurramendi, Y., Murga, R. and Kuijpers, C. (1996b). Structure
Learning of Bayesian Networks by Genetic Algorithms: A Performance Analysis of
Control Parameters. IEEE Transactions on Pattern Analysis and Machine Intelligence,
18, 912-926.
• Larrañaga, P., Kuijpers, C., Murga, R. and Yurramendi, Y. (1996c). Bayesian Network
Structures by searching for the best ordering with genetic algorithms. IEEE Transactions
on System, Man and Cybernetics. Vol. 26, no. 4, 487-492.
• Larrañaga, P., Kuijpers, C., Murga, R., Yurramendi, Y., Graña, M., Lozano, J.A.,
Albizuri, X., D'Anjou, A. and Torrealdea, F.J. (1996d). Genetic Algorithms applied to
Bayesian Networks. A. Gammerman (ed.). Computational Learning and Probabilistic
Reasoning. John Wiley, 211-234.
• Larrañaga, P., Kuijpers, C., Poza, M. and Murga, R. (1997). Decomposing Bayesian
Networks by Genetic Algorithms. Statistics and Computing. No 7, 19-34.
• Lauritzen, S.L. (1996). Graphical models. Oxford Science Publications.
• Lauritzen, S.L., and Spiegelhalter, D.J. (1988). Local computations with probabilities on
graphical structures and their application on Expert Systems. J.R. Statist. Soc. B, vol. 50,
no. 2, 157-224.
• Libby, R. (1975). Accounting Ratios and the Prediction of Failure: Some Behavioral
Evidence. Journal of Accounting Research, 150-161.
• Lizarraga, F. (1996). Modelos Multivariantes de predicción del fracaso empresarial: una
aplicación a la realidad de la información contable española. Ph.D. Public University of
Navarra.
• Mensah, Y. (1983). The Differential Bankruptcy Predictive Ability of Specific Price Level
Adjustments: Some Empirical Evidence, Accounting Review, 228-246.
• Michie, D., Spiegelhalter, D. J. and Taylor, C. C. (1994). Machine Learning, Neural and
Statistical Classification. Ellis Horwood Series in Artificial Intelligence. New York.
• Ohlson, J.A. (1980). Financial Ratios and the Probabilistic Prediction of Bankruptcy. Journal
of Accounting Research, 18, 1, 109-111.
• Patterson, A. and Niblett, T.(1983). ACLS user manual, Intelligent Terminals Ltd.,
Glasgow.
• Peel, M.J. and Peel, D.A. (1987). Some Further Empirical Evidence on Predicting Private
Company Failure. Accounting and Business Research, 18, 69, 57-66.
• Platt, H.D. (1985). Why Companies Fail: Strategies for Detecting Avoiding and Profiting
from Bankruptcy. Lexington Books. Massachusetts.
• Quinlan, J.R. (1979). Discovering rules by induction from large collection of examples, in
Expert systems in the micro electronic age, Edinburgh University Press.
• Quinlan, J.R. (1986). Induction of Decision Trees, Machine Learning, 1(1), 81-106.
• SAS Institute Inc. (1993). SAS Language: Reference, Version 6, SAS Institute Inc.
• Scott, J. (1981). The probability of Bankruptcy: A Comparison of Empirical Predictions and
Theoretical Models. Journal of Banking and Finance, 317-344.
• Sierra B. and Larrañaga P. (1997). Searching for the optimal Bayesian Network in
classification tasks by Genetic Algorithms, WUPES 97, 144-154.
• SPAD.S (1994). Version 2.5 Sistema Compatible para el Análisis de Datos.
• SPSS Inc. (1990). SPSS Reference Guide, SPSS INC.
• Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal
of the Royal Statistical Society, Series B, 36, 111-147.
• Taffler, R. (1982). Finding those Firms in Danger. Accountancy Age, 16.
• The Turing Institute, The CN2 Induction Algorithm (1988), The Turing Institute, 36 N.
Hanover St., Glasgow, GI 2AD, U.K. October.
• Xu L., Krzyzak A. and Suen C.Y. (1992). Methods of Combining Multiple Classifiers and
Their Applications to Handwriting Recognition. IEEE Transactions on Systems, Man and
Cybernetics, Vol. 22, no. 3, 418-435.
• Zavgren, C. (1985). A Probabilistic Model of Financial Distress. Ph.D. The University of
Nebraska.
Page - 35 -

More Related Content

What's hot

Understanding How Your Fair Issac Credit Scores (FICO) Scores and How They Work
Understanding How Your Fair Issac Credit Scores (FICO) Scores and How They WorkUnderstanding How Your Fair Issac Credit Scores (FICO) Scores and How They Work
Understanding How Your Fair Issac Credit Scores (FICO) Scores and How They WorkAbsolute Home Mortgage Corp.
 
Corporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniquesCorporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniquesShantanu Deshpande
 
Bankruptcy prediction final report by sanapathi kiran kumar
Bankruptcy prediction final report by sanapathi kiran kumarBankruptcy prediction final report by sanapathi kiran kumar
Bankruptcy prediction final report by sanapathi kiran kumarKiranKumarSanapathi1
 
CECL Project Overview
CECL Project OverviewCECL Project Overview
CECL Project OverviewRohit Khurana
 
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING mlaij
 
DOUBLE DUTY - Payments Cards as a Doorway to Greater Financial Health
DOUBLE DUTY - Payments Cards as a Doorway to Greater Financial HealthDOUBLE DUTY - Payments Cards as a Doorway to Greater Financial Health
DOUBLE DUTY - Payments Cards as a Doorway to Greater Financial HealthVivastream
 
Transaction_Scoring - WVK MasterCard
Transaction_Scoring - WVK MasterCardTransaction_Scoring - WVK MasterCard
Transaction_Scoring - WVK MasterCardWestley Koenen
 
The ability of previous quarterly earnings, net interest margin, and average ...
The ability of previous quarterly earnings, net interest margin, and average ...The ability of previous quarterly earnings, net interest margin, and average ...
The ability of previous quarterly earnings, net interest margin, and average ...RyanMHolcomb
 
Pres. Gertjan Kaart Credit Alliance Jan 2011
Pres. Gertjan Kaart Credit Alliance Jan 2011Pres. Gertjan Kaart Credit Alliance Jan 2011
Pres. Gertjan Kaart Credit Alliance Jan 2011gertjankaart
 
Syntelli_BankOfAmerica_FinalPresentation
Syntelli_BankOfAmerica_FinalPresentationSyntelli_BankOfAmerica_FinalPresentation
Syntelli_BankOfAmerica_FinalPresentationBhavya kothari
 
IRJET- Analysis on Credit Card Fraud Detection using Capsule Network
IRJET- Analysis on Credit Card Fraud Detection using Capsule NetworkIRJET- Analysis on Credit Card Fraud Detection using Capsule Network
IRJET- Analysis on Credit Card Fraud Detection using Capsule NetworkIRJET Journal
 
A Statistical/Mathematical Approach to Enhanced Loan Modification Targeting
A Statistical/Mathematical Approach to Enhanced Loan Modification TargetingA Statistical/Mathematical Approach to Enhanced Loan Modification Targeting
A Statistical/Mathematical Approach to Enhanced Loan Modification TargetingCognizant
 
Historical Credit Data | Total Credit Card Spend
Historical Credit Data | Total Credit Card SpendHistorical Credit Data | Total Credit Card Spend
Historical Credit Data | Total Credit Card SpendExperian
 
Credit decision-indices-a-flexible-tool-for-both-credit-consumers-and-providers
Credit decision-indices-a-flexible-tool-for-both-credit-consumers-and-providersCredit decision-indices-a-flexible-tool-for-both-credit-consumers-and-providers
Credit decision-indices-a-flexible-tool-for-both-credit-consumers-and-providersSrikanth Minnam
 
Chapter8 - Beyond Classification
Chapter8 - Beyond ClassificationChapter8 - Beyond Classification
Chapter8 - Beyond ClassificationAnna Olecka
 
Pankaj kamaliya 2003 (2)
Pankaj kamaliya 2003 (2)Pankaj kamaliya 2003 (2)
Pankaj kamaliya 2003 (2)KAMALIYA PANKAJ
 
Evaluation of individual depositors’ satisfaction from the deposit
Evaluation of individual depositors’ satisfaction from the depositEvaluation of individual depositors’ satisfaction from the deposit
Evaluation of individual depositors’ satisfaction from the depositAlexander Decker
 

What's hot (20)

Understanding How Your Fair Issac Credit Scores (FICO) Scores and How They Work
Understanding How Your Fair Issac Credit Scores (FICO) Scores and How They WorkUnderstanding How Your Fair Issac Credit Scores (FICO) Scores and How They Work
Understanding How Your Fair Issac Credit Scores (FICO) Scores and How They Work
 
Corporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniquesCorporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniques
 
Bankruptcy prediction final report by sanapathi kiran kumar
Bankruptcy prediction final report by sanapathi kiran kumarBankruptcy prediction final report by sanapathi kiran kumar
Bankruptcy prediction final report by sanapathi kiran kumar
 
Academic writer 23
Academic writer 23Academic writer 23
Academic writer 23
 
CECL Project Overview
CECL Project OverviewCECL Project Overview
CECL Project Overview
 
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING
 
DOUBLE DUTY - Payments Cards as a Doorway to Greater Financial Health
DOUBLE DUTY - Payments Cards as a Doorway to Greater Financial HealthDOUBLE DUTY - Payments Cards as a Doorway to Greater Financial Health
DOUBLE DUTY - Payments Cards as a Doorway to Greater Financial Health
 
Transaction_Scoring - WVK MasterCard
Transaction_Scoring - WVK MasterCardTransaction_Scoring - WVK MasterCard
Transaction_Scoring - WVK MasterCard
 
The ability of previous quarterly earnings, net interest margin, and average ...
The ability of previous quarterly earnings, net interest margin, and average ...The ability of previous quarterly earnings, net interest margin, and average ...
The ability of previous quarterly earnings, net interest margin, and average ...
 
Pres. Gertjan Kaart Credit Alliance Jan 2011
Pres. Gertjan Kaart Credit Alliance Jan 2011Pres. Gertjan Kaart Credit Alliance Jan 2011
Pres. Gertjan Kaart Credit Alliance Jan 2011
 
B05840510
B05840510B05840510
B05840510
 
Syntelli_BankOfAmerica_FinalPresentation
Syntelli_BankOfAmerica_FinalPresentationSyntelli_BankOfAmerica_FinalPresentation
Syntelli_BankOfAmerica_FinalPresentation
 
IRJET- Analysis on Credit Card Fraud Detection using Capsule Network
IRJET- Analysis on Credit Card Fraud Detection using Capsule NetworkIRJET- Analysis on Credit Card Fraud Detection using Capsule Network
IRJET- Analysis on Credit Card Fraud Detection using Capsule Network
 
A Statistical/Mathematical Approach to Enhanced Loan Modification Targeting
A Statistical/Mathematical Approach to Enhanced Loan Modification TargetingA Statistical/Mathematical Approach to Enhanced Loan Modification Targeting
A Statistical/Mathematical Approach to Enhanced Loan Modification Targeting
 
Historical Credit Data | Total Credit Card Spend
Historical Credit Data | Total Credit Card SpendHistorical Credit Data | Total Credit Card Spend
Historical Credit Data | Total Credit Card Spend
 
Credit decision-indices-a-flexible-tool-for-both-credit-consumers-and-providers
Credit decision-indices-a-flexible-tool-for-both-credit-consumers-and-providersCredit decision-indices-a-flexible-tool-for-both-credit-consumers-and-providers
Credit decision-indices-a-flexible-tool-for-both-credit-consumers-and-providers
 
Chapter8 - Beyond Classification
Chapter8 - Beyond ClassificationChapter8 - Beyond Classification
Chapter8 - Beyond Classification
 
Pankaj kamaliya 2003 (2)
Pankaj kamaliya 2003 (2)Pankaj kamaliya 2003 (2)
Pankaj kamaliya 2003 (2)
 
Assessing Customers’ Eagerness to Adopt Online Banking
Assessing Customers’ Eagerness to Adopt Online BankingAssessing Customers’ Eagerness to Adopt Online Banking
Assessing Customers’ Eagerness to Adopt Online Banking
 
Evaluation of individual depositors’ satisfaction from the deposit
Evaluation of individual depositors’ satisfaction from the depositEvaluation of individual depositors’ satisfaction from the deposit
Evaluation of individual depositors’ satisfaction from the deposit
 

Viewers also liked

Detecting early warning bank distress signals in nigeria
Detecting early warning bank distress signals in nigeriaDetecting early warning bank distress signals in nigeria
Detecting early warning bank distress signals in nigeriaAlexander Decker
 
Discussion of Financial Ratios as Predictors of Failure
Discussion of Financial Ratios as Predictors of FailureDiscussion of Financial Ratios as Predictors of Failure
Discussion of Financial Ratios as Predictors of FailureSudarshan Kadariya
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 

Viewers also liked (6)

Detecting early warning bank distress signals in nigeria
Detecting early warning bank distress signals in nigeriaDetecting early warning bank distress signals in nigeria
Detecting early warning bank distress signals in nigeria
 
Discussion of Financial Ratios as Predictors of Failure
Discussion of Financial Ratios as Predictors of FailureDiscussion of Financial Ratios as Predictors of Failure
Discussion of Financial Ratios as Predictors of Failure
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 

Similar to ART1197.DOC

Predicting corporate business failure in the nigerian manufacturing industry
Predicting corporate business failure in the nigerian manufacturing industryPredicting corporate business failure in the nigerian manufacturing industry
Predicting corporate business failure in the nigerian manufacturing industryAlexander Decker
 
Predicting Corporate Failure - An Application of Discriminate Analysis
Predicting Corporate Failure - An Application of Discriminate AnalysisPredicting Corporate Failure - An Application of Discriminate Analysis
Predicting Corporate Failure - An Application of Discriminate Analysisscmsnoida5
 
The Case of Enterprise Risk Management
The Case of Enterprise Risk ManagementThe Case of Enterprise Risk Management
The Case of Enterprise Risk ManagementWeibull AS
 
Financial Analysis on Recession Period at M&M Tractors
Financial Analysis on Recession Period at M&M TractorsFinancial Analysis on Recession Period at M&M Tractors
Financial Analysis on Recession Period at M&M TractorsProjects Kart
 
D522736.pdf
D522736.pdfD522736.pdf
D522736.pdfaijbm
 
I477585.pdf
I477585.pdfI477585.pdf
I477585.pdfaijbm
 
Eng rev proposal ch.123_24 jan 20
Eng rev proposal  ch.123_24 jan 20Eng rev proposal  ch.123_24 jan 20
Eng rev proposal ch.123_24 jan 20Aminullah Assagaf
 
Hedging and the Failures of Corporate Governance: Lessons from the Financial ...
Hedging and the Failures of Corporate Governance: Lessons from the Financial ...Hedging and the Failures of Corporate Governance: Lessons from the Financial ...
Hedging and the Failures of Corporate Governance: Lessons from the Financial ...Fundação Dom Cabral - FDC
 
Financial analysis on recession period conducted at mahindra & mahindra tractors
Financial analysis on recession period conducted at mahindra & mahindra tractorsFinancial analysis on recession period conducted at mahindra & mahindra tractors
Financial analysis on recession period conducted at mahindra & mahindra tractorsProjects Kart
 
An Empirical Case Study on Prediction of Corporate Failure in The Selected In...
An Empirical Case Study on Prediction of Corporate Failure in The Selected In...An Empirical Case Study on Prediction of Corporate Failure in The Selected In...
An Empirical Case Study on Prediction of Corporate Failure in The Selected In...Waqas Tariq
 
Fin analysis chap1
Fin analysis chap1Fin analysis chap1
Fin analysis chap1erlindaajah
 
Ratio analysis - Introduction
Ratio analysis - IntroductionRatio analysis - Introduction
Ratio analysis - Introductionuma reur
 
Establishing the-effectiveness-of-market-ratios-in-predicting-financial-distr...
Establishing the-effectiveness-of-market-ratios-in-predicting-financial-distr...Establishing the-effectiveness-of-market-ratios-in-predicting-financial-distr...
Establishing the-effectiveness-of-market-ratios-in-predicting-financial-distr...oircjournals
 
Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Shantanu Deshpande
 

Similar to ART1197.DOC (20)

Predicting corporate business failure in the nigerian manufacturing industry
Predicting corporate business failure in the nigerian manufacturing industryPredicting corporate business failure in the nigerian manufacturing industry
Predicting corporate business failure in the nigerian manufacturing industry
 
Predicting Corporate Failure - An Application of Discriminate Analysis
Predicting Corporate Failure - An Application of Discriminate AnalysisPredicting Corporate Failure - An Application of Discriminate Analysis
Predicting Corporate Failure - An Application of Discriminate Analysis
 
The Case of Enterprise Risk Management
The Case of Enterprise Risk ManagementThe Case of Enterprise Risk Management
The Case of Enterprise Risk Management
 
Using Binary Logistic Analysis for Analyzing Bankruptcy
Using Binary Logistic Analysis for Analyzing BankruptcyUsing Binary Logistic Analysis for Analyzing Bankruptcy
Using Binary Logistic Analysis for Analyzing Bankruptcy
 
Swan Davis Inc
Swan Davis IncSwan Davis Inc
Swan Davis Inc
 
Financial Analysis on Recession Period at M&M Tractors
Financial Analysis on Recession Period at M&M TractorsFinancial Analysis on Recession Period at M&M Tractors
Financial Analysis on Recession Period at M&M Tractors
 
AKUNTANSI
AKUNTANSIAKUNTANSI
AKUNTANSI
 
D522736.pdf
D522736.pdfD522736.pdf
D522736.pdf
 
I477585.pdf
I477585.pdfI477585.pdf
I477585.pdf
 
Eng rev proposal ch.123_24 jan 20
Eng rev proposal  ch.123_24 jan 20Eng rev proposal  ch.123_24 jan 20
Eng rev proposal ch.123_24 jan 20
 
Hedging and the Failures of Corporate Governance: Lessons from the Financial ...
Hedging and the Failures of Corporate Governance: Lessons from the Financial ...Hedging and the Failures of Corporate Governance: Lessons from the Financial ...
Hedging and the Failures of Corporate Governance: Lessons from the Financial ...
 
Ijetr042103
Ijetr042103Ijetr042103
Ijetr042103
 
Financial analysis on recession period conducted at mahindra & mahindra tractors
Financial analysis on recession period conducted at mahindra & mahindra tractorsFinancial analysis on recession period conducted at mahindra & mahindra tractors
Financial analysis on recession period conducted at mahindra & mahindra tractors
 
Research Paper 2.0
Research Paper 2.0Research Paper 2.0
Research Paper 2.0
 
An Empirical Case Study on Prediction of Corporate Failure in The Selected In...
An Empirical Case Study on Prediction of Corporate Failure in The Selected In...An Empirical Case Study on Prediction of Corporate Failure in The Selected In...
An Empirical Case Study on Prediction of Corporate Failure in The Selected In...
 
Fin analysis chap1
Fin analysis chap1Fin analysis chap1
Fin analysis chap1
 
Ratio analysis - Introduction
Ratio analysis - IntroductionRatio analysis - Introduction
Ratio analysis - Introduction
 
Horvath
HorvathHorvath
Horvath
 
Establishing the-effectiveness-of-market-ratios-in-predicting-financial-distr...
Establishing the-effectiveness-of-market-ratios-in-predicting-financial-distr...Establishing the-effectiveness-of-market-ratios-in-predicting-financial-distr...
Establishing the-effectiveness-of-market-ratios-in-predicting-financial-distr...
 
Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques
 

More from butest

Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 
Download
DownloadDownload
Downloadbutest
 
resume.doc
resume.docresume.doc
resume.docbutest
 
Download.doc.doc
Download.doc.docDownload.doc.doc
Download.doc.docbutest
 
Resume
ResumeResume
Resumebutest
 

More from butest (20)

Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 
Download
DownloadDownload
Download
 
resume.doc
resume.docresume.doc
resume.doc
 
Download.doc.doc
Download.doc.docDownload.doc.doc
Download.doc.doc
 
Resume
ResumeResume
Resume
 

ART1197.DOC

  • 1. COMBINING STATISTICAL AND MACHINE LEARNING BASED CLASSIFIERS IN THE PREDICTION OF CORPORATE FAILURE S. Dizdarevic, P. Larrañaga, B. Sierra, J.A. Lozano, J.M. Peña Department of Computer Science and Artificial Intelligence University of the Basque Country Abstract This project presents the application of methods coming from Statistics as well as from an area of the Artificial Intelligence called Machine Learning, in the problem of the corporate failure prediction. The empirically compared paradigms applied to a sample of 120 Spanish companies, 60 of which had gone bankrupt, and 60 had not, are Discriminant Analysis, Logistic Regression, Classification Trees, Rule Induction and Bayesian Networks. Two Artificial Intelligence techniques - Voting by Majority Principle and Bayesian Formalism -, are implemented in order to obtain prediction improvement over the single models that are compared. The predictor variables that gather the accountant information taken for every company over the three years previous to the date of survey are financial ratios.
  • 2. 1. Introduction Corporate failure prediction, through classification of known cases and generalization to other cases, has been a subject of study for almost 30 years. Accurate prediction of corporate failure is important to investors, creditors and auditors. It also can help shareholders, creditors and governments to avoid heavy losses stemming from surprise bankrupts. Therefore, using analytic tools and data from corporate financial reports, one can evaluate and predict its future financial status. Although the idea of a company going bankrupt is associated with its disappearance, before this really happens the company has gone through a long period of crisis with several stages in it. Many authors comprise them into two main stages taking into account the two senses of the concept of failure: economic and financial. The economic failure starts when the profitability of the invested capital is under its costs of opportunity, receiving its owner an investment yield lower than other alternative opportunities with the same risk. As the economical failure advances and settles down in the company, the incomes start to be lower than the expenses appearing the first negative results. The deterioration produced during the economic failure process, if not corrected, will drive the company into technical insolvency. This is the first stage of what is called the financial failure. In this situation the company does not have enough liquid for the payments as these are increasing. The breaking point of this ruinous process will be reached when the company is not only unable to pay off its falling dues but also in a situation of negative net patrimony. This means that its current liabilities are greater than the value of its assets, so it could soon lead the company to its disappearance. The study of the corporate failure must be dealt always through the search of its causes that we can analyze through their visible symptoms. As Argenti (1976) proposes, it is very interesting to know the reasons why others companies have failed in order to avoid them in ours. Nevertheless, the capture of these causes is impossible if not through the discovery of their symptoms. Some of these causes are the following: management fault, deficiency in the systems of accounting information, disability of adaptation to the changes of environment, overtrading, the launch of big projects, abuse of financing by debt, the actual risks from the business world. As for the symptoms, Argenti accepts as such, the deterioration suffered by the financial ratios, as the corporate comes closer to failure, and indicating also that accounting manipulation is one clear symptom of the failure. According to Platt (1985) different sources that an external economic agent can use to detect the aforementioned symptoms, can be grouped together into three sets of sources: the common sense, the analysis of statement of accounts published publicly by the companies and statistical tools. 2
  • 3. The use of common sense, although a very simple strategy, has the following advantages: it does not need sophisticate computers and there is no need of assimilation of financial conditions, sometimes not easily understandable. All you should do is to pay attention to the daily reality of the corporate and its environment. Some signs of getting close to a situation of crisis are: auditor change, development of the relation with the new value, the members of council resign suddenly, credit lines are reduced or canceled, the sale of stocks done by the directors of the board, the appreciation of its stocks in the market to the prices inferior to its accounting value, excess of stock,… The analysis of the statement of accounts is part of a process of information whose aim is to provide data for decision making. The idea of failure, and more precisely, the idea of insolvency has remained connected to the technique of accounting ratios. It was thought that the ratios are worsen as the corporate approached the crisis process, in this way the deterioration suffered by the corporate could be measured. Due to big complicity of information and comprehension contained in financial statement data, the analysis of financial ratios, which gather all this information, has been the most used technique. The great interest in comparison between different companies (industrial sector, size, …) has influenced its use. There are two main difficulties related to financial ratios, their creation and their interpretation. Another difficulty added to the previous ones is that the same value of ratio for two companies from different sectors may represent different situations. The financial information gathered in ratios has to be homogenized, so that it could be used for description and prediction of corporate failure. The second task is directly related to the use of Statistics. Although ignored for half a century by the analysts, nowadays the use of statistical techniques have became a helpful tool commonly used because they give objectivity to the analysis. Beaver (1966) was among the pioneers who used them for the analysis of financial ratios in order to predict corporate failure. In his work, starting from 30 variables-ratio taken from 79 pairs of companies, failed and non-failed, 6 variables-ratio are selected. An analysis of profiles is based upon them by comparing the means of the values of every ratio in each group, failed and non-failed, and observing the important differences, five years before the date of failure. Beaver developed a dichotomic heuristic test of classification for every ratio by using a process of trial and error that allowed him to choose the suitable cut-point for every ratio and every year that minimized the errors of classification. Nevertheless the univariate model of Beaver contrasts with the inherent character of multivariable documents of the financial situation. Therefore, in order to make the above mentioned documents valuables, they will have to be interpreted from a perspective that allows to think over the several financial aspects of a corporate as a whole. The search of this perspective has been the reason why several researchers have used multivariate statistical techniques for the corporate failure prediction. Altman (1968) was the pioneer in application of Discriminant Analysis to the aforementioned problem obtaining surprising results. 
The lineal combination of five ratios in a profile created a score capable of discriminating between “healthy” and “failed” companies with very high percentages of success in the two years previous to the failure. The initial work of Altman was adapted, updated and improved by several researchers. It is worth noticing the works of Deakin (1972), Blum (1974), Edmister (1972), Libby (1975), Scott (1981), and Taffler (1982). The necessity of a statistical alternative to avoid the problems related to the Discriminant Analysis leads to the use of models of conditional probability, logit and probit, more flexible in their requirements. Ohlson (1980) is considered as the first author who published a model for the prediction of failure based on conditional probability models. Though he had no brilliant results Page - 3 -
  • 4. his methodology was followed by other authors: Mensah (1983), Zavgren (1985), Casey and Baztczak (1985), and Peel and Peel (1987). This chapter is organized as follows. Section 2 presents the features of the case study used for carrying out the empirical comparison among several paradigms coming from Statistics and Artificial Intelligence and the combining techniques. These paradigms are explained in Section 3. Section 4 shows the results obtained for every method in terms of the percentage of well-classified companies, as well as models descriptions and analysis of the results. In Section 5 it finishes with the conclusion of the work, proposing further research. 2. Problem Description Starting from the hypothesis that the accounting information pattern of non-failed and failed companies are different, the fundamental aim of this chapter was to show by means of an example how to create models, able to predict in advance (1 year, 2 years and 3 years) the failure of companies. These models could be considered as normative systems as they are founded on the probability theory. Following the recent progressive research in Artificial Intelligence two techniques have been implemented and used for integration of individual models in one, in order to improve predictive ability of every one. In this section the problem is presented, dealing with aspects of it such as, the concept of failure, sample obtaining and validating, selection of financial ratios with which models can be constructed, and sample for multiple models. A more detailed description of the failure problem can be found in Lizarraga (1996), which could be considered as one that inspired elaboration of this project. The data sample used here was the same that Lizarraga gathered from several Provincial Trade Register Offices and used for empirical comparison in his doctoral dissertation. The following is the procedure of how the data sample of 120 companies was selected and formed. The need of determining the concept of failure to use was the first methodological problem to solve. Finally, he chose the concept of suspension of payments, given that it is related not with a specific financial problem but with a situation of profound economic crisis. This concept presents three fundamental advantages: objectivity, it gives a representative date of the moment of failure and the large increment in the number of companies which had to turn to it in the period of study. Finally, the availability of the annual accounts deposited in the several Provincial Trade Register Office was another aspect that helped Lizarraga to carry out the empirical work of information gathering. The sample was made of 120 companies, half of them belonged to a group of companies classified as “failed” and the other half was classified as “healthy” in order to incorporate them to the analysis. The selection was carried out by a matching process. Using a list of “failed” companies previously selected, matching them with a “healthy” corporate of the same size and industrial sector. This matching process is justified by the convenience of avoiding any possible distortion effect related with the size and industrial sector. As the access to each Provincial Trade Register Office was not possible Lizarraga decided to reduce the scope to the 10 provinces with larger number of records of payment suspension requested during the period of the study. These 4
  • 5. selected provinces gathered the 63% of the total number of records of payment suspension. The information was gathered through the Official Bulletin of the Trade Register Office. The interval of time was of 18 months (from January 1993 to July 1994), and it can be regarded as representative of a period of severe crisis among the companies in Spain. For every company in the study the economical and financial data corresponding to the three years previous to the end of the study were obtained. Though Lizarraga (1996) validated the model created using a sample of 44 companies (22 “healthy” and 22 “failed”) different from the ones used to construct the models, in this study the models are created using a sample of 120. A validation of the original model, based in the 5- fold cross-validation (Stone 1976) has been carried out, as well as another validation methodology which will be explained later. Estimates of the goodness of fit of every model, that is, the probability of the model classifying correctly, are calculated following the next steps: first, the sample is randomly ordered and then partitioned in 5 disjointed subsets. Secondly, choosing 4 of those subsets as training set, a model - which was tested with the fifth subset -, is obtained. These steps are repeated five times, using every time a different testing set and obtaining this way 5 percentages p 1 , p 2 , …, p 5 of well-classified cases, as well as the “destiny” (failed or non failed) of each one of 120 companies distributed between five disjointed test sets. The estimation of p , the probability with which the model created with the original sample classifies correctly is given by the following formula: p p = = ∑ i i 51 5 . Figure 1 shows graphically the process of estimation of the goodness of fit of the model with 5-fold cross-validation. Training 1 ◊ model 1 Test 1 ◊ p 1 Training 2 ◊ model 2 Test 2 ◊ p 2 Training 3 ◊ model 3 Page - 5 -
  • 6. MODEL p p = = ∑ i i 51 5 Test 3 ◊ p 3 Training 4 ◊ model 4 Test 4 ◊ p 4 Training 5 ◊ model 5 Test 5 ◊ p 5 Figure 1. Process of estimation of the goodness of fit of the model with 5-fold cross-validation The other approach to model validation presented in this chapter is the next one. Using 4 of 5 disjointed subsets (got with 5-fold cross-validation) as training set a model is obtained and later tested with a sample of the 120 companies, instead of 24 used in previous approach. The same process of estimation of p - the probability with which the model created with the original sample classifies correctly -, explained before is used. Figure 2 shows graphically the process of estimation of the goodness of fit of the model with this validation. Training 1 ◊ model 1 Test 1 ◊ p 1 Training 2 ◊ model 2 Test 2 ◊ p 2 Training 3 ◊ model 3 MODEL 6
  • 7. p p = = ∑ i i 51 5 Test 3 ◊ p 3 Training 4 ◊ model 4 Test 4 ◊ p 4 Training 5 ◊ model 5 Test 5 ◊ p 5 Figure 2. Process of estimation of the goodness of fit of the model with the second validation Lizarraga selected the explanation variables from the certificate of the Trial Balance, from the profit and loss account and from the financial chart. The election of these financial ratios was based on two main criteria: i) they were frequently mentioned in the literature treating the analysis of financial statements, ii) the possibility of calculating them in a direct way or with simple adjustments. Starting from the 50 variables that Lizarraga had obtained in the previous selection, a Principal Components Factorial Analysis was made. It allowed to reduce them to 9 ratios which were used. These ratios were: X1. CURRENT ASSETS / CURRENT LIABILITIES X2. CURRENT ASSETS / TOTAL ASSETS X3. NET RESULT / TOTAL ASSETS X4. EARNINGS BEFORE INTEREST AND TAXES / FINANCIAL CHARGES X5. OWN FUNDS / TOTAL DEBT X6. SALES / OWN FUNDS X7. STOCKS / SALES X8. DEBTORS / SALES X9. OPERATIVE CASH FLOW º / TOTAL ASSETS X1 - CURRENT ASSETS / CURRENT LIABILITIES ratio measures the firm’s ability to meet maturing short-term obligations. It is used to measure liquidity - if the firm has sufficient cash to meet short-term claims. X2 - CURRENT ASSETS / TOTAL ASSETS ratio represents the position of current assets (money, bank, quick assets, stocks…) in total assets (current assets + fixed assets). X3 - NET RESULT / TOTAL ASSETS ratio is a profitability ratio which explains the extend to which a company earns the highest return possible for resources used or capital employed, consistent with the desire and ability to assume risk. Page - 7 -
  • 8. X4 - EARNINGS BEFORE INTEREST AND TAXES / FINANCIAL CHARGES ratio is calculated by dividing the earnings before interest and tax reductions by the financial charges of a firm. In essence, it is a measure of the true productivity of the firm’s charges. Earnings before interest and taxes is calculated as: earnings before taxes + financial charges. X5 - OWN FUNDS / TOTAL DEBT ratio is usually used to measure debt. It is calculated by dividing its funds by total debt. Own funds are those funds for financing that consist of capital that the owner deposited at the beginning of firm’s existence, reserves, amortization, subventions …. X6 - SALES / OWN FUNDS ratios is calculated by dividing sales by own funds. Sales represents net income of firm’s business. X7 - STOCKS / SALES is ratio used to observe rotation of stocks. It is calculated by dividing stocks (merchandise, raw materials, products terminated and semi-terminated…) by sales. It represents percentage of stocks in sales. X8 - DEBTORS / SALES ratio used to observe rotation of current assets. It is calculated dividing debtors by sales. X9 - OPERATIVE CASH FLOW º / TOTAL ASSETS ratio is calculated by dividing total assets into operative cash flow. This last value is calculated with the following formula: OCF º = traditional cash flow ( net result + amortization + large term reserves ) - increment of exploitation of current assets + decrement of exploitation of current liabilities. The several classificatory models constructed - described in the following section - use the aforementioned ratios. The process of sample for obtaining the combined classifiers is conducted as follows. Regardless what internal structure an individual classifier has, and methodology it bases on, a classifier may be regarded as a function that receives an input individual X and outputs a class j , or in short denoted by e(X) = j. In the problem of corporate failure there are only two classes: non-failed and failed, represented in our case by class 1 and by class 2 respectively. As explained before an individual X is represented by the set of nine ratio-variable values. Even though some classifiers may supply some extra information like, probability with which the class j is assigned to the individual X, only class j was considered as the output information. The decision to use this approach was based on its generality, as the individual classifiers could be very different from each other in their methodologies and this kind of information is common for all of them. A good summary of existing techniques for combining classification results could be found in Xu et al. (1992) and Ho et al. (1994). Having the result (class j) for every classifier, these are now considered knowledge from different sources and used as input data for the combined classifiers. As explained before, two validation methods are used. Figure 3 shows graphically the process of individual classifiers output data’s integration when 5-fold cross-validation is used. PARADIGMS individual s DA LR CN2 CART Bayes test set 1 1…24 test set 2 25…48 8
  • 9.
                          PARADIGMS
individuals           DA    LR    CN2    CART    Bayes
test set 1 (1…24)
test set 2 (25…48)
test set 3 (49…72)
test set 4 (73…96)
test set 5 (97…120)
Figure 3. Process of construction of the sample for the combined classifier with 5-fold cross-validation
Elements of the pair (test set i, paradigm j), where i = 1…5 and j = 1…5, represent the classes that model i, created with training set i, using paradigm j, assigns to each one of the 24 individuals of test set i. Figure 4 shows graphically the same process, but when using the second validation explained before.
                                        PARADIGMS
                 DA         LR         CN2        CART       Bayes
individuals   M1 … M5    M1 … M5    M1 … M5    M1 … M5    M1 … M5
1
2
:
120
Figure 4. Process of construction of the sample for the combined classifier with the second validation
The pair (individual i, (paradigm j, model k)), where i = 1…120, j = 1…5 and k = 1…5, represents the class that model k (Mk), created with training set k, using paradigm j, assigns to the individual i. The methodologies used to integrate the individual classifiers on the basis of the aforementioned matrices are described in the following section.
3. Methodologies
3.1 Methodologies - Individual Classifiers
Supervised classification is one of the tasks most frequently carried out by the so-called Intelligent Systems. Thus, a large number of paradigms developed either by Statistics (Logistic Regression, Discriminant Analysis, K-N-N) or by Artificial Intelligence (Neural Networks, Rule Induction, Classification Trees, Bayesian Networks) are capable of carrying out classification tasks. The comparison among the previous paradigms cannot be carried out in an analytical way, thus it is usual to compare them empirically - Michie et al. (1994).
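Looking back at Figures 3 and 4, the matrix of abstract-level outputs can be assembled mechanically once the individual models are available. The sketch below is a modern analogue using scikit-learn and 5-fold cross-validation; it is illustrative only - the paper itself used SPSS, SAS, CN2, SPAD.S and HUGIN, and CN2 and the Bayesian Network have no direct scikit-learn counterpart.

# Illustrative analogue (not the software used in the paper): the 120 x K matrix of
# abstract-level outputs of Figure 3, where column k holds the class assigned to each
# company by paradigm k when that company was in the held-out fold.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def build_combined_sample(X, y, n_folds=5):
    paradigms = [
        LinearDiscriminantAnalysis(),          # stands in for the DA models
        LogisticRegression(max_iter=1000),     # stands in for the LR models
        DecisionTreeClassifier(),              # stands in for CART
    ]
    cols = [cross_val_predict(clf, X, y, cv=n_folds) for clf in paradigms]
    return np.column_stack(cols)               # shape (n_companies, n_paradigms)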
  • 10. Five of the previous paradigms have been selected to make this comparison - Discriminant Analysis, Logistic Regression, Classification Trees, Rule Induction and Bayesian Networks.
3.1.1 DISCRIMINANT ANALYSIS
Discriminant Analysis - introduced by Fisher (1936) - is a statistical technique used in Pattern Matching. The main use of Discriminant Analysis is to predict group membership from a set of predictors. Discriminant function analysis consists of finding a transformation which gives the maximum ratio of the difference between a pair of group multivariate means to the multivariate variance within the two groups. Accordingly, an attempt is made to delineate the groups by maximizing the between-group variance while minimizing the within-group variance. The predictor characteristics are related to form groups based upon similarities of the distribution in the p-dimensional space, which are then compared to groups which are input by the user as truth. This enables the user to test the validity of groups based upon actual data, to test groups which have been created, or to put objects into groups. Two things can then be done with Discriminant Analysis (DA): analysis and classification. Analysis is related to how the material is to be interpreted, i.e., are there any differences between the groups and which variables make these differences? Classification is used to find a set of variables - which provide satisfactory discrimination - so that classification equations can be derived, their use checked out through hit-rate tables, and, if good, they can be used to classify new subjects who were not in the original analysis.
DA creates a linear discriminant function starting from the predictor variables x1, x2, …, xp and the results of this function are used later as a classification criterion. The discriminant function takes the form:
Z = a1x1 + a2x2 + … + apxp
where the xj are the explanatory variables (j = 1, 2, …, p), the aj are real coefficients estimated by the model (j = 1, 2, …, p), and Z is the discriminant score. The discriminant function is used for classifying new individuals by means of rules of the following form: "If a1x1i + a2x2i + … + apxpi > c then individual i is classified as 0, on the contrary it is classified as 1."
The DISCRIMINANT procedure of the SPSS software has been used in this project. The method selected for model construction was MAHAL, and variables were selected using stepwise selection. The selection rule was: maximize the minimum Mahalanobis distance between groups. For each model some of the p variables were selected, and their corresponding coefficients were calculated.
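As a minimal illustration of how such a fitted discriminant function is used at classification time, the sketch below evaluates the score and applies the cut-off rule, following the "score of failure" convention used later in Section 4. The coefficients, intercept and input values are placeholders, not the models estimated in this study.

# Minimal sketch of applying a fitted discriminant function. Coefficients and inputs
# are placeholders; a positive score is read as "failed", as in Section 4.
def score_of_failure(x, coeffs, intercept):
    return intercept + sum(a * xi for a, xi in zip(coeffs, x))   # Z = a1*x1 + ... + ap*xp

x = [0.8, 1.2]                                                   # placeholder ratio values
s = score_of_failure(x, coeffs=[2.25, 2.53], intercept=-2.0)
print("failed" if s > 0 else "non-failed")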
  • 11. 3.1.2 LOGISTIC REGRESSION
Logistic Regression (Hosmer and Lemeshow (1989)) is a method coming from Statistics whose objective is to obtain a functional relationship between a transformation of a qualitative variable - called logit - and p predictor variables, which can be either quantitative or qualitative. It is used to develop a model which attempts to fit the data as well as possible while remaining sufficiently reasonable to describe the relation between the result (dependent variable or response) and the set of independent (or explanatory) variables. The fundamental characteristic of this regression is that the dependent variable is dichotomous. Mathematically, the function used in the logistic distribution is extremely flexible and easy to use. Indicating the dichotomous variable to predict by Y and the p predictor variables by x1, …, xp, the objective is to determine the coefficients β0, β1, …, βp in order to satisfy the logit transformation formula:
g(X) = ln [ P(Y = 1) / P(Y = 0) ] = β0 + β1x1 + … + βpxp
The determination of the coefficients (β0 is the intercept and β1, …, βp are the p parameters) is carried out by the method of maximum likelihood. These coefficients are interpreted in terms of "odds ratios", and the variable selection can be carried out by three methods: forward, backward or stepwise. The stepwise method is the most commonly used one; in it the variables are selected to be included or excluded according to a statistical criterion. After obtaining g(X), a classification model can be constructed. The logistic regression model is described by the following formula:
π(x) = e^g(X) / (1 + e^g(X))
It is used to classify new individuals by means of rules of the following form: "If π(x) > c then individual i is classified as 0, on the contrary it is classified as 1".
For this work the PROC LOGISTIC procedure of the SAS software has been used for the empirical comparison.
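A minimal sketch of the classification rule just described: the linear predictor g(x) is turned into a probability through the logistic function and compared with the threshold c. The coefficients and input values below are placeholders, not the models estimated in Section 4.

# Sketch of the logistic classification rule with placeholder coefficients.
import math

def logistic_probability(x, betas, beta0):
    g = beta0 + sum(b * xi for b, xi in zip(betas, x))    # g(x) = beta0 + sum(beta_j * x_j)
    return math.exp(g) / (1.0 + math.exp(g))              # pi(x)

pi = logistic_probability([0.05, 0.60], betas=[-13.6, 3.8], beta0=-0.5)
print("class 0" if pi > 0.5 else "class 1")               # the rule "if pi(x) > c then class 0", c = 0.5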
  • 12. 3.1.3 RULE INDUCTION
In the task of constructing Expert Systems, systems for inducing concept descriptions from examples have proved useful in easing the bottleneck of knowledge acquisition. One of these induction systems, CN2 (The Turing Institute (1988)), has been used as a representative of the approach called Machine Learning. CN2 was designed for the efficient induction of simple, comprehensible production rules in domains where problems of a poor description language and/or noise may be present. CN2 produces an ordered list of if-then rules rather than an unordered set of if-then rules, although the latter can also be produced by changing the evaluation function. The rules induced by CN2 are of the form "if complex then predict class", where complex is a conjunction of attribute tests. In order to classify new examples, CN2 applies an interpretation in which each rule is tried in order until one is found whose conditions are satisfied by the example to be classified. The class predicted by this rule is then assigned as the class of that example. If no induced rule is satisfied, the final default rule assigns the most commonly occurring class in the training data to all new examples.
The learning algorithm of CN2 works in an iterative fashion, searching in each iteration for a complex covering a large number of examples of a single class C and few of other classes. Having found a good complex, those examples it covers are removed from the training set and the rule "if complex then predict C" is added to the end of the rule list. This process iterates until no more satisfactory complexes can be found. In this project the CN2 Induction Algorithm developed by The Turing Institute has been used, with the option to produce an ordered list of if-then rules.
3.1.4 CLASSIFICATION TREES
The Classification and Regression Trees (CART) software was developed by Breiman et al. (1984). CART is used for classification or regression analysis of large, complex data sets containing many variables, and is used to construct both binary prediction and classification trees by recursive partitioning (or targeted stratification). CART can construct a binary prediction tree with a structure which is easy to understand, interpret, and use. Tree construction starts with a computationally intensive algorithm that recursively searches over all the variables to produce a sequence of optimal binary splits, so that an extremely large tree is grown. A pruning algorithm is then applied which cuts off all branches of the tree that impair the overall accuracy. The result is a parsimonious decision tree: the simplest tree that gives the maximum accuracy. The tree can be inspected and used to modify the staging algorithm. The procedure is repeated until the best set is found. Cross-validation can be performed a number of times with each analysis.
The Classification Trees paradigm is a popular representation of classifiers. The interior nodes of a classification tree are tests applied to instances during classification. Branches from an interior node correspond to the possible test outcomes. Classification begins with the application of the root node test, its outcome determining the branch to a succeeding node. The process is recursively applied until a leaf node is reached. Then the instance is labeled with the class of the leaf node, and the process halts. The trees are constructed beginning with the root of the tree and proceeding down to its leaves. The family's palindrome name (TDIDT) emphasizes that its members carry out the top-down induction of decision trees. The patriarch of this family is Hunt's Concept Learning System (CLS) framework (Hunt et al. (1966)). ID3 (Quinlan (1979) and Quinlan (1986)), one of the most famous TDIDT systems, was developed from CLS and used a selection measure based on the computation of an information gain for each variable, the variable that maximizes this gain being selected. A notable disadvantage of this measure is that it is biased towards selecting variables with many different values. ACLS (Patterson et al. (1983)) and ASSISTANT (Cestnik et al. (1987)) acknowledge ID3 (Quinlan (1979)) as their direct ancestor. ASSISTANT has been used in several medical domains with promising results.
In our example the statistical software SPAD.S has been used, in which the approximation proposed by Breiman et al. (1984) has been implemented. The procedure used for the construction of the classification trees was DISAR. This procedure constructs complete binary decision trees for the discrimination of 2 groups, categorized by the modalities of one nominal variable.
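The abstract-level use of both an ordered rule list (CN2) and a classification tree reduces to walking a fixed structure until a class is reached. The sketch below shows the rule-list case; the rules and thresholds are invented for illustration and are not the induced models reported later in Figures 10-12.

# Sketch of applying an ordered if-then rule list (CN2 style): rules are tried in order,
# the first rule whose conditions hold gives the class, and a final default rule assigns
# the most common training class. Rules below are invented examples.
rules = [
    (lambda x: x["x4"] < 0.5 and x["x8"] < 0.1, "NON-FAILED"),
    (lambda x: x["x2"] > 0.6 and x["x3"] > 0.0, "FAILED"),
]
DEFAULT_CLASS = "FAILED"

def classify(x):
    for condition, label in rules:
        if condition(x):
            return label
    return DEFAULT_CLASS

print(classify({"x2": 0.7, "x3": 0.05, "x4": 0.9, "x8": 0.2}))   # second rule fires -> FAILED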
3.1.5 BAYESIAN NETWORKS
While the first attempts at building Expert Systems regarded probability as the underlying formalism, the large number of parameters to estimate - 2^n − 1 for the case of n dichotomic variables - forced the researchers to reject this massive approach and instead adopt probabilistic models based on the hypothesis of independence among variables. These models presented the advantage of their simplicity, but they were incapable of giving good solutions in problems with a high degree of interdependence among variables. As a consequence of these extreme positions, probability was not taken into account in the building of Expert Systems until the late 80s, when Lauritzen and Spiegelhalter (1988) developed an algorithm for the propagation of evidence.
  • 13. This algorithm allowed probabilistic reasoning within graphical models which represented conditional independence among the variables of the system. Excellent introductions to the Bayesian Network paradigm can be found in Castillo et al. (1997) and Jensen (1996).
One possible classification of the structural learning methods can be carried out by grouping them into two main groups. (i) Methods based on hypothesis tests, which check the assumption of conditional independence among variables and create a Bayesian Network structure which represents the obtained independences. (ii) Methods based on score and search, in which a measure of the goodness of fit (likelihood, entropy, percentage of well-classified) of a particular structure is defined, as well as a searching procedure over the space of all possible Bayesian Network structures. This searching procedure usually works like a greedy algorithm. In our case we used Genetic Algorithms as an intelligent metaheuristic in the searching process.
Genetic Algorithms (Goldberg (1989)) use a direct analogy with natural behavior. They work with a population of individuals, each individual representing a feasible solution to a given problem. To each individual we assign a value or score according to the goodness of the solution represented by that individual. The better the adaptation of the individual to the problem, the more probable it is that the individual will be selected for reproduction, crossing its genetic material with another individual selected in the same way. This crossover will produce new individuals - offspring of the previous ones - which share some of the features of their parents. In this way a new population of feasible solutions is produced, replacing the previous one and verifying the interesting property of having a greater proportion of good features than the previous population. Thus, through the generations, good features are propagated through the population. By favoring the crossing of the fittest individuals, the most promising areas of the search space are explored.
Figure 5 summarizes the pseudo-code of the so-called Abstract Genetic Algorithm. In it, the parent selection does not need to be made by assigning to each individual a value proportional to its objective function, as is usual in the so-called Simple Genetic Algorithm. This selection can be carried out by any function that selects parents in a natural way. It is worth noticing that the descendants are not necessarily the next generation of individuals, but that this generation is made up of the union of parents and descendants. That is why the operations of extension and reduction in the cycle are required.
begin AGA
  Make initial population at random
  WHILE NOT stop DO
  BEGIN
    Select parents from the population
    Produce children from the selected parents
    Mutate the individuals
    Extend the population by adding the children to it
    Reduce the extended population
  END
  Output the best individual found
end AGA
Figure 5. The pseudo-code of the Abstract Genetic Algorithm
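To make the loop of Figure 5 concrete, the following is a minimal sketch of the Abstract Genetic Algorithm on a toy problem (maximizing the number of 1s in a bit string). Selection, crossover and mutation are deliberately simple placeholders; the paper instead evolves Bayesian Network structures with problem-specific operators.

# Minimal AGA loop on a toy fitness function; all operators are simple placeholders.
import random

def aga(fitness, length=20, pop_size=30, generations=50, p_mut=0.02):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = random.sample(pop, pop_size // 2)                  # select parents
        children = []
        for a, b in zip(parents[::2], parents[1::2]):                # produce children
            cut = random.randrange(1, length)
            children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        for child in children:                                       # mutate the children
            for i in range(length):
                if random.random() < p_mut:
                    child[i] = 1 - child[i]
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]  # extend + reduce
    return max(pop, key=fitness)

best = aga(fitness=sum)
print(sum(best))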
  • 14. The individuals of the Genetic Algorithm will be Bayesian Network structures. A Bayesian Network structure, for a fixed domain with n variables, can be represented by an n x n connectivity matrix C, whose elements cij can be defined as:
cij = 1 if variable i is a parent of variable j, and cij = 0 otherwise.
In this genetic approach, an individual of the population is represented by the string:
c11 c21 … cn1 c12 c22 … cn2 … c1n c2n … cnn
As can be seen, in the case that a total order among the variables is assumed, Larrañaga et al. (1996a), the usual genetic operators are closed operators with respect to the DAG conditions. In the more general case in which there is no assumption of order among the variables, Larrañaga et al. (1996b), the usual genetic operators are not closed operators, and to ensure closedness a repair operator is introduced. The objective of this repair operator is to transform the child structures that do not verify the DAG conditions into DAGs, by randomly eliminating the edges that invalidate the DAG conditions. This approach, as can be seen in the previous reference, has yielded good results in the reconstruction of the ALARM net. Although in the corporate failure problem the cardinality of the search space is not too large, the previous approach has been used. See Larrañaga et al. (1996c) for another approach to the problem of structural learning of Bayesian Networks, in which an individual is a cyclic permutation of the n considered variables, and the Genetic Algorithm uses crossover and mutation operators developed for the Traveling Salesman Problem.
Genetic Algorithms have been used as optimizers in other combinatorial problems that arise in the Bayesian Networks context. Thus, for example, in Larrañaga et al. (1997) they obtain good decompositions of the moral graph associated with the propagation algorithm proposed by Lauritzen and Spiegelhalter. Finally, in Larrañaga et al. (1996d) the problem of the fusion of Bayesian Networks proposed by different authors, seeking the consensual BN, is handled.
Taking into account that in a Bayesian Network any variable is only influenced by its Markov Blanket, that is, its parents, its children and the parents of its children, it seems intuitive to search in the set of structures that are Markov Blankets of the special variable (Sierra and Larrañaga (1997)). Now, the individuals of the Genetic Algorithm are Markov Blankets of the variable to be classified. An operator has been introduced that guarantees that the obtained children constitute a Markov Blanket of the variable to be classified. On the other hand, the score used to search for the best Markov Blanket uses the percentage of well-classified individuals obtained by applying the evidence propagation feature of the HUGIN software, Andersen et al. (1989).
3.2 Methodologies - Combining Classifiers
Recently in the area of Artificial Intelligence the concept of combining classifiers has been proposed as a new direction for the improvement of the performance of individual classifiers. These classifiers can be based on a variety of classification methodologies, and can achieve different rates of correctly classified individuals. The goal of classification result integration algorithms is to generate more certain, precise and accurate system results. As explained in Xu et al. (1992), there are a couple of problems to be solved in the process of combining classifiers.
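As an aside on the representation described above, the sketch below encodes a structure as the connectivity matrix, flattens it column by column into the chromosome string, and applies a simplified repair: edges are removed at random until no directed cycle remains. The paper only describes the repair idea (removing edges that invalidate the DAG conditions); the cycle test here is one straightforward way to implement it and is an assumption of this sketch.

# Connectivity-matrix representation of a structure and a simplified repair operator.
import random

def to_chromosome(c):
    n = len(c)
    return [c[i][j] for j in range(n) for i in range(n)]   # c11 c21 ... cn1 c12 ... cnn

def has_cycle(c):
    n, color = len(c), [0] * len(c)        # 0 = unvisited, 1 = in progress, 2 = done
    def visit(u):
        color[u] = 1
        for v in range(n):
            if c[u][v] and (color[v] == 1 or (color[v] == 0 and visit(v))):
                return True
        color[u] = 2
        return False
    return any(color[u] == 0 and visit(u) for u in range(n))

def repair(c):
    while has_cycle(c):
        edges = [(i, j) for i in range(len(c)) for j in range(len(c)) if c[i][j]]
        i, j = random.choice(edges)        # simplification: drop random edges until acyclic
        c[i][j] = 0
    return c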
  • 15. First of all there is the problem of "how many classifiers to choose, and what kind of classifiers should they be?". The second problem is to determine the adequate technique with which to combine the results of the chosen classifiers so that a better result can be obtained. In this work five individual classifiers are chosen, coming from two different areas, Statistics and Machine Learning. To tackle the second problem a large variety of combining techniques was consulted, and finally two of the several approaches proposed in Xu et al. (1992) were chosen. These techniques are the Voting by Majority Principle and the Bayesian Formalism. The decision to use these combining approaches has been influenced by their generality. They allow the integration of different classifiers based on different methodologies, which at least can supply the output information at the abstract level - the class j -, which is the only requirement these techniques have.
The following are the two techniques used for the combination of individual classifiers. The combination is based on the output information at the abstract level. The representation that will be used to explain their characteristics is the following: "Given K individual classifiers, ek, k = 1,…,K, each of which assigns to the input X a label jk, i.e. produces an event ek(X) = jk, the problem is to use these events to build an integrated classifier E, which gives X one definitive label j, i.e. E(X) = j, with j ∈ Λ ∪ {M+1}, where Λ = {1,2,…,M} represents the set of specified patterns called classes and {M+1} denotes that e has no idea which class X comes from, or in other words, that X is rejected by e." (e.g. M = 2 for the corporate failure problem, and K = 5 in this work - DA, LR, CN2, CART and BAYES).
3.2.1 VOTING BY MAJORITY PRINCIPLE
As indicated before, the problem is to produce a new event E(X) = j from the given events ek(X) = jk, k = 1,…,K, where the following equation may not necessarily hold:
e1(X) = e2(X) = … = eK(X).
That is, conflicts may exist among the decisions of the K classifiers. A simple and common rule used for solving this kind of conflict in human social life is Voting by Majority. In it, if the majority of the K classifiers vote for the same label then the combining classifier takes this label as the final result, where majority means that more than half of the classifiers ek vote for the same label. The most general majority voting rule is the following:
E(X) = j,   if TE(X, j) = max over i ∈ Λ of TE(X, i) > K/2
E(X) = M+1, otherwise
where
TE(X, i) = Σ Tk(X, i) over k = 1, …, K,   i = 1, …, M
and
Tk(X, i) = 1 if ek(X) = i and i ∈ Λ;  Tk(X, i) = 0 otherwise.
That is, the combined classifier E decides that X comes from class j if the majority of the K classifiers decide that X comes from class j; otherwise it rejects X. It is easy to see that classifiers that reject X have no impact on the combined E, unless the majority of them reject X, or the number K is even and half of them vote for one class and half for another class. In this project the individual classifiers are five and none of them has the option to decide that the class of individual X is unknown, so there will not be any situation where the integrated classifier E does not know which class to assign to X; in other words, no X will be rejected by E.
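A minimal sketch of the majority voting rule just defined: X is assigned class j only when more than half of the K abstract-level outputs are j, and is rejected (label M+1) otherwise. The function names and input format are assumptions of this sketch.

# Voting by Majority at the abstract level.
from collections import Counter

def majority_vote(labels, n_classes):
    """labels: the K outputs e_k(X); returns a class in 1..n_classes, or n_classes + 1 (reject)."""
    votes = Counter(l for l in labels if 1 <= l <= n_classes)   # ignore rejections
    if not votes:
        return n_classes + 1
    label, count = votes.most_common(1)[0]
    return label if count > len(labels) / 2 else n_classes + 1

print(majority_vote([1, 1, 2, 1, 2], n_classes=2))   # -> 1 (three of the five classifiers agree)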
  • 16. 3.2.2 THE BAYESIAN FORMALISM
The previous voting method combines the results of the individual classifiers based only on the label outputted by each classifier (i.e., the event ek(X) = jk). Each of the ek(X) = jk is treated equally, as one vote, without considering the error of each ek itself. The present method takes these errors into consideration, representing each classifier's error by a confusion matrix given by:
PTk = [ nij(k) ],  i, j = 1, …, M,  for k = 1, 2, …, K   (1)
where each row i corresponds to class i and each column j corresponds to the event ek(X) = j. Thus, an element nij(k) denotes that nij(k) samples of class i have been assigned the label j by ek. It follows from (1) that the size of the sample is:
N(k) = Σ nij(k) over i = 1, …, M and j = 1, …, M   (2)
in which the number of cases in each class i is:
ni•(k) = Σ nij(k) over j = 1, …, M,   i = 1, …, M   (3)
and the number of cases that are assigned class j by ek is:
n•j(k) = Σ nij(k) over i = 1, …, M,   j = 1, …, M.   (4)
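As a sketch of how these confusion matrices are used, the code below estimates PTk from labelled data and then combines the K abstract-level outputs for one individual X using the conditional probabilities and the product rule of Xu et al. (1992), which the equations that follow in the text formalize. Function names and the input format are assumptions; the sketch also assumes every column of each confusion matrix has at least one count.

# Confusion matrices and the Bayesian Formalism combination for one individual X.
import numpy as np

def confusion_matrix(true_classes, predicted, n_classes):
    pt = np.zeros((n_classes, n_classes))
    for i, j in zip(true_classes, predicted):
        pt[i - 1, j - 1] += 1              # row = true class i, column = assigned label j
    return pt

def bayesian_combine(pts, outputs):
    """pts: K confusion matrices PT_k; outputs: the K labels j_k assigned to X."""
    bel = np.ones(pts[0].shape[0])
    for pt, j in zip(pts, outputs):
        col = pt[:, j - 1]
        bel *= col / col.sum()             # P(X in i | e_k(X) = j_k)
    bel /= bel.sum()                       # normalisation (the constant eta)
    return int(np.argmax(bel)) + 1         # class with the greatest belief

pt = confusion_matrix([1, 1, 2, 2], [1, 2, 2, 2], n_classes=2)
print(bayesian_combine([pt, pt], outputs=[2, 2]))   # -> 2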
  • 17. For an event ek(X) = j of an error-bearing classifier ek, its truth (i.e., that X comes from class j) has uncertainty. With the knowledge of its confusion matrix PTk, such an uncertainty can be described by the conditional probabilities that the propositions X ∈ i, i = 1, …, M are true under the occurrence of the event ek(X) = j, that is:
P(X ∈ i | ek(X) = j) = nij(k) / n•j(k) = nij(k) / Σ nij(k) over i = 1, …, M.   (5)
From another point of view, the confusion matrix PTk can be regarded as the prior knowledge of an expert. The expert can express his belief in the event ek(X) = j by a real number bel(.), called the belief value. The higher the bel(.) he gives to a proposition, the more likely it is to be true. With the knowledge of PTk, bel(.) for the proposition X ∈ i is given in the form of a conditional probability, given by (5), and represented by the following equation:
bel(X ∈ i | ek(X) = j) = P(X ∈ i | ek(X) = j) = nij(k) / n•j(k).   (6)
With K classifiers e1, e2, …, eK, there will be K matrices PT1, …, PTK. When these classifiers are used on the same input X, K events ek(X) = jk, k = 1, …, K will happen. Each ek(X) = jk and its corresponding PTk can supply a set of values bel(X ∈ i | ek(X) = jk), i = 1, …, M. Now, the question is how to integrate these individual supports to give the combined value bel(i). Xu et al. (1992) deduce that bel(i) can be calculated as follows:
bel(i) = η Π P(X ∈ i | ek(X) = jk) over k = 1, …, K   (7)
with η a constant that ensures that Σ bel(i) over i = 1, …, M equals 1 (since X ∈ i, i = 1, …, M are mutually exclusive and exhaustive). That is:
1/η = Σ over i = 1, …, M of Π over k = 1, …, K of P(X ∈ i | ek(X) = jk).   (8)
Finally, depending on these bel(i) values, the integrated classifier E can classify X into a class according to the decision rule given by:
E(X) = j,   if bel(j) = max over i ∈ Λ of bel(i)
E(X) = M+1, otherwise
that is, E decides that X comes from class j if the belief value bel(j) is greater than any other belief bel(i), i = 1, …, M; otherwise it rejects X, or in other words E has no idea about which class X comes from.
4. Results
  • 18. In this section the results of applying the methods described in the previous section to the problem of corporate failure are explained in full detail. The models for each one of the five paradigms used in this project, and for each period of time before failure (1 year, 2 years and 3 years), are shown in this section. The results are represented with the percentage of well-classified companies for each paradigm and period.
Discriminant Analysis (DA)
Experiments with this paradigm were carried out using the SPSS software. The following are the classifiers, obtained with a sample of 120 companies, for each one of the 3 consecutive years before the failure, using DA.
Three years before the failure
S(F) = -2.0081765 + X4*2.2476778 + X5*2.529652
Scores for the centers of the groups are: - class 1 = -0.33467 - class 2 = 0.33467
X4 = Earnings before interest and taxes / Financial charges
X5 = Own funds / Total debt
S(F): Score of Failure
where: "if S(F) > 0 then the corporate is classified as failed, on the contrary as non-failed"
Two years before the failure
S(F) = -1.4674983 + X2*2.3656861 + X3*-8.1494602 + X5*3.1317845
Scores for the centers of the groups are: - class 1 = -0.63454 - class 2 = 0.63454
X2 = Current assets / Total assets
X3 = Net result / Total assets
X5 = Own funds / Total debt
X7 = Stocks / Sales
S(F): Score of Failure
where: "if S(F) > 0 then the corporate is classified as failed, on the contrary as non-failed"
One year before the failure
S(F) = -0.5673571 + X3*9.364902 + X5*0.5510232
Scores for the centers of the groups are: - class 1 = -0.75698 - class 2 = 0.75698
  • 19. X3 = Net result / Total assets
X5 = Own funds / Total debt
S(F): Score of Failure
where: "if S(F) > 0 then the corporate is classified as failed, on the contrary as non-failed"
Logistic Regression (LR)
Experiments with this paradigm were carried out using the SAS software. The following are the classifiers, obtained with a sample of 120 companies, for each one of the 3 consecutive years before the failure, using LR. The following probability function:
P(F) = e^g(X) / (1 + e^g(X))
is the same for each one of the three models, where P(F) is the failure probability and "if P(F) > 0.5 then the corporate is classified as failed, on the contrary as non-failed".
Three years before the failure
g(X) = -1.3287 + X4*1.4877 + X5*1.6896
X4 = Earnings before interest and taxes / Financial charges
X5 = Own funds / Total debt
Two years before the failure
g(X) = -0.4999 + X3*-13.5892 + X5*3.8058
X3 = Net result / Total assets
X5 = Own funds / Total debt
One year before the failure
g(X) = -1.2854 + X3*26.1304 + X5*1.3535
X3 = Net result / Total assets
X5 = Own funds / Total debt
Rule Induction (CN2)
  • 20. Experiments with this paradigm were carried out using the CN2 algorithm. The following are classifiers obtained with a sample of 120 companies as a training set for each one of the 3 consecutive years before the failure, using CN2 algorithm. Models are represented with ordered if-then rules lists, as it can be seen in Figure 10, 11, and 12. Three years before the failure if ((x4 < 0.70) && (x8 < 0.11)) then X=NON-FAILED else if (( x2 > 0.52 ) && (x3 > 0.01) && ( x5 > 0.68 ) && (x9 > 0.04)) then X=FAILED else if (( x2 > 0.32 ) && ( x4 > 0.76 ) && ( x9 < 0.15 )) then X=FAILED else if (( x1 > 0.06 ) && ( x2 < 0.51 ) && ( x8 > 0.17 )) then X=NON-FAILED else if (( x1 > 0.16 ) && ( x5 < 0.56 ) && ( x7 < 0.09 ) && (x8 > 0.19)) then X=FAILED else if (( x2 > 0.88 ) && ( x6 > 0.22 )) then X=NON-FAILED else if (( x2 < 0.87 ) && ( x6 < 0.19 ) && ( x7 > 0.14 )) then X=NON-FAILED else if (( x4 < 0.39 ) && ( x6 < 0.26 )) then X=FAILED else if (( x1 < 0.98 ) && ( x6 < 0.41 )) then X=NON-FAILED else if (( x2 > 0.74 ) && ( x3 > 0.02)) then X=FAILED else if ((x4 < 0.61) && (x9 > 0.19)) then X=NON-FAILED else if ((x3 < 0.25) && ( x9 > 0.08 )) then X=FAILED else if (( x1 < 0.90 ) && ( x4 > 0.08) && ( x7 > 0.05)) then X=NON-FAILED else X=FAILED Figure 10. Rule Induction model for three years before the failure Two years before the failure if ((x4 > 0.17) && (x5 > 0.50) && (x8 > 0.19)) then X=FAILED else if (( x3 > 0.09 ) && ( x8 > 0.20 )) then X=NON-FAILED else if (( x1 > 0.06 ) && ( x7 > 0.30 ) && ( x7 < 0.44 )) then X=NON-FAILED else if (( x1 > 0.02 ) && ( x6 > 0.75 ) && ( x9 < 0.19 )) then X=FAILED else if (( x1 > 0.25 ) && ( x4 > 0.06 ) && ( x9 < 0.05 )) then X=NON-FAILED else if (( x1 > 0.11 ) && ( x2 > 0.74 ) && ( x9 < 0.24 )) then X=FAILED else if (( x4 > 0.08 ) && ( x7 > 0.22 ) && ( x7 < 0.34 )) then X=NON-FAILED else if (( x1 < 0.72 ) && ( x2 > 0.48 ) && ( x2 < 0.71 ) && ( x3 < 0.10 )) then X=FAILED else if (( x3 < 0.10 ) && ( x6 > 0.10 )) then X=NON-FAILED else if ( x2 > 0.69 ) then X=FAILED else X=NON-FAILED Figure 11. Rule Induction model for two years before the failure One year before the failure 20
  • 21. if (x4 < 0.80) then X=NON-FAILED else if (( x1 > 1.41 ) && ( x4 > 0.87 ) && ( x7 < 0.36)) then X=FAILED else if (( x4 > 0.95 ) && ( x5 < 0.22 )) then X=NON-FAILED else if (( x5 < 0.36 ) && ( x8 > 0.03 )) then X=FAILED else if (( x4 < 1.90 ) && ( x6 > 4.69 )) then X=NON-FAILED else if (( x1 > 0.81 ) && ( x1 < 1.38 ) && ( x7 < 0.23 )) then X=FAILED else if (( x8 > 0.17 ) && ( x7 < 0.67 )) then X=NON-FAILED else X=FAILED Figure 12. Rule Induction model for one year before the failure The failure probability is always 100%, whether the corporate is classified as failed or as non-failed. For The Classification Trees and Bayesian Networks paradigms, variables has been categorized into three categories. The following is the explanation of how this process has been carried out for each one of the 3 years. For each group of 60 healthy companies and 60 of failed companies, and for each one of the nine ratio-variables means were calculated, in order to replace the missing values that the original sample contained. Later, for each ratio-variable, using FREQUENCIES and DESCRIPTIVES procedures from the SPSS software, their distributions were analyzed. And finally, the values of each ratio-variable were separated in three categories - 1, 2, and 3 -, according to their cumulative percentages. The next figure shows the process of categorization. *******************"3 years before the failure"************************ x1 (lowest through 0.30=1) (0.31 through 0.66=2) (0.67 through highest=3) x2 (lowest through 0.63=1) (0.64 through 0.77=2) (0.78 through highest=3) x3 (lowest through 0.02=1) (0.03 through 0.06=2) (0.07 through highest=3) x4 (lowest through 0.21=1) (0.22 through 0.61=2) (0.62 through highest=3) x5 (lowest through 0.24=1) (0.25 through 0.51=2) (0.52 through highest=3) x6 (lowest through 0.32=1) (0.33 through 0.68=2) (0.69 through highest=3) x7 (lowest through 0.11=1) (0.12 through 0.22=2) (0.23 through highest=3) x8 (lowest through 0.21=1) (0.22 through 0.32=2) (0.33 through highest=3) x9 (lowest through 0.07=1) (0.08 through 0.16=2) (0.17 through highest=3) ******************"2 years before the failure"************************ x1 (lowest through 0.26=1) (0.27 through 0.62=2) (0.63 through highest=3) x2 (lowest through 0.61=1) (0.62 through 0.74=2) (0.75 through highest=3) x3 (lowest through 0.02=1) (0.03 through 0.06=2) (0.07 through highest=3) x4 (lowest through 0.24=1) (0.25 through 0.49=2) (0.50 through highest=3) Page - 21 -
  • 22. x5 (lowest through 0.20=1) (0.21 through 0.43=2) (0.44 through highest=3)
x6 (lowest through 0.31=1) (0.32 through 0.59=2) (0.60 through highest=3)
x7 (lowest through 0.13=1) (0.14 through 0.24=2) (0.25 through highest=3)
x8 (lowest through 0.24=1) (0.25 through 0.36=2) (0.37 through highest=3)
x9 (lowest through 0.05=1) (0.06 through 0.12=2) (0.13 through highest=3)
*******************"1 year before the failure"***********************
x1 (lowest through 1.05=1) (1.06 through 1.46=2) (1.47 through highest=3)
x2 (lowest through 0.63=1) (0.64 through 0.77=2) (0.78 through highest=3)
x3 (lowest through -0.01=1) (0.0 through 0.04=2) (0.05 through highest=3)
x4 (lowest through 0.89=1) (0.90 through 1.89=2) (1.90 through highest=3)
x5 (lowest through 0.32=1) (0.33 through 0.85=2) (0.86 through highest=3)
x6 (lowest through 2.89=1) (2.90 through 5.89=2) (5.90 through highest=3)
x7 (lowest through 0.11=1) (0.12 through 0.22=2) (0.23 through highest=3)
x8 (lowest through 0.21=1) (0.22 through 0.32=2) (0.33 through highest=3)
x9 (lowest through -0.05=1) (-0.04 through 0.05=2) (0.06 through highest=3)
Figure 6. The process of categorization of ratio variables
Classification Trees (CART)
To carry out experiments with this paradigm the SPAD.N and SPAD.S software have been used. The following are the classifiers obtained with a sample of 120 companies as a training set, for each one of the 3 consecutive years before the failure, using the procedure DISAR. The models that represent these classifiers are the binary classification trees shown in Figures 7, 8 and 9.
Three years before the failure
(Binary classification tree diagram; root node of 120 companies, H=60 and F=60, first split on X7.)
Figure 7. Classification Tree model for three years before the failure
Two years before the failure
(Binary classification tree diagram; root node of 120 companies, H=60 and F=60, first split on X5.)
Figure 8. Classification Tree model for two years before the failure
One year before the failure
(Binary classification tree diagram; root node of 120 companies, H=60 and F=60, first split on X4.)
Figure 9. Classification Tree model for one year before the failure
Where:
F: represents the class failed (Failed)
H: represents the class non-failed (Healthy)
FINAL NODE: is a leaf node where the process ends, and where the letter F or H indicates that an individual reaching this node is assigned the class failed or non-failed respectively.
The failure probability is calculated as H / (H + F) when the corporate reaching the leaf node is classified as non-failed, or as F / (H + F) when it is classified as failed.
Bayesian Networks
Experiments with Bayesian Networks were carried out using the HUGIN software, Genetic Algorithms and the Markov Blanket approach. The models, obtained with a sample of 120 companies as a training set for each one of the 3 years prior to the failure, are shown in Figures 13, 14 and 15. Initially, P(X) = 0.50, which means that there is the same probability that the corporate will fail as that it will not. After giving real values to the nine variables, its destiny is predicted in the following way:
- for three years before the failure: "if P(X) > 0.30 then the corporate is classified as non-failed, on the contrary as failed"
  • 27. - for two years before the failure: "if P(X) > 0.40 then the corporate is classified as non-failed, on the contrary as failed"
- for one year before the failure: "if P(X) > 0.45 then the corporate is classified as non-failed, on the contrary as failed"
Three years before the failure
(Bayesian Network diagram relating the class node X = NON-FAILED / FAILED to the nine categorized ratio variables, with their marginal probability tables.)
Figure 13. Bayesian Networks model for three years before the failure
Two years before the failure
(Bayesian Network diagram relating the class node X = NON-FAILED / FAILED to the nine categorized ratio variables, with their marginal probability tables.)
Figure 14. Bayesian Networks model for two years before the failure
One year before the failure
(Bayesian Network diagram relating the class node X = NON-FAILED / FAILED to the nine categorized ratio variables, with their marginal probability tables.)
Figure 15. Bayesian Networks model for one year before the failure
The results obtained with the aforementioned models are summarized in the next two tables with the percentage of well-classified companies for each paradigm and period of time (1 year, 2 years and 3 years before the failure). Table 1 summarizes these results regarding the goodness of fit of each one of the paradigms, calculated with the 5-fold cross-validation method explained in Section 2.
Years before the failure    DA      LR      CART    CN2     Bayesian Networks
1                           78.33   82.50   79.17   80.00   60.83
2                           69.17   69.16   60.00   66.66   62.00
3                           55.00   55.00   45.00   57.50   60.83
Table 1. Results regarding the 5-fold cross-validation
As shown in Table 1, the further back in time we go, the worse the results are, although for Bayesian Networks they remain more or less constant over time. These results can be compared with the ones shown in Table 2, which gives the goodness of fit of each paradigm calculated with the second validation method proposed in Section 2.
Years before the failure    DA      LR      CART    CN2     Bayesian Networks
1                           81.00   82.50   89.00   95.17   91.17
2                           72.17   71.51   84.50   91.17   92.00
3                           59.84   60.33   76.67   88.00   92.17
Table 2. Results regarding the second validation
In this case it is worth mentioning the good behavior of paradigms such as CN2 and Bayesian Networks, as well as the Classification Trees. The goal of any algorithm for the integration of classification results is to generate more certain, precise and accurate results. Two experiments were conducted in order to compare the performance of the aforementioned methods and combining techniques. The following two tables summarize the results obtained by applying the combining methods - Voting By Majority Principle and Bayesian Formalism - to the classification results produced with 5-fold cross-validation (explained in Section 2).
Years before the failure    DA      LR      CART    CN2     Bayesian Networks    Voting By Majority
1                           78.33   82.50   79.17   80.00   60.83                88.33
2                           69.17   69.16   60.00   66.66   62.00                79.17
3                           55.00   55.00   45.00   57.50   60.83                73.33
Table 3. Results regarding the 5-fold cross-validation and the Voting By Majority Principle
As expected, the Voting By Majority Principle has succeeded in improving the classification results, being better than the best individual classifier for any of the 3 years. These results can be compared with the ones obtained by applying the Bayesian Formalism combining method, which are shown in Table 4.
Years before the failure    DA      LR      CART    CN2     Bayesian Networks    Bayesian Formalism
1                           78.33   82.50   79.17   80.00   60.83                73.33
2                           69.17   69.16   60.00   66.66   62.00                75.83
3                           55.00   55.00   45.00   57.50   60.83                62.50
Table 4. Results regarding the 5-fold cross-validation and the Bayesian Formalism
As can be noticed, the Bayesian Formalism presents better results than the best individual classifier only for 2 and 3 years before the failure; in the case of 1 year it is, surprisingly, better only than the worst individual classifier. It is interesting to mention that when the second validation method is used to produce the classification results of the individual classifiers, these two combining methods reach 100% correct classification for each one of the 3 years. This seems too good to be true, but it can be understood as follows: the more classifiers we have (in this case there are 25 models, 5 for each paradigm), the higher the probability that the majority of them vote for the real class, and likewise that the Bayesian Formalism reaches 100%.
In the following section a comparison of the results is presented, trying to reach some conclusions about the performance of the individual classifiers and their integration.
5. Conclusions and Future Work
Techniques coming from Statistics and Artificial Intelligence have been applied to the problem of corporate failure prediction over different periods of time.
The discrepancy between the results obtained when the training sets contain 96 of the 120 companies and the test set contains all 120 of them, and those obtained with 5-fold cross-validation, suggests the existence of an overfitting problem in the Bayesian Networks (Markov Blankets) and CN2 paradigms. It is possible that these problems could be relieved by penalizing complex structures in the learning process of these paradigms - see Akaike's criterion, Minimum Description Length, … - and by trying to guide the search by the parsimony principle.
  • 32. It is interesting to see that the Discriminant Analysis and Logistic Regression models select the same ratio-variables as significant ones for all of the three years, and that the percentages of well-classified individuals are rather similar, the Logistic Regression classifier being a little better. Observing the Classification Trees it is easy to notice that the closer the date of failure, the smaller the depth of the tree; that is, the closer the date of failure, the fewer variables need to be examined and the sooner the conclusions about the "destiny" of the individuals in the test set are reached. This phenomenon is also present in the ordered if-then rule lists: the closer the date of failure, the fewer rules need to be checked to determine the "destiny" of the corporate. For three years before the failure there are thirteen rules in the list, ten for two years and just seven for one year. The Bayesian Networks paradigm provides another aspect of interest. Observing the three networks it can be seen that the closer the date of failure, the more connected the nodes are - there are more dependencies between the variables - and the greater the thresholds.
Analyzing the classification results when the combining techniques are used, it can be seen that they really achieve their main goal, to improve the performance of the individual models. The Voting By Majority Principle is quite a bit better than the Bayesian Formalism, which for one year before the failure only improves on the performance of the worst individual classifier. Another aspect which deserves further investigation is the development or implementation of procedures that would achieve even better results than the two used in this project. There are several approaches. One of them is to also use other kinds of information outputted by the individual classifiers, such as the probability that the class assigned to individual X is the right one. Another one is to guide the integrated classifier by the goodness of each one of the individual classifiers.
  • 33. ACKNOWLEDGEMENTS The authors wish to thank Dr. Fermin Lizarraga for providing the original data sample and his doctoral dissertation, which was the guidance for comprehension of the corporate failure problem, and of the use of financial ratios for failure prediction. The authors acknowledge Miren Josune Gallego for helpful guidance in the use of the software packages, specially with the SAS software. The authors would also like to thank the Diputación Foral de Guipuzcoa since this study was supported under grant no. OF 131/1997. REFERENCES • Altman, E.I. (1968). Financial Ratios, Discriminant Analysis and the Prediction of Business Failure. Journal of Finance, 589-609. • Andersen, S.K., Olesen, K.G., Jensen, F.V. and Jensen, F. (1989). HUGIN - a shell for building Bayesian belief universes for Expert Systems. Eleventh International Joint Conference on Artificial Intelligence, vol. I, 1128-1133. • Argenti, J. (1976). Corporate Collapse: the Causes and Symptoms. McGraw-Hill. London. • Beaver, W. (1966). Financial Ratios as Predictors of Failure. Empirical Research in Accounting: Selected Studies. Supplement of Journal of Accounting Research, 71-111. • Blum, M. (1974). Failing Company Discriminant Analysis. Journal of Accounting Research, 1-23. • Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and Regression Trees. Monterey, CA: Wadswooeth and Brooks. • Casey, C.J. and Baztczak, N.J. (1985). Cash-Flow: it’s not the bottom line. Harvard Business Review, 61-66. • Castillo, E., Gutierrez, J.M. and Hadi, A.S. (1997). Expert Systems and Probabilistic Network Models. Springer-Verlag. • Cestnik, B., Kononenko, I. and Bratko, I. (1987). ASSISTANT 86: A knowledge-elicitation tool for sophisticated users, in Bratko, I. and Lavrac, N. (Eds.) Progress in Machine Learning, Sigma Press, Wilmslow. • Clark, P. and Niblett, T. (1989). The CN2 Induction Algorithm, Machine Learning, 3(4), 261-283. • Deakin, E.B. (1972). A Discriminant Analysis of Predictors of Business Failure. Journal of Accounting Research, 167-179. • Dizdarevic, S., Lizarraga F., Larrañaga P., Sierra B. and Gallego M.J. (1997). Statistical and Machine Learning Methods in the prediction of Bankruptcy. International Meeting on Artificial Intelligence in Accounting Finances and Taxes, Huelva, Spain, 85-100. • Edmister, R.O. (1972). An Empirical Test of Financial Ratio Analysis for Small Business Failure Prediction. Journal of Financial and Quantitative Analysis. Vol. 7, 1477-1493. • Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179-188. • Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA. • Ho T.K., Hull J.J. and Srihari S.N. (1994). Decision Combination in Multiple Classifier Systems, IEEE Transactions on Pattern analysis and machine intelligence, Vol. 16. 1. January, 66-75. • Hosmer, D. W. and Lemeshow, S. (1989). Applied Logistic Regression. Wiley Series in Probability and Mathematical Statistics. • Hunt, E.B., Marin, J. and Stone P.J. (1966). Experiments in Induction, Academic Press. • Jensen, F.V. (1996). Introduction to Bayesian networks. University College of London. Page - 33 -
  • 34. • Larrañaga, P., Murga, R., Poza, M. and Kuijpers, C. (1996a). Structure Learning of Bayesian Networks by Hybrid Genetic Algorithms. Learning from Data: AI and Statistics V, Lecture Notes in Statistics 112. D. Fisher, H.-J. Lenz (eds.), New York, NY: Spriger- Verlag, 165-174. • Larrañaga, P., Poza, M., Yurramendi, Y., Murga, R. and Kuijpers, C. (1996b). Structure Learning of Bayesian Networks by Genetic Algorithms: A Performance Analysis of Control Parameters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18, 912-926. • Larrañaga, P., Kuijpers, C., Murga, R. and Yurramendi, Y. (1996c). Bayesian Network Structures by searching for the best ordering with genetic algorithms. IEEE Transactions on System, Man and Cybernetics. Vol. 26, no. 4, 487-492. • Larrañaga, P., Kuijpers, C., Murga, R., Yurramendi, Y., Graña, M., Lozano, J.A., Albizuri, X., D'Anjou, A. and Torrealdea, F.J. (1996d). Genetic Algorithms applied to Bayesian Networks. A. Gammerman (ed.). Computational Learning and Probabilistic Reasoning. John Wiley, 211-234. • Larrañaga, P., Kuijpers, C., Poza, M. and Murga, R. (1997). Decomposing Bayesian Networks by Genetic Algorithms. Statistics and Computing. No 7, 19-34. • Lauritzen, S.L. (1996). Graphical models. Oxford Science Publications. • Lauritzen, S.L., and Spiegelhalter, D.J. (1988). Local computations with probabilities on graphical structures and their application on Expert Systems. J.R. Statist. Soc. B, vol. 50, no. 2, 157-224. • Libby, R. (1975). Accounting Ratios and the Prediction of Failure: Some Behavioral Evidence. Journal of Accounting Research, 150-161. • Lizarraga, F. (1996). Modelos Multivariantes de predicción del fracaso empresarial: una aplicación a la realidad de la información contable española. Ph.D. Public University of Navarra. • Mensah, Y. (1983). The Differential Bankruptcy Predictive Ability of Specific Price Level Adjustments: Some Empirical Evidence, Accounting Review, 228-246. • Michie, D., Spiegelhalter, D. J. and Taylor, C. C. (1994). Machine Learning, Neural and Statistical Classification. Ellis Horwood Series in Artificial Intelligence. New York. • Ohlson, J.A. (1980). Financial Ratios and the Probabilistic Prediction of Bankruptcy. Journal of Accounting Research, 18, 1, 109-111. • Patterson, A. and Niblett, T.(1983). ACLS user manual, Intelligent Terminals Ltd., Glasgow. • Peel, M.J. and Peel, D.A. (1987). Some Further Empirical Evidence on Predicting Private Company Failure. Accounting and Business Research, 18, 69, 57-66. • Platt, H.D. (1985). Why Companies Fail: Strategies for Detecting Avoiding and Profiting from Bankruptcy. Lexington Books. Massachusetts. • Quinlan, J.R. (1979). Discovering rules by induction from large collection of examples, in Expert systems in the micro electronic age, Edinburgh University Press. • Quinlan, J.R. (1986). Induction of Decision Trees, Machine Learning, 1(1), 81-106. • SAS Institute Inc. (1993). SAS Language: Reference, Version 6, SAS Institute Inc. • Scott, J. (1981). The probability of Bankruptcy: A Comparison of Empirical Predictions and Theoretical Models. Journal of Banking and Finance, 317-344. • Sierra B. and Larrañaga P. (1997). Searching for the optimal Bayesian Network in classification tasks by Genetic Algorithms, WUPES 97, 144-154. • SPAD.S (1994). Version 2.5 Sistema Compatible para el Análisis de Datos. • SPSS Inc. (1990). SPSS Reference Guide, SPSS INC. • Stone, M. (1974). Cross-validation choice and assessment of statistical procedures. 
Journal of the Royal Statistical Society, Series B, 36, 111-147. • Taffler, R. (1982). Finding those Firms in Danger. Accountancy Age, 16. • The Turing Institute (1988). The CN2 Induction Algorithm. The Turing Institute, 36 N. Hanover St., Glasgow, G1 2AD, U.K. October.
  • 35. • Xu, L., Krzyzak, A. and Suen, C.Y. (1992). Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition. IEEE Transactions on Systems, Man and Cybernetics, Vol. 22, no. 3, 418-435. • Zavgren, C. (1985). A Probabilistic Model of Financial Distress. Ph.D. The University of Nebraska.