Computing and Information Systems, 7 (2000), pp. 91-97. © University of Paisley 2000


Decision Trees as a Data Mining Tool

Bruno Crémilleux


The production of decision trees is usually regarded as an automatic method for discovering knowledge from data: trees stem directly from the data without further intervention. However, we cannot expect acceptable results if we naively apply machine learning to arbitrary data. By reviewing the whole process, together with the other work which implicitly has to be done to generate a decision tree, this paper shows that the method has to be placed within the knowledge discovery in databases process and that, in fact, the user has to intervene both during the core of the method (building and pruning) and during the associated tasks.

1. INTRODUCTION

Data mining and Knowledge Discovery in Databases (KDD) are fields of increasing interest combining databases, artificial intelligence, machine learning and statistics. Briefly, the purpose of KDD is to extract from large amounts of data non-trivial "nuggets" of information in an easily understandable form. Such discovered knowledge may be, for instance, regularities or exceptions.

The decision tree is a method which comes from the machine learning community and explores data. Such a method is able to give a summary of the data (which is easier to analyze than the raw data) or can be used to build a tool (for example a classifier) to help the user in many different decision-making tasks. Broadly speaking, a decision tree is built from a set of training data having attribute values and a class name. The result of the process is represented as a tree whose nodes specify attributes and whose branches specify attribute values. Leaves of the tree correspond to sets of examples with the same class, or to elements for which no more attributes are available. The construction of decision trees is described, among others, by Breiman et al. (1984) [1], who present an important and well-known monograph on classification trees. A number of standard techniques have been developed, for example the basic algorithms ID3 [20] and CART [1]. A survey of decision tree classifiers and the various existing issues is presented in Safavian and Landgrebe [25].

Usually, the production of decision trees is regarded as an automatic process: trees are straightforwardly generated from data and the user is relegated to a minor role. Nevertheless, we think that this method intrinsically requires the user, not only during the step of data preparation, but also during the whole process. In fact, the use of decision trees can be embedded in the KDD process within its main steps (selection, preprocessing, data mining, interpretation / evaluation). The aim of this paper is to show the role of the user and to connect the use of decision trees with the data mining framework.

This paper is organized as follows. Section 2 outlines the core of the decision tree method (i.e. building and pruning). The literature usually presents these points from a technical side without describing the part played by the user: we will see that he has a role to play. Section 3 deals with associated tasks which are, in fact, absolutely necessary. These tasks, where the user clearly has to intervene, are often not emphasized when we speak of decision trees. We will see that they are highly relevant and that they act upon the final result.

2. BUILDING AND PRUNING

2.1 Building decision trees: choice of an attribute selection criterion

In the induction of decision trees, various attribute selection criteria are used to estimate the quality of attributes in order to select the best one to split on. We know at a theoretical level that criteria derived from an impurity measure have suitable properties for generating decision trees and perform comparably (see [10], [1] and [6]). We call such criteria C.M. criteria (concave-maximum criteria) because an impurity measure, among other characteristics, is defined by a concave function. The most commonly used criteria, namely the Shannon entropy (in the family of ID3 algorithms) and the Gini criterion (in CART algorithms, see [1] for details), are C.M. criteria.
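To make the notion of a C.M. (impurity-based) criterion concrete, here is a minimal sketch in Python. It computes the Shannon entropy and the Gini impurity of a class-count vector and the impurity decrease obtained by a candidate split; the class counts in the example are hypothetical. It is only an illustration of the principle, not code from any of the systems cited here.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class-count vector."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gini(counts):
    """Gini impurity of a class-count vector."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def impurity_decrease(parent, children, impurity=entropy):
    """Decrease of impurity obtained by splitting `parent` into `children`
    (each argument is a class-count vector); this is the quantity a
    C.M.-style criterion maximises over the candidate attributes."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted

# Hypothetical class counts for one candidate split.
parent = (40, 30, 30)
children = [(35, 5, 0), (5, 25, 30)]
print(impurity_decrease(parent, children))          # entropy gain
print(impurity_decrease(parent, children, gini))    # Gini decrease
```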
Nevertheless, other paradigms exist for building decision trees. For example, Fayyad and Irani [10] claim that grouping attribute values and building binary trees yields better trees. For that purpose, they propose the ORT measure. ORT favours attributes that simply separate the different classes, without taking into account the number of examples in the nodes, so that ORT produces trees with small pure (or nearly pure) leaves at their top more often than C.M. criteria do.

To better understand the differences between C.M. and ORT criteria, let us consider the data set given in the appendix and the trees induced from it, depicted in Figure 1: the tree built with a C.M. criterion is shown at the top and the tree built with the ORT criterion at the bottom. ORT immediately produces the pure leaf Y2 = y21, while the C.M. criterion splits this population and only reaches the corresponding leaves later.

Figure 1: An example of C.M. and ORT trees (class counts are given as (d1, d2, d3)).
C.M. tree: the root (2500,200,2500) is split on Y1 into (2350,150,150) for Y1 = y11 and (150,50,2350) for Y1 = y12; each of these nodes is then split on Y2, giving the leaves (0,150,0) for Y2 = y21 and (2350,0,150) for Y2 = y22 under y11, and (0,50,0) and (150,0,2350) under y12.
ORT tree: the root (2500,200,2500) is split on Y2 into the pure leaf (0,200,0) for Y2 = y21 and the node (2500,0,2500) for Y2 = y22, which is then split on Y1 into the leaves (2350,0,150) and (150,0,2350).
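The comparison of Figure 1 can be checked numerically from the class counts given in the appendix. The sketch below scores the two possible root splits with the entropy gain (a C.M. criterion) and with an ORT-style measure; we assume here that ORT can be taken as one minus the cosine of the angle between the class-count vectors of the two subnodes, which should be checked against the exact definition in [10]. With these counts, the entropy gain prefers the split on Y1 while ORT prefers the pure split on Y2, as in Figure 1.

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gain(parent, children):
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)

def ort(left, right):
    # Assumed form of the ORT measure: one minus the cosine of the angle
    # between the class-count vectors of the two subnodes.
    dot = sum(l * r for l, r in zip(left, right))
    norm = math.sqrt(sum(l * l for l in left)) * math.sqrt(sum(r * r for r in right))
    return 1.0 - dot / norm

root = (2500, 200, 2500)                          # class counts (d1, d2, d3)
split_y1 = [(2350, 150, 150), (150, 50, 2350)]    # Y1 = y11 / y12
split_y2 = [(0, 200, 0), (2500, 0, 2500)]         # Y2 = y21 / y22

print("entropy gain:", gain(root, split_y1), gain(root, split_y2))  # Y1 scores higher
print("ORT:", ort(*split_y1), ort(*split_y2))                       # Y2 scores higher
```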
We give here just a simple example, but others, both in artificial and real-world domains, are detailed in [6]: they show that the ORT criterion produces trees with small leaves at their top more often than C.M. criteria do. We also see in [6] that overspecified leaves obtained with C.M. criteria tend to be small and located at the bottom of the tree (thus easy to prune), while leaves at the bottom of ORT trees can be large. In uncertain domains (we return to this point in the next section), such leaves produced by ORT may be irrelevant, and it is difficult to prune them without destroying the tree.

Let us note that other selection criteria, such as the gain ratio criterion, address other specific issues. The gain ratio criterion proposed by Quinlan [20], derived from the entropy criterion, is designed to avoid favouring attributes with many values. Indeed, in some situations, selecting an attribute essentially because it has many values may jeopardize the semantic acceptance of the induced trees ([27] and [18]). The J-measure [15] is the product of two terms that Goodman and Smyth consider to be the two basic criteria for evaluating a rule: one term is derived from the entropy function and the other measures the simplicity of a rule. Quinlan and Rivest [21] used the minimum description length principle to construct a decision tree minimizing the misclassification rate when one looks for general rules together with the exceptional conditions of particular cases. This principle has been taken up by Wallace and Patrick [26], who suggest some improvements and show that they generally obtain better empirical results than those found by Quinlan. Buntine [3] presents a tree learning algorithm stemming from Bayesian statistics whose main objective is to provide good predicted class probabilities at the nodes.

We can also address the question of deciding which subnodes have to be built. For a given split, the GID3* algorithm [12] groups into a single branch the values of an attribute which are estimated to be meaningless compared to its other values. For building binary trees, another criterion is twoing [1]. Twoing groups the classes into two superclasses such that, when the split is considered as a two-class problem, the greatest decrease in node impurity is achieved. Some properties of twoing are described in Breiman [2]. Regarding binary decision trees, let us note that in some situations users do not always agree to group values, since this can yield meaningless trees; non-binary trees must therefore not be definitively discarded.
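For completeness, the twoing value of a binary split can be sketched as follows, using the form of the criterion commonly quoted for CART, twoing = (pL pR / 4) (sum_j |p(j|tL) - p(j|tR)|)^2; readers should refer to [1] for the exact definition. On the root splits of the appendix data, twoing, like the entropy gain, prefers the split on Y1.

```python
def twoing(left, right):
    """Twoing value of a binary split described by two class-count vectors,
    in the form usually quoted for CART:
    (pL * pR / 4) * (sum_j |p(j|left) - p(j|right)|) ** 2."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    p_left, p_right = n_left / n, n_right / n
    diff = sum(abs(l / n_left - r / n_right) for l, r in zip(left, right))
    return (p_left * p_right / 4.0) * diff ** 2

# Root splits of the appendix data set (class counts for d1, d2, d3).
print(twoing((2350, 150, 150), (150, 50, 2350)))   # split on Y1 (higher value here)
print(twoing((0, 200, 0), (2500, 0, 2500)))        # split on Y2
```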
So, we have seen that there are many attribute selection criteria and, even if some of them can be gathered into families, a choice has to be made. In our view, the choice of a paradigm depends on whether the data sets used embed uncertainty or not, whether the phenomenon under study admits deterministic causes, and what level of intelligibility is required.

In the next section, we move on to the pruning stage.

2.2 Pruning decision trees: what about the classification and the quality?

We know that in many areas, such as medicine, data are uncertain: there are always some examples which escape the rules. Translated into the context of decision trees, this means that some examples seem similar but in fact differ in their classes. In these situations, it is well known (see [1], [4]) that decision tree algorithms tend to divide nodes having few examples and that the resulting trees tend to be very large and overspecified. Some branches, especially towards the bottom, are present owing to sample variability and are statistically meaningless (one can also say that they are due to noise in the sample). Such branches must either not be built or be pruned. If we do not want to build them, we have to set out rules to stop the growth of the tree; however, it is known to be better to generate the entire tree and then to prune it (see for example [1] and [14]). Pruning methods (see [1], [19], [20]) try to cut such branches in order to avoid this drawback.
The principal methods for pruning decision trees are examined in [9] and [19]. Most of these methods are based on minimizing a classification error rate, where each element of a node is classified into the most frequent class of that node; this error rate is estimated with a test file or with statistical methods such as cross-validation or the bootstrap.
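As an illustration only, with a modern library rather than the software used in this work, the error rate of a tree used as a classifier can be estimated by cross-validation while varying the amount of pruning; here the pruning strength is controlled through scikit-learn's cost-complexity parameter, in the spirit of CART [1].

```python
# Illustrative only: estimating the classification error rate of more or less
# strongly pruned trees by 10-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for ccp_alpha in (0.0, 0.01, 0.05):              # increasing pruning strength
    tree = DecisionTreeClassifier(criterion="entropy",
                                  ccp_alpha=ccp_alpha,
                                  random_state=0)
    accuracy = cross_val_score(tree, X, y, cv=10).mean()
    print(ccp_alpha, 1.0 - accuracy)             # estimated error rate
```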
These pruning methods are derived from situations where the built tree will be used as a classifier, and they systematically discard a sub-tree which does not improve the classification error rate. Let us consider the sub-tree depicted in Figure 2. D is the class and it is here bivalued. In each node, the first (resp. second) value indicates the number of examples having the first (resp. second) value of D. This sub-tree does not lessen the error rate, which is 10% both at its root and in its leaves; nevertheless, the sub-tree is of interest since it points out a specific sub-population with a constant value of D, while in the remaining population it is impossible to predict a value for D.

Figure 2: A tree which can be interesting although it does not decrease the number of errors: the root (90,10) is split into the leaves (79,0) and (11,10).
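A minimal sketch of such an error-based pruning rule, applied to the class counts of Figure 2, makes the point concrete: since replacing the sub-tree by a leaf leaves the number of errors unchanged (10 out of 100 in both cases), the rule discards the split, although the left leaf isolates a pure sub-population. The tree representation and the pruning rule below are ours, not those of the methods cited above.

```python
def leaf_errors(counts):
    """Errors made on a node's examples when they are all assigned
    to the node's most frequent class."""
    return sum(counts) - max(counts)

def prune(tree):
    """Bottom-up, error-based pruning of a tree given as nested dicts
    {'counts': (...), 'children': [...]}.  A sub-tree is replaced by a leaf
    whenever this does not increase the number of errors."""
    if not tree.get('children'):
        return tree, leaf_errors(tree['counts'])
    pruned_children, subtree_errors = [], 0
    for child in tree['children']:
        pc, e = prune(child)
        pruned_children.append(pc)
        subtree_errors += e
    if leaf_errors(tree['counts']) <= subtree_errors:
        return {'counts': tree['counts'], 'children': []}, leaf_errors(tree['counts'])
    return {'counts': tree['counts'], 'children': pruned_children}, subtree_errors

# The sub-tree of Figure 2: root (90,10) split into (79,0) and (11,10).
figure2 = {'counts': (90, 10),
           'children': [{'counts': (79, 0), 'children': []},
                        {'counts': (11, 10), 'children': []}]}
print(prune(figure2))   # the split is discarded: it does not reduce the error rate
```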
In [5], we proposed a pruning method (called C.M. pruning, because a C.M. criterion is used to build the entire tree) suitable for uncertain domains. C.M. pruning builds a new attribute binding the root of a tree to its leaves, the attribute's values corresponding to the branches leading to a leaf. It permits computation of the global quality of a tree. The best sub-tree to prune is the one that yields the highest-quality pruned tree. This pruning method is not tied to the use of the pruned tree as a classifier.

This work has been taken up in [13]. In uncertain domains, a deep tree is less relevant than a small one: the deeper a tree, the less understandable and reliable it is. So a new quality index (called DI, for Depth-Impurity) has been defined in [13]. It manages a trade-off between the depth and the impurity of each node of a tree. From this index, a new pruning method (denoted DI pruning) has been derived. Compared with C.M. pruning, DI pruning introduces a damping function to take into account the depth of the leaves. Moreover, by giving the quality of each node (and not only of a sub-tree), DI pruning is able to distinguish sub-populations of interest in large populations or, on the contrary, to highlight sets of examples with high uncertainty (in the context of the studied problem). In the latter case, the user has to come back to the data to try to improve their collection and preparation. Getting the quality of each node is a key point in uncertain domains.
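The exact DI index is defined in [13]; the following toy sketch only illustrates the general idea of trading impurity against depth. Each leaf is scored by its purity damped by a factor gamma raised to the power of its depth, and the quality of a node is the example-weighted average over its leaves; the geometric damping factor gamma is our own simplification of the damping function mentioned above.

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def quality(node, n_classes, gamma=0.9, depth=0):
    """Toy depth-impurity quality of a (sub-)tree given as nested dicts
    {'counts': (...), 'children': [...]}: example-weighted average over the
    leaves of (1 - normalised entropy) * gamma**depth.  Illustrative only."""
    counts = node['counts']
    if not node.get('children'):
        purity = 1.0 - entropy(counts) / math.log2(n_classes)
        return purity * (gamma ** depth)
    n = sum(counts)
    return sum(sum(ch['counts']) / n * quality(ch, n_classes, gamma, depth + 1)
               for ch in node['children'])

tree = {'counts': (90, 10),
        'children': [{'counts': (79, 0), 'children': []},
                     {'counts': (11, 10), 'children': []}]}
print(quality(tree, n_classes=2))   # quality of the sub-tree of Figure 2
```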
So, regarding the pruning stage, the user is confronted with several questions:
- am I interested in obtaining a quality value for each node?
- is there uncertainty in the data?
He also has to know which use of the tree is pursued:
- a tree can be an efficient description oriented by an a priori classification of its elements; pruning the tree then discards overspecific information to get a more legible description.
- a tree can be built to highlight reliable sub-populations; here only some leaves of the pruned tree will be considered for further investigation.
- the tree can be transformed into a classifier for any new element of a large population.
The choice of a pruning strategy is tied to the answers to these questions.

3. ASSOCIATED TASKS

We indicate in this section when and how the user, by means of various associated tasks, intervenes in the process of developing decision trees. Schematically, these tasks concern gathering the data for the design of the training set, the encoding of the attributes, the specific analysis of examples, the analysis of the resulting tree, and so on.

Generally, these tasks are not emphasized in the literature; they are usually considered as secondary, but we will see that they are highly relevant and that they act upon the final result. Of course, these tasks intersect with the building and pruning work that we have previously described.

In practice, apart from the building and pruning steps, there is another step: data preparation. We add a fourth step, which aims to study the classification of new examples by a - potentially pruned - tree. The user strongly intervenes during the first step, but also has a supervising role during all steps and, more particularly, a critical role after the second and third steps (see Figure 3). We do not detail here the fourth step, which is marginal from the point of view of the user's role.

Figure 3: Process to generate decision trees and relations with the user. The decision tree software chains four steps - data manipulation, building, pruning and classification - producing the data set, the entire tree, the pruned tree and the results of the classification; the user prepares the data set and then checks and intervenes at each subsequent step.
3.1 Data preparation

The aim of this step is to supply, from the database gathering the examples in their raw form, a training set as well adapted as possible to the development of decision trees. This is the step where the user intervenes most directly. His tasks are numerous: deleting examples considered as aberrant (outliers) and/or containing too many missing values; deleting attributes evaluated as irrelevant to the given task; re-encoding the attribute values (one knows that if the attributes have very different numbers of values, those having more values tend to be chosen first ([27] and [18]); we have already referred to this point with the gain ratio criterion); re-encoding several attributes (for example, the fusion of attributes); segmenting continuous attributes; analyzing missing data; and so on.

Let us come back to some of these tasks. At first [16], decision tree algorithms did not accept quantitative attributes, which had to be discretized. This initial segmentation can be done by asking experts to set thresholds or by using a strategy relying on an impurity function [11]. The segmentation can also be done while building the tree, as is the case with the software C4.5 [22]; a continuous attribute can then be segmented several times in the same tree. It seems relevant to us that the user may actively intervene in this process, for example by indicating an a priori discretization of the attributes for which it is meaningful and by letting the system manage the others. One should remark that, if one knows in a reasonable way how to split a continuous attribute into two intervals, the question is more delicate for a three-valued (or more) discretization.
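A minimal sketch of impurity-driven segmentation of a continuous attribute: the threshold is chosen so as to minimise the weighted entropy of the two resulting intervals. This is only the simplest, single-threshold variant; the multi-interval method of [11] also decides, with a stopping criterion, how many cut points to keep. The age attribute and class labels below are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Binary, entropy-minimising discretization of a continuous attribute:
    try a cut point between each pair of consecutive distinct values and keep
    the one giving the lowest weighted entropy of the two intervals."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float('inf'), None)
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue
        cut = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        best = min(best, (score, cut))
    return best[1]

# Hypothetical example: an age attribute against a two-valued class.
ages   = [22, 25, 31, 38, 45, 52, 61, 70]
labels = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
print(best_threshold(ages, labels))   # a cut point at 41.5
```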
The user also generally has to decide on the deletion, re-encoding or fusion of attributes. He has a priori ideas allowing a first pass at this task, but we shall see that the tree construction, by making the underlying phenomenon explicit, suggests to the user new re-encodings and/or fusions of attributes, often leading to a more general description level.

Current decision tree construction algorithms most often deal with missing values by means of specific and internal treatments [7]. On the contrary, by a preliminary analysis of the database, relying on the search for associations in the data and leading to uncertain rules that determine the missing values, Ragel ([7], [24]) offers a strategy where the user can intervene: such a method leaves a place for the user and his knowledge, who may delete, add or modify some rules.

As we can see, this step in fact depends a lot on the user's work.
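A minimal sketch of the kind of rule-based treatment of missing values evoked above (ours, not the method of [7] or [24]): rules of the form "if Y1 = v then Y2 = w" are mined with their confidence from the examples where both attributes are known, the user may inspect and delete or modify the rules, and the remaining rules are used to fill in the missing values. The attribute names and rows are hypothetical.

```python
from collections import Counter, defaultdict

def learn_rules(rows, cond_attr, target_attr, min_confidence=0.8):
    """Mine rules 'if cond_attr == v then target_attr == w' from the rows where
    both attributes are known, keeping only sufficiently confident rules."""
    by_value = defaultdict(Counter)
    for row in rows:
        v, w = row.get(cond_attr), row.get(target_attr)
        if v is not None and w is not None:
            by_value[v][w] += 1
    rules = {}
    for v, counter in by_value.items():
        w, count = counter.most_common(1)[0]
        confidence = count / sum(counter.values())
        if confidence >= min_confidence:
            rules[v] = (w, confidence)
    return rules

def fill_missing(rows, cond_attr, target_attr, rules):
    for row in rows:
        if row.get(target_attr) is None and row.get(cond_attr) in rules:
            row[target_attr] = rules[row[cond_attr]][0]

# Hypothetical toy data: Y2 is missing in the last row.
rows = [{'Y1': 'y11', 'Y2': 'y22'}, {'Y1': 'y11', 'Y2': 'y22'},
        {'Y1': 'y12', 'Y2': 'y21'}, {'Y1': 'y11', 'Y2': None}]
rules = learn_rules(rows, 'Y1', 'Y2')
# The user could inspect `rules` here and delete or modify entries he distrusts.
fill_missing(rows, 'Y1', 'Y2', rules)
print(rows[-1])   # {'Y1': 'y11', 'Y2': 'y22'}
```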
3.2 Building step

The aim of this step is to induce a tree from the training set arising from the previous step. Some system parameters have to be specified. For example, it is useless to keep on building a tree from a node having too few examples, this amount being relative to the initial number of examples in the base. An important parameter to set is thus the minimum number of examples required to segment a node. Facing a particularly huge tree, the user will ask for the construction of a new tree with this parameter set to a higher value, which amounts to pruning the tree by pragmatic means. We have seen (Section 2.1) that in uncertain induction the user will most probably choose a C.M. criterion in order to be able to prune; but if he knows that the studied phenomenon admits deterministic causes in situations with few examples, he can choose the ORT criterion to get a more concise description of those situations.
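Purely as an illustration with a modern library, not the software used in this work, these are the kinds of parameters the user sets before induction: the attribute selection criterion and the minimum number of examples required to segment a node.

```python
# Illustrative only: the user chooses the attribute selection criterion and the
# minimum node size before inducing the tree (here with scikit-learn).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy",   # a C.M. criterion
                              min_samples_split=20,  # minimum examples to segment a node
                              random_state=0).fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```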
The presentation of the attributes and their respective criterion scores at each node may allow the user to select an attribute that does not have the best score but provides a promising way to reach a relevant leaf.

The criticism of the tree thus obtained is the most important contribution of the user in this step. He checks whether the tree is understandable with regard to his domain knowledge and whether its general structure conforms to his expectations. Facing a surprising result, he wonders whether it is due to a bias in the training set or whether it reflects a phenomenon, sometimes suspected, that had not yet been explicitly stated. Most often, seeing the tree gives the user new ideas about the attributes, and he will choose to build the tree again after reworking the training set and/or changing a parameter of the induction system in order to confirm or refute a conjecture.

3.3 Pruning step

Apart from the questions at the end of Section 2.2 about the type of data and the aim pursued in producing a tree, more questions arise for the user if he uses a technique such as DI pruning.

In fact, in this situation, the user has more information to react upon. First, he knows the quality index of the entire tree, which allows him to evaluate the global complexity of the problem. If this index is low, it means that the problem is delicate or inadequately described, that the training set is not representative, or even that the decision tree method is not suited to this specific problem. If the user has several trees, the quality index allows him to compare them and possibly to suggest new experiments.

Moreover, the quality index on each node highlights the populations where the class is easy to determine, as opposed to sets of examples where it is impossible to predict it. Such areas can suggest new experiments on smaller populations, or even raise the question of additional attributes (which will have to be collected) to help determine the class for the examples where it is not yet possible.

From experiments [13], we noticed that the degree of pruning is strongly bound to the uncertainty embedded in the data. In practice, this means that the damping process has to be adjusted according to the data in order to obtain, in all situations, a relevant number of pruned trees. For that, we introduce a parameter to control the damping process. By varying this parameter, one follows the evolution of the quality index during the pruning (for example, the user distinguishes the parts of the tree that are due to chance from those that are reliable). Such work highlights the most relevant attributes, as opposed to those that it may be necessary to redefine.

Finally, the building and pruning steps can be viewed as part of the study of the attributes. Domain experts usually appreciate being able to restructure the set of initial attributes and to see at once the effect of such a modification on the tree (in general, after a preliminary decision tree, they define new attributes which summarize some of the initial ones). We have noticed [5] that when such attributes are used, the shape of the curve of the quality index as a function of the number of pruned sub-trees changes and tends to show three parts: in the first one, the variation of the quality index is small; in the second, the quality decreases regularly; and in the third, the quality rapidly becomes very low. This shows that the information embedded in the data set lies mainly in the top of the tree, while the bottom can be pruned.

3.4 Conclusion

Throughout this section, we have seen that the user's interventions are numerous and that the realization of the associated tasks is closely linked to him. These tasks are fundamental since they directly affect the results: the study of the results brings new experiments. The user restarts the work done during a step many times by changing the parameters, or comes back to previous steps (the arrows in Figure 3 show all the relations between the different steps). At each step, the user may accept, override, or modify the generated rules, but more often he suggests alternative features and experiments. Finally, the rule set is redefined through subsequent data collection, rule induction, and expert consideration.

We think it is necessary for the user to take part in the system so that a real development cycle takes place. The latter seems fundamental to us in order to obtain useful and satisfying trees. The user does not usually know beforehand which tree is relevant to his problem, and it is because he finds it gratifying to take part in this search that he takes interest in the induction work.

Let us note that several authors try to define software architectures that explicitly integrate the user. In the area of induction graphs (a generalization of decision trees), the SIPINA software allows the user to fix the choice of an attribute, to gather some values of an attribute temporarily, to stop the construction from some nodes, and so on. Dabija et al. [8] offer a learning system architecture (called KAISER, for Knowledge Acquisition Inductive System driven by Explanatory Reasoning) for an interactive knowledge acquisition system based on decision trees and driven by explanatory reasoning; moreover, the experts can incrementally add knowledge corresponding to the domain theory. KAISER confronts the built trees with the domain theory, so that some incoherences may be detected (for instance, the value of the attribute "eye" for a cat has to be "oval"). Kervahut and Potvin [17] have designed an assistant to collaborate with the user. This assistant, which takes the form of a graphic interface, helps the user test the methods and their parameters in order to get the most relevant combination for the problem at hand.
4. CONCLUSION

Producing decision trees is often presented as "automatic", with a marginal participation of the user. We have stressed the fact that the user has a fundamental critical and supervisory role and that he intervenes in a major way. This leads to a real development cycle between the user and the system. This cycle is only possible because the construction of a tree is nearly instantaneous.

The participation of the user in the data preparation, the choice of the parameters and the criticism of the results is in fact at the heart of the more general process of Knowledge Discovery in Databases. As usual in KDD, we claim that the understanding and the declarativity of the mechanisms of the methods is a key point for achieving in practice a fruitful process of information extraction. Finally, we think that, in order to really reach a data exploration reasoning that associates the user in a profitable way, it is important to give him a framework gathering all the tasks intervening in the process, so that he may freely explore the data, react, and innovate with new experiments.

References

[1]  Breiman L., Friedman J. H., Olshen R. A., & Stone C. J. Classification and regression trees. Wadsworth, Statistics probability series, Belmont, 1984.
[2]  Breiman L. Some properties of splitting criteria (technical note). Machine Learning 21, 41-47, 1996.
[3]  Buntine W. Learning classification trees. Statistics and Computing 2, 63-73, 1992.
[4]  Catlett J. Overpruning large decision trees. In proceedings of the Twelfth International Joint Conference on Artificial Intelligence IJCAI 91, pp. 764-769, Sydney, Australia, 1991.
[5]  Crémilleux B., & Robert C. A Pruning Method for Decision Trees in Uncertain Domains: Applications in Medicine. In proceedings of the workshop Intelligent Data Analysis in Medicine and Pharmacology, ECAI 96, pp. 15-20, Budapest, Hungary, 1996.
[6]  Crémilleux B., Robert C., & Gaio M. Uncertain domains and decision trees: ORT versus C.M. criteria. In proceedings of the 7th Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, pp. 540-546, Paris, France, 1998.
[7]  Crémilleux B., Ragel A., & Bosson J. L. An Interactive and Understandable Method to Treat Missing Values: Application to a Medical Data Set. In proceedings of the 5th International Conference on Information Systems Analysis and Synthesis (ISAS / SCI 99), pp. 137-144, M. Torres, B. Sanchez & E. Wills (Eds.), Orlando, FL, 1999.
[8]  Dabija V. G., Tsujino K., & Nishida S. Theory formation in the decision trees domain. Journal of Japanese Society for Artificial Intelligence, 7 (3), 136-147, 1992.
[9]  Esposito F., Malerba D., & Semeraro G. Decision tree pruning as search in the state space. In proceedings of the European Conference on Machine Learning ECML 93, pp. 165-184, P. B. Brazdil (Ed.), Lecture notes in artificial intelligence, N° 667, Springer-Verlag, Vienna, Austria, 1993.
[10] Fayyad U. M., & Irani K. B. The attribute selection problem in decision tree generation. In proceedings of the Tenth National Conference on Artificial Intelligence, pp. 104-110, Cambridge, MA: AAAI Press/MIT Press, 1992.
[11] Fayyad U. M., & Irani K. B. Multi-interval discretization of continuous-valued attributes for classification learning. In proceedings of the Thirteenth International Joint Conference on Artificial Intelligence IJCAI 93, pp. 1022-1027, Chambéry, France, 1993.
[12] Fayyad U. M. Branching on attribute values in decision tree generation. In proceedings of the Twelfth National Conference on Artificial Intelligence, pp. 601-606, AAAI Press/MIT Press, 1994.
[13] Fournier D., & Crémilleux B. Using impurity and depth for decision trees pruning. In proceedings of the 2nd International ICSC Symposium on Engineering of Intelligent Systems (EIS 2000), Paisley, UK, 2000.
[14] Gelfand S. B., Ravishankar C. S., & Delp E. J. An iterative growing and pruning algorithm for classification tree design. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(2), 163-174, 1991.
[15] Goodman R. M. F., & Smyth P. Information-theoretic rule induction. In proceedings of the Eighth European Conference on Artificial Intelligence ECAI 88, pp. 357-362, München, Germany, 1988.
[16] Hunt E. B., Marin J., & Stone P. J. Experiments in induction. New York Academic Press, 1966.
[17] Kervahut T., & Potvin J. Y. An interactive-graphic environment for automatic generation of decision trees. Decision Support Systems 18, 117-134, 1996.
[18] Kononenko I. On biases in estimating multi-valued attributes. In proceedings of the Fourteenth International Joint Conference on Artificial Intelligence IJCAI 95, pp. 1034-1040, Montréal, Canada, 1995.
[19] Mingers J. An empirical comparison of pruning methods for decision-tree induction. Machine Learning 4, 227-243, 1989.
[20] Quinlan J. R. Induction of decision trees. Machine Learning 1, 81-106, 1986.
[21] Quinlan J. R., & Rivest R. L. Inferring decision trees using the minimum description length principle. Information and Computation 80(3), 227-248, 1989.
[22] Quinlan J. R. C4.5: Programs for Machine Learning. San Mateo, CA, Morgan Kaufmann, 1993.
[23] Quinlan J. R. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4, 77-90, 1996.
[24] Ragel A., & Crémilleux B. Treatment of Missing Values for Association Rules. In proceedings of the Second Pacific Asia Conference on KDD, PAKDD 98, pp. 258-270, X. Wu, R. Kotagiri & K. B. Korb (Eds.), Lecture notes in artificial intelligence, N° 1394, Springer-Verlag, Melbourne, Australia, 1998.
[25] Safavian S. R., & Landgrebe D. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 21(3), 660-674, 1991.
[26] Wallace C. S., & Patrick J. D. Coding decision trees. Machine Learning 11, 7-22, 1993.
[27] White A. P., & Liu W. Z. Bias in Information-Based Measures in Decision Tree Induction. Machine Learning 15, 321-329, 1994.

APPENDIX

Data file used to build the trees of Figure 1 (D denotes the class; Y1 and Y2 are the attributes).

  examples        D     Y1     Y2
  1 - 2350        d1    y11    y22
  2351 - 2500     d1    y12    y22
  2501 - 2650     d2    y11    y21
  2651 - 2700     d2    y12    y21
  2701 - 2850     d3    y11    y22
  2851 - 5200     d3    y12    y22

B. Crémilleux is Maître de Conférences at the Université de Caen, France.
More Related Content

Viewers also liked

Ambiete mercadotecnia-autoguardado
Ambiete mercadotecnia-autoguardadoAmbiete mercadotecnia-autoguardado
Ambiete mercadotecnia-autoguardadoMar Sánchez
 
Morten Bengtson Opgave
Morten Bengtson OpgaveMorten Bengtson Opgave
Morten Bengtson Opgaveguest609c2b
 
Il processo creativo
Il processo creativoIl processo creativo
Il processo creativoHibo
 
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안ollehktsocial
 
修心 青山無所爭.福田用心耕(Nx)
修心  青山無所爭.福田用心耕(Nx)修心  青山無所爭.福田用心耕(Nx)
修心 青山無所爭.福田用心耕(Nx)花東宏宣
 
父母一生只有一個
父母一生只有一個父母一生只有一個
父母一生只有一個花東宏宣
 
Serge P Nekoval Grails
Serge P  Nekoval  GrailsSerge P  Nekoval  Grails
Serge P Nekoval Grailsguest092df8
 
Medical Assisting students will maintain a neat, groomed, and ...
Medical Assisting students will maintain a neat, groomed, and ...Medical Assisting students will maintain a neat, groomed, and ...
Medical Assisting students will maintain a neat, groomed, and ...butest
 
MoI present like...a surfer
MoI present like...a surferMoI present like...a surfer
MoI present like...a surferMartin Barnes
 

Viewers also liked (13)

Ambiete mercadotecnia-autoguardado
Ambiete mercadotecnia-autoguardadoAmbiete mercadotecnia-autoguardado
Ambiete mercadotecnia-autoguardado
 
Morten Bengtson Opgave
Morten Bengtson OpgaveMorten Bengtson Opgave
Morten Bengtson Opgave
 
Il processo creativo
Il processo creativoIl processo creativo
Il processo creativo
 
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안
 
修心 青山無所爭.福田用心耕(Nx)
修心  青山無所爭.福田用心耕(Nx)修心  青山無所爭.福田用心耕(Nx)
修心 青山無所爭.福田用心耕(Nx)
 
父母一生只有一個
父母一生只有一個父母一生只有一個
父母一生只有一個
 
內湖花市
內湖花市內湖花市
內湖花市
 
Serge P Nekoval Grails
Serge P  Nekoval  GrailsSerge P  Nekoval  Grails
Serge P Nekoval Grails
 
Medical Assisting students will maintain a neat, groomed, and ...
Medical Assisting students will maintain a neat, groomed, and ...Medical Assisting students will maintain a neat, groomed, and ...
Medical Assisting students will maintain a neat, groomed, and ...
 
MoI present like...a surfer
MoI present like...a surferMoI present like...a surfer
MoI present like...a surfer
 
gozARTE_2010
gozARTE_2010gozARTE_2010
gozARTE_2010
 
Anita
AnitaAnita
Anita
 
Cuoc Doi K
Cuoc Doi KCuoc Doi K
Cuoc Doi K
 

Similar to Decision Trees as a Powerful Data Mining Tool

IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292HARDIK SINGH
 
Complicatedness_Tang
Complicatedness_TangComplicatedness_Tang
Complicatedness_Tangvictor tang
 
Coclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain DocumentsCoclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain Documentslau
 
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...IRJET Journal
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
A Comparative Study of Image Compression Algorithms
A Comparative Study of Image Compression AlgorithmsA Comparative Study of Image Compression Algorithms
A Comparative Study of Image Compression AlgorithmsIJORCS
 
A Bayesian approach to estimate probabilities in classification trees
A Bayesian approach to estimate probabilities in classification treesA Bayesian approach to estimate probabilities in classification trees
A Bayesian approach to estimate probabilities in classification treesNTNU
 
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...IJMER
 
Au2640944101
Au2640944101Au2640944101
Au2640944101IJMER
 
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...cscpconf
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision treeKrish_ver2
 
DM Unit-III ppt.ppt
DM Unit-III ppt.pptDM Unit-III ppt.ppt
DM Unit-III ppt.pptLaxmi139487
 
Dbms narrative question answers
Dbms narrative question answersDbms narrative question answers
Dbms narrative question answersshakhawat02
 
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATIONMAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATIONVLSICS Design
 
Classification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted ClusterClassification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted ClusterIOSR Journals
 

Similar to Decision Trees as a Powerful Data Mining Tool (20)

IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292
 
Hx3115011506
Hx3115011506Hx3115011506
Hx3115011506
 
Complicatedness_Tang
Complicatedness_TangComplicatedness_Tang
Complicatedness_Tang
 
data mining.pptx
data mining.pptxdata mining.pptx
data mining.pptx
 
New view of fuzzy aggregations. part I: general information structure for dec...
New view of fuzzy aggregations. part I: general information structure for dec...New view of fuzzy aggregations. part I: general information structure for dec...
New view of fuzzy aggregations. part I: general information structure for dec...
 
Coclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain DocumentsCoclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain Documents
 
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
 
4.Database Management System.pdf
4.Database Management System.pdf4.Database Management System.pdf
4.Database Management System.pdf
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
A Comparative Study of Image Compression Algorithms
A Comparative Study of Image Compression AlgorithmsA Comparative Study of Image Compression Algorithms
A Comparative Study of Image Compression Algorithms
 
A Bayesian approach to estimate probabilities in classification trees
A Bayesian approach to estimate probabilities in classification treesA Bayesian approach to estimate probabilities in classification trees
A Bayesian approach to estimate probabilities in classification trees
 
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...
 
Au2640944101
Au2640944101Au2640944101
Au2640944101
 
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
DM Unit-III ppt.ppt
DM Unit-III ppt.pptDM Unit-III ppt.ppt
DM Unit-III ppt.ppt
 
Dbms narrative question answers
Dbms narrative question answersDbms narrative question answers
Dbms narrative question answers
 
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATIONMAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
 
Classification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted ClusterClassification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted Cluster
 
1861 1865
1861 18651861 1865
1861 1865
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Decision Trees as a Powerful Data Mining Tool

  • 1. Computing and Information Systems, 7 (2000), p. 91-97 © University of Paisley 2000 Decision Trees as a Data Mining Tool Bruno Crémilleux The production of decision trees is usually regarded step of data preparation, but also during the whole as an automatic method to discover knowledge from process. In fact, using decision trees can be embedded data: trees directly stemmed from the data without in the KDD process within the main steps (selection, other intervention. However, we cannot expect preprocessing, data mining, interpretation / acceptable results if we naively apply machine evaluation). The aim of the paper is to show the role learning to arbitrary data. By reviewing the whole of the user and to connect the use of decision trees process and some other works which implicitly have within the data mining framework. to be done to generate a decision tree, this papers This paper is organized as follows. Section 2 outlines shows that this method has to be placed in the the core of decision trees method (i.e. building and knowledge discovery in databases processing and, in pruning). Literature usually presents these points from fact, the user has to intervene both during the core of a technical side without describing the part regarding the method (building and pruning) and other the user: we will see that he has a role to play. Section associated tasks. 3 deals with associated tasks which are, in fact, 1. INTRODUCTION absolutely necessary. These tasks, where clearly the user has to intervene, are often not emphasized when Data mining and Knowledge Discovery in Databases we speak of decision trees. We will see that they have (KDD) are fields of increasing interest combining a great relevance and they act upon the final result. databases, artificial intelligence, machine learning and statistics. Briefly, the purpose of KDD is to extract 2. BUILDING AND PRUNING from largeamounts of data, non trivial ”nuggets” of 2.1 Building decision trees: choice of an attribute information in an easily understandable form. Such selection criterion discovered knowledge may be for instance regularities or exceptions. In induction of decision trees various attribute selection criteria are used to estimate the quality of Decision tree is a method which comes from the attributes in order to select the best one to split on. But machine learning community and explores data. Such we know at a theoretical level that criteria derived a method is able to give a summary of the data (which from an impurity measure have suitable properties to is easier to analyze than the raw data) or can be used generate decision trees and perform comparably (see to build a tool (like for example a classifier) to help a [10], [1] and [6]). We call such criteria C.M. criteria user formany different decision making tasks. Broadly (concave-maximum criteria) because an impurity speaking, a decision tree is built from a set of training measure, among other characteristics, is defined by a data having attribute values and a class name. The concave function. The most commonly used criteria result of the process is represented as a tree which which are the Shannon entropy (in the family of ID3 nodes specify attributes and branches specify attribute algorithms) and the Gini criterion (in CART values. Leaves of the tree correspond to sets of algorithms, see [1] for details), are C.M. criteria. examples with the same class or to elements in which no more attributes are available. 
The construction of decision trees is described, among others, by Breiman et al. (1984) [1], who present an important and well-known monograph on classification trees. A number of standard techniques have been developed, for example the basic algorithms ID3 [20] and CART [1]. A survey of decision tree classifier methods and of the various related issues is given by Safavian and Landgrebe [25].

Usually, the production of decision trees is regarded as an automatic process: trees are straightforwardly generated from the data and the user is relegated to a minor role. Nevertheless, we think that this method intrinsically requires the user, and not only during the step of data preparation.

An impurity measure, among other characteristics, is defined by a concave function. The most commonly used criteria, the Shannon entropy (in the family of ID3 algorithms) and the Gini criterion (in the CART algorithm, see [1] for details), are C.M. criteria.

Nevertheless, other paradigms exist for building decision trees. For example, Fayyad and Irani [10] claim that grouping attribute values and building binary trees yields better trees; for that purpose, they propose the ORT measure. ORT favours attributes that cleanly separate the different classes without taking into account the number of examples in the nodes, so that ORT produces trees with small pure (or nearly pure) leaves at their top more often than C.M. criteria do.

To better understand the differences between the C.M. and ORT criteria, let us consider the data set given in the appendix and the trees induced from it, depicted in Figure 1: a tree built with a C.M. criterion is represented at the top and the tree built with the ORT criterion at the bottom.
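To make the use of such impurity criteria concrete, here is a small generic sketch (not the code of ID3 or CART themselves): it computes the Shannon entropy and the Gini index of a node and selects the attribute giving the largest impurity decrease.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(examples, attribute, class_key, impurity=entropy):
    """Impurity of the parent minus the weighted impurity of the children."""
    parent = [e[class_key] for e in examples]
    before = impurity(parent)
    after = 0.0
    for value in set(e[attribute] for e in examples):
        subset = [e[class_key] for e in examples if e[attribute] == value]
        after += len(subset) / len(examples) * impurity(subset)
    return before - after

def best_attribute(examples, attributes, class_key, impurity=entropy):
    """Attribute maximizing the impurity decrease, the splitting rule of C.M. criteria."""
    return max(attributes, key=lambda a: impurity_decrease(examples, a, class_key, impurity))
```

On the data set of the appendix, best_attribute with either the entropy or the Gini impurity should select Y1 first, which is the behaviour of the C.M. tree of Figure 1.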
ORT rapidly comes out with the pure leaf Y2 = y21, while the C.M. criterion splits it and only reaches the separated leaves later.

[Figure 1. C.M. tree: the root (2500,200,2500) is split on Y1 into (2350,150,150) and (150,50,2350), each of which is then split on Y2, giving the leaves (0,150,0) and (2350,0,150) on one side and (0,50,0) and (150,0,2350) on the other. ORT tree: the root (2500,200,2500) is split on Y2 into the pure leaf (0,200,0) and the node (2500,0,2500), which is then split on Y1 into (2350,0,150) and (150,0,2350).]
Figure 1: An example of C.M. and ORT trees.

We give here just a simple example, but others, both in artificial and in real-world domains, are detailed in [6]: they show that the ORT criterion produces trees with small leaves at their top more often than C.M. criteria do. We also see in [6] that overspecified leaves obtained with C.M. criteria tend to be small and located at the bottom of the tree (and thus easy to prune), while leaves at the bottom of ORT trees can be large. In uncertain domains (we return to this point in the next paragraph), such leaves produced by ORT may be irrelevant, and it is difficult to prune them without destroying the tree.

Let us note that other selection criteria, such as the gain ratio criterion, are related to other specific issues. The ratio criterion proposed by Quinlan [20], derived from the entropy criterion, is designed to avoid favouring attributes with many values. Actually, in some situations, selecting an attribute essentially because it has many values might jeopardize the semantic acceptance of the induced trees ([27] and [18]). The J-measure [15] is the product of two terms that Goodman and Smyth consider as the two basic criteria for evaluating a rule: one term is derived from the entropy function and the other measures the simplicity of a rule. Quinlan and Rivest [21] were interested in the minimum description length principle to construct a decision tree that minimizes the misclassification rate when one looks for general rules together with their exceptional cases. This principle has been taken up by Wallace and Patrick [26], who suggest some improvements and show that they generally obtain better empirical results than those found by Quinlan. Buntine [3] presents a tree learning algorithm stemming from Bayesian statistics whose main objective is to provide accurate predicted class probabilities at the nodes.
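To make the ratio criterion mentioned above concrete, the following sketch divides the information gain by the split information, which is what penalizes attributes with many values (a simplified reading of Quinlan's criterion, not the exact code of C4.5):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(examples, attribute, class_key):
    """Information gain divided by split information, in the spirit of Quinlan's ratio criterion."""
    n = len(examples)
    parent_entropy = entropy([e[class_key] for e in examples])
    children_entropy = 0.0
    split_info = 0.0
    for value, count in Counter(e[attribute] for e in examples).items():
        subset = [e[class_key] for e in examples if e[attribute] == value]
        children_entropy += count / n * entropy(subset)
        split_info -= count / n * math.log2(count / n)
    gain = parent_entropy - children_entropy
    return gain / split_info if split_info > 0 else 0.0
```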
We can also address the question of deciding which sub-nodes have to be built. When splitting, the GID3* algorithm [12] groups into a single branch the values of an attribute that are estimated to be meaningless compared with its other values. For building binary trees, another criterion is twoing [1]. Twoing groups the classes into two superclasses such that, when the split is considered as a two-class problem, the greatest decrease in node impurity is achieved. Some properties of twoing are described in Breiman [2]. Regarding binary decision trees, let us note that in some situations users do not agree to group values, since it yields meaningless trees; thus non-binary trees must not be definitively discarded.

So, we have seen that there are many attribute selection criteria and, even if some of them can be gathered into families, a choice has to be made. In our view, the choice of a paradigm depends on whether the data sets embed uncertainty, whether the phenomenon under study admits deterministic causes, and what level of intelligibility is required.

In the next paragraph, we move to the pruning stage.

2.2 Pruning decision trees: what about the classification and the quality?

We know that in many areas, such as medicine, data are uncertain: there are always some examples which escape the rules. Translated into the context of decision trees, this means that some examples seem similar but in fact differ in their classes. In these situations, it is well known (see [1], [4]) that decision tree algorithms tend to divide nodes having few examples and that the resulting trees tend to be very large and overspecified. Some branches, especially towards the bottom, are present due to sample variability and are statistically meaningless (one can also say that they are due to noise in the sample). Such branches must either not be built or be pruned. If we do not want to build them, we have to set out rules that stop the growing of the tree, but we know it is better to generate the entire tree and then to prune it (see for example [1] and [14]). Pruning methods (see [1], [19], [20]) try to cut such branches in order to avoid this drawback.

The principal methods for pruning decision trees are examined in [9] and [19]. Most of these pruning methods are based on minimizing a classification error rate, where each element of a node is classified in the most frequent class of that node. The error rate is estimated with a test file or by statistical methods such as cross-validation or the bootstrap.
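A well-known member of this family (not a method proposed in this paper) is reduced-error pruning, which replaces a sub-tree by a leaf whenever this does not increase the error rate measured on a test file. A minimal sketch, assuming a simple dictionary representation of trees of our own convention:

```python
# A tree is either a leaf {"class": c} or an internal node
# {"attribute": a, "branches": {value: subtree}, "default": majority_class}.

def classify(tree, example):
    while "attribute" in tree:
        child = tree["branches"].get(example.get(tree["attribute"]))
        if child is None:
            return tree["default"]
        tree = child
    return tree["class"]

def error_rate(tree, test_set, class_key):
    errors = sum(1 for e in test_set if classify(tree, e) != e[class_key])
    return errors / len(test_set) if test_set else 0.0

def reduced_error_prune(tree, test_set, class_key):
    """Bottom-up: replace a sub-tree by a leaf when this does not worsen the test error."""
    if "attribute" not in tree:
        return tree
    for value, child in tree["branches"].items():
        tree["branches"][value] = reduced_error_prune(
            child, [e for e in test_set if e.get(tree["attribute"]) == value], class_key)
    leaf = {"class": tree["default"]}
    if error_rate(leaf, test_set, class_key) <= error_rate(tree, test_set, class_key):
        return leaf
    return tree
```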
These pruning methods stem from situations where the built tree will be used as a classifier, and they systematically discard a sub-tree which does not improve the classification error rate. Let us consider the sub-tree depicted in Figure 2. D is the class and is here bivalued. In each node, the first (resp. second) value indicates the number of examples having the first (resp. second) value of D. This sub-tree does not lessen the error rate, which is 10% both at its root and in its leaves; nevertheless the sub-tree is of interest, since it points out a specific population with a constant value of D, whereas in the remaining population it is impossible to predict a value for D.

[Figure 2. A node (90,10) split into the leaves (79,0) and (11,10).]
Figure 2: A tree which could be interesting although it doesn't decrease the number of errors.

In [5], we proposed a pruning method (called C.M. pruning, because a C.M. criterion is used to build the entire tree) suitable for uncertain domains. C.M. pruning builds a new attribute binding the root of a tree to its leaves, the attribute's values corresponding to the branches leading to a leaf. It permits the computation of the global quality of a tree: the best sub-tree to prune is the one that yields the highest-quality pruned tree. This pruning method is not tied to the use of the pruned tree as a classifier. This work has been taken up in [13]. In uncertain domains, a deep tree is less relevant than a small one: the deeper a tree, the less understandable and reliable it is. So, a new quality index (called DI, for Depth-Impurity) has been defined in [13]. It manages a trade-off between the depth and the impurity of each node of a tree. From this index, a new pruning method (denoted DI pruning) has been derived. With regard to C.M. pruning, DI pruning introduces a damping function to take the depth of the leaves into account. Moreover, by giving the quality of each node (and not only of a sub-tree), DI pruning is able to distinguish sub-populations of interest within large populations or, on the contrary, to highlight sets of examples with high uncertainty (in the context of the studied problem). In the latter case, the user has to come back to the data to try to improve their collection and preparation. Getting the quality of each node is a key point in uncertain domains.
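The paper does not reproduce the DI formula, so the following is only a toy index conveying the stated trade-off: node quality is high for pure nodes, and a damping factor lowers the contribution of deep nodes. Everything here (the normalized entropy, the damping parameter, the exponential form) is our illustration, not the published index of [13].

```python
import math

def purity(class_counts):
    """1 minus the normalized Shannon entropy of a node's class distribution (1 = pure node)."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    if len(probs) <= 1:
        return 1.0
    h = -sum(p * math.log2(p) for p in probs)
    return 1.0 - h / math.log2(len(class_counts))

def node_quality(class_counts, depth, damping=0.8):
    """Toy depth-damped quality: pure, shallow nodes score close to 1."""
    return purity(class_counts) * damping ** depth

# Example: a pure leaf at depth 3 versus a mixed node at depth 1.
print(node_quality([150, 0, 0], depth=3))        # perfectly pure, but damped by its depth
print(node_quality([2350, 150, 150], depth=1))   # shallower, but impure
```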
So, about the pruning stage, the user faces some questions:
- am I interested in obtaining a quality value for each node?
- is there uncertainty in the data?
and he has to know which use of the tree is pursued:
- a tree can be an efficient description oriented by an a priori classification of its elements; pruning the tree then discards overspecific information to obtain a more legible description.
- a tree can be built to highlight reliable sub-populations; here only some leaves of the pruned tree will be considered for further investigation.
- the tree can be transformed into a classifier for any new element of a large population.
The choice of a pruning strategy is tied to the answers to these questions.
3. ASSOCIATED TASKS

We indicate in this section when and how users, by means of various associated tasks, intervene in the process of developing decision trees. Schematically, these tasks concern gathering the data for the design of the training set, the encoding of the attributes, the specific analysis of examples, the analysis of the resulting tree, and so on.

Generally, these tasks are not emphasized in the literature; they are usually considered secondary, but we will see that they have a great relevance and that they act upon the final result. Of course, these tasks intersect with the building and pruning work that we have previously described.

In practice, apart from the building and pruning steps, there is another step: data preparation. We add a fourth step, which aims to study the classification of new examples on a (potentially pruned) tree. The user strongly intervenes during the first step, but also has a supervising role during all steps and, more particularly, a critical role after the second and third steps (see Figure 3). We do not detail here the fourth step, which is marginal from the point of view of the user's role.

[Figure 3. Within the decision tree software, the process chains data manipulation, the data set, building (entire tree), pruning (pruned tree) and classification (results of the classification); the user prepares the data set and checks and intervenes at every step.]
Figure 3: Process to generate decision trees and relations with the user.

3.1 Data preparation

The aim of this step is to supply, from the database gathering the examples in their raw form, a training set as well adapted as possible to the development of decision trees. This step is the one where the user intervenes most directly. His tasks are numerous: deleting examples considered as aberrant (outliers) and/or containing too many missing values; deleting attributes evaluated as irrelevant to the given task; re-encoding the attribute values (one knows that if the attributes have very different numbers of values, those having more values tend to be chosen first ([27] and [18]); we have already referred to this point with the gain ratio criterion); re-encoding several attributes (for example, the fusion of attributes); segmenting continuous attributes; analyzing missing data; and so on.

Let us come back to some of these tasks. At first [16], decision tree algorithms did not accept quantitative attributes; these had to be discretized. This initial segmentation can be done by asking experts to set thresholds or by using a strategy relying on an impurity function [11]. The segmentation can also be done while building the tree, as is the case with the C4.5 software [22]; a continuous attribute can then be segmented several times in the same tree. It seems relevant to us that the user may actively intervene in this process by indicating, for example, an a priori discretization of the attributes for which it is meaningful and by letting the system manage the others. One should remark that, if one knows reasonably well how to split a continuous attribute into two intervals, the question is more delicate for a three-valued (or more) discretization.

The user also generally has to decide on the deletion, re-encoding or fusion of attributes. He has a priori ideas allowing a first pass over this task, but we shall see that the tree construction, by making the underlying phenomenon explicit, suggests to the user new re-encodings and/or fusions of attributes, often leading to a more general description level.

The current decision tree construction algorithms most often deal with missing values by means of specific, internal treatments [7]. On the contrary, by a preliminary analysis of the database, relying on the search for associations between data and leading to uncertain rules that determine missing values, Ragel ([7], [24]) offers a strategy where the user can intervene: such a method leaves a place for the user and his knowledge, in order to delete, add or modify some rules. As we can see, this step depends a lot on the user's work.
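As an illustration of the segmentation of continuous attributes discussed in this step, a common strategy (in the spirit of an impurity-based discretization, though not the exact algorithm of [11]) evaluates every boundary between sorted values and keeps the threshold minimizing the weighted entropy of the two resulting intervals:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Best binary cut point for a continuous attribute, by weighted entropy of the two sides."""
    pairs = sorted(zip(values, labels))
    best, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best, best_score = threshold, score
    return best

# Hypothetical usage on a small continuous attribute:
print(best_threshold([22, 34, 45, 51, 63], ["a", "a", "b", "b", "b"]))  # 39.5
```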
3.2 Building step

The aim of this step is to induce a tree from the training set arising from the previous step. Some system parameters have to be specified. For example, it is useless to keep on building a tree from a node having too few examples, this amount being relative to the initial number of examples in the base. An important parameter to set is thus the minimum number of examples necessary for segmenting a node. Facing a particularly huge tree, the user will ask for the construction of a new tree with this parameter set to a higher value, which amounts to pruning the tree by pragmatic means. We have seen (paragraph 2.1) that in uncertain induction the user will most probably choose a C.M. criterion in order to be able to prune. But if he knows that the studied phenomenon admits deterministic causes in situations with few examples, he can choose the ORT criterion to get a more concise description of these situations.
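Present-day libraries expose this stopping parameter directly. As an illustration only (scikit-learn is not a tool discussed in the paper), raising min_samples_split yields a markedly smaller tree on the same data, which is exactly the pragmatic pruning described above:

```python
# Illustration with scikit-learn: the same data set yields a much smaller
# tree when the minimum node size required for a split is raised.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

small_nodes_allowed = DecisionTreeClassifier(min_samples_split=2).fit(X, y)
large_nodes_only = DecisionTreeClassifier(min_samples_split=40).fit(X, y)

print(small_nodes_allowed.tree_.node_count)  # larger tree
print(large_nodes_only.tree_.node_count)     # smaller tree
```

The same library also lets one choose between the Gini and entropy criteria discussed in Section 2.1 (criterion="gini" or criterion="entropy").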
The presentation of the attributes and of their respective criterion scores at each node may allow the user to select attributes that do not have the best score but that provide a promising way to reach a relevant leaf.

The critique of the tree thus obtained is the most important contribution of the user in this step. He checks whether the tree is understandable with regard to his domain knowledge and whether its general structure conforms to his expectations. Facing a surprising result, he wonders whether it is due to a bias in the training step or whether it reflects a phenomenon, sometimes suspected, but not yet explicitly uttered. Most often, seeing the tree gives the user new ideas about the attributes, and he will choose to build the tree again after working on the training set once more and/or changing a parameter in the induction system, in order to confirm or refute a conjecture.

3.3 Pruning step

Apart from the questions at the end of paragraph 2.2 about the type of data and the aim pursued in producing a tree, more questions arise for the user if he uses a technique such as DI pruning.

In fact, in this situation, the user has more information to react upon. First, he knows the quality index of the entire tree, which allows him to evaluate the global complexity of the problem. If this index is low, it means that the problem is delicate or inadequately described, that the training set is not representative, or even that the decision tree method is not suited to this specific problem. If the user has several trees, the quality index allows him to compare them and possibly to suggest new experiments.

Moreover, the quality index of each node highlights the populations where the class is easy to determine, as opposed to sets of examples where it is impossible to predict it. Such areas can suggest new experiments on smaller populations, or even raise the question of the existence of additional attributes (which will have to be collected) to help determine the class for the examples where it is not yet possible.

From experiments [13], we noticed that the degree of pruning is closely bound to the uncertainty embedded in the data. In practice, this means that the damping process has to be adjusted according to the data in order to obtain, in all situations, a relevant number of pruned trees. For that, we introduce a parameter to control the damping process. By varying this parameter, one follows the evolution of the quality index during the pruning (for example, the user distinguishes the parts of the tree that are due to randomness from those that are reliable). Such a study highlights the most relevant attributes and separates them from those that it may be necessary to redefine.
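Continuing the toy depth-damped index sketched earlier (still our assumption, not the published DI formula), one can see why the damping parameter has to be tuned: the very same pure leaf at depth 4 receives very different quality values depending on it.

```python
# Toy continuation of the depth-damped quality sketch: the damping parameter
# decides how strongly deep parts of the tree are discounted.
def node_quality(purity, depth, damping):
    return purity * damping ** depth

for damping in (0.5, 0.7, 0.9, 1.0):
    print(damping, round(node_quality(1.0, depth=4, damping=damping), 3))
# 0.5 -> 0.062, 0.7 -> 0.24, 0.9 -> 0.656, 1.0 -> 1.0
```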
Finally, the building and pruning steps can be viewed as part of the study of the attributes. Domain experts usually appreciate being able to restructure the set of initial attributes and to see at once the effect of such a modification on the tree (in general, after a preliminary decision tree, they define new attributes which summarize some of the initial ones). We have noticed [5] that when such attributes are used, the shape of the curve of the quality index as a function of the number of pruned sub-trees changes and tends to show three parts: in the first one, the variation of the quality index is small; in the second part, the quality decreases regularly; and in the third part, the quality rapidly becomes very low. This shows that the information embedded in the data set lies mainly in the top of the tree, while the bottom can be pruned.

3.4 Conclusion

Throughout this section, we have seen that the user's interventions are numerous and that the realization of the associated tasks is closely linked to him. These tasks are fundamental, since they directly affect the results: the study of the results brings new experiments. The user often starts the work done during a step again by changing the parameters, or comes back to previous steps (the arrows in Figure 3 show all the relations between the different steps). At each step, the user may accept, override, or modify the generated rules, but more often he suggests alternative features and experiments. Finally, the rule set is redefined through subsequent data collection, rule induction, and expert consideration.

We think it is necessary for the user to take part in the system so that a real development cycle takes place. The latter seems fundamental to us in order to obtain useful and satisfying trees. The user does not usually know beforehand which tree is relevant to his problem, and it is because he finds it gratifying to take part in this search that he takes an interest in the induction work.

Let us note that most authors try to define software architectures explicitly integrating the user. In the area of induction graphs (a generalization of decision trees), the SIPINA software allows the user to fix the choice of an attribute, to gather some values of an attribute temporarily, to stop the construction from some nodes, and so on. Dabija et al. [8] offer a learning system architecture (called KAISER, for Knowledge Acquisition Inductive System driven by Explanatory Reasoning) for an interactive knowledge acquisition system based on decision trees and driven by explanatory reasoning. Moreover, the experts can incrementally add knowledge corresponding to the domain theory. KAISER confronts the built trees with the domain theory, so that some incoherences may be detected (for instance, the value of the attribute "eye" for a cat has to be "oval"). Kervahut and Potvin [17] have designed an assistant to collaborate with the user. This assistant, which takes the form of a graphic interface, helps the user test the methods and their parameters in order to find the most relevant combination for the problem at hand.
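As a tiny illustration of one such intervention mentioned above, temporarily grouping some values of an attribute before rebuilding a tree can be expressed as a simple re-encoding of the training examples (our own sketch, not SIPINA's actual interface; the attribute names reuse those of the appendix for convenience):

```python
# Sketch of a user-driven re-encoding: replace some values of an attribute by a common group.
def group_values(examples, attribute, grouping):
    """Return a copy of the examples where values of `attribute` are replaced by their group."""
    return [{**e, attribute: grouping.get(e[attribute], e[attribute])} for e in examples]

examples = [{"Y1": "y11", "D": "d1"}, {"Y1": "y12", "D": "d3"}]
print(group_values(examples, "Y1", {"y11": "y11_or_y12", "y12": "y11_or_y12"}))
```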
4. CONCLUSION

Producing decision trees is often presented as "automatic", with a marginal participation from the user: we have stressed the fact that the user has a fundamental critical and supervisory role and that he intervenes in a major way. This leads to a real development cycle between the user and the system. This cycle is only possible because the construction of a tree is nearly instantaneous.

The participation of the user in the data preparation, the choice of the parameters and the critique of the results is in fact at the heart of the more general process of Knowledge Discovery in Databases. As usual in KDD, we claim that the understanding and the declarativity of the mechanisms of the methods is a key point to achieve, in practice, a fruitful process of information extraction. Finally, we think that, in order to really reach a data exploration reasoning that associates the user in a profitable way, it is important to give him a framework gathering all the tasks intervening in the process, so that he may freely explore the data, react, and innovate with new experiments.

References

[1] Breiman L., Friedman J. H., Olshen R. A., & Stone C. J. Classification and regression trees. Wadsworth Statistics/Probability Series, Belmont, 1984.
[2] Breiman L. Some properties of splitting criteria (technical note). Machine Learning 21, 41-47, 1996.
[3] Buntine W. Learning classification trees. Statistics and Computing 2, 63-73, 1992.
[4] Catlett J. Overpruning large decision trees. In proceedings of the Twelfth International Joint Conference on Artificial Intelligence IJCAI 91, pp. 764-769, Sydney, Australia, 1991.
[5] Crémilleux B., & Robert C. A Pruning Method for Decision Trees in Uncertain Domains: Applications in Medicine. In proceedings of the workshop Intelligent Data Analysis in Medicine and Pharmacology, ECAI 96, pp. 15-20, Budapest, Hungary, 1996.
[6] Crémilleux B., Robert C., & Gaio M. Uncertain domains and decision trees: ORT versus C.M. criteria. In proceedings of the 7th Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, pp. 540-546, Paris, France, 1998.
[7] Crémilleux B., Ragel A., & Bosson J. L. An Interactive and Understandable Method to Treat Missing Values: Application to a Medical Data Set. In proceedings of the 5th International Conference on Information Systems Analysis and Synthesis (ISAS/SCI 99), pp. 137-144, M. Torres, B. Sanchez & E. Wills (Eds.), Orlando, FL, 1999.
[8] Dabija V. G., Tsujino K., & Nishida S. Theory formation in the decision trees domain. Journal of the Japanese Society for Artificial Intelligence, 7(3), 136-147, 1992.
[9] Esposito F., Malerba D., & Semeraro G. Decision tree pruning as search in the state space. In proceedings of the European Conference on Machine Learning ECML 93, pp. 165-184, P. B. Brazdil (Ed.), Lecture Notes in Artificial Intelligence, No. 667, Springer-Verlag, Vienna, Austria, 1993.
[10] Fayyad U. M., & Irani K. B. The attribute selection problem in decision tree generation. In proceedings of the Tenth National Conference on Artificial Intelligence, pp. 104-110, Cambridge, MA: AAAI Press/MIT Press, 1992.
[11] Fayyad U. M., & Irani K. B. Multi-interval discretization of continuous-valued attributes for classification learning. In proceedings of the Thirteenth International Joint Conference on Artificial Intelligence IJCAI 93, pp. 1022-1027, Chambéry, France, 1993.
[12] Fayyad U. M. Branching on attribute values in decision tree generation. In proceedings of the Twelfth National Conference on Artificial Intelligence, pp. 601-606, AAAI Press/MIT Press, 1994.
[13] Fournier D., & Crémilleux B. Using impurity and depth for decision trees pruning. In proceedings of the 2nd International ICSC Symposium on Engineering of Intelligent Systems (EIS 2000), Paisley, UK, 2000.
[14] Gelfand S. B., Ravishankar C. S., & Delp E. J. An iterative growing and pruning algorithm for classification tree design. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(2), 163-174, 1991.
[15] Goodman R. M. F., & Smyth P. Information-theoretic rule induction. In proceedings of the Eighth European Conference on Artificial Intelligence ECAI 88, pp. 357-362, München, Germany, 1988.
[16] Hunt E. B., Marin J., & Stone P. J. Experiments in induction. New York: Academic Press, 1966.
[17] Kervahut T., & Potvin J. Y. An interactive-graphic environment for automatic generation of decision trees. Decision Support Systems 18, 117-134, 1996.
[18] Kononenko I. On biases in estimating multi-valued attributes. In proceedings of the Fourteenth International Joint Conference on Artificial Intelligence IJCAI 95, pp. 1034-1040, Montréal, Canada, 1995.
[19] Mingers J. An empirical comparison of pruning methods for decision-tree induction. Machine Learning 4, 227-243, 1989.
[20] Quinlan J. R. Induction of decision trees. Machine Learning 1, 81-106, 1986.
[21] Quinlan J. R., & Rivest R. L. Inferring decision trees using the minimum description length principle. Information and Computation 80(3), 227-248, 1989.
[22] Quinlan J. R. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
[23] Quinlan J. R. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4, 77-90, 1996.
[24] Ragel A., & Crémilleux B. Treatment of Missing Values for Association Rules. In proceedings of the Second Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 98, pp. 258-270, X. Wu, R. Kotagiri & K. B. Korb (Eds.), Lecture Notes in Artificial Intelligence, No. 1394, Springer-Verlag, Melbourne, Australia, 1998.
[25] Safavian S. R., & Landgrebe D. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 21(3), 660-674, 1991.
[26] Wallace C. S., & Patrick J. D. Coding decision trees. Machine Learning 11, 7-22, 1993.
[27] White A. P., & Liu W. Z. Bias in information-based measures in decision tree induction. Machine Learning 15, 321-329, 1994.

APPENDIX

Data file used to build the trees of Figure 1 (D denotes the class; Y1 and Y2 are the attributes). Consecutive examples with identical values are shown as ranges.

Examples      D    Y1    Y2
1-2350        d1   y11   y22
2351-2500     d1   y12   y22
2501-2650     d2   y11   y21
2651-2700     d2   y12   y21
2701-2850     d3   y11   y22
2851-5200     d3   y12   y22

B. Crémilleux is Maître de Conférences at the Université de Caen, France.