Computing and Information Systems, 7 (2000), pp. 91-97. © University of Paisley 2000


Decision Trees as a Data Mining Tool

Bruno Crémilleux


The production of decision trees is usually regarded as an automatic method for discovering knowledge from data: trees stem directly from the data without further intervention. However, we cannot expect acceptable results if we naively apply machine learning to arbitrary data. By reviewing the whole process, together with the other work which implicitly has to be done to generate a decision tree, this paper shows that the method has to be placed within the knowledge discovery in databases process and that, in fact, the user has to intervene both during the core of the method (building and pruning) and during the associated tasks.

1. INTRODUCTION

Data mining and Knowledge Discovery in Databases (KDD) are fields of increasing interest combining databases, artificial intelligence, machine learning and statistics. Briefly, the purpose of KDD is to extract from large amounts of data non-trivial "nuggets" of information in an easily understandable form. Such discovered knowledge may be, for instance, regularities or exceptions.

The decision tree is a method which comes from the machine learning community and explores data. Such a method is able to give a summary of the data (which is easier to analyze than the raw data) or can be used to build a tool (for example a classifier) to help the user in many different decision-making tasks. Broadly speaking, a decision tree is built from a set of training data having attribute values and a class name. The result of the process is represented as a tree whose nodes specify attributes and whose branches specify attribute values. Leaves of the tree correspond to sets of examples with the same class, or to elements for which no more attributes are available. The construction of decision trees is described, among others, by Breiman et al. (1984) [1], who present an important and well-known monograph on classification trees. A number of standard techniques have been developed, for example the basic algorithms ID3 [20] and CART [1]. A survey of decision tree classifiers and the various existing issues is presented in Safavian and Landgrebe [25].

Usually, the production of decision trees is regarded as an automatic process: trees are straightforwardly generated from data and the user is relegated to a minor role. Nevertheless, we think that this method intrinsically requires the user, not only during the step of data preparation, but also during the whole process. In fact, the use of decision trees can be embedded in the KDD process within its main steps (selection, preprocessing, data mining, interpretation / evaluation). The aim of this paper is to show the role of the user and to connect the use of decision trees with the data mining framework.

This paper is organized as follows. Section 2 outlines the core of the decision tree method (i.e. building and pruning). The literature usually presents these points from a technical side without describing the part played by the user: we will see that he has a role to play. Section 3 deals with associated tasks which are, in fact, absolutely necessary. These tasks, where the user clearly has to intervene, are often not emphasized when we speak of decision trees. We will see that they are highly relevant and that they act upon the final result.

2. BUILDING AND PRUNING

2.1 Building decision trees: choice of an attribute selection criterion

In the induction of decision trees, various attribute selection criteria are used to estimate the quality of attributes in order to select the best one to split on. We know at a theoretical level that criteria derived from an impurity measure have suitable properties for generating decision trees and perform comparably (see [10], [1] and [6]). We call such criteria C.M. criteria (concave-maximum criteria) because an impurity measure, among other characteristics, is defined by a concave function. The most commonly used criteria, namely the Shannon entropy (in the family of ID3 algorithms) and the Gini criterion (in CART algorithms, see [1] for details), are C.M. criteria.
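To make the notion of a C.M. (impurity-based) criterion concrete, here is a minimal sketch in Python. It computes the Shannon entropy and the Gini impurity of a class-count vector and the impurity decrease obtained by a candidate split; the class counts in the example are hypothetical. It is only an illustration of the principle, not code from any of the systems cited here.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class-count vector."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gini(counts):
    """Gini impurity of a class-count vector."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def impurity_decrease(parent, children, impurity=entropy):
    """Decrease of impurity obtained by splitting `parent` into `children`
    (each argument is a class-count vector); this is the quantity a
    C.M.-style criterion maximises over the candidate attributes."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted

# Hypothetical class counts for one candidate split.
parent = (40, 30, 30)
children = [(35, 5, 0), (5, 25, 30)]
print(impurity_decrease(parent, children))          # entropy gain
print(impurity_decrease(parent, children, gini))    # Gini decrease
```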
Nevertheless, other paradigms exist for building decision trees. For example, Fayyad and Irani [10] claim that grouping attribute values and building binary trees yields better trees. For that purpose, they propose the ORT measure. ORT favours attributes that simply separate the different classes, without taking into account the number of examples in the nodes, so that ORT produces trees with small pure (or nearly pure) leaves at their top more often than C.M. criteria do.

To better understand the differences between C.M. and ORT criteria, let us consider the data set given in the appendix and the trees induced from it, depicted in Figure 1: the tree built with a C.M. criterion is shown at the top and the tree built with the ORT criterion at the bottom. ORT immediately produces the pure leaf Y2 = y21, while the C.M. criterion splits this population and only reaches the corresponding leaves later.

Figure 1: An example of C.M. and ORT trees (class counts are given as (d1, d2, d3)).
C.M. tree: the root (2500,200,2500) is split on Y1 into (2350,150,150) for Y1 = y11 and (150,50,2350) for Y1 = y12; each of these nodes is then split on Y2, giving the leaves (0,150,0) for Y2 = y21 and (2350,0,150) for Y2 = y22 under y11, and (0,50,0) and (150,0,2350) under y12.
ORT tree: the root (2500,200,2500) is split on Y2 into the pure leaf (0,200,0) for Y2 = y21 and the node (2500,0,2500) for Y2 = y22, which is then split on Y1 into the leaves (2350,0,150) and (150,0,2350).
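The comparison of Figure 1 can be checked numerically from the class counts given in the appendix. The sketch below scores the two possible root splits with the entropy gain (a C.M. criterion) and with an ORT-style measure; we assume here that ORT can be taken as one minus the cosine of the angle between the class-count vectors of the two subnodes, which should be checked against the exact definition in [10]. With these counts, the entropy gain prefers the split on Y1 while ORT prefers the pure split on Y2, as in Figure 1.

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gain(parent, children):
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)

def ort(left, right):
    # Assumed form of the ORT measure: one minus the cosine of the angle
    # between the class-count vectors of the two subnodes.
    dot = sum(l * r for l, r in zip(left, right))
    norm = math.sqrt(sum(l * l for l in left)) * math.sqrt(sum(r * r for r in right))
    return 1.0 - dot / norm

root = (2500, 200, 2500)                          # class counts (d1, d2, d3)
split_y1 = [(2350, 150, 150), (150, 50, 2350)]    # Y1 = y11 / y12
split_y2 = [(0, 200, 0), (2500, 0, 2500)]         # Y2 = y21 / y22

print("entropy gain:", gain(root, split_y1), gain(root, split_y2))  # Y1 scores higher
print("ORT:", ort(*split_y1), ort(*split_y2))                       # Y2 scores higher
```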
We give here just a simple example, but others, both in artificial and real-world domains, are detailed in [6]: they show that the ORT criterion produces trees with small leaves at their top more often than C.M. criteria do. We also see in [6] that overspecified leaves obtained with C.M. criteria tend to be small and located at the bottom of the tree (thus easy to prune), while leaves at the bottom of ORT trees can be large. In uncertain domains (we return to this point in the next section), such leaves produced by ORT may be irrelevant, and it is difficult to prune them without destroying the tree.

Let us note that other selection criteria, such as the gain ratio criterion, address other specific issues. The gain ratio criterion proposed by Quinlan [20], derived from the entropy criterion, is designed to avoid favouring attributes with many values. Indeed, in some situations, selecting an attribute essentially because it has many values may jeopardize the semantic acceptance of the induced trees ([27] and [18]). The J-measure [15] is the product of two terms that Goodman and Smyth consider to be the two basic criteria for evaluating a rule: one term is derived from the entropy function and the other measures the simplicity of a rule. Quinlan and Rivest [21] used the minimum description length principle to construct a decision tree minimizing the misclassification rate when one looks for general rules together with the exceptional conditions of particular cases. This principle has been taken up by Wallace and Patrick [26], who suggest some improvements and show that they generally obtain better empirical results than those found by Quinlan. Buntine [3] presents a tree learning algorithm stemming from Bayesian statistics whose main objective is to provide good predicted class probabilities at the nodes.

We can also address the question of deciding which subnodes have to be built. For a given split, the GID3* algorithm [12] groups into a single branch the values of an attribute which are estimated to be meaningless compared to its other values. For building binary trees, another criterion is twoing [1]. Twoing groups the classes into two superclasses such that, when the split is considered as a two-class problem, the greatest decrease in node impurity is achieved. Some properties of twoing are described in Breiman [2]. Regarding binary decision trees, let us note that in some situations users do not always agree to group values, since this can yield meaningless trees; non-binary trees must therefore not be definitively discarded.
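For completeness, the twoing value of a binary split can be sketched as follows, using the form of the criterion commonly quoted for CART, twoing = (pL pR / 4) (sum_j |p(j|tL) - p(j|tR)|)^2; readers should refer to [1] for the exact definition. On the root splits of the appendix data, twoing, like the entropy gain, prefers the split on Y1.

```python
def twoing(left, right):
    """Twoing value of a binary split described by two class-count vectors,
    in the form usually quoted for CART:
    (pL * pR / 4) * (sum_j |p(j|left) - p(j|right)|) ** 2."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    p_left, p_right = n_left / n, n_right / n
    diff = sum(abs(l / n_left - r / n_right) for l, r in zip(left, right))
    return (p_left * p_right / 4.0) * diff ** 2

# Root splits of the appendix data set (class counts for d1, d2, d3).
print(twoing((2350, 150, 150), (150, 50, 2350)))   # split on Y1 (higher value here)
print(twoing((0, 200, 0), (2500, 0, 2500)))        # split on Y2
```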
So, we have seen that there are many attribute selection criteria and, even if some of them can be gathered into families, a choice has to be made. In our view, the choice of a paradigm depends on whether the data sets used embed uncertainty or not, whether the phenomenon under study admits deterministic causes, and what level of intelligibility is required.

In the next section, we move on to the pruning stage.

2.2 Pruning decision trees: what about the classification and the quality?

We know that in many areas, such as medicine, data are uncertain: there are always some examples which escape the rules. Translated into the context of decision trees, this means that some examples seem similar but in fact differ in their classes. In these situations, it is well known (see [1], [4]) that decision tree algorithms tend to divide nodes having few examples and that the resulting trees tend to be very large and overspecified. Some branches, especially towards the bottom, are present owing to sample variability and are statistically meaningless (one can also say that they are due to noise in the sample). Such branches must either not be built or be pruned. If we do not want to build them, we have to set out rules to stop the growth of the tree; however, it is known to be better to generate the entire tree and then to prune it (see for example [1] and [14]). Pruning methods (see [1], [19], [20]) try to cut such branches in order to avoid this drawback.
The principal methods for pruning decision trees are examined in [9] and [19]. Most of these methods are based on minimizing a classification error rate, where each element of a node is classified into the most frequent class of that node; this error rate is estimated with a test file or with statistical methods such as cross-validation or the bootstrap.
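As an illustration only, with a modern library rather than the software used in this work, the error rate of a tree used as a classifier can be estimated by cross-validation while varying the amount of pruning; here the pruning strength is controlled through scikit-learn's cost-complexity parameter, in the spirit of CART [1].

```python
# Illustrative only: estimating the classification error rate of more or less
# strongly pruned trees by 10-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for ccp_alpha in (0.0, 0.01, 0.05):              # increasing pruning strength
    tree = DecisionTreeClassifier(criterion="entropy",
                                  ccp_alpha=ccp_alpha,
                                  random_state=0)
    accuracy = cross_val_score(tree, X, y, cv=10).mean()
    print(ccp_alpha, 1.0 - accuracy)             # estimated error rate
```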
These pruning methods are derived from situations where the built tree will be used as a classifier, and they systematically discard a sub-tree which does not improve the classification error rate. Let us consider the sub-tree depicted in Figure 2. D is the class and it is here bivalued. In each node, the first (resp. second) value indicates the number of examples having the first (resp. second) value of D. This sub-tree does not lessen the error rate, which is 10% both at its root and in its leaves; nevertheless, the sub-tree is of interest since it points out a specific sub-population with a constant value of D, while in the remaining population it is impossible to predict a value for D.

Figure 2: A tree which can be interesting although it does not decrease the number of errors: the root (90,10) is split into the leaves (79,0) and (11,10).
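A minimal sketch of such an error-based pruning rule, applied to the class counts of Figure 2, makes the point concrete: since replacing the sub-tree by a leaf leaves the number of errors unchanged (10 out of 100 in both cases), the rule discards the split, although the left leaf isolates a pure sub-population. The tree representation and the pruning rule below are ours, not those of the methods cited above.

```python
def leaf_errors(counts):
    """Errors made on a node's examples when they are all assigned
    to the node's most frequent class."""
    return sum(counts) - max(counts)

def prune(tree):
    """Bottom-up, error-based pruning of a tree given as nested dicts
    {'counts': (...), 'children': [...]}.  A sub-tree is replaced by a leaf
    whenever this does not increase the number of errors."""
    if not tree.get('children'):
        return tree, leaf_errors(tree['counts'])
    pruned_children, subtree_errors = [], 0
    for child in tree['children']:
        pc, e = prune(child)
        pruned_children.append(pc)
        subtree_errors += e
    if leaf_errors(tree['counts']) <= subtree_errors:
        return {'counts': tree['counts'], 'children': []}, leaf_errors(tree['counts'])
    return {'counts': tree['counts'], 'children': pruned_children}, subtree_errors

# The sub-tree of Figure 2: root (90,10) split into (79,0) and (11,10).
figure2 = {'counts': (90, 10),
           'children': [{'counts': (79, 0), 'children': []},
                        {'counts': (11, 10), 'children': []}]}
print(prune(figure2))   # the split is discarded: it does not reduce the error rate
```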
In [5], we proposed a pruning method (called C.M. pruning, because a C.M. criterion is used to build the entire tree) suitable for uncertain domains. C.M. pruning builds a new attribute binding the root of a tree to its leaves, the attribute's values corresponding to the branches leading to a leaf. It permits computation of the global quality of a tree. The best sub-tree to prune is the one that yields the highest-quality pruned tree. This pruning method is not tied to the use of the pruned tree as a classifier.

This work has been taken up in [13]. In uncertain domains, a deep tree is less relevant than a small one: the deeper a tree, the less understandable and reliable it is. So a new quality index (called DI, for Depth-Impurity) has been defined in [13]. It manages a trade-off between the depth and the impurity of each node of a tree. From this index, a new pruning method (denoted DI pruning) has been derived. Compared with C.M. pruning, DI pruning introduces a damping function to take into account the depth of the leaves. Moreover, by giving the quality of each node (and not only of a sub-tree), DI pruning is able to distinguish sub-populations of interest in large populations or, on the contrary, to highlight sets of examples with high uncertainty (in the context of the studied problem). In the latter case, the user has to come back to the data to try to improve their collection and preparation. Getting the quality of each node is a key point in uncertain domains.
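The exact DI index is defined in [13]; the following toy sketch only illustrates the general idea of trading impurity against depth. Each leaf is scored by its purity damped by a factor gamma raised to the power of its depth, and the quality of a node is the example-weighted average over its leaves; the geometric damping factor gamma is our own simplification of the damping function mentioned above.

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def quality(node, n_classes, gamma=0.9, depth=0):
    """Toy depth-impurity quality of a (sub-)tree given as nested dicts
    {'counts': (...), 'children': [...]}: example-weighted average over the
    leaves of (1 - normalised entropy) * gamma**depth.  Illustrative only."""
    counts = node['counts']
    if not node.get('children'):
        purity = 1.0 - entropy(counts) / math.log2(n_classes)
        return purity * (gamma ** depth)
    n = sum(counts)
    return sum(sum(ch['counts']) / n * quality(ch, n_classes, gamma, depth + 1)
               for ch in node['children'])

tree = {'counts': (90, 10),
        'children': [{'counts': (79, 0), 'children': []},
                     {'counts': (11, 10), 'children': []}]}
print(quality(tree, n_classes=2))   # quality of the sub-tree of Figure 2
```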
So, regarding the pruning stage, the user is confronted with several questions:
- am I interested in obtaining a quality value for each node?
- is there uncertainty in the data?
He also has to know which use of the tree is pursued:
- a tree can be an efficient description oriented by an a priori classification of its elements; pruning the tree then discards overspecific information to get a more legible description.
- a tree can be built to highlight reliable sub-populations; here only some leaves of the pruned tree will be considered for further investigation.
- the tree can be transformed into a classifier for any new element of a large population.
The choice of a pruning strategy is tied to the answers to these questions.

3. ASSOCIATED TASKS

We indicate in this section when and how the user, by means of various associated tasks, intervenes in the process of developing decision trees. Schematically, these tasks concern gathering the data for the design of the training set, the encoding of the attributes, the specific analysis of examples, the analysis of the resulting tree, and so on.

Generally, these tasks are not emphasized in the literature; they are usually considered as secondary, but we will see that they are highly relevant and that they act upon the final result. Of course, these tasks intersect with the building and pruning work that we have previously described.

In practice, apart from the building and pruning steps, there is another step: data preparation. We add a fourth step, which aims to study the classification of new examples by a - potentially pruned - tree. The user strongly intervenes during the first step, but also has a supervising role during all steps and, more particularly, a critical role after the second and third steps (see Figure 3). We do not detail here the fourth step, which is marginal from the point of view of the user's role.

Figure 3: Process to generate decision trees and relations with the user. The decision tree software chains four steps - data manipulation, building, pruning and classification - producing the data set, the entire tree, the pruned tree and the results of the classification; the user prepares the data set and then checks and intervenes at each subsequent step.
3.1 Data preparation

The aim of this step is to supply, from the database gathering the examples in their raw form, a training set as well adapted as possible to the development of decision trees. This is the step where the user intervenes most directly. His tasks are numerous: deleting examples considered as aberrant (outliers) and/or containing too many missing values; deleting attributes evaluated as irrelevant to the given task; re-encoding the attribute values (one knows that if the attributes have very different numbers of values, those having more values tend to be chosen first ([27] and [18]); we have already referred to this point with the gain ratio criterion); re-encoding several attributes (for example, the fusion of attributes); segmenting continuous attributes; analyzing missing data; and so on.

Let us come back to some of these tasks. At first [16], decision tree algorithms did not accept quantitative attributes, which had to be discretized. This initial segmentation can be done by asking experts to set thresholds or by using a strategy relying on an impurity function [11]. The segmentation can also be done while building the tree, as is the case with the software C4.5 [22]; a continuous attribute can then be segmented several times in the same tree. It seems relevant to us that the user may actively intervene in this process, for example by indicating an a priori discretization of the attributes for which it is meaningful and by letting the system manage the others. One should remark that, if one knows in a reasonable way how to split a continuous attribute into two intervals, the question is more delicate for a three-valued (or more) discretization.
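A minimal sketch of impurity-driven segmentation of a continuous attribute: the threshold is chosen so as to minimise the weighted entropy of the two resulting intervals. This is only the simplest, single-threshold variant; the multi-interval method of [11] also decides, with a stopping criterion, how many cut points to keep. The age attribute and class labels below are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Binary, entropy-minimising discretization of a continuous attribute:
    try a cut point between each pair of consecutive distinct values and keep
    the one giving the lowest weighted entropy of the two intervals."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float('inf'), None)
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue
        cut = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        best = min(best, (score, cut))
    return best[1]

# Hypothetical example: an age attribute against a two-valued class.
ages   = [22, 25, 31, 38, 45, 52, 61, 70]
labels = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
print(best_threshold(ages, labels))   # a cut point at 41.5
```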
The user also generally has to decide on the deletion, re-encoding or fusion of attributes. He has a priori ideas allowing a first pass at this task, but we shall see that the tree construction, by making the underlying phenomenon explicit, suggests to the user new re-encodings and/or fusions of attributes, often leading to a more general description level.

Current decision tree construction algorithms most often deal with missing values by means of specific and internal treatments [7]. On the contrary, by a preliminary analysis of the database, relying on the search for associations in the data and leading to uncertain rules that determine the missing values, Ragel ([7], [24]) offers a strategy where the user can intervene: such a method leaves a place for the user and his knowledge, who may delete, add or modify some rules.

As we can see, this step in fact depends a lot on the user's work.
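A minimal sketch of the kind of rule-based treatment of missing values evoked above (ours, not the method of [7] or [24]): rules of the form "if Y1 = v then Y2 = w" are mined with their confidence from the examples where both attributes are known, the user may inspect and delete or modify the rules, and the remaining rules are used to fill in the missing values. The attribute names and rows are hypothetical.

```python
from collections import Counter, defaultdict

def learn_rules(rows, cond_attr, target_attr, min_confidence=0.8):
    """Mine rules 'if cond_attr == v then target_attr == w' from the rows where
    both attributes are known, keeping only sufficiently confident rules."""
    by_value = defaultdict(Counter)
    for row in rows:
        v, w = row.get(cond_attr), row.get(target_attr)
        if v is not None and w is not None:
            by_value[v][w] += 1
    rules = {}
    for v, counter in by_value.items():
        w, count = counter.most_common(1)[0]
        confidence = count / sum(counter.values())
        if confidence >= min_confidence:
            rules[v] = (w, confidence)
    return rules

def fill_missing(rows, cond_attr, target_attr, rules):
    for row in rows:
        if row.get(target_attr) is None and row.get(cond_attr) in rules:
            row[target_attr] = rules[row[cond_attr]][0]

# Hypothetical toy data: Y2 is missing in the last row.
rows = [{'Y1': 'y11', 'Y2': 'y22'}, {'Y1': 'y11', 'Y2': 'y22'},
        {'Y1': 'y12', 'Y2': 'y21'}, {'Y1': 'y11', 'Y2': None}]
rules = learn_rules(rows, 'Y1', 'Y2')
# The user could inspect `rules` here and delete or modify entries he distrusts.
fill_missing(rows, 'Y1', 'Y2', rules)
print(rows[-1])   # {'Y1': 'y11', 'Y2': 'y22'}
```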
3.2 Building step

The aim of this step is to induce a tree from the training set arising from the previous step. Some system parameters have to be specified. For example, it is useless to keep on building a tree from a node having too few examples, this amount being relative to the initial number of examples in the base. An important parameter to set is thus the minimum number of examples required to segment a node. Facing a particularly huge tree, the user will ask for the construction of a new tree with this parameter set to a higher value, which amounts to pruning the tree by pragmatic means. We have seen (Section 2.1) that in uncertain induction the user will most probably choose a C.M. criterion in order to be able to prune; but if he knows that the studied phenomenon admits deterministic causes in situations with few examples, he can choose the ORT criterion to get a more concise description of those situations.
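Purely as an illustration with a modern library, not the software used in this work, these are the kinds of parameters the user sets before induction: the attribute selection criterion and the minimum number of examples required to segment a node.

```python
# Illustrative only: the user chooses the attribute selection criterion and the
# minimum node size before inducing the tree (here with scikit-learn).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy",   # a C.M. criterion
                              min_samples_split=20,  # minimum examples to segment a node
                              random_state=0).fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```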
The presentation of the attributes and their respective criterion scores at each node may allow the user to select an attribute that does not have the best score but provides a promising way to reach a relevant leaf.

The criticism of the tree thus obtained is the most important contribution of the user in this step. He checks whether the tree is understandable with regard to his domain knowledge and whether its general structure conforms to his expectations. Facing a surprising result, he wonders whether it is due to a bias in the training set or whether it reflects a phenomenon, sometimes suspected, that had not yet been explicitly stated. Most often, seeing the tree gives the user new ideas about the attributes, and he will choose to build the tree again after reworking the training set and/or changing a parameter of the induction system in order to confirm or refute a conjecture.

3.3 Pruning step

Apart from the questions at the end of Section 2.2 about the type of data and the aim pursued in producing a tree, more questions arise for the user if he uses a technique such as DI pruning.

In fact, in this situation, the user has more information to react upon. First, he knows the quality index of the entire tree, which allows him to evaluate the global complexity of the problem. If this index is low, it means that the problem is delicate or inadequately described, that the training set is not representative, or even that the decision tree method is not suited to this specific problem. If the user has several trees, the quality index allows him to compare them and possibly to suggest new experiments.

Moreover, the quality index on each node highlights the populations where the class is easy to determine, as opposed to sets of examples where it is impossible to predict it. Such areas can suggest new experiments on smaller populations, or even raise the question of additional attributes (which will have to be collected) to help determine the class for the examples where it is not yet possible.

From experiments [13], we noticed that the degree of pruning is strongly bound to the uncertainty embedded in the data. In practice, this means that the damping process has to be adjusted according to the data in order to obtain, in all situations, a relevant number of pruned trees. For that, we introduce a parameter to control the damping process. By varying this parameter, one follows the evolution of the quality index during the pruning (for example, the user distinguishes the parts of the tree that are due to chance from those that are reliable). Such work highlights the most relevant attributes, as opposed to those that it may be necessary to redefine.

Finally, the building and pruning steps can be viewed as part of the study of the attributes. Domain experts usually appreciate being able to restructure the set of initial attributes and to see at once the effect of such a modification on the tree (in general, after a preliminary decision tree, they define new attributes which summarize some of the initial ones). We have noticed [5] that when such attributes are used, the shape of the curve of the quality index as a function of the number of pruned sub-trees changes and tends to show three parts: in the first one, the variation of the quality index is small; in the second, the quality decreases regularly; and in the third, the quality rapidly becomes very low. This shows that the information embedded in the data set lies mainly in the top of the tree, while the bottom can be pruned.

3.4 Conclusion

Throughout this section, we have seen that the user's interventions are numerous and that the realization of the associated tasks is closely linked to him. These tasks are fundamental since they directly affect the results: the study of the results brings new experiments. The user restarts the work done during a step many times by changing the parameters, or comes back to previous steps (the arrows in Figure 3 show all the relations between the different steps). At each step, the user may accept, override, or modify the generated rules, but more often he suggests alternative features and experiments. Finally, the rule set is redefined through subsequent data collection, rule induction, and expert consideration.

We think it is necessary for the user to take part in the system so that a real development cycle takes place. The latter seems fundamental to us in order to obtain useful and satisfying trees. The user does not usually know beforehand which tree is relevant to his problem, and it is because he finds it gratifying to take part in this search that he takes interest in the induction work.

Let us note that several authors try to define software architectures that explicitly integrate the user. In the area of induction graphs (a generalization of decision trees), the SIPINA software allows the user to fix the choice of an attribute, to gather some values of an attribute temporarily, to stop the construction from some nodes, and so on. Dabija et al. [8] offer a learning system architecture (called KAISER, for Knowledge Acquisition Inductive System driven by Explanatory Reasoning) for an interactive knowledge acquisition system based on decision trees and driven by explanatory reasoning; moreover, the experts can incrementally add knowledge corresponding to the domain theory. KAISER confronts the built trees with the domain theory, so that some incoherences may be detected (for instance, the value of the attribute "eye" for a cat has to be "oval"). Kervahut and Potvin [17] have designed an assistant to collaborate with the user. This assistant, which takes the form of a graphic interface, helps the user test the methods and their parameters in order to get the most relevant combination for the problem at hand.
4. CONCLUSION

Producing decision trees is often presented as "automatic", with a marginal participation of the user. We have stressed the fact that the user has a fundamental critical and supervisory role and that he intervenes in a major way. This leads to a real development cycle between the user and the system. This cycle is only possible because the construction of a tree is nearly instantaneous.

The participation of the user in the data preparation, the choice of the parameters and the criticism of the results is in fact at the heart of the more general process of Knowledge Discovery in Databases. As usual in KDD, we claim that the understanding and the declarativity of the mechanisms of the methods is a key point for achieving in practice a fruitful process of information extraction. Finally, we think that, in order to really reach a data exploration reasoning that associates the user in a profitable way, it is important to give him a framework gathering all the tasks intervening in the process, so that he may freely explore the data, react, and innovate with new experiments.

References

[1]  Breiman L., Friedman J. H., Olshen R. A., & Stone C. J. Classification and regression trees. Wadsworth, Statistics probability series, Belmont, 1984.
[2]  Breiman L. Some properties of splitting criteria (technical note). Machine Learning 21, 41-47, 1996.
[3]  Buntine W. Learning classification trees. Statistics and Computing 2, 63-73, 1992.
[4]  Catlett J. Overpruning large decision trees. In proceedings of the Twelfth International Joint Conference on Artificial Intelligence IJCAI 91, pp. 764-769, Sydney, Australia, 1991.
[5]  Crémilleux B., & Robert C. A Pruning Method for Decision Trees in Uncertain Domains: Applications in Medicine. In proceedings of the workshop Intelligent Data Analysis in Medicine and Pharmacology, ECAI 96, pp. 15-20, Budapest, Hungary, 1996.
[6]  Crémilleux B., Robert C., & Gaio M. Uncertain domains and decision trees: ORT versus C.M. criteria. In proceedings of the 7th Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, pp. 540-546, Paris, France, 1998.
[7]  Crémilleux B., Ragel A., & Bosson J. L. An Interactive and Understandable Method to Treat Missing Values: Application to a Medical Data Set. In proceedings of the 5th International Conference on Information Systems Analysis and Synthesis (ISAS / SCI 99), pp. 137-144, M. Torres, B. Sanchez & E. Wills (Eds.), Orlando, FL, 1999.
[8]  Dabija V. G., Tsujino K., & Nishida S. Theory formation in the decision trees domain. Journal of Japanese Society for Artificial Intelligence, 7 (3), 136-147, 1992.
[9]  Esposito F., Malerba D., & Semeraro G. Decision tree pruning as search in the state space. In proceedings of the European Conference on Machine Learning ECML 93, pp. 165-184, P. B. Brazdil (Ed.), Lecture notes in artificial intelligence, N° 667, Springer-Verlag, Vienna, Austria, 1993.
[10] Fayyad U. M., & Irani K. B. The attribute selection problem in decision tree generation. In proceedings of the Tenth National Conference on Artificial Intelligence, pp. 104-110, Cambridge, MA: AAAI Press/MIT Press, 1992.
[11] Fayyad U. M., & Irani K. B. Multi-interval discretization of continuous-valued attributes for classification learning. In proceedings of the Thirteenth International Joint Conference on Artificial Intelligence IJCAI 93, pp. 1022-1027, Chambéry, France, 1993.
[12] Fayyad U. M. Branching on attribute values in decision tree generation. In proceedings of the Twelfth National Conference on Artificial Intelligence, pp. 601-606, AAAI Press/MIT Press, 1994.
[13] Fournier D., & Crémilleux B. Using impurity and depth for decision trees pruning. In proceedings of the 2nd International ICSC Symposium on Engineering of Intelligent Systems (EIS 2000), Paisley, UK, 2000.
[14] Gelfand S. B., Ravishankar C. S., & Delp E. J. An iterative growing and pruning algorithm for classification tree design. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(2), 163-174, 1991.
[15] Goodman R. M. F., & Smyth P. Information-theoretic rule induction. In proceedings of the Eighth European Conference on Artificial Intelligence ECAI 88, pp. 357-362, München, Germany, 1988.
[16] Hunt E. B., Marin J., & Stone P. J. Experiments in induction. New York Academic Press, 1966.
[17] Kervahut T., & Potvin J. Y. An interactive-graphic environment for automatic generation of decision trees. Decision Support Systems 18, 117-134, 1996.
[18] Kononenko I. On biases in estimating multi-valued attributes. In proceedings of the Fourteenth International Joint Conference on Artificial Intelligence IJCAI 95, pp. 1034-1040, Montréal, Canada, 1995.
[19] Mingers J. An empirical comparison of pruning methods for decision-tree induction. Machine Learning 4, 227-243, 1989.
[20] Quinlan J. R. Induction of decision trees. Machine Learning 1, 81-106, 1986.
[21] Quinlan J. R., & Rivest R. L. Inferring decision trees using the minimum description length principle. Information and Computation 80(3), 227-248, 1989.
[22] Quinlan J. R. C4.5: Programs for Machine Learning. San Mateo, CA, Morgan Kaufmann, 1993.
[23] Quinlan J. R. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4, 77-90, 1996.
[24] Ragel A., & Crémilleux B. Treatment of Missing Values for Association Rules. In proceedings of the Second Pacific Asia Conference on KDD, PAKDD 98, pp. 258-270, X. Wu, R. Kotagiri & K. B. Korb (Eds.), Lecture notes in artificial intelligence, N° 1394, Springer-Verlag, Melbourne, Australia, 1998.
[25] Safavian S. R., & Landgrebe D. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 21(3), 660-674, 1991.
[26] Wallace C. S., & Patrick J. D. Coding decision trees. Machine Learning 11, 7-22, 1993.
[27] White A. P., & Liu W. Z. Bias in Information-Based Measures in Decision Tree Induction. Machine Learning 15, 321-329, 1994.

APPENDIX

Data file used to build the trees of Figure 1 (D denotes the class; Y1 and Y2 are the attributes).

  examples        D     Y1     Y2
  1 - 2350        d1    y11    y22
  2351 - 2500     d1    y12    y22
  2501 - 2650     d2    y11    y21
  2651 - 2700     d2    y12    y21
  2701 - 2850     d3    y11    y22
  2851 - 5200     d3    y12    y22

B. Crémilleux is Maître de Conférences at the Université de Caen, France.
More Related Content

Viewers also liked

Ambiete mercadotecnia-autoguardado
Ambiete mercadotecnia-autoguardadoAmbiete mercadotecnia-autoguardado
Ambiete mercadotecnia-autoguardadoMar Sánchez
 
Morten Bengtson Opgave
Morten Bengtson OpgaveMorten Bengtson Opgave
Morten Bengtson Opgaveguest609c2b
 
Il processo creativo
Il processo creativoIl processo creativo
Il processo creativoHibo
 
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안ollehktsocial
 
修心 青山無所爭.福田用心耕(Nx)
修心  青山無所爭.福田用心耕(Nx)修心  青山無所爭.福田用心耕(Nx)
修心 青山無所爭.福田用心耕(Nx)花東宏宣
 
父母一生只有一個
父母一生只有一個父母一生只有一個
父母一生只有一個花東宏宣
 
Serge P Nekoval Grails
Serge P  Nekoval  GrailsSerge P  Nekoval  Grails
Serge P Nekoval Grailsguest092df8
 
Medical Assisting students will maintain a neat, groomed, and ...
Medical Assisting students will maintain a neat, groomed, and ...Medical Assisting students will maintain a neat, groomed, and ...
Medical Assisting students will maintain a neat, groomed, and ...butest
 
MoI present like...a surfer
MoI present like...a surferMoI present like...a surfer
MoI present like...a surferMartin Barnes
 

Viewers also liked (13)

Ambiete mercadotecnia-autoguardado
Ambiete mercadotecnia-autoguardadoAmbiete mercadotecnia-autoguardado
Ambiete mercadotecnia-autoguardado
 
Morten Bengtson Opgave
Morten Bengtson OpgaveMorten Bengtson Opgave
Morten Bengtson Opgave
 
Il processo creativo
Il processo creativoIl processo creativo
Il processo creativo
 
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안
Smart Open Forum _ KT 기업고객부문 상생 생태계 조성방안
 
修心 青山無所爭.福田用心耕(Nx)
修心  青山無所爭.福田用心耕(Nx)修心  青山無所爭.福田用心耕(Nx)
修心 青山無所爭.福田用心耕(Nx)
 
父母一生只有一個
父母一生只有一個父母一生只有一個
父母一生只有一個
 
內湖花市
內湖花市內湖花市
內湖花市
 
Serge P Nekoval Grails
Serge P  Nekoval  GrailsSerge P  Nekoval  Grails
Serge P Nekoval Grails
 
Medical Assisting students will maintain a neat, groomed, and ...
Medical Assisting students will maintain a neat, groomed, and ...Medical Assisting students will maintain a neat, groomed, and ...
Medical Assisting students will maintain a neat, groomed, and ...
 
MoI present like...a surfer
MoI present like...a surferMoI present like...a surfer
MoI present like...a surfer
 
gozARTE_2010
gozARTE_2010gozARTE_2010
gozARTE_2010
 
Anita
AnitaAnita
Anita
 
Cuoc Doi K
Cuoc Doi KCuoc Doi K
Cuoc Doi K
 

Similar to Decision Trees as a Powerful Data Mining Tool

IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292HARDIK SINGH
 
Complicatedness_Tang
Complicatedness_TangComplicatedness_Tang
Complicatedness_Tangvictor tang
 
Coclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain DocumentsCoclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain Documentslau
 
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...IRJET Journal
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
A Comparative Study of Image Compression Algorithms
A Comparative Study of Image Compression AlgorithmsA Comparative Study of Image Compression Algorithms
A Comparative Study of Image Compression AlgorithmsIJORCS
 
A Bayesian approach to estimate probabilities in classification trees
A Bayesian approach to estimate probabilities in classification treesA Bayesian approach to estimate probabilities in classification trees
A Bayesian approach to estimate probabilities in classification treesNTNU
 
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...IJMER
 
Au2640944101
Au2640944101Au2640944101
Au2640944101IJMER
 
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...cscpconf
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision treeKrish_ver2
 
DM Unit-III ppt.ppt
DM Unit-III ppt.pptDM Unit-III ppt.ppt
DM Unit-III ppt.pptLaxmi139487
 
Dbms narrative question answers
Dbms narrative question answersDbms narrative question answers
Dbms narrative question answersshakhawat02
 
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATIONMAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATIONVLSICS Design
 
Classification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted ClusterClassification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted ClusterIOSR Journals
 

Similar to Decision Trees as a Powerful Data Mining Tool (20)

IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292
 
Hx3115011506
Hx3115011506Hx3115011506
Hx3115011506
 
Complicatedness_Tang
Complicatedness_TangComplicatedness_Tang
Complicatedness_Tang
 
data mining.pptx
data mining.pptxdata mining.pptx
data mining.pptx
 
New view of fuzzy aggregations. part I: general information structure for dec...
New view of fuzzy aggregations. part I: general information structure for dec...New view of fuzzy aggregations. part I: general information structure for dec...
New view of fuzzy aggregations. part I: general information structure for dec...
 
Coclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain DocumentsCoclustering Base Classification For Out Of Domain Documents
Coclustering Base Classification For Out Of Domain Documents
 
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
 
4.Database Management System.pdf
4.Database Management System.pdf4.Database Management System.pdf
4.Database Management System.pdf
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
A Comparative Study of Image Compression Algorithms
A Comparative Study of Image Compression AlgorithmsA Comparative Study of Image Compression Algorithms
A Comparative Study of Image Compression Algorithms
 
A Bayesian approach to estimate probabilities in classification trees
A Bayesian approach to estimate probabilities in classification treesA Bayesian approach to estimate probabilities in classification trees
A Bayesian approach to estimate probabilities in classification trees
 
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...
Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain ...
 
Au2640944101
Au2640944101Au2640944101
Au2640944101
 
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
DM Unit-III ppt.ppt
DM Unit-III ppt.pptDM Unit-III ppt.ppt
DM Unit-III ppt.ppt
 
Dbms narrative question answers
Dbms narrative question answersDbms narrative question answers
Dbms narrative question answers
 
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATIONMAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
 
Classification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted ClusterClassification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted Cluster
 
1861 1865
1861 18651861 1865
1861 1865
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Decision Trees as a Powerful Data Mining Tool

  • 1. Computing and Information Systems, 7 (2000), p. 91-97 © University of Paisley 2000 Decision Trees as a Data Mining Tool Bruno Crémilleux The production of decision trees is usually regarded step of data preparation, but also during the whole as an automatic method to discover knowledge from process. In fact, using decision trees can be embedded data: trees directly stemmed from the data without in the KDD process within the main steps (selection, other intervention. However, we cannot expect preprocessing, data mining, interpretation / acceptable results if we naively apply machine evaluation). The aim of the paper is to show the role learning to arbitrary data. By reviewing the whole of the user and to connect the use of decision trees process and some other works which implicitly have within the data mining framework. to be done to generate a decision tree, this papers This paper is organized as follows. Section 2 outlines shows that this method has to be placed in the the core of decision trees method (i.e. building and knowledge discovery in databases processing and, in pruning). Literature usually presents these points from fact, the user has to intervene both during the core of a technical side without describing the part regarding the method (building and pruning) and other the user: we will see that he has a role to play. Section associated tasks. 3 deals with associated tasks which are, in fact, 1. INTRODUCTION absolutely necessary. These tasks, where clearly the user has to intervene, are often not emphasized when Data mining and Knowledge Discovery in Databases we speak of decision trees. We will see that they have (KDD) are fields of increasing interest combining a great relevance and they act upon the final result. databases, artificial intelligence, machine learning and statistics. Briefly, the purpose of KDD is to extract 2. BUILDING AND PRUNING from largeamounts of data, non trivial ”nuggets” of 2.1 Building decision trees: choice of an attribute information in an easily understandable form. Such selection criterion discovered knowledge may be for instance regularities or exceptions. In induction of decision trees various attribute selection criteria are used to estimate the quality of Decision tree is a method which comes from the attributes in order to select the best one to split on. But machine learning community and explores data. Such we know at a theoretical level that criteria derived a method is able to give a summary of the data (which from an impurity measure have suitable properties to is easier to analyze than the raw data) or can be used generate decision trees and perform comparably (see to build a tool (like for example a classifier) to help a [10], [1] and [6]). We call such criteria C.M. criteria user formany different decision making tasks. Broadly (concave-maximum criteria) because an impurity speaking, a decision tree is built from a set of training measure, among other characteristics, is defined by a data having attribute values and a class name. The concave function. The most commonly used criteria result of the process is represented as a tree which which are the Shannon entropy (in the family of ID3 nodes specify attributes and branches specify attribute algorithms) and the Gini criterion (in CART values. Leaves of the tree correspond to sets of algorithms, see [1] for details), are C.M. criteria. examples with the same class or to elements in which no more attributes are available. 
The construction of decision trees is described, among others, by Breiman et al. (1984) [1], who present an important and well-known monograph on classification trees. A number of standard techniques have been developed, for example the basic algorithms ID3 [20] and CART [1]. A survey of decision tree classifier methods and of the various related issues is given by Safavian and Landgrebe [25].

Usually, the production of decision trees is regarded as an automatic process: trees are straightforwardly generated from the data and the user is relegated to a minor role. Nevertheless, we think that this method intrinsically requires the user, and not only during the step of data preparation.

An impurity measure, among other characteristics, is defined by a concave function. The most commonly used criteria, the Shannon entropy (in the family of ID3 algorithms) and the Gini criterion (in the CART algorithm, see [1] for details), are C.M. criteria.

Nevertheless, other paradigms exist for building decision trees. For example, Fayyad and Irani [10] claim that grouping attribute values and building binary trees yields better trees; for that purpose, they propose the ORT measure. ORT favours attributes that cleanly separate the different classes without taking into account the number of examples in the nodes, so that ORT produces trees with small pure (or nearly pure) leaves at their top more often than C.M. criteria do.

To better understand the differences between the C.M. and ORT criteria, let us consider the data set given in the appendix and the trees induced from it, depicted in Figure 1: a tree built with a C.M. criterion is represented at the top and the tree built with the ORT criterion at the bottom.
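To make the use of such impurity criteria concrete, here is a small generic sketch (not the code of ID3 or CART themselves): it computes the Shannon entropy and the Gini index of a node and selects the attribute giving the largest impurity decrease.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(examples, attribute, class_key, impurity=entropy):
    """Impurity of the parent minus the weighted impurity of the children."""
    parent = [e[class_key] for e in examples]
    before = impurity(parent)
    after = 0.0
    for value in set(e[attribute] for e in examples):
        subset = [e[class_key] for e in examples if e[attribute] == value]
        after += len(subset) / len(examples) * impurity(subset)
    return before - after

def best_attribute(examples, attributes, class_key, impurity=entropy):
    """Attribute maximizing the impurity decrease, the splitting rule of C.M. criteria."""
    return max(attributes, key=lambda a: impurity_decrease(examples, a, class_key, impurity))
```

On the data set of the appendix, best_attribute with either the entropy or the Gini impurity should select Y1 first, which is the behaviour of the C.M. tree of Figure 1.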
ORT rapidly comes out with the pure leaf Y2 = y21, while the C.M. criterion splits it and only reaches the separated leaves later.

[Figure 1. C.M. tree: the root (2500,200,2500) is split on Y1 into (2350,150,150) and (150,50,2350), each of which is then split on Y2, giving the leaves (0,150,0) and (2350,0,150) on one side and (0,50,0) and (150,0,2350) on the other. ORT tree: the root (2500,200,2500) is split on Y2 into the pure leaf (0,200,0) and the node (2500,0,2500), which is then split on Y1 into (2350,0,150) and (150,0,2350).]
Figure 1: An example of C.M. and ORT trees.

We give here just a simple example, but others, both in artificial and in real-world domains, are detailed in [6]: they show that the ORT criterion produces trees with small leaves at their top more often than C.M. criteria do. We also see in [6] that overspecified leaves obtained with C.M. criteria tend to be small and located at the bottom of the tree (and thus easy to prune), while leaves at the bottom of ORT trees can be large. In uncertain domains (we return to this point in the next paragraph), such leaves produced by ORT may be irrelevant, and it is difficult to prune them without destroying the tree.

Let us note that other selection criteria, such as the gain ratio criterion, are related to other specific issues. The ratio criterion proposed by Quinlan [20], derived from the entropy criterion, is designed to avoid favouring attributes with many values. Actually, in some situations, selecting an attribute essentially because it has many values might jeopardize the semantic acceptance of the induced trees ([27] and [18]). The J-measure [15] is the product of two terms that Goodman and Smyth consider as the two basic criteria for evaluating a rule: one term is derived from the entropy function and the other measures the simplicity of a rule. Quinlan and Rivest [21] were interested in the minimum description length principle to construct a decision tree that minimizes the misclassification rate when one looks for general rules together with their exceptional cases. This principle has been taken up by Wallace and Patrick [26], who suggest some improvements and show that they generally obtain better empirical results than those found by Quinlan. Buntine [3] presents a tree learning algorithm stemming from Bayesian statistics whose main objective is to provide accurate predicted class probabilities at the nodes.
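To make the ratio criterion mentioned above concrete, the following sketch divides the information gain by the split information, which is what penalizes attributes with many values (a simplified reading of Quinlan's criterion, not the exact code of C4.5):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(examples, attribute, class_key):
    """Information gain divided by split information, in the spirit of Quinlan's ratio criterion."""
    n = len(examples)
    parent_entropy = entropy([e[class_key] for e in examples])
    children_entropy = 0.0
    split_info = 0.0
    for value, count in Counter(e[attribute] for e in examples).items():
        subset = [e[class_key] for e in examples if e[attribute] == value]
        children_entropy += count / n * entropy(subset)
        split_info -= count / n * math.log2(count / n)
    gain = parent_entropy - children_entropy
    return gain / split_info if split_info > 0 else 0.0
```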
We can also address the question of deciding which sub-nodes have to be built. When splitting, the GID3* algorithm [12] groups into a single branch the values of an attribute that are estimated to be meaningless compared with its other values. For building binary trees, another criterion is twoing [1]. Twoing groups the classes into two superclasses such that, when the split is considered as a two-class problem, the greatest decrease in node impurity is achieved. Some properties of twoing are described in Breiman [2]. Regarding binary decision trees, let us note that in some situations users do not agree to group values, since it yields meaningless trees; thus non-binary trees must not be definitively discarded.

So, we have seen that there are many attribute selection criteria and, even if some of them can be gathered into families, a choice has to be made. In our view, the choice of a paradigm depends on whether the data sets embed uncertainty, whether the phenomenon under study admits deterministic causes, and what level of intelligibility is required.

In the next paragraph, we move to the pruning stage.

2.2 Pruning decision trees: what about the classification and the quality?

We know that in many areas, such as medicine, data are uncertain: there are always some examples which escape the rules. Translated into the context of decision trees, this means that some examples seem similar but in fact differ in their classes. In these situations, it is well known (see [1], [4]) that decision tree algorithms tend to divide nodes having few examples and that the resulting trees tend to be very large and overspecified. Some branches, especially towards the bottom, are present due to sample variability and are statistically meaningless (one can also say that they are due to noise in the sample). Such branches must either not be built or be pruned. If we do not want to build them, we have to set out rules that stop the growing of the tree, but we know it is better to generate the entire tree and then to prune it (see for example [1] and [14]). Pruning methods (see [1], [19], [20]) try to cut such branches in order to avoid this drawback.

The principal methods for pruning decision trees are examined in [9] and [19]. Most of these pruning methods are based on minimizing a classification error rate, where each element of a node is classified in the most frequent class of that node. The error rate is estimated with a test file or by statistical methods such as cross-validation or the bootstrap.
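A well-known member of this family (not a method proposed in this paper) is reduced-error pruning, which replaces a sub-tree by a leaf whenever this does not increase the error rate measured on a test file. A minimal sketch, assuming a simple dictionary representation of trees of our own convention:

```python
# A tree is either a leaf {"class": c} or an internal node
# {"attribute": a, "branches": {value: subtree}, "default": majority_class}.

def classify(tree, example):
    while "attribute" in tree:
        child = tree["branches"].get(example.get(tree["attribute"]))
        if child is None:
            return tree["default"]
        tree = child
    return tree["class"]

def error_rate(tree, test_set, class_key):
    errors = sum(1 for e in test_set if classify(tree, e) != e[class_key])
    return errors / len(test_set) if test_set else 0.0

def reduced_error_prune(tree, test_set, class_key):
    """Bottom-up: replace a sub-tree by a leaf when this does not worsen the test error."""
    if "attribute" not in tree:
        return tree
    for value, child in tree["branches"].items():
        tree["branches"][value] = reduced_error_prune(
            child, [e for e in test_set if e.get(tree["attribute"]) == value], class_key)
    leaf = {"class": tree["default"]}
    if error_rate(leaf, test_set, class_key) <= error_rate(tree, test_set, class_key):
        return leaf
    return tree
```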
These pruning methods stem from situations where the built tree will be used as a classifier, and they systematically discard a sub-tree which does not improve the classification error rate. Let us consider the sub-tree depicted in Figure 2. D is the class and is here bivalued. In each node, the first (resp. second) value indicates the number of examples having the first (resp. second) value of D. This sub-tree does not lessen the error rate, which is 10% both at its root and in its leaves; nevertheless the sub-tree is of interest, since it points out a specific population with a constant value of D, whereas in the remaining population it is impossible to predict a value for D.

[Figure 2. A node (90,10) split into the leaves (79,0) and (11,10).]
Figure 2: A tree which could be interesting although it doesn't decrease the number of errors.

In [5], we proposed a pruning method (called C.M. pruning, because a C.M. criterion is used to build the entire tree) suitable for uncertain domains. C.M. pruning builds a new attribute binding the root of a tree to its leaves, the attribute's values corresponding to the branches leading to a leaf. It permits the computation of the global quality of a tree: the best sub-tree to prune is the one that yields the highest-quality pruned tree. This pruning method is not tied to the use of the pruned tree as a classifier. This work has been taken up in [13]. In uncertain domains, a deep tree is less relevant than a small one: the deeper a tree, the less understandable and reliable it is. So, a new quality index (called DI, for Depth-Impurity) has been defined in [13]. It manages a trade-off between the depth and the impurity of each node of a tree. From this index, a new pruning method (denoted DI pruning) has been derived. With regard to C.M. pruning, DI pruning introduces a damping function to take the depth of the leaves into account. Moreover, by giving the quality of each node (and not only of a sub-tree), DI pruning is able to distinguish sub-populations of interest within large populations or, on the contrary, to highlight sets of examples with high uncertainty (in the context of the studied problem). In the latter case, the user has to come back to the data to try to improve their collection and preparation. Getting the quality of each node is a key point in uncertain domains.
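The paper does not reproduce the DI formula, so the following is only a toy index conveying the stated trade-off: node quality is high for pure nodes, and a damping factor lowers the contribution of deep nodes. Everything here (the normalized entropy, the damping parameter, the exponential form) is our illustration, not the published index of [13].

```python
import math

def purity(class_counts):
    """1 minus the normalized Shannon entropy of a node's class distribution (1 = pure node)."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    if len(probs) <= 1:
        return 1.0
    h = -sum(p * math.log2(p) for p in probs)
    return 1.0 - h / math.log2(len(class_counts))

def node_quality(class_counts, depth, damping=0.8):
    """Toy depth-damped quality: pure, shallow nodes score close to 1."""
    return purity(class_counts) * damping ** depth

# Example: a pure leaf at depth 3 versus a mixed node at depth 1.
print(node_quality([150, 0, 0], depth=3))        # perfectly pure, but damped by its depth
print(node_quality([2350, 150, 150], depth=1))   # shallower, but impure
```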
So, about the pruning stage, the user faces some questions:
- am I interested in obtaining a quality value for each node?
- is there uncertainty in the data?
and he has to know which use of the tree is pursued:
- a tree can be an efficient description oriented by an a priori classification of its elements; pruning the tree then discards overspecific information to obtain a more legible description.
- a tree can be built to highlight reliable sub-populations; here only some leaves of the pruned tree will be considered for further investigation.
- the tree can be transformed into a classifier for any new element of a large population.
The choice of a pruning strategy is tied to the answers to these questions.
3. ASSOCIATED TASKS

We indicate in this section when and how users, by means of various associated tasks, intervene in the process of developing decision trees. Schematically, these tasks concern gathering the data for the design of the training set, the encoding of the attributes, the specific analysis of examples, the analysis of the resulting tree, and so on.

Generally, these tasks are not emphasized in the literature; they are usually considered secondary, but we will see that they have a great relevance and that they act upon the final result. Of course, these tasks intersect with the building and pruning work that we have previously described.

In practice, apart from the building and pruning steps, there is another step: data preparation. We add a fourth step, which aims to study the classification of new examples on a (potentially pruned) tree. The user strongly intervenes during the first step, but also has a supervising role during all steps and, more particularly, a critical role after the second and third steps (see Figure 3). We do not detail here the fourth step, which is marginal from the point of view of the user's role.

[Figure 3. Within the decision tree software, the process chains data manipulation, the data set, building (entire tree), pruning (pruned tree) and classification (results of the classification); the user prepares the data set and checks and intervenes at every step.]
Figure 3: Process to generate decision trees and relations with the user.

3.1 Data preparation

The aim of this step is to supply, from the database gathering the examples in their raw form, a training set as well adapted as possible to the development of decision trees. This step is the one where the user intervenes most directly. His tasks are numerous: deleting examples considered as aberrant (outliers) and/or containing too many missing values; deleting attributes evaluated as irrelevant to the given task; re-encoding the attribute values (one knows that if the attributes have very different numbers of values, those having more values tend to be chosen first ([27] and [18]); we have already referred to this point with the gain ratio criterion); re-encoding several attributes (for example, the fusion of attributes); segmenting continuous attributes; analyzing missing data; and so on.

Let us come back to some of these tasks. At first [16], decision tree algorithms did not accept quantitative attributes; these had to be discretized. This initial segmentation can be done by asking experts to set thresholds or by using a strategy relying on an impurity function [11]. The segmentation can also be done while building the tree, as is the case with the C4.5 software [22]; a continuous attribute can then be segmented several times in the same tree. It seems relevant to us that the user may actively intervene in this process by indicating, for example, an a priori discretization of the attributes for which it is meaningful and by letting the system manage the others. One should remark that, if one knows reasonably well how to split a continuous attribute into two intervals, the question is more delicate for a three-valued (or more) discretization.

The user also generally has to decide on the deletion, re-encoding or fusion of attributes. He has a priori ideas allowing a first pass over this task, but we shall see that the tree construction, by making the underlying phenomenon explicit, suggests to the user new re-encodings and/or fusions of attributes, often leading to a more general description level.

The current decision tree construction algorithms most often deal with missing values by means of specific, internal treatments [7]. On the contrary, by a preliminary analysis of the database, relying on the search for associations between data and leading to uncertain rules that determine missing values, Ragel ([7], [24]) offers a strategy where the user can intervene: such a method leaves a place for the user and his knowledge, in order to delete, add or modify some rules. As we can see, this step depends a lot on the user's work.
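As an illustration of the segmentation of continuous attributes discussed in this step, a common strategy (in the spirit of an impurity-based discretization, though not the exact algorithm of [11]) evaluates every boundary between sorted values and keeps the threshold minimizing the weighted entropy of the two resulting intervals:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Best binary cut point for a continuous attribute, by weighted entropy of the two sides."""
    pairs = sorted(zip(values, labels))
    best, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best, best_score = threshold, score
    return best

# Hypothetical usage on a small continuous attribute:
print(best_threshold([22, 34, 45, 51, 63], ["a", "a", "b", "b", "b"]))  # 39.5
```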
3.2 Building step

The aim of this step is to induce a tree from the training set arising from the previous step. Some system parameters have to be specified. For example, it is useless to keep on building a tree from a node having too few examples, this amount being relative to the initial number of examples in the base. An important parameter to set is thus the minimum number of examples necessary for segmenting a node. Facing a particularly huge tree, the user will ask for the construction of a new tree with this parameter set to a higher value, which amounts to pruning the tree by pragmatic means. We have seen (paragraph 2.1) that in uncertain induction the user will most probably choose a C.M. criterion in order to be able to prune. But if he knows that the studied phenomenon admits deterministic causes in situations with few examples, he can choose the ORT criterion to get a more concise description of these situations.
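Present-day libraries expose this stopping parameter directly. As an illustration only (scikit-learn is not a tool discussed in the paper), raising min_samples_split yields a markedly smaller tree on the same data, which is exactly the pragmatic pruning described above:

```python
# Illustration with scikit-learn: the same data set yields a much smaller
# tree when the minimum node size required for a split is raised.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

small_nodes_allowed = DecisionTreeClassifier(min_samples_split=2).fit(X, y)
large_nodes_only = DecisionTreeClassifier(min_samples_split=40).fit(X, y)

print(small_nodes_allowed.tree_.node_count)  # larger tree
print(large_nodes_only.tree_.node_count)     # smaller tree
```

The same library also lets one choose between the Gini and entropy criteria discussed in Section 2.1 (criterion="gini" or criterion="entropy").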
The presentation of the attributes and of their respective criterion scores at each node may allow the user to select attributes that do not have the best score but that provide a promising way to reach a relevant leaf.

The critique of the tree thus obtained is the most important contribution of the user in this step. He checks whether the tree is understandable with regard to his domain knowledge and whether its general structure conforms to his expectations. Facing a surprising result, he wonders whether it is due to a bias in the training step or whether it reflects a phenomenon, sometimes suspected, but not yet explicitly uttered. Most often, seeing the tree gives the user new ideas about the attributes, and he will choose to build the tree again after working on the training set once more and/or changing a parameter in the induction system, in order to confirm or refute a conjecture.

3.3 Pruning step

Apart from the questions at the end of paragraph 2.2 about the type of data and the aim pursued in producing a tree, more questions arise for the user if he uses a technique such as DI pruning.

In fact, in this situation, the user has more information to react upon. First, he knows the quality index of the entire tree, which allows him to evaluate the global complexity of the problem. If this index is low, it means that the problem is delicate or inadequately described, that the training set is not representative, or even that the decision tree method is not suited to this specific problem. If the user has several trees, the quality index allows him to compare them and possibly to suggest new experiments.

Moreover, the quality index of each node highlights the populations where the class is easy to determine, as opposed to sets of examples where it is impossible to predict it. Such areas can suggest new experiments on smaller populations, or even raise the question of the existence of additional attributes (which will have to be collected) to help determine the class for the examples where it is not yet possible.

From experiments [13], we noticed that the degree of pruning is closely bound to the uncertainty embedded in the data. In practice, this means that the damping process has to be adjusted according to the data in order to obtain, in all situations, a relevant number of pruned trees. For that, we introduce a parameter to control the damping process. By varying this parameter, one follows the evolution of the quality index during the pruning (for example, the user distinguishes the parts of the tree that are due to randomness from those that are reliable). Such a study highlights the most relevant attributes and separates them from those that it may be necessary to redefine.
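Continuing the toy depth-damped index sketched earlier (still our assumption, not the published DI formula), one can see why the damping parameter has to be tuned: the very same pure leaf at depth 4 receives very different quality values depending on it.

```python
# Toy continuation of the depth-damped quality sketch: the damping parameter
# decides how strongly deep parts of the tree are discounted.
def node_quality(purity, depth, damping):
    return purity * damping ** depth

for damping in (0.5, 0.7, 0.9, 1.0):
    print(damping, round(node_quality(1.0, depth=4, damping=damping), 3))
# 0.5 -> 0.062, 0.7 -> 0.24, 0.9 -> 0.656, 1.0 -> 1.0
```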
Finally, the building and pruning steps can be viewed as part of the study of the attributes. Domain experts usually appreciate being able to restructure the set of initial attributes and to see at once the effect of such a modification on the tree (in general, after a preliminary decision tree, they define new attributes which summarize some of the initial ones). We have noticed [5] that when such attributes are used, the shape of the curve of the quality index as a function of the number of pruned sub-trees changes and tends to show three parts: in the first one, the variation of the quality index is small; in the second part, the quality decreases regularly; and in the third part, the quality rapidly becomes very low. This shows that the information embedded in the data set lies mainly in the top of the tree, while the bottom can be pruned.

3.4 Conclusion

Throughout this section, we have seen that the user's interventions are numerous and that the realization of the associated tasks is closely linked to him. These tasks are fundamental, since they directly affect the results: the study of the results brings new experiments. The user often starts the work done during a step again by changing the parameters, or comes back to previous steps (the arrows in Figure 3 show all the relations between the different steps). At each step, the user may accept, override, or modify the generated rules, but more often he suggests alternative features and experiments. Finally, the rule set is redefined through subsequent data collection, rule induction, and expert consideration.

We think it is necessary for the user to take part in the system so that a real development cycle takes place. The latter seems fundamental to us in order to obtain useful and satisfying trees. The user does not usually know beforehand which tree is relevant to his problem, and it is because he finds it gratifying to take part in this search that he takes an interest in the induction work.

Let us note that most authors try to define software architectures explicitly integrating the user. In the area of induction graphs (a generalization of decision trees), the SIPINA software allows the user to fix the choice of an attribute, to gather some values of an attribute temporarily, to stop the construction from some nodes, and so on. Dabija et al. [8] offer a learning system architecture (called KAISER, for Knowledge Acquisition Inductive System driven by Explanatory Reasoning) for an interactive knowledge acquisition system based on decision trees and driven by explanatory reasoning. Moreover, the experts can incrementally add knowledge corresponding to the domain theory. KAISER confronts the built trees with the domain theory, so that some incoherences may be detected (for instance, the value of the attribute "eye" for a cat has to be "oval"). Kervahut and Potvin [17] have designed an assistant to collaborate with the user. This assistant, which takes the form of a graphic interface, helps the user test the methods and their parameters in order to find the most relevant combination for the problem at hand.
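As a tiny illustration of one such intervention mentioned above, temporarily grouping some values of an attribute before rebuilding a tree can be expressed as a simple re-encoding of the training examples (our own sketch, not SIPINA's actual interface; the attribute names reuse those of the appendix for convenience):

```python
# Sketch of a user-driven re-encoding: replace some values of an attribute by a common group.
def group_values(examples, attribute, grouping):
    """Return a copy of the examples where values of `attribute` are replaced by their group."""
    return [{**e, attribute: grouping.get(e[attribute], e[attribute])} for e in examples]

examples = [{"Y1": "y11", "D": "d1"}, {"Y1": "y12", "D": "d3"}]
print(group_values(examples, "Y1", {"y11": "y11_or_y12", "y12": "y11_or_y12"}))
```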
4. CONCLUSION

Producing decision trees is often presented as "automatic", with a marginal participation from the user: we have stressed the fact that the user has a fundamental critical and supervisory role and that he intervenes in a major way. This leads to a real development cycle between the user and the system. This cycle is only possible because the construction of a tree is nearly instantaneous.

The participation of the user in the data preparation, the choice of the parameters and the critique of the results is in fact at the heart of the more general process of Knowledge Discovery in Databases. As usual in KDD, we claim that the understanding and the declarativity of the mechanisms of the methods is a key point to achieve, in practice, a fruitful process of information extraction. Finally, we think that, in order to really reach a data exploration reasoning that associates the user in a profitable way, it is important to give him a framework gathering all the tasks intervening in the process, so that he may freely explore the data, react, and innovate with new experiments.

References

[1] Breiman L., Friedman J. H., Olshen R. A., & Stone C. J. Classification and regression trees. Wadsworth Statistics/Probability Series, Belmont, 1984.
[2] Breiman L. Some properties of splitting criteria (technical note). Machine Learning 21, 41-47, 1996.
[3] Buntine W. Learning classification trees. Statistics and Computing 2, 63-73, 1992.
[4] Catlett J. Overpruning large decision trees. In proceedings of the Twelfth International Joint Conference on Artificial Intelligence IJCAI 91, pp. 764-769, Sydney, Australia, 1991.
[5] Crémilleux B., & Robert C. A Pruning Method for Decision Trees in Uncertain Domains: Applications in Medicine. In proceedings of the workshop Intelligent Data Analysis in Medicine and Pharmacology, ECAI 96, pp. 15-20, Budapest, Hungary, 1996.
[6] Crémilleux B., Robert C., & Gaio M. Uncertain domains and decision trees: ORT versus C.M. criteria. In proceedings of the 7th Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, pp. 540-546, Paris, France, 1998.
[7] Crémilleux B., Ragel A., & Bosson J. L. An Interactive and Understandable Method to Treat Missing Values: Application to a Medical Data Set. In proceedings of the 5th International Conference on Information Systems Analysis and Synthesis (ISAS/SCI 99), pp. 137-144, M. Torres, B. Sanchez & E. Wills (Eds.), Orlando, FL, 1999.
[8] Dabija V. G., Tsujino K., & Nishida S. Theory formation in the decision trees domain. Journal of the Japanese Society for Artificial Intelligence, 7(3), 136-147, 1992.
[9] Esposito F., Malerba D., & Semeraro G. Decision tree pruning as search in the state space. In proceedings of the European Conference on Machine Learning ECML 93, pp. 165-184, P. B. Brazdil (Ed.), Lecture Notes in Artificial Intelligence, No. 667, Springer-Verlag, Vienna, Austria, 1993.
[10] Fayyad U. M., & Irani K. B. The attribute selection problem in decision tree generation. In proceedings of the Tenth National Conference on Artificial Intelligence, pp. 104-110, Cambridge, MA: AAAI Press/MIT Press, 1992.
[11] Fayyad U. M., & Irani K. B. Multi-interval discretization of continuous-valued attributes for classification learning. In proceedings of the Thirteenth International Joint Conference on Artificial Intelligence IJCAI 93, pp. 1022-1027, Chambéry, France, 1993.
[12] Fayyad U. M. Branching on attribute values in decision tree generation. In proceedings of the Twelfth National Conference on Artificial Intelligence, pp. 601-606, AAAI Press/MIT Press, 1994.
[13] Fournier D., & Crémilleux B. Using impurity and depth for decision trees pruning. In proceedings of the 2nd International ICSC Symposium on Engineering of Intelligent Systems (EIS 2000), Paisley, UK, 2000.
[14] Gelfand S. B., Ravishankar C. S., & Delp E. J. An iterative growing and pruning algorithm for classification tree design. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(2), 163-174, 1991.
[15] Goodman R. M. F., & Smyth P. Information-theoretic rule induction. In proceedings of the Eighth European Conference on Artificial Intelligence ECAI 88, pp. 357-362, München, Germany, 1988.
[16] Hunt E. B., Marin J., & Stone P. J. Experiments in induction. New York: Academic Press, 1966.
[17] Kervahut T., & Potvin J. Y. An interactive-graphic environment for automatic generation of decision trees. Decision Support Systems 18, 117-134, 1996.
[18] Kononenko I. On biases in estimating multi-valued attributes. In proceedings of the Fourteenth International Joint Conference on Artificial Intelligence IJCAI 95, pp. 1034-1040, Montréal, Canada, 1995.
[19] Mingers J. An empirical comparison of pruning methods for decision-tree induction. Machine Learning 4, 227-243, 1989.
[20] Quinlan J. R. Induction of decision trees. Machine Learning 1, 81-106, 1986.
[21] Quinlan J. R., & Rivest R. L. Inferring decision trees using the minimum description length principle. Information and Computation 80(3), 227-248, 1989.
[22] Quinlan J. R. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
[23] Quinlan J. R. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4, 77-90, 1996.
[24] Ragel A., & Crémilleux B. Treatment of Missing Values for Association Rules. In proceedings of the Second Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 98, pp. 258-270, X. Wu, R. Kotagiri & K. B. Korb (Eds.), Lecture Notes in Artificial Intelligence, No. 1394, Springer-Verlag, Melbourne, Australia, 1998.
[25] Safavian S. R., & Landgrebe D. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 21(3), 660-674, 1991.
[26] Wallace C. S., & Patrick J. D. Coding decision trees. Machine Learning 11, 7-22, 1993.
[27] White A. P., & Liu W. Z. Bias in information-based measures in decision tree induction. Machine Learning 15, 321-329, 1994.

APPENDIX

Data file used to build the trees of Figure 1 (D denotes the class; Y1 and Y2 are the attributes). Consecutive examples with identical values are shown as ranges.

Examples      D    Y1    Y2
1-2350        d1   y11   y22
2351-2500     d1   y12   y22
2501-2650     d2   y11   y21
2651-2700     d2   y12   y21
2701-2850     d3   y11   y22
2851-5200     d3   y12   y22

B. Crémilleux is Maître de Conférences at the Université de Caen, France.