The document describes the Ant-Miner algorithm for classification using ant colony optimization. It begins with an introduction to ant colony optimization and how it was inspired by the foraging behavior of real ants. It then provides details on the key steps of the Ant-Miner algorithm, including rule construction, pruning, and pheromone updating. The algorithm uses a heuristic function based on information theory and positive feedback to iteratively construct and refine classification rules from training data.
1. Classification with Ant Colony Optimization
2. Outline
Introduction
Problem Statement
Behavior of Real Ants
Ant Colony Optimization
(ACO)
Applications of ACO
Importance of ACO in
classification
Ant-Miner
Example
Results
Conclusions
References
3. Introduction
The goal of data mining -
Extract (comprehensible) knowledge from data
Comprehensibility is important when knowledge will be used
for supporting a decision making process.
The classification task in data mining and decision making consists of
assigning an object/case to a class (from a predefined set of
classes) based on the object/case's attributes.
Discovering classification rules is an important data mining task,
which generates a set of rules that describe each class or category in
a natural way.
4. Introduction
Ant Colony Optimization (ACO) was first introduced by Marco
Dorigo in 1992.
Parpinelli, Lopes and Freitas [Parpinelli et al. (2001)] proposed
the algorithm Ant-Miner (Ant Colony-based Data Miner).
Based on the behavior of real ant colonies and on data
mining concepts.
Discover classification rules in data sets.
5. Problem Statement
Discovering rules for classification using ACO
Given- training set
Goal- (simple) rules to classify data
Output- ordered decision list
6. Behaviour of real ants
Ants go towards the food while laying down pheromone trails
Shortest path is discovered via pheromone trails
Each ant moves at random
Pheromone is deposited on path
The shorter the path, the more pheromone on its trail (positive feedback system)
Ants follow the intense pheromone trails
8. Ants are almost blind.
Incapable of achieving
complex tasks alone.
Rely on the phenomenon of
swarm intelligence for
survival.
Behaviour of real ants
Capable of establishing shortest-route paths from their colony
to feeding sources and back.
Use stigmergic communication via pheromone trails.
9. Ants follow existing pheromone trails with high probability.
What emerges is a form of autocatalytic behavior: the more
ants follow a trail, the more attractive that trail becomes for
being followed.
The process is thus characterized by a positive feedback loop,
where the probability of a discrete path choice increases with
the number of times the same path was chosen before.
ACO algorithms are called autocatalytic positive
feedback algorithms
Behaviour of real ants
10. Ant colony optimization technique is based on the technique
known as Swarm Intelligence [Bonabeau et al. (1999)],
which is a part of Artificial Intelligence.
Swarm intelligence is an approach to problem solving that
takes stimulus from the social behaviours of insects and of
other animals.
Swarm Intelligence
11. Swarm Intelligence
A swarm is a large number of homogenous,
simple agents interacting locally among
themselves, and their environment.
Achieving a collective performance which
could not normally be achieved by an
individual acting alone.
Constitutes a natural model particularly suited
to distributed problem solving.
Swarm-based algorithms have recently
emerged as a family of nature-inspired,
population-based algorithms that are capable
of producing low cost, fast, and robust
solutions to several complex problems.
12. Stigmergy
Form of indirect communication in a group of similar
organisms is known as stigmergy.
Two individuals interact indirectly when one of them modifies
the environment and the other responds to the new
environment at a later time. This is stigmergy.
Real ants use stigmergy using – PHEROMONES.
Pheromone is a secreted or excreted chemical factor that
triggers a social response in members of the same species.
Stigmergy provides the ant colony shortest-path finding
capabilities
13. How stigmergy works in ACO?
Ants secrete pheromone while traveling from the nest to food,
and back, communicating with one another in order to
find the shortest path.
Ants are forced to decide whether they should go left or right,
and the choice that is made is a random decision.
Pheromone accumulation is faster on the shorter path. The
difference in pheromone content between the two paths over
time makes the ants choose the shorter path.
The more ants follow a trail, the more attractive that trail
becomes for being followed.
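The two-path behaviour described above can be sketched with a tiny deterministic model (all numbers are illustrative, not from the slides): at each step, traffic splits in proportion to pheromone, and each path receives new pheromone inversely proportional to its length.

```python
def simulate(n_steps=200, short_len=1.0, long_len=2.0, evap=0.05):
    """Toy stigmergy model: traffic splits in proportion to pheromone (tau),
    and each trail gains pheromone inversely proportional to its length."""
    tau = {"short": 1.0, "long": 1.0}              # equal pheromone at the start
    for _ in range(n_steps):
        p_short = tau["short"] / (tau["short"] + tau["long"])
        # evaporation on both trails, then deposit by expected traffic share
        tau["short"] = (1 - evap) * tau["short"] + p_short * (1.0 / short_len)
        tau["long"] = (1 - evap) * tau["long"] + (1 - p_short) * (1.0 / long_len)
    return tau

trail = simulate()
# positive feedback: the shorter trail ends up holding far more pheromone
print(trail["short"] > trail["long"])   # True
```

Because the shorter path earns more pheromone per trip, its traffic share grows step after step - exactly the autocatalytic loop the slides describe.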
14. Ant Colony Optimization (ACO)
“Ant Colony Optimization (ACO) studies artificial
systems that take inspiration from the behavior of
real ant colonies and which are used to solve
discrete optimization problems.”
-Source: ACO website, http://iridia.ulb.ac.be/~mdorigo/ACO/about.html
15. Design of ACO Algorithm
The design of the algorithm can be summarized as specification
of the following aspects [Parpinelli et al. (2001)]:
An environment that represents its problem domain in such a way
that it helps in incrementally building a solution to the problem.
A problem dependent heuristic evaluation function (η) that
provides quality measurement for the different solution
components.
A pheromone updating rule, which considers pheromone
evaporation and reinforcement of pheromone trails.
A probabilistic transition rule based on heuristic function (η) and
strength of the pheromone trail (τ) that determines path taken by
ants.
A clear specification of when the algorithm has converged to a
solution.
17. Algorithm Description
After initializing the parameters and the pheromone trails, the
algorithm manages a colony of ants that concurrently and
asynchronously visit adjacent states of the problem in order to
construct solutions.
The ants search for solutions by making use of the pheromone trails
and the heuristic information; in this way, the ants incrementally
build solutions.
As a solution is built, or once it is complete, the ants evaluate the
(partial) solution; this evaluation is later used when updating the
pheromone trails, to decide how much pheromone to deposit.
18. The pheromone trails are then updated. The amount on each trail
may increase (reinforcement) or decrease, due to pheromone
evaporation.
The less a trail evaporates, i.e. the more it is reinforced, the higher
the probability that its connection is used by following ants, so that
good solutions are produced and reused.
Some problem-specific operations, often called daemon actions, can
be used to implement problem-specific and/or centralized
actions which cannot be performed by single ants. The most common
daemon action consists in applying local search to the
constructed solutions: the locally optimized solutions are then
used to decide which pheromone values to update.
Algorithm Description
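The construct/evaluate/update loop described above can be wired into a minimal generic ACO skeleton. Everything here is an illustrative sketch - the toy problem, the solution length of 3, and all names are made up, not part of Ant-Miner:

```python
import random

def aco(n_iterations, n_ants, components, heuristic, quality, evap=0.1, seed=0):
    """Generic ACO loop: ants build solutions guided by pheromone (tau) and
    heuristic (eta); evaporation is followed by reinforcement of the best."""
    rng = random.Random(seed)
    tau = {c: 1.0 for c in components}              # uniform initial pheromone
    best, best_q = None, float("-inf")
    for _ in range(n_iterations):
        for _ in range(n_ants):
            # probabilistic construction: sample components by tau * eta weight
            weights = [tau[c] * heuristic(c) for c in components]
            sol = tuple(rng.choices(components, weights=weights, k=3))
            q = quality(sol)
            if q > best_q:
                best, best_q = sol, q
        for c in components:                        # pheromone evaporation
            tau[c] *= 1.0 - evap
        for c in best:                              # reinforce the best solution
            tau[c] += best_q
    return best, best_q

# toy problem: components have values, solution quality = sum of its values
value = {"a": 1.0, "b": 3.0, "c": 0.5}
sol, q = aco(20, 5, list(value), lambda c: value[c], lambda s: sum(value[x] for x in s))
```

The `tau[c] * heuristic(c)` weighting and the evaporate-then-reinforce update are the two ingredients every ACO variant below specializes.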
19. Several special cases of the ACO algorithm have been proposed in
literatures. Here we briefly overview, in the historical order in
which they were introduced, the three most successful ones:
Ant system [Dorigo and Colorni (1996)]
Ant colony system (ACS) [Dorigo and Gambardella (1997)]
MAX-MIN ant system (MMAS) [Stutzle and Hoos (2000)]
Some Variants of ACO Algorithm
20. Ant System
Ant system (AS) was the first ACO algorithm proposed in the
literature [Dorigo et al. (1996)].
The main characteristic of this algorithm is that the
pheromone values are updated at each iteration by all the
ants involved.
Many algorithms have been developed with this as their basic
structure.
21. Ant Colony System introduces a local pheromone update,
performed by each ant after each construction step, in
addition to an offline update performed at the end of the
iteration.
The ants thus perform different actions during one iteration
depending on the current pheromone values [Dorigo and
Gambardella (1997)].
In the offline update, only one ant deposits pheromone at the
end of the iteration - either the iteration-best or the
best-so-far ant.
Ant colony system (ACS)
22. The performance of traditional ACO algorithms is seen to be
rather poor on large problem instances [Stutzle and Hoos
(1996)].
Stutzle and Hoos [Stutzle and Hoos (2000)] advocate that
improved performance can be obtained by a stronger
exploitation of the best solutions, combined with an effective
mechanism for avoiding early search stagnation (the situation
where all ants take the same path and thus generate the same
solution).
The authors propose a MAX-MIN ant system that differs from
the traditionally proposed Ant System in three aspects.
MAX-MIN Ant System (MMAS)
23. After each iteration only the best ant is allowed to add
pheromone to its trail. This allows for a better exploitation of
the best solution found.
The range of possible pheromone trails is limited to an interval
[τmin , τmax] so as to avoid early stagnation of the search.
The initial pheromone value of each trail is set at τmax .This
determines a higher exploration at the beginning of the
algorithm.
MAX-MIN Ant System (MMAS)
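The three MMAS modifications above can be sketched as a single update step (the trail names and all numeric values are illustrative):

```python
def mmas_update(tau, best_solution, best_quality, evap, tau_min, tau_max):
    """MAX-MIN update: evaporate every trail, reinforce only the best ant's
    trail, then clamp all values into [tau_min, tau_max]."""
    for c in tau:
        tau[c] *= 1.0 - evap
    for c in best_solution:
        tau[c] += best_quality
    for c in tau:
        tau[c] = min(tau_max, max(tau_min, tau[c]))
    return tau

tau = {"e1": 5.0, "e2": 5.0, "e3": 5.0}     # all trails start at tau_max
tau = mmas_update(tau, ["e1"], best_quality=10.0, evap=0.2, tau_min=0.1, tau_max=5.0)
# e1: 5*0.8 + 10 = 14, clamped back to tau_max = 5.0; e2 and e3 evaporate to 4.0
```

Starting every trail at `tau_max` and clamping from below with `tau_min` is what keeps exploration alive and prevents the stagnation described above.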
24. Traveling Salesman Problem [Dorigo and Gambardella (1997)]
Job shop scheduling problem [Ventresca and Ombuki (2004)]
Exam timetabling problem [Eley (2007)]
Routing in communication networks [Zhao et al. (2010)]
Image edge detection [Baterina and Oppus (2010)]
Data mining domain
Clustering [Jafar and Sivakumar (2010)]
Web usage mining [Reena and Arora (2014)]
Classification
And so on…
Applications of ACO
25. ACO algorithm for the classification task
Assign each case to one class, out of a set of predefined
classes.
Discovered knowledge is expressed in the form of IF-
THEN rules:
IF <conditions> THEN <class>
The rule antecedent (IF) contains a set of conditions,
connected by AND operator (term1 AND term2 AND…).
The rule consequent (THEN) specifies the class predicted
for cases whose predictor attributes satisfy all the terms
specified in IF part.
Application of ACO in Classification
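A direct way to encode the IF-THEN rules above is a dict of conditions plus a predicted class; the attribute names and the case are made up for illustration:

```python
# IF <conditions> THEN <class>, as a dict of terms plus a consequent
rule = {"antecedent": {"outlook": "overcast", "windy": "false"},
        "consequent": "play"}

def covers(rule, case):
    """A rule covers a case when every term in its IF part matches."""
    return all(case.get(attr) == val for attr, val in rule["antecedent"].items())

case = {"outlook": "overcast", "windy": "false", "humidity": "high"}
print(covers(rule, case))   # True: both terms of the antecedent match
```

The AND between terms becomes the `all(...)` over the antecedent; attributes not mentioned in the rule (here `humidity`) are simply ignored.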
26. Why ACO algorithms are important for Data Mining?
ACO algorithms use simple agents (artificial ants) that
cooperate with each other.
System finds a high-quality solution for problems with a large
search space.
Rule discovery:
Performs a flexible search over all possible logic
combinations of the predicting attributes.
Importance of ACO
27. Algorithm consists of several steps -
Rule construction
Rule pruning
Pheromone updating
Ant-Miner follows a sequential covering approach to
discover a list of classification rules covering all, or almost
all the training cases.
Ant-Miner: An ACO Algorithm for Classification
28. Ant starts with empty rule.
Ant adds one term at a time to rule.
Choice of the terms depends on two factors -
Heuristic function (problem dependent) (η)
Pheromone value associated with each term (τ)
Rule Construction
29. Let η_ij be the heuristic value and τ_ij(t) the pheromone amount
for term_ij (the condition A_i = V_ij).
The probability that term_ij is chosen to be added to the current
partial rule is given by the equation:

P_ij(t) = η_ij · τ_ij(t) / ( Σ_{i=1..a} x_i · Σ_{j=1..b_i} η_ij · τ_ij(t) )

a is the total number of attributes.
x_i is set to 1 if the attribute A_i was not yet used by the current
ant, or to 0 otherwise.
b_i is the number of values in the domain of the i-th attribute.
Choice of Terms
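The transition rule can be sketched as follows, with hypothetical η and τ values; the nested-dict layout is just one convenient encoding:

```python
def term_probabilities(eta, tau, used_attrs):
    """P_ij = eta_ij * tau_ij, divided by the sum of eta_ij * tau_ij over all
    values j of all attributes i not yet used by the current ant."""
    denom = sum(eta[i][j] * tau[i][j]
                for i in eta if i not in used_attrs for j in eta[i])
    return {(i, j): eta[i][j] * tau[i][j] / denom
            for i in eta if i not in used_attrs for j in eta[i]}

# hypothetical heuristic and pheromone values for two attributes
eta = {"outlook": {"sunny": 0.4, "overcast": 0.6},
       "windy": {"true": 0.5, "false": 0.5}}
tau = {"outlook": {"sunny": 1.0, "overcast": 1.0},
       "windy": {"true": 1.0, "false": 1.0}}
p = term_probabilities(eta, tau, used_attrs=set())
# with uniform pheromone the probabilities follow the heuristic alone
```

Excluding `used_attrs` from the denominator plays the role of the x_i indicator: an attribute already in the partial rule contributes nothing.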
30. Based on information theory.
In information theory, entropy is a measure of the
uncertainty associated with a random variable - the
"amount of information".
The entropy H(W | A_i = V_ij) for each term_ij is calculated from
the class distribution of the cases covered by the term (next slide).
The final normalized heuristic function is defined as:

η_ij = (log2 k − H(W | A_i = V_ij)) / Σ_{i=1..a} x_i · Σ_{j=1..b_i} (log2 k − H(W | A_i = V_ij))

Heuristic Function
31. The entropy for each term_ij is given by:

H(W | A_i = V_ij) = − Σ_{w=1..k} P(w | A_i = V_ij) · log2 P(w | A_i = V_ij)

Where,
W is the class attribute (i.e. the attribute whose domain consists
of the classes to be predicted).
k is the number of classes.
P(w | A_i = V_ij) is the empirical probability of observing class w
conditional on having observed A_i = V_ij.
Heuristic Function
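The unnormalized part of the heuristic, log2(k) − H(W | A_i = V_ij), can be computed directly from a training set; the list-of-dicts case format here is illustrative:

```python
from math import log2
from collections import Counter

def heuristic_eta(cases, attr, val, class_attr="class"):
    """Unnormalized heuristic from the slides: log2(k) - H(W | A_i = V_ij)."""
    covered = [c for c in cases if c[attr] == val]     # cases matching the term
    k = len({c[class_attr] for c in cases})            # number of classes
    counts = Counter(c[class_attr] for c in covered)
    n = len(covered)
    entropy = -sum((m / n) * log2(m / n) for m in counts.values())
    return log2(k) - entropy

cases = [{"windy": "false", "class": "play"},
         {"windy": "false", "class": "play"},
         {"windy": "true", "class": "play"},
         {"windy": "true", "class": "no-play"}]
print(heuristic_eta(cases, "windy", "false"))  # 1.0: pure split, zero entropy
print(heuristic_eta(cases, "windy", "true"))   # 0.0: 50/50 split, maximum entropy
```

A term whose covered cases all share one class gets the maximum heuristic value log2(k); a term that splits the classes evenly gets zero.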
32. An ant keeps adding terms one at a time to its current partial rule
until the ant is unable to continue constructing its rule.
An iteration of the system stops when:
The number of constructed rules ≥ No_of_ants, or
Convergence is met: the last k ants found exactly the same rule,
with k = No_rules_converg.
The list of discovered rules is then updated, and the
pheromones of all trails are reset.
Stopping Criteria
33. Remove irrelevant, unduly included terms from the rule, caused by:
An imperfect heuristic function
Ignoring attribute interactions
Rule pruning:
Iteratively remove one term at a time.
Test the new rule against a rule-quality function.
Repeat the process until further removal yields no more
quality improvement of the rule.
Rule Pruning
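The pruning loop can be sketched as follows; the quality function here is made up for illustration and stands in for the rule-quality measure defined on the next slide:

```python
def prune(rule_terms, quality):
    """Iteratively drop the single term whose removal improves rule quality,
    until no removal helps. `quality` scores a tuple of terms."""
    terms = list(rule_terms)
    best_q = quality(tuple(terms))
    improved = True
    while improved and len(terms) > 1:
        improved = False
        for t in list(terms):
            candidate = tuple(x for x in terms if x != t)
            q = quality(candidate)
            if q > best_q:                 # keep the removal only if it helps
                terms, best_q, improved = list(candidate), q, True
                break
    return tuple(terms), best_q

# made-up quality: the term "noise" hurts, every other term helps
toy_quality = lambda ts: len(ts) - (2 if "noise" in ts else 0)
print(prune(("a", "b", "noise"), toy_quality))   # (('a', 'b'), 2)
```

Each pass re-tests every one-term removal, mirroring the "remove one term at a time, test, repeat" procedure above.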
34. Rule-quality function:

Q = (TP / (TP + FN)) · (TN / (FP + TN))

TP (true positives) is the number of cases covered by the rule that have
the class predicted by the rule.
FP (false positives) is the number of cases covered by the rule that have
a class different from the one predicted by the rule.
FN (false negatives) is the number of cases not covered by the rule that
have the class predicted by the rule.
TN (true negatives) is the number of cases not covered by the rule that
have a class different from the one predicted by the rule.
Q's value lies within the range [0, 1]; the larger the value of Q, the
higher the quality of the rule.
Rule Pruning
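The quality measure is a one-liner; the confusion-matrix counts below are taken from the worked example later in the deck:

```python
def rule_quality(tp, fp, fn, tn):
    """Q = sensitivity * specificity = (TP/(TP+FN)) * (TN/(FP+TN))."""
    if tp + fn == 0 or fp + tn == 0:       # guard against empty denominators
        return 0.0
    return (tp / (tp + fn)) * (tn / (fp + tn))

# counts from the 'outlook=overcast' and 'windy=false' rules in the example
print(round(rule_quality(tp=4, fp=0, fn=5, tn=5), 3))   # 0.444
print(round(rule_quality(tp=6, fp=2, fn=3, tn=3), 3))   # 0.4
```

Multiplying sensitivity by specificity rewards rules that cover many cases of their class while covering few cases of other classes.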
35. The initial amount of pheromone deposited at each path is
inversely proportional to the total number of attribute values,
and is defined by:

τ_ij(t = 0) = 1 / Σ_{i=1..a} b_i

a is the total number of attributes.
b_i is the number of values in the domain of attribute i.
Pheromone Initialization
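The initialization is a direct translation of the formula above; the attribute/value lists are illustrative:

```python
def init_pheromone(attribute_values):
    """tau_ij(t=0) = 1 / sum_i b_i: every term starts with the same amount."""
    total = sum(len(vals) for vals in attribute_values.values())
    return {(attr, val): 1.0 / total
            for attr, vals in attribute_values.items() for val in vals}

tau = init_pheromone({"outlook": ["sunny", "overcast", "rain"],
                      "windy": ["true", "false"]})
print(tau[("windy", "true")])   # 0.2 (five attribute values in total)
```

Since all trails start equal, the very first ant is guided purely by the heuristic function.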
36. Initialize pheromone value
Increase pheromone in trail followed by current ant
According to quality of found rule
Decrease pheromone in other trails not used by ant
Simulate pheromone evaporation
New ant starts with rule construction
Uses new pheromone data!
Process is repeated for a predefined number of ants. This number
is specified as a parameter in the system, called No_of_ants.
Pheromone Updating
37. Increase the probability that term_ij will be chosen by other ants
in the future, in proportion to the rule quality Q (0 ≤ Q ≤ 1).
The pheromone updating rule is given by:

τ_ij(t + 1) = τ_ij(t) + τ_ij(t) · Q, for all term_ij in the constructed rule

Pheromone evaporation:
The amount of pheromone associated with each term_ij that
does not occur in the constructed rule must be reduced.
This is done by dividing the value of each current τ_ij by the
summation of all τ_ij.
Pheromone Updating
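Both steps - reinforcement of the rule's terms and normalization of all trails - can be sketched together (the term keys and numbers are illustrative):

```python
def update_pheromone(tau, rule_terms, q):
    """Reinforce the constructed rule's terms by tau*Q, then divide every
    trail by the sum of all trails, which simulates evaporation."""
    for term in rule_terms:
        tau[term] += tau[term] * q     # tau_ij(t+1) = tau_ij(t) + tau_ij(t)*Q
    total = sum(tau.values())
    for term in tau:
        tau[term] /= total             # unused terms shrink in relative terms
    return tau

tau = {("windy", "false"): 0.5, ("windy", "true"): 0.5}
tau = update_pheromone(tau, [("windy", "false")], q=0.4)
# used term: 0.7/1.2 ≈ 0.583; unused term: 0.5/1.2 ≈ 0.417
```

After normalization the pheromone values again sum to 1, so reinforcing one term automatically "evaporates" all the others.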
38. The best rule among the rules constructed by all ants is considered a
discovered rule. The other rules are discarded. This completes one
iteration of the system.
When the number of cases left in the training set is less than
Max_uncovered_cases the search for rules stops.
The discovered rules are stored in an ordered rule list (in order of
discovery), which will be used to classify new cases, unseen during
training.
The system also adds a default rule to the last position of the rule list.
The default rule has an empty antecedent (i.e. no condition) and has a
consequent predicting the majority class in the set of training cases that
are not covered by any rule. This default rule is automatically applied if
none of the previous rules in the list cover a new case to be classified.
Rule Discovery
39. Once the rule list is complete, the system is finally ready to
classify a new test case unseen during training.
The system tries to apply the discovered rules, in order.
The first rule that covers the new case is applied – i.e. the
case is assigned the class predicted by that rule’s consequent.
Classification using Discovered Rules
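Classifying with the ordered rule list plus default rule is a short loop; the rules and cases below are made up for illustration:

```python
def classify(case, rule_list, default_class):
    """Apply the discovered rules in order; the first rule whose antecedent
    covers the case fires, otherwise the default rule predicts the majority."""
    for antecedent, consequent in rule_list:
        if all(case.get(a) == v for a, v in antecedent.items()):
            return consequent
    return default_class

rules = [({"outlook": "overcast"}, "play"),
         ({"windy": "true"}, "no-play")]
print(classify({"outlook": "overcast", "windy": "true"}, rules, "play"))  # first match wins
print(classify({"outlook": "rain", "windy": "false"}, rules, "play"))     # default rule fires
```

Because the list is ordered, a case covered by several rules always receives the class of the earliest-discovered one.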
49. For the full rule:
TP=1, FN=8, TN=5, FP=0
Q=0.111
Without ‘outlook=overcast’
Q=0.111 (No improvement)
Without ‘temp=81 and humid=75’
TP=2, FN=7, TN=5, FP=0
Q=0.222 – better!
For – ‘Windy= False’
TP=6, FN=3,TN=3, FP=2
Q=0.4 – even better!
For the rule – ‘outlook=overcast’
TP=4, FN=5, TN=5, FP=0
Q=0.444 – BEST !
Ant-Miner Example
52. Well known data sets used for experiment [Lichman (2013)]:
Results
Data set          #Cases  #Categorical attributes  #Continuous attributes  #Classes
Ljubljana cancer    282             9                       -                  2
Wisconsin cancer    683             -                       9                  2
Dermatology         358            33                       1                  6
Hepatitis           155            13                       6                  2
53. Data set        No. of rules  No. of terms / No. of rules  Predictive accuracy of Ant-Miner (%)
Ljubljana cancer    7.10 ± 0.31          1.28                  75.28 ± 2.24
Wisconsin cancer    6.20 ± 0.25          1.97                  96.04 ± 0.93
Hepatitis           3.40 ± 0.16          2.41                  94.29 ± 1.20
Dermatology         7.30 ± 0.15          3.16                  90.00 ± 3.11
Results
Table 1 summarizes the results obtained by the proposed AntMiner algorithm
in the four datasets. The table shows the accuracy rate, the number of rules
found and the number of terms (the shown values are the average values of the
cross-validation procedure followed by the corresponding standard deviation).
54. Ant-Miner is better, because:
Uses feedback (pheromone
mechanism).
Stochastic search, instead of
deterministic.
Uses probability.
End effect:
Good predictive accuracy.
Reduced number of simple and
short rules.
Drawback: Computational cost,
especially when the search space
(number of predicting attributes) is
too large.
Conclusions
55. Conclusions
Two important directions for future work are as follows.
It would be interesting to investigate a variant of Ant-Miner that
can cope with continuous attributes, rather than requiring that
this kind of attribute be discretized in a preprocessing step.
To investigate the performance of other kinds of heuristic
function and pheromone updating strategy so that the
computation time is reduced.
56. References
Baterina, A. V. and Oppus, C. (2010). Image edge detection using ant colony
optimization. WSEAS Transactions on Signal Processing, 6, 58-67.
Bonabeau, E., Dorigo, M. and Theraulaz, G. (1999). Swarm Intelligence: From
Natural to Artificial Systems, Oxford University Press, New York.
Chandra, S. and Bhattacharyya, S. (2015). Quantum Inspired Swarm
Optimization for Multi-Level Image Segmentation Using BDSONN
Architecture. Handbook of Research on Swarm Intelligence in Engineering,
286-326.
Colorni, A., Dorigo, M. and Maniezzo, V. (1991). Distributed Optimization by Ant
Colonies. Actes De La Première Conférence Européenne Sur La Vie
Artificielle, Elsevier, Paris, France, 134-142.
57. References
Dorigo, M. (1992). Optimization, Learning and Natural Algorithms. PhD thesis,
Politecnico di Milano, Italy.
Dorigo, M. and Colorni, A. (1996). Ant system: Optimization by a colony of
cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, 26,
1-13.
Dorigo, M. and Gambardella, L. M. (1997). Ant colony system: A cooperative
learning approach to the traveling salesman problem, IEEE Transactions on
Evolutionary Computation, 1, 53-66.
Dorigo, M. and Stutzle, T. (2004). Ant Colony Optimization, MIT Press,
Cambridge, MA.
58. References
Dorigo, M., Maniezzo, V. and Colorni, A. (1991). The ant system: An autocatalytic
optimizing process. Technical Report, Politecnico di Milano, Italy.
Eley, M. (2007). Ant Algorithms for the Exam Timetabling Problem. Practice and
Theory of Automated Timetabling VI, Springer, Berlin, Heidelberg, 364-382.
Jafar, O. M. and Sivakumar, R. (2010). Ant-based clustering algorithms: A brief
survey. International Journal of Computer Theory and Engineering, 2, 787-
796.
Kohavi, R. and Sahami, M. (1996). Error-Based and Entropy-Based Discretization
of Continuous Features. In Proceedings of the 2nd International Conference
Knowledge Discovery and Data Mining, 114-119.
59. References
Lichman, M. (2013). UCI Machine Learning Repository,
http://archive.ics.uci.edu/ml, University of California, School of Information
and Computer Science, Irvine, California, USA.
Parpinelli, R. S., Lopes, H. S. and Freitas, A. A. (2002). Data Mining with an Ant
Colony Optimization Algorithm. IEEE Transaction on Evolutionary
Computation, special issue on Ant colony Algorithm, 6, 321-332.
Parpinelli, R. S., Lopes, H. S. and Freitas, A. A. (2001). An ant colony based system
for data mining: Applications to medical data. In Proceedings of Genetic and
Evolutionary Computation Conference, 791–797.
Quinlan, J. R. (2014). C4.5: Programs for Machine Learning, Elsevier, San
Francisco, USA.
60. References
Reena and Arora, J. (2014). Web Usage Mining Based on Ant Colony
Optimization. International Journal of Advanced Research in Computer
Science and Software Engineering, 4, 984-988.
Stutzle, T. and Hoos, H. H. (1996). Improving the ant-system: A detailed report on
the MAX-MIN ant system. Technical Report, Darmstadt, Germany.
Stutzle, T. and Hoos, H. H. (2000). MAX–MIN Ant System. Future Generation
Computer Systems, 16, 889–914.
Sivakumar, P. and Elakia, K. (2016). A Survey of Ant Colony Optimization.
International Journal of Advanced Research in Computer Science and
Software Engineering, 6, 574-578.
61. References
Ventresca, M. and Ombuki, B. M. (2004). Ant Colony Optimization for Job Shop
Scheduling Problem. Technical Report, Department of Computer Science,
Brock University, Ontario, Canada.
Weiss, S. and Kulikowski, C. (1991). Computer systems that learn, San Francisco,
USA.
Zhao, D., Luo, L. and Zhang, K. (2010). An improved ant colony optimization for
the communication network routing problem. Mathematical and Computer
Modelling, 52, 1976-1981.
Editor's notes
When one of these two stopping criteria is satisfied the ant has built a rule (i.e. it has completed its path), and, in principle, we could use the discovered rule for
classification.
This second stopping criterion detects that the ants have already converged to the same constructed rule, which is equivalent to converging to the same path in real Ant
Colony Systems.
When an ant completes its rule and the amount of pheromone in each trail is updated, another ant starts to construct its rule, using the new amounts of pheromone to guide its search. This process is repeated for at most a predefined number of ants.