1. The document describes a facetwise analysis of the XCS learning classifier system for class imbalances.
2. It analyzes the population initialization process, generation of rules for minority classes, time to extinction of such rules, and derives a population size bound.
3. The analysis considers problems with multiple classes, one sampled at a lower frequency (minority class), and derives probabilities of sampling instances from each class.
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter Settings
1. Modeling XCS in Class Imbalances: Population Size and Parameter Settings
Albert Orriols-Puig (1,2), David E. Goldberg (2), Kumara Sastry (2), Ester Bernadó-Mansilla (1)
(1) Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University
(2) Illinois Genetic Algorithms Laboratory, Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign
2. Framework
[Diagram: Domain → dataset consisting of examples and counter-examples → Learner (knowledge extraction, information based on experience) → Model; New instance → Model → Predicted Output]
In real-world domains, typically:
Higher cost to obtain examples of the concept to be learnt
So, distribution of examples in the training dataset is usually imbalanced
Applications:
Fraud detection
Medical diagnosis of rare illnesses
Detection of oil spills in satellite images
Enginyeria i Arquitectura la Salle Slide 2
GRSI
3. Framework
Do learners suffer from class imbalances?
– Methods that do global optimization
[Diagram: Training Set → Learner → Minimize the global error]
$error = \frac{\text{num. errors}_{c1} + \text{num. errors}_{c2}}{\text{number of examples}}$
Biased towards the overwhelming (majority) class
Maximization of the majority class accuracy, to the detriment of the minority class.
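The bias can be seen in a minimal sketch. The class counts and the always-majority classifier below are hypothetical, chosen only for illustration: a learner that labels every instance as the majority class reaches a low global error while never recognizing the minority class.

```python
def global_error(errors_c1, errors_c2, n_examples):
    """Global error as defined above: all errors over all examples."""
    return (errors_c1 + errors_c2) / n_examples

# Hypothetical training set: 990 majority (c1) and 10 minority (c2) examples.
n_majority, n_minority = 990, 10

# A trivial classifier that always answers "majority" makes no error on c1
# and misclassifies every c2 example.
errors_c1, errors_c2 = 0, n_minority

error = global_error(errors_c1, errors_c2, n_majority + n_minority)
minority_accuracy = 1 - errors_c2 / n_minority

print(f"global error = {error:.3f}, minority accuracy = {minority_accuracy:.1f}")
```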
4. Motivation
And what about incremental learning?
Sampling instances of the minority class less frequently
Rules that match instances of the minority class are poorly activated
Rules of the minority class would receive fewer genetic opportunities (Orriols & Bernadó, 2006)
5. Aim
Facetwise analysis of XCS for class imbalances
Impact of class imbalances on the initialization process
How can XCS create rules of the minority class if the covering process fails?
Population size bound with respect to the imbalance ratio
Until which imbalance ratio would XCS be able to learn from the minority class?
6. Outline
1. Description of XCS
2. Facetwise Analysis
3. Design of test Problems
4. XCS on the one-bit Problem
5. Analysis of Deviations
6. Results
7. Conclusions
7. Description of XCS
In single-step tasks:
[Diagram: XCS learning cycle. The Environment supplies a problem instance (minority class instance or majority class instance). Classifiers in the Population [P], each with parameters C, A, P, ε, F, num, as, ts, exp, go through match set generation to form the Match Set [M]. The Prediction Array over classes c1, c2, …, cn yields the selected action (random action). The Action Set [A] receives the REWARD (1000/0) and its classifier parameters are updated. The Genetic Algorithm applies selection, reproduction and mutation; deletion acts on the population. Niches are labelled as nourished niches or starved niches.]
Problem niche: the schema defines the relevant attributes for a particular problem niche. E.g.: 10**1*
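As a minimal sketch of how a schema such as 10**1* selects the instances of a niche, the following matches ternary conditions against binary inputs to build a match set. The function names and the three-rule population are hypothetical; rules are reduced to (condition, action) pairs and all other classifier parameters are omitted.

```python
def matches(condition, instance):
    """True if a ternary condition ('0', '1', '*') matches a binary input."""
    return len(condition) == len(instance) and all(
        c == "*" or c == x for c, x in zip(condition, instance)
    )

def match_set(population, instance):
    """Rules (condition, action) whose condition matches the instance."""
    return [rule for rule in population if matches(rule[0], instance)]

# Hypothetical population of (condition, action) pairs.
population = [("10**1*", 1), ("0*****", 0), ("1*****", 1)]

print(match_set(population, "100110"))
```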
9. Facetwise Analysis
Study XCS capabilities to provide representatives of
starved niches:
– Population initialization
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers
Derive a bound on the population size
Depart from theory developed for XCS
– (Butz, Kovacs, Lanzi & Wilson, 04): Model of generalization pressures of XCS
– (Butz, Goldberg & Lanzi, 04): Learning time bound
– (Butz, Goldberg, Lanzi & Sastry, 07): Population size bound to guarantee niche support
– (Butz, 2006): Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design.
10. Facetwise Analysis
Assumptions
– Problems consisting of n classes
– One class sampled with a lower frequency: minority class
$ir = \frac{\text{num. instances of any class other than the minority class}}{\text{num. instances of the minority class}}$
– Probability of sampling an instance of the minority class, and of any other class:
$P_s(min) = \frac{1}{1+ir}$, $P_s(maj) = \frac{ir}{1+ir}$
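The two sampling probabilities can be checked numerically with a short sketch. The function names, the seed and the value ir = 9 are illustrative assumptions; only the two formulas come from the slide.

```python
import random

def sampling_probabilities(ir):
    """Return (Ps(min), Ps(maj)) for a given imbalance ratio ir."""
    p_min = 1.0 / (1.0 + ir)
    p_maj = ir / (1.0 + ir)
    return p_min, p_maj

def sample_is_minority(ir, rng):
    """Draw one instance; True if it belongs to the minority class."""
    return rng.random() < sampling_probabilities(ir)[0]

rng = random.Random(0)   # fixed seed, for repeatability only
ir = 9                   # illustrative imbalance ratio
draws = [sample_is_minority(ir, rng) for _ in range(100_000)]

print(sampling_probabilities(ir), sum(draws) / len(draws))
```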
11. Facetwise Analysis
– Population initialization
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers
– Population size bound
12. Population Initialization
Covering procedure
– Covering: Generalize over the input with probability P#
– P# needs to satisfy the covering challenge (Butz et al., 01)
Would I trigger covering on minority class instances?
– Probability that one instance is covered by at least one rule (Butz et al., 01): it depends on the population specificity (initially 1 – P#), the input length, and the population size
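The equation itself did not survive in this transcript. The form usually given for this covering probability, assumed here to be the one meant, is $P(cover) = 1 - \left[1 - \left(\frac{2 - \sigma[P]}{2}\right)^{l}\right]^{N}$, with $\sigma[P]$ the population specificity, $l$ the input length and $N$ the population size. A minimal sketch under that assumption follows; the function names and the input length of 20 are hypothetical, and P# = 0.6 is the value listed in the XCS configuration.

```python
def match_probability(specificity, length):
    """Chance that a random rule matches a random binary input (assumed form)."""
    return (1.0 - specificity / 2.0) ** length

def cover_probability(specificity, length, pop_size):
    """Chance that at least one of pop_size random rules matches the input."""
    return 1.0 - (1.0 - match_probability(specificity, length)) ** pop_size

p_hash = 0.6                 # P# as listed in the XCS configuration
specificity = 1.0 - p_hash   # initial population specificity, 1 - P#
for pop_size in (10, 100, 1000):
    # Input length 20 is a hypothetical value for illustration.
    print(pop_size, round(cover_probability(specificity, 20, pop_size), 4))
```

Under this form, a larger population or a lower specificity makes it less likely that covering is triggered, including on minority class instances.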
15. Creation of Representatives of Starved Niches
Assumptions
– Covering has not provided any representative of starved niches
– Simplified model: only consider mutation in our model.
How can we generate representatives of starved niches?
– Specifying correctly all the bits of the schema that represents the starved niche
16. Creation of Representatives of Starved Niches
Possible cases:
– Sample a minority class instance, with probability $P_s(min) = \frac{1}{1+ir}$
• Activate a niche of the minority class
• Activate a niche of another class
– Sample a majority class instance, with probability $P_s(maj) = \frac{ir}{1+ir}$
• Activate a niche of the minority class
• Activate a niche of another class
μ: Mutation probability
Km: Order of the schema
17. Creation of Representatives of Starved Niches
Summing up, time to get the first representative of a starved niche
n: number of classes
μ: Mutation probability
Km: Order of the schema
It increases:
Linearly with the number of classes
Exponentially with the order of the schema
It does not depend on the imbalance ratio
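The stated scaling can be illustrated with a deliberately simplified waiting-time model. The per-event success probability used here, (1/n)·(μ/2)^Km, is an assumption made for this sketch and is not taken from the slide, whose equation did not survive. It only reproduces the stated dependence on n and Km and the absence of ir.

```python
def expected_time_to_first_representative(n_classes, mu, k_m):
    """Expected number of genetic events until the first success.

    Assumed model (not from the slide): each event succeeds with probability
    (1 / n_classes) * (mu / 2) ** k_m, so the waiting time is geometric.
    """
    p_success = (1.0 / n_classes) * (mu / 2.0) ** k_m
    return 1.0 / p_success

mu = 0.4  # mutation probability listed in the XCS configuration
for n_classes in (2, 4):
    for k_m in (1, 2, 3):
        t = expected_time_to_first_representative(n_classes, mu, k_m)
        print(f"n={n_classes} Km={k_m} expected events={t:.0f}")
```

Doubling n doubles the expected time, while each extra schema bit multiplies it by 2/μ.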
19. Bounding the Population Size
Time to extinction
– Consider random deletion:
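A minimal simulation sketch of random deletion, under assumptions that are not stated on the slide: the population is kept at a constant size N, and each deletion removes one classifier chosen uniformly at random. A tracked classifier is then hit with probability 1/N per deletion, so the number of deletions until it goes extinct is geometric with mean N.

```python
import random

def mean_time_to_extinction(pop_size, trials, rng):
    """Average number of random deletions until a tracked classifier is hit."""
    total = 0
    for _ in range(trials):
        steps = 1
        # Index 0 plays the role of the tracked classifier.
        while rng.randrange(pop_size) != 0:
            steps += 1
        total += steps
    return total / trials

rng = random.Random(1)   # fixed seed, for repeatability only
N = 50                   # hypothetical population size
estimate = mean_time_to_extinction(N, 20_000, rng)

print(f"simulated: {estimate:.1f} deletions, geometric expectation: {N}")
```

If a genetic event deletes more than one classifier, the expected number of events scales down accordingly (for example, N/2 events for two deletions per event).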
21. Bounding the Population Size
Population size bound to guarantee that there will be representatives of starved niches
– Require that:
– Bound:
22. Bounding the Population Size
Population size bound to guarantee that representatives of
starved niches will receive a genetic opportunity:
– Consider θGA = 0
– We require that the best representative of a starved niche receives a genetic event before being removed
– Time to receive the first genetic event
23. Bounding the Population Size
Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity:
The population size needed to guarantee that the best representatives of starved niches will receive at least one genetic opportunity increases linearly with the imbalance ratio
25. Design of Test Problems
One-bit problem
– Example: 000110 : 0, where the condition has length l and the class is the value of the left-most bit
– Only two schemas of order one: 0***** and 1*****
– Undersampling instances of the class labeled as 1: $P_s(min) = \frac{1}{1+ir}$
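A minimal sketch of an instance generator for the imbalanced one-bit problem. The function name, the seed and the values l = 6 and ir = 15 are illustrative assumptions. The class is drawn first with Ps(min) = 1/(1+ir) for class 1, and the remaining l − 1 bits are filled uniformly at random.

```python
import random

def one_bit_instance(length, ir, rng):
    """Return (condition, label); the label equals the left-most bit."""
    # Class 1 is the undersampled (minority) class.
    label = 1 if rng.random() < 1.0 / (1.0 + ir) else 0
    rest = [rng.randint(0, 1) for _ in range(length - 1)]
    bits = [label] + rest
    return "".join(map(str, bits)), label

rng = random.Random(0)   # fixed seed, for repeatability only
samples = [one_bit_instance(6, ir=15, rng=rng) for _ in range(20_000)]
minority_share = sum(label for _, label in samples) / len(samples)

print(samples[0], round(minority_share, 3))
```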
26. Design of Test Problems
Parity problem
– Example: 01001010 : 1, where the condition has length l and the class is the number of ones in the k relevant bits, mod 2
– The k bits of parity form a single building block
– Undersampling instances of the class labeled as 1: $P_s(min) = \frac{1}{1+ir}$
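A minimal sketch of an instance generator for the imbalanced parity problem. Two things are assumptions made for this example and are not stated on the slide: the k relevant bits are the left-most ones, and the function name and parameter values are illustrative. The class is drawn first with Ps(min) = 1/(1+ir) for class 1, then the relevant bits are completed so that their parity equals the class.

```python
import random

def parity_instance(length, k, ir, rng):
    """Return (condition, label); label = number of ones in the first k bits, mod 2."""
    # Class 1 is the undersampled (minority) class.
    label = 1 if rng.random() < 1.0 / (1.0 + ir) else 0
    relevant = [rng.randint(0, 1) for _ in range(k - 1)]
    # The last relevant bit is chosen so that the parity matches the label.
    relevant.append((label - sum(relevant)) % 2)
    padding = [rng.randint(0, 1) for _ in range(length - k)]
    return "".join(map(str, relevant + padding)), label

rng = random.Random(0)   # fixed seed, for repeatability only
samples = [parity_instance(8, k=3, ir=15, rng=rng) for _ in range(20_000)]
minority_share = sum(label for _, label in samples) / len(samples)

print(samples[0], round(minority_share, 3))
```

Fixing the last relevant bit after drawing the other k − 1 uniformly keeps the relevant bits uniform over all strings with the required parity.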
28. XCS on the one-bit Problem
XCS configuration
α=0.1, ν=5, ε0=1, θGA=25, χ=0.8, μ=0.4, θdel=20, θsub=200, δ=0.1, P#=0.6
selection=tournament, mutation=niched, [A]sub=false, N = 10,000 ir
Evaluation of the results:
– Minimum population size to achieve:
TP rate * TN rate > 95%
– Results are averages over 25 seeds
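The evaluation criterion can be sketched as follows. The function name is hypothetical, and treating the minority class (label 1) as the positive class is an assumption. The product of TP rate and TN rate drops to zero when either class is ignored, unlike the global error.

```python
def tp_tn_product(y_true, y_pred, positive=1):
    """TP rate * TN rate; `positive` marks the class assumed to be the minority."""
    pos = sum(1 for t in y_true if t == positive)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return (tp / pos) * (tn / neg)

# Hypothetical test set: 10 minority (1) and 990 majority (0) instances.
y_true = [1] * 10 + [0] * 990
always_majority = [0] * 1000

print(tp_tn_product(y_true, always_majority))   # ignores the minority class
print(tp_tn_product(y_true, y_true) > 0.95)     # perfect predictions pass
```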
29. XCS on the one-bit Problem
N remains constant up to ir = 64
N increases linearly from ir=64 to ir=256
N increases exponentially from ir=256 to ir=1024
Higher ir could not be solved
31. Analysis of the Deviations
Niched Mutation vs. Free Mutation
– Classifiers can only be created if minority class instances are sampled
Inheritance Error of Classifiers’ Parameters
– New promising representatives of starved niches are created from classifiers that belong to nourished niches
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as=1.
32. Analysis of the Deviations
Subsumption
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward
– Approach: set θsub>ir
Stabilizing the population before testing
– Overgeneral classifiers poorly evaluated
– Approach: introduce some extra runs at the end of learning with the GA
switched off.
We gather all these little tweaks in XCS+PMC
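The subsumption point can be checked with a short simulation sketch. It assumes a simplification not stated on the slide: an overgeneral classifier that predicts the majority class is rewarded positively on every majority instance and negatively on the first minority instance. With Ps(min) = 1/(1+ir), the number of majority instances seen before the first minority instance is geometric with mean ir.

```python
import random

def majority_run_length(ir, rng):
    """Number of majority instances sampled before the first minority one."""
    p_min = 1.0 / (1.0 + ir)
    run = 0
    while rng.random() >= p_min:
        run += 1
    return run

rng = random.Random(2)   # fixed seed, for repeatability only
ir = 32                  # illustrative imbalance ratio
runs = [majority_run_length(ir, rng) for _ in range(20_000)]
mean_run = sum(runs) / len(runs)

print(f"mean positive rewards before the first negative one: {mean_run:.1f} (ir = {ir})")
```

Under this simplification, a subsumption threshold below ir lets such a classifier look accurate for long enough to be used as a subsumer, which is the motivation for θsub>ir.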
34. XCS+PMC in the one-bit Problem
N remains constant up to ir = 128
For higher ir, N slightly increases
We only have to guarantee that a
representative of the starved niche
will be created
35. XCS+PMC in the Parity Problem
Building blocks of size 3 need to be processed
Empirical results agree with the theory
Population size bound to guarantee that a representative of the niche will receive a genetic event
37. Conclusions and Further Work
We derived models that analyzed the representatives of starved
niches provided by covering and mutation
A population size bound was derived
We saw that the empirical observations met the theory if four
aspects were considered:
– Type of mutation
– as initialization
– Subsumption
– Stabilization of the population
Further analysis of the covering operator