Leveraging Bagging for Evolving Data Streams
1. Leveraging Bagging for Evolving Data Streams
Albert Bifet, Geoff Holmes, and Bernhard Pfahringer
University of Waikato
Hamilton, New Zealand
Barcelona, 21 September 2010
ECML PKDD 2010
2. Mining Data Streams with Concept Drift
Extract information from
potentially infinite sequence of data
possibly varying over time
using few resources
Adaptively:
no prior knowledge of type or rate of change
3. Mining Data Streams with Concept Drift
Extract information from
potentially infinite sequence of data
possibly varying over time
using few resources
Leveraging Bagging
New improvements for adaptive bagging methods using
input randomization
output randomization
4. Outline
1 Data stream constraints
2 Leveraging Bagging for Evolving Data Streams
3 Empirical evaluation
5. Outline
1 Data stream constraints
2 Leveraging Bagging for Evolving Data Streams
3 Empirical evaluation
6. Mining Massive Data
Eric Schmidt, August 2010
Every two days now we create as much information as we did
from the dawn of civilization up until 2003.
5 exabytes of data
7. Data stream classification cycle
1 Process an example at a time, and inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point
8. Mining Massive Data
Koichi Kawana
Simplicity means the achievement of maximum effect with
minimum means.
Figure: the data-stream trade-off between time, accuracy, and memory.
9. Evaluation Example
              Accuracy  Time  Memory
Classifier A       70%   100      20
Classifier B       80%    20      40
Which classifier is performing better?
11. Evaluation Example
              Accuracy  Time  Memory  RAM-Hours
Classifier A       70%   100      20      2,000
Classifier B       80%    20      40        800
Which classifier is performing better?
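To make the cost measure concrete: a RAM-Hour charges one unit for every GB of RAM deployed for one hour, so the cost of a classifier is simply memory × time. A minimal sketch, assuming the slide's Time and Memory columns are in hours and GB (the units on the slide are illustrative):

```python
def ram_hours(memory_gb: float, time_hours: float) -> float:
    # One RAM-Hour = one GB of RAM deployed for one hour.
    return memory_gb * time_hours

# The slide's numbers: Classifier B is cheaper despite its higher accuracy.
print(ram_hours(20, 100))  # Classifier A -> 2000.0
print(ram_hours(40, 20))   # Classifier B -> 800.0
```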
12. Outline
1 Data stream constraints
2 Leveraging Bagging for Evolving Data Streams
3 Empirical evaluation
13. Hoeffding Trees
Hoeffding Tree : VFDT
Pedro Domingos and Geoff Hulten.
Mining high-speed data streams. 2000
With high probability, constructs a model identical to the one a
traditional (greedy) method would learn
With theoretical guarantees on the error rate
Figure: example Hoeffding tree — root test Contains "Money": Yes → YES; No → test Time: Day → YES, Night → NO.
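The quantity behind the split decision is the Hoeffding bound: after n observations of a variable with range R, the true mean is within ε = sqrt(R² ln(1/δ) / (2n)) of the observed mean with probability 1 − δ. A minimal sketch of the split test (a simplification of VFDT, which additionally breaks near-ties with a threshold τ):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # epsilon = sqrt(R^2 ln(1/delta) / (2n)): with probability 1 - delta,
    # the true mean after n observations is within epsilon of the sample mean.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    # Split once the observed gap between the two best attributes is large
    # enough that the best one is truly best with probability 1 - delta.
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)
```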
14. Hoeffding Naive Bayes Tree
Hoeffding Tree
Majority Class learner at leaves
Hoeffding Naive Bayes Tree
G. Holmes, R. Kirkby, and B. Pfahringer.
Stress-testing Hoeffding trees, 2005.
monitors accuracy of a Majority Class learner
monitors accuracy of a Naive Bayes learner
predicts using the most accurate method
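A sketch of this leaf-level logic, assuming hypothetical `mc` (majority class) and `nb` (naive Bayes) predictor objects with `learn`/`predict` methods: each leaf scores both predictors prequentially (predict first, then train) and answers with whichever has been more accurate so far.

```python
class AdaptiveLeaf:
    def __init__(self, mc, nb):
        self.mc, self.nb = mc, nb   # majority-class and naive Bayes predictors
        self.mc_correct = 0         # running accuracy monitors
        self.nb_correct = 0

    def learn(self, x, y):
        # Test-then-train: credit each predictor before updating it.
        self.mc_correct += int(self.mc.predict(x) == y)
        self.nb_correct += int(self.nb.predict(x) == y)
        self.mc.learn(x, y)
        self.nb.learn(x, y)

    def predict(self, x):
        # Answer with whichever method has been more accurate at this leaf.
        best = self.mc if self.mc_correct >= self.nb_correct else self.nb
        return best.predict(x)
```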
16. Bagging
Figure: Poisson(1) Distribution.
Each base model's training set contains each of the original training
examples K times, where P(K = k) follows a binomial distribution.
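The connection between the two distributions named on this slide: in a bootstrap sample of size n, K ~ Binomial(n, 1/n), which converges to Poisson(1) as n grows. A quick numeric check:

```python
import math

n = 1_000_000  # training-set size
for k in range(5):
    binom = math.comb(n, k) * (1 / n) ** k * (1 - 1 / n) ** (n - k)
    pois = math.exp(-1) / math.factorial(k)
    print(f"P(K={k}): binomial {binom:.6f}, Poisson(1) {pois:.6f}")
# Both print ~0.3679, 0.3679, 0.1839, 0.0613, 0.0153: the distributions agree.
```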
17. Oza and Russell’s Online Bagging for M models
1: Initialize base models hm for all m ∈ {1,2,...,M}
2: for all training examples do
3: for m = 1,2,...,M do
4: Set w = Poisson(1)
5: Update hm with the current example with weight w
6: anytime output:
7: return hypothesis: h_fin(x) = argmax_{y ∈ Y} Σ_{t=1..T} I(h_t(x) = y)
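A minimal sketch of the update and voting steps, assuming base models expose hypothetical `learn(x, y, weight)` and `predict(x)` methods:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

def online_bagging_update(models, x, y):
    # Each model trains on the example K ~ Poisson(1) times, simulating
    # the bootstrap resampling of batch bagging in a single pass.
    for h in models:
        w = int(rng.poisson(1.0))
        if w > 0:
            h.learn(x, y, weight=w)

def bagging_predict(models, x):
    # h_fin(x) = argmax_y sum_t I(h_t(x) = y): plain majority vote.
    return Counter(h.predict(x) for h in models).most_common(1)[0][0]
```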
18. ADWIN Bagging (KDD’09)
ADWIN
An adaptive sliding window whose size is recomputed online
according to the rate of change observed.
ADWIN has rigorous guarantees (theorems)
On ratio of false positives and negatives
On the relation of the size of the current window and
change rates
ADWIN Bagging
When a change is detected, the worst classifier is removed and
a new classifier is added.
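A didactic sketch of ADWIN's cut test, fed with a classifier's 0/1 error stream. This version keeps the raw window and pays O(W) per insertion; the real algorithm obtains the same guarantees with exponential bucketing in logarithmic time and memory. The δ default is an assumption.

```python
import math
from collections import deque

class SimpleAdwin:
    def __init__(self, delta: float = 0.002):
        self.delta = delta
        self.window = deque()

    def add(self, x: float) -> bool:
        """Insert a value; shrink the window and return True on change."""
        self.window.append(x)
        changed = False
        while self._has_cut():
            self.window.popleft()   # drop stale data from the old concept
            changed = True
        return changed

    def _has_cut(self) -> bool:
        w, n = list(self.window), len(self.window)
        total, head = sum(w), 0.0
        for i in range(1, n):       # test every split point of the window
            head += w[i - 1]
            n0, n1 = i, n - i
            mu0, mu1 = head / n0, (total - head) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)              # harmonic mean of sizes
            eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(mu0 - mu1) >= eps:   # subwindow means differ significantly
                return True
        return False
```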
19. ADWIN Bagging for M models
1: Initialize base models hm for all m ∈ {1,2,...,M}
2: for all training examples do
3: for m = 1,2,...,M do
4: Set w = Poisson(1)
5: Update hm with the current example with weight w
6: if ADWIN detects change in error of one of the
classifiers then
7: Replace classifier with higher error with a new one
8: anytime output:
9: return hypothesis: h_fin(x) = argmax_{y ∈ Y} Σ_{t=1..T} I(h_t(x) = y)
20. Leveraging Bagging for Evolving
Data Streams
Randomization as a powerful tool to increase accuracy and
diversity
There are three ways of using randomization:
Manipulating the input data
Manipulating the classifier algorithms
Manipulating the output targets
22. ECOC Output Randomization
Table: Example matrix of random output codes for 3 classes and 6
classifiers
             Class 1  Class 2  Class 3
Classifier 1       0        0        1
Classifier 2       0        1        1
Classifier 3       1        0        0
Classifier 4       1        1        0
Classifier 5       1        0        1
Classifier 6       0        1        0
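A sketch of how such a matrix is drawn and decoded, with hypothetical binary base models: classifier m is trained on the meta-label µ_m(y) instead of y, and prediction picks the class whose code row agrees with the most binary votes.

```python
import random

def random_output_codes(n_classes: int, n_classifiers: int, seed: int = 1):
    # mu[m][y] in {0, 1}: the binary meta-label classifier m assigns to class y.
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_classes)]
            for _ in range(n_classifiers)]

def ecoc_predict(models, mu, x, n_classes: int):
    # argmax_y sum_t I(h_t(x) = mu_t(y)): the class most consistent
    # with the binary predictions under the code matrix.
    bits = [h.predict(x) for h in models]
    return max(range(n_classes),
               key=lambda y: sum(b == mu[m][y] for m, b in enumerate(bits)))
```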
23. Leveraging Bagging for Evolving Data Streams
Leveraging Bagging
Using Poisson(λ)
Leveraging Bagging MC
Using Poisson(λ) and Random Output Codes
Fast Leveraging Bagging ME
if an instance is misclassified: weight = 1
if not: weight = e_T / (1 − e_T)
24. Input Randomization
Bagging
resampling with replacement using Poisson(1)
Other Strategies (sketched below)
subagging: resampling without replacement
half subagging: resampling without replacement, using half of the instances
bagging without taking out any instance (WT): using 1 + Poisson(1)
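The strategies above differ only in how many copies of the current example each base model trains on. A minimal sketch; the inclusion probability p for the subagging variants is an assumption (p = 1/2 corresponds to half subagging):

```python
import numpy as np

rng = np.random.default_rng(1)

def example_weight(strategy: str, p: float = 0.5) -> int:
    if strategy == "bagging":      # resampling with replacement: Poisson(1)
        return int(rng.poisson(1.0))
    if strategy == "wt":           # without taking out any instance
        return 1 + int(rng.poisson(1.0))
    if strategy == "subagging":    # without replacement: each example at most once
        return int(rng.random() < p)
    raise ValueError(strategy)
```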
25. Leveraging Bagging for Evolving Data Streams
1: Initialize base models hm for all m ∈ {1,2,...,M}
2:
3: for all training examples (x,y) do
4: for m = 1,2,...,M do
5: Set w = Poisson(λ)
6: Update hm with the current example with weight w
7: if ADWIN detects change in error of one of the classifiers
then
8: Replace classifier with higher error with a new one
9: anytime output:
10: return h_fin(x) = argmax_{y ∈ Y} Σ_{t=1..T} I(h_t(x) = y)
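A sketch of the full loop, reusing the SimpleAdwin detector and the hypothetical model interface from the earlier sketches; λ = 6 (MOA's default for Leveraging Bagging) is an assumption here, and `make_model()` is a hypothetical factory for a base learner such as a Hoeffding tree. Prediction is the same majority vote as in online bagging.

```python
import numpy as np

class LeveragingBagging:
    def __init__(self, make_model, n_models=10, lam=6.0, seed=1):
        self.make_model, self.lam = make_model, lam
        self.rng = np.random.default_rng(seed)
        self.models = [make_model() for _ in range(n_models)]
        self.detectors = [SimpleAdwin() for _ in range(n_models)]

    def learn(self, x, y):
        # Prequential order: feed each model's 0/1 error to its ADWIN first.
        changes = [det.add(int(h.predict(x) != y))
                   for h, det in zip(self.models, self.detectors)]
        # Then train every model on a Poisson(lambda)-weighted copy.
        for h in self.models:
            w = int(self.rng.poisson(self.lam))
            if w > 0:
                h.learn(x, y, weight=w)
        # On any detected change, replace the classifier with highest error.
        if any(changes):
            def err(d):
                return sum(d.window) / max(1, len(d.window))
            worst = max(range(len(self.models)),
                        key=lambda m: err(self.detectors[m]))
            self.models[worst] = self.make_model()
            self.detectors[worst] = SimpleAdwin()
```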
26. Leveraging Bagging for Evolving Data Streams MC
1: Initialize base models hm for all m ∈ {1,2,...,M}
2: Compute coloring µm(y)
3: for all training examples (x,y) do
4: for m = 1,2,...,M do
5: Set w = Poisson(λ)
6: Update hm with the current example with weight w and
class µm(y)
7: if ADWIN detects change in error of one of the classifiers
then
8: Replace classifier with higher error with a new one
9: anytime output:
10: return h_fin(x) = argmax_{y ∈ Y} Σ_{t=1..T} I(h_t(x) = µ_t(y))
27. Leveraging Bagging for Evolving Data Streams ME
1: Initialize base models hm for all m ∈ {1,2,...,M}
2:
3: for all training examples (x,y) do
4: for m = 1,2,...,M do
5: Set w = 1 if misclassified, otherwise e_T / (1 − e_T)
6: Update hm with the current example with weight w
7: if ADWIN detects change in error of one of the classifiers
then
8: Replace classifier with higher error with a new one
9: anytime output:
10: return h_fin(x) = argmax_{y ∈ Y} Σ_{t=1..T} I(h_t(x) = y)
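The ME variant replaces Poisson sampling with a deterministic weight, which is why it turns out to be the cheapest variant in the evaluation that follows. A one-line sketch, where e_T is the running error estimate (assumed strictly between 0 and 1):

```python
def me_weight(misclassified: bool, e_t: float) -> float:
    # Weight 1 on a mistake, otherwise e_T / (1 - e_T).
    return 1.0 if misclassified else e_t / (1.0 - e_t)
```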
28. Outline
1 Data stream constraints
2 Leveraging Bagging for Evolving Data Streams
3 Empirical evaluation
29. What is MOA?
{M}assive {O}nline {A}nalysis is a framework for mining data
streams.
Based on experience with Weka and VFML
Focussed on classification trees, but lots of active
development: clustering, item set and sequence mining,
regression
Easy to extend
Easy to design and run experiments
30. MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
31. Leveraging Bagging Empirical evaluation
Figure: Accuracy (%) versus number of instances on dataset SEA with three concept drifts, comparing Leveraging Bagging, Leveraging Bagging MC, ADWIN Bagging, and Online Bagging.
32. Empirical evaluation
                      Accuracy  RAM-Hours
Hoeffding Tree          74.03%       0.01
Online Bagging          77.15%       2.98
ADWIN Bagging           79.24%       1.48
ADWIN Half Subagging    78.36%       1.04
ADWIN Subagging         78.68%       1.13
ADWIN Bagging WT        81.49%       2.74
ADWIN Bagging Strategies
half subagging: resampling without replacement, using half of the instances
subagging: resampling without replacement
WT: bagging without taking out any instance, using 1 + Poisson(1)
33. Empirical evaluation
                       Accuracy  RAM-Hours
Hoeffding Tree           74.03%       0.01
Online Bagging           77.15%       2.98
ADWIN Bagging            79.24%       1.48
Leveraging Bagging       85.54%      20.17
Leveraging Bagging MC    85.37%      22.04
Leveraging Bagging ME    80.77%       0.87
Leveraging Bagging variants
Leveraging Bagging: using Poisson(λ)
Leveraging Bagging MC: using Poisson(λ) and Random Output Codes
Leveraging Bagging ME: using weight 1 if misclassified, otherwise e_T / (1 − e_T)
34. Empirical evaluation
                                  Accuracy  RAM-Hours
Hoeffding Tree                      74.03%       0.01
Online Bagging                      77.15%       2.98
ADWIN Bagging                       79.24%       1.48
Random Forest Leveraging Bagging    80.69%       5.51
Random Forest Online Bagging        72.91%       1.30
Random Forest ADWIN Bagging         74.24%       0.89
Random Forests
the input training set is obtained by sampling with replacement
the nodes of the tree use only √n random attributes to split
we only keep statistics of these attributes
35. Leveraging Bagging Diversity
Figure: Kappa-Error diagrams (error versus kappa statistic) for Leveraging Bagging and Online Bagging on the SEA data with three concept drifts, plotting 576 pairs of classifiers.
36. Summary
http://moa.cs.waikato.ac.nz/
Conclusions
New improvements for bagging methods using input
randomization
Improving Accuracy: Using Poisson(λ)
Improving RAM-Hours: using weight 1 if misclassified, otherwise e_T / (1 − e_T)
New improvements for bagging methods using output
randomization
No need for multi-class classifiers