1. Experiments with Randomisation
and Boosting for Multi-instance
Classification
Luke Bjerring, James Foulds, Eibe Frank
University of Waikato
September 13, 2011
2. What's in this talk?
• What is multi-instance learning?
• Basic multi-instance data format in WEKA
• The standard assumption in multi-instance learning
• Learning decision trees and rules
• Ensembles using randomisation
• Diverse density learning
• Boosting diverse density learning
• Experimental comparison
• Conclusions
3. Multi-instance learning
• Generalized (supervised) learning scenario where each
training example is a bag of instances
[Figure: a single-instance model maps one feature vector to a
classification; a multi-instance model maps multiple feature
vectors (a bag) to a classification. Based on a diagram in
Dietterich et al. (1997)]
4. Example applications
• Applicable whenever an object can best be represented
as an unordered collection of instances
• Two popular application areas in the literature:
− Image classification (e.g. does an image contain a tiger?)
• Approach: image is split into regions, each region becomes
an instance described by a fixed-length feature vector
• Motivation for MI learning: location of object not important for
classification, some “key” regions determine outcome
− Activity of molecules (e.g. does molecule smell musky?)
• Approach: instances describe possible conformations in 3D
space, based on fixed-length feature vector
• Motivation for MI learning: conformations cannot easily be
ordered, and only some are responsible for the activity
5. Multi-instance data in WEKA
• Each bag of data is given as the value of a relation-valued attribute
[Screenshot of an MI dataset in WEKA: each row shows a bag
identifier, the instances in the bag, and the class label]
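• For concreteness, a minimal hand-written sketch of this format
(hypothetical relation, attribute, and bag names; real MI datasets
such as the musk data follow the same pattern):

  @relation mi-example

  @attribute bag-id {bag1,bag2}
  @attribute bag relational
    @attribute x numeric
    @attribute y numeric
  @end bag
  @attribute class {0,1}

  @data
  bag1,"0.45,0.52\n0.91,0.13",1
  bag2,"0.10,0.80\n0.77,0.31",0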
6. What's the big deal?
• Multi-instance learning is challenging because instance-
level classifications are assumed to be unknown
− Algorithm is told that an image contains a tiger, but not which
regions are “tiger-like”
− Similarly, a molecule is known to be active (or inactive), but
algorithm is not told which conformation is responsible for this
• Basic (standard) assumption in MI learning: a bag is
positive iff it contains at least one positive instance
(see the sketch at the end of this list)
− Example: molecule is active if at least one conformation is active,
and inactive otherwise
• Generalizations of this are possible that assume
interactions between instances in a bag
• Alternative: instances contribute collectively to bag label
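• As a minimal illustrative sketch (not part of the original deck),
the standard assumption in Python; instance_labels is a hypothetical
list of the unobserved instance-level labels of one bag:

  def bag_label(instance_labels):
      # Standard MI assumption: a bag is positive iff it contains
      # at least one positive instance.
      return any(instance_labels)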
7. A synthetic example
• 10 positive/negative bags, 10 instances per bag
8. A synthetic example
• Bag positive iff at least one instance in (0.4,0.6)x(0.4,0.6)
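• A sketch of how such data could be generated (our illustrative
rejection-sampling code, not the original experiment script):

  import random

  def make_bag(positive, n_instances=10, rng=random):
      # Instances are points in the unit square; a bag is positive
      # iff at least one instance falls in (0.4,0.6) x (0.4,0.6).
      while True:
          bag = [(rng.random(), rng.random()) for _ in range(n_instances)]
          hit = any(0.4 < x < 0.6 and 0.4 < y < 0.6 for (x, y) in bag)
          if hit == positive:
              return bag

  bags = ([(make_bag(True), 1) for _ in range(10)] +
          [(make_bag(False), 0) for _ in range(10)])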
9. Assigning bag labels to instances...
• 100 positive/negative bags, 10 instances per bag
10. Partitioning generated by C4.5
• Many leaf nodes, only one of them matters...
12. Blockeel et al.'s MITI tree learner
• Idea: home in on big positive leaf node, remove
instances associated with that leaf node
y <= 0.3942 : 443 [0 / 443] (-)
y > 0.3942 : 1189
| y <= 0.6004 : 418
| | x <= 0.6000 : 262
| | | x <= 0.3676 : 59 [0 / 59] (-)
| | | x > 0.3676 : 128
| | | | x <= 0.3975 : 2 [0 / 2] (-)
| | | | x > 0.3975 : 118
| | | | | y <= 0.3989 : 1 [0 / 1] (-)
| | | | | y > 0.3989 : 116 [116 / 0] (+)
| | x > 0.6000 : 88 [0 / 88] (-)
| y > 0.6004 : 407 [0 / 407] (-)
13. How MITI works
• Two key modifications compared to standard top-down
decision tree inducers:
1. Nodes are expanded in best-first manner, based on the proportion of
positive instances (→ identify positive leaf nodes early)
2. Once a positive leaf node has been found, all bags associated
with this leaf node are removed from the training data
(→ all other instances in these bags are irrelevant)
• Blockeel et al. also use a special-purpose splitting
criterion and a biased estimate of the proportion of positives
• Our experiments indicate that it is better to use the Gini
index and an unbiased estimate of the proportion
→ Trees are generally slightly more accurate and
substantially smaller (which also reduces runtime)
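• Our reading of this procedure as a compact Python sketch (not the
authors' code; Node, best_split, and make_leaf are hypothetical
helpers, and each instance is assumed to carry its bag's id and the
label inherited from its bag):

  import heapq, itertools

  tiebreak = itertools.count()  # keeps the heap from comparing Node objects

  def positive_proportion(instances):
      pos = sum(1 for i in instances if i.label)
      return pos / len(instances) if instances else 0.0

  def miti(instances):
      # Best-first expansion: nodes with the highest proportion of
      # positive instances are expanded first (heapq is a min-heap,
      # hence the negated key).
      root = Node(instances)
      queue = [(-positive_proportion(instances), next(tiebreak), root)]
      while queue:
          _, _, node = heapq.heappop(queue)
          if not node.instances:
              continue
          if all(i.label for i in node.instances):
              # Positive leaf found: deactivate every bag represented
              # here; its remaining instances are irrelevant. (Queued
              # priorities are left stale for brevity.)
              done = {i.bag_id for i in node.instances}
              node.make_leaf(positive=True)
              for _, _, other in queue:
                  other.instances = [i for i in other.instances
                                     if i.bag_id not in done]
          elif not any(i.label for i in node.instances):
              node.make_leaf(positive=False)
          else:
              for child in best_split(node):  # e.g. best Gini-index split
                  heapq.heappush(queue,
                                 (-positive_proportion(child.instances),
                                  next(tiebreak), child))
      return root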
14. Learning rules: MIRI
• Conceptual drawback of MITI tree learner: deactivated
data may have already been used to grow other branches
• Simple fix based on separate-and-conquer rule learning
using partial trees:
‒ When positive leaf is found, make the path to this leaf into an if-then
rule, discard the rest of the tree
‒ Start (partial) tree generation from scratch on the remaining data to
generate the next rule
‒ Stop when no positive leaf can be made; add default rule
• Experiments show: resulting rule learner (MIRI) has
similar classification accuracy to MITI
• However: rule sets are much more compact than
corresponding decision trees
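• The separate-and-conquer loop in sketch form (illustrative only;
grow_until_positive_leaf, path_to_rule, and default_negative_rule
are hypothetical helpers):

  def miri(instances):
      # Grow a partial MITI tree only until the first positive leaf,
      # keep the path to it as a rule, discard the rest of the tree,
      # and restart on the remaining data.
      rules = []
      while True:
          leaf, remaining = grow_until_positive_leaf(instances)
          if leaf is None:                  # no positive leaf can be made
              rules.append(default_negative_rule())
              return rules
          rules.append(path_to_rule(leaf))  # path conditions as if-then rule
          instances = remaining             # bags of the leaf are deactivated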
15. Random forests for MI learning
• Random forests are well-known to be high-performance
ensemble classifiers in single-instance learning
• Straightforward to adapt MITI to learn semi-random
decision trees from multi-instance data
– At each node, choose random fixed-size subset of
attributes, then choose best split amongst those
– Also possible to apply semi-random node expansion (not
best-first), but this yields little benefit
• Can trivially apply this to MIRI rule learning as well: it's
based on partially grown MITI trees
• Ensemble can be generated in WEKA using
RandomCommittee meta classifier
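• A sketch of the attribute-subset randomisation described above
(illustrative; evaluate_split is a hypothetical helper returning a
candidate split with its Gini gain):

  import random

  def semi_random_split(node, n_attributes, k, rng=random):
      # Evaluate candidate splits only on a random fixed-size subset
      # of k attributes and keep the best of those, as in random forests.
      subset = rng.sample(range(n_attributes), k)
      return max((evaluate_split(node, a) for a in subset),
                 key=lambda split: split.gini_gain)

Wrapping such a randomised base learner in WEKA's RandomCommittee then
builds the ensemble by training each member with a different random seed.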
18. Maron's diverse density learning
• Idea: identify a point x in instance space where positive
bags overlap, and centre a bell-shaped function at this point
• Using this function, the probability that instance $B_{ij}$ is
positive under the current hypothesis $h$ is assumed to be:

$\Pr(\text{positive} \mid B_{ij}, h) = \exp\big(-\sum_k s_k^2 (B_{ijk} - x_k)^2\big)$

where the hypothesis $h$ includes the location $x$, but also a
feature scaling vector $s$
• Instance-level probabilities are turned into bag-level
probabilities using the noisy-or function:

$\Pr(\text{positive} \mid B_i, h) = 1 - \prod_j \big(1 - \Pr(\text{positive} \mid B_{ij}, h)\big)$
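• A direct transcription of these two formulas as a small
self-contained Python sketch (illustrative; not WEKA's implementation):

  import math

  def instance_prob(instance, x, s):
      # Bell-shaped function centred at point x with feature scaling s:
      # exp(-sum_k s_k^2 * (B_ijk - x_k)^2)
      return math.exp(-sum(sk ** 2 * (v - xk) ** 2
                           for v, xk, sk in zip(instance, x, s)))

  def bag_prob(bag, x, s):
      # Noisy-or: the bag is positive unless all of its instances fail.
      prod = 1.0
      for inst in bag:
          prod *= 1.0 - instance_prob(inst, x, s)
      return 1.0 - prod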
19. Boosting diverse density learning
• Point x and scaling vector s are found by maximising the
bag-level likelihood using gradient-based search
• Problem: very slow; takes a very long time to converge
• QuickDD heuristic: find best point x first, using fixed
scaling vector s, then optimise s; if necessary, iterate
• Much faster, similar accuracy on benchmark data (also,
compares favourably to subsampling-based EMDD)
• Makes it computationally practical to apply boosting
(RealAdaBoost) to improve accuracy:
– In this case, QuickDD is applied with weighted likelihood,
symmetric learning, and localised model
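• A sketch of the QuickDD search loop as we understand it from the
description above (initial_scaling, log_likelihood, and
optimise_scaling are hypothetical helpers; following Maron's diverse
density work, instances from the training bags are a natural
candidate set for x):

  def quick_dd(bags, labels, candidate_points, n_rounds=2):
      # Alternating optimisation: pick the best point x with the
      # scaling s held fixed, then optimise s with x held fixed;
      # iterate if necessary.
      s = initial_scaling()                      # hypothetical helper
      x = None
      for _ in range(n_rounds):
          x = max(candidate_points,
                  key=lambda p: log_likelihood(bags, labels, p, s))
          s = optimise_scaling(bags, labels, x)  # e.g. gradient-based
      return x, s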
21. So how do the ensembles compare?
22. But: improvement on “naive” methods?
• Can apply standard single-instance random forests to
multi-instance data using data transformations...
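• One simple transformation of this kind, shown purely as an
illustration (the experiments may use a different
propositionalisation), collapses each bag into one fixed-length
vector to which any single-instance learner applies:

  def aggregate_bag(bag):
      # Per-attribute mean over the bag's instances (min/max columns
      # are common additional summaries).
      n = len(bag)
      return [sum(inst[k] for inst in bag) / n for k in range(len(bag[0]))]

  mi_data = [([(0.2, 0.4), (0.6, 0.5)], 1),
             ([(0.1, 0.9)], 0)]  # toy bags for illustration
  single_instance_data = [(aggregate_bag(bag), label)
                          for bag, label in mi_data]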
23. Summary
• MITI and MIRI are fast methods for learning compact
decision trees and rule sets for MI data
• Randomisation for ensemble learning yields significantly
improved accuracy in both cases
• Heuristic QuickDD variant of diverse density learning
makes it computationally practical to boost DD learning
• Boosting yields substantially improved accuracy
• Neither boosting nor randomisation has a clear advantage
in accuracy, but randomisation is much faster
• However: only a marginal improvement in accuracy compared
to the “naive” methods
24. Where in WEKA?
• [Screenshot: the multi-instance learners appear under
weka.classifiers.mi in the Explorer's classifier chooser]
• Available via the package manager in WEKA 3.7, which
also provides MITI, MIRI, and QuickDD
25. Details on QuickDD for RealAdaBoost
• Weights in RealAdaBoost are updated using the odds ratio:

$w_i \leftarrow w_i \left(\frac{1 - \Pr(y_i \mid B_i, h)}{\Pr(y_i \mid B_i, h)}\right)^{1/2}$

• The weighted conditional likelihood is used when fitting QuickDD:

$\sum_i w_i \log \Pr(y_i \mid B_i, h)$

• The QuickDD model is thresholded at 0.5 probability to achieve a
local effect on the weight updates: bags outside the model's
region of influence receive probability 0.5, so their odds ratio
is 1 and their weights stay unchanged
• Symmetric learning is applied (i.e. both classes are tried as the
positive class in turn)
– Of the two models, the one that maximises the weighted
conditional likelihood is added to the ensemble
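• A minimal sketch of this update rule in Python (illustrative only;
true_label_probs[i] is assumed to hold Pr(y_i | B_i, h) from the
thresholded QuickDD model, and to be strictly positive):

  import math

  def update_weights(weights, true_label_probs):
      # RealAdaBoost odds-ratio update: correctly classified bags are
      # down-weighted, misclassified ones up-weighted. A thresholded
      # probability of exactly 0.5 gives an odds ratio of 1, so bags
      # outside the local model leave their weights unchanged.
      new = [w * math.sqrt((1.0 - p) / p)
             for w, p in zip(weights, true_label_probs)]
      total = sum(new)
      return [w / total for w in new]  # renormalise to sum to 1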
26. Random forest vs. bagging and boosting