2. The Mushroom Dataset
Data Set Characteristics: Multivariate
Attribute Characteristics: Categorical
Number of Instances: 8124
Number of Attributes: 22
Area: Life
Date Donated: 1987
This data set includes descriptions of hypothetical samples
corresponding to 23 species of gilled mushrooms in the
Agaricus and Lepiota Family.
Each species is identified as definitely edible, definitely
poisonous, or of unknown edibility and not recommended.
This latter class was combined with the poisonous one.
3. Mushroom Dataset
22 Independent attributes
1 Class Attribute (Can you eat it?)
Edible: 4,208 (51.8%)
Poisonous: 3,916 (48.2%)
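The class percentages follow directly from the two counts. A minimal Python check of that arithmetic; the helper name `share` is illustrative only and does not come from the slides:

```python
# Class counts as reported on the slide.
edible = 4208
poisonous = 3916
total = edible + poisonous  # should match the 8124 instances listed earlier


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(total, share(edible, total), share(poisonous, total))
```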
5. Odor attribute, 1R Learner
The simplest rule: 98.52% accuracy
Odor codes (chart x-axis):
a = almond
c = creosote
f = foul
l = anise
m = musty
n = none
p = pungent
s = spicy
y = fishy
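A 1R learner builds one rule per attribute (each value predicts its majority class) and keeps the attribute with the fewest training errors. The sketch below illustrates that idea in plain Python; it is not the implementation used for the slides, and the six rows are a hypothetical toy sample, not the real dataset:

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Return (attribute, rule, errors) for the best single-attribute rule."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attr in attributes:
        # Count class labels for every value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[attr]] != row[target] for row in rows)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy sample for illustration only.
rows = [
    {"odor": "almond", "cap-color": "white", "class": "e"},
    {"odor": "none", "cap-color": "brown", "class": "e"},
    {"odor": "foul", "cap-color": "white", "class": "p"},
    {"odor": "foul", "cap-color": "brown", "class": "p"},
    {"odor": "anise", "cap-color": "brown", "class": "e"},
    {"odor": "pungent", "cap-color": "white", "class": "p"},
]

attr, rule, errors = one_r(rows, "class")
print(attr, errors, rule)
```

On this toy sample the learner selects odor, because every odor value maps to a single class, while cap-color misclassifies two rows.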
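6. J48 Tree: 100% Classification
E = Edible, P = Poisonous
[Figure: J48 decision tree. The root splits on odor: almond E, creosote P, foul P, anise E, musty P, pungent P, spicy P, fishy P; odor = none continues to further splits. The next split has the branches black, brown, buff, chocolate, green, orange, purple, white, yellow, with leaf labels E E E E P E E E on eight of the nine branches. Lower splits have the branches narrow/broad, close/crowded/distant, and abundant, clustered, numerous, scattered, several, solitary (leaf labels E P E E E E).]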
7. Simplest rule-set (Benchmark)
These are poisonous (accuracy is cumulative as each rule is added):
1. Odor = not (almond or anise or none)
(120 poisonous cases missed, 98.52% accuracy)
2. Spore-print-color = green
(48 cases missed, 99.41% accuracy)
3. Odor = none and stalk-surface-below-ring = scaly
and stalk-color-above-ring = not brown
(8 cases missed, 99.90% accuracy)
4. Habitat = leaves and cap-color = white
(100% accuracy)
Rule 4 may also be written as population = clustered and cap-color = white
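The four rules can be read as a chain of checks, where a mushroom is flagged as poisonous as soon as one rule matches. The following is a minimal Python sketch under that reading. The function names, the dictionary layout, and the sample mushroom are assumptions made for illustration. The accuracy helper simply recomputes the quoted percentages from the number of missed cases out of 8124 instances:

```python
def is_poisonous(m):
    """Apply the four benchmark rules in order; m maps attribute names to values."""
    if m["odor"] not in {"almond", "anise", "none"}:
        return True  # rule 1
    if m["spore-print-color"] == "green":
        return True  # rule 2
    if (m["odor"] == "none"
            and m["stalk-surface-below-ring"] == "scaly"
            and m["stalk-color-above-ring"] != "brown"):
        return True  # rule 3
    if m["habitat"] == "leaves" and m["cap-color"] == "white":
        return True  # rule 4
    return False


def accuracy_pct(missed, total=8124):
    """Accuracy in percent when `missed` of `total` instances are misclassified."""
    return round(100 * (1 - missed / total), 2)


# Hypothetical sample mushroom, not a row taken from the dataset.
sample = {
    "odor": "none",
    "spore-print-color": "brown",
    "stalk-surface-below-ring": "smooth",
    "stalk-color-above-ring": "white",
    "habitat": "woods",
    "cap-color": "brown",
}

print(is_poisonous(sample))
print(accuracy_pct(120), accuracy_pct(48), accuracy_pct(8))
```

The recomputed values agree with the slide: 120, 48, and 8 missed cases correspond to 98.52%, 99.41%, and 99.90% respectively.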
8. Habitat Insights
Waste is safe but stay away from paths
[Chart: edible vs. poisonous mushrooms by habitat: Woods, Grasses, Leaves, Meadows, Paths, Urban, Waste]
9. Population Insights
Mushrooms travel safer in groups
[Chart: edible vs. poisonous mushrooms by population: Abundant, Clustered, Numerous, Scattered, Several, Solitary]
10. Information Knowledge
Population Data: % Rates vs. Mushrooms
[Chart: % Poisonous and % Edible for each population class (Abundant, Clustered, Numerous, Scattered, Several, Solitary); y-axis from 0% to 120%]
11. Poisonous/Edible Ratio vs. Mushroom Population Density
[Chart: Poisonous/Edible Ratio (y-axis, -50% to 300%) against Mushroom Density (x-axis, 0 to 7), with points labeled several, solitary, scattered, clustered, numerous, and abundant. Several has the highest ratio; numerous and abundant appear near 0%.]
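The ratio plotted here is the number of poisonous samples divided by the number of edible samples within each population class, expressed as a percentage. A small Python sketch of that calculation follows. The function name and the toy records are hypothetical, and the per-class counts behind the chart are not given in the slides:

```python
from collections import Counter


def poisonous_edible_ratio(records):
    """records: iterable of (population, cls) pairs, cls being 'p' or 'e'.

    Returns {population: poisonous/edible ratio in percent}. Classes with
    no edible samples are skipped to avoid dividing by zero.
    """
    poisonous = Counter()
    edible = Counter()
    for population, cls in records:
        if cls == "p":
            poisonous[population] += 1
        else:
            edible[population] += 1
    return {pop: 100 * poisonous[pop] / count for pop, count in edible.items() if count}


# Hypothetical toy records for illustration only.
records = [
    ("several", "p"), ("several", "p"), ("several", "p"), ("several", "e"),
    ("solitary", "p"), ("solitary", "e"), ("solitary", "e"),
    ("abundant", "e"), ("abundant", "e"),
]

ratios = poisonous_edible_ratio(records)
print(ratios)
```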
12. Conclusions
If it stinks, don’t eat it: 98.52% accuracy
If it doesn’t stink and its spore color is not
green, then you have a 99.41% chance of
survival
Odor and spore color may be the best
attributes statistically, but not in the field
13. Future Work
Use more easily identified attributes to classify
mushrooms, to produce an easier method of
visual classification
Eliminate nonvisual attributes
Focus on visual-cue attributes, e.g.
habitat, population, cap and stalk
Compare the two methods
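One way to start on this is to restrict the 22 attributes to the visual ones before retraining. The sketch below uses the attribute names from the UCI Mushroom dataset. Which attributes count as "visual" is an assumption here, taken from the examples on this slide (habitat, population, cap and stalk), and the variable names are illustrative:

```python
# The 22 attribute names of the UCI Mushroom dataset.
ATTRIBUTES = [
    "cap-shape", "cap-surface", "cap-color", "bruises", "odor",
    "gill-attachment", "gill-spacing", "gill-size", "gill-color",
    "stalk-shape", "stalk-root", "stalk-surface-above-ring",
    "stalk-surface-below-ring", "stalk-color-above-ring",
    "stalk-color-below-ring", "veil-type", "veil-color", "ring-number",
    "ring-type", "spore-print-color", "population", "habitat",
]

# Assumed definition of "visual cue": cap and stalk attributes plus
# habitat and population, as suggested on the slide.
VISUAL_PREFIXES = ("cap-", "stalk-")
VISUAL_NAMES = {"habitat", "population"}

visual = [
    name for name in ATTRIBUTES
    if name.startswith(VISUAL_PREFIXES) or name in VISUAL_NAMES
]
nonvisual = [name for name in ATTRIBUTES if name not in visual]

print(len(visual), visual)
```

Training the same learners on `visual` only and on all of `ATTRIBUTES` would then allow the two methods to be compared directly.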