Project II
Data Mining a Mushroom Dataset
Group 1
Raymond Borges
Jarilyn Hernandez
The Mushroom Dataset
Data Set Characteristics: Multivariate
Attribute Characteristics: Categorical
Number of Instances: 8124
Number of Attributes: 22
Area: Life
Date Donated: 1987

This data set includes descriptions of hypothetical samples
corresponding to 23 species of gilled mushrooms in the
Agaricus and Lepiota Family.

Each species is identified as definitely edible, definitely
poisonous, or of unknown edibility and not recommended.
This latter class was combined with the poisonous one.
Mushroom Dataset
• 22 independent attributes
• 1 class attribute (Can you eat it?)
Edible (4,208): 51.8%
Poisonous (3,916): 48.2%
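
The class percentages follow directly from the two counts on the slide. A minimal Python check (the variable names are illustrative, not from the original deck):

```python
# Class counts as stated on the slide.
edible = 4208
poisonous = 3916
total = edible + poisonous  # should match the 8124 instances listed earlier

# Share of each class, rounded to one decimal as on the slide.
edible_pct = round(100 * edible / total, 1)
poisonous_pct = round(100 * poisonous / total, 1)

print(total, edible_pct, poisonous_pct)
```

Because the classes are nearly balanced, a classifier that always answers "edible" would score only about 51.8%, which is the baseline any rule has to beat.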
Mushroom Dataset
22 Attributes Total
18 intrinsic to the mushroom itself

4 Others
1 Habitat
1 Population
1 Bruises
1 Odor
Odor attribute, 1R Learner
The Simplest Rule: 98.52% Accuracy
A = almond             N = none
C = creosote           P = pungent
F = foul               S = spicy
L = anise              Y = fishy
M = musty

[Figure: chart over the odor values a, c, f, l, m, n, p, s, y]
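
1R (One Rule) picks the single attribute whose "predict the majority class for each value" rule makes the fewest errors. The sketch below illustrates the idea in plain Python; it is not the Weka implementation the slides presumably used, and the function name `one_r` and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (best_attribute, rule, errors) for a simple 1R learner.

    rows: list of dicts; attributes: candidate column names;
    target: name of the class column.
    """
    best = None
    for attr in attributes:
        # Count class labels seen for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # One rule per value: predict that value's majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[attr]] != row[target])
        # Keep the attribute with the fewest training errors.
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy records, NOT taken from the real dataset.
toy = [
    {"odor": "almond", "habitat": "woods", "class": "edible"},
    {"odor": "foul", "habitat": "woods", "class": "poisonous"},
    {"odor": "none", "habitat": "paths", "class": "edible"},
    {"odor": "foul", "habitat": "paths", "class": "poisonous"},
]
attr, rule, errors = one_r(toy, ["odor", "habitat"], "class")
print(attr, errors)
```

On the toy records odor separates the classes perfectly while habitat does not, so the learner selects odor.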
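J48 Tree: 100% Classification
E = Edible, P = Poisonous

[Figure: J48 decision tree. Branch values and leaf labels recovered from the slide, one tree level per line; a value marked "(next level)" leads to the following line.]
Level 1: almond E, creosote P, foul P, anise E, musty P, none (next level), pungent P, spicy P, fishy P
Level 2: black E, brown E, buff E, chocolate E, green P, orange E, purple E, white (next level), yellow E
Level 3: narrow (next level), broad E
Level 4: close P, crowded (next level), distant E
Level 5: abundant E, clustered P, numerous E, scattered E, several E, solitary E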
Simplest rule-set (Benchmark)
A mushroom is poisonous if any of these rules holds:
1. odor is not almond, anise, or none
(120 poisonous cases missed, 98.52% accuracy)

2. spore-print-color = green
(48 cases missed, 99.41% accuracy)

3. odor = none and stalk-surface-below-ring = scaly
 and stalk-color-above-ring is not brown
(8 cases missed, 99.90% accuracy)

4. habitat = leaves and cap-color = white
(100% accuracy)
Rule 4 may alternatively be population = clustered and cap-color = white.

Miss counts and accuracies are cumulative: each figure applies once that rule is added to the ones before it.
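
The rule set can be sketched as a single predicate, and the quoted accuracies can be checked from the miss counts. This is an illustrative sketch only: it assumes each record is a dict keyed by the attribute names used in the rules with full-word values (the raw data file may use single-letter codes instead), and it assumes each accuracy is one minus the missed cases over all 8124 instances. The function names are hypothetical.

```python
def is_poisonous(m):
    """Apply the four benchmark rules to one record (dict of attribute values)."""
    # Rule 1: any odor other than almond, anise, or none.
    if m["odor"] not in {"almond", "anise", "none"}:
        return True
    # Rule 2: green spore print.
    if m["spore-print-color"] == "green":
        return True
    # Rule 3: odorless, scaly stalk below the ring, stalk above the ring not brown.
    if (m["odor"] == "none"
            and m["stalk-surface-below-ring"] == "scaly"
            and m["stalk-color-above-ring"] != "brown"):
        return True
    # Rule 4: white cap growing among leaves.
    if m["habitat"] == "leaves" and m["cap-color"] == "white":
        return True
    return False


def accuracy_pct(missed, total=8124):
    """Accuracy in percent when `missed` of `total` instances are misclassified."""
    return round(100 * (1 - missed / total), 2)


# Hypothetical record for illustration, not a row from the dataset.
sample = {
    "odor": "none",
    "spore-print-color": "black",
    "stalk-surface-below-ring": "smooth",
    "stalk-color-above-ring": "brown",
    "habitat": "woods",
    "cap-color": "brown",
}
print(is_poisonous(sample))
print(accuracy_pct(120), accuracy_pct(48), accuracy_pct(8))
```

Under that assumption the miss counts 120, 48, and 8 reproduce the 98.52%, 99.41%, and 99.90% figures quoted in the rules.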
Habitat Insights
Waste is safe but stay away from paths

[Figure: chart by habitat: Woods, Grasses, Leaves, Meadows, Paths, Urban, Waste]
Population Insights
Mushrooms travel safer in groups

[Figure: chart by population: Abundant, Clustered, Numerous, Scattered, Several, Solitary]
Information → Knowledge

[Figure: two panels titled "Population Data" and "%Rates vs. Mushrooms". Categories: Abundant, Clustered, Numerous, Scattered, Several, Solitary. Percentage axis from 0% to 120%. Series: % Poisonous, % Edible.]
Poisonous/Edible Ratio
vs. Mushroom Population Density

[Figure: scatter plot. Y-axis: Poisonous/Edible Ratio, from -50% to 300%. X-axis: Mushroom Density, from 0 to 7. Labeled points: solitary, several, scattered, numerous, clustered, abundant. "Several" is plotted highest; numerous and abundant sit near 0%.]
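
The ratio on the y-axis is presumably the number of poisonous samples divided by the number of edible samples in each population group, expressed as a percentage. A minimal sketch of that calculation follows; the function name and the counts are hypothetical placeholders, not the dataset's real figures.

```python
def poisonous_edible_ratio(poisonous, edible):
    """Poisonous/edible ratio in percent; None when a group has no edible samples."""
    if edible == 0:
        return None
    return 100 * poisonous / edible


# Hypothetical counts as (poisonous, edible), for illustration only.
toy_counts = {"group_a": (30, 10), "group_b": (0, 25)}
ratios = {name: poisonous_edible_ratio(p, e) for name, (p, e) in toy_counts.items()}
print(ratios)
```

A ratio above 100% means poisonous samples outnumber edible ones in that group, and 0% means the group contains no poisonous samples.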
Conclusions
• If it stinks, don't eat it: 98.52% accuracy

• If it doesn't stink and its spore color is not
  green, then you have a 99.41% chance of
  survival

• Odor and spore color may be the best
  attributes statistically, but not in the field
Future Work
• Use more easily identified attributes to classify
  mushrooms, to produce a method of easier
  visual classification

• Eliminate nonvisual attributes
  Focus on visual-cue attributes, e.g.
  habitat, population, cap and stalk

• Compare the two methods
