Machine Learning
       &
 Decision Trees


    Nithum Thain
  January 12th, 2013
Overview

• The Data Science Value Chain

• Common Uses for Machine Learning

• The Art of Prediction & Classification

• Introduction to Decision Trees

• The Data Ninja Methodology
The Data Science Value Chain

Collection → Storage & Maintenance → Visualization & Analysis → Strategy, Marketing, Product, Operations

Machine Learning lives here: Visualization & Analysis
Machine Learning vs. Artificial Intelligence

• Artificial Intelligence is a set of tools that allow machines to
perform higher-order functions. These include natural language
processing, robotics, knowledge representation, etc.


• Machine Learning is a subset of artificial intelligence. It is a set of
(usually statistical) tools that allow machines to detect and extract
patterns from data.
Subdomains of Machine Learning
Unsupervised Learning
• Clustering
• Optimization
• Recommendation Systems
Supervised Learning
• Prediction & Classification
Reinforcement Learning
Clustering
Optimization
Recommendation Systems
Reinforcement Learning
Prediction & Classification
What is a Prediction Problem?

• A set of known input variables.
• An unknown output variable.
• A training set of data for which both the inputs and
outputs are known.
A Useful Formulation

[Figure: a data table whose rows are divided into a Training Set and a Test Set, with the Output Variable column marked.]
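One way to picture this formulation in code is the minimal sketch below. It assumes each row is a list whose last entry is the output variable. The housing-style numbers and the names `training_set`, `test_set`, and `split_inputs_outputs` are illustrative and do not come from the slides.

```python
# Hypothetical rows: [square_feet, bedrooms, price]; the last column is the output variable.
training_set = [
    [1400, 3, 240000],
    [1600, 3, 275000],
    [2000, 4, 340000],
]

# Test rows have known inputs but an unknown output (None) that we want to predict.
test_set = [
    [1500, 3, None],
    [1900, 4, None],
]


def split_inputs_outputs(rows):
    """Separate each row into its input variables and its output variable."""
    inputs = [row[:-1] for row in rows]
    outputs = [row[-1] for row in rows]
    return inputs, outputs


train_inputs, train_outputs = split_inputs_outputs(training_set)
test_inputs, test_outputs = split_inputs_outputs(test_set)
```

A prediction algorithm learns from `train_inputs` and `train_outputs`, then fills in the missing entries of `test_outputs`.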
The Algorithms Are Many

• Regression
• Decision Trees
• Neural Networks
• Support Vector Machines
• Random Forests
• Naive Bayes Classifier

Each has its own strengths and weaknesses.
Prediction vs. Classification

1.6180339887498948482045868343...
Break Time!
What is a Decision Tree?
Why Not Automate It?
I Did!

Internet Friends?
    Video Games?
        Friends?
            PS3
            Wii
        Friends?
            PC
            PS3
    XBOX 360
How Our Algorithm Works

1. Start with the “root” node.
2. Check whether all the data has the same value of the output variable.
   If so, you are done.
3. Check how every possible input variable splits the data.
4. Choose the split that divides the data best:
    - the one that most reduces the variance of the output variable
       in the resulting sets.
5. Repeat the process for the resulting “true” node and “false”
   node.
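The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the talk: rows are non-empty lists whose last entry is a numeric output variable, each input column holds values of a single type, candidate splits are weighted by subset size, and the tree is stored as nested dicts (a representation chosen here for brevity).

```python
def divideset(rows, column, value):
    """Split rows into (true_rows, false_rows): row[column] >= value for numbers,
    row[column] == value for anything else."""
    if isinstance(value, (int, float)):
        test = lambda row: row[column] >= value
    else:
        test = lambda row: row[column] == value
    true_rows = [row for row in rows if test(row)]
    false_rows = [row for row in rows if not test(row)]
    return true_rows, false_rows


def variance(rows):
    """Variance of the output variable (last column) over a set of rows."""
    if not rows:
        return 0.0
    outputs = [row[-1] for row in rows]
    mean = sum(outputs) / len(outputs)
    return sum((y - mean) ** 2 for y in outputs) / len(outputs)


def buildtree(rows):
    """Recursively build a tree as nested dicts (illustrative representation)."""
    # Step 2: if every row has the same output, this node is a leaf.
    if len({row[-1] for row in rows}) == 1:
        return {"result": rows[0][-1]}

    current = variance(rows)
    best_gain, best_split = 0.0, None
    # Step 3: try every (input column, value) pair as a candidate split.
    for column in range(len(rows[0]) - 1):
        for value in {row[column] for row in rows}:
            true_rows, false_rows = divideset(rows, column, value)
            if not true_rows or not false_rows:
                continue  # a split that leaves one side empty separates nothing
            # Step 4: keep the split with the largest reduction in weighted variance.
            p = len(true_rows) / len(rows)
            gain = current - p * variance(true_rows) - (1 - p) * variance(false_rows)
            if gain > best_gain:
                best_gain, best_split = gain, (column, value, true_rows, false_rows)

    if best_split is None:
        # No split reduces the variance: predict the mean output of these rows.
        return {"result": sum(row[-1] for row in rows) / len(rows)}

    column, value, true_rows, false_rows = best_split
    # Step 5: repeat on the "true" and "false" subsets.
    return {"column": column, "value": value,
            "true": buildtree(true_rows), "false": buildtree(false_rows)}


# Hypothetical data: one input column, one output column.
example_rows = [[1, 10.0], [2, 10.0], [3, 20.0], [4, 20.0]]
example_tree = buildtree(example_rows)  # splits on column 0 at value 3
```

Splitting at `column 0 >= 3` puts identical outputs on each side, so the variance drops to zero and both children become leaves.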
A Picture

The tree is drawn one step at a time:
1. The root node, Internet Friends?
2. The leaf XBOX 360 is added under the root.
3. Video Games? is added as the root's other child.
4. A Friends? node is added under Video Games?
5. A second Friends? node is added under Video Games?, followed by the leaves PS3, Wii, PC, and PS3.
Coding Time
The Classes and Functions We Will Build:

Classes
• decisionnode: The basic building block of our tree
Functions
• divideset: Splits a data set into two sets based on a variable
• variance: Calculates the variance of the output variable in a set
• buildtree: Builds the tree according to the algorithm described
• classify: For any new data point, uses the tree to predict its value
• printree: Prints a text-based version of the full decision tree
The decisionnode Class

Each node holds either a test:

      if variable >= value

or a result.
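A node of this kind might look like the sketch below. The field names (`col`, `value`, `result`, `tb`, `fb`), the helpers `passes` and `render`, and the choice to fall back to an equality test for non-numeric values are all assumptions made for illustration. The hand-built tree mirrors the printout in the editor's notes at the end of this deck; which survey question each column index stands for is a guess, and the sample rows are hypothetical.

```python
class decisionnode:
    """Either a test node (col, value, tb, fb) or a leaf (result)."""

    def __init__(self, col=-1, value=None, result=None, tb=None, fb=None):
        self.col = col        # index of the variable tested at this node
        self.value = value    # value the variable is compared against
        self.result = result  # prediction stored at a leaf; None on test nodes
        self.tb = tb          # subtree followed when the test is true
        self.fb = fb          # subtree followed when the test is false


def passes(node, row):
    """Apply the node's test: >= for numbers, == for anything else (assumption)."""
    if isinstance(node.value, (int, float)):
        return row[node.col] >= node.value
    return row[node.col] == node.value


def classify(row, node):
    """Walk from the given node down to a leaf and return its result."""
    while node.result is None:
        node = node.tb if passes(node, row) else node.fb
    return node.result


def render(node, indent=""):
    """Build the text form of the tree, one node per line."""
    if node.result is not None:
        return str(node.result)
    true_part = render(node.tb, indent + "  ")
    false_part = render(node.fb, indent + "  ")
    return f"{node.col}:{node.value}?\n{indent}T-> {true_part}\n{indent}F-> {false_part}"


def printree(node):
    """Print a text-based version of the full decision tree."""
    print(render(node))


def leaf(result):
    """Convenience constructor for a leaf node."""
    return decisionnode(result=result)


# Hand-built tree; columns assumed to be 0 = video games, 1 = friends, 2 = internet.
tree = decisionnode(
    col=2, value="Internet",
    tb=leaf("Xbox360"),
    fb=decisionnode(
        col=0, value="Yes",
        tb=decisionnode(col=1, value="No", tb=leaf("PS3"), fb=leaf("PC")),
        fb=decisionnode(col=1, value="No", tb=leaf("PS3"), fb=leaf("Wii")),
    ),
)

printree(tree)
prediction = classify(["No", "Yes", "Offline"], tree)  # hypothetical new data point
```

Keeping the test and the result in one class means `classify` only has to check whether `result` is set to know it has reached a leaf.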
The Data Ninja Methodology

1. Find the appropriate data
2. Play with the data (plot, sort, examine)
3. Clean the data
4. Choose the appropriate tool for analysis
5. Apply the tool
6. Repeat steps 2-6 until something works
7. ...
8. Profit!
Let’s Try Predicting Housing Prices!
Beware Overfitting!
How Can We Improve Our Results?
Appendix
Neural Network


Editor's Notes

  1. Google Analytics
  2. 2:Internet?
     T-> Xbox360
     F-> 0:Yes?
       T-> 1:No?
         T-> PS3
         F-> PC
       F-> 1:No?
         T-> PS3
         F-> Wii