SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
Introduction to Machine
       Learning
                  Lecture 6


               Albert Orriols i Puig
              aorriols@salle.url.edu
                  i l @ ll       ld

     Artificial Intelligence – Machine Learning
         Enginyeria i Arquitectura La Salle
             gy           q
                Universitat Ramon Llull
Recap of Lecture 4
        ID3 is a strong system that
                      gy
                Uses hill-climbing search based on the information gain
                measure to sea c through the space o dec s o trees
                 easu e o search oug         e       of decision ees
                Outputs a single hypothesis.
                Never b kt k It converges to locally optimal solutions.
                N     backtracks.         tl     ll    ti l l ti
                Uses all training examples at each step, contrary to methods
                that
                th t make decisions i
                       k d ii        incrementally.
                                              t ll
                Uses statistical properties of all examples: the search is less
                sensitive t errors i i di id l t i i examples.
                    iti to          in individual training      l
                Can handle noisy data by modifying its termination criterion to
                accept hypotheses that imperfectly fit the data.
                     th    th     th t i    f tl       th d t




                                                                              Slide 2
Artificial Intelligence                 Machine Learning
Recap of Lecture 4

        However, ID3 has some drawbacks
                It
                I can only deal with nominal d
                        l d l ih        i l data
                It is not able to deal with noisy data sets
                It may be not robust in presence of noise




                                                              Slide 3
Artificial Intelligence                 Machine Learning
Today’s Agenda


        Going from ID3 to C4.5
        How C4.5 enhances C4.5 to
                Be robust in the presence of noise. Avoid overfitting
                Deal with continuous attributes
                Deal with missing data
                Convert trees to rules




                                                                  Slide 4
Artificial Intelligence            Machine Learning
What’s Overfitting?
      Overfitting = Given a hypothesis space H, a hypothesis hєH is said to
      overfit the training data if there exists some alternative hypothesis h’єH,
      such that
                h has smaller error than h’ over the training examples, but
                                         h                    examples
          1.
          1

                h’ has a smaller error than h over the entire distribution of instances.
          2.




                                                                                           Slide 5
Artificial Intelligence                    Machine Learning
Why May my System Overfit?
           In domains with noise or uncertainty
                                              y
                   the system may try to decrease the training error by completely
                   fitting a the training e a p es
                         g all e a      g examples

       The learner overfits
       to correctly classify
       the noisy instances                                       Noisy instances




Occam’s razor: Prefer the
simplest hypothesis that fits
the data with high accuracy




                                                                                   Slide 6
   Artificial Intelligence               Machine Learning
How to Avoid Overfitting?
        Ok, my system may overfit… Can I avoid it?
          , yy          y
                 Sure! Do not include branches that fit data too specifically
         How?
         H?
                 Pre-prune: Stop growing a branch when information becomes
        1.
                 unreliable
                     li bl
                 Post-prune: Take a fully-grown decision tree and discard
        2.
                 unreliable parts
                     li bl




                                                                                Slide 7
Artificial Intelligence                 Machine Learning
Pre-pruning
        Based on statistical significance test
                               g
                Stop growing the tree when there is no statistically significant
                assoc a o between any attribute and e class at particular
                association be ee a y a bu e a d the c ass a a pa cu a
                node
                Use all available da a for training a d app y the s a s ca test
                     a a a ab e data o a          g and apply e statistical es
                to estimate whether expanding/pruning a node is to produce an
                improvement beyond the training set
        Most popular test: chi-squared test
        ID3 used chi-squared test in addition to information gain
                Only statistically significant attributes were allowed to be
                selected by information gain procedure




                                                                               Slide 8
Artificial Intelligence                 Machine Learning
Pre-pruning
 Early stopping: Pre-pruning may stop the growth process prematurely
 Classic example: XOR/Parity-problem
             No individual attribute exhibits any significant association to the class
             Structure is only visible in fully expanded tree
             Pre-pruning won t
             Pre pruning won’t expand the root node
 But: XOR-type problems rare in practice
 And: pre-pruning faster than post-pruning

                          x1   x2    Class
                1         0    0      0
                                                                01     10
                2         0    1      1
                3         1    0      1
                4         1    1      0                         00    10




                                                                                    Slide 9
Artificial Intelligence                      Machine Learning
Post-pruning
         First, build the full tree
              ,
         Then, prune it
                 Fully-grown
                 Fully grown tree shows all attribute interactions
         Problem: some subtrees might be due to chance effects
         Two pruning operations:
                 Subtree replacement
        1.

                 Subtree raising
        2.

         Possible strategies:
                 error estimation
                 significance t ti
                  i ifi       testing
                 MDL principle


                                                                     Slide 10
Artificial Intelligence                    Machine Learning
Subtree Replacement
        Bottom up approach
                p pp
        Consider replacing a tree after considering all its subtrees
        Ex: labor negotiations




                                                                       Slide 11
Artificial Intelligence          Machine Learning
Subtree Replacement
Algorithm:
1. Split the data into training and validation set
2. Do until further pruning is harmful:
   a. Evaluate impact on the validation set of pruning
      each possible node
   b. Select th
   b S l t the node whose removal most i
                    d     h            l     t increases
      the validation set accuracy




                                                                  Slide 12
Artificial Intelligence                        Machine Learning
Subtree Raising
                                                  Delete node
                                                  Redistribute instances
                                                  Slower than subtree
                                                  replacement
                                                  (Worthwhile?)




                                                        X

                                                                           Slide 13
Artificial Intelligence        Machine Learning
Estimating Error Rates
        Ok we can prune. But when?
                  p
                Prune only if it reduces the estimated error
                Error on the training data is NOT a useful estimator
                Q: Why it would result in very little pruning?
                Use hold-out set for pruning
                    hold out
                                                                                 Training
                                                                                 T ii
                          Separate a validation set                              Data set’
                                                                 Training
                                                                        g
                          Use this validation set to
                          test the improvement                   Data set
                                                                                 Validation
                C4.5 s
                C4 5’s method                                                        set
                           Derive confidence interval from training data
                           Use a heuristic limit derived from this for pruning
                                           limit,             this,
                           Standard Bernoulli-process-based method
                           Shaky statistical assumptions (based on training data)
                               y                  p      (                g     )


                                                                                     Slide 14
Artificial Intelligence                       Machine Learning
Deal with continuous attributes
        When dealing with nominal data
                   g
                We evaluated the grain for each possible value
        In
        I continuous data, we have infinite values.
             ti      dt       h    i fi it    l
        What should we do?
                Continuous-valued attributes may take infinite values, but we
                have a limited number of values in our instances (at most N if
                we have N instances)
                Therefore, simulate that you have N nominal values
                          Evaluate information gain for every possible split point of the
                          attribute
                          Choose the best split point
                          The information gain of the attribute is the information gain
                          of the best split

                                                                                    Slide 15
Artificial Intelligence                      Machine Learning
Deal with continuous attributes

        Example

                    Outlook     Temperature   Humidity           Windy   Play

                     Sunny
                         y          85           85              False   No

                     Sunny          80           90              True    No

                    Overcast        83           86              False   Yes

                      Rainy         75           80              False   Yes

                          …         …             …               …       …




                               Continuous attributes



                                                                                Slide 16
Artificial Intelligence                       Machine Learning
Deal with continuous attributes
        Split on temperature attribute:
        64       65       68   69   70    71   72       72     75      75    80   81    83   85
       Yes
       Y         N
                 No       Y
                          Yes Y
                              Yes   Yes
                                    Y     No
                                          N    No
                                               N       Yes Y
                                                       Y   Yes         Yes
                                                                       Y     No
                                                                             N    Y
                                                                                  Yes   Yes N
                                                                                        Y   No


                   E.g.: temperature < 71.5: yes/ , no/2
                     g te pe atu e        5 yes/4, o/
                         temperature ≥ 71.5: yes/5, no/3


                   Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits



        Place split points halfway between values
        Can evaluate all split points in one pass!


                                                                                                  Slide 17
Artificial Intelligence                             Machine Learning
Deal with continuous attributes
        To speed up
            p     p
                Entropy only needs to be evaluated between points of different
                c asses
                classes


value 64            65    68   69   70    71   72    72       75   75    80   81    83   85
class Yes                                                 X
                    No    Yes Yes   Yes   No   No   Yes Yes        Yes   No   Yes   Yes No




                   Potential optimal breakpoints

                   Breakpoints between values of the same class cannot
                   be optimal




                                                                                              Slide 18
Artificial Intelligence                        Machine Learning
Deal with Missing Data
        Treat missing values as a separate value
                    g               p
                Missing value denoted “?” in C4.X
                Simple idea: treat missing as a separate value
                Q: When this is not appropriate?
                A: Wh
                A When values are missing d to diff
                         l         i i due different reasons
                          Example 1: gene expression could be missing when it is very
                          high or very low
                          Example 2: field IsPregnant=missing for a male patient should be
                          treated differently (no) than for a female patient of age 25
                          (unknown)




                                                                                     Slide 19
Artificial Intelligence                       Machine Learning
Deal with Missing Data
        Split instances with missing values into pieces
                   A piece going down a branch receives a weight proportional to
                   the popularity of the branch
                   weights sum to 1
        Info gain works with fractional instances
                   Use sums of weights instead of counts
        During classification, split the instance into pieces
        in the same way
                   Merge probability distribution using weights




                                                                           Slide 20
Artificial Intelligence                  Machine Learning
From Trees to Rules
        I finally g a tree from domains with
                y got
                Noisy instances
                Missing l
                Mi i values
                Continuous attributes
        But I prefer rules…
                No context dependent
        Procedure
                Generate a rule for each tree
                Get context-independent rules




                                                           Slide 21
Artificial Intelligence                 Machine Learning
From Trees to Rules
        A procedure a little more sophisticated: C4.5Rules
          p                         p
              C4.5rules: greedily prune conditions from each rule if this
              reduces its es a ed e o
               educes s estimated error
                      Can produce duplicate rules
                      Check for this at the end
              Then
                      look at each class in turn
                      consider the rules for that class
                      find a “good” subset (guided by MDL)
                              good
              Then rank the subsets to avoid conflicts
              Finally, remove rules (greedily) if this decreases error on the
              training data


                                                                            Slide 22
Artificial Intelligence                    Machine Learning
Next Class

        Instance-based Classifiers




                                                 Slide 23
Artificial Intelligence       Machine Learning
Introduction to Machine
       Learning
                  Lecture 6

               Albert Orriols i Puig
              aorriols@salle.url.edu
                  i l @ ll       ld

     Artificial Intelligence – Machine Learning
         Enginyeria i Arquitectura La Salle
             gy           q
                Universitat Ramon Llull

Weitere ähnliche Inhalte

Was ist angesagt?

Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Derek Kane
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
Machine Learning Foundations
Machine Learning FoundationsMachine Learning Foundations
Machine Learning FoundationsAlbert Y. C. Chen
 
Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)cairo university
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision treeKrish_ver2
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methodsReza Ramezani
 
Machine learning by using python lesson 3 Confusion Matrix By : Professor Lil...
Machine learning by using python lesson 3 Confusion Matrix By : Professor Lil...Machine learning by using python lesson 3 Confusion Matrix By : Professor Lil...
Machine learning by using python lesson 3 Confusion Matrix By : Professor Lil...Professor Lili Saghafi
 
Feature selection
Feature selectionFeature selection
Feature selectiondkpawar
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptronomaraldabash
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hakky St
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Foundations of Machine Learning
Foundations of Machine LearningFoundations of Machine Learning
Foundations of Machine Learningmahutte
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
 

Was ist angesagt? (20)

Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning Foundations
Machine Learning FoundationsMachine Learning Foundations
Machine Learning Foundations
 
Lecture7 - IBk
Lecture7 - IBkLecture7 - IBk
Lecture7 - IBk
 
Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Decision tree
Decision treeDecision tree
Decision tree
 
Decision tree
Decision treeDecision tree
Decision tree
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methods
 
Machine learning by using python lesson 3 Confusion Matrix By : Professor Lil...
Machine learning by using python lesson 3 Confusion Matrix By : Professor Lil...Machine learning by using python lesson 3 Confusion Matrix By : Professor Lil...
Machine learning by using python lesson 3 Confusion Matrix By : Professor Lil...
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptron
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
 
ID3 ALGORITHM
ID3 ALGORITHMID3 ALGORITHM
ID3 ALGORITHM
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Foundations of Machine Learning
Foundations of Machine LearningFoundations of Machine Learning
Foundations of Machine Learning
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 

Ähnlich wie Lecture6 - C4.5

An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)Julien SIMON
 
Deep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItDeep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItHolberton School
 
DL Classe 0 - You can do it
DL Classe 0 - You can do itDL Classe 0 - You can do it
DL Classe 0 - You can do itGregory Renard
 
Artificial Intelligence is back, Deep Learning Networks and Quantum possibili...
Artificial Intelligence is back, Deep Learning Networks and Quantum possibili...Artificial Intelligence is back, Deep Learning Networks and Quantum possibili...
Artificial Intelligence is back, Deep Learning Networks and Quantum possibili...John Mathon
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsXavier Amatriain
 
Deep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeDeep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeSiby Jose Plathottam
 
Data Science Salon: Introduction to Machine Learning - Marketing Use Case
Data Science Salon: Introduction to Machine Learning - Marketing Use CaseData Science Salon: Introduction to Machine Learning - Marketing Use Case
Data Science Salon: Introduction to Machine Learning - Marketing Use CaseFormulatedby
 
Data Science Salon Miami Presentation
Data Science Salon Miami PresentationData Science Salon Miami Presentation
Data Science Salon Miami PresentationGreg Werner
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Julien SIMON
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedSqrrl
 
Deep learning introduction
Deep learning introductionDeep learning introduction
Deep learning introductiongiangbui0816
 
DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101Felipe Prado
 

Ähnlich wie Lecture6 - C4.5 (20)

Lecture2 - Machine Learning
Lecture2 - Machine LearningLecture2 - Machine Learning
Lecture2 - Machine Learning
 
Lecture3 - Machine Learning
Lecture3 - Machine LearningLecture3 - Machine Learning
Lecture3 - Machine Learning
 
Lecture4 - Machine Learning
Lecture4 - Machine LearningLecture4 - Machine Learning
Lecture4 - Machine Learning
 
Lecture1 - Machine Learning
Lecture1 - Machine LearningLecture1 - Machine Learning
Lecture1 - Machine Learning
 
Lecture19
Lecture19Lecture19
Lecture19
 
An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)
 
Deep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItDeep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do It
 
DL Classe 0 - You can do it
DL Classe 0 - You can do itDL Classe 0 - You can do it
DL Classe 0 - You can do it
 
Artificial Intelligence is back, Deep Learning Networks and Quantum possibili...
Artificial Intelligence is back, Deep Learning Networks and Quantum possibili...Artificial Intelligence is back, Deep Learning Networks and Quantum possibili...
Artificial Intelligence is back, Deep Learning Networks and Quantum possibili...
 
Lecture17
Lecture17Lecture17
Lecture17
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systems
 
Lecture8 - From CBR to IBk
Lecture8 - From CBR to IBkLecture8 - From CBR to IBk
Lecture8 - From CBR to IBk
 
Deep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeDeep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and Hype
 
Data Science Salon: Introduction to Machine Learning - Marketing Use Case
Data Science Salon: Introduction to Machine Learning - Marketing Use CaseData Science Salon: Introduction to Machine Learning - Marketing Use Case
Data Science Salon: Introduction to Machine Learning - Marketing Use Case
 
Data Science Salon Miami Presentation
Data Science Salon Miami PresentationData Science Salon Miami Presentation
Data Science Salon Miami Presentation
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting Started
 
Deep learning introduction
Deep learning introductionDeep learning introduction
Deep learning introduction
 
Deep learning
Deep learningDeep learning
Deep learning
 
DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101
 

Mehr von Albert Orriols-Puig

Lecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceLecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceAlbert Orriols-Puig
 
HAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsHAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsAlbert Orriols-Puig
 
Lecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIILecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIIAlbert Orriols-Puig
 
Lecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART IILecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART IIAlbert Orriols-Puig
 
Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesAlbert Orriols-Puig
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryAlbert Orriols-Puig
 
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...Albert Orriols-Puig
 
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...Albert Orriols-Puig
 
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...Albert Orriols-Puig
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSAlbert Orriols-Puig
 

Mehr von Albert Orriols-Puig (20)

Lecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceLecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligence
 
HAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsHAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasets
 
Lecture24
Lecture24Lecture24
Lecture24
 
Lecture23
Lecture23Lecture23
Lecture23
 
Lecture22
Lecture22Lecture22
Lecture22
 
Lecture21
Lecture21Lecture21
Lecture21
 
Lecture20
Lecture20Lecture20
Lecture20
 
Lecture18
Lecture18Lecture18
Lecture18
 
Lecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIILecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART III
 
Lecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART IILecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART II
 
Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rules
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
Lecture11 - neural networks
Lecture11 - neural networksLecture11 - neural networks
Lecture11 - neural networks
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-Theory
 
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
 
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
 
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
 

Kürzlich hochgeladen

fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Kürzlich hochgeladen (20)

fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

Lecture6 - C4.5

  • 1. Introduction to Machine Learning Lecture 6 Albert Orriols i Puig aorriols@salle.url.edu i l @ ll ld Artificial Intelligence – Machine Learning Enginyeria i Arquitectura La Salle gy q Universitat Ramon Llull
  • 2. Recap of Lecture 4 ID3 is a strong system that gy Uses hill-climbing search based on the information gain measure to sea c through the space o dec s o trees easu e o search oug e of decision ees Outputs a single hypothesis. Never b kt k It converges to locally optimal solutions. N backtracks. tl ll ti l l ti Uses all training examples at each step, contrary to methods that th t make decisions i k d ii incrementally. t ll Uses statistical properties of all examples: the search is less sensitive t errors i i di id l t i i examples. iti to in individual training l Can handle noisy data by modifying its termination criterion to accept hypotheses that imperfectly fit the data. th th th t i f tl th d t Slide 2 Artificial Intelligence Machine Learning
  • 3. Recap of Lecture 4 However, ID3 has some drawbacks It I can only deal with nominal d l d l ih i l data It is not able to deal with noisy data sets It may be not robust in presence of noise Slide 3 Artificial Intelligence Machine Learning
  • 4. Today’s Agenda Going from ID3 to C4.5 How C4.5 enhances C4.5 to Be robust in the presence of noise. Avoid overfitting Deal with continuous attributes Deal with missing data Convert trees to rules Slide 4 Artificial Intelligence Machine Learning
  • 5. What’s Overfitting? Overfitting = Given a hypothesis space H, a hypothesis hєH is said to overfit the training data if there exists some alternative hypothesis h’єH, such that h has smaller error than h’ over the training examples, but h examples 1. 1 h’ has a smaller error than h over the entire distribution of instances. 2. Slide 5 Artificial Intelligence Machine Learning
  • 6. Why May my System Overfit? In domains with noise or uncertainty y the system may try to decrease the training error by completely fitting a the training e a p es g all e a g examples The learner overfits to correctly classify the noisy instances Noisy instances Occam’s razor: Prefer the simplest hypothesis that fits the data with high accuracy Slide 6 Artificial Intelligence Machine Learning
  • 7. How to Avoid Overfitting? Ok, my system may overfit… Can I avoid it? , yy y Sure! Do not include branches that fit data too specifically How? H? Pre-prune: Stop growing a branch when information becomes 1. unreliable li bl Post-prune: Take a fully-grown decision tree and discard 2. unreliable parts li bl Slide 7 Artificial Intelligence Machine Learning
  • 8. Pre-pruning Based on statistical significance test g Stop growing the tree when there is no statistically significant assoc a o between any attribute and e class at particular association be ee a y a bu e a d the c ass a a pa cu a node Use all available da a for training a d app y the s a s ca test a a a ab e data o a g and apply e statistical es to estimate whether expanding/pruning a node is to produce an improvement beyond the training set Most popular test: chi-squared test ID3 used chi-squared test in addition to information gain Only statistically significant attributes were allowed to be selected by information gain procedure Slide 8 Artificial Intelligence Machine Learning
  • 9. Pre-pruning Early stopping: Pre-pruning may stop the growth process prematurely Classic example: XOR/Parity-problem No individual attribute exhibits any significant association to the class Structure is only visible in fully expanded tree Pre-pruning won t Pre pruning won’t expand the root node But: XOR-type problems rare in practice And: pre-pruning faster than post-pruning x1 x2 Class 1 0 0 0 01 10 2 0 1 1 3 1 0 1 4 1 1 0 00 10 Slide 9 Artificial Intelligence Machine Learning
  • 10. Post-pruning First, build the full tree , Then, prune it Fully-grown Fully grown tree shows all attribute interactions Problem: some subtrees might be due to chance effects Two pruning operations: Subtree replacement 1. Subtree raising 2. Possible strategies: error estimation significance t ti i ifi testing MDL principle Slide 10 Artificial Intelligence Machine Learning
  • 11. Subtree Replacement Bottom up approach p pp Consider replacing a tree after considering all its subtrees Ex: labor negotiations Slide 11 Artificial Intelligence Machine Learning
  • 12. Subtree Replacement Algorithm: 1. Split the data into training and validation set 2. Do until further pruning is harmful: a. Evaluate impact on the validation set of pruning each possible node b. Select th b S l t the node whose removal most i d h l t increases the validation set accuracy Slide 12 Artificial Intelligence Machine Learning
  • 13. Subtree Raising Delete node Redistribute instances Slower than subtree replacement (Worthwhile?) X Slide 13 Artificial Intelligence Machine Learning
  • 14. Estimating Error Rates Ok we can prune. But when? p Prune only if it reduces the estimated error Error on the training data is NOT a useful estimator Q: Why it would result in very little pruning? Use hold-out set for pruning hold out Training T ii Separate a validation set Data set’ Training g Use this validation set to test the improvement Data set Validation C4.5 s C4 5’s method set Derive confidence interval from training data Use a heuristic limit derived from this for pruning limit, this, Standard Bernoulli-process-based method Shaky statistical assumptions (based on training data) y p ( g ) Slide 14 Artificial Intelligence Machine Learning
  • 15. Deal with continuous attributes When dealing with nominal data g We evaluated the grain for each possible value In I continuous data, we have infinite values. ti dt h i fi it l What should we do? Continuous-valued attributes may take infinite values, but we have a limited number of values in our instances (at most N if we have N instances) Therefore, simulate that you have N nominal values Evaluate information gain for every possible split point of the attribute Choose the best split point The information gain of the attribute is the information gain of the best split Slide 15 Artificial Intelligence Machine Learning
  • 16. Deal with continuous attributes Example Outlook Temperature Humidity Windy Play Sunny y 85 85 False No Sunny 80 90 True No Overcast 83 86 False Yes Rainy 75 80 False Yes … … … … … Continuous attributes Slide 16 Artificial Intelligence Machine Learning
  • 17. Deal with continuous attributes Split on temperature attribute: 64 65 68 69 70 71 72 72 75 75 80 81 83 85 Yes Y N No Y Yes Y Yes Yes Y No N No N Yes Y Y Yes Yes Y No N Y Yes Yes N Y No E.g.: temperature < 71.5: yes/ , no/2 g te pe atu e 5 yes/4, o/ temperature ≥ 71.5: yes/5, no/3 Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits Place split points halfway between values Can evaluate all split points in one pass! Slide 17 Artificial Intelligence Machine Learning
  • 18. Deal with continuous attributes To speed up p p Entropy only needs to be evaluated between points of different c asses classes value 64 65 68 69 70 71 72 72 75 75 80 81 83 85 class Yes X No Yes Yes Yes No No Yes Yes Yes No Yes Yes No Potential optimal breakpoints Breakpoints between values of the same class cannot be optimal Slide 18 Artificial Intelligence Machine Learning
  • 19. Deal with Missing Data Treat missing values as a separate value g p Missing value denoted “?” in C4.X Simple idea: treat missing as a separate value Q: When this is not appropriate? A: Wh A When values are missing d to diff l i i due different reasons Example 1: gene expression could be missing when it is very high or very low Example 2: field IsPregnant=missing for a male patient should be treated differently (no) than for a female patient of age 25 (unknown) Slide 19 Artificial Intelligence Machine Learning
  • 20. Deal with Missing Data Split instances with missing values into pieces A piece going down a branch receives a weight proportional to the popularity of the branch weights sum to 1 Info gain works with fractional instances Use sums of weights instead of counts During classification, split the instance into pieces in the same way Merge probability distribution using weights Slide 20 Artificial Intelligence Machine Learning
  • 21. From Trees to Rules I finally g a tree from domains with y got Noisy instances Missing l Mi i values Continuous attributes But I prefer rules… No context dependent Procedure Generate a rule for each tree Get context-independent rules Slide 21 Artificial Intelligence Machine Learning
  • 22. From Trees to Rules A procedure a little more sophisticated: C4.5Rules p p C4.5rules: greedily prune conditions from each rule if this reduces its es a ed e o educes s estimated error Can produce duplicate rules Check for this at the end Then look at each class in turn consider the rules for that class find a “good” subset (guided by MDL) good Then rank the subsets to avoid conflicts Finally, remove rules (greedily) if this decreases error on the training data Slide 22 Artificial Intelligence Machine Learning
  • 23. Next Class Instance-based Classifiers Slide 23 Artificial Intelligence Machine Learning
  • 24. Introduction to Machine Learning Lecture 6 Albert Orriols i Puig aorriols@salle.url.edu i l @ ll ld Artificial Intelligence – Machine Learning Enginyeria i Arquitectura La Salle gy q Universitat Ramon Llull