Machine Learning


     Erica Melis
A Little Introduction Only



   Introduction to Machine Learning

   Decision Trees

   Overfitting

   Artificial Neural Nets




Machine Learning
Why Machine Learning (1)


• Growing flood of online data
• Budding industry


• Computational power is available
• Progress in algorithms and theory




   Machine Learning
Why Machine Learning (2)



• Data mining: using historical data to improve decision
   – medical records ⇒ medical knowledge
   – log data to model user
• Software applications we can’t program by hand
   – autonomous driving
   – speech recognition
• Self-customizing programs
   – Newsreader that learns user interests




   Machine Learning
Some success stories

•   Data mining, learning on the Web
•   Analysis of astronomical data
•   Human Speech Recognition
•   Handwriting recognition
•   Fraudulent Use of Credit Cards
•   Drive Autonomous Vehicles
•   Predict Stock Rates
•   Intelligent Elevator Control
•   World champion Backgammon
•   Robot Soccer
•   DNA Classification

     Machine Learning
Problems Too Difficult to Program by Hand

ALVINN drives 70 mph on highways




   Machine Learning
Credit Risk Analysis




If   Other-Delinquent-Accounts > 2, and
     Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = No
     [Deny Credit Card application]

If   Other-Delinquent-Accounts = 0, and
     (Income > $30k) OR (Years-of-Credit > 3)
Then Profitable-Customer? = Yes
     [Accept Credit Card application]


     Machine Learning                Machine Learning, T. Mitchell, McGraw Hill, 1997
Typical Data Mining Task




Given:
• 9714 patient records, each describing a pregnancy and birth
• Each patient record contains 215 features

Learn to predict:
• Classes of future patients at high risk for Emergency Cesarean
  Section


    Machine Learning                        Machine Learning, T. Mitchell, McGraw Hill, 1997
Data Mining Result




IF   No previous vaginal delivery, and
     Abnormal 2nd Trimester Ultrasound, and
     Malpresentation at admission
THEN Probability of Emergency C-Section is 0.6

Over training data: 26/41 = .63,
Over test data: 12/20 = .60


     Machine Learning                 Machine Learning, T. Mitchell, McGraw Hill, 1997
How does an Agent learn?



   Prior knowledge (B)  +  Observations (E)
      →  Knowledge-based inductive learning  →  Hypotheses (H)  →  Predictions
      Machine Learning
Machine Learning Techniques


•   Decision tree learning
•   Artificial neural networks
•   Naive Bayes
•   Bayesian Net structures
•   Instance-based learning
•   Reinforcement learning
•   Genetic algorithms
•   Support vector machines
•   Explanation Based Learning
•   Inductive logic programming


     Machine Learning
What is the Learning Problem?

Learning = Improving with experience at some task
• Improve over Task T
• with respect to performance measure P
• based on experience E




   Machine Learning
The Game of Checkers




Machine Learning
Learning to Play Checkers

• T: Play checkers
• P: Percent of games won in a world tournament
• E: games played against itself

• What exactly should be learned?
• How shall it be represented?
• What specific algorithm to learn it?




   Machine Learning
A Representation for Learned Function V’(b)
    Target function: V: Board → ℝ
    Target function representation:

V’(b) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5 + w6*x6



    • x1: number of black pieces on board b
    • x2: number of red pieces on board b
    • x3: number of black kings on board b
    • x4: number of red kings on board b
    • x5: number of red pieces threatened by black (i.e., which
      can be taken on black’s next turn)
    • x6: number of black pieces threatened by red
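
As a concrete illustration, a minimal Python sketch of this linear evaluation function (an assumption for illustration, not code from the slides); extract_features is a hypothetical helper returning the tuple (x1, ..., x6) for a board:

    # V'(b) = w0 + w1*x1 + ... + w6*x6
    def v_hat(board, weights, extract_features):
        """Estimate the value of board b as a weighted sum of its features."""
        x = extract_features(board)                    # (x1, ..., x6)
        return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))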


         Machine Learning
Function Approximation Algorithm*

 •   V(b): the true target function
 •   V’(b): the learned function
 •   V_train(b): the training value
 •   (b, V_train(b)): a training example


One rule for estimating training values:

• V_train(b) ← V’(Successor(b))         for intermediate b




      Machine Learning
Cont’d: Choose Weight Tuning Rule*

LMS Weight update rule:

Do repeatedly:
• Select a training example b at random

1. Compute error(b) with current weights
             error(b) = V_train(b) – V’(b)
2. For each board feature xi, update weight wi:
             wi ← wi + c * xi * error(b)

c is a small constant that moderates the rate of learning
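
A minimal sketch of one LMS step in Python, reusing v_hat and the hypothetical extract_features from the sketch above:

    def lms_update(board, v_train, weights, extract_features, c=0.1):
        """wi <- wi + c * xi * error(b); x0 = 1 so that w0 is updated too."""
        error = v_train - v_hat(board, weights, extract_features)
        x = (1.0,) + tuple(extract_features(board))
        return [w + c * xi * error for w, xi in zip(weights, x)]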



   Machine Learning
...A.L. Samuel



Machine Learning
Design Choices for Checker Learning




Machine Learning
Overview



   Introduction to Machine Learning

   Inductive Learning

       Decision Trees
           Ensemble Learning
           Overfitting
        Artificial Neural Nets




Machine Learning
Supervised Inductive Learning (1)

Why is learning difficult?

 •     inductive learning generalizes from specific examples; the
       result cannot be proven true, it can only be proven false
 •     it is not easy to tell whether a hypothesis h is a good
       approximation of a target function f
 •     trade-off between the complexity of the hypothesis and
       fitting the data




     Machine Learning
Supervised Inductive Learning (2)
To generalize beyond the specific examples, one needs constraints
or biases on what h is best.

For that purpose, one has to specify
   •   the overall class of candidate hypotheses
        ⇒ restricted hypothesis space bias
   •   a metric for comparing candidate hypotheses to
       determine whether one is better than another
        ⇒ preference bias




   Machine Learning
Supervised Inductive Learning (3)




    Having fixed the bias, learning can be considered
    as search in the hypothesis space, guided by the
    chosen preference bias.




Machine Learning
Decision Tree Learning Quinlan86, Feigenbaum61
Goal predicate:    PlayTennis
Hypotheses space:  decision trees
Preference bias:   small trees

Query: temperature = hot & windy = true & humidity = normal & outlook = sunny  ⇒  PlayTennis = ?




   Machine Learning
Illustrating Example (Russell & Norvig)

The problem: wait for a table in a restaurant?




      Machine Learning
Illustrating Example: Training Data




Machine Learning
A Decision Tree for WillWait (SR)




Machine Learning
Path in the Decision Tree




                       (developed on the blackboard)




Machine Learning
General Approach

      •      let A1, A2, ..., and An be discrete attributes, i.e. each
             attribute has finitely many values

      •      let B be another discrete attribute, the goal attribute



Learning goal:

          learn a function f: A1 × A2 × ... × An → B
Examples:

          elements from A1 × A2 × ... × An × B




    Machine Learning
General Approach


   Restricted hypothesis space bias:

           the collection of all decision trees over the
           attributes A1, A2, ..., An, and B forms the set of
           possible candidate hypotheses

  Preference bias:

           prefer small trees consistent with the training
           examples




Machine Learning
Decision Trees: definition (for the record)


   A decision tree over the attributes A1, A2,.., An, and B is
   a tree in which

   •    each non-leaf node is labelled with one of the
        attributes A1, A2, ..., and An
   •    each leaf node is labelled with one of the possible
        values for the goal attribute B
   •    a non-leaf node with the label Ai has as many
        outgoing arcs as there are possible values for the
        attribute Ai; each arc is labelled with one of the
        possible values for Ai




Machine Learning
Decision Trees: application of the tree (for the record)



   Let x be an element from A1 x A2 x ... x An and let T be
   a decision tree.


The element x is processed by the tree T starting at the root
and following the appropriate arc until a leaf is reached.
Moreover, x receives the value that is assigned to the leaf
reached.
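
A minimal Python sketch of this procedure, assuming (as an illustration) that a tree is a nested dict of the form {attribute: {value: subtree}} and a leaf is a plain value of the goal attribute B:

    def classify(tree, x):
        """Process example x (a dict attribute -> value) through tree T."""
        while isinstance(tree, dict):
            attribute, branches = next(iter(tree.items()))
            tree = branches[x[attribute]]      # follow the arc for x's value
        return tree                            # leaf: value of B

    # example, using the tree from the Expressiveness slide below:
    tree = {"A1": {0: {"A2": {0: 0, 1: 1}}, 1: {"A2": {0: 0, 1: 1}}}}
    print(classify(tree, {"A1": 1, "A2": 0}))  # -> 0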




Machine Learning
Expressiveness of Decision Trees

   Any Boolean function can be written as a decision tree.




   A1   A2 | B
   0    0  | 0
   0    1  | 1
   1    0  | 0
   1    1  | 1

   (Equivalent decision tree: the root tests A1 with arcs 0 and 1;
   each arc leads to a test of A2, whose arcs 0 and 1 end in the
   leaves 0 and 1.)

Machine Learning
Decision Trees


•   fully expressive within the class of propositional languages

•   in some cases, decision trees are not appropriate

    sometimes exponentially large decision trees
       (e.g. the parity function, which returns 1 iff an even
       number of inputs are 1)
    replicated subtree problem,
          e.g. when coding the following two rules in a tree:

                “if A1 and A2 then B”
                “if A3 and A4 then B”




Machine Learning
Decision Trees

   Finding a smallest decision tree that is consistent with
   a set of examples is an NP-hard problem.


 ⇒  instead of constructing a smallest decision tree, the
     focus is on the construction of a pretty small one

 ⇒  greedy algorithm


            (“smallest” = minimal in the overall number of nodes)




Machine Learning
Inducing Decision Trees: Algorithm (for the record)
function DECISION-TREE-LEARNING(examples, attribs, default)
   returns a decision tree
   inputs:   examples, set of examples
             attribs, set of attributes
             default, default value for the goal predicate

  if examples is empty then return default
  else if all examples have the same classification
         then return the classification
  else if attribs is empty then return MAJORITY-VALUE(examples)
  else
         best ← CHOOSE-ATTRIBUTE(attribs, examples)
         tree ← a new decision tree with root test best
         m ← MAJORITY-VALUE(examples)
         for each value vi of best do
             examplesi ← {elements of examples with best = vi}
             subtree ← DECISION-TREE-LEARNING(examplesi, attribs – best, m)
             add a branch to tree with label vi and subtree subtree
         return tree
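
For comparison, a compact Python sketch of the same algorithm over the nested-dict trees used above (my sketch, not the original code); choose_attribute is a placeholder where a real implementation would pick the attribute with the highest information gain:

    from collections import Counter

    def majority_value(examples, goal):
        return Counter(e[goal] for e in examples).most_common(1)[0][0]

    def choose_attribute(attribs, examples):
        return next(iter(attribs))    # placeholder for an information-gain test

    def decision_tree_learning(examples, attribs, default, goal):
        if not examples:
            return default
        classes = {e[goal] for e in examples}
        if len(classes) == 1:
            return classes.pop()                    # unanimous classification
        if not attribs:
            return majority_value(examples, goal)
        best = choose_attribute(attribs, examples)
        m = majority_value(examples, goal)
        tree = {best: {}}
        for v in {e[best] for e in examples}:
            subset = [e for e in examples if e[best] == v]
            tree[best][v] = decision_tree_learning(subset, attribs - {best}, m, goal)
        return tree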



     Machine Learning
Training Examples
Day   Outlook    Temperature   Humidity   Wind     PlayTennis

D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No




      Machine Learning                                            T. Mitchell, 1997
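
For experimenting with the sketches in this deck, the same 14 training examples as a Python list of dicts (the representation is an assumption for illustration):

    examples = [
        {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
        {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
        {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
        {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
        {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
        {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "No"},
        {"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
        {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
        {"Outlook": "Sunny",    "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
        {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
        {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
        {"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayTennis": "Yes"},
        {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
        {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
    ]

    # e.g. learn a tree with the sketch from the algorithm slide:
    # tree = decision_tree_learning(examples,
    #            {"Outlook", "Temperature", "Humidity", "Wind"}, "No", "PlayTennis")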
Entropy (n = 2)




•   S is a sample of training examples
•   p+ is the proportion of positive examples in S
•   p- is the proportion of negative examples in S
•   Entropy measures the impurity of S

     Entropy(S) ≡ -p+ log2 p+ - p- log2 p-
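
A small Python sketch of this measure for the two-class case (treating 0 * log2 0 as 0); the example-set representation and attribute names follow the PlayTennis data above:

    import math

    def entropy(examples, goal="PlayTennis", positive="Yes"):
        """Entropy(S) = -p+ log2 p+ - p- log2 p-."""
        p_pos = sum(1 for e in examples if e[goal] == positive) / len(examples)
        return sum(-p * math.log2(p) for p in (p_pos, 1.0 - p_pos) if p > 0)

    # entropy(examples) ≈ 0.940 for the PlayTennis data (9 Yes, 5 No)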




      Machine Learning
Example WillWait (do it yourself)




 the problem of whether to wait for a table in a restaurant


Machine Learning
WillWait (do it yourself)

 Which attribute to choose?




Machine Learning
Learned Tree WillWait




Machine Learning
Assessing Decision Trees

   Assessing the performance of a learning algorithm:
           a learning algorithm has done a good job if its final
          hypothesis predicts the value of the goal attribute
          of unseen examples correctly


General strategy (cross-validation)
   1. collect a large set of examples
   2. divide it into two disjoint sets: the training set and the test set
   3. apply the learning algorithm to the training set, generating a
       hypothesis h
   4. measure the quality of h applied to the test set
   5. repeat steps 1 to 4 for different sizes of training sets and
       different randomly selected training sets of each size
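
A minimal Python sketch of steps 2 to 5, assuming a learner that maps a training set to a hypothesis h, where h(example) returns the predicted goal value:

    import random

    def holdout_accuracy(examples, goal, learner, train_fraction=0.7, trials=10):
        """Average test-set accuracy over several random train/test splits."""
        scores = []
        for _ in range(trials):
            shuffled = random.sample(examples, len(examples))   # random split
            cut = int(train_fraction * len(shuffled))
            train, test = shuffled[:cut], shuffled[cut:]
            h = learner(train)
            scores.append(sum(1 for e in test if h(e) == e[goal]) / len(test))
        return sum(scores) / len(scores)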




    Machine Learning
When is decision tree learning appropriate?



•   Instances represented by attribute-value pairs
•   Target function has discrete values
•   Disjunctive descriptions may be required
•   Training data may contain missing or noisy data




     Machine Learning
Extensions and Problems

•   dealing with continuous attributes
    - select thresholds defining intervals; as a result each
        interval becomes a discrete value
    - dynamic programming methods find appropriate
        split points, but are still expensive

•   missing attributes
    - introduce a new value
    - use default values (e.g. the majority value)

•   highly-branching attributes
    - e.g. Date has a different value for every example;
        replace the information-gain measure by
        GainRatio = Gain / SplitInformation,
        which penalizes broad, uniform splits (see the sketch below)
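
A hedged Python sketch of these quantities, reusing entropy() from the entropy slide:

    from collections import Counter
    import math

    def split_information(examples, attribute):
        """Entropy of the split itself; broad, uniform splits score high."""
        n = len(examples)
        return sum(-(k / n) * math.log2(k / n)
                   for k in Counter(e[attribute] for e in examples).values())

    def gain(examples, attribute, goal="PlayTennis"):
        n = len(examples)
        remainder = sum(len(sub) / n * entropy(sub, goal)
                        for sub in ([e for e in examples if e[attribute] == v]
                                    for v in {e[attribute] for e in examples}))
        return entropy(examples, goal) - remainder

    def gain_ratio(examples, attribute, goal="PlayTennis"):
        si = split_information(examples, attribute)
        return gain(examples, attribute, goal) / si if si > 0 else 0.0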




Machine Learning
Extensions and Problems

•   noise
    e.g. two or more examples with the same description but
        different classifications →
    leaf nodes report the majority classification for their set,
    or report an estimated probability (relative frequency)

•   overfitting
    the learning algorithm uses irrelevant attributes to find a
         hypothesis consistent with all examples; pruning
         techniques help, e.g. new non-leaf nodes are only
         introduced if the information gain exceeds a
         particular threshold




Machine Learning
Overview



   Introduction to Machine Learning

   Inductive Learning:
     Decision Trees
          Overfitting
      Artificial Neural Nets




Machine Learning
Overfitting in Decision Trees

Consider adding training example #15:
Sunny, Hot, Normal, Strong, PlayTennis = No
What effect on earlier tree?




   Machine Learning
Overfitting

Consider the error of hypothesis h over
• training data: error_train(h)
• entire distribution D of data: error_D(h)

Hypothesis h ∈ H overfits the training data if there is an
  alternative hypothesis h’ ∈ H such that

         error_train(h) < error_train(h’)
and
          error_D(h) > error_D(h’)




   Machine Learning
Overfitting in Decision Tree Learning




Machine Learning                           T. Mitchell, 1997
Avoiding Overfitting

• stop growing when the data split is no longer statistically significant
• grow the full tree, then post-prune

How to select the “best” tree:
• measure performance over the training data (threshold)
• statistical significance test of whether expanding or pruning
  a node will improve performance beyond the training set (e.g. a χ² test)
• measure performance over a separate validation data set
  (utility of post-pruning); general cross-validation
• use an explicit measure for the complexity of encoding the
  tree and the training data (MDL heuristic)



    Machine Learning
Reduced-Error Pruning


Split data into training and validation set

Do until further pruning is harmful:

1. Evaluate impact on validation set of pruning each
   possible node (plus those below it)
2. Greedily remove the one that most improves validation
   set accuracy


• produces smallest version of most accurate subtree
• What if data is limited??
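
A sketch of this procedure for the nested-dict trees used above, reusing classify() and majority_value() from the earlier sketches; collapsing a node replaces it with the majority class of the training examples that reach it:

    def accuracy(tree, examples, goal):
        return sum(1 for e in examples if classify(tree, e) == e[goal]) / len(examples)

    def pruned_variants(tree, examples, goal):
        """Yield every tree obtainable by collapsing exactly one non-leaf node."""
        if not isinstance(tree, dict) or not examples:
            return
        yield majority_value(examples, goal)          # collapse this node itself
        attribute, branches = next(iter(tree.items()))
        for v, sub in branches.items():
            subset = [e for e in examples if e[attribute] == v]
            for pruned_sub in pruned_variants(sub, subset, goal):
                yield {attribute: {**branches, v: pruned_sub}}

    def reduced_error_prune(tree, train, validation, goal):
        best, best_acc = tree, accuracy(tree, validation, goal)
        while True:
            candidates = list(pruned_variants(best, train, goal))
            if not candidates:
                return best
            cand = max(candidates, key=lambda t: accuracy(t, validation, goal))
            cand_acc = accuracy(cand, validation, goal)
            if cand_acc < best_acc:
                return best                           # further pruning is harmful
            best, best_acc = cand, cand_acc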


                           lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill, 1997
    Machine Learning
Effect of Reduced-Error Pruning




                   lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill, 1997
Machine Learning
Software that Customizes to User




                                          Recommender systems
                                          (Amazon..)




Chapter 6.1 – Learning from Observation
