Machine Learning                                              Srihari


                   Decision Theory

                     Sargur Srihari
              srihari@cedar.buffalo.edu

                        Decision Theory
•  Using probability theory to make optimal decisions
•  Input vector x, target vector t
  –  Regression: t is continuous
  –  Classification: t consists of class labels
•  The joint distribution p(x,t) summarizes the associated uncertainty
•  The inference problem is to determine p(x,t) from data
•  The decision step is to make a specific prediction for the value
   of t, and to take specific actions based on t

              Medical Diagnosis Problem
•  Given an X-ray image of a patient, decide whether the patient has
   cancer or not
•  Input vector x is the set of pixel intensities
•  Output variable t represents the presence of cancer:
   class C1 is cancer present, class C2 is cancer absent
•  The general inference problem is to determine p(x,Ck), which gives
   the most complete description of the situation
•  In the end we need to decide whether or not to give treatment;
   decision theory helps us do this

                      Bayes Decision
•  How do probabilities play a role in making a decision?
•  Given input x and classes Ck, by Bayes theorem

      p(Ck|x) = p(x|Ck) p(Ck) / p(x)

•  The quantities in Bayes theorem can be obtained from the joint
   distribution p(x,Ck), either by marginalizing or by conditioning
   with respect to the appropriate variable

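As a minimal sketch, a small discrete joint distribution (the numbers are made up for illustration) shows how marginalizing and conditioning the joint p(x,Ck) recover the quantities in Bayes theorem:

```python
# Toy joint distribution p(x, Ck) over a discrete input x in {0, 1, 2}
# and two classes C1, C2 (illustrative numbers only; they sum to 1).
joint = {
    (0, "C1"): 0.10, (0, "C2"): 0.20,
    (1, "C1"): 0.25, (1, "C2"): 0.05,
    (2, "C1"): 0.15, (2, "C2"): 0.25,
}

def marginal_x(x):
    """p(x): marginalize the joint over the class variable."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def posterior(c, x):
    """p(Ck|x): condition the joint on the observed x."""
    return joint[(x, c)] / marginal_x(x)

# Bayes theorem holds by construction: p(Ck|x) p(x) = p(x, Ck)
print(posterior("C1", 1))  # 0.25 / 0.30
```

The same two operations (sum out a variable, or divide by a marginal) apply when x is continuous, with integrals in place of sums.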
            Minimizing Expected Error
•  Probability of a mistake (two classes), where Rk is the region in
   which we decide class Ck:

      p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1)
                 = ∫R1 p(x,C2) dx + ∫R2 p(x,C1) dx

•  Minimum error decision rule
  –  For a given x, choose the class for which the integrand is smaller
  –  Since p(x,Ck) = p(Ck|x)p(x), choose the class for which the
     posterior probability is highest
  –  Called the Bayes classifier
•  (Figure: a single input variable x; if the priors are equal, the
   decision is based on the class-conditional densities p(x|Ck))

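A minimal sketch of the Bayes classifier, assuming hypothetical 1-D Gaussian class-conditional densities and equal priors (all parameters are made up for illustration):

```python
import math

def gaussian(x, mu, sigma):
    """Class-conditional density p(x|Ck), here an illustrative 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical two-class problem with equal priors.
priors = {"C1": 0.5, "C2": 0.5}
params = {"C1": (-1.0, 1.0), "C2": (1.0, 1.0)}  # (mean, std) per class

def bayes_classify(x):
    """Minimum-error rule: choose the class maximizing p(Ck|x) ∝ p(x|Ck) p(Ck)."""
    return max(priors, key=lambda c: gaussian(x, *params[c]) * priors[c])

print(bayes_classify(-0.5))  # closer to the mean of C1, so "C1"
```

With equal priors the rule reduces to comparing the class-conditional densities, exactly as the figure caption states.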
           Minimizing Expected Loss
•  Mistakes have unequal importance, as in medical diagnosis
•  A loss (cost) function is given by a loss matrix, where Lkj is the
   loss incurred when the true class is Ck and the decision made is Cj
   (Figure: loss matrix for the cancer decision; rows are the true
   class, columns are the decision made)
•  Utility is the negative of loss
•  Minimize the average loss

      E[L] = ∑k ∑j ∫Rj Lkj p(x,Ck) dx

•  Minimum loss decision rule
  –  Choose the class Cj for which ∑k Lkj p(Ck|x) is minimum
  –  Trivial once we know the posterior probabilities

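A sketch of the minimum-loss rule with a hypothetical loss matrix for the cancer decision (the loss values are illustrative; the slide's figure gives the actual matrix):

```python
# Hypothetical loss matrix: loss[true_class][decision].
# Missing a cancer (deciding "normal" when the truth is "cancer") is
# assumed to cost far more than a false alarm.
loss = {
    "cancer": {"cancer": 0.0, "normal": 1000.0},
    "normal": {"cancer": 1.0, "normal": 0.0},
}

def min_loss_decision(posterior):
    """Choose the decision j minimizing sum_k L_kj p(Ck|x)."""
    decisions = ["cancer", "normal"]
    expected = {j: sum(loss[k][j] * posterior[k] for k in posterior)
                for j in decisions}
    return min(expected, key=expected.get)

# Even a small cancer probability tips the decision toward treating it.
print(min_loss_decision({"cancer": 0.01, "normal": 0.99}))  # "cancer"
```

With the zero-one loss matrix (0 on the diagonal, 1 off it) this rule reduces to the minimum-error rule of the previous slide.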
                      Reject Option
•  Decisions can be made
   when a posteriori
   probabilities are
   significantly less than
   unity or joint
   probabilities have
   comparable values
•  Avoid making decisions
   on difficult cases

                                                     7
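The reject option can be sketched as a threshold on the largest posterior; the threshold value here is hypothetical:

```python
def classify_with_reject(posteriors, theta=0.8):
    """Decline to decide unless the largest posterior reaches theta.

    theta is an illustrative rejection threshold: theta = 1 rejects
    everything, theta <= 1/K (for K classes) rejects nothing.
    """
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] >= theta else "reject"

print(classify_with_reject({"C1": 0.95, "C2": 0.05}))  # confident: "C1"
print(classify_with_reject({"C1": 0.55, "C2": 0.45}))  # ambiguous: "reject"
```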
              Inference and Decision
•  The classification problem is broken into two separate stages
  –  Inference: training data is used to learn a model for p(Ck,x)
  –  Decision: posterior probabilities are used to make optimal class
     assignments
•  Alternatively, we can learn a function that maps inputs directly
   to labels
•  Three distinct approaches to decision problems:
  1. Generative models
  2. Discriminative models
  3. Discriminant functions

                   1. Generative Models
•  First solve the inference problem of determining the
   class-conditional densities p(x|Ck) for each class separately
•  Then use Bayes theorem to determine the posterior probabilities

      p(Ck|x) = p(x|Ck) p(Ck) / p(x),  where p(x) = ∑k p(x|Ck) p(Ck)

•  Then use decision theory to determine class membership

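The whole generative pipeline can be sketched end to end; the training data, the Gaussian form of p(x|Ck), and the fitted parameters are all illustrative assumptions:

```python
import math
from statistics import mean, stdev

# Hypothetical 1-D training data per class (made up for illustration).
data = {"C1": [-1.2, -0.8, -1.0, -1.4, -0.6], "C2": [0.9, 1.3, 1.1, 0.7, 1.0]}
n_total = sum(len(xs) for xs in data.values())

# Inference stage: fit a Gaussian p(x|Ck) per class and estimate priors p(Ck).
fits = {c: (mean(xs), stdev(xs)) for c, xs in data.items()}
priors = {c: len(xs) / n_total for c, xs in data.items()}

def density(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x):
    """Bayes theorem: p(Ck|x) = p(x|Ck) p(Ck) / p(x)."""
    unnorm = {c: density(x, *fits[c]) * priors[c] for c in data}
    z = sum(unnorm.values())  # p(x), obtained by marginalizing over classes
    return {c: p / z for c, p in unnorm.items()}

# Decision stage: pick the class with the highest posterior.
post = posterior(-0.9)
print(max(post, key=post.get))  # "C1"
```

Because p(x) is computed as a by-product, a generative model can also detect inputs with low density under all classes (outlier detection).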
              2. Discriminative Models
•  First solve the inference problem of determining the posterior
   class probabilities p(Ck|x) directly
•  Then use decision theory to determine class membership

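A sketch of the discriminative route: model p(C1|x) directly (here a logistic model with made-up weights; no class-conditional densities are involved) and then apply the decision rule:

```python
import math

# Hypothetical discriminative model of the posterior p(C1|x):
# a logistic function of x with illustrative, fixed weights.
w, b = 2.0, -0.5

def posterior_c1(x):
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def decide(x):
    """Decision stage: threshold the modeled posterior at 1/2."""
    return "C1" if posterior_c1(x) >= 0.5 else "C2"

print(decide(1.0), decide(-1.0))  # C1 C2
```

In practice w and b would be learned from labeled data; the point is that only p(Ck|x) is modeled, never p(x).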
           3. Discriminant Functions
•  Find a function f(x) that maps each input x directly to a class label
  –  In a two-class problem, f(.) is binary valued
     •  f = 0 represents class C1 and f = 1 represents class C2
•  Probabilities play no role
  –  There is no access to the posterior probabilities p(Ck|x)

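The simplest possible sketch of a discriminant function is a fixed threshold rule (the threshold here is hypothetical, standing in for whatever a training procedure would produce):

```python
# A discriminant function maps x directly to a label; no probabilities appear.
# Hypothetical learned rule: threshold at x = 0 (f = 0 -> C1, f = 1 -> C2).
def f(x):
    return 0 if x < 0.0 else 1

labels = {0: "C1", 1: "C2"}
print(labels[f(-2.3)], labels[f(0.7)])  # C1 C2
```

Note what is lost: f(x) gives the same answer for a borderline x = 0.01 as for a clear-cut x = 5.0, which is exactly why the next slide argues for keeping the posteriors.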
     Need for Posterior Probabilities
•    Minimizing risk
      –  The loss matrix may be revised periodically, as in a financial application
•    Reject option
      –  Minimize the misclassification rate, or the expected loss, for a given fraction of rejected points
•    Compensating for class priors
      –  When there are far more samples from one class than another, we train on a balanced data set
         (otherwise we may get 99.9% accuracy by always classifying into one class)
      –  Take the posterior probabilities from the balanced data set, divide by the class fractions in
         the data set, and multiply by the class fractions in the population to which the model is applied
      –  This cannot be done if posterior probabilities are unavailable
•    Combining models
      –  X-ray images (xI) and blood tests (xB)
      –  When posterior probabilities are available, they can be combined using the rules of probability
      –  Assume feature independence: p(xI, xB|Ck) = p(xI|Ck) p(xB|Ck)  [naive Bayes assumption]
      –  Then  p(Ck|xI, xB) ∝ p(xI, xB|Ck) p(Ck)
                            ∝ p(xI|Ck) p(xB|Ck) p(Ck)
                            ∝ p(Ck|xI) p(Ck|xB) / p(Ck)
      –  We need p(Ck), which can be determined from the fraction of data points in each class; then
         normalize the resulting probabilities to sum to one

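The model-combination rule above can be sketched with hypothetical per-model posteriors and a hypothetical prior (all numbers are illustrative):

```python
# Combining per-model posteriors under the naive Bayes assumption:
# p(Ck|xI, xB) ∝ p(Ck|xI) p(Ck|xB) / p(Ck).
post_xray = {"C1": 0.8, "C2": 0.2}   # p(Ck|xI) from the X-ray model
post_blood = {"C1": 0.6, "C2": 0.4}  # p(Ck|xB) from the blood-test model
prior = {"C1": 0.3, "C2": 0.7}       # p(Ck) from class fractions in the data

unnorm = {c: post_xray[c] * post_blood[c] / prior[c] for c in prior}
z = sum(unnorm.values())
combined = {c: p / z for c, p in unnorm.items()}  # normalized to sum to one
print(combined)
```

Two models that each lean toward C1 reinforce each other, and dividing by p(Ck) prevents the prior from being counted twice.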
    Loss Functions for Regression
•  Curve fitting can also use a loss function
•  The regression decision is to choose a specific estimate y(x) of t
   for a given x
•  This incurs a loss L(t, y(x))
•  Squared loss function: L(t, y(x)) = {y(x) - t}^2
•  Minimize the expected loss

      E[L] = ∫∫ {y(x) - t}^2 p(x,t) dx dt

•  Taking the derivative and setting it equal to zero yields the
   solution y(x) = Et[t|x]
•  (Figure: the regression function y(x) that minimizes the expected
   squared loss is given by the mean of the conditional distribution
   p(t|x))

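The result can be checked empirically: for hypothetical samples of t at a fixed x, the sample mean gives a smaller average squared loss than any shifted estimate.

```python
from statistics import mean

# Hypothetical samples of t drawn at one fixed x (approximating p(t|x)).
t_samples = [1.8, 2.1, 2.4, 1.9, 2.3]

def expected_sq_loss(y):
    """Empirical average of the squared loss {y - t}^2 over the samples."""
    return mean((y - t) ** 2 for t in t_samples)

y_star = mean(t_samples)  # the conditional mean E[t|x]
# Any other estimate incurs a strictly larger average squared loss.
print(expected_sq_loss(y_star) < expected_sq_loss(y_star + 0.5))  # True
```

This is the bias-free version of the familiar identity E[(y - t)^2] = (y - E[t])^2 + Var[t]: the first term vanishes exactly at y = E[t].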
              Inference and Decision for
                     Regression
•    Three distinct approaches (in order of decreasing complexity),
     analogous to those for classification:
     1.  Determine the joint density p(x,t); then normalize to find the
         conditional density p(t|x); finally marginalize to find the
         conditional mean Et[t|x]
     2.  Solve the inference problem of determining the conditional
         density p(t|x); then marginalize to find the conditional mean
     3.  Find the regression function y(x) directly from the training data

          Minkowski Loss Function
•  Squared loss is not the only possible choice for regression
•  An important example concerns multimodal p(t|x)
•  Minkowski loss: Lq = |y(x) - t|^q
•  The minimum of the expected loss E[Lq] is given by
  –  the conditional mean for q = 2,
  –  the conditional median for q = 1, and
  –  the conditional mode for q → 0
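The q = 2 and q = 1 cases can be verified numerically on hypothetical skewed samples of t at a fixed x, by scanning candidate estimates on a grid:

```python
from statistics import mean, median

# Hypothetical skewed samples of t at a fixed x (the 5.0 is an outlier).
t_samples = [1.0, 1.1, 1.2, 1.3, 5.0]

def minkowski_loss(y, q):
    """Empirical average of the Minkowski loss |y - t|^q over the samples."""
    return mean(abs(y - t) ** q for t in t_samples)

# Scan candidate estimates y on a grid and pick the minimizer for each q.
grid = [i / 100 for i in range(0, 601)]
best = {q: min(grid, key=lambda y: minkowski_loss(y, q)) for q in (1, 2)}

print(best[2], mean(t_samples))    # the q=2 minimizer tracks the mean
print(best[1], median(t_samples))  # the q=1 minimizer tracks the median
```

The outlier drags the mean (q = 2) well above the cluster while the median (q = 1) stays inside it, which is why the choice of q matters for multimodal or heavy-tailed p(t|x).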

Weitere ähnliche Inhalte

Was ist angesagt?

digital image processing
digital image processingdigital image processing
digital image processingAbinaya B
 
Traffic-adaptive Medium Access Protocol
Traffic-adaptive Medium Access ProtocolTraffic-adaptive Medium Access Protocol
Traffic-adaptive Medium Access ProtocolGaurav Chauhan
 
Image Representation & Descriptors
Image Representation & DescriptorsImage Representation & Descriptors
Image Representation & DescriptorsPundrikPatel
 
Difference between Vector Quantization and Scalar Quantization
Difference between Vector Quantization and Scalar QuantizationDifference between Vector Quantization and Scalar Quantization
Difference between Vector Quantization and Scalar QuantizationHimanshuSirohi6
 
Design cycles of pattern recognition
Design cycles of pattern recognitionDesign cycles of pattern recognition
Design cycles of pattern recognitionAl Mamun
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksFrancesco Collova'
 
4.4 diversity combining techniques
4.4   diversity combining techniques4.4   diversity combining techniques
4.4 diversity combining techniquesJAIGANESH SEKAR
 
Chapter 9 morphological image processing
Chapter 9   morphological image processingChapter 9   morphological image processing
Chapter 9 morphological image processingAhmed Daoud
 
web connectivity in IoT
web connectivity in IoTweb connectivity in IoT
web connectivity in IoTFabMinds
 
Artificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesArtificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesMohammed Bennamoun
 
Unit 2,3,4 _ Internet of Things A Hands-On Approach (Arshdeep Bahga, Vijay Ma...
Unit 2,3,4 _ Internet of Things A Hands-On Approach (Arshdeep Bahga, Vijay Ma...Unit 2,3,4 _ Internet of Things A Hands-On Approach (Arshdeep Bahga, Vijay Ma...
Unit 2,3,4 _ Internet of Things A Hands-On Approach (Arshdeep Bahga, Vijay Ma...Selvaraj Seerangan
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent methodSanghyuk Chun
 
The Wireless Channel Propagation
The Wireless Channel PropagationThe Wireless Channel Propagation
The Wireless Channel PropagationPei-Che Chang
 
MANET in Mobile Computing
MANET in Mobile ComputingMANET in Mobile Computing
MANET in Mobile ComputingKABILESH RAMAR
 
Feedforward neural network
Feedforward neural networkFeedforward neural network
Feedforward neural networkSopheaktra YONG
 
OFDMA - Orthogonal Frequency Division Multiple Access PPT by PREM KAMAL
OFDMA - Orthogonal Frequency Division Multiple Access PPT by PREM KAMALOFDMA - Orthogonal Frequency Division Multiple Access PPT by PREM KAMAL
OFDMA - Orthogonal Frequency Division Multiple Access PPT by PREM KAMALprem kamal
 
Congestion avoidance in TCP
Congestion avoidance in TCPCongestion avoidance in TCP
Congestion avoidance in TCPselvakumar_b1985
 
Machine learning for 5G and beyond
Machine learning for 5G and beyondMachine learning for 5G and beyond
Machine learning for 5G and beyondITU
 

Was ist angesagt? (20)

digital image processing
digital image processingdigital image processing
digital image processing
 
Traffic-adaptive Medium Access Protocol
Traffic-adaptive Medium Access ProtocolTraffic-adaptive Medium Access Protocol
Traffic-adaptive Medium Access Protocol
 
Image Representation & Descriptors
Image Representation & DescriptorsImage Representation & Descriptors
Image Representation & Descriptors
 
Difference between Vector Quantization and Scalar Quantization
Difference between Vector Quantization and Scalar QuantizationDifference between Vector Quantization and Scalar Quantization
Difference between Vector Quantization and Scalar Quantization
 
Design cycles of pattern recognition
Design cycles of pattern recognitionDesign cycles of pattern recognition
Design cycles of pattern recognition
 
Hiperlan
HiperlanHiperlan
Hiperlan
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural Networks
 
4.4 diversity combining techniques
4.4   diversity combining techniques4.4   diversity combining techniques
4.4 diversity combining techniques
 
Chapter 9 morphological image processing
Chapter 9   morphological image processingChapter 9   morphological image processing
Chapter 9 morphological image processing
 
web connectivity in IoT
web connectivity in IoTweb connectivity in IoT
web connectivity in IoT
 
Artificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesArtificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rules
 
Unit 2,3,4 _ Internet of Things A Hands-On Approach (Arshdeep Bahga, Vijay Ma...
Unit 2,3,4 _ Internet of Things A Hands-On Approach (Arshdeep Bahga, Vijay Ma...Unit 2,3,4 _ Internet of Things A Hands-On Approach (Arshdeep Bahga, Vijay Ma...
Unit 2,3,4 _ Internet of Things A Hands-On Approach (Arshdeep Bahga, Vijay Ma...
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
The Wireless Channel Propagation
The Wireless Channel PropagationThe Wireless Channel Propagation
The Wireless Channel Propagation
 
MANET in Mobile Computing
MANET in Mobile ComputingMANET in Mobile Computing
MANET in Mobile Computing
 
Feedforward neural network
Feedforward neural networkFeedforward neural network
Feedforward neural network
 
Lecture5 - C4.5
Lecture5 - C4.5Lecture5 - C4.5
Lecture5 - C4.5
 
OFDMA - Orthogonal Frequency Division Multiple Access PPT by PREM KAMAL
OFDMA - Orthogonal Frequency Division Multiple Access PPT by PREM KAMALOFDMA - Orthogonal Frequency Division Multiple Access PPT by PREM KAMAL
OFDMA - Orthogonal Frequency Division Multiple Access PPT by PREM KAMAL
 
Congestion avoidance in TCP
Congestion avoidance in TCPCongestion avoidance in TCP
Congestion avoidance in TCP
 
Machine learning for 5G and beyond
Machine learning for 5G and beyondMachine learning for 5G and beyond
Machine learning for 5G and beyond
 

Andere mochten auch

Decision theory
Decision theoryDecision theory
Decision theorySurekha98
 
DECISION THEORY WITH EXAMPLE
DECISION THEORY WITH EXAMPLEDECISION THEORY WITH EXAMPLE
DECISION THEORY WITH EXAMPLEAnasuya Barik
 
Gamma function for different negative numbers and its applications
Gamma function for different negative numbers and its applicationsGamma function for different negative numbers and its applications
Gamma function for different negative numbers and its applicationsAlexander Decker
 
My ppt @becdoms on importance of business management
My ppt @becdoms on importance of business managementMy ppt @becdoms on importance of business management
My ppt @becdoms on importance of business managementBabasab Patil
 
Chapter 4: Decision theory and Bayesian analysis
Chapter 4: Decision theory and Bayesian analysisChapter 4: Decision theory and Bayesian analysis
Chapter 4: Decision theory and Bayesian analysisChristian Robert
 
Lecture slides stats1.13.l05.air
Lecture slides stats1.13.l05.airLecture slides stats1.13.l05.air
Lecture slides stats1.13.l05.airatutor_te
 
FRM - Level 1 Part 2 - Quantitative Methods including Probability Theory
FRM - Level 1 Part 2 - Quantitative Methods including Probability TheoryFRM - Level 1 Part 2 - Quantitative Methods including Probability Theory
FRM - Level 1 Part 2 - Quantitative Methods including Probability TheoryJoe McPhail
 
Rectangular Coordinates, Introduction to Graphing Equations
Rectangular Coordinates, Introduction to Graphing EquationsRectangular Coordinates, Introduction to Graphing Equations
Rectangular Coordinates, Introduction to Graphing EquationsSandyPoinsett
 
Bayseian decision theory
Bayseian decision theoryBayseian decision theory
Bayseian decision theorysia16
 
Gamma, Expoential, Poisson And Chi Squared Distributions
Gamma, Expoential, Poisson And Chi Squared DistributionsGamma, Expoential, Poisson And Chi Squared Distributions
Gamma, Expoential, Poisson And Chi Squared Distributionsmathscontent
 
Management functions and decision making
Management functions and decision makingManagement functions and decision making
Management functions and decision makingIva Walton
 
B.tech ii unit-2 material beta gamma function
B.tech ii unit-2 material beta gamma functionB.tech ii unit-2 material beta gamma function
B.tech ii unit-2 material beta gamma functionRai University
 

Andere mochten auch (20)

Decision Theory
Decision TheoryDecision Theory
Decision Theory
 
Decision theory
Decision theoryDecision theory
Decision theory
 
DECISION THEORY WITH EXAMPLE
DECISION THEORY WITH EXAMPLEDECISION THEORY WITH EXAMPLE
DECISION THEORY WITH EXAMPLE
 
Ppt on decision theory
Ppt on decision theoryPpt on decision theory
Ppt on decision theory
 
Decision theory
Decision theoryDecision theory
Decision theory
 
Gamma function for different negative numbers and its applications
Gamma function for different negative numbers and its applicationsGamma function for different negative numbers and its applications
Gamma function for different negative numbers and its applications
 
My ppt @becdoms on importance of business management
My ppt @becdoms on importance of business managementMy ppt @becdoms on importance of business management
My ppt @becdoms on importance of business management
 
09 Unif Exp Gamma
09 Unif Exp Gamma09 Unif Exp Gamma
09 Unif Exp Gamma
 
Chapter 4: Decision theory and Bayesian analysis
Chapter 4: Decision theory and Bayesian analysisChapter 4: Decision theory and Bayesian analysis
Chapter 4: Decision theory and Bayesian analysis
 
Lecture slides stats1.13.l05.air
Lecture slides stats1.13.l05.airLecture slides stats1.13.l05.air
Lecture slides stats1.13.l05.air
 
FRM - Level 1 Part 2 - Quantitative Methods including Probability Theory
FRM - Level 1 Part 2 - Quantitative Methods including Probability TheoryFRM - Level 1 Part 2 - Quantitative Methods including Probability Theory
FRM - Level 1 Part 2 - Quantitative Methods including Probability Theory
 
Rectangular Coordinates, Introduction to Graphing Equations
Rectangular Coordinates, Introduction to Graphing EquationsRectangular Coordinates, Introduction to Graphing Equations
Rectangular Coordinates, Introduction to Graphing Equations
 
Decision theory
Decision theoryDecision theory
Decision theory
 
Bayseian decision theory
Bayseian decision theoryBayseian decision theory
Bayseian decision theory
 
Ch08
Ch08Ch08
Ch08
 
Gamma, Expoential, Poisson And Chi Squared Distributions
Gamma, Expoential, Poisson And Chi Squared DistributionsGamma, Expoential, Poisson And Chi Squared Distributions
Gamma, Expoential, Poisson And Chi Squared Distributions
 
Dm
DmDm
Dm
 
Qt decision theory
Qt decision theoryQt decision theory
Qt decision theory
 
Management functions and decision making
Management functions and decision makingManagement functions and decision making
Management functions and decision making
 
B.tech ii unit-2 material beta gamma function
B.tech ii unit-2 material beta gamma functionB.tech ii unit-2 material beta gamma function
B.tech ii unit-2 material beta gamma function
 

Ähnlich wie Decision theory

Free Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.comFree Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.comEdhole.com
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data ScienceAlbert Bifet
 
Lecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest NeighborsLecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest NeighborsMarina Santini
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...Sebastian Raschka
 
Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesXavier Rafael Palou
 
Svm and kernel machines
Svm and kernel machinesSvm and kernel machines
Svm and kernel machinesNawal Sharma
 
Domain adaptation: A Theoretical View
Domain adaptation: A Theoretical ViewDomain adaptation: A Theoretical View
Domain adaptation: A Theoretical ViewChia-Ching Lin
 
Chapter II.6 (Book Part VI) Learning
Chapter II.6 (Book Part VI) LearningChapter II.6 (Book Part VI) Learning
Chapter II.6 (Book Part VI) Learningbutest
 
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)Universitat Politècnica de Catalunya
 
Matrix Computations in Machine Learning
Matrix Computations in Machine LearningMatrix Computations in Machine Learning
Matrix Computations in Machine Learningbutest
 
Deep Learning for Cyber Security
Deep Learning for Cyber SecurityDeep Learning for Cyber Security
Deep Learning for Cyber SecurityAltoros
 
2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revisedKrish_ver2
 
Data Science and Machine Learning with Tensorflow
 Data Science and Machine Learning with Tensorflow Data Science and Machine Learning with Tensorflow
Data Science and Machine Learning with TensorflowShubham Sharma
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_financeStefan Duprey
 
Options Portfolio Selection
Options Portfolio SelectionOptions Portfolio Selection
Options Portfolio Selectionguasoni
 

Ähnlich wie Decision theory (20)

Free Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.comFree Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.com
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Lecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest NeighborsLecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest Neighbors
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
 
Pattern recognition
Pattern recognitionPattern recognition
Pattern recognition
 
Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniques
 
Svm and kernel machines
Svm and kernel machinesSvm and kernel machines
Svm and kernel machines
 
Domain adaptation: A Theoretical View
Domain adaptation: A Theoretical ViewDomain adaptation: A Theoretical View
Domain adaptation: A Theoretical View
 
Chapter II.6 (Book Part VI) Learning
Chapter II.6 (Book Part VI) LearningChapter II.6 (Book Part VI) Learning
Chapter II.6 (Book Part VI) Learning
 
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)
 
c23_ml1.ppt
c23_ml1.pptc23_ml1.ppt
c23_ml1.ppt
 
Matrix Computations in Machine Learning
Matrix Computations in Machine LearningMatrix Computations in Machine Learning
Matrix Computations in Machine Learning
 
Deep Learning for Cyber Security
Deep Learning for Cyber SecurityDeep Learning for Cyber Security
Deep Learning for Cyber Security
 
2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
 
Data Science and Machine Learning with Tensorflow
 Data Science and Machine Learning with Tensorflow Data Science and Machine Learning with Tensorflow
Data Science and Machine Learning with Tensorflow
 
lecture_16.pptx
lecture_16.pptxlecture_16.pptx
lecture_16.pptx
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
Options Portfolio Selection
Options Portfolio SelectionOptions Portfolio Selection
Options Portfolio Selection
 

Mehr von JULIO GONZALEZ SANZ

Cmmi hm 2008 sepg model changes for high maturity 1v01[1]
Cmmi hm 2008 sepg model changes for high maturity  1v01[1]Cmmi hm 2008 sepg model changes for high maturity  1v01[1]
Cmmi hm 2008 sepg model changes for high maturity 1v01[1]JULIO GONZALEZ SANZ
 
Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]
Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]
Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]JULIO GONZALEZ SANZ
 
Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]JULIO GONZALEZ SANZ
 
Workshop healthy ingredients ppm[1]
Workshop healthy ingredients ppm[1]Workshop healthy ingredients ppm[1]
Workshop healthy ingredients ppm[1]JULIO GONZALEZ SANZ
 
The need for a balanced measurement system
The need for a balanced measurement systemThe need for a balanced measurement system
The need for a balanced measurement systemJULIO GONZALEZ SANZ
 
Just in-time and lean production
Just in-time and lean productionJust in-time and lean production
Just in-time and lean productionJULIO GONZALEZ SANZ
 
History of manufacturing systems and lean thinking enfr
History of manufacturing systems and lean thinking enfrHistory of manufacturing systems and lean thinking enfr
History of manufacturing systems and lean thinking enfrJULIO GONZALEZ SANZ
 
Une 66175 presentacion norma 2006 por julio
Une 66175 presentacion norma 2006 por julioUne 66175 presentacion norma 2006 por julio
Une 66175 presentacion norma 2006 por julioJULIO GONZALEZ SANZ
 
An architecture for data quality
An architecture for data qualityAn architecture for data quality
An architecture for data qualityJULIO GONZALEZ SANZ
 
Sap analytics creating smart business processes
Sap analytics   creating smart business processesSap analytics   creating smart business processes
Sap analytics creating smart business processesJULIO GONZALEZ SANZ
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 

Mehr von JULIO GONZALEZ SANZ (20)

Cmmi hm 2008 sepg model changes for high maturity 1v01[1]
Cmmi hm 2008 sepg model changes for high maturity  1v01[1]Cmmi hm 2008 sepg model changes for high maturity  1v01[1]
Cmmi hm 2008 sepg model changes for high maturity 1v01[1]
 
Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]
Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]
Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]
 
Cmmi 26 ago_2009_
Cmmi 26 ago_2009_Cmmi 26 ago_2009_
Cmmi 26 ago_2009_
 
Creation use-of-simple-model
Creation use-of-simple-modelCreation use-of-simple-model
Creation use-of-simple-model
 
Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]
 
Workshop healthy ingredients ppm[1]
Workshop healthy ingredients ppm[1]Workshop healthy ingredients ppm[1]
Workshop healthy ingredients ppm[1]
 
The need for a balanced measurement system
The need for a balanced measurement systemThe need for a balanced measurement system
The need for a balanced measurement system
 
Magic quadrant
Magic quadrantMagic quadrant
Magic quadrant
 
6 six sigma presentation
6 six sigma presentation6 six sigma presentation
6 six sigma presentation
 
Volvo csr suppliers guide vsib
Volvo csr suppliers guide vsibVolvo csr suppliers guide vsib
Volvo csr suppliers guide vsib
 
Just in-time and lean production
Just in-time and lean productionJust in-time and lean production
Just in-time and lean production
 
History of manufacturing systems and lean thinking enfr
History of manufacturing systems and lean thinking enfrHistory of manufacturing systems and lean thinking enfr
History of manufacturing systems and lean thinking enfr
 
Using minitab exec files
Using minitab exec filesUsing minitab exec files
Using minitab exec files
 
Sga iso-14001
Sga iso-14001Sga iso-14001
Sga iso-14001
 
Cslt closing plenary_portugal
Cslt closing plenary_portugalCslt closing plenary_portugal
Cslt closing plenary_portugal
 
Une 66175 presentacion norma 2006 por julio
Une 66175 presentacion norma 2006 por julioUne 66175 presentacion norma 2006 por julio
Une 66175 presentacion norma 2006 por julio
 
Swebokv3
Swebokv3 Swebokv3
Swebokv3
 
An architecture for data quality
An architecture for data qualityAn architecture for data quality
An architecture for data quality
 
Sap analytics creating smart business processes
Sap analytics   creating smart business processesSap analytics   creating smart business processes
Sap analytics creating smart business processes
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research report
 

Decision theory

  • 1. Machine Learning Srihari Decision Theory Sargur Srihari srihari@cedar.buffalo.edu 1
  • 2. Machine Learning Srihari Decision Theory •  Using probability theory to make optimal decisions •  Input vector x, target vector t –  Regression: t is continuous –  Classification: t will consist of class labels •  Summary of uncertainty associated is given by p(x,t) •  Inference problem is to obtain p(x,t) from data •  Decision: make specific prediction for value of t and take specific actions based on t 2
  • 3. Machine Learning Srihari Medical Diagnosis Problem •  X-ray image of patient •  Whether patient has cancer or not •  Input vector x is set of pixel intensities •  Output variable t represents whether cancer or not C1 is cancer and C2 is absence of cancer •  General inference problem is to determine p(x,Ck) which gives most complete description of situation •  In the end we need to decide whether to give treatment or not. Decision theory helps do this 3
  • 4. Machine Learning Srihari Bayes Decision •  How do probabilities play a role in making a decision? •  Given input x and classes Ck using Bayes theorem •  Quantities in Bayes theorem can be obtained from p(x,Ck) either by marginalizing or conditioning wrt appropriate variable 4
  • 5. Minimizing Expected Error
      •  Probability of a mistake (2-class, single input variable x):
             p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1) = ∫R1 p(x,C2) dx + ∫R2 p(x,C1) dx
      •  Minimum error decision rule
          –  For a given x, choose the class for which the integrand is smaller
          –  Since p(x,Ck) = p(Ck|x) p(x), choose the class for which the posterior probability is highest
          –  Called the Bayes Classifier
          –  If the priors are equal, the decision is based on the class-conditional densities p(x|Ck)
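The minimum-error rule above reduces to an argmax over posteriors. A minimal sketch, using invented posterior values for a few discrete inputs:

```python
# Hypothetical posteriors p(Ck|x) for a few discrete x values
# (each row sums to 1; the numbers are made up for illustration).
posteriors = {
    0: {"C1": 0.9, "C2": 0.1},
    1: {"C1": 0.4, "C2": 0.6},
    2: {"C1": 0.7, "C2": 0.3},
}

def bayes_classify(x):
    """Minimum-error (Bayes) rule: pick the class with highest p(Ck|x)."""
    post = posteriors[x]
    return max(post, key=post.get)

bayes_classify(0)  # "C1", since p(C1|x=0) = 0.9 > 0.1
```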
  • 6. Minimizing Expected Loss
      •  Mistakes have unequal importance
      •  Medical diagnosis example: loss matrix for the cancer decision (rows: true class, columns: decision made)
                         Decide cancer    Decide normal
             Cancer           0               1000
             Normal           1                0
      •  The loss or cost function is given by the loss matrix; utility is the negative of loss
      •  Minimize the average loss:
             E[L] = Σk Σj ∫Rj Lkj p(x,Ck) dx
      •  Minimum loss decision rule
          –  Choose the class j for which Σk Lkj p(Ck|x) is minimum
          –  Trivial once we know the posterior probabilities
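The minimum-loss rule can be sketched directly from the slide's formula. The loss values below follow the asymmetric cancer example (missing a cancer costs far more than a false alarm); the specific numbers are illustrative:

```python
# Illustrative loss matrix L[true_class][decision]:
# calling a cancer patient normal costs 1000, the opposite mistake costs 1.
L = {
    "cancer": {"cancer": 0, "normal": 1000},
    "normal": {"cancer": 1, "normal": 0},
}

def min_loss_decision(posterior):
    """Choose the decision j minimizing sum_k L_kj * p(Ck|x)."""
    risk = {j: sum(L[k][j] * posterior[k] for k in posterior)
            for j in ("cancer", "normal")}
    return min(risk, key=risk.get)

# Even with only 1% posterior probability of cancer, the asymmetric
# loss makes "cancer" (i.e. treat) the minimum-risk decision:
min_loss_decision({"cancer": 0.01, "normal": 0.99})  # "cancer"
```

With a symmetric 0/1 loss matrix this rule collapses back to the minimum-error argmax of the previous slide.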
  • 7. Reject Option
      •  It is appropriate to avoid making decisions when the posterior probabilities are significantly less than unity, i.e. when the joint probabilities have comparable values
      •  Avoid making decisions on difficult cases
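A minimal sketch of the reject option: defer whenever the largest posterior falls below a threshold θ (the threshold value and class names below are assumptions for illustration):

```python
def classify_with_reject(posterior, theta=0.8):
    """Bayes classification with a reject option: return 'reject' when
    the largest posterior p(Ck|x) is below threshold theta.
    theta = 1 rejects everything; theta <= 1/K (K classes) rejects nothing."""
    best = max(posterior, key=posterior.get)
    return best if posterior[best] >= theta else "reject"

classify_with_reject({"C1": 0.95, "C2": 0.05})  # confident -> "C1"
classify_with_reject({"C1": 0.55, "C2": 0.45})  # comparable values -> "reject"
```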
  • 8. Inference and Decision
      •  The classification problem is broken into two separate stages
          –  Inference, where training data is used to learn a model for p(Ck|x)
          –  Decision, where the posterior probabilities are used to make optimal class assignments
      •  Alternatively, one can learn a function that maps inputs directly into labels
      •  Three distinct approaches to decision problems:
          1. Generative
          2. Discriminative
          3. Discriminant Function
  • 9. 1. Generative Models
      •  First solve the inference problem of determining the class-conditional densities p(x|Ck) for each class separately
      •  Then use Bayes theorem to determine the posterior probabilities
      •  Then use decision theory to determine class membership
  • 10. 2. Discriminative Models
      •  First solve the inference problem of determining the posterior class probabilities p(Ck|x)
      •  Then use decision theory to determine class membership
  • 11. 3. Discriminant Functions
      •  Find a function f(x) that maps each input x directly to a class label
          –  In a two-class problem, f(·) is binary valued
              •  f = 0 represents class C1 and f = 1 represents class C2
      •  Probabilities play no role
          –  No access to the posterior probabilities p(Ck|x)
  • 12. Need for Posterior Probabilities
      •  Minimizing risk
          –  The loss matrix may be revised periodically, as in a financial application
      •  Reject option
          –  Minimize the misclassification rate, or the expected loss, for a given fraction of rejected points
      •  Compensating for class priors
          –  When there are far more samples from one class than another, we use a balanced data set (otherwise we may get 99.9% accuracy by always classifying into one class)
          –  Take the posterior probabilities from the balanced data set, divide by the class fractions in that data set, and multiply by the class fractions in the population to which the model is applied
          –  This cannot be done if the posterior probabilities are unavailable
      •  Combining models
          –  X-ray images (xI) and blood tests (xB)
          –  When posterior probabilities are available, they can be combined using the rules of probability
          –  Assume feature independence: p(xI, xB|Ck) = p(xI|Ck) p(xB|Ck)  [naïve Bayes assumption]
          –  Then p(Ck|xI, xB) ∝ p(xI, xB|Ck) p(Ck) ∝ p(xI|Ck) p(xB|Ck) p(Ck) ∝ p(Ck|xI) p(Ck|xB) / p(Ck)
          –  We need p(Ck), which can be determined from the fraction of data points in each class; then normalize the resulting probabilities to sum to one
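The model-combination rule on this slide can be sketched directly: multiply the two models' posteriors, divide by the prior, and renormalize. The posterior and prior values below are invented for illustration:

```python
def combine(post_xray, post_blood, prior):
    """Combine two models' posteriors under the naive Bayes (feature
    independence) assumption: p(Ck|xI,xB) ∝ p(Ck|xI) p(Ck|xB) / p(Ck),
    then normalize so the result sums to one."""
    unnorm = {c: post_xray[c] * post_blood[c] / prior[c] for c in prior}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Hypothetical outputs of the two models for one patient:
combined = combine(
    post_xray={"C1": 0.8, "C2": 0.2},
    post_blood={"C1": 0.6, "C2": 0.4},
    prior={"C1": 0.5, "C2": 0.5},
)
```

Note the division by p(Ck): without it, the prior would be counted twice (once inside each model's posterior).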
  • 13. Loss Functions for Regression
      •  Curve fitting can also use a loss function
      •  The regression decision is to choose a specific estimate y(x) of t for a given x, incurring loss L(t, y(x))
      •  Squared loss function: L(t, y(x)) = {y(x) − t}²
      •  Minimize the expected loss:
             E[L] = ∫∫ {y(x) − t}² p(x,t) dx dt
      •  Taking the derivative and setting it equal to zero yields the solution y(x) = Et[t|x]
      •  The regression function y(x) that minimizes the expected squared loss is given by the mean of the conditional distribution p(t|x)
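The slide's result — that the minimizer of the expected squared loss is the conditional mean — can be checked numerically on samples of t. The sample values are invented for the demonstration:

```python
# Hypothetical draws from p(t|x) for some fixed x.
t_samples = [1.0, 2.0, 2.0, 7.0]

def expected_sq_loss(y):
    """Empirical expected squared loss E[{y - t}^2] over the samples."""
    return sum((y - t) ** 2 for t in t_samples) / len(t_samples)

mean_t = sum(t_samples) / len(t_samples)  # 3.0, the empirical E[t|x]

# The mean incurs lower expected squared loss than nearby candidates:
expected_sq_loss(mean_t) < expected_sq_loss(mean_t + 0.5)  # True
```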
  • 14. Inference and Decision for Regression
      •  Three distinct approaches (in order of decreasing complexity), analogous to those for classification:
          1.  Determine the joint density p(x,t), then normalize to find the conditional density p(t|x), and finally marginalize to find the conditional mean Et[t|x]
          2.  Solve the inference problem of determining the conditional density p(t|x), then marginalize to find the conditional mean
          3.  Find the regression function y(x) directly from the training data
  • 15. Minkowski Loss Function
      •  Squared loss is not the only possible choice for regression
      •  An important example concerns a multimodal p(t|x)
      •  Minkowski loss: Lq = |y − t|^q
      •  The minimum of the expected loss E[Lq] is given by
          –  the conditional mean for q = 2,
          –  the conditional median for q = 1, and
          –  the conditional mode for q → 0
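The mean-vs-median behavior on this slide can be verified empirically by grid-searching the minimizer of the Minkowski loss for different q. The samples and grid resolution are assumptions of this sketch:

```python
# Hypothetical draws from p(t|x); the outlier 10.0 separates mean from median.
t_samples = [1.0, 2.0, 2.0, 10.0]

def minkowski_risk(y, q):
    """Empirical expected Minkowski loss E[|y - t|^q] over the samples."""
    return sum(abs(y - t) ** q for t in t_samples) / len(t_samples)

def minimizer(q):
    """Grid search for the y minimizing the empirical risk."""
    grid = [i / 100 for i in range(0, 1201)]  # candidates 0.00 .. 12.00
    return min(grid, key=lambda y: minkowski_risk(y, q))

minimizer(2)  # 3.75, the sample mean
minimizer(1)  # 2.0, the sample median
```

For q approaching 0 the loss flattens everywhere except at exact matches, so mass concentrates at the mode; that limit is harder to show with a finite grid, which is why it is omitted here.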