Zhuowen Tu Lab of Neuro Imaging, Department of Neurology Department of Computer Science University of California, Los Angeles Ensemble Classification Methods: Bagging, Boosting, and Random Forests Some slides are due to Robert Schapire and Pier Luca Lanzi
Discriminative vs. Generative Models Generative and discriminative learning are key problems in machine learning and computer vision. If you are asking, "Are there any faces in this image?", then you would probably want to use discriminative methods. If you are asking, "Find a 3-d model that describes the runner", then you would use generative methods. ICCV W. Freeman and A. Blake
Discriminative vs. Generative Models Discriminative models, either explicitly or implicitly, study the posterior distribution directly. Generative approaches model the likelihood and prior separately.
Some Literature Discriminative Approaches: Perceptron and Neural networks ( Rosenblatt 1958, Widrow and Hoff 1960, Hopfield 1982, Rumelhart and McClelland 1986, LeCun et al. 1998 ) Support Vector Machine ( Vapnik 1995 ) Bagging, Boosting,… ( Breiman 1994, Freund and Schapire 1995, Friedman et al. 1998 ) Nearest neighbor classifier ( Hart 1968 ) Fisher linear discriminant analysis ( Fisher ) … Generative Approaches: PCA, TCA, ICA ( Karhunen and Loeve 1947, Hérault et al. 1980, Frey and Jojic 1999 ) MRFs, Particle Filtering ( Ising, Geman and Geman 1984, Isard and Blake 1996 ) Maximum Entropy Model ( Della Pietra et al. 1997, Zhu et al. 1997, Hinton 2002 ) Deep Nets ( Hinton et al. 2006 ) …
Pros and Cons of Discriminative Models Some general views, but might be outdated. Pros: Focused on discrimination and marginal distributions. Easier to learn/compute than generative models (arguable). Good performance with large training volumes. Often fast. Cons: Limited modeling capability. Cannot generate new data. Require both positive and negative training data (mostly). Performance degrades considerably on small training sets.
Intuition about Margin Infant Elderly Man Woman ? ?
A Problem with All Margin-based Discriminative Classifiers It can be very misleading to return a high confidence.
Several Pairs of Concepts Generative vs. Discriminative Parametric vs. Non-parametric Supervised vs. Unsupervised The gap between them is becoming increasingly small.
Parametric vs. Non-parametric Non-parametric: nearest neighbor, kernel methods, decision trees, neural nets, Gaussian processes, … Parametric: logistic regression, Fisher discriminant analysis, graphical models, hierarchical models, bagging, boosting, … The distinction roughly depends on whether the number of parameters grows with the number of samples; it is not absolute.
Empirical Comparisons of Different Algorithms Caruana and Niculescu-Mizil, ICML 2006 Overall rank by mean performance across problems and metrics (based on bootstrap analysis). BST-DT: boosting with decision tree weak classifier  RF: random forest  BAG-DT: bagging with decision tree weak classifier  SVM: support vector machine  ANN: neural nets  KNN: k-nearest neighbors  BST-STMP: boosting with decision stump weak classifier  DT: decision tree  LOGREG: logistic regression  NB: naïve Bayes It is informative, but by no means final.
Empirical Study on High-dimensional Data Caruana et al., ICML 2008 Moving-average standardized scores of each learning algorithm as a function of the dimension. The ranking of algorithms that perform consistently well: (1) random forests (2) neural nets (3) boosted trees (4) SVMs
Ensemble Methods Bagging  ( Breiman 1994,… ) Boosting  ( Freund and Schapire 1995, Friedman et al. 1998,… ) Random forests  ( Breiman 2001,… ) Predict class label for unseen data by aggregating a set of predictions (classifiers learned from the training data).
General Idea S Training Data S 1 S 2 S n Multiple Data Sets C 1 C 2 C n Multiple Classifiers H Combined Classifier
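At its simplest, the combined classifier H aggregates the individual predictions by majority vote. A minimal sketch in Python (the `majority_vote` helper and the toy votes are illustrative, not from the slides):

```python
import numpy as np

def majority_vote(predictions):
    """Combine predictions from multiple classifiers by majority vote.

    predictions: array of shape (n_classifiers, n_samples) with class labels.
    """
    preds = np.asarray(predictions)
    # For each sample, pick the most frequent label across classifiers.
    return np.array([np.bincount(col).argmax() for col in preds.T])

# Three classifiers voting on four samples:
votes = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 0]]
print(majority_vote(votes))  # → [0 1 1 0]
```

Ties are broken toward the smaller class index here; real systems often weight the votes instead.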
Build Ensemble Classifiers
Why do they work?
Bagging
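As a concrete illustration of bootstrap aggregation, here is a minimal bagging sketch in Python; the decision-stump base learner and the 1-D toy data are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y):
    """Fit a 1-D decision stump (threshold on the single feature)."""
    best = None
    for t in np.unique(X):
        for sign in (1, -1):
            pred = np.where(sign * (X - t) >= 0, 1, -1)
            err = np.mean(pred != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: np.where(sign * (x - t) >= 0, 1, -1)

def bagging(X, y, n_estimators=25):
    """Train stumps on bootstrap resamples; predict by majority vote."""
    models = []
    n = len(X)
    for _ in range(n_estimators):
        idx = rng.integers(0, n, n)      # sample n points with replacement
        models.append(fit_stump(X[idx], y[idx]))
    return lambda x: np.sign(sum(m(x) for m in models))

X = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([-1, -1, -1, 1, 1, 1])
predict = bagging(X, y)
print(predict(X))
```

Each stump sees a different bootstrap sample, so their errors partly decorrelate; the vote averages them out.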
Bias-variance Decomposition
When does Bagging work?
Why Bagging works?
Why Bagging works? Direct error: Bagging error: Jensen’s inequality:
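The three formulas on this slide were lost in extraction; the argument they state (Breiman's squared-error analysis of bagging) can be reconstructed as follows, with φ(x, D) a predictor trained on dataset D:

```latex
% Direct error: average error of a single predictor trained on a random D
e = \mathbb{E}_{y,x}\,\mathbb{E}_{D}\!\left[(y - \varphi(x, D))^{2}\right]
% Bagging error: error of the aggregated predictor
\varphi_{A}(x) = \mathbb{E}_{D}\!\left[\varphi(x, D)\right], \qquad
e_{A} = \mathbb{E}_{y,x}\!\left[(y - \varphi_{A}(x))^{2}\right]
% Jensen's inequality, (\mathbb{E}Z)^{2} \le \mathbb{E}[Z^{2}],
% applied to Z = \varphi(x, D) gives  e_{A} \le e
```

The gap between the two errors grows with the variance of φ(x, D) across datasets, which is why bagging helps most with unstable base learners.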
Randomization
Ensemble Methods Bagging  ( Breiman 1994,… ) Boosting  ( Freund and Schapire 1995, Friedman et al. 1998,… ) Random forests  ( Breiman 2001,… )
A Formal Description of Boosting
AdaBoost ( Freund and Schapire ) ( not necessarily with equal weight )
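A minimal sketch of discrete AdaBoost with threshold stumps on a 1-D feature; the dataset and the stump learner are illustrative choices, not from the slides:

```python
import numpy as np

def adaboost(X, y, T=10):
    """Discrete AdaBoost with threshold stumps on a 1-D feature.

    X: shape (n,) feature values; y: labels in {-1, +1}.
    Returns the strong classifier H(x) = sign(sum_t alpha_t h_t(x)).
    """
    n = len(X)
    w = np.full(n, 1.0 / n)              # initial sample weights
    ensemble = []                        # list of (alpha, threshold, sign)
    for _ in range(T):
        best = None
        for thr in np.unique(X):
            for s in (1, -1):
                pred = np.where(s * (X - thr) >= 0, 1, -1)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, thr, s)
        err, thr, s = best
        err = max(err, 1e-12)            # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(s * (X - thr) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)   # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, thr, s))

    def H(x):
        score = sum(a * np.where(s * (x - t) >= 0, 1, -1)
                    for a, t, s in ensemble)
        return np.sign(score)
    return H

X = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([1, 1, -1, -1, 1, 1])       # not separable by a single stump
H = adaboost(X, y)
print(H(X))
```

No single stump classifies this pattern, but the weighted combination does: the reweighting forces later stumps to concentrate on the points earlier ones got wrong.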
Toy Example
Final Classifier
Training Error
Training Error Two take-home messages: (1) The first chosen weak learner is already informative about the difficulty of the classification problem. (2) The bound is achieved when the weak learners are complementary to each other. Tu et al. 2006
Training Error
Training Error
Training Error
Test Error?
Test Error
The Margin Explanation
The Margin Distribution
Margin Analysis
Theoretical Analysis
AdaBoost and Exponential Loss
Coordinate Descent Explanation
Coordinate Descent Explanation Step 1: find the best weak learner h_t to minimize the weighted error. Step 2: estimate the coefficient α_t that minimizes the exponential loss along that coordinate.
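In standard notation (the symbols on this slide were lost in extraction), Step 1 selects the weak learner h_t with the lowest weighted error ε_t, and Step 2 sets its coefficient α_t by a one-dimensional line search on the exponential loss, which has a closed form:

```latex
\varepsilon_{t} = \sum_{i} w_{i}\,\mathbf{1}\!\left[y_{i} \ne h_{t}(x_{i})\right]
% line search over the coordinate \alpha (weights summing to 1):
\sum_{i} w_{i}\, e^{-\alpha y_{i} h_{t}(x_{i})}
  = (1-\varepsilon_{t})\, e^{-\alpha} + \varepsilon_{t}\, e^{\alpha}
% setting the derivative in \alpha to zero:
\alpha_{t} = \frac{1}{2}\,\ln\frac{1-\varepsilon_{t}}{\varepsilon_{t}}
```

This is exactly the AdaBoost coefficient: weak learners barely better than chance (ε_t near 1/2) receive small weight, accurate ones large weight.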
Logistic Regression View
Benefits of Model Fitting View
Advantages of Boosting
Caveats
Variations of Boosting Confidence-rated Predictions ( Singer and Schapire )
Confidence Rated Prediction
Variations of Boosting ( Friedman et al. 98 ) The AdaBoost (discrete) algorithm fits an additive logistic regression model by using adaptive Newton updates for minimizing J(F) = E[ exp(−y F(x)) ].
LogitBoost The LogitBoost algorithm uses adaptive Newton steps for fitting an additive symmetric logistic model by maximum likelihood.
Real AdaBoost The Real AdaBoost algorithm fits an additive logistic regression model by stage-wise optimization of J(F) = E[ exp(−y F(x)) ].
Gentle AdaBoost The Gentle AdaBoost algorithm uses adaptive Newton steps for minimizing J(F) = E[ exp(−y F(x)) ].
Choices of Error Functions
Multi-Class Classification One-vs-All seems to work very well most of the time. R. Rifkin and A. Klautau, "In defense of one-vs-all classification", J. Mach. Learn. Res., 2004 Error-correcting output codes seem to be useful when the number of classes is large.
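A minimal one-vs-all sketch: one binary scorer per class, with the most confident scorer winning at prediction time. The least-squares linear scorer and the toy data below are illustrative stand-ins for any binary classifier:

```python
import numpy as np

def train_one_vs_all(X, y, n_classes):
    """Fit one linear least-squares scorer per class (+1 vs rest)."""
    X1 = np.hstack([X, np.ones((len(X), 1))])   # add a bias column
    W = []
    for c in range(n_classes):
        t = np.where(y == c, 1.0, -1.0)         # binary target for class c
        w, *_ = np.linalg.lstsq(X1, t, rcond=None)
        W.append(w)
    return np.array(W)

def predict_one_vs_all(W, X):
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(X1 @ W.T, axis=1)          # most confident class wins

X = np.array([[0., 0.], [0., 1.], [5., 0.], [5., 1.], [0., 5.], [1., 5.]])
y = np.array([0, 0, 1, 1, 2, 2])
W = train_one_vs_all(X, y, 3)
print(predict_one_vs_all(W, X))
```

The argmax over raw scores is what makes one-vs-all work even though each scorer was trained on an imbalanced binary problem.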
Data-assisted Output Code ( Jiang and Tu 09 )
Ensemble Methods Bagging  ( Breiman 1994,… ) Boosting  ( Freund and Schapire 1995, Friedman et al. 1998,… ) Random forests  ( Breiman 2001,… )
Random Forests
The Random Forests Algorithm Given a training set S, for i = 1 to k: build subset Si by sampling with replacement from S; learn tree Ti from Si, choosing at each node the best split from a random subset of F features. Each tree is grown to the largest extent possible, without pruning. Make predictions according to the majority vote of the k trees.
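A toy sketch of the algorithm above; the misclassification-count impurity proxy, the depth limit, and the tiny dataset are simplifications for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def build_tree(X, y, n_sub, depth=0, max_depth=3):
    """Grow one randomized tree: at each node, search only a random
    subset of n_sub features for the best split (as in random forests)."""
    if depth == max_depth or len(np.unique(y)) == 1:
        return np.bincount(y).argmax()          # leaf: majority label
    best = None
    for f in rng.choice(X.shape[1], size=n_sub, replace=False):
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            # impurity proxy: points misclassified if each side
            # predicts its own majority label
            err = sum(np.sum(y[m] != np.bincount(y[m]).argmax())
                      for m in (left, ~left))
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:                            # no valid split found
        return np.bincount(y).argmax()
    _, f, t = best
    left = X[:, f] <= t
    return (f, t, build_tree(X[left], y[left], n_sub, depth + 1, max_depth),
            build_tree(X[~left], y[~left], n_sub, depth + 1, max_depth))

def tree_predict(node, x):
    while isinstance(node, tuple):
        f, t, l, r = node
        node = l if x[f] <= t else r
    return node

def random_forest(X, y, k=15, n_sub=1):
    trees = []
    for _ in range(k):
        idx = rng.integers(0, len(X), len(X))   # bootstrap sample
        trees.append(build_tree(X[idx], y[idx], n_sub))
    def predict(X):
        votes = np.array([[tree_predict(tr, x) for x in X] for tr in trees])
        return np.array([np.bincount(v).argmax() for v in votes.T])
    return predict

X = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.], [6., 5.], [5., 6.]])
y = np.array([0, 0, 0, 1, 1, 1])
predict = random_forest(X, y)
print(predict(X))
```

Both randomizations appear here: bootstrap resampling per tree and a random feature subset per node; production forests use Gini or entropy impurity and no depth cap.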
Features of Random Forests
Compared with Boosting
Problems with On-line Boosting ( Oza and Russell ) The weights are changed gradually, but not the weak learners themselves! Random forests can handle on-line learning more naturally.
Face Detection ( Viola and Jones 2001 ) A landmark paper in vision! All the components can be replaced now: HOG or part-based features; RF, SVM, PBT, or NN classifiers.
Empirical Observations
Ensemble Methods Leo Breiman
