TreeNet Tree Ensembles and CART Decision Trees: A Winning Combination
October 2012
Mikhail Golovnya
Salford Systems

CART® software is a trademark of California Statistical Software, Inc. and is licensed exclusively to Salford Systems.
TreeNet® software is a trademark of Salford Systems.
Course Outline
• CART decision tree pros/cons
• TreeNet stochastic gradient boosting: a promising
  way to overcome the shortcomings of a single tree
• Introducing TreeNet, a powerful modern ensemble
  of boosted trees
    o   Methodology
    o   Reporting
    o   Interpretability
    o   Post-processing
    o   Interaction detection
• Advantages of using both CART and TreeNet
    o Contribution from CART
    o Contribution from TreeNet



 © Salford Systems 2012
Demonstration Dataset
108,376 bank customers (commercial and individual), with 6,564 in bad standing over the past two years.
Goal: identify customers in bad standing using the following predictors:
• Revolving utilization of credit
• Age of the primary account holder
• Debt ratio of the primary account holder
• Monthly income
• Number of open credit lines
• Number of mortgages
• Number of dependents
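
The counts on this slide imply a fairly rare target class, roughly 6% of customers. A minimal sketch of that arithmetic follows; the variable names are illustrative and not part of the original deck.

```python
# Counts taken from the slide above.
n_customers = 108_376
n_bad = 6_564

# Share of customers in bad standing (the base rate of the target).
bad_rate = n_bad / n_customers

# Number of customers in good standing for every one in bad standing.
good_to_bad = (n_customers - n_bad) / n_bad

print(f"bad rate: {bad_rate:.2%}")
print(f"good-to-bad ratio: {good_to_bad:.1f} to 1")
```

A base rate this low means raw accuracy is a weak yardstick: a model that flags nobody is already right for about 94% of customers.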

CART Advantages
1. Relatively fast
2. Handles all types of variables
    1.    Numeric, binary, categorical, missing values

3. Invariant under monotone transformations of the predictors
    1.    Variable scales are irrelevant
    2.    Immunity to outliers in the predictors
    3.    Most variables can be used “as is”

4. Resistance to many irrelevant variables
5. Few tunable parameters: an “off-the-shelf” procedure
6. Interpretable model representation
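
Point 3 can be checked with a small sketch. It assumes a plain squared-error split search on one numeric predictor, which is a simplification and not CART's actual implementation; `best_split` and the income values are made up for illustration. Because a split depends only on the ordering of the predictor, taking the log of the predictor (or leaving an extreme outlier in place) sends the same rows to the same side.

```python
import math

def best_split(xs, ys):
    """Return the set of row indices sent to the left child by the best
    single split on one numeric predictor (squared-error criterion)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_sse, best_left = float("inf"), None
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # cannot split between tied values
        left, right = order[:k], order[k:]
        sse = 0.0
        for part in (left, right):
            mean = sum(ys[i] for i in part) / len(part)
            sse += sum((ys[i] - mean) ** 2 for i in part)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left

# Made-up monthly incomes with one extreme outlier, and a made-up 0/1 target.
income = [1200.0, 2500.0, 3100.0, 4800.0, 9000.0, 250000.0]
bad = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

raw_partition = best_split(income, bad)
log_partition = best_split([math.log(v) for v in income], bad)
same_partition = raw_partition == log_partition
print(same_partition)
```

The threshold value itself changes under the transformation, but the resulting partition of the rows, and hence the predictions, does not.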



CART Disadvantages
1. Trade-off: accuracy vs. interpretability
2. Piecewise-constant model
    1.    Big errors near region boundaries
    2.    Impossible to detect fine differences within a segment

3. Instability => high variance
    1.    Small data change => big model change (especially for large trees)

4. Data fragmentation: every split shrinks the data available to later splits
5. High interaction order model: an unreasonably
   complicated way to represent simple additive
   dependencies
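
Point 2 can be illustrated with a toy sketch, assuming a target that is simply y = x and a tree with a single split at x < 0.5 (both chosen for illustration only). Each leaf predicts its own mean, so every point in a leaf gets the same prediction and the error grows toward the edges of each region.

```python
xs = [i / 10 for i in range(10)]   # 0.0, 0.1, ..., 0.9
ys = list(xs)                      # assumed target: y = x

# One split at x < 0.5 gives two leaves; each leaf predicts its mean.
left = [y for x, y in zip(xs, ys) if x < 0.5]
right = [y for x, y in zip(xs, ys) if x >= 0.5]
left_mean = sum(left) / len(left)
right_mean = sum(right) / len(right)

def predict(x):
    """Piecewise-constant prediction of the two-leaf tree."""
    return left_mean if x < 0.5 else right_mean

# Absolute error at every point: smallest in the middle of a leaf,
# largest at the edges of each region, including both sides of the split.
errors = [abs(y - predict(x)) for x, y in zip(xs, ys)]
for x, e in zip(xs, errors):
    print(f"x={x:.1f}  error={e:.2f}")
```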



TreeNet Tree Ensembles
• Complements CART’s advantages while
  dramatically increasing accuracy

       Tree 1   +   Tree 2   +   Tree 3

  Tree 1: first tree grown on the original target. Intentionally “weak” model.
  Tree 2: second tree grown on residuals from the first. Predictions made to improve the first tree.
  Tree 3: third tree grown on residuals from the model consisting of the first two trees.
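
The sequence pictured above can be sketched in a few lines. This is plain least-squares boosting with one-split trees on a made-up regression target, written only to show the "fit the next tree to the residuals" idea. It is not TreeNet's implementation, which the outline describes as stochastic gradient boosting; the function names and the `learn_rate` parameter are assumptions.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree (a deliberately weak learner).
    Returns (threshold, left_mean, right_mean)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x < thr]
        right = [t for x, t in zip(xs, targets) if x >= thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]

def stump_predict(stump, x):
    thr, lm, rm = stump
    return lm if x < thr else rm

def boost(xs, ys, n_trees=3, learn_rate=1.0):
    """Grow trees one at a time, each on the residuals of the model so far."""
    trees, history = [], []
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        preds = [p + learn_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)))
    return trees, history

xs = [float(i) for i in range(8)]
ys = [x * x for x in xs]             # made-up target for illustration
trees, sse_history = boost(xs, ys)
print(sse_history)                   # training error after tree 1, 2, 3
```

Each tree on its own is a poor model; the training error falls because every new tree corrects what the trees before it left unexplained.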


TreeNet Overcomes CART’s Shortcomings

Shortcoming                   CART                                  TreeNet
Piecewise-constant model      Big errors near region boundaries,    Fine predictions, nearly emulating a
                              coarse predictions                    smooth continuous response surface
Instability and variance      Small data changes induce big model   Stable models due to averaging of
                              changes (especially for large trees)  individual tree responses
Data fragmentation            Relatively few predictors make it     Each tree works with the entire data;
                              into the model                        many opportunities for variables to enter
High interaction order model  Always enforced                       Allows precise control over the interactions
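
One common way to control interaction order in a boosted ensemble is to limit tree size. The sketch below assumes that mechanism and does not show TreeNet's own settings. With one-split trees, every tree uses a single predictor, so the summed model is purely additive. The data, function names and `learn_rate` value are made up for illustration.

```python
import itertools

def fit_stump(rows, targets):
    """Best single split over all predictors.
    Returns (feature_index, threshold, left_mean, right_mean)."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[j] < thr]
            right = [t for r, t in zip(rows, targets) if r[j] >= thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def predict(stumps, row, learn_rate):
    """Sum of the shrunken predictions of all stumps."""
    return sum(learn_rate * (lm if row[j] < thr else rm) for j, thr, lm, rm in stumps)

rows = list(itertools.product(range(4), range(4)))
ys = [2.0 * a + b for a, b in rows]      # made-up additive target

learn_rate = 0.5
stumps, preds = [], [0.0] * len(rows)
for _ in range(50):
    residuals = [y - p for y, p in zip(ys, preds)]
    stumps.append(fit_stump(rows, residuals))
    preds = [predict(stumps, r, learn_rate) for r in rows]

def f(row):
    return predict(stumps, row, learn_rate)

# For an additive model, f(a, b) + f(c, d) equals f(a, d) + f(c, b).
gap = abs(f((0, 0)) + f((3, 3)) - f((0, 3)) - f((3, 0)))
print(f"additivity gap: {gap:.2e}")
```

Allowing two splits per tree would let a single tree combine both predictors, which is how pairwise interactions could enter the model.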
TreeNet and CART: A Winning Combination


