SlideShare ist ein Scribd-Unternehmen logo
1 von 39
Downloaden Sie, um offline zu lesen
Gradient Boosted Regression Trees
scikit
Peter Prettenhofer (@pprett)
DataRobot
Gilles Louppe (@glouppe)
Universit´e de Li`ege, Belgium
Motivation
Motivation
Outline
1 Basics
2 Gradient Boosting
3 Gradient Boosting in Scikit-learn
4 Case Study: California housing
About us
Peter
• @pprett
• Python & ML ∼ 6 years
• sklearn dev since 2010
Gilles
• @glouppe
• PhD student (Li`ege,
Belgium)
• sklearn dev since 2011
Chief tree hugger
Outline
1 Basics
2 Gradient Boosting
3 Gradient Boosting in Scikit-learn
4 Case Study: California housing
Machine Learning 101
• Data comes as...
• A set of examples {(xi , yi )|0 ≤ i < n samples}, with
• Feature vector x ∈ Rn features
, and
• Response y ∈ R (regression) or y ∈ {−1, 1} (classification)
• Goal is to...
• Find a function ˆy = f (x)
• Such that error L(y, ˆy) on new (unseen) x is minimal
Classification and Regression Trees [Breiman et al, 1984]
MedInc <= 5.04
MedInc <= 3.07 MedInc <= 6.82
AveRooms <= 4.31 AveOccup <= 2.37
1.62 1.16 2.79 1.88
AveOccup <= 2.74 MedInc <= 7.82
3.39 2.56 3.73 4.57
sklearn.tree.DecisionTreeClassifier|Regressor
Function approximation with Regression Trees
0 2 4 6 8 10
x
8
6
4
2
0
2
4
6
8
10y
ground truth
RT d=1
RT d=3
RT d=20
Function approximation with Regression Trees
0 2 4 6 8 10
x
8
6
4
2
0
2
4
6
8
10y
ground truth
RT d=1
RT d=3
RT d=20
Deprecated
• Nowadays seldom used alone
• Ensembles: Random Forest, Bagging, or Boosting
(see sklearn.ensemble)
Outline
1 Basics
2 Gradient Boosting
3 Gradient Boosting in Scikit-learn
4 Case Study: California housing
Gradient Boosted Regression Trees
Advantages
• Heterogeneous data (features measured on different scale),
• Supports different loss functions (e.g. huber),
• Automatically detects (non-linear) feature interactions,
Disadvantages
• Requires careful tuning
• Slow to train (but fast to predict)
• Cannot extrapolate
Boosting
AdaBoost [Y. Freund & R. Schapire, 1995]
• Ensemble: each member is an expert on the errors of its
predecessor
• Iteratively re-weights training examples based on errors
2 1 0 1 2 3
x0
2
1
0
1
2
x1
2 1 0 1 2 3
x0
2 1 0 1 2 3
x0
2 1 0 1 2 3
x0
sklearn.ensemble.AdaBoostClassifier|Regressor
Boosting
AdaBoost [Y. Freund & R. Schapire, 1995]
• Ensemble: each member is an expert on the errors of its
predecessor
• Iteratively re-weights training examples based on errors
2 1 0 1 2 3
x0
2
1
0
1
2
x1
2 1 0 1 2 3
x0
2 1 0 1 2 3
x0
2 1 0 1 2 3
x0
sklearn.ensemble.AdaBoostClassifier|Regressor
Huge success
• Viola-Jones Face Detector (2001)
• Freund & Schapire won the G¨odel prize 2003
Gradient Boosting [J. Friedman, 1999]
Statistical view on boosting
• ⇒ Generalization of boosting to arbitrary loss functions
Gradient Boosting [J. Friedman, 1999]
Statistical view on boosting
• ⇒ Generalization of boosting to arbitrary loss functions
Residual fitting
2 6 10
x
2.0
1.5
1.0
0.5
0.0
0.5
1.0
1.5
2.0
2.5
y
Ground truth
2 6 10
x
∼
tree 1
2 6 10
x
+
tree 2
2 6 10
x
+
tree 3
sklearn.ensemble.GradientBoostingClassifier|Regressor
Functional Gradient Descent
Least Squares Regression
• Squared loss: L(yi , f (xi )) = (yi − f (xi ))2
• The residual ∼ the (negative) gradient ∂L(yi , f (xi ))
∂f (xi )
Functional Gradient Descent
Least Squares Regression
• Squared loss: L(yi , f (xi )) = (yi − f (xi ))2
• The residual ∼ the (negative) gradient ∂L(yi , f (xi ))
∂f (xi )
Steepest Descent
• Regression trees approximate the (negative) gradient
• Each tree is a successive gradient descent step
4 3 2 1 0 1 2 3 4
y−f(x)
0
1
2
3
4
5
6
7
8
L(y,f(x))
Squared error
Absolute error
Huber error
4 3 2 1 0 1 2 3 4
y·f(x)
0
1
2
3
4
5
6
7
8
L(y,f(x))
Zero-one loss
Log loss
Exponential loss
Outline
1 Basics
2 Gradient Boosting
3 Gradient Boosting in Scikit-learn
4 Case Study: California housing
GBRT in scikit-learn
How to use it
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> from sklearn.datasets import make_hastie_10_2
>>> X, y = make_hastie_10_2(n_samples=10000)
>>> est = GradientBoostingClassifier(n_estimators=200, max_depth=3)
>>> est.fit(X, y)
...
>>> # get predictions
>>> pred = est.predict(X)
>>> est.predict_proba(X)[0] # class probabilities
array([ 0.67, 0.33])
Implementation
• Written in pure Python/Numpy (easy to extend).
• Builds on top of sklearn.tree.DecisionTreeRegressor (Cython).
• Custom node splitter that uses pre-sorting (better for shallow trees).
Example
from sklearn.ensemble import GradientBoostingRegressor
est = GradientBoostingRegressor(n_estimators=2000, max_depth=1).fit(X, y)
for pred in est.staged_predict(X):
plt.plot(X[:, 0], pred, color=’r’, alpha=0.1)
0 2 4 6 8 10
x
8
6
4
2
0
2
4
6
8
10
y
High bias - low variance
Low bias - high variance
ground truth
RT d=1
RT d=3
GBRT d=1
Model complexity & Overfitting
test_score = np.empty(len(est.estimators_))
for i, pred in enumerate(est.staged_predict(X_test)):
test_score[i] = est.loss_(y_test, pred)
plt.plot(np.arange(n_estimators) + 1, test_score, label=’Test’)
plt.plot(np.arange(n_estimators) + 1, est.train_score_, label=’Train’)
0 200 400 600 800 1000
n_estimators
0.0
0.5
1.0
1.5
2.0
Error
Lowest test error
train-test gap
Test
Train
Model complexity & Overfitting
test_score = np.empty(len(est.estimators_))
for i, pred in enumerate(est.staged_predict(X_test)):
test_score[i] = est.loss_(y_test, pred)
plt.plot(np.arange(n_estimators) + 1, test_score, label=’Test’)
plt.plot(np.arange(n_estimators) + 1, est.train_score_, label=’Train’)
0 200 400 600 800 1000
n_estimators
0.0
0.5
1.0
1.5
2.0
Error
Lowest test error
train-test gap
Test
Train
Regularization
GBRT provides a number of knobs to control
overfitting
• Tree structure
• Shrinkage
• Stochastic Gradient Boosting
Regularization: Tree structure
• The max depth of the trees controls the degree of features interactions
• Use min samples leaf to have a sufficient nr. of samples per leaf.
Regularization: Shrinkage
• Slow learning by shrinking tree predictions with 0 < learning rate <= 1
• Lower learning rate requires higher n estimators
0 200 400 600 800 1000
n_estimators
0.0
0.5
1.0
1.5
2.0
Error
Requires more trees
Lower test error
Test
Train
Test learning_rate=0.1
Train learning_rate=0.1
Regularization: Stochastic Gradient Boosting
• Samples: random subset of the training set (subsample)
• Features: random subset of features (max features)
• Improved accuracy – reduced runtime
0 200 400 600 800 1000
n_estimators
0.0
0.5
1.0
1.5
2.0
Error
Even lower test error
Subsample alone does poorly
Train
Test
Train subsample=0.5, learning_rate=0.1
Test subsample=0.5, learning_rate=0.1
Hyperparameter tuning
1. Set n estimators as high as possible (eg. 3000)
2. Tune hyperparameters via grid search.
from sklearn.grid_search import GridSearchCV
param_grid = {’learning_rate’: [0.1, 0.05, 0.02, 0.01],
’max_depth’: [4, 6],
’min_samples_leaf’: [3, 5, 9, 17],
’max_features’: [1.0, 0.3, 0.1]}
est = GradientBoostingRegressor(n_estimators=3000)
gs_cv = GridSearchCV(est, param_grid).fit(X, y)
# best hyperparameter setting
gs_cv.best_params_
3. Finally, set n estimators even higher and tune
learning rate.
Outline
1 Basics
2 Gradient Boosting
3 Gradient Boosting in Scikit-learn
4 Case Study: California housing
Case Study
California Housing dataset
• Predict log(medianHouseValue)
• Block groups in 1990 census
• 20.640 groups with 8 features
(median income, median age, lat,
lon, ...)
• Evaluation: Mean absolute error
on 80/20 split
Challenges
• Heterogeneous features
• Non-linear interactions
Predictive accuracy & runtime
Train time [s] Test time [ms] MAE
Mean - - 0.4635
Ridge 0.006 0.11 0.2756
SVR 28.0 2000.00 0.1888
RF 26.3 605.00 0.1620
GBRT 192.0 439.00 0.1438
0 500 1000 1500 2000 2500 3000
n_estimators
0.0
0.1
0.2
0.3
0.4
0.5
error
Test
Train
Model interpretation
Which features are important?
>>> est.feature_importances_
array([ 0.01, 0.38, ...])
0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18
Relative importance
HouseAge
Population
AveBedrms
Latitude
AveOccup
Longitude
AveRooms
MedInc
Model interpretation
What is the effect of a feature on the response?
from sklearn.ensemble import partial_dependence import as pd
features = [’MedInc’, ’AveOccup’, ’HouseAge’, ’AveRooms’,
(’AveOccup’, ’HouseAge’)]
fig, axs = pd.plot_partial_dependence(est, X_train, features,
feature_names=names)
1.5 3.0 4.5 6.0 7.5
MedInc
0.4
0.2
0.0
0.2
0.4
0.6
Partialdependence
2.0 2.5 3.0 3.54.0 4.5
AveOccup
0.4
0.2
0.0
0.2
0.4
0.6
Partialdependence
10 20 30 40 50 60
HouseAge
0.4
0.2
0.0
0.2
0.4
0.6
Partialdependence
4 5 6 7 8
AveRooms
0.4
0.2
0.0
0.2
0.4
0.6
Partialdependence
2.0 2.5 3.0 3.5 4.0
AveOccup
10
20
30
40
50
HouseAge
-0.12
-0.05
0.02
0.090.16
0.23
Partial dependence of house value on nonlocation features
for the California housing dataset
Model interpretation
Automatically detects spatial effects
longitude
latitude
-1.54
-1.22
-0.91
-0.60
-0.28
0.03
0.34
0.66
0.97
partialdep.onmedianhousevalue
longitudelatitude
-0.15
-0.07
0.01
0.09
0.17
0.25
0.33
0.41
0.49
0.57
partialdep.onmedianhousevalue
Summary
• Flexible non-parametric classification and regression technique
• Applicable to a variety of problems
• Solid, battle-worn implementation in scikit-learn
Thanks! Questions?
Benchmarks
0.0
0.2
0.4
0.6
0.8
1.0
1.2
Error
gbm
sklearn-0.15
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Traintime
Arcene
Boston
California
Covtype
Example10.2
Expedia
Madelon
Solar
Spam
YahooLTRC
bioresp
dataset
0.0
0.2
0.4
0.6
0.8
1.0
Testtime
Tipps & Tricks 1
Input layout
Use dtype=np.float32 to avoid memory copies and fortan layout for slight
runtime benefit.
X = np.asfortranarray(X, dtype=np.float32)
Tipps & Tricks 2
Feature interactions
GBRT automatically detects feature interactions but often explicit interactions
help.
Trees required to approximate X1 − X2: 10 (left), 1000 (right).
x 0.00.20.40.60.81.0
y
0.0
0.2
0.4
0.6
0.8
1.0
x-y
0.3
0.2
0.1
0.0
0.1
0.2
0.3
x 0.00.20.40.60.81.0
y
0.0
0.2
0.4
0.6
0.8
1.0
x-y
1.0
0.5
0.0
0.5
1.0
Tipps & Tricks 3
Categorical variables
Sklearn requires that categorical variables are encoded as numerics. Tree-based
methods work well with ordinal encoding:
df = pd.DataFrame(data={’icao’: [’CRJ2’, ’A380’, ’B737’, ’B737’]})
# ordinal encoding
df_enc = pd.DataFrame(data={’icao’: np.unique(df.icao,
return_inverse=True)[1]})
X = np.asfortranarray(df_enc.values, dtype=np.float32)

Weitere ähnliche Inhalte

Was ist angesagt?

Thyroid cancer / papillary carcinoma (Doctor Faris Alabeedi MSc, MMedSc, PgD...
Thyroid cancer /  papillary carcinoma (Doctor Faris Alabeedi MSc, MMedSc, PgD...Thyroid cancer /  papillary carcinoma (Doctor Faris Alabeedi MSc, MMedSc, PgD...
Thyroid cancer / papillary carcinoma (Doctor Faris Alabeedi MSc, MMedSc, PgD...Doctor Faris Alabeedi
 
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...Deep Learning JP
 
Natural Policy Gradient 직관적 접근
Natural Policy Gradient 직관적 접근Natural Policy Gradient 직관적 접근
Natural Policy Gradient 직관적 접근Sooyoung Moon
 
Pancreatic cancer Management (pancreatic adenocarcinoma)
Pancreatic cancer Management (pancreatic adenocarcinoma)Pancreatic cancer Management (pancreatic adenocarcinoma)
Pancreatic cancer Management (pancreatic adenocarcinoma)Dr mohamed Salat Gonjobe
 
パターン認識と機械学習 (PRML) 第1章-「多項式曲線フィッティング」「確率論」
パターン認識と機械学習 (PRML) 第1章-「多項式曲線フィッティング」「確率論」パターン認識と機械学習 (PRML) 第1章-「多項式曲線フィッティング」「確率論」
パターン認識と機械学習 (PRML) 第1章-「多項式曲線フィッティング」「確率論」Koichi Hamada
 
Necrotizing fascitis
Necrotizing fascitisNecrotizing fascitis
Necrotizing fascitisAkhil Joseph
 
From REINFORCE to PPO
From REINFORCE to PPOFrom REINFORCE to PPO
From REINFORCE to PPOWoong won Lee
 
PITUITARY TUMORS.pptx
PITUITARY TUMORS.pptxPITUITARY TUMORS.pptx
PITUITARY TUMORS.pptxDr Monica P
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningNAVER Engineering
 
機械学習におけるオンライン確率的最適化の理論
機械学習におけるオンライン確率的最適化の理論機械学習におけるオンライン確率的最適化の理論
機械学習におけるオンライン確率的最適化の理論Taiji Suzuki
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것NAVER Engineering
 
Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)cairo university
 
Bayesian Neural Networks
Bayesian Neural NetworksBayesian Neural Networks
Bayesian Neural NetworksNatan Katz
 
Basic statistics 1
Basic statistics  1Basic statistics  1
Basic statistics 1Kumar P
 
십분딥러닝_16_WGAN (Wasserstein GANs)
십분딥러닝_16_WGAN (Wasserstein GANs)십분딥러닝_16_WGAN (Wasserstein GANs)
십분딥러닝_16_WGAN (Wasserstein GANs)HyunKyu Jeon
 
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016Taehoon Kim
 
Metaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical AnalysisMetaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical AnalysisXin-She Yang
 
Malinant Salivary Gland Tumours Dr.Shaji Thomas Additional Professor,Regional...
Malinant Salivary Gland Tumours Dr.Shaji Thomas Additional Professor,Regional...Malinant Salivary Gland Tumours Dr.Shaji Thomas Additional Professor,Regional...
Malinant Salivary Gland Tumours Dr.Shaji Thomas Additional Professor,Regional...shajithoma
 

Was ist angesagt? (20)

Thyroid cancer / papillary carcinoma (Doctor Faris Alabeedi MSc, MMedSc, PgD...
Thyroid cancer /  papillary carcinoma (Doctor Faris Alabeedi MSc, MMedSc, PgD...Thyroid cancer /  papillary carcinoma (Doctor Faris Alabeedi MSc, MMedSc, PgD...
Thyroid cancer / papillary carcinoma (Doctor Faris Alabeedi MSc, MMedSc, PgD...
 
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...
 
Natural Policy Gradient 직관적 접근
Natural Policy Gradient 직관적 접근Natural Policy Gradient 직관적 접근
Natural Policy Gradient 직관적 접근
 
Pancreatic cancer Management (pancreatic adenocarcinoma)
Pancreatic cancer Management (pancreatic adenocarcinoma)Pancreatic cancer Management (pancreatic adenocarcinoma)
Pancreatic cancer Management (pancreatic adenocarcinoma)
 
パターン認識と機械学習 (PRML) 第1章-「多項式曲線フィッティング」「確率論」
パターン認識と機械学習 (PRML) 第1章-「多項式曲線フィッティング」「確率論」パターン認識と機械学習 (PRML) 第1章-「多項式曲線フィッティング」「確率論」
パターン認識と機械学習 (PRML) 第1章-「多項式曲線フィッティング」「確率論」
 
Necrotizing fascitis
Necrotizing fascitisNecrotizing fascitis
Necrotizing fascitis
 
From REINFORCE to PPO
From REINFORCE to PPOFrom REINFORCE to PPO
From REINFORCE to PPO
 
PITUITARY TUMORS.pptx
PITUITARY TUMORS.pptxPITUITARY TUMORS.pptx
PITUITARY TUMORS.pptx
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
機械学習におけるオンライン確率的最適化の理論
機械学習におけるオンライン確率的最適化の理論機械学習におけるオンライン確率的最適化の理論
機械学習におけるオンライン確率的最適化の理論
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)
 
Thyroid nodule management
Thyroid nodule managementThyroid nodule management
Thyroid nodule management
 
Bayesian Neural Networks
Bayesian Neural NetworksBayesian Neural Networks
Bayesian Neural Networks
 
Basic statistics 1
Basic statistics  1Basic statistics  1
Basic statistics 1
 
십분딥러닝_16_WGAN (Wasserstein GANs)
십분딥러닝_16_WGAN (Wasserstein GANs)십분딥러닝_16_WGAN (Wasserstein GANs)
십분딥러닝_16_WGAN (Wasserstein GANs)
 
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
 
Metaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical AnalysisMetaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical Analysis
 
Minimally invasive thyroid surgery by A. Shaha
Minimally invasive thyroid surgery by A. ShahaMinimally invasive thyroid surgery by A. Shaha
Minimally invasive thyroid surgery by A. Shaha
 
Malinant Salivary Gland Tumours Dr.Shaji Thomas Additional Professor,Regional...
Malinant Salivary Gland Tumours Dr.Shaji Thomas Additional Professor,Regional...Malinant Salivary Gland Tumours Dr.Shaji Thomas Additional Professor,Regional...
Malinant Salivary Gland Tumours Dr.Shaji Thomas Additional Professor,Regional...
 

Ähnlich wie Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Prettenhofer

Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnDataRobot
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learningYogendra Singh
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习AdaboostShocky1
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning재연 윤
 
DeepLearningLecture.pptx
DeepLearningLecture.pptxDeepLearningLecture.pptx
DeepLearningLecture.pptxssuserf07225
 
Rohan's Masters presentation
Rohan's Masters presentationRohan's Masters presentation
Rohan's Masters presentationrohan_anil
 
Study on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit ScoringStudy on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit Scoringharmonylab
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Gabriel Moreira
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringSri Ambati
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data ScienceAlbert Bifet
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_financeStefan Duprey
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionSeonho Park
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Universitat Politècnica de Catalunya
 
Techniques in Deep Learning
Techniques in Deep LearningTechniques in Deep Learning
Techniques in Deep LearningSourya Dey
 

Ähnlich wie Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Prettenhofer (20)

Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learning
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning
 
DeepLearningLecture.pptx
DeepLearningLecture.pptxDeepLearningLecture.pptx
DeepLearningLecture.pptx
 
Xgboost
XgboostXgboost
Xgboost
 
Rohan's Masters presentation
Rohan's Masters presentationRohan's Masters presentation
Rohan's Masters presentation
 
Study on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit ScoringStudy on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit Scoring
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Lecture4.pptx
Lecture4.pptxLecture4.pptx
Lecture4.pptx
 
Xgboost
XgboostXgboost
Xgboost
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for Regression
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
Techniques in Deep Learning
Techniques in Deep LearningTechniques in Deep Learning
Techniques in Deep Learning
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptx
 

Mehr von PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPyData
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydPyData
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverPyData
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldPyData
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 

Mehr von PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Kürzlich hochgeladen

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Kürzlich hochgeladen (20)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Prettenhofer

  • 1. Gradient Boosted Regression Trees scikit Peter Prettenhofer (@pprett) DataRobot Gilles Louppe (@glouppe) Universit´e de Li`ege, Belgium
  • 4. Outline 1 Basics 2 Gradient Boosting 3 Gradient Boosting in Scikit-learn 4 Case Study: California housing
  • 5. About us Peter • @pprett • Python & ML ∼ 6 years • sklearn dev since 2010 Gilles • @glouppe • PhD student (Li`ege, Belgium) • sklearn dev since 2011 Chief tree hugger
  • 6. Outline 1 Basics 2 Gradient Boosting 3 Gradient Boosting in Scikit-learn 4 Case Study: California housing
  • 7. Machine Learning 101 • Data comes as... • A set of examples {(xi , yi )|0 ≤ i < n samples}, with • Feature vector x ∈ Rn features , and • Response y ∈ R (regression) or y ∈ {−1, 1} (classification) • Goal is to... • Find a function ˆy = f (x) • Such that error L(y, ˆy) on new (unseen) x is minimal
  • 8. Classification and Regression Trees [Breiman et al, 1984] MedInc <= 5.04 MedInc <= 3.07 MedInc <= 6.82 AveRooms <= 4.31 AveOccup <= 2.37 1.62 1.16 2.79 1.88 AveOccup <= 2.74 MedInc <= 7.82 3.39 2.56 3.73 4.57 sklearn.tree.DecisionTreeClassifier|Regressor
  • 9. Function approximation with Regression Trees 0 2 4 6 8 10 x 8 6 4 2 0 2 4 6 8 10y ground truth RT d=1 RT d=3 RT d=20
  • 10. Function approximation with Regression Trees 0 2 4 6 8 10 x 8 6 4 2 0 2 4 6 8 10y ground truth RT d=1 RT d=3 RT d=20 Deprecated • Nowadays seldom used alone • Ensembles: Random Forest, Bagging, or Boosting (see sklearn.ensemble)
  • 11. Outline 1 Basics 2 Gradient Boosting 3 Gradient Boosting in Scikit-learn 4 Case Study: California housing
  • 12. Gradient Boosted Regression Trees Advantages • Heterogeneous data (features measured on different scale), • Supports different loss functions (e.g. huber), • Automatically detects (non-linear) feature interactions, Disadvantages • Requires careful tuning • Slow to train (but fast to predict) • Cannot extrapolate
  • 13. Boosting AdaBoost [Y. Freund & R. Schapire, 1995] • Ensemble: each member is an expert on the errors of its predecessor • Iteratively re-weights training examples based on errors 2 1 0 1 2 3 x0 2 1 0 1 2 x1 2 1 0 1 2 3 x0 2 1 0 1 2 3 x0 2 1 0 1 2 3 x0 sklearn.ensemble.AdaBoostClassifier|Regressor
  • 14. Boosting AdaBoost [Y. Freund & R. Schapire, 1995] • Ensemble: each member is an expert on the errors of its predecessor • Iteratively re-weights training examples based on errors 2 1 0 1 2 3 x0 2 1 0 1 2 x1 2 1 0 1 2 3 x0 2 1 0 1 2 3 x0 2 1 0 1 2 3 x0 sklearn.ensemble.AdaBoostClassifier|Regressor Huge success • Viola-Jones Face Detector (2001) • Freund & Schapire won the G¨odel prize 2003
  • 15. Gradient Boosting [J. Friedman, 1999] Statistical view on boosting • ⇒ Generalization of boosting to arbitrary loss functions
  • 16. Gradient Boosting [J. Friedman, 1999] Statistical view on boosting • ⇒ Generalization of boosting to arbitrary loss functions Residual fitting 2 6 10 x 2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 2.5 y Ground truth 2 6 10 x ∼ tree 1 2 6 10 x + tree 2 2 6 10 x + tree 3 sklearn.ensemble.GradientBoostingClassifier|Regressor
  • 17. Functional Gradient Descent Least Squares Regression • Squared loss: L(yi , f (xi )) = (yi − f (xi ))2 • The residual ∼ the (negative) gradient ∂L(yi , f (xi )) ∂f (xi )
  • 18. Functional Gradient Descent Least Squares Regression • Squared loss: L(yi , f (xi )) = (yi − f (xi ))2 • The residual ∼ the (negative) gradient ∂L(yi , f (xi )) ∂f (xi ) Steepest Descent • Regression trees approximate the (negative) gradient • Each tree is a successive gradient descent step 4 3 2 1 0 1 2 3 4 y−f(x) 0 1 2 3 4 5 6 7 8 L(y,f(x)) Squared error Absolute error Huber error 4 3 2 1 0 1 2 3 4 y·f(x) 0 1 2 3 4 5 6 7 8 L(y,f(x)) Zero-one loss Log loss Exponential loss
  • 19. Outline 1 Basics 2 Gradient Boosting 3 Gradient Boosting in Scikit-learn 4 Case Study: California housing
  • 20. GBRT in scikit-learn How to use it >>> from sklearn.ensemble import GradientBoostingClassifier >>> from sklearn.datasets import make_hastie_10_2 >>> X, y = make_hastie_10_2(n_samples=10000) >>> est = GradientBoostingClassifier(n_estimators=200, max_depth=3) >>> est.fit(X, y) ... >>> # get predictions >>> pred = est.predict(X) >>> est.predict_proba(X)[0] # class probabilities array([ 0.67, 0.33]) Implementation • Written in pure Python/Numpy (easy to extend). • Builds on top of sklearn.tree.DecisionTreeRegressor (Cython). • Custom node splitter that uses pre-sorting (better for shallow trees).
  • 21. Example from sklearn.ensemble import GradientBoostingRegressor est = GradientBoostingRegressor(n_estimators=2000, max_depth=1).fit(X, y) for pred in est.staged_predict(X): plt.plot(X[:, 0], pred, color=’r’, alpha=0.1) 0 2 4 6 8 10 x 8 6 4 2 0 2 4 6 8 10 y High bias - low variance Low bias - high variance ground truth RT d=1 RT d=3 GBRT d=1
  • 22. Model complexity & Overfitting test_score = np.empty(len(est.estimators_)) for i, pred in enumerate(est.staged_predict(X_test)): test_score[i] = est.loss_(y_test, pred) plt.plot(np.arange(n_estimators) + 1, test_score, label=’Test’) plt.plot(np.arange(n_estimators) + 1, est.train_score_, label=’Train’) 0 200 400 600 800 1000 n_estimators 0.0 0.5 1.0 1.5 2.0 Error Lowest test error train-test gap Test Train
  • 23. Model complexity & Overfitting test_score = np.empty(len(est.estimators_)) for i, pred in enumerate(est.staged_predict(X_test)): test_score[i] = est.loss_(y_test, pred) plt.plot(np.arange(n_estimators) + 1, test_score, label=’Test’) plt.plot(np.arange(n_estimators) + 1, est.train_score_, label=’Train’) 0 200 400 600 800 1000 n_estimators 0.0 0.5 1.0 1.5 2.0 Error Lowest test error train-test gap Test Train Regularization GBRT provides a number of knobs to control overfitting • Tree structure • Shrinkage • Stochastic Gradient Boosting
  • 24. Regularization: Tree structure • The max depth of the trees controls the degree of features interactions • Use min samples leaf to have a sufficient nr. of samples per leaf.
  • 25. Regularization: Shrinkage • Slow learning by shrinking tree predictions with 0 < learning rate <= 1 • Lower learning rate requires higher n estimators 0 200 400 600 800 1000 n_estimators 0.0 0.5 1.0 1.5 2.0 Error Requires more trees Lower test error Test Train Test learning_rate=0.1 Train learning_rate=0.1
  • 26. Regularization: Stochastic Gradient Boosting • Samples: random subset of the training set (subsample) • Features: random subset of features (max features) • Improved accuracy – reduced runtime 0 200 400 600 800 1000 n_estimators 0.0 0.5 1.0 1.5 2.0 Error Even lower test error Subsample alone does poorly Train Test Train subsample=0.5, learning_rate=0.1 Test subsample=0.5, learning_rate=0.1
  • 27. Hyperparameter tuning 1. Set n estimators as high as possible (eg. 3000) 2. Tune hyperparameters via grid search. from sklearn.grid_search import GridSearchCV param_grid = {’learning_rate’: [0.1, 0.05, 0.02, 0.01], ’max_depth’: [4, 6], ’min_samples_leaf’: [3, 5, 9, 17], ’max_features’: [1.0, 0.3, 0.1]} est = GradientBoostingRegressor(n_estimators=3000) gs_cv = GridSearchCV(est, param_grid).fit(X, y) # best hyperparameter setting gs_cv.best_params_ 3. Finally, set n estimators even higher and tune learning rate.
  • 28. Outline 1 Basics 2 Gradient Boosting 3 Gradient Boosting in Scikit-learn 4 Case Study: California housing
  • 29. Case Study California Housing dataset • Predict log(medianHouseValue) • Block groups in 1990 census • 20.640 groups with 8 features (median income, median age, lat, lon, ...) • Evaluation: Mean absolute error on 80/20 split Challenges • Heterogeneous features • Non-linear interactions
  • 30. Predictive accuracy & runtime Train time [s] Test time [ms] MAE Mean - - 0.4635 Ridge 0.006 0.11 0.2756 SVR 28.0 2000.00 0.1888 RF 26.3 605.00 0.1620 GBRT 192.0 439.00 0.1438 0 500 1000 1500 2000 2500 3000 n_estimators 0.0 0.1 0.2 0.3 0.4 0.5 error Test Train
  • 31. Model interpretation Which features are important? >>> est.feature_importances_ array([ 0.01, 0.38, ...]) 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 Relative importance HouseAge Population AveBedrms Latitude AveOccup Longitude AveRooms MedInc
  • 32. Model interpretation What is the effect of a feature on the response? from sklearn.ensemble import partial_dependence import as pd features = [’MedInc’, ’AveOccup’, ’HouseAge’, ’AveRooms’, (’AveOccup’, ’HouseAge’)] fig, axs = pd.plot_partial_dependence(est, X_train, features, feature_names=names) 1.5 3.0 4.5 6.0 7.5 MedInc 0.4 0.2 0.0 0.2 0.4 0.6 Partialdependence 2.0 2.5 3.0 3.54.0 4.5 AveOccup 0.4 0.2 0.0 0.2 0.4 0.6 Partialdependence 10 20 30 40 50 60 HouseAge 0.4 0.2 0.0 0.2 0.4 0.6 Partialdependence 4 5 6 7 8 AveRooms 0.4 0.2 0.0 0.2 0.4 0.6 Partialdependence 2.0 2.5 3.0 3.5 4.0 AveOccup 10 20 30 40 50 HouseAge -0.12 -0.05 0.02 0.090.16 0.23 Partial dependence of house value on nonlocation features for the California housing dataset
  • 33. Model interpretation Automatically detects spatial effects longitude latitude -1.54 -1.22 -0.91 -0.60 -0.28 0.03 0.34 0.66 0.97 partialdep.onmedianhousevalue longitudelatitude -0.15 -0.07 0.01 0.09 0.17 0.25 0.33 0.41 0.49 0.57 partialdep.onmedianhousevalue
  • 34. Summary • Flexible non-parametric classification and regression technique • Applicable to a variety of problems • Solid, battle-worn implementation in scikit-learn
  • 37. Tipps & Tricks 1 Input layout Use dtype=np.float32 to avoid memory copies and fortan layout for slight runtime benefit. X = np.asfortranarray(X, dtype=np.float32)
  • 38. Tipps & Tricks 2 Feature interactions GBRT automatically detects feature interactions but often explicit interactions help. Trees required to approximate X1 − X2: 10 (left), 1000 (right). x 0.00.20.40.60.81.0 y 0.0 0.2 0.4 0.6 0.8 1.0 x-y 0.3 0.2 0.1 0.0 0.1 0.2 0.3 x 0.00.20.40.60.81.0 y 0.0 0.2 0.4 0.6 0.8 1.0 x-y 1.0 0.5 0.0 0.5 1.0
  • 39. Tipps & Tricks 3 Categorical variables Sklearn requires that categorical variables are encoded as numerics. Tree-based methods work well with ordinal encoding: df = pd.DataFrame(data={’icao’: [’CRJ2’, ’A380’, ’B737’, ’B737’]}) # ordinal encoding df_enc = pd.DataFrame(data={’icao’: np.unique(df.icao, return_inverse=True)[1]}) X = np.asfortranarray(df_enc.values, dtype=np.float32)